The `tbz2` extension, a shorthand for `tar.bz2`, is supported for downloading and handling compressed archives in go-getter, which already treats both `tbz2` and `tar.bz2` as using the same bzip2 decompressor. This commit adds support for the `tbz2` decompressor when fetching modules from a source.
Signed-off-by: Bruno Schaatsbergen <git@bschaatsbergen.com>
We've been gradually chipping away at how much we use go-getter for source
packages, because it's generally been a bit of a nightmare and sharing it
with other codebases means that any time someone wants to change something
we end up needing to find some way to prevent it breaking Terraform's
compatibility promises.
Here we make one further step: Terraform owns the "detectors" idea that
deals with source address normalization, and now always produces
fully-qualified addresses for go-getter to chew on only for the getting
and decompressing steps.
Retaining go-getter for the actual getting part is helpful because we can
then benefit from security fixes upstream, but Terraform owning the first
layer of parsing means that we can fix in place the definition of what
"module source address" syntax means, and thus we can avoid having
everything in this codebase indirectly depend on go-getter just because it
wants to parse module source addresses.
Now only the module installer actually depends indirectly on go-getter,
which finally disconnects go-getter's subtree from all of the remote state
backend dependency graphs.
Our package addrs ends up getting imported from just about ever other
package in Terraform, because it contains the types we use to talk about
various different kinds of objects. Therefore we typically try to keep its
transitive dependency graph relatively small, because anything it depends
on becomes an indirect dependency of nearly everything else.
A while back we moved the module source address models into package addrs,
which also brought with them the code for parsing strings to produce those
addresses. Unfortunately, remote module source addresses are defined using
the external dependency go-getter, which is pretty heavy itself and also
brings with it numerous other external dependencies such as the AWS SDK,
the Google Cloud Platform SDK, etc.
Since only relatively few packages actually need to _parse_ source
addresses -- with most either not caring about them at all or only
consuming addresses that were already parsed by someone else -- we'll
move the parser functions into their own package, while keeping the
resulting address types in package addrs.
This does still retain the package addrs dependency on external module
github.com/hashicorp/terraform-registry-address, which is not ideal but
that one at least has a relatively shallow dependency subgraph, so there's
not so much urgency to tidy that one.
There was an unintended regression in go-getter v1.5.9's GitGetter which
caused us to temporarily fork that particular getter into Terraform to
expedite a fix. However, upstream v1.5.10 now includes a
functionally-equivalent fix and so we can heal that fork by upgrading.
We'd also neglected to update the Module Sources docs when upgrading to
go-getter v1.5.9 originally and so we were missing documentation about the
new "depth" argument to enable shadow cloning, which I've added
retroactively here along with documenting its restriction of only
supporting named refs.
This new go-getter release also introduces a new credentials-passing
method for the Google Cloud Storage getter, and so we must incorporate
that into the Terraform-level documentation about module sources.
Earlier versions of this code allowed "ref" to take any value that would
be accepted by "git checkout" as a valid target of a symbolic ref. We
inadvertently accepted a breaking change to upstream go-getter that broke
that as part of introducing a shallow clone optimization, because shallow
clone requires selecting a single branch.
To restore the previous capabilities while retaining the "depth" argument,
here we accept a compromise where "ref" has the stronger requirement of
being a valid named ref in the remote repository if and only if "depth"
is set to a value greater than zero. If depth isn't set or is less than
one, we will do the old behavior of just cloning all of the refs in the
remote repository in full and then switching to refer to the selected
branch, tag, or naked commit ID as a separate step.
This includes a heuristic to generate an additional error message hint if
we get an error from "git clone" and it looks like the user might've been
trying to use "depth" and "ref=COMMIT" together. We can't recognize that
error accurately because it's only reported as human-oriented git command
output, but this heuristic should hopefully minimize situations where we
show it inappropriately.
For now this is a change in the Terraform repository directly, so that we
can expedite the fix to an already-reported regression. After this is
released I tend to also submit a similar set of changes to upstream
go-getter, at which point we can revert Terraform to using the upstream
getter.GitGetter instead of our own local fork.
This is a pragmatic temporary solution to allow us to more quickly resolve
an upstream regression in go-getter locally within Terraform, so that the
work to upstream it for other callers can happen asynchronously and with
less time pressure.
This commit doesn't yet include any changes to address the bug, and
instead aims to be functionally equivalent to getter.GitGetter. A
subsequent commit will then address the regression, so that the diff of
that commit will be easier to apply later to the upstream to get the same
effect there.
Earlier work to make "terraform init" interruptible made the getproviders
package context-aware in order to allow provider installation to be cancelled.
Here we make a similar change for module installation, which is now also
cancellable with SIGINT. This involves plumbing context through initwd and
getmodules. Functions which can make network requests now include a context
parameter whose cancellation cancels those requests.
Since the module installation code is shared, "terraform get" is now
also interruptible during module installation.
It's been a long while since we gave close attention to the codepaths for
module source address parsing and external module package installation.
Due to their age, these codepaths often diverged from our modern practices
such as representing address types in the addrs package, and encapsulating
package installation details only in a particular location.
In particular, this refactor makes source address parsing a separate step
from module installation, which therefore makes the result of that parsing
available to other Terraform subsystems which work with the configuration
representation objects.
This also presented the opportunity to better encapsulate our use of
go-getter into a new package "getmodules" (echoing "getproviders"), which
is intended to be the only part of Terraform that directly interacts with
go-getter.
This is largely just a refactor of the existing functionality into a new
code organization, but there is one notable change in behavior here: the
source address parsing now happens during configuration loading rather
than module installation, which may cause errors about invalid addresses
to be returned in different situations than before. That counts as
backward compatible because we only promise to remain compatible with
configurations that are _valid_, which means that they can be initialized,
planned, and applied without any errors. This doesn't introduce any new
error cases, and instead just makes a pre-existing error case be detected
earlier.
Our module registry client is still using its own special module address
type from registry/regsrc for now, with a small shim from the new
addrs.ModuleSourceRegistry type. Hopefully in a later commit we'll also
rework the registry client to work with the new address type, but this
commit is already big enough as it is.
This new package aims to encapsulate all of our interactions with
go-getter to fetch remote module packages, to ensure that the rest of
Terraform will only use the small subset of go-getter functionality that
our modern module installer uses.
In older versions of Terraform, go-getter was the entire implementation
of module installation, but along the way we found that several aspects of
its design are poor fit for Terraform's needs, and so now we're using it
as just an implementation detail of Terraform's handling of remote module
packages only, hiding it behind this wrapper API which exposes only
the services that our module installer needs.
This new package isn't actually used yet, but in a later commit we will
change all of the other callers to go-getter to only work indirectly
through this package, so that this will be the only package that actually
imports the go-getter packages.