Programs that monitor Terraform's output to report panics might
make the reasonable assumption that the string "panic:" is always
included and is therefore safe to monitor for. Our custom panic
output unfortunately breaks that assumption at the moment.
Instead of asking consumers to add their own handling to deal with
this problem, let's add that greppable string to our custom panic
output.
Our goal with this panic-interception was to largely mimic how the Go
runtime would normally report panics except for two intentional
exceptions: an extra prompt explaining to the user that Terraform crashed,
and exiting with status code 11 instead of 2.

Unfortunately we accidentally deviated in a different way: we're reporting
to whatever os.Stderr happens to refer to, instead of to the real process
stderr. It seems like that shouldn't really matter, but go-plugin
intentionally changes os.Stderr to refer to a totally separate stream that
it manages, causing the captured panic messages to be routed over a
gRPC-based channel to the plugin client.

This deviation makes the panic messages invisible to the usual strategies
for trying to heuristically detect that a Go program has panicked. Without
a special interception like Terraform is doing here, the Go runtime
writes directly to the stderr file descriptor without going through the
os.Stderr abstraction, and so to achieve consistent behavior we need to
do a little hoop-jumping to approximate that result.
In particular, this makes the behavior now consistent with what happens
when a provider plugin running as a child of Terraform Core panics, and
so a system which tries to sniff stderr for content that seems like a
panic message will be able to handle both situations equally and avoid
making a special case for Terraform Core/CLI's own panics.

When logging is turned on, panicwrap will still see provider crashes and
falsely report them as core crashes, hiding the formatted provider
error. We can trick panicwrap by slightly obfuscating the error line.
Create a logger that will record any apparent crash output for later
processing.

If the CLI command returns with a non-zero exit status, check for any
recorded crashes and add those to the output.
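
The recording and the exit-status check could look roughly like this.
It is a sketch with hypothetical names (crashRecorder, finish), and it
assumes crash output is recognized by a line starting with "panic: ".

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// crashRecorder collects the log line that looks like the start of Go panic
// output, plus every line that follows it.
type crashRecorder struct {
	mu      sync.Mutex
	inPanic bool
	lines   []string
}

// Record inspects one log line and stores it once a panic has been seen.
func (r *crashRecorder) Record(line string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if strings.HasPrefix(line, "panic: ") {
		r.inPanic = true
	}
	if r.inPanic {
		r.lines = append(r.lines, line)
	}
}

// Crashes returns the recorded output, or "" if nothing was recorded.
func (r *crashRecorder) Crashes() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return strings.Join(r.lines, "\n")
}

// finish appends recorded crash output only when the command failed.
func finish(exitCode int, output string, r *crashRecorder) string {
	if exitCode == 0 {
		return output
	}
	if c := r.Crashes(); c != "" {
		return output + "\n" + c
	}
	return output
}

func main() {
	rec := &crashRecorder{}
	rec.Record("starting provider")
	rec.Record("panic: boom")
	rec.Record("goroutine 1 [running]:")
	fmt.Println(finish(1, "Error: plugin crashed", rec))
}
```
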
Now that hclog can independently set levels on related loggers, we can
separate the log levels for different subsystems in Terraform.

This adds the new environment variables `TF_LOG_CORE` and
`TF_LOG_PROVIDER`, which each take the same set of log level arguments
and apply only to logs from that subsystem. This means that setting
`TF_LOG_CORE=level` will not show logs from providers, and
`TF_LOG_PROVIDER=level` will not show logs from core. The behavior of
`TF_LOG` alone does not change.

While it is not necessarily needed since the default is to disable logs,
there is also a new level argument of `off`, which reflects the
associated level in hclog.
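
The lookup described above can be sketched as follows. The function
levelFor is hypothetical, and the precedence (subsystem variable first,
then `TF_LOG`) is an assumption that matches the described behavior but
is not stated explicitly. The environment lookup is passed in as a
function so the sketch is easy to exercise; real code would pass
os.Getenv.

```go
package main

import (
	"fmt"
	"strings"
)

// levelFor resolves the log level for one subsystem ("CORE" or "PROVIDER").
// Assumption: the subsystem-specific variable wins and TF_LOG is the
// fallback. With neither set, logging stays disabled.
func levelFor(subsystem string, getenv func(string) string) string {
	level := getenv("TF_LOG_" + subsystem)
	if level == "" {
		level = getenv("TF_LOG")
	}
	if level == "" {
		return "OFF"
	}
	return strings.ToUpper(level)
}

func main() {
	env := map[string]string{"TF_LOG_CORE": "debug"}
	getenv := func(key string) string { return env[key] }

	fmt.Println("core:", levelFor("CORE", getenv))
	fmt.Println("provider:", levelFor("PROVIDER", getenv))
}
```

With only `TF_LOG_CORE=debug` set, this prints `core: DEBUG` and
`provider: OFF`, so provider logs stay hidden.
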
Use a separate log sink to always capture trace logs for the panicwrap
handler to write out in a crash log.
This requires creating a log file in the outer process and passing that
path to the child process to log to.