The issue is that `std::promise` internally also uses thread-local
storage, in a call to `std::call_once` in `std::promise::set_value()`.
The theory is that, since all paths in `Send()` run this `std::call_once`
routine and the coroutine function looks like a normal function to the
compiler, the compiler inlined `set_value()` and hoisted the parts
common to all paths to a single location before the suspension point in
`WriteMessage(yc)`.
When the coroutine is finally resumed, this likely happens on a
different thread, where `__once_callable` in `std::call_once` is still
`nullptr`, leading to the segmentation fault.
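
To illustrate the mechanism in isolation, here is a minimal sketch of how a
thread-local pointer set on one thread still reads as `nullptr` on another.
`tls_slot` and `SlotIsNullOnOtherThread()` are hypothetical names that only
stand in for the library-internal `__once_callable`. The sketch does not
reproduce the actual miscompilation.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a library-internal thread-local pointer
// such as `__once_callable`. Every thread gets its own copy.
thread_local const char* tls_slot = nullptr;

// Sets the slot on the calling thread, then reads it from a second thread.
// Returns true if the second thread saw its own, still-null copy.
inline bool SlotIsNullOnOtherThread() {
    tls_slot = "set on the first thread";
    assert(tls_slot != nullptr); // visible on the thread that wrote it

    bool seenNull = false;
    std::thread other([&seenNull] {
        // This thread never wrote to tls_slot, so its copy is untouched.
        seenNull = (tls_slot == nullptr);
    });
    other.join();

    return seenNull;
}
```

A coroutine that writes such a slot before suspending and reads it after
being resumed on another thread would be in the position of `other` above.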
The fix is to not use `std::promise` across coroutine suspension points
and instead reimplement the functionality we need from it in a small
helper class, `SyncResult`, that does not require any thread-local storage.
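
For illustration only, a helper of this kind could look roughly like the
sketch below. It is not the `SyncResult` class from this change: the method
names `Set()` and `Get()`, the member names, and `DemoSyncResult()` are all
assumptions. It relies only on a mutex and a condition variable, so it keeps
no thread-local state.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch of a one-shot result slot. T must be
// default-constructible and copyable in this simplified version.
template <typename T>
class SyncResult {
public:
    // Store the value and wake any waiter. Only the first call has an effect.
    void Set(T value) {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            if (m_Ready) {
                return;
            }
            m_Value = std::move(value);
            m_Ready = true;
        }
        m_Cond.notify_all();
    }

    // Block until a value has been set, then return a copy of it.
    T Get() {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Cond.wait(lock, [this] { return m_Ready; });
        return m_Value;
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_Cond;
    bool m_Ready = false;
    T m_Value{};
};

// Usage sketch: the value is set on a different thread than the one waiting.
inline int DemoSyncResult() {
    SyncResult<int> result;
    std::thread producer([&result] { result.Set(42); });
    int value = result.Get();
    producer.join(); // join before `result` goes out of scope
    assert(value == 42);
    return value;
}
```

Because all state lives in the object itself, it does not matter which
thread calls `Set()` or on which thread a coroutine is resumed.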