In similar vein to commit 3c6e8c123, the ARMv8 cryptography extension
has 64x64 -> 128-bit carryless multiplication instructions suitable
for computing CRC. This was tested to be around twice as fast as
scalar CRC instructions for longer inputs.
We now do a runtime check, even for builds that target "armv8-a+crc",
but those builds can still use a direct call for constant inputs,
which we assume are short.
As for x86, the MIT-licensed implementation was generated with the
"generate" program from
https://github.com/corsix/fast-crc32/
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CANWCAZaKhE+RD5KKouUFoxx1EbUNrNhcduM1VQ=DkSDadNEFng@mail.gmail.com