This source was manually transcribed from Go's assembly syntax into
FreeBSD's. Some differences exist (e.g. around stack frame allocation,
but also some upstream LEAL instructions were replaced with ADDL here as
getting the 64-bit super-registers of 32-bit isn't so doable, unlike Go)
that were intended, but a few errors crept in. Fix these, found by
comparing post-processed disassembly[1] (handling the ADDL difference
above, and due to Go's assembler not optimising VP[X]OR encoding by
commuting operands when it would give rise to a 2-byte VEX prefix) of a
built copy of the corresponding Go source against ours.
[1] In Vim:
%g/\<vpx\?or\>/s/\(%ymm\([89]\|1[0-5]\)\), %ymm\([0-7]\), %ymm/%ymm\3, \1, %ymm/g
(to commute the VP[X]OR operands as LLVM does)
%s/\<leal\>\([[:space:]]\+\)(%r\(..\),%r\(..\)), %e\2/addl\1%e\3, %e\2/
(to convert LEAL to ADDL in the cases we do)
%s/%e12\>/%r12d/g
(as the previous conversion turns %r12 into %e12 not %r12d)
Fixes: 8b4684afcd ("lib/libmd: add optimised SHA1 implementations for amd64")
Seems like there is a bug lurking somewhere in the code. This was not
caught during my testing. Disable the affected kernel for now while I
figure out what is wrong with it.
To reproduce, run
jot -s '' -b 'a' -n 1000000 | sha1
This should yield 34aa973cd4c4daa4f61eeb2bdbad27316534016f, but gives
fe161a71d7941e3d63a9cacadc4f20716a721944 with the broken code. Only the
amd64/avx2 kernel is affected, the others seem to operate correctly.
Reported by: olivier
Fixes: 8b4684afcd
This provides a scalar implementation and one using the SHA1
instruction set extensions.
For the scalar implementation, the w array is kept in registers,
speeding up the whole operations. For a 10 GiB file on my Windows
2023 Dev Kit (ARM Cortex A78C / ARM Cortex X1C):
Performance core:
pre 43.1s (238 MB/s)
generic 41.3s (247 MB/s)
scalar 35.0s (293 MB/s)
sha1 12.8s (800 MB/s)
Efficiency core:
pre 54.2s (189 MB/s)
generic 55.9s (183 MB/s)
scalar 43.0s (238 MB/s)
sha1 16.2s (632 MB/s)
Reviewed by: getz
Differential Revision: https://reviews.freebsd.org/D45444
Three implementations are provided: one using just scalar
instructions, one using AVX2, and one using the SHA instructions
(SHANI). The AVX2 version uses a complicated multi-block carry
scheme described in an Intel whitepaper; the code was
carefully transcribed from the implementatio shipped with the
Go runtime. The performance is quite good. From my Tiger Lake
based NUC:
old: 16.7s ( 613 MB/s)
scalar: 14.5s ( 706 MB/s)
avx2: 10.5s ( 975 MB/s)
shani: 5.6s (1829 MB/s)
Reviewed by: getz
Obtained from: b0dfcb7465/src/crypto/sha1/sha1block_amd64.s
Differential Revision: https://reviews.freebsd.org/D45444
This lays the foundation for importing Go's excellent assembly
implementations of the SHA-1 block function.
No performance changes observed on amd64.
Reviewed by: getz
Differential Revision: https://reviews.freebsd.org/D45444
Visibility can get complicated when, e.g., ifuncs are involved. In
particular, SHA256/SHA512 on aarch64 use ifuncs for their _Transform
implementations, which then exposes global symbols of the same name that
break things trying to statically link both libcrypto and libmd.
Revert this part of the _Transform removal to fix the pkg-static build
on aarch64.
Fixes: 81de655acd ("libmd: stop exporting Transform() symbols")
These are reportedly likely to be specific to SSLeay's implementation
and likely not needed here. At the very least they shouldn't be used
by consumers, so let's kick them out now while we're already prepared
for a libmd soversion bump.
Requested by: delphij, fuz
They're not documented in libmd and we don't have any consumers. It's
problematic to keep them exported, as we don't currently export their
implementations. Make them all private.
PR: 280784 (exp-run)
Reviewed by: fuz
Differential Revision: https://reviews.freebsd.org/D34503
The drivers just had a small issue, passing a literal string as
non-const. Fix it and lift WARNS.
PR: 280784 (exp-run)
Reviewed by: delphij, emaste
Differential Revision: https://reviews.freebsd.org/D34501
Make us a little less reliant on individuals running the tests, we'll
start running them as part of CI.
PR: 280784 (exp-run)
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D34500
The primary benefit sought is exporting _libmd_* symbols in a private
namespace, and avoiding export of some other implementation details that
are shared amongst TUs.
PR: 280784 (exp-run)
Reviewed by: fuz
Differential Revision: https://reviews.freebsd.org/D34499
These are needed across compilation units so we can keep the _libmd_
prefixing bits (though I suspect we're not likely to collide), but we
don't need to be exporting the unprefixed versions of these; it's an
implementation detail.
PR: 280784 (exp-run)
Reviewed by: delphij, fuz
Differential Revision: https://reviews.freebsd.org/D34498
Reduce the number of md5c.c between the three of these from two to one
by just reaching into the kernel build for both userland builds. The
precedent for this already exists for sha2 in both cases.
_libmd_ symbol privatization bits have been moved to sys/md5.h and
md5.h remains to #include <sys/md5.h> for compatibility.
This stops exporting MD5Pad() in the process because the kernel stopped
exporting it in 502a35d60f. soversion is bumped accordingly.
This also renames the libc version of stack_protector.c; it previously
only worked by coincidence because .PATH ordering worked out such that
we got the right one, but this is not the case anymore. Remove the
landmine.
PR: 280784 (exp-run)
Reviewed by: allanjude, delphij
Differential Revision: https://reviews.freebsd.org/D34497
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
While there, remove .Tn from man pages.
Also remove an obsolete comment about the 80386.
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: kevans, allanjude
Differential Revision: https://reviews.freebsd.org/D38373
Summary:
This knob can be used to make buildsystem prefer generic C implentations of
various functions, instead of machine-specific assembler ones.
Test Plan: `make buildworld` on amd64
Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D36076
MFC after: 3 days
As with sha256 add support for accelerated sha512 support to libmd on
arm64. This depends on clang 13+ to build as this is the first release
with the needed intrinsics. Gcc should also support them, however from
a currently unknown release.
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33373
We don't have ifunc support in static arm64 binaries. Until we do
disable the accelerated sha256 code in a static libmd as it uses an
ifunc.
Reported by: brd
Sponsored by: The FreeBSD Foundation
Summary:
When running on a CPU that supports the arm64 sha256 intrinsics use them
to improve perfromance of sha256 calculations.
With this changethe following improvement has been seen on an Apple M1
with FreeBS running under Parallels, with similar results on a
Neoverse-N1 r3p1.
x sha256.orig
+ sha256.arm64
+--------------------------------------------------------------------+
|++ x x|
|+++ xxx|
||A |A||
+--------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 3.41 3.5 3.46 3.458 0.042661458
+ 5 0.47 0.54 0.5 0.504 0.027018512
Difference at 95.0% confidence
-2.954 +/- 0.0520768
-85.4251% +/- 0.826831%
(Student's t, pooled s = 0.0357071)
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31284
r366344 fixed and reenabled the assembly optimized skein implementation,
but skein_block objects were not being rebuilt in no-clean builds. This
resulted in failing no-clean builds. SKEIN_USE_ASM controls which
routines come from C vs assembly, and with no explicit dependency
r366344's change to SKEIN_USE_ASM did not cause skein_block.{o,pico}
to be rebuilt.
Add a dependency on this Makefile for the skein_block objects. This
dependency is broader in scope than absolutely required (that is, the
skein_block objects will now be rebuilt on any change to this Makefile).
There are ways this could be addressed, but it is probably not worth the
additional effort or testing time to pursue them.
PR: 248221
Reported by: kevans, Jeremy Faulkner
Discussed with: kevans
Sponsored by: The FreeBSD Foundation
The assembly implementation incorrectly used logical AND instead of
bitwise AND. Fix, and re-enable in libmd.
Submitted by: Yang Zhong <yzhong@freebsdfoundation.org>
Reviewed by: cem (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26614
It is apparently broken when assembled by contemporary GNU as as well as
Clang IAS (which is used in the default configuration).
PR: 248221
Reported by: pizzamig
Sponsored by: The FreeBSD Foundation
Comparing the object files produced by GNU as 2.17.50 and Clang IAS
shows many immaterial changes in strtab etc., and one material change
in .text:
1bac: 4c 8b 4f 18 mov 0x18(%rdi),%r9
1bb0: eb 0e jmp 1bc0 <Skein1024_block_loop>
- 1bb2: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
- 1bb9: 00 00 00 00
- 1bbd: 0f 1f 00 nopl (%rax)
+ 1bb2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
+ 1bb9: 00 00 00
+ 1bbc: 0f 1f 40 00 nopl 0x0(%rax)
0000000000001bc0 <Skein1024_block_loop>:
Skein1024_block_loop():
1bc0: 4c 8b 47 10 mov 0x10(%rdi),%r8
1bc4: 4c 03 85 c0 00 00 00 add 0xc0(%rbp),%r8
That is, GNU as and Clang's integrated assembler use different multi-
byte NOPs for alignment (GNU as emits an 11 byte NOP + a 3 byte NOP,
while Clang IAS emits a 10 byte NOP + a 4 byte NOP).
Dependency cleanup hacks are not required, because we do not create
.depend files from GNU as.
Reviewed by: allanjude, arichardson, cem, tsoome
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8434
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files
Reviewed by: bdrewery
MFC after: 1 week
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org/D22494
All of them are needed to be able to boot to single user and be able
to repair a existing FreeBSD installation so put them directly into
FreeBSD-runtime.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21503