The code in this file runs before the sanitizer can initialize its
shadow map.
Fixes: ad2fac552c ("lib/libc/amd64: add archlevel-based simd dispatch framework")
(cherry picked from commit 4dedcb1bb54cbbe8043c79ad733f966b6ffc6972)
The scalar implementation is fairly simplistic and only performs
slightly better than the generic C implementation. It could be
improved by using the same algorithm as for memchr, but that would
have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's
slightly slower than memchr() due to the more complicated main
loop, but I don't think that can be significantly improved.
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42925
(cherry picked from commit fb197a4f7751bb4e116989e57ba7fb12a981895f)
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation calling
into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C
implementation of memccpy for overlapping inputs. However, overlapping
inputs are not allowed for this function by ISO/IEC 9899:1999, nor does
the C implementation have any code to deal with the possibility. It just
proceeds byte-by-byte, which may or may not do the expected thing for
some overlaps. We do not document whether overlapping inputs are
supported in memccpy(3).
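For illustration only, the scalar approach described above (find the
stop byte with memchr, then copy with memcpy) might look roughly like
this in C. The name sketch_memccpy is made up for this note and this is
not the committed code; like the real function, it assumes the buffers
do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a sketch of memccpy built from memchr + memcpy. */
void *
sketch_memccpy(void *restrict dst, const void *restrict src, int c,
    size_t len)
{
	const char *hit = memchr(src, c, len);
	size_t n;

	if (hit == NULL) {
		/* No stop byte within len bytes: copy everything. */
		memcpy(dst, src, len);
		return (NULL);
	}

	/* Copy up to and including the stop byte. */
	n = (size_t)(hit - (const char *)src) + 1;
	memcpy(dst, src, n);
	return ((char *)dst + n);
}
```

The return value points just past the copied stop byte, or is NULL if
the byte was not found in the first len bytes.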
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42902
(cherry picked from commit fc0e38a7a67a6d43095efb00cf19ee5f95dcf710)
This should pick up our optimised memchr(), strlen(), and strlcpy()
when strlcat() is called.
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863
(cherry picked from commit 2b7b03b7ae179db465c1ef19a5007f729874916a)
Somewhat similar to stpncpy, but different in that we need to compute
the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar
implementation of strlcpy just calls into strlen() and memcpy() to do
the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as
it needs to carry on with finding the source string length even if the
buffer ends before the string.
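A rough C sketch of the scalar structure described above, assuming the
usual strlcpy/strlcat semantics: strlcpy is strlen plus memcpy, and
strlcat locates the end of the destination and defers to strlcpy. The
sketch_ names are hypothetical and this is not the committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; strlcpy expressed as strlen() + memcpy(). */
size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
	size_t len = strlen(src);	/* always the full source length */

	if (dstsize != 0) {
		size_t n = len < dstsize ? len : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (len);
}

/* Hypothetical name; strlcat as a thin wrapper around the above. */
size_t
sketch_strlcat(char *restrict dst, const char *restrict src, size_t dstsize)
{
	const char *end = memchr(dst, '\0', dstsize);
	size_t used;

	/* dst is not terminated within dstsize: nothing can be appended. */
	if (end == NULL)
		return (dstsize + strlen(src));

	used = (size_t)(end - dst);
	return (used + sketch_strlcpy(dst + used, src, dstsize - used));
}
```

Both functions return the length of the string they tried to create, so
a result greater than or equal to dstsize signals truncation.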
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863
(cherry picked from commit 74d6cfad54d676299ee5e4695139461876dfd757)
This was surprisingly annoying to get right, despite being such a simple
function. A scalar implementation is also provided; it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.
I'm quite happy with the performance. glibc only beats us for very long
strings, likely due to the use of AVX-512. The scalar implementation
has a high fixed overhead from those library calls, but then performs ok
for the amount of effort that went into it. It still beats the old C
code, except for very short strings.
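As a minimal sketch of the scalar idea (memchr to find the terminator,
memcpy for the string, memset for the padding), assuming standard
stpncpy semantics. The name sketch_stpncpy is hypothetical and this is
not the committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; stpncpy built from memchr, memcpy and memset. */
char *
sketch_stpncpy(char *restrict dst, const char *restrict src, size_t len)
{
	const char *nul = memchr(src, '\0', len);
	size_t n;

	if (nul == NULL) {
		/* Source fills the buffer: no terminator, no padding. */
		memcpy(dst, src, len);
		return (dst + len);
	}

	/* Copy the string, then zero-fill the rest of the buffer. */
	n = (size_t)(nul - src);
	memcpy(dst, src, n);
	memset(dst + n, 0, len - n);
	return (dst + n);
}
```

Three library calls for a short string explain why such a wrapper starts
out with a high overhead.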
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
(cherry picked from commit 90253d49db09a9b1490c448d05314f3e4bbfa468)
The strsep() function is basically strcspn() with extra steps.
On amd64, we now have an optimised implementation of strcspn(),
so instead of implementing the inner loop manually, just call
into the optimised routine.
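Roughly, and only as a sketch with a made-up name (sketch_strsep), the
"strcspn() with extra steps" idea looks like this in C. It is not the
committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; strsep with strcspn() as the inner loop. */
char *
sketch_strsep(char **stringp, const char *delim)
{
	char *tok = *stringp;
	char *end;

	if (tok == NULL)
		return (NULL);

	/* strcspn() finds the first delimiter or the terminator. */
	end = tok + strcspn(tok, delim);
	if (*end == '\0') {
		*stringp = NULL;	/* last token */
	} else {
		*end = '\0';		/* terminate this token */
		*stringp = end + 1;	/* resume after the delimiter */
	}
	return (tok);
}
```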
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42346
(cherry picked from commit fd2ecd91aeeeab579c769c9a39f90b4bd4a493a9)
The baseline implementation is very straightforward, while the scalar
implementation suffers from register pressure and the need to use SWAR
techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42217
(cherry picked from commit 2ed514a220edbac6ca5ec9f40a3e0b3f2804796d)
The scalar implementation is fairly straightforward and merely unrolled
four times. The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.
Performance is quite good. We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42122
(cherry picked from commit 14289e973f5c941e4502cc2b11265e4b3072839a)
This lets us use our optimised strcspn() routine for strpbrk() calls.
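A minimal sketch of how strpbrk() can be phrased in terms of strcspn();
the name sketch_strpbrk is hypothetical and this is not the committed
code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; strpbrk expressed through strcspn(). */
char *
sketch_strpbrk(const char *s, const char *charset)
{
	/* Skip the prefix that contains no byte from charset. */
	const char *p = s + strcspn(s, charset);

	/* Landing on the terminator means nothing matched. */
	return (*p == '\0' ? NULL : (char *)p);
}
```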
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D41980
(cherry picked from commit f4fc317c364f2c81ad3d36763d8e5a60393ddbd1)
Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE)
implementation was omitted this time as I was not able to get it to
perform adequately. The best I got was an 8% improvement over the scalar
version for long inputs, with worse performance for short inputs.
Sponsored by: The FreeBSD Foundation
Approved by: security (cperciva)
Inspired by: https://github.com/moon-chilled/fancy-memcmp
Differential Revision: https://reviews.freebsd.org/D41696
(cherry picked from commit 5048c1b85506c5e0f441ee7dd98dd8d96d0a4a24)
Very straightforward and similar to memcmp(3). The code has
been written to use only instructions specified as having
data operand independent timing by Intel.
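For readers unfamiliar with the function, the usual portable shape of a
timing-safe equality check is sketched below, with a made-up name. Note
the assumption: a C compiler gives no guarantee about which instructions
it emits, so this only illustrates the branch-free accumulate-and-test
logic, not the timing properties of the assembly in this commit.

```c
#include <stddef.h>

/*
 * Hypothetical name; returns 0 if the buffers are equal, 1 otherwise.
 * Every byte is always examined, so the loop does not exit early at
 * the first mismatch.
 */
int
sketch_timingsafe_bcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p = b1, *q = b2;
	unsigned int acc = 0;
	size_t i;

	for (i = 0; i < len; i++)
		acc |= (unsigned int)(p[i] ^ q[i]);	/* collect differences */

	return (acc != 0);
}
```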
Sponsored by: The FreeBSD Foundation
Approved by: security (cperciva)
Differential Revision: https://reviews.freebsd.org/D41673
(cherry picked from commit 76c2b331bcd9f73c5c8c43a06e328fa0c7b8c39a)
Now that we have an optimised memchr(3), we can use it to implement
strnlen(3) with better performance.
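A minimal sketch of the idea, with the hypothetical name sketch_strnlen
(not the committed code):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; strnlen as a thin wrapper around memchr(). */
size_t
sketch_strnlen(const char *s, size_t maxlen)
{
	const char *nul = memchr(s, '\0', maxlen);

	/* No terminator within maxlen bytes: the answer is maxlen. */
	return (nul == NULL ? maxlen : (size_t)(nul - s));
}
```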
Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41598
(cherry picked from commit 331737281c1929c29e679e48783055351ac4fbd9)
This is conceptually similar to strchr(3), but there are
slight changes to account for the buffer having an explicit
length.
This includes the bug fix from b2618b6.
Sponsored by: The FreeBSD Foundation
Reported by: yuri, des
Tested by: des
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
PR: 273652
Differential Revision: https://reviews.freebsd.org/D41598
(cherry picked from commit de12a689fad271f5a2ba7c188b0b5fb5cabf48e7)
(cherry picked from commit b2618b651b28fd29e62a4e285f5be09ea30a85d4)
This is conceptually very similar to the strcspn(3) implementations
from D41557, but we can't do the fast paths the same way.
Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41567
(cherry picked from commit 7084133cde6a58412d86bae9f8a55b86141fb304)
This changeset adds both a scalar and an x86-64-v2 implementation
of the strcspn(3) function to libc. A baseline implementation does not
appear to be feasible given the requirements of the function.
The scalar implementation is similar to the generic libc implementation,
but expands the bit set into a byte set to reduce latency, improving
performance. This approach could probably be backported to the generic
C version to benefit other platforms.
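To illustrate the byte-set idea in C (a sketch under my own naming, not
the committed assembly): one table byte per possible character value
replaces the shift-and-mask needed to test a bit in a packed bit set, so
each input byte costs a single load.

```c
#include <stddef.h>

/* Hypothetical name; strcspn with a 256-entry byte set. */
size_t
sketch_strcspn(const char *s, const char *set)
{
	unsigned char reject[256] = { 0 };
	const unsigned char *p;

	/* Treat the terminator as a member so the scan loop stops on it. */
	reject[0] = 1;
	for (p = (const unsigned char *)set; *p != '\0'; p++)
		reject[*p] = 1;

	/* One table lookup per input byte. */
	for (p = (const unsigned char *)s; !reject[*p]; p++)
		;

	return ((size_t)(p - (const unsigned char *)s));
}
```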
The x86-64-v2 implementation is built around the infamous pcmpistri
instruction. An alternative implementation based on the Muła/Langdale
algorithm [1] was prototyped, but performed worse than the pcmpistri
approach except for sets of more than 16 characters with long input
strings.
All implementations provide special cases for the empty set (reduces to
strlen) as well as single-character sets (reduces to strchr). The
x86-64-v2 kernel falls back to the scalar implementation for sets of
more than 32 characters. This limit could be raised by additional
multiples of 16 through the use of additional pcmpistri code paths, but
I consider this case to be too rare to be of importance.
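The special cases can be pictured as follows. This is a sketch with a
hypothetical wrapper name, using the libc strcspn() as a stand-in for
the general code path; it is not how the assembly is organised. Because
strchr() returns NULL when the character is absent, the single-character
case here falls back to strlen() for the "not found" result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; fast-path selection by set length. */
size_t
sketch_strcspn_fastpath(const char *s, const char *set)
{
	const char *hit;

	/* Empty set: every byte is accepted, so the answer is strlen(). */
	if (set[0] == '\0')
		return (strlen(s));

	/* Single-character set: a plain character search suffices. */
	if (set[1] == '\0') {
		hit = strchr(s, set[0]);
		return (hit == NULL ? strlen(s) : (size_t)(hit - s));
	}

	/* General case (stand-in for the real kernel). */
	return (strcspn(s, set));
}
```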
This includes the bug fix from 52d4a4d.
[1]: http://0x80.pl/articles/simd-byte-lookup.html
Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41557
(cherry picked from commit 474408bb7933f0383a0da2b01e717bfe683ae77c)
(cherry picked from commit 52d4a4d4e0dedc72bc33082a3f84c2d0fd6f2cbb)
This commit adds a baseline implementation of stpcpy(3) for amd64.
It performs quite well in comparison to the previous scalar implementation
as well as against bionic and glibc (though glibc is faster for very long
strings). Fiddle with the Makefile to also have strcpy(3) call into the
optimised stpcpy(3) code, fixing an oversight from D9841.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp ngie emaste
Approved by: mjg kib
Fixes: D9841
Differential Revision: https://reviews.freebsd.org/D41349
Add a framework for selecting from one of multiple implementations
of a function based on amd64 architecture level (cf. amd64 SysV
ABI supplement).
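The selection logic can be pictured with the following C sketch. All
names, the level enumeration, and the table layout are assumptions made
for illustration; how the CPU's level is detected and how the chosen
function is bound to the symbol are not shown. The idea is simply to
pick the most capable implementation that does not exceed the level the
CPU supports.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical level names, ordered from least to most capable. */
enum sketch_level {
	SK_SCALAR, SK_BASELINE, SK_V2, SK_V3, SK_V4, SK_NLEVELS
};

typedef size_t (*sketch_strlen_fn)(const char *);

/* Two stand-in implementations of the same function. */
static size_t
sk_strlen_scalar(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return (n);
}

static size_t
sk_strlen_baseline(const char *s)
{
	return (strlen(s));
}

/* Levels without a dedicated implementation stay NULL. */
static const sketch_strlen_fn sk_impls[SK_NLEVELS] = {
	[SK_SCALAR] = sk_strlen_scalar,
	[SK_BASELINE] = sk_strlen_baseline,
};

/* Return the best implementation at or below the given level. */
sketch_strlen_fn
sketch_select(enum sketch_level cpu)
{
	int l = (int)cpu;

	if (l >= SK_NLEVELS)
		l = SK_NLEVELS - 1;
	for (; l > SK_SCALAR; l--)
		if (sk_impls[l] != NULL)
			return (sk_impls[l]);
	return (sk_impls[SK_SCALAR]);
}
```

With this table, a CPU at the hypothetical SK_V3 level would get the
baseline routine, since nothing more specific is registered.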
Sponsored by: The FreeBSD Foundation
Approved by: kib
Reviewed by: jrtc27
Differential Revision: https://reviews.freebsd.org/D40693
Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to
bcmp calls.
Reviewed by: emaste (previous version), jhb (previous version)
Differential Revision: https://reviews.freebsd.org/D34673
Preferably bcmp would just alias memcmp but there is build magic which
makes this problematic.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28846
The function is of limited use and is almost a direct clone of
memmove/memcpy (with arguments swapped). Introduction of ERMS variants
of string routines would mean avoidable growth of libc.
bcopy will get redefined to a __builtin_memmove later on with this
symbol only left for compatibility.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17539
- Remove .c files which duplicate entries in MISRCS.
- Use the same, less merge conflict prone style in all cases.
- Use MDSRCS for mips (.c and .S files both ended up in SRCS).
- Remove pointless sparc64 Makefile.inc.
- Remove uninformative foreign VCS ID entries.
Reviewed by: emaste, imp, jhb
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9841