opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-15 14:29:58 -04:00

Author	SHA1	Message	Date
Robert Clausecker	644d814471	lib/libc/amd64/string: fix overread condition in memccpy An overread condition in memccpy(dst, src, c, len) would occur if src does not cross a 16 byte boundary and there is no instance of c between *src and the next 16 byte boundary. This could cause a read fault if src is just before the end of a page and the next page is unmapped or unreadable. The bug is a consequence of basing memccpy() on the strlcpy() code: whereas strlcpy() assumes that src is a nul-terminated string and hence a terminator is always present, c may not be present at all in the source string. It was not caught earlier due to insufficient unit test design. As a part of the fix, the function is refactored such that the runt case (buffer length from last alignment boundary between 1 and 32 B) is handled separately. This reduces the number of conditional branches on all code paths and simplifies the handling of early matches in the non-runt case. Performance is improved slightly. os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ memccpy.unfixed.out │ memccpy.fixed.out │ │ sec/op │ sec/op vs base │ Short 66.76µ ± 0% 62.45µ ± 1% -6.44% (p=0.000 n=20) Mid 7.938µ ± 0% 7.967µ ± 0% +0.36% (p=0.001 n=20) Long 3.577µ ± 0% 3.577µ ± 0% ~ (p=0.429 n=20) geomean 12.38µ 12.12µ -2.08% │ memccpy.unfixed.out │ memccpy.fixed.out │ │ B/s │ B/s vs base │ Short 1.744Gi ± 0% 1.864Gi ± 1% +6.89% (p=0.000 n=20) Mid 14.67Gi ± 0% 14.61Gi ± 0% -0.36% (p=0.001 n=20) Long 32.55Gi ± 0% 32.55Gi ± 0% ~ (p=0.429 n=20) geomean 9.407Gi 9.606Gi +2.12% Reported by: getz Reviewed by: getz Approved by: mjg (blanket, via IRC) See also: D46051 MFC: stable/14 Event: GSoC 2024 Differential Revision: https://reviews.freebsd.org/D46052	2024-08-07 16:18:40 +02:00
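The overread described in the commit above is the kind of bug a guard-page test can expose. The following is a minimal sketch of such a test, not the code from the FreeBSD test suite: the helper name `sketch_memccpy_guard_page_ok` is hypothetical, and it assumes a POSIX system with `mmap()` and `mprotect()`. It places the source buffer so that it ends exactly at a page boundary, makes the next page inaccessible, and searches for a byte that is absent.

```c
#define _DEFAULT_SOURCE 1	/* assumption: exposes MAP_ANON and memccpy() on glibc */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical test helper.  Returns 1 if memccpy() copies `len` bytes
 * that end right before an inaccessible page and reports "c not found"
 * without faulting; returns 0 on setup failure or a wrong result.
 */
int
sketch_memccpy_guard_page_ok(size_t len)
{
	char dst[64];
	long pagesize = sysconf(_SC_PAGESIZE);
	char *base, *src;
	int ok;

	if (pagesize <= 0 || len == 0 || len > sizeof(dst))
		return (0);

	base = mmap(NULL, 2 * (size_t)pagesize, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return (0);

	/* Any read past the first page now raises a fault. */
	if (mprotect(base + pagesize, (size_t)pagesize, PROT_NONE) != 0) {
		munmap(base, 2 * (size_t)pagesize);
		return (0);
	}

	src = base + pagesize - len;	/* buffer ends at the page boundary */
	memset(src, 'a', len);		/* 'x' never occurs in the source */

	ok = memccpy(dst, src, 'x', len) == NULL &&
	    memcmp(dst, src, len) == 0;

	munmap(base, 2 * (size_t)pagesize);
	return (ok);
}
```

Running the helper for several lengths (for example 1, 15, and 33) covers sources that do and do not cross a 16-byte boundary before the end of the page.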
Mark Johnston	06fc42c0cb	libc/amd64: Disable ASAN for amd64_archlevel.c The code in this file runs before the sanitizer can initialize its shadow map. Fixes: `ad2fac552c` ("lib/libc/amd64: add archlevel-based simd dispatch framework") (cherry picked from commit 4dedcb1bb54cbbe8043c79ad733f966b6ffc6972)	2024-01-30 13:01:58 -05:00
Robert Clausecker	ff7799e003	lib/libc/amd64/string: add memrchr() scalar, baseline implementation The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but it would have been a lot more complicated. The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved. Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925 (cherry picked from commit fb197a4f7751bb4e116989e57ba7fb12a981895f)	2024-01-24 20:39:31 +01:00
Robert Clausecker	ddab9e6461	lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy() This picks up the accelerated implementation of memccpy(). Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902 (cherry picked from commit ea7b13771cc9d45bf1bc6c6edad8d1b7bce12990)	2024-01-24 20:39:30 +01:00
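As an illustration of the approach named in the commit above, strncat() can be expressed with strlen() and memccpy() roughly as follows. This is a sketch under the assumption that the buffers do not overlap; `sketch_strncat` is a hypothetical name and the code is not taken from the FreeBSD tree.

```c
#include <string.h>

/* Hypothetical portable model of strncat() built on strlen() and memccpy(). */
char *
sketch_strncat(char *dst, const char *src, size_t n)
{
	char *end = dst + strlen(dst);

	/*
	 * memccpy() stops after copying the NUL terminator.  If it returns
	 * NULL, src had no NUL within n bytes, so exactly n bytes were
	 * copied and the terminator must be appended by hand.
	 */
	if (memccpy(end, src, '\0', n) == NULL)
		end[n] = '\0';

	return (dst);
}
```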
Robert Clausecker	a3ce82e5b8	lib/libc/amd64/string: add memccpy scalar, baseline implementation Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation calling into memchr and memcpy to do the job is provided, too. Please note that this code does not behave exactly the same as the C implementation of memccpy for overlapping inputs. However, overlapping inputs are not allowed for this function by ISO/IEC 9899:1999 and neither has the C implementation any code to deal with the possibility. It just proceeds byte-by-byte, which may or may not do the expected thing for some overlaps. We do not document whether overlapping inputs are supported in memccpy(3). Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902 (cherry picked from commit fc0e38a7a67a6d43095efb00cf19ee5f95dcf710)	2024-01-24 20:39:30 +01:00
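The scalar strategy mentioned above (memchr() to find the stop byte, memcpy() to move the data) might look like the sketch below. `sketch_memccpy` is a hypothetical name; like the real function, the sketch assumes non-overlapping buffers.

```c
#include <string.h>

/* Hypothetical model of a memccpy() that delegates to memchr() and memcpy(). */
void *
sketch_memccpy(void *dst, const void *src, int c, size_t n)
{
	const unsigned char *hit = memchr(src, c, n);
	/* Copy up to and including the match, or all n bytes if there is none. */
	size_t len = hit != NULL ?
	    (size_t)(hit - (const unsigned char *)src) + 1 : n;

	memcpy(dst, src, len);

	/* On a match, return the position just past the copied stop byte. */
	return (hit != NULL ? (unsigned char *)dst + len : NULL);
}
```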
Robert Clausecker	3045c0f198	lib/libc/amd64/string: implement strlcat() through strlcpy() This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called. Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863 (cherry picked from commit 2b7b03b7ae179db465c1ef19a5007f729874916a)	2024-01-24 20:39:29 +01:00
Robert Clausecker	903cb811ff	lib/libc/amd64/string: add strlcpy scalar, baseline implementation Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the source. strlcat is implemented as a simple wrapper around strlcpy. The scalar implementation of strlcpy just calls into strlen() and memcpy() to do the job. Perf-wise we're very close to stpncpy. The code is slightly slower as it needs to carry on with finding the source string length even if the buffer ends before the string. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863 (cherry picked from commit 74d6cfad54d676299ee5e4695139461876dfd757)	2024-01-24 20:39:28 +01:00
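The two strl* commits above describe a scalar strlcpy() built from strlen() and memcpy(), and a strlcat() that wraps strlcpy(). A portable sketch of that layering, with the hypothetical names `sketch_strlcpy` and `sketch_strlcat`, could read:

```c
#include <string.h>

/* Hypothetical model: strlcpy() via strlen() and memcpy(). */
size_t
sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);	/* always the full source length */

	if (dstsize != 0) {
		size_t n = srclen < dstsize ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}

	return (srclen);
}

/* Hypothetical model: strlcat() as a wrapper around strlcpy(). */
size_t
sketch_strlcat(char *dst, const char *src, size_t dstsize)
{
	const char *nul = memchr(dst, '\0', dstsize);
	size_t dstlen;

	/* No terminator inside the buffer: nothing can be appended. */
	if (nul == NULL)
		return (dstsize + strlen(src));

	dstlen = (size_t)(nul - dst);
	return (dstlen + sketch_strlcpy(dst + dstlen, src, dstsize - dstlen));
}
```

The need to return the full source length even after the destination is full is what the commit message refers to when it says the code must carry on past the end of the buffer.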
Robert Clausecker	7a605ba8f7	lib/libc/amd64/string/strcat.S: enable use of SIMD strcat has a bespoke scalar assembly implementation we inherited from NetBSD. While it performs well, it is better to call into our SIMD implementations if any SIMD features are available at all. So do that and implement strcat() by calling into strlen() and strcpy() if these are available. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42600 (cherry picked from commit aff9143a242c0012b0195b3666e03fa3b7cd33e8)	2024-01-24 20:39:28 +01:00

Robert Clausecker	76f9afcdcf	lib/libc/amd64/string: implement strncpy() by calling stpncpy() Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519 (cherry picked from commit e19d46c808267f53455e96a28ff7654211523d2c)	2024-01-24 20:39:27 +01:00
Robert Clausecker	7527fecbfe	lib/libc/amd64/string: add stpncpy scalar, baseline implementation This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided, it just calls into our optimised memchr(), memcpy(), and memset() routines to carry out its job. I'm quite happy with the performance. glibc only beats us for very long strings, likely due to the use of AVX-512. The scalar implementation just calls into our optimised memchr(), memcpy(), and memset() routines, so it has a high overhead to begin with but then performs ok for the amount of effort that went into it. Still beats the old C code, except for very short strings. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519 (cherry picked from commit 90253d49db09a9b1490c448d05314f3e4bbfa468)	2024-01-24 20:39:27 +01:00
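For reference, the scalar approach described above (memchr(), memcpy(), and memset() doing the work) can be modelled in portable C as below, together with the strncpy() wrapper from the preceding row. The names `sketch_stpncpy` and `sketch_strncpy` are hypothetical, and this is only a sketch of the idea, not the assembly in the tree.

```c
#include <string.h>

/* Hypothetical model: stpncpy() via memchr(), memcpy(), and memset(). */
char *
sketch_stpncpy(char *dst, const char *src, size_t n)
{
	const char *nul = memchr(src, '\0', n);
	size_t len = nul != NULL ? (size_t)(nul - src) : n;

	memcpy(dst, src, len);		/* the string proper */
	memset(dst + len, 0, n - len);	/* NUL padding up to n bytes */

	/* Points at the first NUL written, or at dst + n if none fit. */
	return (dst + len);
}

/* Hypothetical model: strncpy() differs only in its return value. */
char *
sketch_strncpy(char *dst, const char *src, size_t n)
{
	(void)sketch_stpncpy(dst, src, n);
	return (dst);
}
```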
Robert Clausecker	265fb89aba	lib/libc/amd64/string: implement strsep() through strcspn() The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead of implementing the inner loop manually, just call into the optimised routine. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42346 (cherry picked from commit fd2ecd91aeeeab579c769c9a39f90b4bd4a493a9)	2024-01-24 20:39:26 +01:00
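The remark that strsep() is "basically strcspn() with extra steps" can be made concrete with a short sketch. `sketch_strsep` is a hypothetical name; the code models the documented behaviour of strsep(3) and is not copied from libc.

```c
#include <string.h>

/* Hypothetical model: strsep() with the inner loop replaced by strcspn(). */
char *
sketch_strsep(char **stringp, const char *delim)
{
	char *token = *stringp;
	char *end;

	if (token == NULL)
		return (NULL);

	/* strcspn() finds the first delimiter (or the terminating NUL). */
	end = token + strcspn(token, delim);
	if (*end != '\0') {
		*end = '\0';		/* terminate the token */
		*stringp = end + 1;	/* continue after the delimiter */
	} else {
		*stringp = NULL;	/* this was the last token */
	}

	return (token);
}
```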
Robert Clausecker	9b1a851e1e	lib/libc/amd64/string: add strrchr scalar, baseline implementation The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to use SWAR techniques similar to those used for strchr(). Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42217 (cherry picked from commit 2ed514a220edbac6ca5ec9f40a3e0b3f2804796d)	2024-01-24 20:39:26 +01:00
Robert Clausecker	3a19fcb9fd	lib/libc/amd64/string: add strncmp scalar, baseline implementation The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D41971 with appropriate extensions and extra code paths to pay attention to string length. Performance is quite good. We beat both glibc (except for very long strings, but they likely use AVX which we don't) and Bionic (except for medium-sized aligned strings, where we are still in the same ballpark). Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42122 (cherry picked from commit 14289e973f5c941e4502cc2b11265e4b3072839a)	2024-01-24 20:39:25 +01:00
Robert Clausecker	309b30ce84	lib/libc/amd64/string: implement strpbrk() through strcspn() This lets us use our optimised strcspn() routine for strpbrk() calls. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D41980 (cherry picked from commit f4fc317c364f2c81ad3d36763d8e5a60393ddbd1)	2024-01-24 20:39:24 +01:00
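The relationship between the two functions is simple enough to show in a few lines. In this sketch, `sketch_strpbrk` is a hypothetical name for a portable model of the idea:

```c
#include <string.h>

/* Hypothetical model: strpbrk() expressed through strcspn(). */
char *
sketch_strpbrk(const char *s, const char *charset)
{
	/* strcspn() yields the offset of the first byte found in charset. */
	const char *p = s + strcspn(s, charset);

	/* Landing on the terminator means no byte of charset occurs in s. */
	return (*p != '\0' ? (char *)p : NULL);
}
```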
Robert Clausecker	37728967ee	lib/libc/amd64/string/strcmp.S: add baseline implementation This is the most complicated one so far. The basic idea is to process the bulk of the string in aligned blocks of 16 bytes such that one string runs ahead and the other runs behind. The string that runs ahead is checked for NUL bytes, the one that runs behind is compared with the corresponding chunk of the string that runs ahead. This trades an extra load per iteration for the very complicated block-reassembly needed in the other implementations (bionic, glibc). On the flip side, we need two code paths depending on the relative alignment of the two buffers. The initial part of the string is compared directly if it is known not to cross a page boundary. Otherwise, a complex slow path to avoid crossing into unmapped memory commences. Performance-wise we beat bionic for misaligned strings (i.e. the strings do not share an alignment offset) and reach comparable performance for aligned strings. glibc is a bit better as it has a special kernel for AVX-512, where this stuff is a bit easier to do. Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D41971 (cherry picked from commit bca25680b91b3bea7faef615765806a04634eb23)	2024-01-24 20:39:24 +01:00
Robert Clausecker	9a6a587e67	lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE) implementation was omitted this time as I was not able to get it to perform adequately. Best I got was 8% over the scalar version for long inputs, but slower for short inputs. Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Inspired by: https://github.com/moon-chilled/fancy-memcmp Differential Revision: https://reviews.freebsd.org/D41696 (cherry picked from commit 5048c1b85506c5e0f441ee7dd98dd8d96d0a4a24)	2023-12-28 18:02:41 +01:00
Robert Clausecker	1347ec5d58	lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations Very straightforward and similar to memcmp(3). The code has been written to use only instructions specified as having data operand independent timing by Intel. Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Differential Revision: https://reviews.freebsd.org/D41673 (cherry picked from commit 76c2b331bcd9f73c5c8c43a06e328fa0c7b8c39a)	2023-12-28 18:02:41 +01:00
Robert Clausecker	cec0236976	lib/libc/amd64/string/strcspn.S: always return earliest match in 17--32 char case When matching against a set of 17--32 characters, strcspn() uses two invocations of PCMPISTRI to match against the first 16 characters of the set and then the remaining characters. If a match was found in the first half of the set, the code originally immediately returned that match. However, it is possible for a match in the second half of the set to occur earlier in the vector, leading to that match being overlooked. Fix the code by checking if there is a match in the second half of the set and taking the earlier of the two matches. The correctness of the function has been verified with extended unit tests and test runs against the glibc test suite. Approved by: mjg (implicit, via IRC) MFC after: 1 week MFC to: stable/14 (cherry picked from commit c91cd7d03a9dee649ba3a1b9b4014df9de111bb8)	2023-12-28 18:02:41 +01:00
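The logic of the fix above can be modelled in scalar C. In the sketch below, `sketch_first_match` is a hypothetical stand-in for one PCMPISTRI invocation (it returns the index of the first chunk byte found in a set of at most 16 characters), and `sketch_match_two_halves` combines the results for both halves of the set by taking the minimum instead of returning the first-half result early.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for one PCMPISTRI step: index of the first byte
 * of chunk[0..len) that occurs in set[0..setlen), or len if none does.
 */
size_t
sketch_first_match(const char *chunk, size_t len, const char *set,
    size_t setlen)
{
	for (size_t i = 0; i < len; i++)
		if (memchr(set, chunk[i], setlen) != NULL)
			return (i);

	return (len);
}

/* Match against a set of up to 32 characters, split into two halves. */
size_t
sketch_match_two_halves(const char *chunk, size_t len, const char *set,
    size_t setlen)
{
	size_t half = setlen < 16 ? setlen : 16;
	size_t lo = sketch_first_match(chunk, len, set, half);
	size_t hi = sketch_first_match(chunk, len, set + half, setlen - half);

	/* A second-half match may precede a first-half match. */
	return (lo < hi ? lo : hi);
}
```

With the 17-character set "abcdefghijklmnopq" and the chunk "xxqa", the first half matches 'a' at index 3, but the second half matches 'q' at index 2, which is the correct answer.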
Warner Losh	4025b5b527	libc: Purge unneeded cdefs.h These sys/cdefs.h are not needed. Purge them. They are mostly left-over from the $FreeBSD$ removal. A few in libc are still required for macros that cdefs.h defines. Keep those. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D42385 (cherry picked from commit 559a218c9b257775fb249b67945fe4a05b7a6b9f)	2023-11-26 21:20:09 -07:00
Robert Clausecker	0666c6fc03	lib/libc/amd64/string/memcmp.S: harden against phony buffer lengths When memcmp(a, b, len) (or equally, bcmp) is called with a phony length such that a + len < a, the code would malfunction and not compare the two buffers correctly. While such arguments are illegal (buffers do not wrap around the end of the address space), it is nevertheless conceivable that people try things like memcmp(a, b, SIZE_MAX) to compare a and b until the first mismatch, in the knowledge that such a mismatch exists, expecting memcmp() to stop comparing somewhere around the mismatch. While memcmp() is usually written to conform to this assumption, no version of ISO/IEC 9899 guarantees this behaviour (in contrast to memchr() for which it is). Nevertheless it appears sensible to at least not grossly misbehave on phony lengths. This change hardens memcmp() against this case by comparing at least until the end of the address space if a + len overflows a 64 bit integer. Sponsored by: The FreeBSD Foundation Approved by: mjg (blanket, via IRC) See also: b2618b651b28fd29e62a4e285f5be09ea30a85d4 MFC after: 1 week (cherry picked from commit 953b93cf24d8871c62416c9bcfca935f1f1853b6)	2023-09-23 14:21:42 -04:00
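One way to picture the hardening is as a clamp on the length before any end pointer is computed. The sketch below is an assumption about how such a check could be written in C, not a transcription of the assembly; `sketch_clamp_len` is a hypothetical name, and the arithmetic is done on `uintptr_t` to avoid undefined pointer overflow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if a + len would wrap around the address space,
 * limit len to the number of bytes between a and the highest address.
 */
size_t
sketch_clamp_len(uintptr_t a, size_t len)
{
	uintptr_t room = UINTPTR_MAX - a;

	return (len > room ? (size_t)room : len);
}
```

A call such as memcmp(a, b, SIZE_MAX) then compares up to the end of the address space instead of computing a wrapped end pointer.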
Robert Clausecker	62f73a711e	lib/libc/amd64/string: implement strnlen(3) through memchr(3) Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better performance. Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41598 (cherry picked from commit 331737281c1929c29e679e48783055351ac4fbd9)	2023-09-23 14:21:37 -04:00
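The reduction of strnlen() to memchr() fits in a few lines. `sketch_strnlen` is a hypothetical name for this portable model:

```c
#include <string.h>

/* Hypothetical model: strnlen() expressed through memchr(). */
size_t
sketch_strnlen(const char *s, size_t maxlen)
{
	const char *nul = memchr(s, '\0', maxlen);

	/* No terminator within maxlen bytes: the answer is maxlen. */
	return (nul != NULL ? (size_t)(nul - s) : maxlen);
}
```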
Robert Clausecker	3f78bde932	lib/libc/amd64/string: add memchr(3) scalar, baseline implementation This is conceptually similar to strchr(3), but there are slight changes to account for the buffer having an explicit buffer length. This includes the bug fix from b2618b6. Sponsored by: The FreeBSD Foundation Reported by: yuri, des Tested by: des Approved by: mjg MFC after: 1 week MFC to: stable/14 PR: 273652 Differential Revision: https://reviews.freebsd.org/D41598 (cherry picked from commit de12a689fad271f5a2ba7c188b0b5fb5cabf48e7) (cherry picked from commit b2618b651b28fd29e62a4e285f5be09ea30a85d4)	2023-09-23 14:20:28 -04:00
Robert Clausecker	39d500190b	lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation This is conceptually very similar to the strcspn(3) implementations from D41557, but we can't do the fast paths the same way. Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41567 (cherry picked from commit 7084133cde6a58412d86bae9f8a55b86141fb304)	2023-09-23 14:20:28 -04:00
Robert Clausecker	feda2297b7	lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation This changeset adds both a scalar and an x86-64-v2 implementation of the strcspn(3) function to libc. A baseline implementation does not appear to be feasible given the requirements of the function. The scalar implementation is similar to the generic libc implementation, but expands the bit set into a byte set to reduce latency, improving performance. This approach could probably be backported to the generic C version to benefit other platforms. The x86-64-v2 implementation is built around the infamous pcmpistri instruction. An alternative implementation based on the Muła/Langdale algorithm [1] was prototyped, but performed worse than the pcmpistri approach except for sets of more than 16 characters with long input strings. All implementations provide special cases for the empty set (reduces to strlen) as well as single-character sets (reduces to strchr). The x86-64-v2 kernel falls back to the scalar implementation for sets of more than 32 characters. This limit could be raised by additional multiples of 16 through the use of additional pcmpistri code paths, but I consider this case to be too rare to be of importance. This includes the bug fix from 52d4a4d. [1]: http://0x80.pl/articles/simd-byte-lookup.html Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41557 (cherry picked from commit 474408bb7933f0383a0da2b01e717bfe683ae77c) (cherry picked from commit 52d4a4d4e0dedc72bc33082a3f84c2d0fd6f2cbb)	2023-09-23 14:19:28 -04:00
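The byte-set idea mentioned for the scalar implementation can be sketched in portable C: one table entry per possible byte value, so the inner loop needs a single load and test per input byte instead of a shift-and-mask lookup in a bit set. `sketch_strcspn` is a hypothetical name, and the sketch omits the special cases for empty and single-character sets.

```c
#include <stddef.h>

/* Hypothetical model of strcspn() using a 256-entry byte set. */
size_t
sketch_strcspn(const char *s, const char *charset)
{
	unsigned char reject[256] = { 0 };
	size_t i;

	/* The terminator always ends the scan, so it is part of the set. */
	reject[0] = 1;
	for (; *charset != '\0'; charset++)
		reject[(unsigned char)*charset] = 1;

	for (i = 0; !reject[(unsigned char)s[i]]; i++)
		;

	return (i);
}
```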
Robert Clausecker	8b81116755	lib/libc/amd64/string/strchrnul.S: fix edge case in scalar code When the buffer is immediately preceded by the character we are looking for and begins with one higher than that character, and the buffer is misaligned, a match was erroneously detected in the first character. Fix this by changing the way we prevent matches before the buffer from being detected: instead of removing the corresponding bit from the 0x80..80 mask, set the LSB of bytes before the buffer after xoring with the character we look for. The bug only affects amd64 with ARCHLEVEL=scalar (cf. simd(7)). The change comes at a 2% performance impact for short strings if ARCHLEVEL is set to scalar. The default configuration is not affected. os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ strchrnul.scalar.0.out │ strchrnul.scalar.2.out │ │ sec/op │ sec/op vs base │ Short 57.89µ ± 2% 59.08µ ± 1% +2.07% (p=0.030 n=20) Mid 19.24µ ± 0% 19.73µ ± 0% +2.53% (p=0.000 n=20) Long 11.03µ ± 0% 11.03µ ± 0% ~ (p=0.547 n=20) geomean 23.07µ 23.43µ +1.53% │ strchrnul.scalar.0.out │ strchrnul.scalar.2.out │ │ B/s │ B/s vs base │ Short 2.011Gi ± 2% 1.970Gi ± 1% -2.02% (p=0.030 n=20) Mid 6.049Gi ± 0% 5.900Gi ± 0% -2.47% (p=0.000 n=20) Long 10.56Gi ± 0% 10.56Gi ± 0% ~ (p=0.547 n=20) geomean 5.045Gi 4.969Gi -1.50% MFC to: stable/14 MFC after: 3 days Approved by: mjg (blanket, via IRC), re (gjb) Sponsored by: The FreeBSD Foundation (cherry picked from commit 3d8ef251aa9dceabd57f7821a0e6749d35317db3)	2023-08-28 19:45:51 +02:00
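The edge case can be reproduced with the classic SWAR zero-byte test on a 64-bit word. The sketch below is a simplified model of the two masking strategies named in the commit message, not the assembly itself: it looks only for the search character (the real function also stops at NUL), assumes a little-endian word in which the lowest `head` bytes lie before the buffer, and uses hypothetical names throughout.

```c
#include <stdint.h>

#define	SKETCH_ONES	0x0101010101010101ULL
#define	SKETCH_HIGHS	0x8080808080808080ULL

/* Classic SWAR test: bit 7 is set in (at least) the lowest zero byte. */
static uint64_t
sketch_zero_bytes(uint64_t x)
{
	return ((x - SKETCH_ONES) & ~x & SKETCH_HIGHS);
}

/* Mask covering the `head` (0..7) low-order bytes before the buffer. */
static uint64_t
sketch_head_mask(unsigned head)
{
	return (head == 0 ? 0 : ~0ULL >> (64 - 8 * head));
}

/* Old strategy: drop the result bits of the bytes before the buffer. */
uint64_t
sketch_match_old(uint64_t word, unsigned char c, unsigned head)
{
	uint64_t x = word ^ (SKETCH_ONES * c);

	return (sketch_zero_bytes(x) & ~sketch_head_mask(head));
}

/*
 * New strategy: force the bytes before the buffer to be nonzero after
 * the xor, so they can neither match nor generate a borrow.
 */
uint64_t
sketch_match_new(uint64_t word, unsigned char c, unsigned head)
{
	uint64_t x = word ^ (SKETCH_ONES * c);

	x |= SKETCH_ONES & sketch_head_mask(head);
	return (sketch_zero_bytes(x));
}
```

For the word 0x7a7a7a7a7a7a6362 with c = 0x62 and head = 1, the byte before the buffer equals c and the first buffer byte is 0x63. After the xor these become 0x00 and 0x01, and the borrow from the subtraction in the low byte makes the old strategy report a match at byte 1, while the new strategy reports none.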
Robert Clausecker	8803f01e93	lib/libc/amd64/string/memcmp.S: add baseline implementation This changeset adds a baseline implementation of memcmp and bcmp for amd64. The same code is used for both functions with conditional code where the behaviour differs (we need more precise output for the memcmp case). FreeBSD documents that memcmp returns the difference between the mismatching characters. Slightly faster code would be possible could we relax this requirement to the ISO/IEC 9899:1999 requirement of merely returning a negative/positive integer or zero. Performance is better than bionic and glibc, except for long strings where the two are 13% faster. This could be because they use SSE4 ptest which we cannot use in a baseline kernel. Sponsored by: The FreeBSD Foundation Approved by: mjg Differential Revision: https://reviews.freebsd.org/D41442	2023-08-21 21:19:46 +02:00
Robert Clausecker	9fbea87028	lib/libc/amd64/string/stpcpy.S: add baseline implementation This commit adds a baseline implementation of stpcpy(3) for amd64. It performs quite well in comparison to the previous scalar implementation as well as against bionic and glibc (though glibc is faster for very long strings). Fiddle with the Makefile to also have strcpy(3) call into the optimised stpcpy(3) code, fixing an oversight from D9841. Sponsored by: The FreeBSD Foundation Reviewed by: imp ngie emaste Approved by: mjg kib Fixes: D9841 Differential Revision: https://reviews.freebsd.org/D41349	2023-08-21 20:59:38 +02:00
Warner Losh	d0b2dbfa0e	Remove $FreeBSD$: one-line sh pattern Remove /^\s#[#!]?\s\$FreeBSD\$.*$\n/	2023-08-16 11:55:03 -06:00
Warner Losh	1d386b48a5	Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/	2023-08-16 11:54:42 -06:00
Warner Losh	2a63c3be15	Remove $FreeBSD$: one-line .c comment pattern Remove /^/[/]\s\$FreeBSD\$.*\n/	2023-08-16 11:54:29 -06:00
Warner Losh	b3e7694832	Remove $FreeBSD$: two-line .h pattern Remove /^\s\\n \*\s+\$FreeBSD\$$\n/	2023-08-16 11:54:16 -06:00
Robert Clausecker	d7302cabc0	lib/libc/amd64/string/strchrnul.S: fix wrong indentation Uses spaces instead of tabs for this line by accident. Reported by: jrtc27, kib Approved by: kib	2023-08-07 14:03:28 +02:00
Robert Clausecker	61f4c4d3dd	lib/libc/amd64/string: add strchrnul implementations (scalar, baseline) A lot better than the generic (pre) implementation. We do not beat glibc for long strings, likely due to glibc switching to AVX once the input is sufficiently long. X86-64-v3 and v4 implementations may be added at a future time. os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ strchrnul_pre.out │ strchrnul_scalar.out │ strchrnul_baseline.out │ │ sec/op │ sec/op vs base │ sec/op vs base │ Short 129.68µ ± 3% 59.91µ ± 1% -53.80% (p=0.000 n=20) 44.37µ ± 1% -65.79% (p=0.000 n=20) Mid 21.15µ ± 0% 19.30µ ± 0% -8.76% (p=0.000 n=20) 12.30µ ± 0% -41.85% (p=0.000 n=20) Long 13.772µ ± 0% 11.028µ ± 0% -19.92% (p=0.000 n=20) 3.285µ ± 0% -76.15% (p=0.000 n=20) geomean 33.55µ 23.36µ -30.37% 12.15µ -63.80% │ strchrnul_pre.out │ strchrnul_scalar.out │ strchrnul_baseline.out │ │ B/s │ B/s vs base │ B/s vs base │ Short 919.3Mi ± 3% 1989.7Mi ± 1% +116.45% (p=0.000 n=20) 2686.8Mi ± 1% +192.28% (p=0.000 n=20) Mid 5.505Gi ± 0% 6.033Gi ± 0% +9.60% (p=0.000 n=20) 9.466Gi ± 0% +71.97% (p=0.000 n=20) Long 8.453Gi ± 0% 10.557Gi ± 0% +24.88% (p=0.000 n=20) 35.441Gi ± 0% +319.26% (p=0.000 n=20) geomean 3.470Gi 4.983Gi +43.62% 9.584Gi +176.22% For comparison, glibc on the same machine: │ strchrnul_glibc.out │ │ sec/op │ Short 49.73µ ± 0% Mid 14.60µ ± 0% Long 1.237µ ± 0% geomean 9.646µ │ strchrnul_glibc.out │ │ B/s │ Short 2.341Gi ± 0% Mid 7.976Gi ± 0% Long 94.14Gi ± 0% geomean 12.07Gi Sponsored by: The FreeBSD Foundation Approved by: mjg Differential Revision: https://reviews.freebsd.org/D41333	2023-08-06 15:58:27 +02:00
Robert Clausecker	d8385768fb	lib/libc/amd64/string/strlen.S: add amd64 baseline kernel This performs very well. x86-64-v3 and x86-64-v4 kernels were written, too, but performed worse than the baseline kernel on short strings. These may be added at a future point in time if the performance issues can be fixed. os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ strlen_scalar.out │ strlen_baseline.out │ │ B/s │ B/s vs base │ Short 1.667Gi ± 1% 2.676Gi ± 1% +60.55% (p=0.000 n=20) Mid 5.459Gi ± 1% 8.756Gi ± 1% +60.39% (p=0.000 n=20) Long 15.34Gi ± 0% 52.27Gi ± 0% +240.64% (p=0.000 n=20) geomean 5.188Gi 10.70Gi +106.24% Sponsored by: The FreeBSD Foundation Approved by: kib Reviewed by: mjg jrtc27 Differential Revision: https://reviews.freebsd.org/D40693	2023-08-04 01:54:23 +03:00
Robert Clausecker	ad2fac552c	lib/libc/amd64: add archlevel-based simd dispatch framework Add a framework for selecting from one of multiple implementations of a function based on amd64 architecture level (cf. amd64 SysV ABI supplement). Sponsored by: The FreeBSD Foundation Approved by: kib Reviewed by: jrtc27 Differential Revision: https://reviews.freebsd.org/D40693	2023-08-04 01:53:43 +03:00
Warner Losh	4d846d260e	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix	2023-05-12 10:44:03 -06:00
Konstantin Belousov	ae507c25de	amd64 libc: add missed GNU-stack annotation to memmove/memcpy Sponsored by: The FreeBSD Foundation MFC after: 1 week	2022-11-18 15:31:38 +02:00
Alexander Motin	f22068d91b	amd64: Stop using REP MOVSB for backward memmove()s. Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes REP MOVSB the fastest way to copy memory in most cases. However Intel Optimization Reference Manual says: "setting the DF to force REP MOVSB to copy bytes from high towards low addresses will experience significant performance degradation". Measurements on Intel Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs. This patch keeps ERMS use for forward ordered memory copies, but removes it for backward overlapped moves where it does not work. This is just a cosmetic sync with kernel, since libc does not use ERMS at this time. Reviewed by: mjg MFC after: 2 weeks	2022-06-16 14:51:50 -04:00
Mateusz Guzik	fbc002cb72	amd64: bring back asm bcmp, shared with memcmp Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to bcmp calls. Reviewed by: emaste (previous version), jhb (previous version) Differential Revision: https://reviews.freebsd.org/D34673	2022-03-26 09:10:03 +00:00
Mateusz Guzik	f0f0f2abf3	amd64: remove bcmp.S Fixes: `5fc3cc2713` ("amd64: make bcmp in libc just call memcmp")	2022-03-25 14:57:51 +00:00
Mateusz Guzik	5fc3cc2713	amd64: make bcmp in libc just call memcmp Preferably bcmp would just alias memcmp but there is build magic which makes this problematic. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28846	2022-03-12 14:59:14 +00:00
Mateusz Guzik	7f06b217c5	amd64: import asm strlen into libc Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28845	2021-02-23 00:09:55 +00:00
Mateusz Guzik	f1be262ec1	amd64: move memcmp checks upfront This is a tradeoff which saves jumps for smaller sizes while making the 8-16 range slower (roughly in line with the other cases). Tested with glibc test suite. For example size 3 (most common with vfs namecache) (ops/s): before: 407086026 after: 461391995 The regressed range of 8-16 (with 8 as example): before: 540850489 after: 461671032	2021-01-31 16:07:20 +00:00
Mateusz Guzik	0db6aef407	amd64: add a note about simd to libc memset, memmove and memcmp	2021-01-31 16:07:19 +00:00
Mateusz Guzik	164c3b8184	amd64: add missing ALIGN_TEXT to loops in memset and memmove	2021-01-30 00:01:44 +00:00
Mateusz Guzik	8291e88748	amd64: sync up libc memcmp with the kernel version (r357309)	2020-01-30 19:57:05 +00:00
Mateusz Guzik	4846152a08	amd64: sync up libc memcmp with the kernel version (r357208)	2020-01-29 01:57:07 +00:00
Mateusz Guzik	ddf6571230	amd64: align target memmove buffer to 16 bytes before using rep movs See the review for sample test results. Reviewed by: kib (kernel part) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18401	2018-12-01 14:20:32 +00:00
Mateusz Guzik	94243af2da	amd64: handle small memmove buffers with overlapping stores Handling sizes of > 32 backwards will be updated later. Reviewed by: kib (kernel part) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18387	2018-11-30 20:58:08 +00:00
Mateusz Guzik	2847cfce54	amd64: remove stale attribution for memmove work While the routine started as expanded bcopy, it is now entirely rewritten. Sponsored by: The FreeBSD Foundation	2018-11-30 00:47:36 +00:00

78 commits