opnsense-src/lib/libc/regex
Tim J. Robbins e5996857ad Make regular expression matching aware of multibyte characters. The general
idea is that we perform multibyte->wide character conversion while parsing
and compiling, then convert byte sequences to wide characters when they're
needed for comparison and stepping through the string during execution.

As with tr(1), the main complication is to efficiently represent sets of
characters in bracket expressions. The old bitmap representation is replaced
by a bitmap for the first 256 characters combined with a vector of individual
wide characters, a vector of character ranges (for [A-Z] etc.), and a vector
of character classes (for [[:alpha:]] etc.).

One other point of interest is that although the Boyer-Moore algorithm had
to be disabled in the general multibyte case, it is still enabled for UTF-8
because of its self-synchronizing nature. This greatly speeds up matching
by reducing the number of multibyte conversions that need to be done.
2004-07-12 07:35:59 +00:00
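The first paragraph of the commit message describes decoding byte sequences to wide characters only when the matcher needs them while stepping through the subject string. Below is a minimal sketch of such a stepping loop. It assumes the standard C `mbrtowc(3)` interface and whatever `LC_CTYPE` locale is current. The helper names `step_char` and `count_wchar` are hypothetical and are not taken from engine.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the character starting at s[0..n) into *wc.
 * Returns the number of bytes consumed, or (size_t)-1 for an illegal or
 * truncated sequence. A NUL byte counts as a one-byte character so the
 * caller can step past it (mbrtowc reports it with a return value of 0). */
static size_t step_char(const char *s, size_t n, wchar_t *wc, mbstate_t *st)
{
    size_t len = mbrtowc(wc, s, n, st);
    if (len == (size_t)-1 || len == (size_t)-2)
        return (size_t)-1;
    if (len == 0)
        return 1;
    return len;
}

/* Hypothetical example: count occurrences of the wide character `target`
 * in the byte buffer s[0..n), converting one multibyte sequence at a time
 * rather than converting the whole buffer up front.
 * Returns -1 if an illegal sequence is found. */
static long count_wchar(const char *s, size_t n, wchar_t target)
{
    mbstate_t st;
    long count = 0;
    size_t i = 0;

    assert(s != NULL || n == 0);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (i < n) {
        wchar_t wc;
        size_t len = step_char(s + i, n - i, &wc, &st);
        if (len == (size_t)-1)
            return -1;
        if (wc == target)
            count++;
        i += len;                        /* advance by the bytes just decoded */
    }
    return count;
}
```

In the default "C" locale each ASCII byte decodes to one character; under a UTF-8 locale the same loop advances two bytes for "é", which is the kind of per-character conversion cost the third paragraph of the commit message is concerned with.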
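The second paragraph of the commit message describes the new bracket-expression representation. Below is one possible layout for such a set, with a membership test. It assumes a precomputed bitmap for characters 0 to 255 plus fixed-size vectors of single wide characters, ranges and `wctype(3)` classes for everything else. The struct and function names and the `MAXITEMS` limit are hypothetical and are not the definitions in regex2.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define MAXITEMS 8                        /* hypothetical fixed capacity */

struct wrange { wint_t min, max; };

struct bracket_set {
    unsigned char bmp[256 / 8];           /* membership bits for 0..255 */
    wint_t wides[MAXITEMS];        size_t nwides;  /* single chars >= 256 */
    struct wrange ranges[MAXITEMS]; size_t nranges; /* ranges reaching >= 256 */
    wctype_t types[MAXITEMS];      size_t ntypes;  /* e.g. [[:alpha:]] */
    bool invert;                          /* [^...] */
};

static void set_bit(struct bracket_set *s, wint_t c)
{
    s->bmp[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Single character, e.g. the "x" in [x]. */
static void add_char(struct bracket_set *s, wint_t c)
{
    if (c < 256) {
        set_bit(s, c);
    } else {
        assert(s->nwides < MAXITEMS);
        s->wides[s->nwides++] = c;
    }
}

/* Range such as [a-z]: the part below 256 goes into the bitmap; the range
 * is only kept in the vector if it extends past the bitmap. */
static void add_range(struct bracket_set *s, wint_t lo, wint_t hi)
{
    for (wint_t c = lo; c <= hi && c < 256; c++)
        set_bit(s, c);
    if (hi >= 256) {
        assert(s->nranges < MAXITEMS);
        s->ranges[s->nranges++] = (struct wrange){ lo, hi };
    }
}

/* Class such as [[:digit:]]: precompute the bitmap, keep the class for
 * characters at or above 256. */
static void add_class(struct bracket_set *s, const char *name)
{
    wctype_t t = wctype(name);
    assert(t != 0);                       /* unknown class name */
    for (wint_t c = 0; c < 256; c++)
        if (iswctype(c, t))
            set_bit(s, c);
    assert(s->ntypes < MAXITEMS);
    s->types[s->ntypes++] = t;
}

static bool bracket_match(const struct bracket_set *s, wint_t wc)
{
    bool found = false;

    if (wc < 256) {
        found = (s->bmp[wc >> 3] >> (wc & 7)) & 1;   /* one bit test */
    } else {
        for (size_t i = 0; i < s->nwides && !found; i++)
            found = s->wides[i] == wc;
        for (size_t i = 0; i < s->nranges && !found; i++)
            found = s->ranges[i].min <= wc && wc <= s->ranges[i].max;
        for (size_t i = 0; i < s->ntypes && !found; i++)
            found = iswctype(wc, s->types[i]) != 0;
    }
    return found != s->invert;
}
```

In this sketch ranges and classes are folded into the bitmap when the set is built, so the vectors are consulted only for characters at or above 256 and the common single-byte case stays a single bit test.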
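The third paragraph of the commit message relies on UTF-8 being self-synchronizing. Continuation bytes always have the bit pattern 10xxxxxx, so they can never be confused with the first byte of a character. As a result, a byte-level search for a complete, valid UTF-8 pattern in valid UTF-8 text cannot report a match that starts in the middle of a character, and no wide-character conversion is needed while scanning. The sketch below illustrates this using the Boyer-Moore-Horspool simplification, chosen here for brevity. It is not the engine's own Boyer-Moore code, and `bmh_search` and `is_utf8_boundary` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if byte i starts a UTF-8 character, i.e. is not a 10xxxxxx
 * continuation byte. */
static bool is_utf8_boundary(const unsigned char *s, size_t i)
{
    return (s[i] & 0xC0) != 0x80;
}

/* Boyer-Moore-Horspool search over raw bytes: returns the byte offset of
 * the first occurrence of pat[0..m) in text[0..n), or -1 if none.
 * No multibyte conversion happens while scanning. */
static long bmh_search(const unsigned char *text, size_t n,
                       const unsigned char *pat, size_t m)
{
    size_t shift[256];

    if (m == 0)
        return 0;
    if (m > n)
        return -1;
    /* Bad-character table: how far to slide the window, keyed on the
     * text byte aligned with the last pattern position. */
    for (size_t c = 0; c < 256; c++)
        shift[c] = m;
    for (size_t i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0)
            return (long)pos;
        pos += shift[text[pos + m - 1]];
    }
    return -1;
}
```

For the text "café naïve été" encoded as UTF-8, searching for the bytes of "ét" returns offset 13, which is a character boundary. Searching for the lone byte 0xA9, an incomplete fragment, would match at offset 4, inside the first "é". That is why the guarantee holds only for complete sequences, and why the optimization cannot simply be kept for encodings whose trailing bytes can look like lead bytes.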
grot Fix the style of the SCM ID's. 2002-03-22 23:42:05 +00:00
cclass.h Fix the style of the SCM ID's. 2002-03-22 23:42:05 +00:00
cname.h Fix the style of the SCM ID's. 2002-03-22 23:42:05 +00:00
COPYRIGHT
engine.c Make regular expression matching aware of multibyte characters. The general 2004-07-12 07:35:59 +00:00
Makefile.inc libc_r wasn't so tied to libc for 22 months. 2002-11-18 09:50:57 +00:00
re_format.7 Mechanically kill hard sentence breaks. 2004-07-02 23:52:20 +00:00
regcomp.c Make regular expression matching aware of multibyte characters. The general 2004-07-12 07:35:59 +00:00
regerror.c Add a new error code, REG_ILLSEQ, to indicate that a regular expression 2004-07-12 06:07:26 +00:00
regex.3 Add a new error code, REG_ILLSEQ, to indicate that a regular expression 2004-07-12 06:07:26 +00:00
regex2.h Make regular expression matching aware of multibyte characters. The general 2004-07-12 07:35:59 +00:00
regexec.c Make regular expression matching aware of multibyte characters. The general 2004-07-12 07:35:59 +00:00
regfree.c Make regular expression matching aware of multibyte characters. The general 2004-07-12 07:35:59 +00:00
utils.h Fix the style of the SCM ID's. 2002-03-22 23:42:05 +00:00
WHATSNEW