postgresql/src/common/unicode
Jeff Davis a02b37fc08 Additional unicode primitive functions.
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().

The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.

Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
2023-11-01 22:47:06 -07:00
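The commit message says that unicode_assigned() relies on a General Category lookup table. The minimal C sketch below shows one way such a table can be queried: sorted, non-overlapping codepoint ranges searched with a binary search, where any codepoint outside every range counts as unassigned (Cn). This is an illustration under stated assumptions, not the generated unicode_category_table.h. The names example_category, example_get_category() and example_all_assigned() are hypothetical, as is the three-range table. The sketch also operates on already-decoded codepoints rather than on text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of General Category values, for illustration only. */
typedef enum
{
    CAT_UNASSIGNED,         /* Cn */
    CAT_UPPERCASE_LETTER,   /* Lu */
    CAT_LOWERCASE_LETTER,   /* Ll */
    CAT_DECIMAL_NUMBER      /* Nd */
} example_category;

typedef struct
{
    uint32_t first;         /* first codepoint in the range */
    uint32_t last;          /* last codepoint in the range, inclusive */
    example_category category;
} example_category_range;

/* Toy table: sorted by 'first', non-overlapping; NOT the real generated data. */
static const example_category_range example_table[] = {
    {0x0030, 0x0039, CAT_DECIMAL_NUMBER},
    {0x0041, 0x005A, CAT_UPPERCASE_LETTER},
    {0x0061, 0x007A, CAT_LOWERCASE_LETTER},
};

/* Binary search for the range containing cp; Cn if no range contains it. */
static example_category
example_get_category(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(example_table) / sizeof(example_table[0]);

    while (lo < hi)         /* search the half-open index interval [lo, hi) */
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < example_table[mid].first)
            hi = mid;
        else if (cp > example_table[mid].last)
            lo = mid + 1;
        else
            return example_table[mid].category;
    }
    return CAT_UNASSIGNED;
}

/* True if no codepoint in the array has category Cn. */
static bool
example_all_assigned(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (example_get_category(cps[i]) == CAT_UNASSIGNED)
            return false;
    return true;
}
```

Each lookup costs O(log n) in the number of ranges. Because of that cost, a compact range table is a reasonable layout for a property that stays constant across long runs of codepoints. With this toy table, assigned characters such as U+0020 SPACE also fall through to Cn; a complete generated table would list them.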
.gitignore Update display widths as part of updating Unicode 2021-08-26 10:53:56 -04:00
category_test.c Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
generate-norm_test_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
generate-unicode_category_table.pl Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
generate-unicode_east_asian_fw_table.pl Make Unicode script fit for future versions 2023-09-18 07:25:46 +02:00
generate-unicode_nonspacing_table.pl Update copyright for 2023 2023-01-02 15:00:37 -05:00
generate-unicode_norm_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
generate-unicode_normprops_table.pl Pre-beta mechanical code beautification. 2023-05-19 17:24:48 -04:00
generate-unicode_version.pl Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
Makefile Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
meson.build Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
norm_test.c Additional unicode primitive functions. 2023-11-01 22:47:06 -07:00
README Add support for automatically updating Unicode derived files 2020-01-09 10:08:14 +01:00

This directory contains tools to generate the Unicode lookup tables in
src/include/common/, such as unicode_norm_table.h, which is used for Unicode
normalization. The generated .h files are included in the source tree, so
these tools are not needed to build PostgreSQL. They are needed only to
regenerate the .h files from the Unicode data files, for example when
updating to a new version of Unicode.

Generating unicode_norm_table.h
-------------------------------

Run

    make update-unicode

from the top level of the source tree and commit the result.

Tests
-----

The Unicode Consortium publishes a comprehensive test suite for the
normalization algorithm in a file called NormalizationTest.txt. This
directory also contains a Perl script and some C code that run our
normalization code against all the test strings in NormalizationTest.txt.
To download NormalizationTest.txt and run the tests:

    make normalization-check

This is also run as part of the update-unicode target.
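Each data line of NormalizationTest.txt holds five semicolon-separated columns: the source string (c1) and its NFC, NFD, NFKC and NFKD forms (c2 to c5). Each column is a space-separated list of hexadecimal codepoints, followed by a '#' comment. Lines starting with '#' are comments, and lines starting with '@Part' are section headers. The sketch below is a hypothetical C parser for that line format, meant only to show the data layout. According to this README, the directory does this step with a Perl script that produces input for the C test code. The names ex_norm_test and ex_parse_line() and the per-column limit EX_MAX_CPS are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_NFIELDS 5        /* c1 source, c2 NFC, c3 NFD, c4 NFKC, c5 NFKD */
#define EX_MAX_CPS 32       /* hypothetical per-column limit for this sketch */

typedef struct
{
    uint32_t cps[EX_NFIELDS][EX_MAX_CPS];
    size_t len[EX_NFIELDS];
} ex_norm_test;

/*
 * Parse one line of NormalizationTest.txt into five codepoint sequences.
 * Returns false for comment lines, @Part headers, empty or malformed lines.
 * Anything after the fifth ';' (the trailing '#' comment) is ignored.
 */
static bool
ex_parse_line(const char *line, ex_norm_test *out)
{
    const char *p = line;

    assert(line != NULL && out != NULL);
    if (*p == '#' || *p == '@' || *p == '\0' || *p == '\n')
        return false;

    for (int field = 0; field < EX_NFIELDS; field++)
    {
        out->len[field] = 0;
        for (;;)
        {
            char *end;
            unsigned long cp;

            while (*p == ' ')
                p++;
            if (*p == ';')
            {
                p++;        /* end of this column */
                break;
            }
            cp = strtoul(p, &end, 16);
            if (end == p || out->len[field] >= EX_MAX_CPS)
                return false;   /* not a hex number, or column too long */
            out->cps[field][out->len[field]++] = (uint32_t) cp;
            p = end;
        }
        if (out->len[field] == 0)
            return false;   /* every column must hold at least one codepoint */
    }
    return true;
}
```

A test driver could then check each parsed line against the invariants stated in the file's header, for example that normalizing c1 to NFC yields c2.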