Commit graph

342 commits

Author SHA1 Message Date
Heikki Linnakangas
12b4bdcb43 Add KOI8-U map files to Makefile.
These were left out by mistake back when support for KOI8-U encoding was
added.

Extracted from Kyotaro Horiguchi's larger patch.
2017-02-02 14:14:01 +02:00
Noah Misch
a0104e0807 Encoding PG_UHC is code page 949.
This fixes presentation of non-ASCII messages to the Windows event log
and console in rare cases involving Korean locale.  Processes like the
postmaster and checkpointer, but not processes attached to databases,
were affected.  Back-patch to 9.4, where MessageEncoding was introduced.
The problem exists in all supported versions, but this change has no
effect in the absence of the code recognizing PG_UHC MessageEncoding.

Noticed while investigating bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:42 -04:00
Noah Misch
f988da9539 Restore old pgwin32_message_to_UTF16() behavior outside transactions.
Commit 49c817eab7 replaced with a hard
error the dubious pg_do_encoding_conversion() behavior when outside a
transaction.  Reintroduce the historic soft failure locally within
pgwin32_message_to_UTF16().  This fixes errors when writing messages in
less-common encodings to the Windows event log or console.  Back-patch
to 9.4, where the aforementioned commit first appeared.

Per bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:41 -04:00
Bruce Momjian
0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian
5db55c6bbc Remove wchar.c Asserts that were stricter than the main code
Assert errors were thrown for functions being passed invalid encodings,
while the main code handled it just fine.

Also document that libpq's PQclientEncoding() returns -1 for an encoding
lookup failure.

Per report from Peter Geoghegan
2014-03-24 15:59:38 -04:00
Bruce Momjian
242c2737fb C comments: remove odd blank lines after #ifdef WIN32 lines
A few more
2014-03-13 01:42:24 -04:00
Tom Lane
769065c1b2 Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.
A large majority of the callers of pg_do_encoding_conversion were
specifying the database encoding as either source or target of the
conversion, meaning that we can use the less general functions
pg_any_to_server/pg_server_to_any instead.

The main advantage of using the latter functions is that they can make use
of a cached conversion-function lookup in the common case that the other
encoding is the current client_encoding.  It's notationally cleaner too in
most cases, not least because of the historical artifact that the latter
functions use "char *" rather than "unsigned char *" in their APIs.

Note that pg_any_to_server will apply an encoding verification step in
some cases where pg_do_encoding_conversion would have just done nothing.
This seems to me to be a good idea at most of these call sites, though
it partially negates the performance benefit.

Per discussion of bug #9210.
2014-02-23 16:59:05 -05:00
Tom Lane
49c817eab7 Plug some more holes in encoding conversion.
Various places assume that pg_do_encoding_conversion() and
pg_server_to_any() will ensure encoding validity of their results;
but they failed to do so in the case that the source encoding is SQL_ASCII
while the destination is not.  We cannot perform any actual "conversion"
in that scenario, but we should still validate the string according to the
destination encoding.  Per bug #9210 from Digoal Zhou.

Arguably this is a back-patchable bug fix, but on the other hand adding
more enforcing of encoding checks might break existing applications that
were being sloppy.  On balance there doesn't seem to be much enthusiasm
for a back-patch, so fix in HEAD only.

While at it, remove some apparently-no-longer-needed provisions for
letting pg_do_encoding_conversion() "work" outside a transaction ---
if you consider it "working" to silently fail to do the requested
conversion.

Also, make a few cosmetic improvements in mbutils.c, notably removing
some Asserts that are certainly dead code since the variables they
assert aren't null are never null, even at process start.  (I think
this wasn't true at one time, but it is now.)
2014-02-23 15:22:50 -05:00
Tom Lane
0d79c0a8cc Make various variables const (read-only).
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me
2014-01-18 16:04:32 -05:00
Bruce Momjian
7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane
d9f37e6661 Add checks for valid multibyte character length in UtfToLocal, LocalToUtf.
This is mainly to suppress "uninitialized variable" warnings from very
recent versions of gcc.  But it seems like a good robustness thing anyway,
not to mention that we might someday decide to support 6-byte UTF8.

Per report from Karol Trzcionka.  No back-patch since there's no reason
at the moment to think this is more than cosmetic.
2013-07-18 21:55:38 -04:00
Noah Misch
5f538ad004 Renovate display of non-ASCII messages on Windows.
GNU gettext selects a default encoding for the messages it emits in a
platform-specific manner; it uses the Windows ANSI code page on Windows
and follows LC_CTYPE on other platforms.  This is inconvenient for
PostgreSQL server processes, so realize consistent cross-platform
behavior by calling bind_textdomain_codeset() on Windows each time we
permanently change LC_CTYPE.  This primarily affects SQL_ASCII databases
and processes like the postmaster that do not attach to a database,
making their behavior consistent with PostgreSQL on non-Windows
platforms.  Messages from SQL_ASCII databases use the encoding implied
by the database LC_CTYPE, and messages from non-database processes use
LC_CTYPE from the postmaster system environment.  PlatformEncoding
becomes unused, so remove it.

Make write_console() prefer WriteConsoleW() to write() regardless of the
encodings in use.  In this situation, write() will invariably mishandle
non-ASCII characters.

elog.c has assumed that messages conform to the database encoding.
While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL.
Introduce MessageEncoding to track the actual encoding of message text.
The present consumers are Windows-specific code for converting messages
to UTF16 for use in system interfaces.  This fixes the appearance in
Windows event logs and consoles of translated messages from SQL_ASCII
processes like the postmaster.  Note that SQL_ASCII inherently disclaims
a strong notion of encoding, so non-ASCII byte sequences interpolated
into messages by %s may yet yield a nonsensical message.  MULE_INTERNAL
has similar problems at present, albeit for a different reason: its lack
of libiconv support or a conversion to UTF8.

Consequently, one need no longer restart Windows with a different
Windows ANSI code page to broadly test backend logging under a given
language.  Changing the user's locale ("Format") is enough.  Several
accounts can simultaneously run postmasters under different locales, all
correctly logging localized messages to Windows event logs and consoles.

Alexander Law and Noah Misch
2013-06-26 11:17:33 -04:00
Bruce Momjian
9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Bruce Momjian
bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Andrew Dunstan
3717f0837b Tidy up from frontend Assert change.
Quiet compiler warnings noted by Peter Eisentraut.
2012-12-16 12:22:57 -05:00
Bruce Momjian
381a9ed66d Remove configure flag --disable-shared, as it is no longer used by any
port.  The last use was QNX, per Peter Eisentraut.
2012-08-30 16:26:53 -04:00
Tom Lane
60e9c224a1 Fix ASCII case in pg_wchar2mule_with_len.
Also some cosmetic improvements for wchar-to-mblen patch.
2012-07-10 15:59:39 -04:00
Robert Haas
f6a05fd973 Fix failure of new wchar->mb functions to advance from pointer.
Bug spotted by Tom Lane.
2012-07-05 23:47:53 -04:00
Bruce Momjian
042d9ffc28 Run newly-configured perltidy script on Perl files.
Run on HEAD and 9.2.
2012-07-04 21:47:49 -04:00
Robert Haas
72dd6291f2 Add wchar -> mb conversion routines.
This is infrastructure for Alexander Korotkov's work on indexing regular
expression searches.

Alexander Korotkov, with a bit of further hackery on the MULE conversion
by me
2012-07-04 17:10:10 -04:00
Tom Lane
09022de1f5 Improve documentation about MULE encoding.
This commit improves the comments in pg_wchar.h and creates #define symbols
for some formerly hard-coded values.  No substantive code changes.

Tatsuo Ishii and Tom Lane
2012-07-04 00:29:57 -04:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Robert Haas
5d4b60f2f2 Lots of doc corrections.
Josh Kupershmidt
2012-04-23 22:43:09 -04:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane
eb5834d5af Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position.  To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code.  Remove unnecessary logic to
restore the old character value on failure.  Additional comment and
formatting cleanup.
2011-10-30 12:22:11 -04:00
Robert Haas
78d523b633 Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.
2011-10-29 14:22:20 -04:00
Peter Eisentraut
1b81c2fe6e Remove many -Wcast-qual warnings
This addresses only those cases that are easy to fix by adding or
moving a const qualifier or removing an unnecessary cast.  There are
many more complicated cases remaining.
2011-09-11 21:54:32 +03:00
Tom Lane
623f77e9d1 Avoid possibly accessing off the end of memory in SJIS2004 conversion.
The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when
only one remained in the string.  Since conversion functions aren't
supposed to assume null-terminated input, this poses a small risk of
fetching past the end of memory and incurring SIGSEGV.  No such crash has
been identified in the field, but we've certainly seen the equivalent
happen in other code paths, so patch this one all the way back.

Report and patch by Noah Misch.
2011-09-06 14:50:28 -04:00
Peter Eisentraut
a2a5ce6826 Improve "invalid byte sequence for encoding" message
It used to say

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb24

Change this to

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb 0x24

to make it clear that this is a byte sequence and not a code point.

Also fix the adjacent "character has no equivalent" message that has
the same issue.
2011-09-05 23:38:27 +03:00
Tom Lane
2ab0796d7a Fix char2wchar/wchar2char to support collations properly.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available.  Where those functions are not
available, temporarily select the correct locale with uselocale().

This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.

I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions.  Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
2011-04-23 12:35:41 -04:00
Tom Lane
88dc6fa7a1 foreach() and list_delete() don't mix.
Fix crash when releasing duplicate entries in the encoding conversion cache
list, caused by releasing the current entry of the list being chased by
foreach().  We have a standard idiom for handling such cases, but this
loop wasn't using it.

This got broken in my recent rewrite of GUC assign hooks.  Not sure how
I missed this when testing the modified code, but I did.  Per report from
Peter.
2011-04-17 13:37:39 -04:00
Bruce Momjian
bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane
2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Itagaki Takahiro
ca9cf85d54 Fix pg_server_to_client, that was broken in the previous commit. 2011-02-21 16:27:57 +09:00
Itagaki Takahiro
3cba8240a1 Add ENCODING option to COPY TO/FROM and file_fdw.
File encodings can be specified separately from client encoding.
If not specified, client encoding is used for backward compatibility.

Cases when the encoding doesn't match client encoding are slower
than matched cases because we don't have conversion procs for other
encodings. Performance improvement would be be a future work.

Original patch by Hitoshi Harada, and modified by me.
2011-02-21 14:32:40 +09:00
Peter Eisentraut
414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Itagaki Takahiro
fd223c7407 Remove unnecessary string null-termination in pg_convert.
We can directly verify the unterminated input with pg_verify_mbstr_len.
2010-12-03 12:00:27 +09:00
Peter Eisentraut
fc946c39ae Remove useless whitespace at end of lines 2010-11-23 22:34:55 +02:00
Peter Eisentraut
19e231bbda Improved parallel make support
Replace for loops in makefiles with proper dependencies.  Parallel
make can now span across directories.  Also, make -k and make -q work
properly.

GNU make 3.80 or newer is now required.
2010-11-12 22:15:16 +02:00
Magnus Hagander
fe9b36fd59 Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane
f679cfe97b Replace last remaining $Id$ with $PostgreSQL$. 2010-09-19 16:27:17 +00:00
Peter Eisentraut
3f11971916 Remove extra newlines at end and beginning of files, add missing newlines
at end of files.
2010-08-19 05:57:36 +00:00
Tom Lane
2d8314bd43 Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used
elsewhere.

Similarly rename the version in mbprint.c, not because this affects anything
but just to keep the two copies in exact sync.  There was some discussion of
having only one copy in src/port/ instead, but this function is so small
and unlikely to change that that seems like overkill.

Slightly editorialized version of a patch by Joseph Adams.  (The bug-fix
aspect of his patch was applied separately, and back-patched.)
2010-08-18 19:54:01 +00:00
Tom Lane
5b1b3ef742 Adjust mbutils.c so it won't get broken by future pgindent runs.
To do that, replace L'\0' by (WCHAR) 0.  Perhaps someday we should teach
pgindent about wide-character literals, but so long as this is the only
use-case in the entire Postgres sources, a workaround seems easier.
2010-07-07 15:13:21 +00:00
Tom Lane
3f12653b73 Undo pgindent breakage (again). Per buildfarm. 2010-07-06 21:09:00 +00:00
Bruce Momjian
239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Tom Lane
5667a53b78 Undo some more pgindent breakage. Per buildfarm. 2010-02-27 03:55:52 +00:00
Bruce Momjian
65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00