postgresql/src
Tom Lane 5e724125af Fix recently-introduced performance problem in ts_headline().
The new hlCover() algorithm that I introduced in commit c9b0c678d
turns out to potentially take O(N^2) or worse time on long documents,
if there are many occurrences of individual query words but few or no
substrings that actually satisfy the query.  (One way to hit this
behavior is with a "common_word & rare_word" type of query.)  This
seems unavoidable given the original goal of checking every substring
of the document, so we have to back off that idea.  Fortunately, it
seems unlikely that anyone would really want headlines spanning all of
a long document, so we can avoid the worse-than-linear behavior by
imposing a maximum length of substring that we'll consider.
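The capping idea can be illustrated with a deliberately simplified model. This is not PostgreSQL's hlCover(), which evaluates arbitrary tsquery trees through TS_execute. The function name `find_bounded_cover`, the integer-array document encoding, and the AND-only query model are all hypothetical. Each word is tagged with the query term it matches, or -1. A "cover" is any substring containing every term. Bounding the window at `max_cover` words turns the scan from O(N^2) into O(N * max_cover). The uncapped worst case is exactly the "common_word & rare_word" pattern, where every start position scans to the end of the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical simplified model (not PostgreSQL's hlCover()).
 * doc[i] is the index (0..nterms-1) of the query term word i matches,
 * or -1 if it matches none.  The query is modeled as an AND of all
 * nterms terms, so a cover is any substring containing every term.
 * Each candidate window is extended at most max_cover words, bounding
 * total work at O(n * max_cover).
 */
bool
find_bounded_cover(const int *doc, size_t n, int nterms, size_t max_cover,
                   size_t *start, size_t *end)
{
    assert(nterms > 0 && nterms <= 31 && max_cover > 0);
    const unsigned int all = (1u << nterms) - 1;   /* bitmask of every term */

    for (size_t p = 0; p < n; p++)
    {
        if (doc[p] < 0)
            continue;               /* a cover must begin on a query word */

        unsigned int seen = 0;
        /* stop the window at max_cover words or at end of document */
        size_t limit = (n - p < max_cover) ? n : p + max_cover;

        for (size_t q = p; q < limit; q++)
        {
            if (doc[q] >= 0)
                seen |= 1u << doc[q];
            if (seen == all)
            {
                *start = p;
                *end = q;
                return true;        /* leftmost start that has a capped cover */
            }
        }
        /* no cover of length <= max_cover starts at p; try the next start */
    }
    return false;
}
```

If the rare term never occurs, the cap limits each failed start to at most `max_cover` steps instead of a scan to the end of the document.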

For now, just hard-wire that maximum length as a multiple of max_words
times max_fragments.  Perhaps at some point somebody will argue for
exposing it as a ts_headline parameter, but I'm hesitant to make such
a feature addition in a back-patched bug fix.
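A hard-wired cap of this kind might be computed as in the following sketch. The function name `compute_max_cover` and the multiplier of 10 are assumptions for illustration, not values taken from the commit. So is counting a MaxFragments of 0 as one fragment. In ts_headline, 0 is the default for MaxFragments and selects non-fragment-based headline generation.

```c
#include <assert.h>

/* Hypothetical multiplier; the commit only says "a multiple of". */
#define COVER_MULTIPLIER 10

/*
 * Hypothetical cap on candidate substring length, scaled by the
 * MaxWords and MaxFragments headline options.  max_fragments == 0
 * (non-fragment mode) is treated as a single fragment here.
 */
int
compute_max_cover(int max_words, int max_fragments)
{
    assert(max_words > 0 && max_fragments >= 0);
    int fragments = (max_fragments > 0) ? max_fragments : 1;
    return COVER_MULTIPLIER * max_words * fragments;
}
```

Scaling by both options keeps the cap proportional to how much text the caller could actually display.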

I also noted that the hlFirstIndex() function I'd added in that
commit was unnecessarily stupid: it really only needs to check whether
a HeadlineWordEntry's item pointer is null or not.  This wouldn't make
all that much difference in typical cases with queries having just
a few terms, but a cycle shaved is a cycle earned.
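The simplified check amounts to a plain NULL test per word, as in the sketch below. `WordEntry` is a hypothetical, trimmed-down stand-in for HeadlineWordEntry that keeps only the item pointer. `first_matching_index` is likewise a hypothetical name, not the real hlFirstIndex() signature.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for HeadlineWordEntry: item is non-NULL when
 * the word matched some query operand, NULL otherwise.
 */
typedef struct
{
    const void *item;
} WordEntry;

/*
 * Return the index of the first word at or after pos that matched any
 * query operand, or -1 if none does.  One pointer test per word
 * suffices; no per-word loop over the query's terms is needed.
 */
int
first_matching_index(const WordEntry *words, int nwords, int pos)
{
    assert(pos >= 0);
    for (int i = pos; i < nwords; i++)
    {
        if (words[i].item != NULL)
            return i;
    }
    return -1;
}
```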

In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse.
This ensures that hlCover's loop is cancellable if it manages to take
a long time, and it may protect some other TS_execute callers as well.
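Putting the interrupt check inside the recursive evaluator means any caller that invokes it many times gets cancellability for free. The sketch below models this with a hypothetical `cancel_requested` flag that a signal handler would set. It uses a tiny AND/OR/term tree and returns a sentinel on cancellation. PostgreSQL's real CHECK_FOR_INTERRUPTS() macro instead tests InterruptPending and calls ProcessInterrupts(), which can raise an error that aborts the query. The node types and `eval_recurse` are assumptions for illustration only.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag a signal handler would set on a cancel request. */
static volatile sig_atomic_t cancel_requested = 0;

typedef enum { NODE_TERM, NODE_AND, NODE_OR } NodeKind;

typedef struct Node
{
    NodeKind kind;
    bool value;                 /* NODE_TERM only: is the term present? */
    const struct Node *left;
    const struct Node *right;
} Node;

typedef enum { EVAL_FALSE, EVAL_TRUE, EVAL_CANCELLED } EvalResult;

EvalResult
eval_recurse(const Node *n)
{
    assert(n != NULL);

    /* Checked at every recursion step, so an outer loop that calls
     * eval_recurse() repeatedly notices a cancel request promptly. */
    if (cancel_requested)
        return EVAL_CANCELLED;

    if (n->kind == NODE_TERM)
        return n->value ? EVAL_TRUE : EVAL_FALSE;

    EvalResult l = eval_recurse(n->left);
    if (l == EVAL_CANCELLED)
        return l;
    if (n->kind == NODE_AND && l == EVAL_FALSE)
        return EVAL_FALSE;      /* short-circuit AND */
    if (n->kind == NODE_OR && l == EVAL_TRUE)
        return EVAL_TRUE;       /* short-circuit OR */
    return eval_recurse(n->right);
}
```

Returning a sentinel stands in for PostgreSQL's error-based unwinding. In the backend no caller needs to test for cancellation explicitly.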

Back-patch to 9.6 as the previous commit was.  I also chose to add the
CHECK_FOR_INTERRUPTS call to 9.5.  The old hlCover() algorithm seems
to avoid the O(N^2) behavior, at least on the test case I tried, but
nonetheless it's not very quick on a long document.

Per report from Stephen Frost.

Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
2020-07-31 11:43:12 -04:00
backend Fix recently-introduced performance problem in ts_headline(). 2020-07-31 11:43:12 -04:00
bin Switch pg_test_fsync to use binary mode on Windows 2020-07-16 15:53:04 +09:00
common Replace use of sys_siglist[] with strsignal(). 2020-07-15 22:05:12 -04:00
fe_utils Fix translation of special characters in psql's LaTeX output modes. 2018-11-26 17:32:51 -05:00
include Avoid direct C access to possibly-null pg_subscription_rel.srsublsn. 2020-07-21 11:40:47 -04:00
interfaces Stamp 10.13. 2020-05-11 17:12:38 -04:00
makefiles Select CFLAGS_SL at configure time, not in platform-specific Makefiles. 2019-10-21 12:32:36 -04:00
pl Translation updates 2020-05-11 13:26:52 +02:00
port Replace use of sys_siglist[] with strsignal(). 2020-07-15 22:05:12 -04:00
template Select CFLAGS_SL at configure time, not in platform-specific Makefiles. 2019-10-21 12:32:36 -04:00
test Fix construction of updated-columns bitmap in logical replication. 2020-07-20 13:40:16 -04:00
timezone Ensure that distributed timezone abbreviation files are plain ASCII. 2020-07-17 11:04:44 -04:00
tools Avoid need for valgrind suppressions for pg_atomic_init_u64 on some platforms. 2020-06-08 20:03:09 -07:00
tutorial Phase 2 of pgindent updates. 2017-06-21 15:19:25 -04:00
.gitignore
DEVELOPERS
Makefile Build src/test/isolation during "make" and "make install". 2017-11-22 20:18:52 -08:00
Makefile.global.in Select CFLAGS_SL at configure time, not in platform-specific Makefiles. 2019-10-21 12:32:36 -04:00
Makefile.shlib makefile: use proper linker flags for C++ compiles 2020-03-31 22:26:11 -04:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00