postgresql/src/backend/snowball
Peter Eisentraut dbbca2cf29 Remove unused #include's from backend .c files
as determined by include-what-you-use (IWYU)

While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that.  In some cases, a more
specific #include replaces another less specific one.

Some manual adjustments of the automatic result:

- IWYU currently doesn't know about includes that provide global
  variable declarations (like -Wmissing-variable-declarations), so
  those includes are being kept manually.

- All includes for port(ability) headers are being kept for now, to
  play it safe.

- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
  patch from exploding in size.

Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.

As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.

Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04 12:02:20 +01:00
libstemmer Update snowball 2021-12-07 07:04:05 +01:00
stopwords Sync our Snowball stemmer dictionaries with current upstream. 2018-09-24 17:29:38 -04:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Makefile Remove distprep 2023-11-06 15:18:04 +01:00
README Update snowball 2021-12-07 07:04:05 +01:00
dict_snowball.c Remove unused #include's from backend .c files 2024-03-04 12:02:20 +01:00
meson.build Update copyright for 2024 2024-01-03 20:49:05 -05:00
snowball.sql.in Update copyright for 2024 2024-01-03 20:49:05 -05:00
snowball_create.pl Add copyright notices to a few perl scripts that don't have them 2024-01-05 13:15:50 +00:00
snowball_func.sql.in Update copyright for 2024 2024-01-03 20:49:05 -05:00

src/backend/snowball/README

Snowball-Based Stemming
=======================

This module uses the word stemming code developed by the Snowball project,
http://snowballstem.org (formerly http://snowball.tartarus.org),
which they release under a BSD-style license.

The Snowball project does not often make formal releases; it's best
to pull from their git repository

    git clone https://github.com/snowballstem/snowball.git

and then building the derived files is as simple as

    cd snowball
    make

At least on Linux, no platform-specific adjustment is needed.

Postgres' files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from the Snowball
files, with only some minor adjustments of file inclusions.  Note
that most of these files are in fact derived files, not original source.
The original sources are in the Snowball language, and are built using
the Snowball-to-C compiler that is also part of the Snowball project.
We choose to include the derived files in the PostgreSQL distribution
because most installations will not have the Snowball compiler available.

We are currently synced with the Snowball git commit
48a67a2831005f49c48ec29a5837640e23e54e6b (tag v2.2.0)
of 2021-11-10.

To update the PostgreSQL sources from a new Snowball version:

0. If you haven't done so already, run "make -C snowball".

1. Copy the *.c files in snowball/src_c/ to src/backend/snowball/libstemmer,
replacing "../runtime/header.h" with "header.h", for example

    for f in .../snowball/src_c/*.c
    do
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"libstemmer/$(basename "$f")"
    done

Do not copy stemmers that are listed in libstemmer/modules.txt as
nonstandard, such as "german2" or "lovins".
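
As a rough sketch of how the copy and the exclusion could be combined, the
function below applies the same sed substitution but skips any stemmer named
on its command line.  The function name copy_stemmers and its arguments are
hypothetical, and it assumes that generated file names end in _<stemmer>.c
(for example stem_UTF_8_lovins.c); check that against the actual contents of
snowball/src_c/ before relying on it.

```shell
# Sketch only: copy_stemmers, its arguments, and the skip list are hypothetical.
# Usage: copy_stemmers SRC_DIR DEST_DIR [SKIPPED_STEMMER...]
copy_stemmers() {
    src_dir=$1
    dest_dir=$2
    shift 2
    for f in "$src_dir"/*.c
    do
        [ -e "$f" ] || continue        # the glob matched nothing
        name=$(basename "$f" .c)
        skip=no
        for s in "$@"
        do
            # Assumption: file names end in _<stemmer>, e.g. stem_UTF_8_lovins
            case $name in
                *_"$s") skip=yes ;;
            esac
        done
        [ "$skip" = yes ] && continue
        # Same header path rewrite as in the loop above
        sed 's|\.\./runtime/header\.h|header.h|' "$f" >"$dest_dir/$name.c"
    done
}
```

It could then be called as
copy_stemmers .../snowball/src_c libstemmer german2 lovins
with the skip list extended to whatever modules.txt marks as nonstandard.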

2. Copy the *.c files in snowball/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h> --- they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)
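
The manual edit in this step could be approximated with sed, as in the
following sketch.  The helper name strip_system_includes is hypothetical, and
the script assumes that every system header is pulled in by a line of the
form #include <...>.  It deletes those lines blindly, so the result still
needs to be reviewed by hand.

```shell
# Sketch only: strip_system_includes is a hypothetical helper name.
# Prints file $1 with every line of the form  #include <...>  removed;
# quoted includes such as  #include "header.h"  are left untouched.
strip_system_includes() {
    sed '/^[[:space:]]*#[[:space:]]*include[[:space:]]*<[^>]*>/d' "$1"
}
```

For each file f in snowball/runtime/, the output of
strip_system_includes "$f" would be redirected to the file of the same name
in src/backend/snowball/libstemmer.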

3. Copy the *.h files in snowball/src_c/ and snowball/runtime/
to src/include/snowball/libstemmer.  At this writing the header files
do not require any changes.

4. Check whether any stemmer modules have been added or removed.  If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c, as well as the list in the
documentation in textsearch.sgml.  You might also need to change
the LANGUAGES list in Makefile and tsearch_config_languages in initdb.c.
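
One possible way to spot added or removed modules is to compare the stem_*.c
file names before and after the copy.  The sketch below assumes that each
stemmer module corresponds to a stem_*.c file and that a copy of the previous
libstemmer directory is still available; list_modules and diff_modules are
hypothetical helper names.

```shell
# Sketch only: list_modules and diff_modules are hypothetical helpers.
list_modules() {
    # Print the sorted base names of the stem_*.c files in directory $1.
    for f in "$1"/stem_*.c
    do
        [ -e "$f" ] || continue        # the glob matched nothing
        basename "$f" .c
    done | sort
}

diff_modules() {
    # $1 = directory with the previous files, $2 = directory with the new ones
    old=$(mktemp) && new=$(mktemp) || return 1
    list_modules "$1" >"$old"
    list_modules "$2" >"$new"
    comm -23 "$old" "$new" | sed 's/^/removed: /'   # only in the old list
    comm -13 "$old" "$new" | sed 's/^/added: /'     # only in the new list
    rm -f "$old" "$new"
}
```

Empty output means the set of modules is unchanged; any "added:" or
"removed:" line points at an entry that needs attention in the lists named
above.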

5. The various stopword files in stopwords/ must be downloaded
individually from pages on the snowballstem.org website.
Make sure these files are stored in UTF-8 encoding.
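
The encoding of the downloaded files can be checked mechanically.  The sketch
below assumes that the iconv utility is installed; check_utf8 is a
hypothetical helper name.  It relies on iconv failing when asked to read
input that is not valid UTF-8.

```shell
# Sketch only: check_utf8 is a hypothetical helper name.
# Prints the name of every argument that iconv cannot read as UTF-8
# and returns non-zero if there was at least one such file.
check_utf8() {
    status=0
    for f in "$@"
    do
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1
        then
            echo "not valid UTF-8: $f"
            status=1
        fi
    done
    return $status
}
```

Running check_utf8 stopwords/*.stop after the download would list any file
that still needs to be converted.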