StemmerLV: Latvian lemmatizer and stemmer for Java

Version 1.1 (October 2, 2020)
Developed by Gints Jēkabsons (gints.jekabsons@rtu.lv)
Available at: http://www.cs.rtu.lv/jekabsons/nlp.html
Licensed under the GNU Lesser General Public License.

StemmerLV uses Hunspell affix and dictionary files created by Jānis Eisaks (available at http://dict.dv.lv/home.php?prj=lv).

StemmerLV was initially developed as a pure Java substitute for the HunspellJNA library for Latvian lemmatization, but it soon gained additional functionality.

What StemmerLV can do:

* Lemmatize a word according to the affix and dictionary files. The result is the same as with HunspellJNA, although StemmerLV is on average about 15% slower.
* Save time by returning as soon as the first lemma is found. In this mode it works on average almost 3 times faster than HunspellJNA but never returns more than one lemma.
* Stem a word by either finding or guessing its lemma and then stemming the lemma. Lemma guessing allows finding consistent short stems for unknown words that are not included in the dictionary.
* List all word forms for a given lemma.
* List all lemmas included in the dictionary together with all their word forms.

How StemmerLV does stemming:

1. Uses the affix file to generate lemma candidates for a given word.
2. Checks whether any of the lemma candidates exist in the dictionary. If at least one candidate is there, discards all the candidates that are not. If none of the candidates are there and guessing is disabled, returns the original word unchanged.
3. If none of the candidates exist in the dictionary, filters out those with unnatural endings and keeps the remaining candidates as guesses for the lemma. There can be up to about 20 different guesses. (This step is skipped if guessing is disabled.)
4. Stems all lemmas using four simple character removal rules designed specifically for stemming Latvian lemmas and returns the shortest stem.
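
The four steps above can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name StemSketch, the toy affix rules, the two-word dictionary, the "natural ending" check, and the four suffix-removal rules are hypothetical stand-ins, not StemmerLV's actual API, data, or rules. The fallback to the original word when no guess survives filtering is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all rules, data, and names here are toy
// assumptions and do not reproduce StemmerLV's real affix file or API.
class StemSketch {
    // Hypothetical affix rules: {suffix to strip, suffix to append}.
    static final String[][] AFFIX_RULES = {
        {"a", "s"}, {"u", "s"}, {"am", "s"},
        {"as", "a"}, {"u", "a"}, {"ai", "a"}
    };
    // Toy dictionary of known lemmas.
    static final Set<String> DICTIONARY =
        new HashSet<>(Arrays.asList("galds", "roka"));
    // Hypothetical endings accepted for a guessed lemma.
    static final String[] NATURAL_ENDINGS = {"s", "a", "e"};
    // Four placeholder removal rules (not the real ones), tried in order.
    static final String[] STEM_SUFFIXES = {"is", "us", "s", "a"};

    // Step 1: generate lemma candidates (the word itself is one of them).
    static Set<String> lemmaCandidates(String word) {
        Set<String> out = new LinkedHashSet<>();
        out.add(word);
        for (String[] rule : AFFIX_RULES) {
            if (word.length() > rule[0].length() && word.endsWith(rule[0])) {
                String base = word.substring(0, word.length() - rule[0].length());
                out.add(base + rule[1]);
            }
        }
        return out;
    }

    static boolean hasNaturalEnding(String lemma) {
        for (String ending : NATURAL_ENDINGS) {
            if (lemma.endsWith(ending)) return true;
        }
        return false;
    }

    // Step 4 helper: apply the first matching removal rule, keeping
    // at least two characters of the lemma.
    static String stemLemma(String lemma) {
        for (String suffix : STEM_SUFFIXES) {
            if (lemma.endsWith(suffix) && lemma.length() - suffix.length() >= 2) {
                return lemma.substring(0, lemma.length() - suffix.length());
            }
        }
        return lemma;
    }

    static String stem(String word, boolean guessing) {
        Set<String> candidates = lemmaCandidates(word);
        // Step 2: keep only the candidates found in the dictionary.
        List<String> lemmas = new ArrayList<>();
        for (String c : candidates) {
            if (DICTIONARY.contains(c)) lemmas.add(c);
        }
        if (lemmas.isEmpty()) {
            if (!guessing) return word;
            // Step 3: keep plausible-looking candidates as guesses.
            for (String c : candidates) {
                if (hasNaturalEnding(c)) lemmas.add(c);
            }
            if (lemmas.isEmpty()) return word; // assumed fallback
        }
        // Step 4: stem every lemma and return the shortest stem.
        String best = null;
        for (String lemma : lemmas) {
            String s = stemLemma(lemma);
            if (best == null || s.length() < best.length()) best = s;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(stem("galdu", true));   // gald (lemma "galds" is in the dictionary)
        System.out.println(stem("datoru", true));  // dator (guessed lemmas "dators", "datora")
        System.out.println(stem("dators", true));  // dator (same stem as the inflected form)
        System.out.println(stem("datoru", false)); // datoru (guessing disabled)
    }
}
```

In this toy setup the unknown word "datoru" and its guessed lemma "dators" reduce to the same stem, which is the consistency that lemma guessing is meant to provide for words missing from the dictionary.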

To use StemmerLV, add the .jar file to your project, create a StemmerLV object, and browse its list of available methods; their names are largely self-explanatory. Source code is included in the .jar file.
