StemmerLV: Latvian lemmatizer and stemmer for Java
Gints Jēkabsons

StemmerLV was initially developed as a pure Java substitute for the HunspellJNA library for lemmatizing Latvian words, but it soon gained additional functionality. StemmerLV can:
* Lemmatize a word according to the affix and dictionary files. The result is the same as with HunspellJNA (but, unfortunately, it is on average about 15% slower).
* Save time by returning as soon as the first lemma is found. In this mode it is on average almost 3 times faster than HunspellJNA, but it never returns more than one lemma.
* Stem a word by either finding or guessing its lemma and then stemming the lemma. Lemma guessing yields consistent short stems even for unknown words that are not in the dictionary.
* List all word forms of a given lemma.
* List all lemmas in the dictionary together with all their word forms.
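StemmerLV's own API is not described in this entry, so the following is only a hypothetical, minimal Java sketch of the general Hunspell-style idea behind the features listed above. It assumes a toy dictionary in which each lemma carries flags, and toy suffix rules of the form "for lemmas with flag F, remove `strip` and append `add`". Lemmatization undoes each rule and keeps the candidates that are dictionary lemmas carrying the right flag. The "first lemma only" mode stops at the first hit, and word-form listing applies the rules forward. The class name, method names, flags `'A'`/`'B'`, and the rule lists are all illustrative assumptions; they are not StemmerLV's code and not the real Latvian affix file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not StemmerLV's implementation.
class AffixLemmatizerSketch {

    // Toy Hunspell-style suffix rule: for lemmas carrying `flag`,
    // remove `strip` from the end of the lemma and append `add`.
    record SuffixRule(char flag, String strip, String add) {}

    private final Map<String, Set<Character>> dictionary = new LinkedHashMap<>();
    private final List<SuffixRule> rules = new ArrayList<>();

    void addLemma(String lemma, char... flags) {
        Set<Character> set = new LinkedHashSet<>();
        for (char f : flags) set.add(f);
        dictionary.put(lemma, set);
    }

    void addRule(char flag, String strip, String add) {
        rules.add(new SuffixRule(flag, strip, add));
    }

    // Undo each rule whose ending matches; keep candidates that are dictionary
    // lemmas carrying the rule's flag. firstOnly=true returns after the first hit.
    List<String> lemmatize(String word, boolean firstOnly) {
        Set<String> lemmas = new LinkedHashSet<>();
        if (dictionary.containsKey(word)) {
            lemmas.add(word);
            if (firstOnly) return new ArrayList<>(lemmas);
        }
        for (SuffixRule r : rules) {
            if (!word.endsWith(r.add())) continue;
            String candidate = word.substring(0, word.length() - r.add().length()) + r.strip();
            Set<Character> flags = dictionary.get(candidate);
            if (flags != null && flags.contains(r.flag())) {
                lemmas.add(candidate);
                if (firstOnly) break;
            }
        }
        return new ArrayList<>(lemmas);
    }

    // Apply every rule whose flag the lemma carries to list its word forms.
    List<String> wordForms(String lemma) {
        Set<String> forms = new LinkedHashSet<>();
        Set<Character> flags = dictionary.get(lemma);
        if (flags == null) return new ArrayList<>(forms);
        forms.add(lemma);
        for (SuffixRule r : rules) {
            if (flags.contains(r.flag()) && lemma.endsWith(r.strip())) {
                forms.add(lemma.substring(0, lemma.length() - r.strip().length()) + r.add());
            }
        }
        return new ArrayList<>(forms);
    }

    // Hypothetical stemming step: remove the lemma ending that the lemma's
    // own rules strip (e.g. "roks" -> "rok", "roka" -> "rok").
    String stem(String lemma) {
        Set<Character> flags = dictionary.get(lemma);
        if (flags != null) {
            for (SuffixRule r : rules) {
                if (flags.contains(r.flag()) && lemma.endsWith(r.strip())) {
                    return lemma.substring(0, lemma.length() - r.strip().length());
                }
            }
        }
        return lemma;
    }

    static AffixLemmatizerSketch demo() {
        AffixLemmatizerSketch s = new AffixLemmatizerSketch();
        // Flag 'A': toy subset of 1st-declension masculine endings (lemma ends in -s).
        for (String ending : new String[] {"a", "am", "u", "ā", "i", "iem", "us", "os"}) {
            s.addRule('A', "s", ending);
        }
        // Flag 'B': toy subset of 4th-declension feminine endings (lemma ends in -a).
        for (String ending : new String[] {"as", "ai", "u", "ā", "ām", "ās"}) {
            s.addRule('B', "a", ending);
        }
        s.addLemma("roks", 'A'); // "rock (music)"
        s.addLemma("roka", 'B'); // "hand"
        return s;
    }

    public static void main(String[] args) {
        AffixLemmatizerSketch s = demo();
        System.out.println(s.lemmatize("roku", false)); // [roks, roka]
        System.out.println(s.lemmatize("roku", true));  // [roks]
        System.out.println(s.wordForms("roka"));
        System.out.println(s.stem("roka"));             // rok
    }
}
```

The word "roku" is ambiguous in Latvian (accusative singular of both "roks" and "roka"), so the full mode returns two lemmas while the first-lemma mode returns only whichever rule happens to match first. This illustrates the speed-versus-completeness trade-off described above. Both lemmas also reduce to the same short stem "rok".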


Date
06.11.2020.
Keywords
lemmatization, stemming, Hunspell, Java
Hyperlink
http://www.cs.rtu.lv/jekabsons/nlp.html