StemmerLV: Latvian lemmatizer and stemmer for Java
Gints Jēkabsons

StemmerLV was initially developed as a pure Java substitute for the HunspellJNA library for lemmatizing Latvian text, but it soon gained additional functionality. What StemmerLV can do:
* Lemmatize a word according to the affix and dictionary files. The result is the same as with HunspellJNA, although StemmerLV is on average about 15% slower.
* Save time by returning as soon as the first lemma is found. In this mode it works on average almost 3 times faster than HunspellJNA but never returns more than one lemma.
* Stem a word by either finding or guessing its lemma and then stemming the lemma. Lemma guessing yields consistent short stems for unknown words that are not included in the dictionary.
* List all word forms for a given lemma.
* List all lemmas included in the dictionary together with all their word forms.
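
To illustrate the difference between full lemmatization and the faster first-lemma mode, here is a minimal sketch of affix-rule lemmatization. It does not use StemmerLV's actual API: the class `ToyLemmatizer`, its `lemmatize` method, and the tiny rule and dictionary tables are all hypothetical stand-ins for real affix and dictionary files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only: every name here is hypothetical, not StemmerLV's API.
class ToyLemmatizer {
    // Stand-in for a dictionary file: the known lemmas.
    static final Set<String> LEMMAS = Set.of("galds", "upe");

    // Stand-in for an affix file: {ending to strip, ending to add}.
    static final String[][] RULES = {
        {"am", "s"},  // galdam -> galds
        {"a", "s"},   // galda  -> galds
        {"es", "e"},  // upes   -> upe
        {"ei", "e"}   // upei   -> upe
    };

    // Returns every dictionary lemma reachable from `word`;
    // with firstOnly set, returns as soon as one lemma is found.
    static List<String> lemmatize(String word, boolean firstOnly) {
        List<String> found = new ArrayList<>();
        if (LEMMAS.contains(word)) {
            found.add(word);
            if (firstOnly) return found;
        }
        for (String[] rule : RULES) {
            if (!word.endsWith(rule[0])) continue;
            String base = word.substring(0, word.length() - rule[0].length());
            String candidate = base + rule[1];
            if (LEMMAS.contains(candidate) && !found.contains(candidate)) {
                found.add(candidate);
                if (firstOnly) return found; // skip the remaining rules
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(lemmatize("galdam", false)); // prints [galds]
        System.out.println(lemmatize("upes", true));    // prints [upe]
    }
}
```

In this sketch the first-lemma mode saves work only by skipping the rules that remain after a match, which is also why it can never return more than one lemma. A word that matches no rule and is not in the dictionary yields an empty list; guessing a lemma for such words is not attempted here.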


Submission date
06.11.2020.
Keywords
lemmatization, stemming, Hunspell, Java
Hyperlink
http://www.cs.rtu.lv/jekabsons/nlp.html
Scientific Activity Coordination and Information Department.
E-mail: elza.vecpuise@rtu.lv; Phone: +371 26013889