Elasticsearch with custom lemmatizer

Elasticsearch doesn’t offer a lemmatizer for the following languages out of the box:

Bulgarian,
Czech,
Estonian,
French,
Hungarian,
Macedonian,
Persian,
Polish,
Romanian,
Russian,
Slovak,
Slovene,
Serbian,
Ukrainian.

The LemmaGen plugin fills this gap. At the time of writing, LemmaGen works with Elasticsearch 2.2.0 and older.

This project is a simple Python example that connects to an Elasticsearch server, initializes an index and its mapping, adds documents in Slovenian, and executes a search. The documents have the following titles: pes, psa, psi, pse, psovanje, pesem, pesmi. The first four titles are forms of the word for dog; the last three begin with the same letters but have different meanings. The example shows how to search with the query pes (a dog) and retrieve only the results about dogs.

Further reading: Efficient search in your local language
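The index setup and the query described above can be sketched as plain request bodies. This is a minimal sketch, not the code from main.py: the index name `articles`, the type name `article`, the filter and analyzer names, and the helper functions are all hypothetical, and the plugin settings (`"type": "lemmagen"` with a `lexicon` key) are an assumption that should be checked against the README of the plugin version you install.

```python
import json

INDEX_NAME = "articles"  # hypothetical index name
DOC_TYPE = "article"     # hypothetical mapping type (Elasticsearch 2.x uses types)

# Titles used in this README: four forms of "dog" and three look-alikes.
TITLES = ["pes", "psa", "psi", "pse", "psovanje", "pesem", "pesmi"]


def build_index_body(lexicon="sl"):
    """Return index settings and a mapping that lemmatize the `title` field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Assumption: the plugin exposes a token filter of type
                    # "lemmagen" that is configured with a "lexicon" key.
                    "lemmagen_filter": {"type": "lemmagen", "lexicon": lexicon},
                },
                "analyzer": {
                    "lemmagen_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "lemmagen_filter"],
                    },
                },
            },
        },
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    # Elasticsearch 2.x text fields are of type "string".
                    "title": {"type": "string", "analyzer": "lemmagen_analyzer"},
                },
            },
        },
    }


def build_search_body(text):
    """Return a match query; the query text goes through the field's analyzer."""
    return {"query": {"match": {"title": text}}}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_search_body("pes"), indent=2))
```

Because the same analyzer runs at index time and at query time, titles that share a lemma with the query are expected to match, while titles that only share a prefix should not.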

Install

Download Elasticsearch, extract the zip, and move the elasticsearch directory to a location of your choice.
Go to that directory and install the LemmaGen plugin: ./bin/plugin install https://github.com/vhyza/elasticsearch-analysis-lemmagen/releases/download/v2.2.0/elasticsearch-analysis-lemmagen-2.2.0-plugin.zip
Download this project and install its requirements with: pip install -r requirements.txt

Run

Run bin/elasticsearch to start the Elasticsearch server, then run python main.py to execute the search.
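Once the server is running, one way to check that lemmatization is active is to ask the `_analyze` endpoint which tokens an analyzer produces for a word. The sketch below uses only the standard library and assumes the server listens on the default http://localhost:9200; the index name `articles`, the analyzer name `lemmagen_analyzer`, and both helper functions are hypothetical and must match whatever your index actually defines.

```python
import json
import urllib.request


def build_analyze_request(text, index="articles", analyzer="lemmagen_analyzer",
                          host="http://localhost:9200"):
    """Build a POST request for the index-level _analyze endpoint."""
    url = "{}/{}/_analyze".format(host, index)
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return the list of tokens the analyzer emitted."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return [token["token"] for token in body.get("tokens", [])]


if __name__ == "__main__":
    # Requires a running server with the index and analyzer already created.
    for title in ["pes", "psa", "pesem"]:
        print(title, "->", analyze(title))
```

If the analyzer is configured correctly, the forms of the same word should come back as the same token, which is what lets the query pes match psa, psi, and pse.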
