- Retrieves data from ElasticSearch
- Analyzes the data
- Reorganizes ES indices and content based on the analysis
- Extracts data from ElasticSearch
- Pipeline runs multiple highly configurable steps to process and analyze the data
- Post-process applies the analysis to ElasticSearch documents and updates ElasticSearch accordingly
- Launches with TextAnalyzerLaunch.py
- Reads process instructions from resources.mermtools.ini
- Extracts data from ElasticSearch (ES) and converts to Pandas DataFrame
- The DataFrame enters the processing pipeline.
- Pipeline prepares data for analysis (e.g., tokenization and lemmatization). Each function in the pipeline is performed in its own class. Classes with related functions may be in the same script.
- Pipeline performs analyses (e.g., TF-IDF, LDA, k-means).
- Post-pipeline processes the analysis results and reorganizes ElasticSearch data according to the results.
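
The flow above (instructions read from an `.ini` file, then a chain of steps that each live in their own class) might look roughly like the sketch below. It is not the project's actual code: the `[PIPELINE]` section, the `steps` key, the class names, and `STEP_REGISTRY` are all hypothetical, and plain dicts stand in for DataFrame rows so the example runs without ElasticSearch or Pandas.

```python
import configparser
from collections import Counter

# Hypothetical configuration; the real keys in the project's .ini file may differ.
CONFIG_TEXT = """
[PIPELINE]
steps = tokenize, remove_stopwords, term_counts
"""


class Tokenize:
    """Split each document's text into lowercase tokens."""

    def run(self, docs):
        for doc in docs:
            doc["tokens"] = doc["text"].lower().split()
        return docs


class RemoveStopwords:
    """Drop a small, illustrative set of stopwords from the tokens."""

    STOPWORDS = {"the", "a", "of", "and"}

    def run(self, docs):
        for doc in docs:
            doc["tokens"] = [t for t in doc["tokens"] if t not in self.STOPWORDS]
        return docs


class TermCounts:
    """Count token occurrences per document (a simple stand-in for TF-IDF)."""

    def run(self, docs):
        for doc in docs:
            doc["counts"] = Counter(doc["tokens"])
        return docs


# Maps names used in the config file to step classes (assumed design).
STEP_REGISTRY = {
    "tokenize": Tokenize,
    "remove_stopwords": RemoveStopwords,
    "term_counts": TermCounts,
}


def build_pipeline(config_text):
    """Instantiate the steps listed in the config, in the order given."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = [name.strip() for name in parser["PIPELINE"]["steps"].split(",")]
    return [STEP_REGISTRY[name]() for name in names]


def run_pipeline(steps, docs):
    """Pass the documents through every step in sequence."""
    for step in steps:
        docs = step.run(docs)
    return docs


docs = [
    {"id": "1", "text": "The analysis of the data"},
    {"id": "2", "text": "Data and more data"},
]
result = run_pipeline(build_pipeline(CONFIG_TEXT), docs)
```

Because the step order comes from the configuration, reordering or dropping a step only requires editing the `steps` line, which is one plausible way to get the configurability described above.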
This project is dockerized.