GitHub - dlukes/kontext: An alternative web front-end for the Manatee corpus search engine

Important note

Please note that due to the Python 2 end of life, KonText 0.13.x is the last version that runs on Python 2. The next release (planned for Q1 2020) will run only on Python 3. For users of the master branch: the last commit supporting Python 2 is tagged py2_last_version, and the first one supporting Python 3 is tagged py3_initial_version. To upgrade, see doc/py2to3.md. For new installations, follow doc/INSTALL.md.

Introduction

KonText is an advanced corpus query interface and corpus data integration middleware built around the corpus search engine Manatee-open. Development is maintained by the Institute of the Czech National Corpus.

Features

notable end-user features

fully editable query chain
- any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, and the whole sequence is then re-executed
advanced CQL editor with syntax highlighting and attribute recognition
support for spoken corpora
- defined concordance segments can be played back as audio
- KWIC detail provides a custom rendering with easily distinguishable speeches
support for user-defined line groups
- users can define custom numeric tags attached to concordance lines, filter out other lines, and review group ratios
improved subcorpus creation
- users can examine the corpus structure by selecting some text types and seeing how the availability of other text type attributes changes ("which publishers are there if only fiction is selected?")
- a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
- a sub-corpus can be created by a custom CQL expression
- a sub-corpus can be published so other users can access it
- subcorpora are backed up as CQL queries which makes further modification/restoring possible
frequency distribution
- 2-dimensional frequency distribution for both positional and structural attributes
- result caching decreases time required to navigate between pages
- on the multilevel frequency distribution page, a starting word can be specified for multi-word KWICs
persistent URL for any query - you can send a link to someone even if the query string was megabytes long
access to previous queries, named queries
access to favorite corpora (subcorpora, aligned corpora)
interactive PoS tag tool - for positional PoS tag formats, an interactive tool can be used to write tag queries
a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
concordance tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
a correct i.p.m. (i.e. one calculated only from the selected text types) can be calculated on demand for ad-hoc subcorpora
integrability with external data resources (e.g. dictionaries, media libraries)
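
To illustrate the editable query chain listed above, here is a minimal Python sketch. It is not KonText's implementation: the `QueryChain` class, the list-of-strings concordance model and the sample data are all assumptions made for illustration. It only shows the idea that changing one step causes the whole sequence to run again from the start.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical simplification: a concordance is modelled as a list of lines.
Lines = List[str]
Operation = Callable[[Lines], Lines]


@dataclass
class QueryChain:
    """Ordered sequence of operations that is re-run after any edit."""
    source: Lines
    ops: List[Operation] = field(default_factory=list)

    def run(self) -> Lines:
        # Apply every operation in order, starting from the full source.
        result = list(self.source)
        for op in self.ops:
            result = op(result)
        return result

    def replace(self, index: int, op: Operation) -> Lines:
        # Change one step, then re-execute the whole chain from the start.
        self.ops[index] = op
        return self.run()


corpus = ["the cat sat", "a dog ran", "a cat sat", "the cat ran"]
chain = QueryChain(corpus, [
    lambda ls: [l for l in ls if "sat" in l],  # query
    lambda ls: [l for l in ls if "cat" in l],  # filter
    sorted,                                    # sorting
])
first = chain.run()
# Edit the first step (the query); the filter and sorting steps are re-applied.
second = chain.replace(0, lambda ls: [l for l in ls if "ran" in l])
```

Re-running from the source keeps every later step consistent with the edited one, at the cost of recomputing the full chain.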

internal features

server-side written as a WSGI application
modern client-side application (event stream architecture, React components, extensible, written in TypeScript)
modular code design with dynamically loadable plug-ins providing custom implementations of functionality (e.g. custom database adapters, authentication methods, corpus listing widgets, HTTP session management)
fully decoupled background concordance/frequency/collocation calculation based on the Celery task queue (alternatively, Python's multiprocessing package can be used)
improved logging, error processing and debugging support
improved code documentation
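
The dynamically loadable plug-ins mentioned above can be pictured with the following sketch. It is an assumption-laden illustration, not KonText's actual plug-in loader: the `load_plugins` function and the slot names are hypothetical, and standard-library modules stand in for real plug-in implementations so that the example runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Dict


def load_plugins(conf: Dict[str, str]) -> Dict[str, ModuleType]:
    """Import one module per plug-in slot, as named in the configuration."""
    loaded: Dict[str, ModuleType] = {}
    for slot, module_name in conf.items():
        # The module is resolved by name at run time, so swapping an
        # implementation only requires a configuration change.
        loaded[slot] = importlib.import_module(module_name)
    return loaded


# Hypothetical slot names; stdlib modules act as placeholder implementations.
plugins = load_plugins({"db": "sqlite3", "sessions": "json"})
```

Resolving modules by name keeps the core application free of hard-coded imports for any particular backend.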

Requirements

Reverse proxy server
- Nginx (recommended), Apache (tested)
Python 3.6 (or newer) and:
- WSGI-compatible server
  - Gunicorn (recommended)
  - or uWSGI (tested)
- Werkzeug web application library
- Jinja2 template engine
- lxml library
- PyICU library (optional but preferred)
- markdown library (optional, for formatted corpora references)
- openpyxl library (optional, for XLSX export)
corpus search engine Manatee
- versions 2.167.8 and newer are supported by KonText 0.15 and newer
- versions from 2.83.3 to 2.158.8 are supported by KonText 0.13 and older
a key-value storage
- any custom implementation (Redis and SQLite backends are available by default)
Celery task queue for (asynchronous) background calculations and maintenance tasks

Note: KonText versions up to and including 0.13.x run on Python 2. To use Python 3, KonText 0.15.x or newer is required.
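
The Manatee version ranges listed above can be checked mechanically. The sketch below is only an illustration of that compatibility list: the function names are hypothetical, and Manatee versions the list does not mention yield `None` rather than a guess.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '2.167.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def required_kontext(manatee: str) -> Optional[str]:
    """Return the KonText range listed for a given Manatee version, if any."""
    v = parse_version(manatee)
    if v >= (2, 167, 8):
        return "0.15 or newer"
    if (2, 83, 3) <= v <= (2, 158, 8):
        return "0.13 or older"
    return None  # version not covered by the list above
```

For example, `required_kontext("2.167.8")` returns `"0.15 or newer"`, while a version between the two listed ranges returns `None`.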

Build and installation

KonText provides a script for automatic installation on an existing Ubuntu system. The easiest way to install KonText is to create an LXC/LXD container, clone the repository there and run the script. On a decently fast network, the whole process takes only a couple of seconds. Please refer to the doc/INSTALL.md file for details.

Customization and contribution

Please refer to our Wiki.

Notable users

Institute of the Czech National Corpus
LINDAT
Clarin-PL
Інститут української
Serbski Institut (API version of KonText)
