KonText

Introduction

KonText is a fully featured corpus query interface for the Manatee open corpus search engine. It started as a fork of the Bonito 2.68 web interface and, although it still shares a lot of code with the original Bonito (now bonito-open), it is gradually becoming more independent.

It is maintained by the Institute of the Czech National Corpus. The current version contains all the key features of Bonito 2.98.3 (primarily support for parallel corpora).

Features

internal changes

  • rewritten as a WSGI application (bonito-open is CGI-based)
  • modular code design with dynamically loadable plug-ins that implement custom functionality
  • fully decoupled background concordance calculation based on the Celery task queue
  • completely rewritten client-side code (AMD modules, code separated from templates)
  • improved logging, error processing and debugging support
  • improved code documentation
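
To illustrate what "a WSGI application" means in practice, here is a minimal sketch of a WSGI callable. It is not KonText's actual entry point; the function body and the `/demo` path are assumptions made for the example. A WSGI server such as Gunicorn calls the function once per request, passing the request environment and a callback for the response status and headers.

```python
def application(environ, start_response):
    """Hypothetical minimal WSGI callable (not KonText's real entry point)."""
    # PATH_INFO is a standard WSGI environ key holding the request path.
    path = environ.get("PATH_INFO", "/")
    body = u"Hello from {0}".format(path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # A WSGI response body is an iterable of byte strings.
    return [body]
```

If this were saved in a file named `demo_app.py` (a hypothetical name), Gunicorn could serve it with `gunicorn demo_app:application`. Unlike a CGI script, the process stays alive between requests.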

new features

  • support for spoken corpora - defined segments can be played back as audio
  • support for user-defined line groups
  • persistent URLs for large queries - you can send a link to someone even if the query is megabytes in size
  • access to previous queries
  • easy access to favorite corpora (subcorpora, aligned corpora)
  • interactive subcorpus selection - you can select text types and see how the available values of other attributes change
  • interactive PoS tag tool - for positional PoS tag formats, an interactive tool can be used to write tag queries
  • a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
  • a correct i.p.m. (i.e. one calculated only from the selected text types) can be calculated on demand for ad-hoc subcorpora
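
The difference between the two i.p.m. (instances per million) values can be shown with a small sketch. The function name and all numbers below are made up for illustration; this is the standard i.p.m. arithmetic, not code taken from KonText.

```python
def ipm(hits, size_in_tokens):
    """Instances per million tokens: hits scaled to a 1,000,000-token text."""
    if size_in_tokens <= 0:
        raise ValueError("size_in_tokens must be positive")
    return hits * 1e6 / size_in_tokens


# Hypothetical numbers: 150 hits found within the selected text types.
hits = 150
whole_corpus_size = 100000000   # tokens in the whole corpus
selected_types_size = 10000000  # tokens in the selected text types only

print(ipm(hits, whole_corpus_size))    # 1.5  (relative to the whole corpus)
print(ipm(hits, selected_types_size))  # 15.0 (relative to the ad-hoc subcorpus)
```

Dividing by the size of the whole corpus understates the relative frequency whenever the query is restricted to a subset of text types.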

enhanced user interface

  • improved user interface and design
  • extended corpora information (size, structures, attributes, citation information)
  • concordance results also contain the Average Reduced Frequency
  • a subcorpus can be created by a custom CQL expression
  • on the multilevel frequency distribution page, the starting word can be specified for multi-word KWICs
  • result shuffling can be pre-set
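
As background for the Average Reduced Frequency (ARF) mentioned above, the following is a minimal sketch of the commonly cited definition: with f occurrences in a corpus of N tokens and v = N / f, ARF = (1 / v) * sum(min(d_i, v)), where d_i are the cyclic distances between consecutive occurrences. Whether Manatee or KonText computes the value exactly this way is an assumption; the function name and inputs are hypothetical.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF from token positions (0-based) of a word in a corpus of corpus_size tokens."""
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / float(f)  # expected distance between occurrences
    # Cyclic distances: the first one wraps around from the last occurrence.
    distances = [pos[0] + corpus_size - pos[-1]]
    distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return sum(min(d, v) for d in distances) / v


# Evenly spread occurrences: ARF equals the plain frequency.
print(average_reduced_frequency([0, 25, 50, 75], 100))   # 4.0
# Occurrences clustered in one place: ARF drops towards 1 (about 1.12 here).
print(average_reduced_frequency([10, 11, 12, 13], 100))
```

Both words occur four times, but the clustered one gets a much lower ARF, which is why the measure is useful next to the raw frequency.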

Requirements

  • a WSGI-compatible server
    • recommended setup: Gunicorn + a reverse proxy (e.g. Nginx or Apache2)
    • supported setup: Apache2 with mod_wsgi
  • Python 2.7 with:
    • lxml library
    • werkzeug library (provides WSGI middleware)
    • PyICU library (optional but preferred)
    • markdown library (optional, for formatted corpora references)
    • openpyxl library (optional, for XLSX export)
  • corpus search engine Manatee
    • versions from 2.83.3 to 2.129.2 are supported (the latest one is highly recommended); unless there is an incompatible change in Manatee, newer versions should work too
  • a key-value storage
    • any custom implementation (Redis and SQLite backends are available by default)
  • (optional) Celery task queue for background concordance calculation and maintenance tasks
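
To give an idea of what a custom key-value storage could look like, here is a minimal sketch backed by SQLite from the Python standard library. The class name and its `get`/`set`/`remove` methods are hypothetical and do not reproduce KonText's actual plug-in interface; see DEVELOPMENT.md for the real one.

```python
import json
import sqlite3


class SqliteKeyValueStore(object):
    """Hypothetical key-value store; values are serialized as JSON text."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)")

    def set(self, key, value):
        # The connection used as a context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO data (key, value) VALUES (?, ?)",
                (key, json.dumps(value)))

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT value FROM data WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else default

    def remove(self, key):
        with self._conn:
            self._conn.execute("DELETE FROM data WHERE key = ?", (key,))


store = SqliteKeyValueStore()
store.set("user:1:favorites", ["corpus_a", "corpus_b"])
print(store.get("user:1:favorites"))
```

Keeping the interface this small is what makes it possible to swap one backend (e.g. SQLite) for another (e.g. Redis) without touching the calling code.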

Build and installation

Please refer to the INSTALL.md file for details.

Customization and contribution

Please refer to the DEVELOPMENT.md file.
