Skip to content

schwa-lab/dr-apps-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This package provides a collection of tools for working with docrep data, driven by the dr command provided by libschwa, and using its Python API. It can be installed with:

pip install git+https://github.com/schwa-lab/dr-apps-python

This library provides tools that exploit Python schema models, or which uses Python to express arbitrary functions (e.g. format, grep-py, split), as well as a tool for easy use of the Python interactive shell to inspect or operate on docrep data (shell).

As this library preceded the tools implemented in libschwa in C++, it also overlaps in the latter's functionality; this may be deprecated over time.

It currently provides the following commands on top of libschwa's:

    dr conll            Writes documents in CONLL format, or a format which
                        similarly lists fields separated by some delimiter.
    dr count-py         Count the number of documents or annotations in named
                        stores.
    dr dump             Debug: unpack the stream and pretty-print it.
    dr format           Print out a formatted evaluation of each document.
    dr generate         Generate empty documents.
    dr grep-py          Filter the documents using an evaluator.
    dr hackheader       Debug: rewrite header components of given documents
                        using Python literal input
    dr ls               List the stores available in the corpus.
    dr offsets          List the byte offset at the start of each document.
    dr shell            Loads the given input file into a Python shell as the
                        variable `docs`
    dr sort             Sort the documents using an evaluator.
    dr split            Split a stream into k files, or a separate file for
                        each key determined per doc.
    dr srcgen           Generate source code for declaring types as
                        instantiated in a given corpus, assuming headers are
                        identical throughout.
    dr subset           Extract documents by non-negative index or slice (a
                        generalisation of head).
    dr upgrade          Upgrade wire format

Use dr <cmd> --help for more details.

About

A collection of Python tools for working with docrep data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages