`libfnl`™

Introduction

libfnl is an API and CLI that facilitates data and text mining by providing a collection of easy-to-use tools. The library is designed to work with Python 3 (only). It is specifically tuned towards mining biomedical and scientific texts, but can be used in other contexts, too. It complements the gnamed gene name repository daemon and the medic PubMed mirroring tool collection. In addition, an (orphaned) couchpy repository could provide a document storage facility.

The library contains the following packages:

fnl.nlp: tools to linguistically analyze text (tokenization, PoS tagging, phrase chunking, entity detection); modules to segment sentences (based on NLTK) and to map text (strings) to dictionary entries; Python wrappers for the GENIA Tagger and the NER Suite, and a handler for the GENIA corpus; furthermore, a Maximum Entropy classifier is available via NLTK's wrapper for MegaM
fnl.stat: a module to evaluate inter-rater Kappa scores and a module to develop text classifiers based on Scikit-Learn
fnl.text: wrappers to work with text data (strings, tokens, segments, annotations, etc.)
fnl.utils: additional utilities and tools (currently, just for handling JSON)
scripts: the CLI scripts to manage data/text, representing the main value provided by this collection
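For readers unfamiliar with the statistic behind fnl.stat (and the fnlkappa script), here is a minimal, standard-library sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name cohens_kappa is hypothetical and not part of libfnl; the actual module may support other kappa variants or input formats.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items in the same order.

    Hypothetical helper for illustration; not libfnl's fnl.stat API.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: per label, product of both raters' frequencies, summed.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same label for every item; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    a = ["gene", "gene", "other", "other", "gene", "other"]
    b = ["gene", "other", "other", "other", "gene", "other"]
    print(round(cohens_kappa(a, b), 3))  # 0.667
```

Here p_o = 5/6 and p_e = (3*2 + 3*4) / 36 = 0.5, so kappa = (5/6 - 1/2) / (1/2) = 2/3.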

The script directory provides the following command-line interfaces:

fnlclassi: generate a classifier for [NER-tagged] text using Scikit-Learn.
fnlcorpus: store corpora in JSON format in a CouchDB.
fnldgrep: "grep" for tokens using a dictionary.
fnldictag: tag semantic tokens from a dictionary in linguistically annotated text.
fnlgpcounter: count gene/protein symbols in MEDLINE.
fnlkappa: calculate inter-rater agreement scores.
fnlsegment: segment text into sentences using NLTK (PunktSentenceTokenizer).
fnlsegtrain: train a nltk.punkt.PunktSentenceTokenizer.
fnltok: a fast, pure-Python, Unicode-aware string tokenizer.
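To illustrate the kind of processing fnltok and fnldgrep/fnldictag perform, here is a minimal, stdlib-only sketch: a tokenizer that groups characters by their Unicode general category, followed by a greedy, left-to-right longest-match lookup of dictionary entries over the token sequence. The functions tokenize, build_dictionary and dictionary_matches, and the placeholder values "ID-1"/"ID-2", are hypothetical; they are not libfnl's API, and the actual tools may tokenize and match differently.

```python
import unicodedata
from typing import Dict, Iterator, List, Tuple

Span = Tuple[int, int]


def tokenize(text: str) -> List[Span]:
    """Return (start, end) offsets of tokens in text (hypothetical helper).

    A token is a maximal run of letters (Unicode category L*), a maximal run
    of decimal digits (category Nd), or any other single non-space character.
    """
    spans: List[Span] = []
    start, kind = None, None
    for i, char in enumerate(text):
        if char.isspace():
            current = None
        elif unicodedata.category(char).startswith("L"):
            current = "alpha"
        elif unicodedata.category(char) == "Nd":
            current = "digit"
        else:
            current = "symbol"
        # Close the open token on a space, a change of kind, or any symbol.
        if start is not None and (current != kind or current == "symbol"):
            spans.append((start, i))
            start = None
        if current is not None and start is None:
            start, kind = i, current
    if start is not None:
        spans.append((start, len(text)))
    return spans


def build_dictionary(entries: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Key each entry name by its lower-cased token sequence."""
    return {
        tuple(name[s:e].lower() for s, e in tokenize(name)): value
        for name, value in entries.items()
    }


def dictionary_matches(
    text: str, dictionary: Dict[Tuple[str, ...], str]
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, value) for greedy, left-to-right longest matches."""
    spans = tokenize(text)
    tokens = [text[s:e].lower() for s, e in spans]
    longest = max((len(key) for key in dictionary), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for length in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in dictionary:
                yield spans[i][0], spans[i + length - 1][1], dictionary[key]
                i += length
                break
        else:
            i += 1  # no entry starts at this token


if __name__ == "__main__":
    entries = {"p53": "ID-1", "tumor protein p53": "ID-1", "IL-2": "ID-2"}
    text = "Tumor protein p53 regulates IL-2 expression."
    for start, end, value in dictionary_matches(text, build_dictionary(entries)):
        print(text[start:end], value)
    # Tumor protein p53 ID-1
    # IL-2 ID-2
```

Matching on tokens rather than raw characters means an entry such as "IL-2" only matches at token boundaries, and the longest-first loop prefers "tumor protein p53" over the shorter "p53" it contains.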

Warning

This project is under "continuous development", so it is best to take your own snapshot.

Requirements

Python 3.2+
Numpy, SciPy, and Scikit-Learn 0.14+ (for fnlclassi)
NLTK 3.0+ (for the sentence segmenting tools fnlseg*)
DAWG (for fnlgpcounter; see Installation below)
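Because most dependencies are only needed by particular scripts, a small check like the following can report which tools are usable in the current environment. This is a hypothetical helper, not shipped with libfnl; the script-to-module mapping is transcribed from the list above, and the import names ("sklearn" for Scikit-Learn, "dawg" for DAWG) are assumptions. It uses importlib.util.find_spec, which needs Python 3.4 or later.

```python
import importlib.util
import sys

# Importable top-level modules each script needs, transcribed from the
# Requirements list; the import names are assumptions (hypothetical mapping).
REQUIREMENTS = {
    "fnlclassi": ["numpy", "scipy", "sklearn"],
    "fnlsegment": ["nltk"],
    "fnlsegtrain": ["nltk"],
    "fnlgpcounter": ["dawg"],
}


def missing_modules(modules):
    """Return the names in `modules` that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report():
    """Print one status line per script."""
    # find_spec only exists from Python 3.4 on, so this checker needs 3.4+.
    if sys.version_info < (3, 4):
        sys.exit("this check needs Python 3.4+ (importlib.util.find_spec)")
    for script in sorted(REQUIREMENTS):
        missing = missing_modules(REQUIREMENTS[script])
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print("{}: {}".format(script, status))


if __name__ == "__main__":
    report()
```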

Optional projects that work together with this project:

GENIA Tagger (optional, latest version)
NER Suite (optional, latest version, in turn requires CRF Suite)
MegaM - a MaxEnt classifier for NLTK with a (fast) L-BFGS optimizer
gnamed for creating gene/protein name repositories
medic for mirroring and handling PubMed citations
txtfnnl natural language processing tools based on Apache OpenNLP and UIMA

Installation

Into a Python 3 virtual environment:

pip install virtualenv # if virtualenv is not yet installed
git clone https://github.com/fnl/libfnl.git libfnl
virtualenv libfnl
cd libfnl
. bin/activate
pip install argparse # for python3 < 3.2
pip install numpy # because installing scipy fails if numpy isn't installed already
pip install -e . # installs all other dependencies

# if you prefer to install all other dependencies manually
# and/or prefer to use setup.py instead of pip:
# python setup.py install
pip install sqlalchemy
pip install scikit-learn
pip install matplotlib
pip install "nltk>=3.0"

# if you want to install the test environment:
pip install pytest

# special steps to install DAWG
git clone https://github.com/fnl/DAWG.git
cd DAWG
python setup.py install
cd ..

License

All parts of this library are licensed under the GNU Affero GPL v3.

See the attached LICENSE.txt file.
