IEPY

IEPY is a framework for doing information extraction on unstructured documents. It uses partially supervised machine learning techniques (i.e., there's a human helping the application, but the application generalizes what the human does and learns).

Typical applications have a set of text documents as input (for example a Wiki, or a database of patent applications). Those documents refer to some entities of different kinds (for example “Albert Einstein” may be an entity of kind “person” and “physicist” is an entity of kind “profession”). The application defines the relevant entity kinds and the relations between those kinds to extract (for example “person HAD-PROFESSION profession”). The other input required is a set of seed facts, which are facts known to be true, for example “Albert Einstein HAD-PROFESSION physicist”.

From that information, an IEPY application is able to find other examples of the relation in the input documents, between the provided entities (example: “Albert Einstein HAD-PROFESSION patent clerk”) or even between other unrelated entities (example: “Ernest Hemingway HAD-PROFESSION writer”). These extracted facts are also tagged with fragments of the original documents that are evidence of the fact (example: “In late 1919 Ernest Hemingway began as a freelancer, staff writer, and foreign correspondent for the Toronto Star Weekly.”). During the extraction process, a person helps the system by replying yes/no to questions of the form “Does this text fragment reflect this other fact?”
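The workflow described above can be pictured with a small, self-contained sketch. It does not use the IEPY API: the names ``Entity``, ``Fact``, ``Evidence``, ``candidate_evidence``, ``extract`` and ``ask_human`` are hypothetical, the first and third sentences are made-up toy input, and plain substring matching stands in for the machine learning that IEPY actually applies. It only shows how entity kinds, a relation, seed facts and yes/no answers fit together.

```python
from collections import namedtuple

# Hypothetical data model for illustration only; these are not IEPY classes.
Entity = namedtuple("Entity", ["name", "kind"])
Fact = namedtuple("Fact", ["subject", "relation", "obj"])
Evidence = namedtuple("Evidence", ["fact", "fragment"])


def candidate_evidence(sentences, entities, relation, subject_kind, object_kind):
    """Yield an Evidence for every sentence that mentions both a subject
    and an object of the requested kinds (naive substring matching)."""
    subjects = [e for e in entities if e.kind == subject_kind]
    objects = [e for e in entities if e.kind == object_kind]
    for sentence in sentences:
        for s in subjects:
            for o in objects:
                if s.name in sentence and o.name in sentence:
                    yield Evidence(Fact(s, relation, o), sentence)


def extract(sentences, entities, relation, subject_kind, object_kind,
            seed_facts, ask_human):
    """Grow the set of known facts from the seeds.

    Simplifying assumption: a fragment that mentions an already known fact
    is accepted without asking; every other candidate goes to the human.
    """
    known = set(seed_facts)
    accepted = []
    for ev in candidate_evidence(sentences, entities, relation,
                                 subject_kind, object_kind):
        if ev.fact in known or ask_human(ev):
            known.add(ev.fact)
            accepted.append(ev)
    return known, accepted


einstein = Entity("Albert Einstein", "person")
hemingway = Entity("Ernest Hemingway", "person")
physicist = Entity("physicist", "profession")
clerk = Entity("patent clerk", "profession")
writer = Entity("writer", "profession")
entities = [einstein, hemingway, physicist, clerk, writer]

sentences = [
    "Albert Einstein worked as a patent clerk in Bern.",  # toy sentence
    "In late 1919 Ernest Hemingway began as a freelancer, staff writer, "
    "and foreign correspondent for the Toronto Star Weekly.",
    "Albert Einstein was a physicist.",  # toy sentence
]

seed_facts = [Fact(einstein, "HAD-PROFESSION", physicist)]


def ask_human(evidence):
    """Stand-in for the interactive yes/no question; always answers yes."""
    return True


known, accepted = extract(sentences, entities, "HAD-PROFESSION",
                          "person", "profession", seed_facts, ask_human)
for ev in accepted:
    print(ev.fact.subject.name, ev.fact.relation, ev.fact.obj.name)
```

In a real session the ``ask_human`` stand-in would be replaced by an interactive prompt, and the answers would be used to train the system so that fewer questions are needed over time.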

Installation

Please check docs/install.rst

Documentation

Available at http://iepy.readthedocs.org/en/latest/.

Contact Information

Rafael Carrascosa <rcarrascosa@machinalis.com> (rafacarrascosa at github)

Franco M. Luque <francolq@famaf.unc.edu.ar> (francolq at github)

Javier Mansilla <jmansilla@machinalis.com> (jmansilla at github)

Daniel Moisset <dmoisset@machinalis.com> (dmoisset at github)

You can follow the development of this project and report issues at http://github.com/machinalis/iepy

Licensing

This project has a BSD license, as stated in the LICENSE file.

Changelog

No stable releases yet. Coming soon.

The project currently works: it has good test coverage and a working example. We're still missing some API cleanup, documentation, packaging, and a couple of large bug fixes.
