GitHub - ivanjelinek/pyelly: A multifaceted natural language tool written in Python 2.7.*. This is available now in its v1.0 release.

forked from prohippo/pyelly

applcn
forTesting
EDtbl.sl
INGtbl.sl
Ntbl.sl
PyEllyManual.pdf
README.txt
Stbl.sl
Ttbl.sl
bad.main.key
bad.main.txt
chinese.main.key
chinese.main.txt
cognitiveDefiner.py
cognitiveProcedure.py
conceptualHierarchy.py
conceptualWeighting.py
dateTransform.py
definitionLine.py
derivabilityMatrix.py
disambig.main.key
disambig.main.txt
doTest
doctor.main.key
doctor.main.txt
dumpEllyGrammar.py
echo.main.key
echo.main.txt
ellyBase.py
ellyBits.py
ellyBuffer.py
ellyBufferEN.py
ellyChar.py
ellyCharInputStream.py
ellyConfiguration.py
ellyDefinition.py
ellyDefinitionReader.py
ellyException.py
ellyMain.py
ellySentenceReader.py
ellySession.py
ellyStemmer.py
ellyToken.py
ellyWildcard.py
entityExtractor.py
exoticPunctuation.py
extractionProcedure.py
featureSpecification.py
generativeDefiner.py
generativeProcedure.py
grammarRule.py
grammarTable.py
indexing.main.key
indexing.main.txt
inflectionStemmerEN.py
interpretiveContext.py
macroTable.py
morphologyAnalyzer.py
parseTest.py
parseTree.py
parseTreeBase.py
parseTreeBottomUp.py
parseTreeWithDisplay.py
patternTable.py
prefixTreeLogic.py
procedureTestFrame.py
punctuationRecognizer.py
querying.main.key
querying.main.txt
rest-tbl.sl
semanticCommand.py
simpleTransform.py
spec-tbl.sl
stemLogic.py
stemTest.py
stopExceptions.py
substitutionBuffer.py
suffixTreeLogic.py
symbolTable.py
syntaxSpecification.py
test.main.key
test.main.txt
texting.main.key
texting.main.txt
timeTransform.py
treeLogic.py
undb-tbl.sl
vocabularyElement.py
vocabularyTable.py

PyElly is a rule-based natural language processing tool that has existed
for over forty years in various incarnations. It speeds development of
many kinds of NLP applications by taking care of low-level language
details not central to a given solution. It is now freely available on
the web to people needing to process or pre-process text data.

PyElly provides flexible tokenization, syntax-driven parsing, English
inflectional and morphological stemming, macro substitutions, basic
and extended entity extraction, ambiguity handling, sentence recognition,
support for large external dictionaries, and a general procedural
framework for text translation from UTF-8 to UTF-8.

The latest version has been completely rewritten in mostly object-oriented
Python. It passed multiple stages of beta testing in 2014 and may be
downloaded from GitHub at https://github.com/prohippo/pyelly.git . The
current release is v1.0.

To learn how to use PyElly, see the PyEllyManual.pdf file in the same
directory as this README.txt file. The manual has over a hundred pages of
information, including an overview of some basic linguistics. Documentation
of individual Python source files can be generated as needed with the
Python pydoc utility.
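
As a minimal sketch of generating such documentation from inside Python
rather than from the command line, the standard pydoc module can render the
text for any importable module. The function name module_doc_text is
hypothetical, and 'json' is only a stand-in from the standard library; to
document a PyElly source file, you would pass its module name instead,
assuming the PyElly directory is on the Python import path.

```python
import pydoc

def module_doc_text(module_name):
    """Return plain-text pydoc output for an importable module name."""
    # render_doc() builds the same text that the pydoc utility displays;
    # plain() strips the overstrike sequences pydoc uses for bold text.
    return pydoc.plain(pydoc.render_doc(module_name))

if __name__ == '__main__':
    # 'json' is a stand-in; substitute a PyElly module name when the
    # PyElly source directory is on the import path (an assumption here).
    print(module_doc_text('json')[:400])
```

Keeping the result as a string makes it easy to write the documentation to
a file or to search it, instead of paging through it in a terminal.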

At present, PyElly consists of 58 Python modules comprising about 16
thousand lines of source code. There are also various definition files
to support basic English-language capabilities and various sample
applications, including

* indexing - remove stopwords and get stems for content words from raw
             text input.
* texting  - readable text compression.
* doctor   - emulation of Weizenbaum's Doctor program.
* chinese  - basic translation of English to Chinese in simplified
             or traditional characters.
* querying - rewrite English questions as SQL queries for a Soviet
             military aircraft database.
* disambig - disambiguation of phrases with WordNet information.

These illustrate what you can do with PyElly and also serve as a basis for
comprehensive integration testing. Other applications will be added to the
PyElly package on GitHub in the future. You may use them as models for
building your own systems.
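
The repository pairs each sample application with a *.main.txt file and a
*.main.key file. One plausible way to use such a pair for integration
testing is to compare the output actually produced against the saved key.
The sketch below shows only that comparison step with the standard difflib
module. It is not the doTest script, and the function name compare_with_key
is hypothetical.

```python
import difflib

def compare_with_key(actual_lines, key_lines):
    """Return unified-diff lines; an empty list means the output matches."""
    # lineterm='' keeps the diff control lines free of trailing newlines,
    # which suits input lines that have already been stripped.
    return list(difflib.unified_diff(key_lines, actual_lines,
                                     fromfile='key', tofile='actual',
                                     lineterm=''))

if __name__ == '__main__':
    # Made-up lines for illustration only, not real PyElly output.
    for line in compare_with_key(['alpha', 'beta'], ['alpha', 'gamma']):
        print(line)
```

An empty result signals a passing test, while any returned lines show
exactly where the output departed from the key.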

PyElly is intended mostly for educational use and is released under a BSD
license. Be advised that the current software and documentation are still
evolving, although the v1.0 release should be much more stable than previous
beta releases.

Release Notes:

 0.1    -  25dec2013  initial beta release
 0.2    -  16mar2014  increase number of syntactic categories to 64
                      add storing and reinserting of deleted output buffer text
                      fix bugs in DELETE TO generative semantic command
                      add unit testing input to PyElly distribution
                      save integration testing script doTest properly
                      eliminate inconsistencies in integration testing keys
                      improve output of unit test for generativeProcedure.py
 0.3    -  24apr2014  extend generative semantics to support new applications
                      add UNITE, INTERSECT, COMPLEMENT, UNCAPITALIZE
                      add QUEUE, UNQUEUE, SHOW
                      replace DELETE ALL code
                      make STORE more efficient and generalize, fix bugs
                      allow for initializing of global variables in grammar
                      strengthen unit testing, add querying integration test
 0.4    -  04jul2014  support conceptual hierarchies in cognitive semantics
                      separate lookup tables for syntactic and semantic features
                      fix bugs in loading vocabulary tables from text input
                      fix bugs in loading conceptual hierarchies from text input
                      improve unit testing
                      add core of disambig application for integration testing
 0.4.1  -  13aug2014  clean up and flesh out disambig application
                      fix bugs in cognitive semantics
                      fix bugs in conceptual hierarchies
                      miscellaneous cleanup of Python source files
                      improve unit testing of modules, parse tree dump
 0.5    -  01sep2014  simplify doTest and make parse tree dumps easier to filter
                      add audit on usage of grammar symbols for error checking
                      add version check when loading saved binary language files
                      define ellyException to handle errors in table loading
                      add error messages when generating language tables
                      simplify semantic feature check by generative semantics
                      extend generative semantic unit tests
                      add bad application to test PyElly error reporting
 0.5.1  -  12sep2014  fix residual problems with error reporting and recovery
                      extend bad application for integration testing
 0.6    -  12oct2014  more input checking in vocabulary table compilation
                      more information in disambig application translations
                      better English inflectional and morphological stemming
                      add English irregulars to stemming, update echo application
                      extend chinese application, better classifier assignments
 1.0    -  24dec2014  add comprehensive error reporting in inflectional stemming
                      add WordNet exceptions to cases handled by stemmers
                      upgrade pattern table matching and clean up code
                      fix bug in ellyWildcard with $ wildcard
                      update querying application
                      clean up various problems in chinese application
                      clean up all modules with PyLint

A new version number reflects major changes in PyElly code. Such changes
will typically require regenerating any previously saved *.elly.bin files
to ensure correct operation. Changes only to PyElly sample application
files, unit testing input files, or documentation will be made from time to
time, but these will leave the version number the same.
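
After moving to a new version, it can help to list the saved *.elly.bin
files that need regenerating. The following is a minimal sketch using only
the standard library. The function name saved_language_files is
hypothetical, and it assumes the saved files sit directly in the directory
you pass in. It only lists files and leaves them untouched.

```python
import glob
import os

def saved_language_files(directory):
    """Return sorted paths of *.elly.bin files found directly in directory."""
    # Only the given directory is searched; subdirectories are not visited.
    return sorted(glob.glob(os.path.join(directory, '*.elly.bin')))

if __name__ == '__main__':
    # '.' assumes the script is run from the directory holding the files.
    for path in saved_language_files('.'):
        print(path)
```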

The PyElly website is at

    https://sites.google.com/site/pyellynaturallanguage/