HAL (Highlighting Assistant Legend) is being designed to make highlighting thousands of documents easy. HAL has three components:
- Highlighter: lets users highlight any web page and save the results to the Annotator Store. This will include code to pull useful data out of the Annotator Store.
- Mirror (edgar): makes it easy to construct a local mirror of an external website. This will include code to pull pages of interest from EDGAR and keep track of them in the database.
- Learner: uses NLTK and scikit-learn to see how well we can teach a computer to highlight documents.
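To give a rough idea of what the Learner could do, here is a minimal sketch that treats highlighting as sentence classification: each training sentence is labeled as highlighted or not, and a multinomial naive Bayes model scores new sentences. It uses only the standard library as a stand-in for an NLTK/scikit-learn pipeline (scikit-learn's `MultinomialNB` plays the same role). The tokenizer, the `NaiveBayesHighlighter` class, and the training sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


class NaiveBayesHighlighter:
    """Multinomial naive Bayes over bag-of-words sentences (illustrative sketch)."""

    def fit(self, sentences, labels):
        # labels[i] is True if a human highlighted sentence i.
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior plus Laplace-smoothed log likelihood of every token.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokens:
            score += math.log((counts[token] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, sentence):
        """Return True if the sentence looks more like a highlighted one."""
        tokens = tokenize(sentence)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical training data: biography sentences highlighted, boilerplate not.
train = [
    ("Mr. Smith has served as a director of Acme Corp since 2005.", True),
    ("She is also a director of Widget Inc.", True),
    ("The meeting will be held at the company headquarters.", False),
    ("Proxies will be voted as directed by shareholders.", False),
]
model = NaiveBayesHighlighter().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("He has been a director of Foo Inc. since 2010."))
```

With real data the labels would come from the saved highlights, and a proper tokenizer (for example NLTK's) would replace the regular expression.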
Here's how I install HAL:
git clone git@bitbucket.org:amarder/hal.git
cd hal
make install
The Makefile assumes the conda package manager for Python is available on your machine.
If you want to create, read, update, or delete highlights, three processes need to be running:
- Elasticsearch,
- Annotator Store,
- This Django project.
To fire up all three processes, I use the following command:
make start
And to shut everything down I use:
make stop
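If something goes wrong after make start, it can help to check which of the three processes is actually accepting connections. This is a minimal sketch assuming default ports: 9200 for Elasticsearch, 5000 for the Annotator Store, and 8000 for Django's runserver. These ports are assumptions; the ports HAL really uses are whatever the Makefile configures, so adjust SERVICES to match.

```python
import socket

# Assumed default ports; check the Makefile for the ports HAL actually uses.
SERVICES = {
    "Elasticsearch": ("localhost", 9200),
    "Annotator Store": ("localhost", 5000),
    "Django": ("localhost", 8000),
}


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:16} {'up' if up else 'down'}")
```

A port that accepts connections only shows that some process is listening there, not that it is the right service, but it is usually enough to spot which one failed to start.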
- Admin action (pull_highlights) to sync highlights from Elasticsearch into PostgreSQL. Suppose there are multiple highlights tagged for the same director. How should they be combined into one biography? The easiest solution is to join them in the order they were created.
- Create an admin page so RAs can mark whether directorships are mentioned. This will likely use an inline admin so they can see the text of the biography and mark which directorships are mentioned.
  - Director: individual
  - Directorship: individual x filing
  - Disclosure: individual x filing x company x disclosed?
- Deploy this to Amazon Web Services.
- Right now my random page can bring up a filing that has already been highlighted.
- Add "other" and "none" tags.
- Tag companies in bio segments, not in whole bios.
- Include director_id and equilar_id in tags.
- Move Postgres to Linode.
- Add director_id to existing entries in Elasticsearch.
- Run the NER tagger on all bio segments.
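The "join them in the order they were created" rule for syncing highlights into biographies can be sketched as below. This is a minimal sketch, assuming each highlight has already been pulled out of Elasticsearch as a dict with director_id, created (an ISO 8601 timestamp, as in Annotator's annotation format) and quote keys. The key names and the combine_biographies helper are hypothetical, and writing the result to PostgreSQL is left out.

```python
from collections import defaultdict
from datetime import datetime


def combine_biographies(highlights, separator=" "):
    """Join each director's highlighted quotes in the order they were created.

    `highlights` is an iterable of dicts with hypothetical keys
    'director_id', 'created' (ISO 8601 string) and 'quote'.
    Returns {director_id: combined biography text}.
    """
    by_director = defaultdict(list)
    for h in highlights:
        by_director[h["director_id"]].append(h)

    biographies = {}
    for director_id, items in by_director.items():
        # Sort by parsed creation time, earliest highlight first.
        items.sort(key=lambda h: datetime.fromisoformat(h["created"]))
        biographies[director_id] = separator.join(h["quote"].strip() for h in items)
    return biographies


highlights = [
    {"director_id": 7, "created": "2015-01-02T10:00:00", "quote": "She joined the board in 2010."},
    {"director_id": 7, "created": "2015-01-01T09:00:00", "quote": "Jane Doe is CEO of Acme."},
    {"director_id": 9, "created": "2015-01-03T08:00:00", "quote": " John Roe is a director of Widget Inc. "},
]
print(combine_biographies(highlights))
```

Joining with a single space keeps the rule as simple as the note suggests; passing a different separator (a newline, say) would keep the original segments visually distinct.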
 Schema |          Name           | Type  |  Owner
--------+-------------------------+-------+----------
 public | companies               | table | postgres
 public | crosswalk               | table | postgres
 public | equilar_proxies         | table | postgres
 public | director                | table | postgres
 public | matched_director_ids    | table | postgres
 public | mirror_biographysegment | table | postgres
 public | mirror_filing           | table | postgres