Unicorn

Unicorn is a Python-based web framework for exploratory analysis of text corpora. It builds on many existing open source projects to ingest documents, extract information, provide full-text search, and visually display relevant content.

Quick Start Installation with Vagrant

You can get up and running with Unicorn quickly by using Vagrant and the provided Vagrantfile, which provisions an Ubuntu Trusty Tahr virtual machine with all of the necessary software:

$ git clone https://github.com/giantoak/unicorn.git
$ cd unicorn
$ vagrant up # this will take a while, as vagrant will run the install script
$ vagrant ssh
> cd unicorn
> ./start_unicorn.sh

In your browser, go to http://localhost:5000/unicorn and log in with the default username (admin) and password (admin) to browse the default data: a small collection of State Department cables.
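Since `vagrant up` and `start_unicorn.sh` can take a while, it may help to poll the URL above before opening a browser. The following is a minimal sketch, not part of Unicorn: the function name `wait_until_up`, its parameters, and the retry counts are all hypothetical, and it assumes that any HTTP response (including an error status or a login prompt) means the server is running.

```python
import time
import urllib.error
import urllib.request

UNICORN_URL = "http://localhost:5000/unicorn"  # URL given in this README


def wait_until_up(url=UNICORN_URL, attempts=30, delay=2.0,
                  fetch=urllib.request.urlopen):
    """Return True once `url` answers over HTTP, False after `attempts` failures."""
    for attempt in range(attempts):
        try:
            fetch(url, timeout=5).close()
            return True
        except urllib.error.HTTPError:
            # The server answered with an error status (for example 401),
            # so it is running even though the request did not succeed.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and try again.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling `wait_until_up()` from the host blocks until the page responds or the attempts run out. The `fetch` parameter exists only so the retry logic can be exercised without a running server.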

Prerequisites

The installer assumes that you have a Linux/BSD environment, sudo permissions, bash, and access to the apt-get package manager (in essence, Ubuntu). You should be able to swap out apt-get for your package manager of choice to adapt the installer to other Unix-like systems.
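A quick way to see whether a machine meets these assumptions is to look for the named tools on the PATH. This is a minimal sketch rather than anything install.sh does itself; `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the tool list is taken only from the paragraph above.

```python
import shutil

# Tools this README names as installer assumptions.
REQUIRED_TOOLS = ("bash", "sudo", "apt-get")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means all three tools were found. If `apt-get` is reported as missing, the package manager calls in install.sh would need to be adapted before running it.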

You should also have the following packages installed:

Customizing what's installed

install.sh will install several pieces of helper software.

If you want to tweak any of these settings, you can do so by editing install.sh. Detailed instructions for doing so go beyond the scope of this README.

The default run and Postgres configurations

If install.sh doesn't find a copy of app/config.py, it will create one from app/config.py.default. In doing so, it will set a default username and password of admin and admin for Unicorn and use the default instance of Postgres for storage.
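The copy-if-missing behaviour described above can be sketched in a few lines. This illustrates the logic only and is not the actual code in install.sh. The function name `ensure_config` is hypothetical, and only the two file names come from this README.

```python
import shutil
from pathlib import Path


def ensure_config(app_dir="app"):
    """Create config.py from config.py.default if config.py is missing.

    Returns True if a new config.py was created, False if one already existed.
    """
    config = Path(app_dir) / "config.py"
    default = Path(app_dir) / "config.py.default"
    if config.exists():
        return False  # never overwrite a customised config
    shutil.copyfile(default, config)
    return True
```

Because an existing app/config.py is left untouched, you can create and edit your own copy before running the installer to avoid the admin/admin defaults.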

The default MySQL configuration

When installing MySQL, install.sh will set a default password of geodict_root for root. You can use a different password, but you will then need to update the password stored in geodict_config.py in the cloned copy of geodict that install.sh creates.

The default Elasticsearch index

The Elasticsearch index comes pre-loaded with 1,000 historical documents from the National Archives for demonstration purposes. To clear them out and recreate an empty index with its attachment mapping, run:

> curl -XDELETE "http://localhost:9200/dossiers/"
> curl -XPUT "http://localhost:9200/dossiers/"
> curl -XPUT "http://localhost:9200/dossiers/_mapping/attachment" -d @dossiers_mapping.json
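The same three requests can be issued from Python using only the standard library. This is a minimal sketch under stated assumptions: `build_reset_requests` and `reset_index` are hypothetical helper names, and the explicit JSON content type is an addition that the curl commands above do not set.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # from the curl commands in this README
INDEX = "dossiers"


def build_reset_requests(mapping_json, es_url=ES_URL, index=INDEX):
    """Build the DELETE, PUT, and mapping PUT requests without sending them."""
    index_url = f"{es_url}/{index}/"
    mapping_url = f"{es_url}/{index}/_mapping/attachment"
    return [
        urllib.request.Request(index_url, method="DELETE"),
        urllib.request.Request(index_url, method="PUT"),
        urllib.request.Request(
            mapping_url,
            data=mapping_json,
            method="PUT",
            # Assumption: an explicit JSON content type is added here;
            # the original curl call does not set one.
            headers={"Content-Type": "application/json"},
        ),
    ]


def reset_index(mapping_path="dossiers_mapping.json"):
    """Send the three requests in order; raises on the first HTTP error."""
    with open(mapping_path, "rb") as handle:
        mapping_json = handle.read()
    for request in build_reset_requests(mapping_json):
        with urllib.request.urlopen(request) as response:
            print(request.get_method(), request.full_url, response.status)
```

Run `reset_index()` from the directory that contains dossiers_mapping.json. Note that `urlopen` raises `urllib.error.HTTPError` if Elasticsearch returns an error status, for example when the index to delete does not exist.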

PDF Viewer

Unicorn requires a browser plug-in called PDF Viewer to render PDFs. It is available for Google Chrome and for Firefox. Unicorn should work natively with Safari.

License

Unicorn is under ongoing development and is freely available for download under the MIT License (MIT). Unlike the GNU General Public License (GPL), the MIT License permits distribution of derivative works under a proprietary license without requiring the release of source code.

Acknowledgements

This project was funded by DARPA as part of the XDATA program.
