Aleph

Document-driven investigative tools

This is a collection of tools for ingesting, normalizing, indexing and tagging documents in the context of a journalistic investigation.

These tools are intended to be complementary to existing platforms such as DocumentCloud and analice.me.

Use cases

As a journalist, I want to store a list of documents that mention a person/org/topic so that I can sift through the documents.
As a journalist, I want to intersect sets of documents that mention people/orgs/topics so that I can drill down on the relationships between them.
As a journalist, I want to combine different types of facets which represent document and entity metadata.
As a data importer, I want to routinely crawl and import documents from a data source.
As a data importer, I want to associate metadata with documents and entities to allow advanced facets.
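The intersection and facet use cases above can be sketched with plain Python sets. This is only an illustration, not how aleph implements search: the in-memory `DOCUMENTS` corpus, the `source` metadata field, and the helper names `mentioning` and `facet` are all assumptions made for the example, and the naive substring match stands in for a real index.

```python
from collections import Counter

# Hypothetical in-memory corpus: document id -> text plus one metadata field.
DOCUMENTS = {
    "doc-1": {"text": "Acme Corp paid Jane Doe a consulting fee.", "source": "leak-a"},
    "doc-2": {"text": "Jane Doe resigned from the ministry.", "source": "leak-b"},
    "doc-3": {"text": "Acme Corp won a ministry contract.", "source": "leak-a"},
}


def mentioning(term, documents=DOCUMENTS):
    """Return the ids of documents whose text mentions the term (case-insensitive)."""
    needle = term.lower()
    return {doc_id for doc_id, doc in documents.items() if needle in doc["text"].lower()}


def facet(doc_ids, field, documents=DOCUMENTS):
    """Count the values of one metadata field across a set of documents."""
    return Counter(documents[doc_id][field] for doc_id in doc_ids)


# Documents that mention both the organisation and the person.
both = mentioning("Acme Corp") & mentioning("Jane Doe")
```

Because each mention list is a set, drilling down on a relationship is a set intersection, and a facet is a count of metadata values over whichever set is currently selected.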

Basic ideas

An entity (such as a person, organisation, or topic) is always a search query; each entity can have multiple actual queries associated with it by means of aliases (tags?).
Documents can be anything; there is no guarantee that dit will be able to display a document, only that it can index it. Document display is handled by DocumentCloud and similar platforms.
When a document that matches an entity is indexed after that entity was created, users subscribed to the entity are notified.
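A minimal sketch of these ideas, assuming an entity is modelled as a name plus a list of aliases, each of which acts as a query. The `Entity` class, its `subscribers` field, and the `notifications_for` helper are hypothetical names chosen for illustration and are not taken from the aleph code base; matching is a simple case-insensitive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A person, organisation, or topic treated as a saved search query."""
    name: str
    aliases: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def queries(self):
        # The canonical name and every alias each act as one query.
        return [self.name] + list(self.aliases)

    def matches(self, text):
        # Assumption: a document matches if any query occurs in its text.
        lowered = text.lower()
        return any(query.lower() in lowered for query in self.queries())


def notifications_for(text, entities):
    """Return (subscriber, entity name) pairs for a newly indexed document."""
    return [
        (user, entity.name)
        for entity in entities
        if entity.matches(text)
        for user in entity.subscribers
    ]


acme = Entity("Acme Corporation", aliases=["Acme Corp", "ACME"], subscribers=["alice"])
doe = Entity("Jane Doe", subscribers=["bob"])
alerts = notifications_for("New filing shows ACME Corp. lobbying.", [acme, doe])
```

Here the new document matches the entity only through its aliases, so the one subscriber of that entity is notified and the subscriber of the unrelated entity is not.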

Existing tools

Installation

dit uses textract, which has external (i.e. non-Python) dependencies. See the install guide.

apt-get install python-dev libxml2-dev libxslt1-dev antiword poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox

License

aleph is licensed under a standard MIT license (included as LICENSE).
