Tools for organizing your PDFs are restrictive and proprietary (in the sense that information is kept hostage inside the software).
The article advocates:
- use file system: helps avoid the “proprietary data problem”
- flat structure: keeping a single (flat) directory of all your papers
- tags over folders: well-known problem we all faced before gmail.
- conventions: naming conventions, tagging conventions
- simple scripts: use many simple programs rather than large monolithic programs.
skid stores all documents in a single directory called “the cache”. Each file
in the cache has a directory associated with it where metadata will live
(e.g. user notes, bibliographic information, text extraction). These directories
are named after the cached document with a simple .d suffix (for example,
cached document Cookbook.pdf has a directory Cookbook.pdf.d/). The most
important file living in this directory is notes.org. This file is where the
user annotates documents with bibliographic metadata (e.g. title, author, year),
and personal notes.
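The naming convention above is simple enough to capture in a few lines of Python. This is a minimal sketch, not skid's own code: the helper names `metadata_dir` and `notes_file` and the `cache/` location are assumptions made for illustration.

```python
from pathlib import Path


def metadata_dir(document):
    """Return the metadata directory for a cached document (its name plus '.d')."""
    document = Path(document)
    return document.with_name(document.name + ".d")


def notes_file(document):
    """Return the path of the notes.org file that annotates a cached document."""
    return metadata_dir(document) / "notes.org"


if __name__ == "__main__":
    doc = Path("cache") / "Cookbook.pdf"  # hypothetical cache location
    print(metadata_dir(doc))
    print(notes_file(doc))
```

Because the mapping is purely a matter of file names, any tool can find a document's metadata without consulting an index or a database.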
With this basic setup we can quickly write tools for maintaining, accessing, and analyzing the repository of documents.
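As an illustration of how small such a tool can be, here is a sketch that scans a cache for documents whose notes mention a given string. It is not part of skid: the function name `search_notes` is hypothetical, and the only things it relies on are the `.d` directory suffix and the `notes.org` file name described above.

```python
from pathlib import Path


def search_notes(cache, query):
    """Yield cached documents whose notes.org mentions `query` (case-insensitive)."""
    query = query.lower()
    for meta in sorted(Path(cache).glob("*.d")):
        notes = meta / "notes.org"
        if not notes.is_file():
            continue  # no notes written for this document yet
        text = notes.read_text(encoding="utf-8", errors="replace")
        if query in text.lower():
            # Strip the trailing ".d" to recover the document's own path.
            yield meta.with_name(meta.name[:-2])


if __name__ == "__main__":
    for document in search_notes("cache", "cookbook"):  # hypothetical cache location
        print(document)
```

A plain substring match like this is far cruder than a real search index, but it shows that the on-disk layout needs no special software to be useful.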
See also: a great post by GradHacker: Towards Better PDF Management with the Filesystem.
Manually install dependencies
$ apt-get install poppler-utils  # provides pdftotext
Install remaining python deps and this package
$ git clone git@github.com:timvieira/skid.git
$ cd skid
$ pip install -e .
Add your first document
$ skid add https://github.com/timvieira/skid
This will fire up your text editor with a new notes.org file. Write some notes
about the file, which will help you find it later or remember why you bookmarked
it to begin with!
Update the skid index so that this document will show up in searches
$ skid update
$ skid search timvieira
Further reading:
$ skid [command] -h
Add the following to bash configuration:
function _optcomplete {
    COMPREPLY=( $( \
        COMP_LINE=$COMP_LINE COMP_POINT=$COMP_POINT \
        COMP_WORDS="${COMP_WORDS[*]}" COMP_CWORD=$COMP_CWORD \
        $1 ) )
}
complete -F _optcomplete skid
To enable the org-mode link directive (`skid:`), add the following to your Emacs configuration:
;;------------------------------------------------------------------------------
;; Support for skid links
;;
;; USAGE:
;;
;; The link directive 'skid'
;;
;;   [[skid:author:"Jason Eisner"][Jason Eisner]]
;;
;; Programmatic
;;
;;   (skid-search "Jason Eisner tags:thesis")
;;
;; org-mode reference http://orgmode.org/org.html#Adding-hyperlink-types
(require 'org)
(org-add-link-type "skid" 'skid-search)
(defun skid-search (query)
  "skid tag search."
  (interactive)
  (switch-to-buffer (make-temp-name "Skid"))
  (insert (shell-command-to-string (concat "python -m skid search --format org --limit 0 " query)))
  (beginning-of-buffer)
  (org-mode))
;;------------------------------------------------------------------------------
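The Emacs helper simply shells out to `python -m skid search`, so the same search can be driven from other scripts. Below is a hedged Python sketch of that idea. The function names are hypothetical, and the `--format org` and `--limit 0` flags are copied from the Emacs snippet rather than from skid's documentation (check `skid search -h` for their exact meaning). The query is split with `shlex` to mimic how the shell would tokenize it.

```python
import shlex
import subprocess
import sys


def skid_search_command(query, fmt="org", limit=0):
    """Build the argument list for `python -m skid search` (flags as in the Emacs snippet)."""
    return [sys.executable, "-m", "skid", "search",
            "--format", fmt, "--limit", str(limit)] + shlex.split(query)


def skid_search(query):
    """Run the search and return its output as text; requires skid to be installed."""
    result = subprocess.run(skid_search_command(query),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the command; call skid_search(...) to actually run it.
    print(skid_search_command('Jason Eisner tags:thesis'))
```

Passing an argument list instead of a concatenated string avoids shell-quoting surprises when a query contains characters the shell treats specially.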