###Work flow
- The server sends properly formatted JSON
- The foreman parses the JSON
- The job is passed to RabbitMQ
- A 201 (Created) response and the container ID are sent back to the server
- A Celery worker begins the archival process
- The project is broken down further into smaller chunks
- On completion, a callback fires and the Celery worker pings the foreman
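
The steps above can be sketched in plain Python. This is only an illustration, not the archiver's actual code: an in-process `queue.Queue` stands in for RabbitMQ, and the names `foreman_accept` and `worker_run_once`, the `id` field, and the 400 response for bad input are all assumptions made for the example.

```python
import json
import queue
import uuid

# Stand-in for RabbitMQ: an in-process FIFO queue (assumption for illustration).
job_queue = queue.Queue()


def foreman_accept(raw_body):
    """Parse the posted JSON, enqueue the job, and return (status, body)."""
    try:
        job = json.loads(raw_body)
    except ValueError:
        return 400, {"error": "malformed JSON"}
    if not isinstance(job, dict):
        return 400, {"error": "expected a JSON object"}
    # Hypothetical: the real service decides how container ids are assigned.
    container_id = job.get("id") or uuid.uuid4().hex
    job_queue.put({"container_id": container_id, "job": job})
    return 201, {"id": container_id}


def worker_run_once(on_complete):
    """Take one job off the queue, 'archive' it, then fire the callback."""
    message = job_queue.get()
    # The real worker would split the project into smaller tasks here.
    on_complete(message["container_id"], "success")


if __name__ == "__main__":
    status, body = foreman_accept('{"id": "abc12"}')
    print(status, body)
    worker_run_once(lambda cid, state: print("callback:", cid, state))
```

The foreman answers as soon as the job is queued, so the 201 only means "accepted"; the outcome arrives later through the callback.
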
###Vocabulary
- Container
    - Any given OSF project that is not a registration
- Registration
    - A "frozen" OSF project
- Foreman
    - The controlling web application
- Worker
    - The Celery worker
- Service
    - An arbitrary third-party service
###External Facing API
`/api/v1/archives/`

- POST
    - Begins the archival process described by the posted JSON
- GET
    - Returns a list of all archives: `{ containers: { ... } }`

`/api/v1/archives/callbacks`

- POST
    - Listed here because the route is externally reachable, but it is for internal use only

`/api/v1/archives/<CID>/`

- GET
    - Returns all metadata for the container CID

`/api/v1/archives/<CID>/files/`

- GET
    - Returns a list of files in container CID

`/api/v1/archives/<CID>/files/<FID>`

- GET
    - Returns the file FID, either as a redirect or as a direct download
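
As a rough client-side illustration of the routes above, the sketch below builds the POST that starts an archival job and the URL for fetching a single file. `BASE_URL`, the function names, and the payload contents are placeholders assumed for the example; the real payload format is whatever the foreman expects.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; replace with wherever the foreman is running.
BASE_URL = "http://localhost:7000"


def archive_request(payload):
    """Build (but do not send) the POST that starts an archival job."""
    return urllib.request.Request(
        BASE_URL + "/api/v1/archives/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def file_url(cid, fid):
    """URL for a single file FID inside container CID."""
    return "{}/api/v1/archives/{}/files/{}".format(
        BASE_URL, quote(cid, safe=""), quote(fid, safe="")
    )


if __name__ == "__main__":
    # The payload shape here is a placeholder, not the real schema.
    req = archive_request({"example": "payload"})
    print(req.get_method(), req.full_url)
    print(file_url("abc12", "f00"))
    # To actually send it: urllib.request.urlopen(req), expecting a 201.
```
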
###Registration structure
A registration will have the following directory structure (subject to change):

    Directory Structures/
        {See Below}
    File Metadata/
        {some sha256}.json
        {some sha256}.par2.json
    Files/
        {some sha256}
    Manifests/
        {some container id}.manifest.json
        {some container id}.{some 3rd party service}.manifest.json
    Parities/
        {some sha256}.par2
        {some sha256}.vol00+xxx.par2
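
To make the naming scheme concrete, here is a small sketch that derives the paths for one file. It assumes that `{some sha256}` is the SHA-256 hex digest of the file's contents, which the layout above suggests but does not state; `registration_paths` is a hypothetical helper, not part of the archiver.

```python
import hashlib
import posixpath


def registration_paths(content, container_id):
    """Return where a file's pieces would live under the layout above."""
    # Assumption: files are addressed by the SHA-256 of their contents.
    sha = hashlib.sha256(content).hexdigest()
    return {
        "file": posixpath.join("Files", sha),
        "metadata": posixpath.join("File Metadata", sha + ".json"),
        "parity_metadata": posixpath.join("File Metadata", sha + ".par2.json"),
        "parity": posixpath.join("Parities", sha + ".par2"),
        "manifest": posixpath.join("Manifests", container_id + ".manifest.json"),
    }


if __name__ == "__main__":
    for kind, path in registration_paths(b"hello", "abc12").items():
        print(kind, "->", path)
```

Under this assumption, identical files in different containers map to the same `Files/` entry and are stored only once.
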

The directory structure of `Directory Structures` is as follows:

    {some container id}/
        manifest.json
        children/
            {child id}/
                {container}
        github/
            {repo name}/
                {repo contents}
        s3/
            {bucket name}/
                {bucket contents}
        figshare/
            {id}/
                {figshare contents}
        dropbox/
            {folder}/
                {folder contents}
###Setting up Archiver to run locally
- `mv group_vars/archiver.example group_vars/archiver`
- Fill out `archiver` with the proper information
    - Minimally your S3 keys and bucket name
- Change directories to `vagrant`
- `vagrant up`
- `invoke provision`
- `cd ..`
- `invoke notebook`
- From here you will need the API key for whatever service you wish to archive.
- Fill out the cell defining `container` and run the notebook
- ???
- Profit
###Caveats
- The `list_directory` method of libcloud is ridiculously slow; it does not support server-side filtering, even though S3 and Rackspace themselves do
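
One possible workaround, sketched here under the assumption that a full listing can be fetched once and reused, is to filter object names on the client. `filter_by_prefix` is a hypothetical helper, and the list of names stands in for whatever the listing call returns.

```python
def filter_by_prefix(object_names, prefix):
    """Yield only the names under `prefix`, emulating a server-side filter."""
    for name in object_names:
        if name.startswith(prefix):
            yield name


if __name__ == "__main__":
    # Placeholder listing; in practice this would come from one slow full listing.
    names = ["Files/aa11", "Parities/aa11.par2", "Files/bb22"]
    print(list(filter_by_prefix(names, "Files/")))
```

This does not make the listing itself any faster; it only avoids repeating it for every prefix.
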