This repository has been archived by the owner on Dec 5, 2018. It is now read-only.

chrisseto/Archiver


# Archiver

### Workflow

  • The server sends a properly formatted JSON request to the foreman
  • The foreman parses the JSON
  • The job is passed to RabbitMQ
  • A 201 (Created) response containing the container ID is sent back to the server
  • A Celery worker begins the archival process
  • The project is split further into smaller chunks
  • On completion, a callback fires and the Celery worker notifies the foreman
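The workflow above can be sketched in plain Python. This is not Archiver's code: an in-process `queue.Queue` stands in for RabbitMQ, a plain function stands in for the Celery worker, and the JSON fields `container_id` and `services` are hypothetical, since the request schema is not documented here.

```python
import json
import queue

# Stand-in for RabbitMQ (assumption): an in-process FIFO queue.
job_queue: "queue.Queue[dict]" = queue.Queue()
# Container IDs the foreman has been notified about via the callback.
completed: list = []


def foreman_receive(raw_body: str) -> tuple:
    """Foreman: parse the server's JSON, enqueue a job, reply 201 with the container ID."""
    payload = json.loads(raw_body)  # raises ValueError on malformed JSON
    container_id = payload["container_id"]  # hypothetical field name
    job_queue.put({"container_id": container_id, "payload": payload})
    return 201, {"container_id": container_id}


def foreman_callback(container_id: str) -> None:
    """Foreman: record that the worker finished archiving a container."""
    completed.append(container_id)


def worker_run_once() -> list:
    """Stand-in for a Celery worker: take one job, chunk it, then notify the foreman."""
    job = job_queue.get()
    # Split the project into smaller chunks; one chunk per service is an assumption.
    chunks = [(job["container_id"], service) for service in job["payload"].get("services", [])]
    # ... each chunk would be archived here ...
    foreman_callback(job["container_id"])
    job_queue.task_done()
    return chunks
```

In the real system the worker runs in a separate process and the callback travels over HTTP; the sketch keeps everything in one process so the order of the steps is easy to follow.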

### Vocabulary

  • Container
    • Any OSF project that is not a registration
  • Registration
    • A "frozen" (read-only) OSF project
  • Foreman
    • The controlling web application
  • Worker
    • A Celery worker
  • Service
    • An arbitrary third-party service (e.g. GitHub, S3, figshare, Dropbox)

### External-Facing API

  • /api/v1/archives/
    • POST
      • Begins the archival process described by the posted JSON; responds with 201 (Created) and the container ID
    • GET
      • Returns a list of all archives:
        {
            "containers": {
                ...
            }
        }
        
  • /api/v1/archives/callbacks
    • POST
      • Listed because this route is externally reachable, but it is for internal use only
  • /api/v1/archives/<CID>/
    • GET
      • Returns all metadata for the container with ID <CID>
  • /api/v1/archives/<CID>/files/
    • GET
      • Returns a list of the files in container <CID>
  • /api/v1/archives/<CID>/files/<FID>
    • GET
      • Returns the file <FID>, either as a redirect or as a direct download
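A minimal client-side sketch of the routes above, using only the standard library. `BASE_URL` is a hypothetical address for a local foreman, and the POST payload is left to the caller because its schema is not documented in this README. The functions only build `urllib.request.Request` objects; nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical foreman address
API_ROOT = f"{BASE_URL}/api/v1/archives/"


def start_archive(payload: dict) -> Request:
    """POST /api/v1/archives/ with the JSON describing what to archive."""
    return Request(
        API_ROOT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_archives() -> Request:
    """GET /api/v1/archives/"""
    return Request(API_ROOT)


def container_metadata(cid: str) -> Request:
    """GET /api/v1/archives/<CID>/"""
    return Request(f"{API_ROOT}{quote(cid, safe='')}/")


def container_files(cid: str) -> Request:
    """GET /api/v1/archives/<CID>/files/"""
    return Request(f"{API_ROOT}{quote(cid, safe='')}/files/")


def container_file(cid: str, fid: str) -> Request:
    """GET /api/v1/archives/<CID>/files/<FID>; the server may answer with a redirect."""
    return Request(f"{API_ROOT}{quote(cid, safe='')}/files/{quote(fid, safe='')}")
```

`urlopen` follows redirects by default, so the file route behaves the same from the caller's side whether the foreman redirects or streams the file directly.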

### Registration structure

A registration will have the following directory structure (subject to change):

Directory Structures/
    {See Below}
File Metadata/
    {some sha256}.json
    {some sha256}.par2.json
Files/
    {some sha256}
Manifests/
    {some container id}.manifest.json
    {some container id}.{some 3rd party service}.manifest.json
Parities/
    {some sha256}.par2
    {some sha256}.vol00+xxx.par2
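A minimal sketch of how one archived file and one container map onto this layout. It assumes (the README does not say so) that `{some sha256}` is the SHA-256 hex digest of the file's contents; the function names are hypothetical, and the numbered `vol00+xxx` parity volumes are left out because their count depends on the par2 settings.

```python
import hashlib


def file_paths(content: bytes) -> dict:
    """Paths for one file, keyed by an assumed SHA-256 digest of its contents."""
    digest = hashlib.sha256(content).hexdigest()
    return {
        "file": f"Files/{digest}",
        "metadata": f"File Metadata/{digest}.json",
        "parity_metadata": f"File Metadata/{digest}.par2.json",
        "parity": f"Parities/{digest}.par2",
    }


def manifest_paths(container_id: str, services: list) -> list:
    """One container-wide manifest plus one manifest per third-party service."""
    paths = [f"Manifests/{container_id}.manifest.json"]
    paths += [f"Manifests/{container_id}.{service}.manifest.json" for service in services]
    return paths
```

If the digest really is taken over file contents, identical files coming from different services collapse into a single object under Files/.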

The Directory Structures directory is laid out as follows:

{some container id}/
    manifest.json
    children/
        {child id}/
            {container}
    github/
        {repo name}/
            {repo contents}
    s3/
        {bucket name}/
            {bucket contents}
    figshare/
        {id}/
            {figshare contents}
    dropbox/
        {folder}/
            {folder contents}
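The nesting can be sketched as a recursive walk. The `Container` dataclass and `layout` function are hypothetical models for illustration, not Archiver's own types; each child container repeats the full layout under `children/{child id}/`.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical in-memory model of a container."""
    id: str
    # service ("github", "s3", "figshare", "dropbox") -> {top-level name: [relative file paths]}
    services: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def layout(container: Container, prefix: str = "Directory Structures") -> list:
    """List every path the container contributes under Directory Structures/."""
    root = f"{prefix}/{container.id}"
    paths = [f"{root}/manifest.json"]
    for service, entries in sorted(container.services.items()):
        for name, files in sorted(entries.items()):
            paths.extend(f"{root}/{service}/{name}/{f}" for f in files)
    for child in container.children:
        # A child gets the same layout, nested under children/<child id>/.
        paths.extend(layout(child, prefix=f"{root}/children"))
    return paths
```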

### Setting up Archiver to run locally

  • Rename the example configuration: mv group_vars/archiver.example group_vars/archiver
  • Fill out group_vars/archiver with your settings
    • At minimum, your S3 keys and bucket name
  1. Change directory to vagrant
  2. vagrant up
  3. invoke provision
  4. cd ..
  5. invoke notebook
  6. Obtain an API key for whichever service you wish to archive
  7. Fill out the notebook cell that defines container, then run the notebook
  8. ???
  9. profit

### Caveats

  • The list_directory method of libcloud is ridiculously slow because it does not support server-side filtering, which S3 and Rackspace otherwise provide
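To illustrate why the missing server-side filtering hurts: object stores expose a flat key namespace, so listing one "directory" without server-side prefix/delimiter support means fetching every key and filtering locally. The helper below is hypothetical (it is not libcloud's API) and shows that client-side fallback; S3's object listing, by contrast, accepts `Prefix` and `Delimiter` parameters so the server returns only the matching keys.

```python
def list_directory_client_side(keys: list, prefix: str) -> list:
    """Return the immediate children of `prefix` from a flat list of object keys.

    Every key must be scanned, so the cost grows with the size of the whole
    bucket rather than the size of the directory being listed.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    children = set()
    for key in keys:  # O(total keys): the expensive part
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the key is the directory marker itself
        head, sep, _ = rest.partition("/")
        children.add(head + sep)  # "name/" for subdirectories, "name" for files
    return sorted(children)
```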