Bulk upload and metadata editing for the Internet Archive:
- Upload several new items to the Internet Archive with their metadata. Items are texts and images.
- Add or modify metadata on existing items.
> git clone https://github.com/SciencesPoDRIS/iamassaccess.git
> cd iamassaccess
> mkvirtualenv iamassaccess
> pip install -r requirements.txt
> cp server/conf/conf.default.json server/conf/conf.json
- Edit server/conf/conf.json and fill in your own access key. Once logged in to Archive.org, you can get it from: http://archive.org/account/s3.php
> python iamassaccess_cli.py MODE [--metadata METADATA] [--folder FOLDER]
- where "MODE" can be either create or update.
- where "METADATA" is the path to the metadata file. Please see below how to write it.
- where "FOLDER" is the path to the folder containing the items. Please see below how to build it.
- ❗ Caution ❗ : if you use MODE create, you don't have to use METADATA. The script will look for a file named metadata.csv in your folder.
> python iamassaccess_cli.py create --folder path/to/test/folder
> python iamassaccess_cli.py update --metadata path/to/metadata/file.csv
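The command line described above can be sketched with argparse. This is only an illustration of the documented interface, not the actual code of iamassaccess_cli.py: the function names build_parser and resolve_metadata_path are hypothetical, and the fallback to metadata.csv inside FOLDER simply mirrors the caution above.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented usage (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="iamassaccess_cli.py")
    parser.add_argument("mode", metavar="MODE", choices=["create", "update"])
    parser.add_argument("--metadata", help="path to the metadata CSV file")
    parser.add_argument("--folder", help="path to the folder containing the items")
    return parser


def resolve_metadata_path(args):
    """Return the metadata file implied by the arguments, or None if unknown."""
    if args.metadata:
        return args.metadata
    if args.mode == "create" and args.folder:
        # In create mode the script is documented to look for metadata.csv in FOLDER.
        return os.path.join(args.folder, "metadata.csv")
    return None
```

With these assumptions, parsing "create --folder path/to/test/folder" resolves to the metadata.csv file inside that folder, while "update --metadata path/to/metadata/file.csv" resolves to the given file.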
- The folder passed as the FOLDER argument on your command line. The folder's name can be whatever you want.
- One subfolder per item that you want to upload. The subfolder's name has to be the unique archive.org identifier (see below). You can have as many subfolders as you want.
- The files belonging to your item. You can have as many files as you want for this specific item. The formats accepted by archive.org are jpg, jpeg, jpeg2000 and pdf. The file's name can be whatever you want.
- Example with a PDF file.
- Same
- This is the file containing all the metadata. The file's name has to be metadata.csv. Please see below how to write it.
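Before uploading, the folder layout described above can be checked locally. The following is a minimal sketch, not part of the tool: the helper names list_items and check_folder are hypothetical, and the .jp2 extension for JPEG 2000 files is an assumption.

```python
import os

# Extensions for the formats listed above; ".jp2" for JPEG 2000 is an assumption.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".jp2", ".pdf"}


def list_items(folder):
    """Map each subfolder name (the item identifier) to its accepted files."""
    items = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isdir(path):
            continue  # metadata.csv and other top-level files are not items
        items[name] = sorted(
            f for f in os.listdir(path)
            if os.path.splitext(f)[1].lower() in ACCEPTED_EXTENSIONS
        )
    return items


def check_folder(folder):
    """Return a list of human-readable problems found in the folder layout."""
    problems = []
    if not os.path.isfile(os.path.join(folder, "metadata.csv")):
        problems.append("metadata.csv is missing")
    for identifier, files in list_items(folder).items():
        if not files:
            problems.append("no accepted file in subfolder %s" % identifier)
    return problems
```

An empty list returned by check_folder means the folder contains metadata.csv and every subfolder holds at least one file in an accepted format.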
- The metadata has to be a CSV file.
- Data are separated by commas (,). If your data contains a comma, it has to be surrounded by double quotes (").
- The first line has to be the list of the metadata keys (called headers).
- The metadata keys are not case sensitive.
- The metadata keys should not contain spaces or accents.
- The first column has to contain the identifiers of the Internet Archive items. See the notes about identifiers below.
- Warning: if several lines in the metadata file have the same identifier, only the last line will be taken into account.
- The metadata values are case sensitive.
- If a metadata value is multi-valued, the whole value has to be surrounded by double quotes: "value_01;value_02".
- For the "subject" metadata key, multiple values have to be separated by a semicolon (;).
- For the "date" metadata key, the values have to be formatted as YYYY, YYYY-MM or YYYY-MM-DD.
- To send an item into a specific collection, just add a "collection" column to your metadata file and specify the collection name.
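The metadata rules above can be illustrated with Python's standard csv module. This is a sketch written for Python 3, not the tool's own parser: the function names write_metadata and read_metadata are hypothetical, and quoting every field is a simplification that satisfies the quoting rules for values containing commas or semicolons.

```python
import csv
import io
import re

# YYYY, YYYY-MM or YYYY-MM-DD, as required for the "date" key.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def write_metadata(rows, headers):
    """Serialise rows (dicts) to CSV text, quoting every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        writer.writerow([row.get(header, "") for header in headers])
    return buffer.getvalue()


def read_metadata(text):
    """Parse CSV text into {identifier: row}; a repeated identifier keeps its last row."""
    reader = csv.reader(io.StringIO(text))
    headers = [h.strip().lower() for h in next(reader)]  # keys are not case sensitive
    items = {}
    for values in reader:
        if not values:
            continue  # skip blank lines
        row = dict(zip(headers, values))
        date = row.get("date", "")
        if date and not DATE_PATTERN.fullmatch(date):
            raise ValueError("badly formatted date: %r" % date)
        items[values[0]] = row  # the first column is the identifier
    return items
```

For example, a row with the title "Letters, 1900" and the subject "history;letters" is written with both values in double quotes, and reading it back gives a subject that can be split on the semicolon.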
> python server/server.py
The server will then be available at http://localhost:5000/ (the Flask default).
> cd front
> python -m SimpleHTTPServer
The site will then be available at http://localhost:8000.
http://internetarchive.readthedocs.io/en/latest/
https://blog.archive.org/2013/07/04/metadata-api/
- An Archive identifier has to be UNIQUE across the whole Internet Archive (strange but true)!
- Archive identifiers are case sensitive.
- You can't name your Archive identifier 'idX' where 'X' is an integer.
- On the Internet Archive, once you have created at least 50 items, you can have a collection for them. Send your request to info at archive dot org and the Internet Archive staff will create it for you.
- An identifier is composed of any unique combination of alphanumeric characters, underscore (_) and dash (-).
- While there are no official limits, it is strongly suggested that identifiers be between 5 and 80 characters in length.
- It seems that in "create" mode the metadata are not sent correctly. Only an update is reliable. So please send the pictures in "create" mode first, then send the metadata in "update" mode.
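The identifier rules listed above can be checked locally before uploading. The sketch below is not part of the tool: the function name identifier_problems is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and uniqueness across the Internet Archive cannot be verified offline, so it is not covered here.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
ALLOWED_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
# Identifiers of the form idX, where X is an integer, are not allowed.
RESERVED_PATTERN = re.compile(r"id[0-9]+")


def identifier_problems(identifier):
    """Return the list of documented rules that the identifier breaks."""
    problems = []
    if not ALLOWED_PATTERN.fullmatch(identifier):
        problems.append("only alphanumeric characters, underscore and dash are allowed")
    if RESERVED_PATTERN.fullmatch(identifier):
        problems.append("names of the form idX (X an integer) are not allowed")
    if not 5 <= len(identifier) <= 80:
        problems.append("length should be between 5 and 80 characters")
    return problems
```

Because identifiers are case sensitive, the check leaves the case untouched; an empty list means the identifier passes the local checks only.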