Skip to content

upenndigitalscholarship/omeka-export

 
 

Repository files navigation

omeka-export

Forked from the omekadd repo by Caleb McDaniel, http://wcm1.web.rice.edu, this repo includes Python scripts designed for use with the Omeka API.

One adds new items to Omeka from the command line using simple YAML documents. The other converts the content returned by GET API requests into a CSV file; it can be used as a sort of backup for Omeka metadata.

The former script has not been edited.

This repo updates the latter script (converting API content into a CSV file) for Python 3 and contains additional code for downloading/renaming files attached to items based on their Omeka ID numbers into the 'files' folder contained in this repository.

omeka-export-one-file.py will only download and document one item per file, while omeka-export-multiple-files.py will download all the items per file.

Installation

Make sure you have downloaded and installed Omeka 2.x so you can use the API features.

Clone this repo:

git clone https://github.com/wcaleb/omekadd.git

Both the omekadd.py and omekacsv.py scripts import the OmekaClient class from omekaclient.py, which is included in the repo. The development repo for the Omeka Client is maintained by Jim Safley, who wrote the original client.

omekadd.py

The Omeka API (available in Omeka 2.1) makes it possible to add new items to a database from the commandline, but to make the API request you have to represent the item in JSON. That is, to add a new item, you would have to construct a string that looks something like this:

{"element_texts": [{"text": "Quinoa McDaniel", "element_set": {"id": 0}, "html": true, "element": {"id": 45}}, {"text": "<p>This item was added to this site using the <a href=\"https://omeka.readthedocs.org/en/latest/Reference/api/index.html\">Omeka API</a>\nand a script called <a href=\"https://github.com/wcaleb/omekadd\"><code>omekadd</code></a>,\nwhich transforms YAML documents into JSON strings suitable for using\nwith the API. As this Description demonstrates, the script can also\nconvert <a href=\"http://daringfireball.net/projects/markdown/\">Markdown</a> into\nHTML for easy formatting. The other text in this item was taken from\nHipster Ipsum, a random generator of Hipsterese.</p>", "element_set": {"id": 0}, "html": true, "element": {"id": 41}}, {"text": "Hipsterese", "element_set": {"id": 0}, "html": true, "element": {"id": 44}}, {"text": "Open", "element_set": {"id": 0}, "html": true, "element": {"id": 47}}, {"text": "<p>Single-origin <em>coffee</em> drinking vinegar Bushwick, Echo Park 90's\nHelvetica McSweeney's. Church-key American Apparel selvage sustainable.\nSustainable mixtape cornhole direct trade church-key Wes Anderson. 8-bit\nsalvia gentrify small batch shabby chic, single-origin coffee occupy\nmessenger bag Helvetica ennui forage wayfarers. Jean shorts 8-bit bitters,\nsustainable retro VHS kogi Tumblr actually photo booth twee Terry\nRichardson squid cred fap. Polaroid Shoreditch keytar, Blue Bottle\nskateboard bitters twee letterpress VHS pour-over Schlitz master cleanse\nfood truck. Wolf pug freegan Tumblr pork belly locavore, Carles distillery\n<strong>stumptown</strong> PBR 8-bit.</p>", "element_set": {"id": 0}, "html": true, "element": {"id": 1}}, {"text": "Plain text, of course", "element_set": {"id": 0}, "html": true, "element": {"id": 42}}, {"text": "ca. 2013", "element_set": {"id": 0}, "html": true, "element": {"id": 40}}, {"text": "<a href=\"http://hipsteripsum.me\">Hipster Ipsum</a>\n", "element_set": {"id": 0}, "html": true, "element": {"id": 48}}, {"text": "Sample item added with omekadd", "element_set": {"id": 0}, "html": true, "element": {"id": 50}}, {"text": "Text", "element_set": {"id": 0}, "html": true, "element": {"id": 51}}], "item_type": {"id": 1}, "featured": false, "public": false, "collection": {"id": null}}

This script makes adding items easier by allowing you to instead represent the new item with a well-formed YAML file, which looks more like this:

---
Title: Sample item added with omekadd
Source: Hipster Ipsum
Publisher: Quinoa McDaniel
Date: ca. 2013
Rights: Open
Format: Plain text, of course
Language: Hipsterese
Type: Text
Description: |
    markdown> This item was added to this site using the Omeka API
    and a script called [`omekadd`](https://github.com/wcaleb/omekadd),
    which transforms YAML documents into JSON strings suitable for using
    with the API. As this Description demonstrates, the script can also
    convert [Markdown](http://daringfireball.net/projects/markdown/) into
    HTML for easy formatting. The other text in this item was taken from
    Hipster Ipsum, a random generator of Hipsterese.
Text: |
    <p>Single-origin <em>coffee</em> drinking vinegar Bushwick, Echo Park 90's
    Helvetica McSweeney's. Church-key American Apparel selvage sustainable.
    Sustainable mixtape cornhole direct trade church-key Wes Anderson. 8-bit
    salvia gentrify small batch shabby chic, single-origin coffee occupy
    messenger bag Helvetica ennui forage wayfarers. Jean shorts 8-bit bitters,
    sustainable retro VHS kogi Tumblr actually photo booth twee Terry
    Richardson squid cred fap. Polaroid Shoreditch keytar, Blue Bottle
    skateboard bitters twee letterpress VHS pour-over Schlitz master cleanse
    food truck. Wolf pug freegan Tumblr pork belly locavore, Carles distillery
    <strong>stumptown</strong> PBR 8-bit.</p>

The script turns the YAML document into the JSON string and posts it to your Omeka database using the API endpoint specified at the top of the script.

Usage

Edit the first two lines of omekadd.py so that they contain your API endpoint and your API key. See the Omeka documentation for help.

To test, make sure that you've changed the permissions on omekadd.py to make it executable, and then try posting the sample.yml to your database:

cd omekadd
./omekadd.py sample.yaml

If it worked, read on to discover more features, like uploading files to attach to the item!

In a text editor, type up your new item as a YAML document. The YAML document must be well-formed. You can validate your YAML here.

The keys in your YAML (the words to the left of the colons) should correspond to the item element texts in the Omeka database. These are the things that you typically see when you fill out the webform in Omeka, like the names of the Dublin Core fields or the Item Type Metadata fields.

Next, run the script, supplying the name of your input YAML file immediately after the script name.

For example, this item in my Omeka database was added using the sample.yaml file in this repository and this command:

./omekadd.py sample.yaml

Alternatively, you can pipe your YAML to the script as stdin instead.

Note: The script will treat every field as though you had clicked "Use HTML" when filling out the webform. That means that so long as you properly escape the HTML, you can include HTML in your YAML file. The easiest way to do this is to use YAML's literal blocks, as shown in the sample file.

Options

By default, omekadd creates an item that is a private, non-featured Document without a collection. But you can change these defaults using the command-line arguments:

  -h, --help            show this help message and exit
  -p, --public          Make item public
  -f, --featured        Make item featured
  -c COLLECTION, --collection COLLECTION
                        Add item to collection COLLECTION
  -t TYPE, --type TYPE  Specify item type using Omeka id; default is 1, for
                        "Document"

Markdown Functionality

If you have the Python markdown module, you can also use Markdown in your YAML. To do this, first install the module:

pip install markdown

Then, in your YAML file, prefix any string that you would like to be converted from Markdown to HTML with a special prefix. By default, as shown in the sample.yaml file, this is set to the string markdown>. However, you can change the prefix mark on the commandline by using the --mdmark option.

Note: If you process a string with Markdown, the output HTML will be wrapped in <p> and </p> tags. That's a feature of Markdown, but it may clutter your item metadata with unwanted tags.

File Uploading

You can also upload a file to attach to your new item using this script. Just use the argument -u, --upload at the command line, followed by the properly escaped name of the file you want to upload. After creating your new Omeka item, the script will get the new item's ID and then upload the new file, associating it with the new item.

omekacsv.py

The second main script in the repo makes a GET request to an Omeka API endpoint and resource and outputs a CSV file of the returned data.

Before running it, edit the lines at the top of the script for your desired endpoint and resource. Then simply execute the script:

./omekacsv.py

The script will print output to your terminal about its progress, and then create, in the same directory, an output CSV whose filename will begin with the resource you chose. You can open this CSV in Microsoft Excel for viewing and editing.

By default, the outputted CSV will only create columns for which at least one row has data, and the file will not contain any empty cells. Instead, the string "None" is put in cells where there was no content returned by the Omeka request. If you wish to output empty cells in such cases, see the comment near the bottom of the script.

Issues

Right now these scripts are mainly proofs-of-concept, so I'm certain there are bugs to be worked out. Please help test them and don't be shy about reporting issues or forking and improving. I've mainly made the scripts to help in my workflow, so I'm sure there are more pythonic ways to do this, and I'd be grateful for tips.

About

Python scripts for interacting with Omeka API via YAML and CSV

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%