Transmogrifier pipelines for the analysis and export of Plone content from the Oxfam America website.
Based on collective.transmogrifier and related tools.
This package provides a transmogrifier export pipeline that dumps the schema field values of every item in the Oxfam America website into a set of CSV files, organized by object type. Each type of object that has a schema will have its own CSV file. Each file will have a header row listing the exported fields in alphabetical order. Each row after the header will contain the values of those fields, if any, for a single object in the site. In addition to the schema fields, each row will give the object's path within the Plone site and its UID and UUID (if present).
For file fields (images, files, etc.), the value will be a simple string indicating that the field is a binary file field. No file data will be contained in the CSV export. File data will be exported separately (see below).
For reference fields, the type and UID of the referenced object(s) will be listed. This information can be used to look up the referenced object: find the row with the given UID value in the CSV file named for that type.
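As a rough illustration of that lookup, the sketch below scans one per-type CSV file for a UID. It assumes each file is named <type>.csv inside the export folder, that the UID column header is literally UID, and that the files are UTF-8 encoded; all three are assumptions to check against a real export, and find_by_uid is a hypothetical helper, not part of this package.

```python
import csv
import os


def find_by_uid(export_dir, type_name, uid):
    """Return the CSV row (as a dict) whose UID column matches, or None.

    Assumes one file per type named "<type_name>.csv" and a header
    column called "UID"; adjust both to match the real export.
    """
    csv_path = os.path.join(export_dir, type_name + ".csv")
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader keys each row by the header row's field names.
        for row in csv.DictReader(handle):
            if row.get("UID") == uid:
                return row
    return None
```

For example, find_by_uid("/absolute/path/to/folder", "Document", some_uid) would return the referenced Document's row, including its path.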
You can control which fields are output (see Controlling the Pipeline below).
All binary file data (images, media, .pdf, .doc, etc.) will be written out to a filesystem structure that mirrors the Plone site structure. Each file will be located at the same path where it is found in the site. The sole exception to this is if a content object has more than one binary file field, in which case a folder named for the Plone object will be created and each of the files will be written to that folder.
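The layout rule above can be modelled as a small path-mapping function. This is only a sketch: when an object has several binary fields, it assumes each file inside the object's folder is named after its field, which the package may not actually do. The function name file_export_paths is hypothetical.

```python
import posixpath


def file_export_paths(object_path, binary_fields):
    """Map each binary field name to a relative output path.

    One binary field: the file sits at the object's own path.
    Several: a folder named after the object holds one file per field.
    Naming each file after its field is a hypothetical choice.
    """
    relative = object_path.lstrip("/")
    if len(binary_fields) == 1:
        return {binary_fields[0]: relative}
    return {field: posixpath.join(relative, field) for field in binary_fields}
```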
As this package is unreleased, installing it requires the mr.developer buildout extension.
First, include the package in the [sources] section of your buildout. The source is in a private repository, so unless you have ssh authentication for git set up on the machine where you are working, you'll need to use HTTP authentication:
oxfama.transmogrifier = git https://github.com/oxfamamerica/oxfama.transmogrifier.git
Next, add the package name to the eggs setting of your buildout:
eggs =
    ...
    oxfama.transmogrifier
Once this is done, you can re-run buildout. You should also see collective.transmogrifier and quintagroup.transmogrifier installed as dependencies of this package.
This package does not provide an installable Generic Setup profile, so you should not expect to see it listed either in the Add-ons panel in Plone Site Setup or in the list of available add-ons when you add a new Plone site.
To verify that the package is properly installed, go to the ZMI (Zope Management Interface). Find and click on the portal_setup tool, then click on the Import tab. Look for the Oxfam America Site Dump profile in the drop-down list of available profiles or snapshots at the top.
To run the export dump, you'll need to use the portal_setup tool as described above:
- Go to the ZMI.
- Find and click on the portal_setup tool.
- Click on the Import tab.
- Find and select the Oxfam America Site Dump profile in the Select Profile or Snapshot drop-down.
- Find the Run transmogrifier pipeline step in the list of available import steps and select its checkbox.
- At the bottom of the page, clear the Include dependencies checkbox.
- Click the Import selected steps button.
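The same run can in principle be scripted. The sketch below calls runImportStepFromProfile on the GenericSetup setup tool with run_dependencies=False, which mirrors clearing the Include dependencies checkbox. The profile id and step id are placeholders: the real ids for the Oxfam America Site Dump profile and the Run transmogrifier pipeline step are not given here (GenericSetup profile ids passed to this method take the form profile-<package>:<profile name>). The helper name run_site_dump is hypothetical.

```python
def run_site_dump(portal, profile_id, step_id):
    """Run a single GenericSetup import step without its dependencies.

    Mirrors the ZMI steps above: one step selected, "Include
    dependencies" cleared. profile_id and step_id are placeholders;
    look up the real ids in portal_setup before using this.
    """
    setup_tool = portal.portal_setup
    # run_dependencies=False matches unticking "Include dependencies".
    return setup_tool.runImportStepFromProfile(
        profile_id, step_id, run_dependencies=False)
```

This would be called from a session with access to the Plone site, for instance a debug console started from your buildout's instance script; depending on what the pipeline needs, you may also have to set up the site and a security context first.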
The export will run for some time. You can see progress in the terminal if you are running the site in fg (foreground) mode.
After all items are dumped, the CSV files will be written to the destination provided in the pipeline configuration (see Controlling the Pipeline below).
Controlling the Pipeline
There are a number of settings you can control for the pipeline. These controls are available by editing the content_to_csv.cfg configuration file in this package. This file is located in oxfama/transmogrifier/export.
The file is organized into a series of sections, each introduced by its name in square brackets (for example, [source]).
The [transmogrifier] section provides the list of pipeline sections that make up the export pipeline, as well as the local_destination setting, which should be an absolute filesystem path to the folder where the CSV output files will be written:
[transmogrifier]
# configure pipeline and other required information
pipeline =
    source
    schemadumper
    writer
    logger
local_destination = /absolute/path/to/folder
It is vital to ensure that the user running the Plone process has write access to the destination folder. If no local destination is provided, the system will attempt to use /tmp.
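To sanity-check a configuration before a long run, a standalone Python 3 script can read the [transmogrifier] section, apply the /tmp fallback described above, and confirm the current user can write there. This is a sketch that models the documented behaviour; it is not the package's own code, and resolve_destination is a hypothetical name. Interpolation is disabled so that any % signs elsewhere in the file are read literally.

```python
import configparser
import os


def resolve_destination(cfg_path):
    """Return (pipeline_sections, destination) from a pipeline config.

    Falls back to /tmp when local_destination is missing or empty, as
    the package documentation describes, and raises if the current
    process cannot write to the resulting folder.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    section = parser["transmogrifier"]
    # Multi-line values arrive as newline-separated names.
    pipeline = section.get("pipeline", "").split()
    destination = section.get("local_destination", "").strip() or "/tmp"
    if not os.access(destination, os.W_OK):
        raise PermissionError("cannot write to %s" % destination)
    return pipeline, destination
```

Run it as the same user that runs Plone; otherwise the write check tells you about the wrong account.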
The [source] section provides settings for the SiteWalkerSection that is used to walk the object graph of the site and read its contents. You can give this section a starting path (the start-path setting) if you wish to focus on only one part of the website. This path should be absolute, starting from the root of the ZMI (/). The object identified by the path is itself included. The walk can be limited using the limit setting: at most this number of objects will be dumped. If the setting is omitted, or set to 0, all objects will be dumped:
[source]
blueprint = oxfama.transmogrifier.sitewalker
start-path = /oxfam
limit = 1000
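The start-path and limit semantics can be illustrated with a toy walker over a plain dictionary standing in for the site. It includes the start object, stops after limit objects, and treats 0 (or no value) as unlimited. The depth-first order is an assumption; the real SiteWalkerSection may visit objects in a different order, and walk is a hypothetical name.

```python
def walk(tree, start_path="/", limit=0):
    """Yield paths depth-first from start_path, itself included.

    tree maps each path to a list of child paths, a stand-in for the
    real Plone object graph. A limit of 0 or None means no limit.
    """
    count = 0
    stack = [start_path]
    while stack and (not limit or count < limit):
        path = stack.pop()
        yield path
        count += 1
        # Push children reversed so they are visited in listed order.
        stack.extend(reversed(tree.get(path, [])))
```

With limit = 1000 as in the example, at most 1000 objects under /oxfam (including /oxfam itself) would be produced.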
The [schemadumper] section controls how the schema of each object is converted into a CSV row. You may provide a list of fields to exclude. If a named field exists in an object's schema, it will be left out of the final CSV file for that type. This can be used to control the volume of data dumped:
[schemadumper]
blueprint = oxfama.transmogrifier.schemadumper
exclude =
    text
    description
    file
    image
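The effect of exclude, the alphabetical header, and the placeholder for binary fields can be sketched with the standard csv module. Everything here is illustrative: the marker string is hypothetical (the real one is not documented here), write_type_csv is not a function of this package, and sorted() orders names by code point, so uppercase names such as UID sort before lowercase ones, which may differ from the package's ordering.

```python
import csv
import io

# Hypothetical placeholder; the real marker string is not documented here.
BINARY_MARKER = "<binary file field>"


def write_type_csv(items, exclude, binary_fields=()):
    """Render items of one type as CSV text with a sorted header.

    items are dicts of field name -> value; excluded fields are dropped
    and binary fields are replaced by a marker instead of their data.
    """
    excluded = set(exclude)
    rows = []
    for item in items:
        row = {}
        for name, value in item.items():
            if name in excluded:
                continue
            row[name] = BINARY_MARKER if name in binary_fields else value
        rows.append(row)
    # Header is the union of all remaining field names, sorted.
    header = sorted(set().union(*rows))
    out = io.StringIO()
    # restval fills cells for fields an object does not have.
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```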
The [filewriter] section controls the writing of binary file data to the filesystem. The path setting (which defaults to the local_destination value set above) determines the root filesystem path on the server to which all files will be written. This folder must exist and be writable by the user running Plone. No default location will be set if this value is omitted. The context setting determines how the files will be exported. Leave it set to directory to have all files written to the filesystem.
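A minimal sketch of directory-style output follows, assuming the behaviour described above: the root folder must already exist, and subfolders below it are created to mirror the site structure. The guard against paths that escape the root is an extra safety check added for the sketch, and write_binary is a hypothetical name, not this package's writer.

```python
import os


def write_binary(root, object_path, data):
    """Write data under root at the object's site path; return the file path.

    root must already exist (see the path setting above); parent folders
    below it are created as needed to mirror the site structure.
    """
    root_abs = os.path.abspath(root)
    if not os.path.isdir(root_abs):
        raise FileNotFoundError("export root does not exist: %s" % root_abs)
    target = os.path.abspath(os.path.join(root_abs, object_path.lstrip("/")))
    # Refuse paths such as "/../x" that would land outside the export root.
    if os.path.commonpath([root_abs, target]) != root_abs:
        raise ValueError("path escapes export root: %s" % object_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(data)
    return target
```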
The [writer] section controls the writing of CSV files to the filesystem. There are no settings currently available for this section.
The [logger] section controls the writing of log output for each object as it passes through the pipeline. You can provide a list of the item keys that will be written to each log line.
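As an illustration of a key-selected log line, the sketch below formats only the requested keys of an item dict and logs the result. The example keys _path and _type follow the convention collective.transmogrifier sections commonly use for item dicts; the logger name, the key=value format, and log_item itself are hypothetical and need not match this package's output.

```python
import logging

# Hypothetical logger name chosen for the example.
logger = logging.getLogger("oxfama.transmogrifier.example")


def log_item(item, keys):
    """Log one line per pipeline item, showing only the chosen keys.

    Missing keys are shown as "-" so every line has the same shape.
    """
    line = " ".join("%s=%s" % (key, item.get(key, "-")) for key in keys)
    logger.info(line)
    return line
```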
To omit any section, simply remove its name from the pipeline setting in the [transmogrifier] section of the configuration file.
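Removing a name works because the sections are chained in the order listed: each section consumes the items produced by the one before it. The toy model below shows that chaining with plain generators. It is not the collective.transmogrifier API (there, each section is a class constructed with transmogrifier, name, options and previous, then iterated), and the section names source and stamper here are hypothetical.

```python
def source(previous):
    """Toy source: passes upstream items through, then adds its own."""
    for item in previous:
        yield item
    for path in ("/oxfam/a", "/oxfam/b"):
        yield {"_path": path}


def stamper(previous):
    """Toy middle section: adds a key to each item, then passes it on."""
    for item in previous:
        item["seen"] = True
        yield item


def build_pipeline(names, registry):
    """Chain sections in order; each wraps the iterator before it."""
    items = iter(())
    for name in names:
        items = registry[name](items)
    return items


REGISTRY = {"source": source, "stamper": stamper}
```

Dropping "stamper" from the list of names leaves the items untouched, just as removing logger from the pipeline setting would skip the logging step.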