Copyright: © 2014 Computer- and Media Service, Humboldt-Universität zu Berlin License: Apache Licence 2.0 Status: Draft Version: 1.0 Authors: Dennis Zielke, Computer- and Media Service, Humboldt-Universität zu Berlin, Tino Schernickau Computer- and Media Service, Humboldt-Universität zu Berlin
The concept of this repository infrastructure software for a particular sub-discipline of linguistics (historical corpus linguistics) is modular. The modules of the repository correspond to the application features presentation, storage, indexing and retrieval, as well as management of Persistent Identifier (PID).
The LAUDATIO-Repository is available here: http://www.laudatio-repository.org
Version 3.6
Installation
apt-get install apache2
apt-get install php5, php5-mysql, php5-gd, php5-curl, php5-xsl
apt-get install mysql-server
apt-get install tomcat6
apt-get install imagemagick
Requirements
Download Java JDK 6
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u27-download-440405.html
Installing Java JDK 6
Changing Enviroment Variables FEDORA_HOME - C:fedora PATH - %FEDORA_HOME%serverbin;%FEDORA_HOME%clientbin;%JAVA_HOME%bin
Firewall settings
ufw default deny
ufw allow ssh
ufw status
ufw allow proto tcp from any to any port 80
ufw allow proto tcp from any to any port 8080
ufw enable
Please follow the instructions of custom installation ...
MySQL database
Please note that the MySQL JDBC driver provided by the installer requires MySQL v3.23.x or higher.
The MySQL commands listed below can be run within the mysql program, which may be invoked as follows:
mysql -u root -p
Create the database. For example, to create a database named “fedora3”, enter:
CREATE DATABASE fedora3;
Set username, password and permissions for the database. For example, to set the permissions for user fedoraAdmin with password fedoraAdmin on database “fedora3”, enter:
GRANT ALL ON fedora3.* TO XXX@localhost IDENTIFIED BY ‘XXX’;
GRANT ALL ON fedora3.* TO XXX@’%’ IDENTIFIED BY ‘XXX’;
REST-API
We use fedoras REST API via CURL requests to store and access the teiHeaders and some other Files.
Additionally fedora can be administrated via the Fedora Web Administrator e.g. at http://localhost:8080/fedora/admin/.
Further information you can find here:
https://wiki.duraspace.org/display/FEDORA34/Fedora+Web+Administrator
Further details e.g. to fedora data model or which files we are stored in fedora can you find at here: http://www.laudatio-repository.org/repository/technical-documentation/software/fedora.html#files
We are using CakePHP, a PHP MVC Framework, in Version 2.4.5 For installation details see: http://book.cakephp.org/2.0/en/installation.html
Model/Controller
XMLObject
This is the main model of the laudatio-repository website. It represents the teiHeader by providing methods to manage the data stored in Fedora Commons.
User
User, Group, LDAP, CorpusCreator Controllers and Models are used to manage access rights to modify, view or delete corpora.
Configuration
The Configuration Controller provides methods for the repository administrators. That includes the configuration of the used scheme for uploading corpora, the view configuration, a form to upload new schemes, buttons to reindex all corpora in elasticsearch and to update all handle pids.
Version 0.9
Installation details can be found here:
API
The ElasticSearch API is online available at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
Testing with ElasticSearch HEAD
ElasticSearch HEAD tool is online available at https://github.com/mobz/elasticsearch-head. It’s a web front end for browsing and interacting with ElasticSearch.
Indexing
The teiHeaders are xml files, but ElasticSearch uses JSON as data format. We use Java Bridge, a tomcat webapp, to convert the xml files to json.
The Java Bridge uses the XSLT Processor `SAXON http://saxon.sourceforge.net/’_ to convert the xml files. The xsl-file is located at /var/www/xsltjson/conf/.
Further details to indexing and mapping find here:
http://www.laudatio-repository.org/repository/technical-documentation/software/elasticsearch.html
Actually, we use the mapping Version 7 (https://github.com/DZielke/LAUDATIO-IndexMapping/tree/master/Schema7) to map all fields to the string type but the field that are used for range filters and use not analyzed multifields for the facets.
API
We are using the API provided by the EUROPEAN Persistent Identifier Consortium (EPIC) in Version 2.
Download and Installation details are here:
https://github.com/CatchPlus/EPIC-API-v2/wiki/Core-API
Handles are created via POST requests to http://pid.gwdg.de/handles/11022. The post data has the following structure: [{"type":"URL","parsed_data":"'.$objectURL.'","encoding":"xml"}]
Handle updates require a PUT request to http://pid.gwdg.de/handles/11022/”.$handle with the following data: '[{"type":"URL","parsed_data":"'.$objectURL.'"}]'
To handle deletion a DELETE request to http://pid.gwdg.de/handles/11022/”.$handlePID is sufficient.