VIVO2NotLD (not Linked Data)


VIVO2NotLD provides tools to convert RDF that conforms to the VIVO-ISF Ontology into a simplified form encoded in JSON, XML, YAML, or other formats. VIVO2NotLD can be used to create a simplified form of individuals (e.g., a person) or of lists (e.g., a list of people with positions in a specified department).

The goal of providing the simplified form is to make VIVO data more readily consumable by other applications. In particular, it lowers the barrier to re-use of VIVO data by:

structuring the data in a less complex way that does not require understanding the VIVO-ISF ontology.
not requiring an understanding of RDF, SPARQL, or other semantic web technologies.

Try it at http://gw-orcid2vivo.wrlc.org/vivo2notld

Commandline

python vivo2notld.py -h
usage: vivo2notld.py [-h] [--format {json,yaml,xml,nt,pretty-xml,trix}]
                     [--indent INDENT] [--file FILE] [--debug]
                     [--offset OFFSET] [--limit LIMIT]
                     {person_summary,person,document_summary,organization_summary,person_summary_with_positions_in}
                     subject_namespace subject_identifier endpoint username
                     password

positional arguments:
  {person_summary,person,document_summary,organization_summary,person_summary_with_positions_in}
  subject_namespace     For example, http://vivo.gwu.edu/individual/
  subject_identifier    For example, n115
  endpoint              Endpoint for SPARQL Query of VIVO instance,e.g.,
                        http://localhost/vivo/api/sparqlQuery.
  username              Username for VIVO root.
  password              Password for VIVO root.

optional arguments:
  -h, --help            show this help message and exit
  --format {json,yaml,xml,nt,pretty-xml,trix}
                        The format for serializing. Default is json.
  --indent INDENT       Number of spaces to use for indents.
  --file FILE           Filepath to which to serialize.
  --debug               Also output the query, result graph, and python
                        object.
  --offset OFFSET       Offset for lists.
  --limit LIMIT         Limit for lists.

For example:

python vivo2notld.py person http://vivo.gwu.edu/individual/ n115 http://192.168.99.100:8080/vivo/api/sparqlQuery vivo_root@gwu.edu password --debug --file=test.json
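The `--format json` and `--indent` options amount to serializing the simplified structure as indented JSON. A minimal sketch using Python's standard `json` module; `serialize_json` and `write_json` are hypothetical helpers (the latter a stand-in for `--file`), not vivo2notld's own functions:

```python
import json


def serialize_json(obj, indent=4):
    """Serialize a simplified structure to a JSON string.

    ensure_ascii=False keeps non-ASCII characters (e.g., accented names)
    readable instead of escaping them.
    """
    return json.dumps(obj, indent=indent, ensure_ascii=False)


def write_json(obj, path, indent=4):
    """Write the JSON serialization to a file (hypothetical stand-in for --file)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(serialize_json(obj, indent))


person = {
    "name": "Littman, Justin ",
    "uri": "http://vivo.gwu.edu/individual/n115",
    "researchArea": ["Philosophy", "Economics"],
    "type": "Librarian",
}
text = serialize_json(person, indent=2)
print(text)
```

On Python 3, `json.dumps` with `indent` set uses `,` as the item separator, so it does not emit the trailing space after each comma that appears in the JSON samples in this README.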

Web application

python vivo2notld_service.py -h
usage: vivo2notld_service.py [-h] [--debug] [--port PORT]
                             [--format {xml,yaml,json}] [--endpoint ENDPOINT]
                             [--username USERNAME] [--password PASSWORD]
                             [--namespace NAMESPACE] [--identifier IDENTIFIER]
                             [--list-namespace LIST_NAMESPACE]
                             [--list-identifier LIST_IDENTIFIER]
                             [--definition {person_summary,person,document_summary,organization_summary}]
                             [--list-definition {person_summary_with_positions_in}]
                             [--limit LIMIT] [--offset OFFSET]

optional arguments:
  -h, --help            show this help message and exit
  --debug
  --port PORT           The port the service should run on. Default is 5000.
  --format {xml,yaml,json}
                        The format for serializing. Default is json.
  --endpoint ENDPOINT   Endpoint for SPARQL Query of VIVO instance,e.g.,
                        http://localhost/vivo/api/sparqlUpdate.
  --username USERNAME   Username for VIVO root.
  --password PASSWORD   Password for VIVO root.
  --namespace NAMESPACE
                        Namespace for the subject. Default is
                        http://vivo.mydomain.edu/individual/.
  --identifier IDENTIFIER
                        Identifier for the subject, e.g., n123.
  --list-namespace LIST_NAMESPACE
                        Namespace for the list subject. Default is
                        http://vivo.mydomain.edu/individual/.
  --list-identifier LIST_IDENTIFIER
                        Identifier for the list subject, e.g., n123.
  --definition {person_summary,person,document_summary,organization_summary}
                        Default is person.
  --list-definition {person_summary_with_positions_in}
                        Default is person_summary_with_positions_in.
  --limit LIMIT         List limit.
  --offset OFFSET       List offset.

For example, to start:

python vivo2notld_service.py

or:

python vivo2notld_service.py --format json --endpoint http://192.168.99.100:8080/vivo/api/sparqlQuery --username vivo_root@gwu.edu --password password --namespace http://vivo.gwu.edu/individual/ --identifier n115 --list-namespace http://vivo.gwu.edu/individual/ --list-identifier n336 --limit 10 --debug

The web form will now be available at http://localhost:5000/.

Invoke using curl

For an individual:

curl --data "definition=person&subject_namespace=http://vivo.gwu.edu/individual/&subject_identifier=n115&format=json&definition_type=individual" http://localhost:5000/

For a list:

curl --data "definition=person_summary_with_positions_in&list_subject_namespace=http://vivo.mydomain.edu/individual/&list_subject_identifier=n3360&format=json&definition_type=list&is_limited=true&limit=10&is_offset=true&offset=2" http://localhost:5000/
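The same requests can be made from Python. A minimal sketch using only the standard library; it assumes the service is running on the default port 5000 and reuses the form field names from the curl examples above. `build_request` and `send` are hypothetical helpers:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SERVICE_URL = "http://localhost:5000/"  # assumes the service's default port


def build_request(params, url=SERVICE_URL):
    """Build a POST request carrying form fields, like curl --data."""
    body = urlencode(params).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(req):
    """Send the request and return the serialized response body."""
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


individual_req = build_request({
    "definition": "person",
    "subject_namespace": "http://vivo.gwu.edu/individual/",
    "subject_identifier": "n115",
    "format": "json",
    "definition_type": "individual",
})

list_req = build_request({
    "definition": "person_summary_with_positions_in",
    "list_subject_namespace": "http://vivo.mydomain.edu/individual/",
    "list_subject_identifier": "n3360",
    "format": "json",
    "definition_type": "list",
    "is_limited": "true",
    "limit": "10",
    "is_offset": "true",
    "offset": "2",
})

# With the service running: print(send(individual_req))
```

Unlike `curl --data`, `urlencode` percent-encodes the namespace URLs (e.g., `:` becomes `%3A`); a standard form parser decodes both to the same values.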

Tests

python -m unittest discover

Transformation process

For individuals

Note that individuals can be anything, e.g., a person, document, organization, etc.

Specify a definition that describes the simplified data structure and maps to SPARQL clauses.
Generate a SPARQL CONSTRUCT query from the definition.
Execute the SPARQL CONSTRUCT query against a VIVO SPARQL API endpoint.
Transform the resulting RDF graph into a simplified (python) data structure.
Serialize the simplified data structure to JSON, XML, YAML, etc.
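Step 2 can be illustrated with a deliberately simplified generator. This is a sketch, not vivo2notld's implementation: it handles only flat fields (no nested `definition`), numbers variables in field order, does not add the `vitro:mostSpecificType` type field, and, like the queries shown in this README, omits PREFIX declarations. As the definition comments describe, only `?subj` and `?obj` are renamed; other variables (such as `?ra`) are kept as written.

```python
import re
from itertools import count


def rename(where, subj, obj=None):
    """Replace ?subj (and ?obj) with generated variable names.

    Returns one stripped triple pattern per line.
    """
    where = re.sub(r"\?subj\b", subj, where)
    if obj is not None:
        where = re.sub(r"\?obj\b", obj, where)
    return [line.strip() for line in where.splitlines() if line.strip()]


def build_construct(definition, subject_uri):
    """Sketch: build a CONSTRUCT query for a flat (non-nested) definition."""
    var_ids = count(1)
    subj = "?v0"
    construct = []
    required = rename(definition["where"], subj)
    optional = []
    for name, field in definition["fields"].items():
        obj = "?v%d" % next(var_ids)
        construct.append("%s :%s %s ." % (subj, name, obj))
        if field.get("list"):
            # Marker triple telling the transformer to always emit a list.
            construct.append("%s :%s rdf:List ." % (subj, name))
        patterns = rename(field["where"], subj, obj)
        if field.get("optional"):
            optional.append(patterns)  # each optional field gets its own OPTIONAL block
        else:
            required.extend(patterns)

    lines = ["CONSTRUCT", "{"] + ["    " + t for t in construct] + ["}"]
    lines += ["WHERE", "{", "    BIND (<%s> AS %s)" % (subject_uri, subj), "    {"]
    lines += ["        " + p for p in required] + ["    }"]
    for patterns in optional:
        lines += ["    OPTIONAL", "    {"] + ["        " + p for p in patterns] + ["    }"]
    lines.append("}")
    return "\n".join(lines)


flat_definition = {
    "where": "?subj a foaf:Person .",
    "fields": {
        "name": {"where": "?subj rdfs:label ?obj ."},
        "researchArea": {
            "where": """
                ?subj vivo:hasResearchArea ?ra .
                ?ra rdfs:label ?obj .
            """,
            "optional": True,
            "list": True,
        },
    },
}
query = build_construct(flat_definition, "http://vivo.gwu.edu/individual/n115")
print(query)
```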

Example definition:

person_definition = {
    # ?subj is the subject of the definition. When the definition is processed,
    # it will be bound (with BIND) to a URI.
    # "where" contains clauses that should be included in the WHERE.
    "where": "?subj a foaf:Person .",
    # Fields for the subject.
    "fields": {
        "name": {
            # WHERE clauses for the field.
            # Both ?subj and ?obj will be replaced with unique variable names.
            "where": "?subj rdfs:label ?obj ."
        },
        "researchArea": {
            # Multiple clauses can be provided.
            "where": """
                        ?subj vivo:hasResearchArea ?ra .
                        ?ra rdfs:label ?obj .
                     """,
            # If optional, the where clauses will be wrapped in OPTIONAL.
            "optional": True,
            # Indicates that the value should be a list, even if there is only a single value.
            "list": True
        },
        "geographicFocus": {
            "where": """
                        ?subj vivo:geographicFocus ?gf .
                        ?gf rdfs:label ?obj .
                     """,
            "optional": True,
            "list": True
        },
        "publications": {
            # A where that traverses multiple relationships.
            "where": """
                        ?subj vivo:relatedBy ?aship .
                        ?aship a vivo:Authorship .
                        ?aship vivo:relates ?obj .
                     """,
            # "definition" specifies a definition for a child subject.
            # document_summary_definition is defined elsewhere and must
            # exist before this dict is evaluated.
            "definition": document_summary_definition,
            "optional": True,
            "list": True
        }
    }
}

results in the following SPARQL CONSTRUCT query (omitting namespace declarations):

CONSTRUCT
{
    ?v0 :type ?v1 .
    ?v0 :researchArea ?v2 .
    ?v0 :researchArea rdf:List .
    ?v0 :geographicFocus ?v3 .
    ?v0 :geographicFocus rdf:List .
    ?v0 :name ?v4 .
    ?v0 :publications ?v5 .
    ?v0 :publications rdf:List .
    ?v5 :type ?v6 .
    ?v5 :journal ?v7 .
    ?v7 :type ?v8 .
    ?v7 :title ?v9 .
    ?v5 :issue ?v10 .
    ?v5 :title ?v11 .
}
WHERE
{
    BIND ((subj-ns:n115)  AS ?v0 )
    {
        ?v0 a foaf:Person .
        ?v0 vitro:mostSpecificType ?v1 .
        ?v0 rdfs:label ?v4 .
    }
    OPTIONAL
    {
        ?v0 vivo:hasResearchArea ?ra .
        ?ra rdfs:label ?v2 .
    }
    OPTIONAL
    {
        ?v0 vivo:geographicFocus ?gf .
        ?gf rdfs:label ?v3 .
    }
    OPTIONAL
    {
        ?v0 vivo:relatedBy ?aship .
        ?aship a vivo:Authorship .
        ?aship vivo:relates ?v5 .
        ?v5 vitro:mostSpecificType ?v6 .
        ?v5 a bibo:Document .
        ?v5 rdfs:label ?v11 .
        OPTIONAL
        {
            ?v5 vivo:hasPublicationVenue ?v7 .
            ?v7 vitro:mostSpecificType ?v8 .
            ?v7 a bibo:Journal .
            ?v7 rdfs:label ?v9 .
        }
        OPTIONAL
        {
            ?v5 bibo:issue ?v10 .
        }
    }
}

The query, when executed against a test data source, returns the following graph (omitting namespaces):

subj-ns:n115 :geographicFocus rdf:List,
        "New Jersey"@en ;
    :name "Littman, Justin "^^xsd:string ;
    :publications subj-ns:n6493,
        subj-ns:n7738,
        subj-ns:n886,
        rdf:List ;
    :researchArea rdf:List,
        "Economics"^^xsd:string,
        "Philosophy"^^xsd:string ;
    :type vivo:Librarian .

subj-ns:n5080 :title "Bar Journal"^^xsd:string ;
    :type bibo:Journal .

subj-ns:n5258 :title "Foo Journal"^^xsd:string ;
    :type bibo:Journal .

subj-ns:n6493 :journal subj-ns:n5080 ;
    :title "My Second Academic Article"^^xsd:string ;
    :type bibo:AcademicArticle .

subj-ns:n7738 :issue "3"^^xsd:string ;
    :journal subj-ns:n5258 ;
    :title "My First Academic Article"^^xsd:string ;
    :type bibo:AcademicArticle .

subj-ns:n886 :title "My blog posting"^^xsd:string ;
    :type vivo:BlogPosting .

and transformed to the simplified data structure:

{u'researchArea': [u'Philosophy', u'Economics'],
 u'name': u'Littman, Justin ',
 'uri': 'http://vivo.gwu.edu/individual/n115',
 u'publications': [{u'type': u'BlogPosting',
                    'uri': 'http://vivo.gwu.edu/individual/n886',
                    u'title': u'My blog posting'},
                   {u'journal': {u'type': u'',
                                 'uri': 'http://vivo.gwu.edu/individual/n5080',
                                 u'title': u'Bar Journal'},
                    u'type': u'',
                    'uri': 'http://vivo.gwu.edu/individual/n6493',
                    u'title': u'My Second Academic Article'},
                   {u'issue': u'3',
                    u'journal': {u'type': u'',
                                 'uri': 'http://vivo.gwu.edu/individual/n5258',
                                 u'title': u'Foo Journal'},
                    u'type': u'',
                    'uri': 'http://vivo.gwu.edu/individual/n7738',
                    u'title': u'My First Academic Article'}],
 u'geographicFocus': [u'New Jersey'],
 u'type': u'Librarian'}

and serialized to JSON:

{
    "researchArea": [
        "Philosophy", 
        "Economics"
    ], 
    "name": "Littman, Justin ", 
    "uri": "http://vivo.gwu.edu/individual/n115", 
    "publications": [
        {
            "type": "BlogPosting", 
            "uri": "http://vivo.gwu.edu/individual/n886", 
            "title": "My blog posting"
        }, 
        {
            "journal": {
                "type": "", 
                "uri": "http://vivo.gwu.edu/individual/n5080", 
                "title": "Bar Journal"
            }, 
            "type": "", 
            "uri": "http://vivo.gwu.edu/individual/n6493", 
            "title": "My Second Academic Article"
        }, 
        {
            "issue": "3", 
            "journal": {
                "type": "", 
                "uri": "http://vivo.gwu.edu/individual/n5258", 
                "title": "Foo Journal"
            }, 
            "type": "", 
            "uri": "http://vivo.gwu.edu/individual/n7738", 
            "title": "My First Academic Article"
        }
    ], 
    "geographicFocus": [
        "New Jersey"
    ], 
    "type": "Librarian"
}

or YAML:

geographicFocus: [New Jersey]
name: 'Littman, Justin '
publications:
-   journal: {title: Bar Journal, type: '', uri: 'http://vivo.gwu.edu/individual/n5080'}
    title: My Second Academic Article
    type: ''
    uri: http://vivo.gwu.edu/individual/n6493
- {title: My blog posting, type: BlogPosting, uri: 'http://vivo.gwu.edu/individual/n886'}
-   issue: '3'
    journal: {title: Foo Journal, type: '', uri: 'http://vivo.gwu.edu/individual/n5258'}
    title: My First Academic Article
    type: ''
    uri: http://vivo.gwu.edu/individual/n7738
researchArea: [Economics, Philosophy]
type: Librarian
uri: http://vivo.gwu.edu/individual/n115

or XML:

<?xml version="1.0" ?>
<librarian uri="http://vivo.gwu.edu/individual/n115">
    <researcharea>Economics</researcharea>
    <researcharea>Philosophy</researcharea>
    <name>Littman, Justin </name>
    <publications>
        <blogposting uri="http://vivo.gwu.edu/individual/n886">
            <title>My blog posting</title>
        </blogposting>
        <academicarticle uri="http://vivo.gwu.edu/individual/n6493">
            <journal uri="http://vivo.gwu.edu/individual/n5080">
                <title>Bar Journal</title>
            </journal>
            <title>My Second Academic Article</title>
        </academicarticle>
        <academicarticle uri="http://vivo.gwu.edu/individual/n7738">
            <journal uri="http://vivo.gwu.edu/individual/n5258">
                <title>Foo Journal</title>
            </journal>
            <issue>3</issue>
            <title>My First Academic Article</title>
        </academicarticle>
    </publications>
    <geographicfocus>New Jersey</geographicfocus>
</librarian>
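The transformation step can be sketched over plain `(subject, predicate, object)` string tuples rather than an RDF library graph. This assumes, as the CONSTRUCT query above suggests, that an `rdf:List` object is only a marker meaning "always emit a list", that any object which is itself a subject becomes a nested object, and that `type` values are reduced to their local name. `simplify` is a hypothetical function, and it assumes the graph is acyclic:

```python
import re
from collections import defaultdict

LIST_MARKER = "rdf:List"  # assumed marker object, as in the CONSTRUCT query


def local_name(term):
    """Reduce a prefixed name or URI to its local part, e.g. vivo:Librarian -> Librarian."""
    return re.split(r"[#/:]", term)[-1]


def simplify(triples, subject):
    """Sketch: turn (s, p, o) string triples into nested dicts and lists."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p.lstrip(":")].append(o)

    def build(node):
        result = {"uri": node}
        for field, values in by_subject[node].items():
            is_list = LIST_MARKER in values
            values = [v for v in values if v != LIST_MARKER]
            # Objects that have triples of their own become nested objects.
            values = [build(v) if v in by_subject else v for v in values]
            if field == "type":
                values = [local_name(v) for v in values]
            result[field] = values if is_list else values[0]
        return result

    return build(subject)


triples = [
    ("subj-ns:n115", ":name", "Littman, Justin "),
    ("subj-ns:n115", ":type", "vivo:Librarian"),
    ("subj-ns:n115", ":researchArea", "rdf:List"),
    ("subj-ns:n115", ":researchArea", "Economics"),
    ("subj-ns:n115", ":geographicFocus", "rdf:List"),  # list field with no values
    ("subj-ns:n115", ":publications", "rdf:List"),
    ("subj-ns:n115", ":publications", "subj-ns:n886"),
    ("subj-ns:n886", ":title", "My blog posting"),
    ("subj-ns:n886", ":type", "vivo:BlogPosting"),
]
person_dict = simplify(triples, "subj-ns:n115")
print(person_dict)
```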

For lists

Note that:

lists can contain anything with relationships to anything, e.g., persons with positions in a department.
lists support pagination (via offsets and limits).
lists support ordering.
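Ordering and pagination map onto SPARQL's `ORDER BY`, `LIMIT`, and `OFFSET` solution modifiers. Because SPARQL leaves solution order undefined without `ORDER BY`, an offset only yields stable pages over an ordered result. A minimal sketch of building these clauses from the `order` and `order_asc` keys, plus a count query that reuses the WHERE block without modifiers; `solution_modifiers` and `count_query` are hypothetical, and the mapping from fields to query variables (e.g., `?v3`) is assumed to be known:

```python
def solution_modifiers(fields_by_var, limit=None, offset=None):
    """Sketch: build ORDER BY / LIMIT / OFFSET clauses.

    fields_by_var maps a query variable (e.g. "?v3") to its field
    definition; only fields with an "order" key take part in sorting.
    """
    ranked = sorted(
        (field["order"], var, field.get("order_asc", True))
        for var, field in fields_by_var.items()
        if "order" in field
    )
    lines = []
    if ranked:
        keys = ["%s(%s)" % ("ASC" if asc else "DESC", var) for _, var, asc in ranked]
        lines.append("ORDER BY " + " ".join(keys))
    if limit is not None:
        lines.append("LIMIT %d" % limit)
    if offset is not None:
        lines.append("OFFSET %d" % offset)
    return "\n".join(lines)


def count_query(where_block, item_var):
    """Sketch: count all matches by reusing the WHERE block without modifiers."""
    return "SELECT (COUNT(DISTINCT %s) AS ?count)\nWHERE\n%s" % (item_var, where_block)


# prints ORDER BY ASC(?v3), LIMIT 3, OFFSET 2 on separate lines
print(solution_modifiers({"?v3": {"order": 1, "order_asc": True}, "?v2": {}}, limit=3, offset=2))
```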

Specify a definition that describes the simplified data structure and maps to SPARQL clauses, and optionally provide a limit and offset.
Generate a SPARQL CONSTRUCT query, a SELECT query, and a SELECT count query from the definition.
- The SELECT query is used to order the list since order is not preserved in the CONSTRUCT.
- The SELECT count is used to provide a total count for the list (i.e., without limits and offsets).
Execute the queries against a VIVO SPARQL API endpoint.
Based on the order of results from the SELECT query, transform the RDF graph from the CONSTRUCT query into a simplified (Python) data structure, and add the count returned by the SELECT count query.
Serialize the simplified data structure to JSON, XML, YAML, etc.
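The ordering step can be sketched as follows: the SELECT query's rows supply the item URIs in order, and each URI is looked up among the items built from the CONSTRUCT graph. `assemble_list` is a hypothetical helper; it also converts the count, which may come back from SPARQL as a string literal, to an integer:

```python
def assemble_list(ordered_uris, items_by_uri, count, limit=None, offset=None):
    """Sketch: order simplified items by the SELECT query's row order.

    ordered_uris: item URIs in the order the SELECT query returned them.
    items_by_uri: simplified items built from the (unordered) CONSTRUCT graph.
    count: total from the SELECT count query, possibly a string.
    """
    items = [items_by_uri[uri] for uri in ordered_uris if uri in items_by_uri]
    page = {"count": int(count), "list": items}
    if limit is not None:
        page["limit"] = limit
    if offset is not None:
        page["offset"] = offset
    return page


items_by_uri = {
    "n716": {"uri": "n716", "name": "Kerchner, Dan "},
    "n1941": {"uri": "n1941", "name": "Kaplan, Beth "},
}
page = assemble_list(["n1941", "n716"], items_by_uri, "10", limit=3, offset=2)
print(page)
```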

Example definition:

definition = {
    #"where" specifies how the list is selected, in this
    #case individuals that are related to a position.
    "where": """
                ?subj vivo:relatedBy ?pos .
                ?pos a vivo:Position .
                ?pos vivo:relates ?obj .
             """,
    #"list_definition" specifies the results, in this
    #case a summary for a person.
    "list_definition": {
        "where": "?subj a foaf:Person .",
        "fields": {
            "name": {
                "where": "?subj rdfs:label ?obj .",
                #"order" specifies the fields to be used
                #for ordering. Number the fields starting
                #with 1.
                "order": 1,
                #"order_asc" specifies whether the list is
                #ordered ascending or descending. If omitted,
                #the default is ascending.
                "order_asc": True
            }
        }
    }
}

results in the following SPARQL CONSTRUCT query (omitting namespace declarations):

CONSTRUCT
{
    ?v0 :result ?v1 .
    ?v1 :type ?v2 .
    ?v1 :name ?v3 .
}
WHERE
{
    {
        SELECT DISTINCT ?v0 ?v1 ?v2 ?v3
        WHERE
        {
            BIND ((subj-ns:n3360)  AS ?v0 )
            {
                ?v0 vivo:relatedBy ?pos .
                ?pos a vivo:Position .
                ?pos vivo:relates ?v1 .
                ?v1 a foaf:Person .
                ?v1 vitro:mostSpecificType ?v2 .
                ?v1 rdfs:label ?v3 .
            }
        }
        ORDER BY ASC(?v3)
        LIMIT 4
        OFFSET 2
    }
}

and the SELECT query used for ordering:

SELECT DISTINCT ?v1
WHERE
{
    BIND ((subj-ns:n3360)  AS ?v0 )
    {
        ?v0 vivo:relatedBy ?pos .
        ?pos a vivo:Position .
        ?pos vivo:relates ?v1 .
        ?v1 a foaf:Person .
        ?v1 vitro:mostSpecificType ?v2 .
        ?v1 rdfs:label ?v3 .
    }
}
ORDER BY ASC(?v3)
LIMIT 3
OFFSET 2

and the SELECT count query:

SELECT (COUNT(DISTINCT ?v1) as ?count)
WHERE
{
    BIND ((subj-ns:n3360)  AS ?v0 )
    {
        ?v0 vivo:relatedBy ?pos .
        ?pos a vivo:Position .
        ?pos vivo:relates ?v1 .
        ?v1 a foaf:Person .
        ?v1 vitro:mostSpecificType ?v2 .
        ?v1 rdfs:label ?v3 .
    }
}

The CONSTRUCT query, when executed against a test data source, returns the following graph (omitting namespaces):

subj-ns:n3360 :result subj-ns:n1941,
        subj-ns:n716,
        <http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee> .

subj-ns:n1941 :name "Kaplan, Beth "^^xsd:string ;
    :type vivo:Librarian .

subj-ns:n716 :name "Kerchner, Dan "^^xsd:string ;
    :type vivo:Librarian .

<http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee> :name "Justin Littman" ;
    :type vivo:Librarian .

and transformed to the simplified data structure:

{'count': u'10',
 'limit': 3,
 'list': [{u'type': u'Librarian',
           'uri': 'http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee',
           u'name': u'Justin Littman'},
          {u'type': u'Librarian',
           'uri': 'http://vivo.mydomain.edu/individual/n1941',
           u'name': u'Kaplan, Beth '},
          {u'type': u'Librarian',
           'uri': 'http://vivo.mydomain.edu/individual/n716',
           u'name': u'Kerchner, Dan '}],
 'offset': 2}

and serialized to JSON:

{
    "count": 10, 
    "limit": 3, 
    "list": [
        {
            "type": "Librarian", 
            "uri": "http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee", 
            "name": "Justin Littman"
        }, 
        {
            "type": "Librarian", 
            "uri": "http://vivo.mydomain.edu/individual/n1941", 
            "name": "Kaplan, Beth "
        }, 
        {
            "type": "Librarian", 
            "uri": "http://vivo.mydomain.edu/individual/n716", 
            "name": "Kerchner, Dan "
        }
    ], 
    "offset": 2
}

or YAML:

count: 10
limit: 3
list:
- {name: Justin Littman, type: Librarian, uri: 'http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee'}
- {name: 'Kaplan, Beth ', type: Librarian, uri: 'http://vivo.mydomain.edu/individual/n1941'}
- {name: 'Kerchner, Dan ', type: Librarian, uri: 'http://vivo.mydomain.edu/individual/n716'}
offset: 2

or XML:

<?xml version="1.0" ?>
<list count="10" limit="3" offset="2">
    <librarian uri="http://vivo.mydomain.edu/person-53e4096d5bf17f776300c4e2e8d237ee">
        <name>Justin Littman</name>
    </librarian>
    <librarian uri="http://vivo.mydomain.edu/individual/n1941">
        <name>Kaplan, Beth </name>
    </librarian>
    <librarian uri="http://vivo.mydomain.edu/individual/n716">
        <name>Kerchner, Dan </name>
    </librarian>
</list>
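The XML form of a list can be sketched with Python's `xml.etree.ElementTree`. The conventions here are inferred from the sample above, not taken from vivo2notld's code: a `list` root carrying `count`, `limit`, and `offset` attributes, one element per item named after its lowercased type with the URI as an attribute, and one child element per remaining field. `list_to_xml` is hypothetical:

```python
import xml.etree.ElementTree as ET


def list_to_xml(page):
    """Sketch: serialize a simplified list page to an XML string."""
    attrs = {k: str(page[k]) for k in ("count", "limit", "offset") if k in page}
    root = ET.Element("list", attrs)
    for item in page["list"]:
        # Element named after the lowercased type; fall back to "item" if empty.
        tag = item.get("type", "").lower() or "item"
        elem = ET.SubElement(root, tag, uri=item["uri"])
        for field, value in item.items():
            if field in ("type", "uri"):
                continue
            # Repeat the element for list-valued fields.
            for v in (value if isinstance(value, list) else [value]):
                ET.SubElement(elem, field.lower()).text = str(v)
    if hasattr(ET, "indent"):  # ET.indent exists on Python 3.9+
        ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode")


page = {"count": 10, "limit": 3, "offset": 2, "list": [
    {"type": "Librarian", "uri": "http://vivo.mydomain.edu/individual/n1941",
     "name": "Kaplan, Beth "},
]}
xml_text = list_to_xml(page)
print(xml_text)
```

On Python versions before 3.9 the sketch skips `ET.indent` and emits the same elements without indentation.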
