
helloworld163/DBLP-Parser

 
 


DBLP-Parser

The source files in this repository are written in Python. The parsers target the DBLP XML dataset (dblp.xml), available at http://dblp.uni-trier.de/xml/

Before using the parsers, copy the document type definition (DTD), available at the link above as 'dblp.dtd', into the XML document (dblp.xml) as a standard internal subset declaration (see http://en.wikipedia.org/wiki/Document_type_definition for an example).
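The DTD step can be scripted rather than done by hand. The following is a minimal sketch, not part of this repository: the function name `embed_dtd` and its parameters are hypothetical, and it assumes the XML declaration and any existing DOCTYPE line sit within the first few kilobytes of the file, and that dblp.dtd contains only declarations that are valid inside an internal subset.

```python
import re
import shutil

# Matches an existing DOCTYPE declaration, with or without an internal subset.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(\[.*?\])?\s*>", re.DOTALL)
# Matches a leading XML declaration such as <?xml version="1.0"?>.
XML_DECL_RE = re.compile(r"\s*<\?xml[^>]*\?>\s*")


def embed_dtd(xml_in, dtd_text, xml_out, root="dblp"):
    """Copy xml_in to xml_out with the DTD embedded as an internal subset.

    xml_in and xml_out are open text-mode file objects (hypothetical helper).
    Only the head of the file is held in memory; the rest is streamed.
    """
    # Assumption: the prolog (XML declaration + DOCTYPE) fits in this chunk.
    head = xml_in.read(4096)
    # Drop any existing DOCTYPE, e.g. <!DOCTYPE dblp SYSTEM "dblp.dtd">.
    head = DOCTYPE_RE.sub("", head, count=1)
    doctype = "<!DOCTYPE %s [\n%s\n]>\n" % (root, dtd_text.strip())
    # Place the new DOCTYPE right after the XML declaration, if there is one.
    match = XML_DECL_RE.match(head)
    pos = match.end() if match else 0
    xml_out.write(head[:pos] + doctype + head[pos:])
    # Stream the remainder of the (very large) document unchanged.
    shutil.copyfileobj(xml_in, xml_out)
```

With real files, this would be called with dblp.xml opened for reading, the contents of dblp.dtd as a string, and a new output file opened for writing; the output file is then the one passed to the parsers.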

Parsers provided (8):

  • DBLP/conf-edges.py (outputs to )
  • DBLP/conf-nodes.py (outputs to )
  • DBLP/dblp_parser-edges.py
  • DBLP/dblp_parser-nodes.py
  • DBLP/edges-id.py
  • DBLP/node-id.py
  • DBLP/normalized-edges.py
  • DBLP/normalized-nodes.py

Note that the parsers' output is not sorted in any particular order; you can use the Unix 'sort' command to put the results in alphabetical order.
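Where the Unix 'sort' command is not available, a result file that fits in memory can be ordered in Python instead. This is a small sketch under stated assumptions: `sort_lines` and its path arguments are hypothetical names, the output is assumed to be UTF-8 text with one record per line, and Python orders strings by code point, which may differ from the locale-aware ordering of 'sort'.

```python
def sort_lines(in_path, out_path, encoding="utf-8"):
    """Write the lines of in_path to out_path in sorted order.

    Hypothetical helper; assumes the whole file fits in memory.
    """
    with open(in_path, encoding=encoding) as src:
        # Strip newlines so a missing final newline does not affect ordering.
        lines = [line.rstrip("\n") for line in src]
    lines.sort()  # code-point order, not locale-aware collation
    with open(out_path, "w", encoding=encoding) as dst:
        dst.writelines(line + "\n" for line in lines)
```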

To run a parser from the command line, pass the path to the XML file as an argument, for example: python dblp-parser-edges.py /path-to-file/dblp.xml

NOTE: the dblp.xml file is very large and takes some time to parse with the Python programs provided. You can test the programs against the util/test.xml file provided, which contains a small subset of the data in dblp.xml. Run the test with the following command: python dblp-parser-edges.py /path-to-directory/utils/test.xml
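To get a feel for how long a full pass over the data takes before running the parsers, a file can be scanned incrementally. The sketch below is illustrative only and is not how the parsers in this repository are implemented: `count_records` is a hypothetical name, the default tags are just two of the record types that appear in DBLP data, and the input is assumed to already have the DTD embedded so that character entities resolve.

```python
import xml.etree.ElementTree as ET


def count_records(source, tags=("article", "inproceedings")):
    """Count elements with the given tags while streaming through source.

    source is a file name or a binary file object (hypothetical helper).
    """
    counts = dict.fromkeys(tags, 0)
    # iterparse yields each element once its closing tag has been read,
    # so the document is never loaded as a single in-memory tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in counts:
            counts[elem.tag] += 1
            # Discard the record's children and text to limit memory use.
            elem.clear()
    return counts
```

Running it first on the small test file and then on dblp.xml gives a rough baseline for parse time on a given machine.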
