ADSimportpipeline

Overview

Coordinates the ingestion of full ADS records.

1. Parses the "classic" bibcode files defined in settings.py.
2. Operates on every bibcode whose "timestamp" differs from the corresponding "JSON_fingerprint" field in MongoDB.
3. Uses ads.ADSExports.ADSRecords to consolidate data from classic for the bibcodes selected in step 2.
4. Parses the resulting XML object into a Python dict via xmltodict.py.
5. Enforces type=list on any potentially repeated entries.
6. Merges any repeated blocks that share the same @type attribute.
7. Inserts the data into MongoDB (upsert=True).

Step 1 is initiated by invoking run.py.
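The bibcode selection, list enforcement and @type merge steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's own code: the function names are hypothetical, the stored fingerprints are assumed to be available as a dict keyed by bibcode, and merging is assumed to be a shallow update in which later blocks overwrite earlier keys. List enforcement is needed because xmltodict returns a dict for a single element and a list only when the element repeats (xmltodict's `force_list` argument to `parse` is an alternative for known repeated tags).

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical helpers illustrating the steps above; not the pipeline's own code.


def select_changed(classic: Iterable[Tuple[str, str]],
                   stored: Dict[str, str]) -> List[Tuple[str, str]]:
    """Keep (bibcode, timestamp) pairs whose timestamp differs from the stored
    JSON_fingerprint. Bibcodes missing from `stored` count as changed."""
    return [(bib, ts) for bib, ts in classic if stored.get(bib) != ts]


def ensure_list(value: Any) -> List[Any]:
    """xmltodict returns a dict for a single element and a list for repeated
    ones; normalize both shapes (and a missing value) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def merge_by_type(blocks: Any) -> List[Dict[str, Any]]:
    """Merge blocks that share the same '@type' attribute.
    Assumption: a shallow merge where later blocks overwrite earlier keys."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for block in ensure_list(blocks):
        merged.setdefault(block.get("@type"), {}).update(block)
    return list(merged.values())  # order of first appearance is preserved


if __name__ == "__main__":
    print(select_changed([("bib1", "t1"), ("bib2", "t2")], {"bib1": "t1"}))
    # [('bib2', 't2')]
    print(merge_by_type([{"@type": "ADS", "title": "x"},
                         {"@type": "ADS", "year": "2001"}]))
    # [{'@type': 'ADS', 'title': 'x', 'year': '2001'}]
```

The merged dict would then be written with pymongo, for example via `collection.update_one(filter, {"$set": doc}, upsert=True)`; the collection and the filter key are assumptions here.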

Async workflow with RabbitMQ

Invoking run.py --async publishes the [(bibcode, fingerprint),...] records to RabbitMQ.
Workers that consume these messages are defined in pipeline/psettings.py and pipeline/workers.py.
Workers are controlled by a master process in pipeline/ADSimportpipeline.py.
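The publishing side of this workflow, and the matching decode step a worker would perform, can be sketched with the standard library alone. The batch size, the JSON encoding and the function names are assumptions, not taken from run.py or pipeline/workers.py. The `publish` callable stands in for a pika call such as `channel.basic_publish(exchange="", routing_key=QUEUE, body=body)`, where `QUEUE` is a hypothetical queue name.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (bibcode, fingerprint)


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def publish_records(records: Iterable[Record],
                    publish: Callable[[bytes], None],
                    size: int = 100) -> int:
    """JSON-encode each batch and pass it to `publish`; return the message count.
    Hypothetical: with pika, `publish` could wrap channel.basic_publish."""
    count = 0
    for batch in batches(records, size):
        publish(json.dumps(batch).encode("utf-8"))
        count += 1
    return count


def decode_message(body: bytes) -> List[Record]:
    """Worker side: JSON turns tuples into lists, so rebuild the tuples."""
    return [(bib, fp) for bib, fp in json.loads(body)]


if __name__ == "__main__":
    sent: List[bytes] = []
    n = publish_records([("bib1", "f1"), ("bib2", "f2"), ("bib3", "f3")],
                        sent.append, size=2)
    print(n, [decode_message(body) for body in sent])
    # 2 [[('bib1', 'f1'), ('bib2', 'f2')], [('bib3', 'f3')]]
```

Passing `publish` in as a callable keeps the batching logic testable without a running broker.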

Requirements (version numbers will come at release time)

pika
RabbitMQ
ADSExports
pymongo + MongoDB
Note: The RabbitMQ server should be configured with frame_max=512000.
Note: pika should also be configured with frame_max=512000 (this apparently must be changed in spec.py in addition to the normal connection parameters).

ehenneken/ADSimportpipeline

Folders and files

Latest commit

History

Repository files navigation

ADSimportpipeline

Overview

Async workflow with rabbitMQ

Requirements (version numbers will come at release time)

About

Resources

License

Stars

Watchers

Forks