Automated Ingest Report

It is important to closely monitor the state of ingest-related data stores, especially Solr. This repo holds code that runs daily to gather the list of canonical bibcodes and the bibcodes currently in Solr, then computes what is missing, what is new, what is deleted, etc.
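The comparison described above can be sketched with plain set differences. This is only an illustration, not the code in this repo: the function name `compare_bibcodes`, the result keys, and the idea of comparing against a previous Solr snapshot to find new and deleted records are all assumptions.

```python
def compare_bibcodes(canonical, solr, previous_solr=None):
    """Compare canonical bibcodes with those in Solr (hypothetical helper)."""
    canonical, solr = set(canonical), set(solr)
    result = {
        "missing": canonical - solr,  # canonical but not found in Solr
        "extra": solr - canonical,    # in Solr but not in the canonical list
    }
    if previous_solr is not None:
        # Assumption: "new" and "deleted" are measured against an earlier Solr snapshot.
        previous_solr = set(previous_solr)
        result["new"] = solr - previous_solr
        result["deleted"] = previous_solr - solr
    return result
```

Calling `compare_bibcodes(canonical_list, solr_list, yesterday_list)` returns a dict of sets, one per category, which can then be counted or written to disk.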

Running

To gather all the needed data and compute state: python run.py --gather --compute

Fulltext Section

Errors are defined in the config file.
- New errors can be added to this list.
Results will only change if the pipeline has processed all.links since the last Automated Ingest Report (AIR).
- We assume all.links is located at /proj/ads/abstracts/config/links/fulltext/all.links.
- The report includes the date of the last fulltext extraction.
- A timeframe of 15 hours is used to avoid pulling logs from a pipeline that is mid-process.
  - This will fail if extraction is forced (-e flag), as the pipeline takes much longer in that case.
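One possible reading of the 15-hour rule is sketched below: treat the pipeline as finished only if all.links was last modified at least 15 hours ago. This interpretation is an assumption, as are the names `hours_since_modified`, `pipeline_has_settled`, and `SETTLE_HOURS`; the actual check in this repo may differ.

```python
import os
import time

# Path taken from the README; the constant names are hypothetical.
ALL_LINKS = "/proj/ads/abstracts/config/links/fulltext/all.links"
SETTLE_HOURS = 15


def hours_since_modified(path, now=None):
    """Return the hours elapsed since `path` was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def pipeline_has_settled(path=ALL_LINKS, settle_hours=SETTLE_HOURS, now=None):
    """Assume the pipeline is no longer mid-process once enough time has passed."""
    return hours_since_modified(path, now) >= settle_hours
```

A fixed time window like this is simple but cannot detect a run that is still in progress after 15 hours, which matches the caveat about forced extraction above.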

This directory structure needs to exist for files to be stored:

data
└── ft
    ├── Errno_2_No_such_file_or_directory
    ├── extraction_failed_for_bibcode
    ├── format_not_currently_supported_for_extraction
    ├── is_linked_to_a_non_existent_file
    └── is_linked_to_a_zero_byte_size_file
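The tree above can be created ahead of time with a few lines of Python. This is a minimal sketch, assuming Python 3.2 or later (for `exist_ok`); the helper name `ensure_ft_dirs` and the `FT_ERROR_DIRS` list are hypothetical, with the directory names copied from the tree above.

```python
import os

# Directory names copied from the tree above; the list name is hypothetical.
FT_ERROR_DIRS = [
    "Errno_2_No_such_file_or_directory",
    "extraction_failed_for_bibcode",
    "format_not_currently_supported_for_extraction",
    "is_linked_to_a_non_existent_file",
    "is_linked_to_a_zero_byte_size_file",
]


def ensure_ft_dirs(base="data"):
    """Create <base>/ft/<error dir> for every known error; safe to call repeatedly."""
    created = []
    for name in FT_ERROR_DIRS:
        path = os.path.join(base, "ft", name)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        created.append(path)
    return created
```

If a new error is added to the config file, a matching directory would presumably be needed as well.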

Maintainer

Steve McDonald
