Cataloging of millions of research publications

openAccess/fatcat

  __       _            _   
 / _| __ _| |_ ___ __ _| |_ 
| |_ / _` | __/ __/ _` | __|
|  _| (_| | || (_| (_| | |_ 
|_|  \__,_|\__\___\__,_|\__|

                                    ... catalog all the things!

This repository contains source code for 'fatcat', an editable catalog of published written works (mostly journal articles), with a focus on tracking the location and status of full-text copies to ensure "perpetual access".

The RFC is the original design document, and the best place to start for technical background. There is a work-in-progress "guide" at https://guide.fatcat.wiki; the canonical public location of this repository is https://github.com/internetarchive/fatcat.

The public production web interface is https://fatcat.wiki.

See the LICENSE file for detailed permissions and licensing of both the Python and Rust code. In short, the auto-generated client libraries are permissively released, while the API server and web interface are strong copyleft (AGPLv3).

Building and Tests

There are three main components:

  • backend API server and database (in Rust)
  • API client libraries and bots (in Python)
  • front-end web interface (in Python; built on API and library)

Automated integration tests run on GitLab CI (see .gitlab-ci.yml) on the Internet Archive's internal (not public) infrastructure.

See ./python/README.md and ./rust/README.md for details on building, running, and testing these components.

The Python client library, which is automatically generated from the API schema, lives under ./python_client/.

Status

  • SQL and HTTP API schemas
    • Basic entities
    • one-to-many and many-to-many entities
    • JSON(B) "extra" metadata fields
    • full rev1 schema for all entities
    • file sets and web captures
    • editgroup review: annotations
  • HTTP API Server
    • base32 encoding of UUID identifiers
    • inverse many-to-many helpers (files-by-release, release-by-creator)
    • Authentication (e.g., accounts, OAuth2, JWT)
    • Authorization (a.k.a. roles)
  • Web Interface
    • Migrate Python codebase
    • Creation and editing of all entities
  • Other
    • Elasticsearch schema
    • Basic logging
    • Swagger-UI
    • Bulk metadata exports
    • Sentry (error reporting)
    • Metrics
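
As an illustration of the "base32 encoding of UUID identifiers" item above, here is a minimal sketch in Python of how a 128-bit UUID can be turned into a shorter base32 string and back. The function names (`uuid_to_ident`, `ident_to_uuid`) are hypothetical, and the choice of lowercase output with padding stripped is an assumption for this example; see the RFC and the Rust server code for the scheme fatcat itself uses.

```python
import base64
import uuid


def uuid_to_ident(uuid_str: str) -> str:
    """Encode a UUID string as a 26-character base32 identifier.

    Assumption for this sketch: lowercase output, "=" padding removed.
    """
    raw = uuid.UUID(uuid_str).bytes  # 16 raw bytes
    # 16 bytes encode to 26 base32 characters plus 6 "=" padding characters.
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


def ident_to_uuid(ident: str) -> str:
    """Decode a 26-character base32 identifier back to a UUID string."""
    if len(ident) != 26:
        raise ValueError("expected a 26-character identifier")
    # Restore the case and padding that b32decode expects.
    padded = ident.upper() + "======"
    return str(uuid.UUID(bytes=base64.b32decode(padded)))


if __name__ == "__main__":
    original = str(uuid.uuid4())
    ident = uuid_to_ident(original)
    print(original, "->", ident)
    print(ident, "->", ident_to_uuid(ident))
```

The encoded form is 26 characters instead of the 36 of a hyphenated UUID, and it round-trips without loss, which makes it convenient for URLs.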
