This repository has been archived by the owner on Apr 16, 2022. It is now read-only.

NYPL-Simplified/metadata_wrangler

Library Simplified Metadata Wrangler - DEPRECATED

This module has been deprecated.

Default Branch: develop

This is the Metadata Wrangler for Library Simplified. The Metadata Wrangler server combines a wide variety of information sources for library ebooks and uses them to improve selection, search, and recommendations for readers.

This application depends on Library Simplified Server Core as a git submodule.

Local Installation

Thorough deployment instructions, including essential libraries for Linux systems, can be found in the Library Simplified wiki. If this is your first time installing a Library Simplified server, please review those instructions.

Keep in mind that the Metadata Wrangler server requires unique database names and a data directory, as detailed below.

Once the database is running, run the application locally with python app.py and go to http://localhost:7000. If you would rather run this server locally through Docker, read the "Docker" section below, though this option doesn't currently allow for local development.

Database

When installing and running a local Metadata Wrangler, you need to create the relevant databases in Postgres. If you are using Docker, you can skip this step since the Postgres database will be created in a container.

$ sudo -u postgres psql
CREATE DATABASE simplified_metadata_dev;
CREATE DATABASE simplified_metadata_test;

-- Create users, unless you've already created them for another Library Simplified project
CREATE USER simplified with password '[password]';
CREATE USER simplified_test with password '[password]';

grant all privileges on database simplified_metadata_dev to simplified;
grant all privileges on database simplified_metadata_test to simplified_test;

Python setup

This project uses Python 3 for development. Right now, Docker can't be used for local development. Instead, you will need to set up a local virtual environment to install packages and run the project. Start by creating the virtual environment:

$ python3 -m venv env

Then add the database URLs as environment variables at the end of env/bin/activate:

export SIMPLIFIED_PRODUCTION_DATABASE="postgres://simplified:[password]@localhost:5432/simplified_metadata_dev"
export SIMPLIFIED_TEST_DATABASE="postgres://simplified_test:[password]@localhost:5432/simplified_metadata_test"

Activate the virtual environment:

$ source env/bin/activate

and install the dependencies:

$ pip install -r requirements-dev.txt
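
With the virtual environment activated, the two database URLs should be visible to Python. A mistyped user, port, or database name is easy to miss, so the following is a minimal sketch that splits each URL into its parts for a quick visual check. It uses only the standard library; describe_database_url is a hypothetical helper written for this README, not part of the Metadata Wrangler.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a postgres:// URL into the parts that must match the local setup."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # The variable names are the ones exported in env/bin/activate.
    for name in ("SIMPLIFIED_PRODUCTION_DATABASE", "SIMPLIFIED_TEST_DATABASE"):
        url = os.environ.get(name)
        if url is None:
            print(f"{name} is not set")
        else:
            print(name, describe_database_url(url))
```

The password is deliberately left out of the returned dictionary so that the output can be pasted into a bug report.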

Data Directory

Clone the Library Simplified data directory to a location of your choice:

$ git clone https://github.com/NYPL-Simplified/data.git YOUR_DATA_DIRECTORY

In your content server configuration file, your specified "data_directory" should be YOUR_DATA_DIRECTORY.

Git Branch Workflow

Branch     Python Version
develop    Python 3
main       Python 3
python2    Python 2

The default branch is develop. Use it as the base when branching off for bug fixes or new features. Once a feature branch pull request is merged into develop, the changes can be merged into main to create a release.

Python 2 reached end of life on January 1st, 2020, but a python2 branch is still available. As of May 2021, development is done in the develop and main branches.

Testing

The GitHub Actions CI service automatically runs the pytest unit tests against Python 3.6, 3.7, 3.8 and 3.9 using tox.

To run the pytest unit tests locally, install tox. Make sure the virtual environment is activated first.

$ pip install tox

Then run tox to run the tests against all supported Python versions.

$ tox

This uses the local Postgres database by default, so that service should be running. If you would rather use Docker to spin up a Postgres container for testing, read more in the "Testing with Docker" section below.

Tox has an environment for each Python version and an optional -docker factor that automatically uses Docker to deploy the service containers needed by the tests. You can select the environment you would like to test with the tox -e flag. More on this in the following sections.

Environments

Environment    Python Version
py36           Python 3.6
py37           Python 3.7
py38           Python 3.8
py39           Python 3.9

All of these environments are tested by default when running tox. To test one specific environment you can use the -e flag.

To run pytest only with Python 3.8, for example, run:

$ tox -e py38

You need to have the Python versions you are testing against installed on your local system. tox searches the system for installed Python versions, but does not install new ones. If tox doesn't find the Python version it's looking for, it will give an InterpreterNotFound error.

If you need to install missing Python versions for local testing, Pyenv is a useful tool for managing multiple Python versions.
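
Before running the full tox matrix, it can help to see which interpreters are likely to be missing. The sketch below simply looks for python3.6 through python3.9 on PATH using the standard library. It is only an approximation of what tox does, since tox has its own discovery logic and may find interpreters this check does not. TOX_INTERPRETERS and missing_interpreters are hypothetical names used for illustration.

```python
import shutil

# Executable names assumed for the tox environments listed above.
TOX_INTERPRETERS = {
    "py36": "python3.6",
    "py37": "python3.7",
    "py38": "python3.8",
    "py39": "python3.9",
}


def missing_interpreters(which=shutil.which):
    """Return the tox environments whose interpreter is not found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real system; by default it is shutil.which.
    """
    return [env for env, exe in TOX_INTERPRETERS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_interpreters()
    if missing:
        print("Possibly missing interpreters for:", ", ".join(missing))
    else:
        print("All interpreters found on PATH")
```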

Testing with Docker

If you install tox-docker, tox will set up all the service containers needed to run the unit tests and pass the correct environment variables so that the tests use these services. Using tox-docker is not required, but it is the recommended way to run the tests locally, since it runs them the same way they are run on the GitHub Actions CI server.

$ pip install tox-docker

The Docker functionality is provided by a docker factor that can be added to any environment. To run an environment with a particular factor, append the factor to the environment name.

To test with Python 3.8 using docker containers for the services, run:

$ tox -e py38-docker

Local services

If you already have Postgres running locally, you can use that service instead by setting the following environment variable:

  • SIMPLIFIED_TEST_DATABASE

Make sure the ports and usernames are updated to reflect the local configuration.

# Set environment variables
$ export SIMPLIFIED_TEST_DATABASE="postgres://simplified_test:test@localhost:9005/simplified_metadata_test"

# Run tox
$ tox -e py38

Override pytest arguments

You can pass additional arguments to pytest through tox. By default, tox passes the single argument tests to pytest. To override this, add -- to the tox command: every argument after it is passed to pytest in place of the default.

For example, when you only want to test changes in one test file, you can pass the path to the file after --. To run the test_content_cafe.py tests with Python 3.6 using docker, run:

$ tox -e py36-docker -- tests/test_content_cafe.py

To run specific tests within a file, pass in the test class and the optional function name in the following format:

$ tox -e py36-docker -- tests/test_content_cafe.py::TestContentCafeAPI::test_from_config
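
The pieces of a tox invocation described in this section (Python version, optional docker factor, optional pytest arguments) combine in a fixed order. As an illustration only, here is a small sketch that assembles the command as an argument list suitable for subprocess.run. tox_command is a hypothetical helper, not something shipped with this repository or with tox.

```python
def tox_command(python_version, docker=False, pytest_args=()):
    """Build the tox command line for one Python version.

    python_version: a string such as "3.8"
    docker: append the -docker factor (requires tox-docker)
    pytest_args: arguments placed after "--", replacing the default "tests"
    """
    env = "py" + python_version.replace(".", "")
    if docker:
        env += "-docker"
    command = ["tox", "-e", env]
    if pytest_args:
        command += ["--", *pytest_args]
    return command


if __name__ == "__main__":
    # Show the command for a single test file on Python 3.6 with Docker services.
    print(" ".join(tox_command("3.6", docker=True,
                               pytest_args=["tests/test_content_cafe.py"])))
```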

Docker

Docker is used to run the application server, a scripts server, and a database in containers that communicate with each other. This allows for easy deployment but can't currently be used for local development, because the current installation script installs the repo by cloning it from GitHub rather than from the local file system.

The /docker directory contains three Dockerfiles, one for each container service. Rather than running each container individually, use the ./docker-compose.yml file and the docker compose command to orchestrate building and running the containers.

Note: The docker compose command is "experimental" but will become the default command to use docker-compose.yml files. The existing command line tool docker-compose is still supported if that tool is preferred. Just replace the following docker compose commands with docker-compose.

Running the Containers

To build and start the three containers, run:

$ docker compose up -d

Once the base images are downloaded, the server images are built, and the servers are running, visit http://localhost for the Metadata Wrangler homepage. The -d flag runs the containers in "detached" mode, so they run in the background. If you need to rebuild the images, add the --build flag.

To stop the containers without removing them, run:
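
Because the containers start in the background, the homepage may not respond right away. The following is a minimal sketch, using only the standard library, that polls http://localhost until the server answers. The retry count and delay are arbitrary assumptions, and fetch_status and wait_for_server are hypothetical helpers rather than part of this project.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, just not with a success code
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def wait_for_server(url="http://localhost", attempts=30, delay=2.0, fetch=fetch_status):
    """Poll url until it answers or attempts run out; return the status or None."""
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None:
            return status
        if attempt < attempts - 1:
            time.sleep(delay)  # give the containers more time to start
    return None


if __name__ == "__main__":
    print("Server status:", wait_for_server())
```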

$ docker compose stop

You can re-start the existing containers with:

$ docker compose start

If you want to stop and remove the containers, run:

$ docker compose down

Debugging Local Containers

You can open a shell in a container to inspect its local files and logs. First, find the container ID of the service you want to access. The following command lists all containers on the machine, running and stopped:

$ docker ps --all

This returns a list of containers. The ones created by this repo's docker-compose.yml file are "metadata_wrangler_scripts", "metadata_wrangler_webapp", and "postgres12.0-alpine". Once you have the "Container ID" of the service you want to access, for example 56dcb9e3da3b, run:

$ docker exec -it 56dcb9e3da3b bash

This gives you bash access to the container. Logs are located in the directory given by each container's volumes configuration in docker-compose.yml. To exit, run exit.
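
Looking up the container ID by hand can be scripted. The sketch below runs docker ps --all with a tab-separated --format template and picks out the IDs whose image or name contains a given fragment. It assumes Docker is installed and on PATH; find_container_ids and list_containers are hypothetical helpers, and the exact image and container names on your machine may differ from the ones mentioned above.

```python
import subprocess


def find_container_ids(ps_output, name_fragment):
    """Return IDs of containers whose image or name contains name_fragment.

    ps_output is expected to hold one container per line in the form
    "<id>\t<image>\t<names>".
    """
    ids = []
    for line in ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        container_id, image, names = fields
        if name_fragment in image or name_fragment in names:
            ids.append(container_id)
    return ids


def list_containers():
    """Run `docker ps --all` with a tab-separated format (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.ID}}\t{{.Image}}\t{{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    for container_id in find_container_ids(list_containers(), "webapp"):
        print("docker exec -it", container_id, "bash")
```

The script only prints the docker exec command instead of running it, since an interactive shell is easier to start from your own terminal.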

Continuous Integration

This project runs all the unit tests through GitHub Actions for new pull requests and when merging into the default develop branch. The relevant file can be found in .github/workflows/test.yml. When contributing updates or fixes, the test GitHub Action must pass for all Python 3 environments. Run the tox command locally before pushing changes so that any failing tests are caught early.

License

Copyright © 2015 The New York Public Library, Astor, Lenox, and Tilden Foundations

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.