This microservice saves sample information from external LIMSs into a MongoDB instance for easy querying.
- python - install the required version specified in the `Pipfile`:

  ```
  [requires]
  python_version = "<version>"
  ```

- install the required packages using pipenv:

  ```shell
  brew install pipenv
  pipenv install --dev
  ```

- Optionally, to test SFTP, this Docker image is helpful.
- mongodb:

  ```shell
  brew tap mongodb/brew
  brew install mongodb-community@4.2
  brew services start mongodb-community@4.2
  ```
Once all the required packages are installed, enter the virtual environment with:

```shell
pipenv shell
```
The following runtime flags are available:

```shell
SETTINGS_MODULE=crawler.config.development python runner.py --help

usage: runner.py [-h] [--sftp] [--scheduled]

Store external samples in mongo.

optional arguments:
  -h, --help     show this help message and exit
  --scheduled    start scheduled execution, defaults to running once
  --sftp         use SFTP to download CSV files, defaults to using local files
  --keep-files   keeps centre csv files after runner has been executed
  --add-to-dart  add samples to DART, by default they are not
```
When the crawler process runs nightly, it should update the MLWH `lighthouse_sample` table as it goes, with records for all rows that are inserted into MongoDB. If that MLWH insert process fails, you should see a critical exception for the file in Lighthouse-UI. This may happen after the records were inserted correctly into MongoDB, and re-running the file will not re-attempt the MLWH inserts in that situation.
There is a manual migration task (`update_mlwh_with_legacy_samples`) that can be run to fix this discrepancy: it inserts rows into the MLWH for samples whose MongoDB `created_at` datetimes fall between two given datetimes.

NB: Both datetimes are inclusive: the range includes rows greater than or equal to the start datetime and less than or equal to the end datetime.
Usage (inside pipenv shell):

```shell
python run_migration.py update_mlwh_with_legacy_samples 200115_1200 200116_1600
```

Where the time format is `YYMMDD_HHmm`. Both start and end timestamps must be present.
The process should not duplicate rows that are already present in MLWH, so you can be generous with your timestamp range.
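For orientation, the snippet below shows how the `YYMMDD_HHmm` arguments from the example above map to datetimes, and what the inclusive range check means. This is an illustrative sketch only and is not taken from the migration code itself.

```python
from datetime import datetime

# Parse the YYMMDD_HHmm arguments used in the example command (illustrative only)
start = datetime.strptime("200115_1200", "%y%m%d_%H%M")  # 2020-01-15 12:00
end = datetime.strptime("200116_1600", "%y%m%d_%H%M")    # 2020-01-16 16:00

# Both ends of the range are inclusive, i.e. a row is in range when:
created_at = datetime(2020, 1, 15, 12, 0)
print(start <= created_at <= end)  # True
```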
This is a history of the past and current rules by which positive samples are further filtered and identified as 'filtered positive'. Note that any rule change requires the `update_filtered_positives` migration to be run, as outlined in the relevant section below.
The implementation of the current version can be found in `FilteredPositiveIdentifier`, with the implementation of previous versions (if any) in the git history.
A sample is filtered positive if:

- it has a positive `RESULT`

This is the pre-"fit-to-pick" implementation, without any extra filtering on top of the `RESULT = Positive` requirement.
A sample is filtered positive if:

- it has a positive `RESULT`
- it is not a control (`ROOT_SAMPLE_ID` does not start with 'CBIQA_')
- all of `CH1_CQ`, `CH2_CQ` and `CH3_CQ` are `None`, or one of these is less than or equal to 30

More information on this version can be found on this Confluence page.
A sample is filtered positive if:

- it has a 'Positive' `RESULT`
- it is not a control (`ROOT_SAMPLE_ID` does not start with 'CBIQA_', 'QC0' or 'ZZA000')
- all of `CH1_CQ`, `CH2_CQ` and `CH3_CQ` are `None`, or one of these is less than or equal to 30

More information on this version can be found on this Confluence page.
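To make the current rule concrete, a minimal sketch is shown below. It is not the actual `FilteredPositiveIdentifier` implementation; the field names and control prefixes are taken from the list above, and the sample is assumed to be a plain dict for illustration.

```python
from typing import Any, Dict

# Control prefixes and CQ fields as listed in the current rule above
CONTROL_PREFIXES = ("CBIQA_", "QC0", "ZZA000")
CQ_FIELDS = ("CH1_CQ", "CH2_CQ", "CH3_CQ")


def is_filtered_positive(sample: Dict[str, Any]) -> bool:
    """Illustrative sketch of the current 'filtered positive' rule (not the real implementation)."""
    if sample.get("RESULT") != "Positive":
        return False

    # not a control
    if str(sample.get("ROOT_SAMPLE_ID", "")).startswith(CONTROL_PREFIXES):
        return False

    # all CQ values are None, or at least one of them is less than or equal to 30
    cq_values = [sample.get(field) for field in CQ_FIELDS]
    return all(cq is None for cq in cq_values) or any(
        cq is not None and cq <= 30 for cq in cq_values
    )


# Example: a positive, non-control sample with one CQ value at 28 is filtered positive
print(is_filtered_positive({"RESULT": "Positive", "ROOT_SAMPLE_ID": "R123", "CH1_CQ": 28}))  # True
```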
On changing the positive filtering version/definition, all unpicked samples stored in MongoDB, MLWH and DART need updating to determine whether they are still filtered positive under the new rules, and can therefore be cherrypicked. In order to keep the databases in sync, the update process for all of them is performed in a single manual migration (`update_filtered_positives`), which identifies unpicked samples, re-determines their filtered positive value, and updates the databases.
Usage (inside pipenv shell):

```shell
python run_migration.py update_filtered_positives
```

OR

```shell
python run_migration.py update_filtered_positives omit_dart
```
By default, the migration will attempt to use DART, since it will safely fail if DART cannot be accessed, thereby warning the user to reconsider what they are doing. However, DART can be omitted by including the `omit_dart` flag.
Neither process duplicates any data, instead updating existing entries.
When the Beckman robots come online, we need to populate the DART database with the filtered positive samples that are physically available. This can be achieved using the `update_dart` migration.
This can also be used in a similar way to the existing MLWH migration: if a DART insert process fails, you will see a critical exception for the file in the Lighthouse-UI. After addressing the reason for the failure, run the migration between the relevant timestamps to re-insert/update the data in DART.
In short, this migration performs the following steps:
- Get the `RESULT = Positive` samples (which are not controls) from MongoDB between a start and end date
- Remove samples from this list which have already been cherrypicked, by inspecting the events in the MLWH
- Determine whether they are filtered positive samples using the latest rule
- Determine the plate barcode UUID
- Update MongoDB with the filtered positive and UUID values
- Update the MLWH with the same filtered positive and UUID values
- Create/update the DART database with all the positive samples, setting the filtered positive samples as 'pickable'
To run the migration:

```shell
python run_migration.py update_mlwh_and_dart_with_legacy_samples 200115_1200 200116_1600
```

Where the time format is `YYMMDD_HHmm`. Both start and end timestamps must be present.
The tests require a connection to the `lighthouse_sample` table in the Multi-LIMS Warehouse (MLWH). The credentials for connecting to the MLWH are configured in the `defaults.py` file, or in the relevant environment file, for example `test.py`. You can run the tests by connecting to the UAT instance of the MLWH, or to an existing local copy you already have. Alternatively, you can create a basic local one containing just the relevant table by running the following from the top-level folder (this is what the CI does):

```shell
python setup_test_db.py
```
To run the tests, execute:

```shell
python -m pytest -vs
```
Black is used as a formatter, to format the code before committing:

```shell
black .
```

Mypy is used as a type checker; to execute:

```shell
mypy .
```

Flake8 is used for linting; to execute:

```shell
flake8
```

A little convenience script can be used to run the formatting, type checking and linting together:

```shell
./forlint.sh
```
If you do not have root access, pyodbc will not work if you use brew. Using Docker Compose, you can set up the full stack, and it will also set the correct environment variables.
To build the containers:

```shell
docker-compose up
```
To run the tests, you will need to find the id of the container with the image name `crawler_runner`, then execute:

```shell
docker exec -ti <container_id> python -m pytest -vs
```
There is now a volume for the runner, so there is hot reloading: changes to the code and tests will be picked up when you rerun the tests.
This post was used for the naming conventions within MongoDB.
MonkeyType is useful for automatically creating type annotations. View its README for up-to-date instructions on how to use it.
Node is required to run npx:

```shell
npx markdown-toc -i README.md
```