KUNGFU.AI TA1 D3M Primitives

This repository contains all of the primitives developed by the teams at KUNGFU.AI, Yonder, and New Knowledge for the D3M program.

Installation

kf-d3m-primitives requires Python 3.6, and the easiest way to install it is via pip:

pip install kf-d3m-primitives

Development

The latest versions of D3M datasets can be downloaded by running the following script from inside the cloned directory. D3M Gitlab credentials are required.

python download_datasets.py

To build a Docker image with kf-d3m-primitives installed on top of the D3M program image, run:

make build

To download the large static volumes needed to run and test some of the primitives, run:

make volumes

To run the image with the downloaded datasets and static volumes mounted, run:

make run

Tests

To test that each primitive's produce method and, where applicable, its set_training_data, fit, get_params, and set_params methods can be called successfully within D3M pipelines, run the following command. This also tests that the predictions each pipeline produces on test sets can be scored by the D3M runtime.

make test

Submission

To generate JSON annotations for all primitives with the directory structure required for D3M submission, run:

make annotations

To generate yml.gz pipeline run documents for all CPU-dependent pipelines with the directory structure required for D3M submission, run:

make pipelines-cpu

To generate yml.gz pipeline run documents for all GPU-dependent pipelines with the directory structure required for D3M submission, run:

make pipelines-gpu

Primitives

Data Preprocessing

DataCleaningPrimitive: wrapper of the data cleaning primitive based on the punk library.
DukePrimitive: wrapper of the Duke library in the D3M infrastructure.
SimonPrimitive: LSTM-FCN neural network trained on 18 different semantic types, which infers the semantic type of each column. Base library here.
GoatForwardPrimitive: geocodes location names into lat/long pairs by sending requests to a Photon geocoding server (based on OpenStreetMap).
GoatReversePrimitive: geocodes lat/long pairs into geographic names of varying granularity by sending requests to a Photon geocoding server (based on OpenStreetMap).
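
As a rough illustration of the forward and reverse geocoding requests described above, the sketch below builds query URLs for a Photon server and reads coordinates out of a GeoJSON response. It makes no network calls. The base URL, the helper names (forward_url, reverse_url, first_lat_lon), and the assumption that the primitives use these exact endpoints are illustrative and not taken from this repository.

```python
from urllib.parse import urlencode

# Assumption: a Photon server is reachable at this address
# (2322 is Photon's default port); adjust for your deployment.
PHOTON_BASE_URL = "http://localhost:2322"


def forward_url(place_name, limit=1, base_url=PHOTON_BASE_URL):
    """Build a forward-geocoding URL: place name -> candidate coordinates."""
    query = urlencode({"q": place_name, "limit": limit})
    return f"{base_url}/api?{query}"


def reverse_url(lat, lon, base_url=PHOTON_BASE_URL):
    """Build a reverse-geocoding URL: lat/long pair -> nearby place names."""
    query = urlencode({"lat": lat, "lon": lon})
    return f"{base_url}/reverse?{query}"


def first_lat_lon(geojson):
    """Return (lat, lon) from the first feature of a GeoJSON response, or None."""
    features = geojson.get("features", [])
    if not features:
        return None
    # GeoJSON stores point coordinates in (lon, lat) order.
    lon, lat = features[0]["geometry"]["coordinates"]
    return lat, lon
```

The returned URLs can be passed to any HTTP client; the JSON body of the response is what first_lat_lon expects.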

Clustering

HdbscanPrimitive: wrapper of HDBSCAN and DBSCAN.
StorcPrimitive: wrapper of tslearn's kmeans implementations.
SpectralClustering: wrapper of Spectral Clustering.

Feature Selection

PcaFeaturesPrimitive: wrapper of the Punk feature ranker into D3M infrastructure.
RfFeaturesPrimitive: wrapper of the Punk rffeatures library into D3M infrastructure.

Dimensionality Reduction

TsnePrimitive: wrapper of t-SNE.

Natural Language Processing

Sent2VecPrimitive: converts sentences into numerical feature representations. Base library here.

Image Classification

GatorPrimitive: Inception V3 model pretrained on ImageNet and fine-tuned for classification.

Object Detection

ObjectDetectionRNPrimitive: wrapper of the Keras implementation of Retinanet from this repo. The original Retinanet paper can be found here.

Time Series Classification

KaninePrimitive: wrapper of KNeighborsTimeSeriesClassifier.
LstmFcnPrimitive: wrapper of LSTM Fully Convolutional Networks for Time Series Classification.
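
To give a feel for nearest-neighbor time series classification, here is a minimal pure-Python sketch that pairs dynamic time warping (DTW) with a k-nearest-neighbor vote. It is a simplified stand-in for illustration only: the function names are hypothetical, it handles univariate series, and it is not the code KaninePrimitive wraps.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two univariate sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated squared error aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def knn_predict(train_series, train_labels, query, k=1):
    """Label a query series by majority vote among its k DTW-nearest neighbors."""
    ranked = sorted(
        zip(train_series, train_labels),
        key=lambda pair: dtw_distance(pair[0], query),
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy data (made up for illustration): a shifted bump should still match "bump".
train = [[0, 0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0, 0],
         [2, 2, 2, 2, 2, 2, 2], [2, 2, 1, 2, 2, 2, 2]]
labels = ["bump", "bump", "flat", "flat"]
prediction = knn_predict(train, labels, [0, 0, 0, 1, 2, 1, 0], k=3)
```

DTW tolerates shifts along the time axis, which is why the shifted query still lands next to the "bump" examples where a pointwise Euclidean distance would penalize the offset.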

Time Series Forecasting

DeepArPrimitive: wrapper of DeepAR - a recurrent, autoregressive, probabilistic time series forecasting method from GluonTS.
NBEATSPrimitive: wrapper of N-BEATS - Neural basis expansion analysis for interpretable time series forecasting from GluonTS.
VarPrimitive: wrapper of VAR for multivariate time series and auto_arima for univariate time series.

Interpretability

shap_explainers: wrapper of Lundberg's Shapley values implementation for tree models. Currently integrated into d3m.primitives.learner.random_forest.DistilEnsembleForest as produce_shap_values().
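
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a tiny toy model by averaging each feature's marginal contribution over every feature ordering. This brute-force enumeration is only an illustration of the quantity being estimated; the tree explainer in Lundberg's library uses a much faster tree-specific algorithm. The toy model and feature names are made up.

```python
from itertools import permutations


def shapley_values(features, value_fn):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            totals[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_model(present):
    """Hypothetical model output given the set of features that are 'switched on'."""
    score = 0.0
    if "age" in present:
        score += 2.0
    if "income" in present:
        score += 1.0
    if "age" in present and "income" in present:
        score += 1.0  # interaction term, shared between the two features
    return score


values = shapley_values(["age", "income", "zip"], toy_model)
```

The attributions sum to the difference between the model output with all features present and with none, and the unused "zip" feature receives zero credit.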

Remote Sensing

RemoteSensingPretrainedPrimitive: featurizes remote sensing imagery using pre-trained models that were optimized with a self-supervised objective. There are two inference models that correspond to two pretext tasks: Augmented Multiscale Deep InfoMax and Momentum Contrast. The implementation of the inference models comes from this repo.
MlpClassifierPrimitive: trains a two-layer neural network classifier on featurized remote sensing imagery. Produces heatmap visualizations for predictions using the gradient-based Grad-CAM technique.
ImageRetrievalPrimitive: retrieves semantically similar images from an index of un-annotated images using heuristics. Supports an iterative, human-in-the-loop, retrieval pipeline.

phorne-uncharted/d3m-primitives
