phorne-uncharted/d3m-primitives

 
 

KUNGFU.AI TA1 D3M Primitives

This repository contains all of the primitives developed by the teams at KUNGFU.AI, Yonder, and New Knowledge for the D3M program.

Installation

kf-d3m-primitives requires Python 3.6, and the easiest way to install it is via pip:

pip install kf-d3m-primitives

Development

The latest versions of D3M datasets can be downloaded by running the following script from inside the cloned directory. D3M Gitlab credentials are required.

python download_datasets.py

To build a Docker image with kf-d3m-primitives installed on top of the D3M program image, run:

make build

To download the large static volumes that are necessary to run and test some of the primitives run:

make volumes

To run the image with the downloaded datasets and static volumes mounted run:

make run

Tests

To test that each primitive's produce method and, where applicable, its set_training_data, fit, get_params, and set_params methods can be called successfully within D3M pipelines, run the following command. This also tests that the predictions each pipeline produces on its test set can be scored by the D3M runtime.

make test
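
The call sequence that these tests exercise can be sketched with a stand-in class. MeanPrimitive below is hypothetical and does not use the d3m base classes; it only mirrors the method names listed above. The real primitives wrap inputs and outputs in D3M container types, so their signatures and return values will differ.

```python
class MeanPrimitive:
    """Hypothetical stand-in: predicts the mean of the training targets."""

    def __init__(self):
        self._inputs = None
        self._outputs = None
        self._mean = None

    def set_training_data(self, *, inputs, outputs):
        # Store the training data; nothing is learned until fit() is called.
        self._inputs = list(inputs)
        self._outputs = list(outputs)

    def fit(self):
        if not self._outputs:
            raise ValueError("set_training_data must be called before fit")
        self._mean = sum(self._outputs) / len(self._outputs)

    def get_params(self):
        # Everything needed to restore the fitted state.
        return {"mean": self._mean}

    def set_params(self, *, params):
        self._mean = params["mean"]

    def produce(self, *, inputs):
        if self._mean is None:
            raise ValueError("primitive has not been fitted")
        return [self._mean for _ in inputs]


primitive = MeanPrimitive()
primitive.set_training_data(inputs=[[0], [1], [2]], outputs=[1.0, 2.0, 3.0])
primitive.fit()
predictions = primitive.produce(inputs=[[3], [4]])

# Round-trip the fitted state through get_params / set_params.
restored = MeanPrimitive()
restored.set_params(params=primitive.get_params())
restored_predictions = restored.produce(inputs=[[3], [4]])
```

A restored primitive producing the same predictions as the fitted one is the kind of behavior the get_params and set_params checks are meant to confirm.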

Submission

To generate json annotations for all primitives with the required directory structure for D3M submission run:

make annotations

To generate yml.gz pipeline run documents for all CPU-dependent pipelines with the required directory structure for D3M submission run:

make pipelines-cpu

To generate yml.gz pipeline run documents for all GPU-dependent pipelines with the required directory structure for D3M submission run:

make pipelines-gpu

Primitives

Data Preprocessing

  1. DataCleaningPrimitive: wrapper of the data cleaning primitive based on the punk library.

  2. DukePrimitive: wrapper of the Duke library in the D3M infrastructure.

  3. SimonPrimitive: LSTM-FCN neural network trained on 18 different semantic types, which infers the semantic type of each column. Base library here.

  4. GoatForwardPrimitive: geocodes location names into lat/long pairs by sending requests to a Photon geocoding server (based on OpenStreetMap).

  5. GoatReversePrimitive: geocodes lat/long pairs into geographic names of varying granularity by sending requests to a Photon geocoding server (based on OpenStreetMap).
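
As a rough illustration of forward and reverse geocoding requests, the sketch below builds query URLs for Photon's /api and /reverse endpoints without sending them. The server address, the function names, and the example place are assumptions made for illustration; the request code inside the two primitives may look different.

```python
from urllib.parse import urlencode

# Assumption: a Photon server is reachable at this address.
PHOTON_URL = "http://localhost:2322"


def forward_url(place_name, limit=1, base_url=PHOTON_URL):
    """Build a URL asking Photon for coordinates matching a place name."""
    query = urlencode({"q": place_name, "limit": limit})
    return f"{base_url}/api?{query}"


def reverse_url(lat, lon, base_url=PHOTON_URL):
    """Build a URL asking Photon for the place nearest to a lat/long pair."""
    query = urlencode({"lon": lon, "lat": lat})
    return f"{base_url}/reverse?{query}"


forward = forward_url("Austin, Texas")
reverse = reverse_url(30.2672, -97.7431)
```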

Clustering

  1. HdbscanPrimitive: wrapper of HDBSCAN and DBSCAN.

  2. StorcPrimitive: wrapper of tslearn's kmeans implementations.

  3. SpectralClustering: wrapper of Spectral Clustering.

Feature Selection

  1. PcaFeaturesPrimitive: wrapper of the Punk feature ranker into D3M infrastructure.

  2. RfFeaturesPrimitive: wrapper of the Punk rrfeatures library into D3M infrastructure.

Dimensionality Reduction

  1. TsnePrimitive: wrapper of t-SNE.

Natural Language Processing

  1. Sent2VecPrimitive: converts sentences into numerical feature representations. Base library here.

Image Classification

  1. GatorPrimitive: Inception V3 model pretrained on ImageNet finetuned for classification.

Object Detection

  1. ObjectDetectionRNPrimitive: wrapper of the Keras implementation of Retinanet from this repo. The original Retinanet paper can be found here.

Time Series Classification

  1. KaninePrimitive: wrapper of KNeighborsTimeSeriesClassifier.

  2. LstmFcnPrimitive: wrapper of LSTM Fully Convolutional Networks for Time Series Classification.
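
To give a feel for nearest-neighbor time series classification, here is a minimal pure-Python sketch that labels a query series with the label of its closest training series under dynamic time warping (DTW). It is an illustration only: the function names and toy data are made up, and KaninePrimitive relies on the KNeighborsTimeSeriesClassifier implementation instead of code like this.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    # cost[i][j] holds the best accumulated squared cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)] ** 0.5


def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to the query."""
    distances = [dtw_distance(series, query) for series in train_series]
    nearest = distances.index(min(distances))
    return train_labels[nearest]


train_series = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], [2, 2, 2, 2, 2, 2]]
train_labels = ["bump", "bump", "flat"]

# The query is a time-shifted bump with a different length; DTW absorbs the shift.
label = predict_1nn(train_series, train_labels, [0, 0, 0, 1, 2, 1, 0])
```

Because DTW allows one series to stretch against the other, the query does not need to have the same length or alignment as the training series.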

Time Series Forecasting

  1. DeepArPrimitive: wrapper of DeepAR, a recurrent, autoregressive, probabilistic time series forecasting method from GluonTS.

  2. NBEATSPrimitive: wrapper of N-BEATS (neural basis expansion analysis for interpretable time series forecasting) from GluonTS.

  3. VarPrimitive: wrapper of VAR for multivariate time series and auto_arima for univariate time series.

Interpretability

  1. shap_explainers: wrapper of Lundberg's Shapley values implementation for tree models. Currently integrated into d3m.primitives.learner.random_forest.DistilEnsembleForest as produce_shap_values().
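
For intuition about what a Shapley value measures, the sketch below computes exact values for a toy model by averaging each feature's marginal contribution over every ordering of the features. This brute-force approach is not the tree-specific algorithm that the wrapped implementation uses, and the toy model and feature names are invented for illustration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        present = set()
        for name in ordering:
            before = value_fn(present)
            present.add(name)
            totals[name] += value_fn(present) - before
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(present):
    """Hypothetical model output given the set of features that are switched on."""
    score = 0.0
    if "a" in present:
        score += 2.0
    if "b" in present:
        score += 1.0
    if "a" in present and "b" in present:
        score += 1.0  # interaction term, shared between a and b
    return score


values = shapley_values(toy_model, ["a", "b", "c"])
total_attribution = sum(values.values())
```

The attributions sum to the difference between the model output with all features present and with none, and the unused feature "c" receives zero. The cost grows factorially with the number of features, which is why specialized algorithms are used for real tree models.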

Remote Sensing

  1. RemoteSensingPretrainedPrimitive: featurizes remote sensing imagery using pre-trained models that were optimized with a self-supervised objective. There are two inference models that correspond to two pretext tasks: Augmented Multiscale Deep InfoMax and Momentum Contrast. The implementation of the inference models comes from this repo.

  2. MlpClassifierPrimitive: trains a two-layer neural network classifier on featurized remote sensing imagery. Produces heatmap visualizations for predictions using the gradient-based Grad-CAM technique.

  3. ImageRetrievalPrimitive: retrieves semantically similar images from an index of unannotated images using heuristics. Supports an iterative, human-in-the-loop retrieval pipeline.
