Multimodal AutoML Benchmark on Tables with Text Fields

This benchmark contains diverse tabular datasets, each containing numeric/categorical as well as text columns. The goal is to evaluate the performance of (automated) ML systems for supervised learning tasks (classification and regression) with such multimodal data. Python code is provided to run different variants of the AutoGluon AutoML tool on the benchmark.

Details about the Datasets

The datasets in our benchmark are described in the multimodal_text_benchmark folder, which also contains example code for loading a dataset into Python.

License

The versions of the datasets in this benchmark are released under the CC BY-NC-SA license. Note that these datasets are modified versions of publicly available originals, and we do not own any of them. Any data from this benchmark that has previously been published elsewhere falls under the license of its original source. Please refer to the licenses of each original source linked in the multimodal_text_benchmark README.

Install the Benchmark Suite

cd multimodal_text_benchmark
# Install the benchmarking suite
python3 -m pip install -U -e .

You can quickly test the installation by running the dataset tests in the tests folder:

cd multimodal_text_benchmark/tests
python3 -m pytest test_datasets.py

To load a dataset, use the following code:

from auto_mm_bench.datasets import dataset_registry

print(dataset_registry.list_keys())
train_dataset = dataset_registry.create('product_sentiment_machine_hack', 'train')
test_dataset = dataset_registry.create('product_sentiment_machine_hack', 'test')
print(train_dataset.data)
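
As a small extension of the snippet above, the following sketch collects the table shape of each split of one dataset. It uses only `dataset_registry.create` and the `.data` attribute shown above; the helper name `summarize_splits` is hypothetical, and the code assumes that `.data` exposes a pandas-style `.shape`, which this README does not confirm.

```python
def summarize_splits(registry, name, splits=("train", "test")):
    """Return {split: shape} for one benchmark dataset.

    Hypothetical helper: assumes registry.create(name, split) returns an
    object whose .data attribute has a pandas-style .shape.
    """
    summary = {}
    for split in splits:
        dataset = registry.create(name, split)
        summary[split] = dataset.data.shape
    return summary


try:
    from auto_mm_bench.datasets import dataset_registry
except ImportError:
    # The benchmark suite is not installed in this environment.
    dataset_registry = None

if dataset_registry is not None:
    print(summarize_splits(dataset_registry, "product_sentiment_machine_hack"))
```

Passing the registry as an argument keeps the helper easy to try against a stand-in object before the benchmark suite is installed.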

Install AutoGluon

This repository contains the particular version of AutoGluon for which we previously reported benchmark performance. We recommend installing it in a fresh virtualenv. To use this version, you first need to install MXNet as a prerequisite. We recommend the MXNet 1.8 wheel built for CUDA 11.0; a CPU-only wheel is also available:

# CPU-only
python3 -m pip install https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0-py2.py3-none-manylinux2014_x86_64.whl

# CUDA 11 Version
python3 -m pip install https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl
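
After installing one of the wheels, it can help to confirm that MXNet imports cleanly before moving on. The sketch below is one way to do that; the helper name `mxnet_status` is hypothetical, and the GPU count relies on `mxnet.context.num_gpus()` from the MXNet 1.x API. It returns `None` if MXNet cannot be imported, for example when the package is missing or a CUDA build cannot find its shared libraries.

```python
import importlib


def mxnet_status():
    """Return MXNet version and visible GPU count, or None if import fails."""
    try:
        mx = importlib.import_module("mxnet")
    except (ImportError, OSError):
        # ImportError: package missing; OSError: native library failed to load.
        return None
    try:
        gpus = mx.context.num_gpus()
    except Exception:
        # The GPU query itself failed; report the version only.
        gpus = None
    return {"version": mx.__version__, "gpus": gpus}


status = mxnet_status()
if status is None:
    print("MXNet is not importable in this environment.")
else:
    print("MXNet", status["version"], "GPUs visible:", status["gpus"])
```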

Once you have MXNet, you can install our version of AutoGluon:

cd autogluon
bash full_install.sh

For more information or if you want to run a different version of AutoGluon, please refer to the AutoGluon website.

Run Experiments

Go to multimodal_text_benchmark/scripts/benchmark to see how to run different ML methods over the benchmark.
