A lightweight, GPU-accelerated SQL engine built on the RAPIDS.ai ecosystem.
Getting Started | Documentation | Examples | Contributing | License | Blog | Try Now
BlazingSQL is a GPU accelerated SQL engine built on top of the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
BlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.
- Query Data Stored Externally - a single line of code can register remote storage solutions, such as Amazon S3.
- Simple SQL - incredibly easy to use, run a SQL query and the results are GPU DataFrames (GDFs).
- Interoperable - GDFs are immediately accessible to any RAPIDS library for data science workloads.
Try our 5-min Welcome Notebook to start using BlazingSQL and RAPIDS AI.
Here are two copy-and-paste reproducible BlazingSQL snippets; keep scrolling to find example notebooks below.
Create and query a table from a cudf.DataFrame:
```python
import cudf
from blazingsql import BlazingContext

# Build a cuDF DataFrame with some sample data
df = cudf.DataFrame()
df['key'] = ['a', 'b', 'c', 'd', 'e']
df['val'] = [7.6, 2.9, 7.1, 1.6, 2.2]

# Create a BlazingContext, register the DataFrame as a table, and query it
bc = BlazingContext()
bc.create_table('game_1', df)
bc.sql('SELECT * FROM game_1 WHERE val > 4')
```
| | key | val |
|---|---|---|
| 0 | a | 7.6 |
| 2 | c | 7.1 |
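The result of `bc.sql` is itself a GPU DataFrame. cuDF deliberately mirrors the pandas API, so the same filter can be sketched on CPU with pandas (used here only as a stand-in, since running cuDF itself requires a GPU):

```python
import pandas as pd

# Same sample data as the cuDF snippet above
df = pd.DataFrame({
    'key': ['a', 'b', 'c', 'd', 'e'],
    'val': [7.6, 2.9, 7.1, 1.6, 2.2],
})

# Equivalent of: SELECT * FROM game_1 WHERE val > 4
result = df[df['val'] > 4]
print(result)
#   key  val
# 0   a  7.6
# 2   c  7.1
```

Swapping `import pandas as pd` for `import cudf` gives the GPU version of the same boolean-mask filter.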
Create and query a table from an AWS S3 bucket:
```python
from blazingsql import BlazingContext

bc = BlazingContext()

# Register the S3 bucket once, then reference files in it by URI
bc.s3('blazingsql-colab', bucket_name='blazingsql-colab')
bc.create_table('taxi', 's3://blazingsql-colab/yellow_taxi/taxi_data.parquet')
bc.sql('SELECT passenger_count, trip_distance FROM taxi LIMIT 2')
```
| | passenger_count | trip_distance |
|---|---|---|
| 0 | 1.0 | 1.1 |
| 1 | 1.0 | 0.7 |
You can find our full documentation at docs.blazingdb.com.
BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel:
Note: BlazingSQL is supported only on Linux, and only with Python versions 3.6 and 3.7.
```shell
conda install -c blazingsql/label/cuda$CUDA_VERSION -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION
```
Where `$CUDA_VERSION` is 10.0, 10.1, or 10.2 and `$PYTHON_VERSION` is 3.6 or 3.7. For example, for CUDA 10.0 and Python 3.7:
```shell
conda install -c blazingsql/label/cuda10.0 -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.7
```
For nightly builds, use the `blazingsql-nightly` and `rapidsai-nightly` channels instead:

```shell
conda install -c blazingsql-nightly/label/cuda$CUDA_VERSION -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION
```
Where `$CUDA_VERSION` is 10.0, 10.1, or 10.2 and `$PYTHON_VERSION` is 3.6 or 3.7. For example, for CUDA 10.0 and Python 3.7:
```shell
conda install -c blazingsql-nightly/label/cuda10.0 -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.7
```
This is the recommended way of building all of the BlazingSQL components and dependencies from source. It ensures that all the dependencies are available to the build process.
```shell
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
conda install --yes -c conda-forge openjdk=8.0 maven cmake gtest gmock rapidjson cppzmq cython=0.29 jpype1 netifaces pyhive
conda install --yes -c conda-forge -c blazingsql bsql-toolchain
conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults cudf=0.14 dask-cudf=0.14 dask-cuda=0.14 cudatoolkit=$CUDA_VERSION
```
Where `$CUDA_VERSION` is 10.0, 10.1, or 10.2 and `$PYTHON_VERSION` is 3.6 or 3.7. For example, for CUDA 10.0 and Python 3.7:
```shell
conda create -n bsql python=3.7
conda activate bsql
conda install --yes -c conda-forge openjdk=8.0 maven cmake gtest gmock rapidjson cppzmq cython=0.29 jpype1 netifaces pyhive
conda install --yes -c conda-forge -c blazingsql bsql-toolchain
conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults cudf=0.14 dask-cudf=0.14 dask-cuda=0.14 cudatoolkit=10.0
```
The build process will check out the BlazingSQL repository, then build and install it into the conda environment.
```shell
cd $CONDA_PREFIX
git clone https://github.com/BlazingDB/blazingsql.git
cd blazingsql
git checkout master
export CUDACXX=/usr/local/cuda/bin/nvcc
./build.sh
```
NOTE: You can run `./build.sh -h` to see more build options.
`$CONDA_PREFIX` now has a folder for the blazingsql repository.
```shell
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
conda install --yes -c conda-forge google-cloud-cpp ninja
conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.15 dask-cudf=0.15 cudf=0.15 python=3.7 cudatoolkit=$CUDA_VERSION
conda install --yes -c conda-forge cmake gtest gmock cppzmq cython=0.29 openjdk=8.0 maven thrift=0.13.0 jpype1 netifaces pyhive
```
Where `$CUDA_VERSION` is 10.0, 10.1, or 10.2 and `$PYTHON_VERSION` is 3.6 or 3.7. For example, for CUDA 10.0 and Python 3.7:
```shell
conda create -n bsql python=3.7
conda activate bsql
conda install --yes -c conda-forge google-cloud-cpp ninja
conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.15 dask-cudf=0.15 cudf=0.15 python=3.7 cudatoolkit=10.0
conda install --yes -c conda-forge cmake gtest gmock cppzmq cython=0.29 openjdk=8.0 maven thrift=0.13.0 jpype1 netifaces pyhive
```
The build process will check out the BlazingSQL repository, then build and install it into the conda environment.
```shell
cd $CONDA_PREFIX
git clone https://github.com/BlazingDB/blazingsql.git
cd blazingsql
export CUDACXX=/usr/local/cuda/bin/nvcc
./build.sh
```
NOTE: You can run `./build.sh -h` to see more build options.
`$CONDA_PREFIX` now has a folder for the blazingsql repository.
Have questions or feedback? Post a new GitHub issue.
Please see our guide for contributing to BlazingSQL.
Feel free to join our channel (#blazingsql) in the RAPIDS-GoAi Slack.
You can also email us at info@blazingsql.com or find out more details on BlazingSQL.com.
The RAPIDS suite of open-source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.