forked from wesm/ibis

Productivity-centric Python big data framework for high performance at Hadoop-scale, with first-class integration with Impala. Co-founded by the creator of pandas

Ibis: Python data analysis framework for Hadoop and SQL engines

Install Ibis from PyPI with:

$ pip install ibis-framework

Ibis is a Python data analysis library with a handful of related goals:

  • Enable data analysts to express analytics on SQL engines in Python code instead of SQL code.
  • Provide high-level analytics APIs and workflow tools to accelerate productivity.
  • Provide high-performance extensions for the Impala MPP query engine so that Python code can run efficiently in a scalable, Hadoop-like environment.
  • Abstract away database-specific SQL differences.
  • Integrate with the Python data ecosystem using the above tools.

At this time, Ibis supports the following SQL-based systems:

  • Impala (on HDFS)
  • SQLite

Ibis is being designed and led by the creator of pandas (github.com/pydata/pandas) and is intended to offer a familiar user interface to people used to working with small data on single machines in Python.

Architecturally, Ibis features:

  • A pandas-like domain-specific language (DSL) designed specifically for analytics, known as Ibis expressions, that enables composable, reusable analytics on structured data. If you can express something with a SQL SELECT query, you can write it with Ibis.
  • A translation system that targets multiple SQL systems
  • Tools for wrapping user-defined functions in Impala and eventually other SQL engines
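
To make the first two points concrete, here is a minimal toy sketch of the general idea: a deferred expression is built in Python, compiled to a SQL SELECT string, and only then executed by a backend. The classes `Table`, `Column`, and `Predicate` below are hypothetical and exist only for illustration; they are not Ibis's API or its implementation. The sketch uses only the standard library, runs against an in-memory SQLite database, and assumes a made-up `trips` table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A single comparison, e.g. distance > 5 (hypothetical helper)."""
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class Column:
    """A deferred column reference; comparing it builds a Predicate."""
    name: str

    def __gt__(self, value):
        return Predicate(self.name, ">", value)


@dataclass(frozen=True)
class Table:
    """A deferred table expression; nothing runs until the SQL is executed."""
    name: str
    predicates: tuple = ()

    def __getitem__(self, column):
        return Column(column)

    def filter(self, predicate):
        # Return a new expression instead of mutating, so expressions compose.
        return Table(self.name, self.predicates + (predicate,))

    def compile(self, quote='"'):
        # `quote` stands in for one backend-specific difference:
        # the character used to quote identifiers.
        def q(identifier):
            return f"{quote}{identifier}{quote}"

        sql = f"SELECT * FROM {q(self.name)}"
        params = []
        if self.predicates:
            clauses = []
            for p in self.predicates:
                clauses.append(f"{q(p.column)} {p.op} ?")
                params.append(p.value)
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params


# Made-up sample data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (city TEXT, distance REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 2.5), ("Lima", 11.0), ("Pune", 7.2)],
)

trips = Table("trips")
long_trips = trips.filter(trips["distance"] > 5)  # builds an expression only

sql, params = long_trips.compile()
rows = con.execute(sql, params).fetchall()
print(sql)
print(sorted(rows))
```

Because `long_trips` is just a value, it can be filtered further or compiled again with a different `quote` argument for a backend that quotes identifiers differently. The real library covers far more of SQL than this single-filter sketch.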

SQL engine support on the near horizon:

  • PostgreSQL
  • Redshift
  • Vertica
  • Spark SQL
  • Presto
  • Hive
  • MySQL / MariaDB

Read the project blog at http://blog.ibis-project.org.

Learn much more at http://ibis-project.org.
