
alanhdu/blaze

 
 


Blaze extends the usability of NumPy and Pandas to distributed and out-of-core computing. Blaze provides an interface similar to that of the NumPy ND-Array or Pandas DataFrame but maps these familiar interfaces onto a variety of other computational engines like Postgres or Spark.

Example

Blaze separates the computations that we want to perform:

>>> from blaze import Symbol, compute

>>> accounts = Symbol('accounts', 'var * {id: int, name: string, amount: int}')

>>> deadbeats = accounts[accounts.amount < 0].name

from the representation of the data:

>>> L = [[1, 'Alice',   100],
...      [2, 'Bob',    -200],
...      [3, 'Charlie', 300],
...      [4, 'Denis',   400],
...      [5, 'Edith',  -500]]

Blaze enables users to solve data-oriented problems by computing an expression against the data:

>>> list(compute(deadbeats, L))
['Bob', 'Edith']
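
For intuition, the result above matches what a hand-written filter over the same rows produces. The snippet below is only a plain-Python sketch of what the expression means; it does not use Blaze and says nothing about how Blaze evaluates the expression internally. The variable names `rows` and `deadbeat_names` are illustrative.

```python
# Same rows as in the example above: [id, name, amount].
rows = [[1, 'Alice',   100],
        [2, 'Bob',    -200],
        [3, 'Charlie', 300],
        [4, 'Denis',   400],
        [5, 'Edith',  -500]]

# Keep the name of every row whose amount is negative.
deadbeat_names = [name for _id, name, amount in rows if amount < 0]

print(deadbeat_names)
```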

But the separation of expression from data allows us to switch between different backends.

Here we solve the same problem using Pandas instead of pure Python.

>>> from pandas import DataFrame

>>> df = DataFrame(L, columns=['id', 'name', 'amount'])

>>> compute(deadbeats, df)
1      Bob
4    Edith
Name: name, dtype: object

Blaze doesn't compute these results itself; it drives other projects to compute them instead. These projects range from simple pure Python iterators to powerful distributed Spark clusters. Blaze is built to be extended to new systems as they evolve.
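
The idea of one expression driving several backends can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Blaze's implementation: the `NegativeAmountNames` class and the `run` function are hypothetical names invented for this sketch, SQLite stands in for a database backend such as Postgres, and dispatch is on the data argument only.

```python
import sqlite3
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class NegativeAmountNames:
    """Hypothetical expression: names of rows whose amount is below zero."""
    table: str


@singledispatch
def run(data, expr):
    # Fallback for data types that have no registered backend.
    raise NotImplementedError(f"no backend for {type(data).__name__}")


@run.register(list)
def _run_list(data, expr):
    # Pure Python backend: iterate over the rows directly.
    return [name for _id, name, amount in data if amount < 0]


@run.register(sqlite3.Connection)
def _run_sqlite(data, expr):
    # SQL backend: translate the expression into a query and let SQLite run it.
    # The table name is trusted here; a real system would validate it.
    query = f"SELECT name FROM {expr.table} WHERE amount < 0 ORDER BY id"
    return [row[0] for row in data.execute(query)]


rows = [[1, 'Alice', 100], [2, 'Bob', -200], [3, 'Charlie', 300],
        [4, 'Denis', 400], [5, 'Edith', -500]]
expr = NegativeAmountNames(table="accounts")

# Load the same rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)

from_list = run(rows, expr)
from_sql = run(conn, expr)
print(from_list, from_sql)
```

Supporting a new system in this sketch means registering one more function for its data type; the expression itself stays unchanged.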

Usable Abstractions

Blaze includes a rich set of computational and data primitives useful in building and communicating between computational systems. Blaze primitives can help with consistent and robust data migration, as well as remote execution.

Blaze aims to be a foundational project that lets users of other PyData projects (Pandas, Theano, Numba, SciPy, Scikit-Learn) interoperate at both the application level and the library level, with the goal of lifting their existing functionality into a distributed context.

Getting Started

Development installation instructions and a quick usage guide are available in the documentation.

Blaze is in development. We reserve the right to break the API.

Blaze needs your help. Blaze needs users with interesting problems. Blaze needs developers with expertise in new data formats and computational backends. Blaze needs core developers to tie everything together. Please e-mail the mailing list.

Source code for the latest development version of Blaze can be obtained from GitHub.

Documentation

Documentation is available at blaze.pydata.org/

License

Blaze development is sponsored by Continuum Analytics.

Released under the BSD license. See LICENSE.txt for details.
