

Sparkmagic is a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The Sparkmagic project includes a set of magics for interactively running Spark code in multiple languages, as well as some kernels that you can use to turn Jupyter into an integrated Spark environment.

Automatic SparkContext and SQLContext creation

Automatic visualization



  • Run Spark code in multiple languages against any remote Spark cluster through Livy

  • Automatic visualization of SQL queries with the %%sql magic in the PySpark and Spark kernels; use an easy visual interface to interactively construct visualizations, no code required

  • Capture the output of SQL queries as Pandas dataframes to work with them on your local machine


Check out the examples directory.


  1. Install the library

     pip install sparkmagic
  2. Make sure that ipywidgets is properly installed by running

     jupyter nbextension enable --py --sys-prefix widgetsnbextension 
  3. (Optional) Install the wrapper kernels. Run pip show sparkmagic to find the path where sparkmagic is installed, cd to that location, and run:

     jupyter-kernelspec install sparkmagic/kernels/sparkkernel
     jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
  4. (Optional) Modify the configuration file at ~/.sparkmagic/config.json. See example_config.json for a sample.
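
Before editing the configuration file, it can help to see which settings are already present. The following is a minimal sketch that only assumes the file, if it exists, contains a JSON object at the path given in step 4. The helper name `load_config` is hypothetical and is not part of sparkmagic.

```python
import json
from pathlib import Path

# Path taken from step 4 above; the file is optional and may not exist.
CONFIG_PATH = Path.home() / ".sparkmagic" / "config.json"


def load_config(path=CONFIG_PATH):
    """Return the parsed config as a dict, or {} if the file is absent.

    Hypothetical helper for inspection only; sparkmagic has its own loader.
    """
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling `sorted(load_config())` lists the top-level setting names currently overridden, which can then be compared against example_config.json.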


Sparkmagic uses Livy, a REST server for Spark, to remotely execute all user code. The library then automatically collects the output of your code as plain text or a JSON document, displaying the results to you as formatted text or as a Pandas dataframe as appropriate.
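
To make the round trip concrete, here is a rough sketch of the kind of REST traffic involved, based on Livy's public sessions and statements endpoints. It only builds the requests and reads a response dictionary, so it runs without a cluster. The function names and the `LIVY_URL` value are assumptions for illustration, and this is not Sparkmagic's actual client code.

```python
import json

# Assumption: a Livy server on its default port on the local machine.
LIVY_URL = "http://localhost:8998"


def create_session_request(base_url, kind):
    """Build the request that asks Livy to start an interactive session.

    `kind` selects the language, e.g. "spark" (Scala) or "pyspark".
    """
    return "POST", f"{base_url}/sessions", json.dumps({"kind": kind})


def run_statement_request(base_url, session_id, code):
    """Build the request that submits a snippet of user code to a session."""
    url = f"{base_url}/sessions/{session_id}/statements"
    return "POST", url, json.dumps({"code": code})


def extract_output(statement):
    """Return the payload of a finished statement: JSON if present, else text."""
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        raise RuntimeError(output.get("evalue", "statement failed"))
    data = output["data"]
    if "application/json" in data:
        return data["application/json"]
    return data.get("text/plain")


# Illustrative response body, shaped like what Livy returns for a statement.
sample = {"id": 0, "state": "available",
          "output": {"status": "ok", "data": {"text/plain": "2"}}}
print(run_statement_request(LIVY_URL, 0, "1 + 1"))
print(extract_output(sample))
```

A real client would send these requests with an HTTP library and poll the statement until its state is `available` before reading the output.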


This architecture offers us some important advantages:

  1. Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server

  2. Multi-language support; the Python and Scala kernels are equally feature-rich, and adding support for more languages will be easy

  3. Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters

  4. Easy integration with any Python library for data science or visualization, like Pandas or Plotly

However, there are some important limitations to note:

  1. Sending all code and output through Livy adds some overhead

  2. Since all code is run on a remote driver through Livy, all structured data must be serialized to JSON and parsed by the Sparkmagic library so that it can be manipulated and visualized on the client side. In practice this means that you must use Python for client-side data manipulation in %%local mode.
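
The second limitation can be illustrated with a small client-side sketch. It assumes the remote driver emits one JSON object per row (as Spark's `DataFrame.toJSON` does) and shows how such text could be parsed into records and columns using only the standard library. The function names are hypothetical and do not reflect Sparkmagic's internal parser.

```python
import json


def parse_records(text):
    """Parse newline-delimited JSON (one object per row) into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_columns(records):
    """Pivot row dicts into a column-name -> values mapping.

    Keys missing from a row are filled with None so all columns stay aligned.
    """
    columns = {}
    for i, record in enumerate(records):
        for key in record:
            # Back-fill a newly seen column for the rows already processed.
            columns.setdefault(key, [None] * i)
        for key, values in columns.items():
            values.append(record.get(key))
    return columns


# Illustrative payload, standing in for text returned through Livy.
payload = '{"name": "a", "count": 1}\n{"name": "b", "count": 2}\n'
records = parse_records(payload)
print(to_columns(records))
```

The list of dicts returned by `parse_records` is also a shape that `pandas.DataFrame` accepts directly, which is why client-side manipulation ends up happening in Python.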


We welcome contributions from everyone. If you've made an improvement to our code, please send us a pull request.

To install for development, run the following:

    git clone
    pip install -e hdijupyterutils 
    pip install -e autovizwidget
    pip install -e sparkmagic

and optionally follow steps 3 and 4 above.

To run unit tests, run:

    nosetests hdijupyterutils autovizwidget sparkmagic

If you want to see an enhancement made but don't have time to work on it yourself, feel free to submit an issue for us to deal with.








