
depristo/pygg

 
 


pygg

ggplot2 syntax in Python. Actually a wrapper around Wickham's ggplot2 in R.

Particularly good if you have preprocessed CSVs or Postgres data to render. Passable support for simple data in Python lists, dictionaries, and pandas DataFrame objects.

pygg allows you to use ggplot2 syntax nearly verbatim in Python and execute the ggplot program in R. Because it is just a wrapper that passes all arguments to the R backend, it is almost completely API compatible.

For a nearly exhaustive list of supported ggplot2 functions, see pygg/gen_cmd.py
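To illustrate how a list of ggplot2 function names can become Python callables that emit R code, here is a minimal sketch. It is not pygg's code: the factory `make_r_function`, the `wrappers` dict, and the short name list are hypothetical, and it assumes arguments are inserted verbatim into the R call, as the quirks section below describes.

```python
def make_r_function(name):
    """Return a callable that renders an R call to `name` (hypothetical helper)."""
    def r_function(*args, **kwargs):
        # Arguments are inserted verbatim: positional first, then key=value pairs.
        parts = [str(arg) for arg in args]
        parts += [f"{key}={value}" for key, value in kwargs.items()]
        return f"{name}({','.join(parts)})"
    r_function.__name__ = name
    return r_function


# A hypothetical subset of ggplot2 function names; pygg's own list is in pygg/gen_cmd.py.
GGPLOT2_NAMES = ["geom_point", "geom_line", "facet_wrap", "scale_x_continuous"]

# One Python wrapper per ggplot2 function name.
wrappers = {name: make_r_function(name) for name in GGPLOT2_NAMES}

geom_point = wrappers["geom_point"]
print(geom_point(alpha=0.5))  # geom_point(alpha=0.5)
```

Generating wrappers from a name list keeps the Python side thin: any new ggplot2 function only needs its name added, since R does all the real work.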

Setup and Usage

Setup

  • install R
# on macOS
brew install r

# on Linux, e.g., Ubuntu
sudo apt-get install r-base
  • install R packages (run the following in the R shell)
install.packages("ggplot2")
install.packages("RPostgreSQL")   # optional

Install

pip install pygg

Command line usage

runpygg.py --help
runpygg.py -c "ggplot('diamonds', aes('carat', 'price')) + geom_point()" -o test.pdf
runpygg.py -c "ggplot('diamonds', aes('carat', 'price')) + geom_point()" -csv foo.csv

For Python usage, see tests/example.py

from pygg import *

# Example using diamonds dataset (comes with ggplot2)
p = ggplot('diamonds', aes('carat', y='price'))
g = geom_point() + facet_wrap(None, "color")
ggsave("test1.pdf", p+g, data=None)

Quirks to be aware of

The library performs a simple syntactic translation from python ggplot objects to R code. Because of this, there are some quirks regarding datasets and how we deal with strings.

Datasets

In R, ggplot directly references a data frame object present in the R runtime (e.g., ggplot(<datasetname>, aes(...))). However, the Python objects being plotted are not directly available in the R runtime.
We work around this by passing the Python data object as the data argument to ggsave, which converts it to CSV, writes it to a temp file, and loads it into a variable named data in R for use with the ggplot2 functions.

For example:

    import pandas
    from pygg import *

    df = pandas.DataFrame(...)
    p = ggplot('data', aes(...)) + geom_point()
    ggsave("out.pdf", p, data=df)
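The CSV round trip described above can be sketched in plain Python. This is a minimal illustration, not pygg's implementation: `write_columnar_csv` and `build_r_script` are hypothetical names, and the sketch assumes the R side reads the file with `read.csv` into a variable named `data` and saves with ggplot2's `ggsave(filename, plot)`.

```python
import csv
import os
import tempfile


def write_columnar_csv(columns):
    """Write a {column: [values]} dict to a temporary CSV file; return its path (hypothetical)."""
    names = list(columns)
    # zip(*...) transposes the column lists into rows.
    rows = zip(*(columns[name] for name in names))
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as handle:
        writer = csv.writer(handle)
        writer.writerow(names)
        writer.writerows(rows)
        return handle.name


def build_r_script(plot_code, csv_path, out_path):
    """Assemble an R program that loads `data` from CSV, builds the plot, and saves it."""
    # repr() yields a quoted string literal; for ordinary file paths this is also valid R.
    return "\n".join([
        "library(ggplot2)",
        f"data <- read.csv({csv_path!r})",
        f"p <- {plot_code}",
        f"ggsave({out_path!r}, p)",
    ])


if __name__ == "__main__":
    path = write_columnar_csv({"x": [1, 2], "y": [3, 4]})
    print(build_r_script("ggplot(data, aes(x, y)) + geom_point()", path, "out.pdf"))
    os.remove(path)
```

Passing data through a file keeps the two runtimes decoupled: R only ever sees a CSV path, so no Python/R bridge library is needed.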

In addition, we provide several conveniences that generate the appropriate R code for common Python data formats:

  • CSV file: if you already have a CSV file, pass its filename as data
        p = ggplot('data', aes(...)) + geom_point()
        ggsave("out.pdf", p, data="file.csv")
  • Python object: if your data is a Python object in columnar ({'x': [1,2], 'y': [3,4]}) or row ([{'x':1,'y':3}, {'x':2,'y':4}]) format
        p = ggplot('data', aes(...)) + geom_point()
        ggsave("out.pdf", p, data={'x': [1,2], 'y': [3,4]})
  • pandas DataFrame: if your data is already a pandas DataFrame, pass it directly as data
        p = ggplot('data', aes(...)) + geom_point()
        ggsave("out.pdf", p, data=df)
  • PostgreSQL: if your data is stored in a Postgres database, use data_sql with the database name and a query
        p = ggplot('data', aes(...)) + geom_point()
        ggsave("out.pdf", p, data=data_sql('DBNAME', 'SELECT * FROM ...'))
  • existing R datasets: refer by name to any data frame already available in R (such as diamonds, which comes with ggplot2) and pass data=None
        p = ggplot('diamonds', aes(...)) + geom_point()
        ggsave("out.pdf", p, data=None)
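The list above accepts both columnar and row-oriented Python objects. A minimal sketch of normalizing the two shapes before writing them out, assuming a hypothetical `to_columns` helper that is not part of pygg's documented API (a pandas DataFrame could instead be written with its own `to_csv` method):

```python
def to_columns(data):
    """Normalize columnar ({'x': [...]}) or row ([{'x': ...}, ...]) data to columnar form (hypothetical)."""
    if isinstance(data, dict):
        # Already columnar; every column must have the same length.
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("columns have different lengths")
        return {key: list(values) for key, values in data.items()}
    if isinstance(data, list):
        if not data:
            return {}
        # Row format: column names come from the first row; other rows are assumed to share them.
        names = list(data[0])
        return {name: [row[name] for row in data] for name in names}
    raise TypeError(f"unsupported data type: {type(data).__name__}")


print(to_columns([{"x": 1, "y": 3}, {"x": 2, "y": 4}]))  # {'x': [1, 2], 'y': [3, 4]}
```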

String arguments

By default, the library inserts a Python string argument verbatim into the generated R code. For example, the following Python code, intended to set the x-axis label, generates invalid R code:

    # incorrect python code
    scale_x_continuous(name="special label")

    # incorrect generated R code
    scale_x_continuous(name=special label)

    # correct python code
    scale_x_continuous(name="'special label'")

    # correct generated R code
    scale_x_continuous(name='special label')

    # less convenient but more explicit alternative syntax
    scale_x_continuous(name=pygg.esc('special label'))
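Simply wrapping a value in quotes breaks as soon as the label itself contains a quote (e.g. "it's"). A minimal sketch of a safer quoting helper, assuming a hypothetical `r_string` function; this is not pygg.esc, whose exact escaping behavior is not documented here:

```python
def r_string(value):
    """Quote a Python string as an R single-quoted string literal (hypothetical helper)."""
    # Escape backslashes first, then the quote character itself.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


print(f"scale_x_continuous(name={r_string('special label')})")
# scale_x_continuous(name='special label')
```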

You'll need to explicitly wrap these types of strings (intended as R strings) in a layer of quotes. For convenience, we automatically provide wrapping for common functions:

    # "filename.pdf" is wrapped
    ggsave("filename.pdf", p)

Questions

Alternatives

  • yhat's ggplot: yhat's port of ggplot is really awesome. It runs everything natively in Python, works with NumPy data structures, and renders using matplotlib. pygg exists partly due to personal preference, and partly because the R version of ggplot2 is more mature and its layout algorithms are very good.

  • pyggplot: Pyggplot does not adhere strictly to R's ggplot syntax but pythonifies it, making it harder to transpose ggplot2 examples. Also pyggplot requires rpy2.

Languages

  • Python 97.8%
  • R 2.2%