
pandalone: wrapping pandas in trees


Release:       0.0.1-dev.1
Documentation: https://pandalone.readthedocs.org/
Source:        https://github.com/pandalone/pandalone
PyPI repo:     https://pypi.python.org/pypi/pandalone
Keywords:      utility, library, data, tree, processing, calculation, dependencies, resolution, pandas, dictionaries, maps, lists, scientific, engineering
Copyright:     2015 European Commission (JRC-IET)
License:       EUPL 1.1+

pandalone is a python library for processing hierarchical data (json, hdf5, pandas), for scientific and engineering exploration.

Introduction

Overview

An "execution" or a "run" of a calculation is depicted in the following diagram:

.---------------------.       _____________       .----------------------------.
;      DataTree       ;      |             |      ;          DataTree          ;
;---------------------; ==>  | <some code> | ==>  ;----------------------------;
;                     ;      |_____________|      ;                            ;
'---------------------'                           '----------------------------'

The Input & Output Data are instances of data-tree, trees of strings and numbers, assembled with:

  • sequences,
  • dictionaries,
  • pandas.DataFrame,
  • pandas.Series, and
  • URI-references to other data-trees/paths.
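In its simplest form, the diagram is a function from one data-tree to another. The following is a minimal sketch in plain Python. It assumes a hypothetical `run()` function and made-up keys `params`/`samples`, and it is not pandalone's API. The input tree is only read, and a fresh output tree is returned.

```python
def run(input_tree):
    """Hypothetical "<some code>" box: read an input data-tree, build a new output data-tree.

    The input is never mutated, so the same input tree can be re-run or compared later.
    """
    samples = input_tree["params"]["samples"]  # hypothetical path inside the input tree
    return {
        "results": {
            "count": len(samples),
            "mean": sum(samples) / len(samples),
        }
    }


input_tree = {"params": {"samples": [1.0, 2.0, 3.0]}}
output_tree = run(input_tree)
print(output_tree)  # {'results': {'count': 3, 'mean': 2.0}}
```

Keeping the input read-only makes a "run" repeatable: feeding the same input tree again yields an equal output tree.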

Quick-start

Note

The program runs on Python-2.7+ and Python-3.3+ (preferred) and requires numpy/scipy, pandas and win32 libraries, along with their native backends, to be installed. If you do not have such an environment already installed, please read the install section below for suitable distributions such as WinPython or Anaconda.

Assuming that you have a working python-environment, open a command-shell (in Windows use cmd.exe, BUT ensure python.exe is in its PATH) and try the following commands:

Tip

The commands beginning with $, below, imply a Unix-like operating system with a POSIX shell (Linux, OS X). Although the commands are simple and easy to translate into their Windows cmd.exe counterparts, it would be worthwhile to install Cygwin to get the same environment on Windows. If you choose to do that, also include the following packages in Cygwin's installation wizard:

* git, git-completion
* make, zip, unzip, bzip2, dos2unix
* openssh, curl, wget

But do not install/rely on cygwin's outdated python environment.

Install
$ pip install pandalone                 ## Use `--pre` if version-string has a build-suffix.

Or, in case you need the very latest from the master branch:

$ pip install git+https://github.com/pandalone/pandalone.git

See: install

Run
$ pandalone --version

Install

The current version runs on Python-2.7+ and Python-3.3+ and requires numpy/scipy, pandas and win32 libraries along with their native backends to be installed.

It has been tested under Windows and Linux, and Python-3.3+ is the preferred interpreter, i.e. the Excel interface and desktop-UI run only with it.

It is distributed as Wheels.

Python installation

Warning

On Windows it is strongly suggested NOT to install the standard CPython distribution, unless:

  1. you have administrative privileges,
  2. you are an experienced python programmer, so that
  3. you know how to hunt dependencies from PyPi repository and/or the Unofficial Windows Binaries for Python Extension Packages.

As explained above, this project depends on packages with native-backends that require the use of C and Fortran compilers to build from sources. To avoid this hassle, you should choose one of the user-friendly distributions suggested below.

Below is a matrix of the two suggested self-wrapped python distributions for running this program (the default python shipped with Linux is excluded here). Both distributions:

  • are free (as in freedom),
  • do not require admin-rights for installation in Windows, and
  • have been tested to run this program successfully (it has also been tested on default Linux distros).
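
====================  ====================================  ========================================================
Distributions         WinPython                             Anaconda
====================  ====================================  ========================================================
Platform              Windows                               Windows, Mac OS, Linux

Ease of Installation  Fair (requires fiddling with the      Anaconda: Easy
                      PATH and the Registry after           MiniConda: Moderate
                      install)

Ease of Use           Easy                                  Moderate (should use conda and/or pip,
                                                            depending on whether a package
                                                            contains native libraries)

# of Packages         Only what's included in the           Many 3rd-party packages uploaded
                      downloaded archive                    by users

Notes                 After installation, see faq for:      Check also the installation
                      - registering the WinPython           instructions from the pandas site:
                        installation                        http://pandas.pydata.org/pandas-docs/stable/install.html
                      - adding your installation to PATH
====================  ====================================  ========================================================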

Package installation

Before installing it, make sure that there are no older versions left over on the python installation you are using. To cleanly uninstall it, run this command until you cannot find any project installed:

$ pip uninstall pandalone                   ## Use `pip3` if both python-2 & 3 are in PATH.

You can install the project directly from the PyPI repo the "standard" way, by typing pip in the console:

$ pip install pandalone
  • If you want to install a pre-release version (the version-string is not plain numbers, but ends with alpha, beta.2 or something else), additionally use --pre:

    $ pip install pandalone --pre
  • Also you can install the very latest version straight from the sources:

    $ pip install git+https://github.com/pandalone/pandalone.git  --pre
  • If you want to upgrade an existing installation along with all its dependencies, also add --upgrade (or its equivalent, -U). The build might then take considerable time to finish. The upgraded libraries might also break existing programs(!), so use it with caution, or from within a virtualenv (isolated Python environment).
  • To install it for different Python environments, repeat the procedure using the appropriate python.exe interpreter for each environment.
  • Tip

    To debug installation problems, you can export a non-empty DISTUTILS_DEBUG and distutils will print detailed information about what it is doing and/or print the whole command line when an external program (like a C compiler) fails.

After installation, it is important that you check which version is visible in your PATH:

$ pandalone --version
0.0.1-dev.1
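You can make the same check from inside Python. This tells you what the current interpreter sees, which may differ from the `pandalone` script found first in PATH when several environments are installed. The following sketch assumes Python 3.8+, where the stdlib `importlib.metadata` module exists. That is newer than the Python versions this project targets. `installed_version()` is a hypothetical helper name, and the reported string may appear in normalized PEP 440 form.

```python
from importlib import metadata  # stdlib since Python 3.8


def installed_version(dist_name):
    """Return the installed version of distribution `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string seen by *this* interpreter, or None if pandalone
# is not installed in its environment.
print(installed_version("pandalone"))
```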

To install for different Python versions, repeat the procedure for every required version.

Older versions

To install an older released version issue the console command:

$ pip install pandalone==0.0.1                  ## Use `--pre` if version-string has a build-suffix.

or alternatively straight from the sources:

$ pip install git+https://github.com/pandalone/pandalone.git@v0.0.9-alpha.3.1  --pre

Of course, you can substitute v0.0.9-alpha.3.1 with any slug from "commits", "branches" or "releases" that you will find on the project's GitHub repo.
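When you script such installs, invoking pip through `sys.executable -m pip` guarantees that the package lands in the environment of the interpreter running the script. This matters when several Python installations are on PATH. The sketch below uses hypothetical helpers, `vcs_install_cmd()` and `install()`. The command is only built and printed here, not run.

```python
import subprocess
import sys

REPO_URL = "https://github.com/pandalone/pandalone.git"


def vcs_install_cmd(ref, pre=True):
    """Build a pip command installing `ref` (a commit, branch or tag) from the git repo."""
    cmd = [sys.executable, "-m", "pip", "install", "git+%s@%s" % (REPO_URL, ref)]
    if pre:
        cmd.append("--pre")
    return cmd


def install(ref, pre=True):
    """Actually run pip for the current interpreter (needs network access)."""
    subprocess.check_call(vcs_install_cmd(ref, pre))


cmd = vcs_install_cmd("v0.0.9-alpha.3.1")
print(" ".join(cmd))
```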

Note

If you have another version already installed, you have to use --ignore-installed (or -I). For using the specific version, check this (untested) stackoverflow question.

Alternatively, you can install each version in a separate virtualenv (isolated Python environment) and avoid all this.

Installing sources

If you download the sources you have more options for installation. There are various methods to get hold of them:

When working with sources, you need to have installed all libraries that the project depends on:

$ pip install -r requirements/execution.txt .

The previous command installs a "snapshot" of the project as it is found in the sources. If you wish to link the project's sources with your python environment, install the project in development mode:

$ python setup.py develop

Note

This last command installs any missing dependencies inside the project-folder.

Project files and folders

The files and folders of the project are listed below:

+--pandalone/       ## (package) The python-code of the calculator
+--tests/           ## (package) Test-cases
+--docs/            ## Documentation folder
+--setup.py         ## (script) The entry point for `setuptools`, installing, testing, etc
+--requirements/    ## (txt-files) Various pip-dependencies for tools.
+--README.rst
+--CHANGES.rst
+--LICENSE.txt

Usage

Cmd-line usage

Warning

Not implemented yet.

The command-line usage below requires the Python environment to be installed, and lets you execute an experiment directly from the OS's shell (i.e. cmd in Windows or bash in POSIX) with a single command.

[TBD]

GUI usage

Attention

Desktop UI requires Python 3!

For a quick-'n-dirty method to explore the structure of the data-tree and run an experiment, just run:

$ pandalone gui

Excel usage

Attention

Excel-integration requires Python-3 and Windows or OS X!

In Windows and OS X you may utilize the excellent xlwings library to use Excel files for providing input and output to the experiment.

To create the necessary template-files in your current-directory you should enter:

$ pandalone excel

You could type instead pandalone excel {file_path} to specify a different destination path.

[TBD]

Python usage

Below are example commands for the python REPL (Read-Eval-Print Loop) that set up and run an experiment.

First run python or ipython and try to import the project to check its version:

>>> import pandalone
>>> pandalone.__version__                   ## Check version once more.
'0.0.1-dev.1'
>>> pandalone.__file__                      ## To check where it was installed.     # doctest: +SKIP
/usr/local/lib/site-package/pandalone-...

If everything works, create the data-tree to hold the input-data (strings and numbers). You assemble a data-tree by using:

  • sequences,
  • dictionaries,
  • pandas.DataFrame,
  • pandas.Series, and
  • URI-references to other data-trees.
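You can sketch such a data-tree with plain Python containers that hold pandas objects. Two parts of this sketch are assumptions. First, the `{"$ref": ...}` form for URI-references is borrowed from JSON-Reference and is not pandalone's confirmed syntax. Second, `iter_leaves()` is a hypothetical helper. It walks the tree and names every leaf with a JSON-pointer path, treating pandas objects as leaves.

```python
import pandas as pd

input_tree = {
    "name": "experiment-1",
    "params": {"gravity": 9.81, "samples": [0.5, 1.0, 1.5]},
    "speeds": pd.Series([10.0, 20.0, 30.0], name="speed"),
    "cycle": pd.DataFrame({"time": [0, 1, 2], "v": [0.0, 1.2, 2.5]}),
    # Assumed URI-reference syntax (JSON-Reference style), pointing at "speeds".
    "ref_speeds": {"$ref": "#/speeds"},
}


def iter_leaves(node, path=""):
    """Yield (json_pointer, leaf) pairs; pandas objects are treated as leaves."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Escape per RFC 6901: '~' -> '~0' first, then '/' -> '~1'.
            token = str(key).replace("~", "~0").replace("/", "~1")
            yield from iter_leaves(child, path + "/" + token)
    elif isinstance(node, (list, tuple)):
        for index, child in enumerate(node):
            yield from iter_leaves(child, path + "/" + str(index))
    else:
        yield path, node


for pointer, leaf in iter_leaves(input_tree):
    print(pointer, type(leaf).__name__)
```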

[TBD]

Getting Involved

This project is hosted in github. To provide feedback about bugs and errors or questions and requests for enhancements, use github's Issue-tracker.

Sources & Dependencies

To get involved with development, you need a POSIX environment to fully build it (Linux, OSX or Cygwin on Windows).

First you need to download the latest sources:

$ git clone https://github.com/pandalone/pandalone.git pandalone.git
$ cd pandalone.git

Virtualenv

You may choose to work in a virtualenv (isolated Python environment). This lets you install dependency libraries isolated from the system's ones and/or without admin-rights, and is recommended for Linux/Mac OS.

Attention

If you decide to reuse system-installed packages using --system-site-packages with virtualenv <= 1.11.6 (to avoid, for instance, having to reinstall numpy and pandas that require native-libraries), you may be bitten by bug #461, which prevents you from upgrading any of the pre-installed packages with pip.
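On Python 3.3+, the stdlib `venv` module is an alternative to `virtualenv` for creating such an isolated environment. Its `system_site_packages` flag corresponds to `--system-site-packages`. This sketch makes no claim about whether the virtualenv bug mentioned above affects `venv`. `make_env()` and the `.venv` path are hypothetical.

```python
import os
import venv


def make_env(path, system_site_packages=False, with_pip=True):
    """Create an isolated Python environment at `path` with the stdlib `venv` module.

    with_pip=True bootstraps pip into the new environment (needs Python 3.4+).
    """
    builder = venv.EnvBuilder(system_site_packages=system_site_packages, with_pip=with_pip)
    builder.create(path)
    return os.path.abspath(path)


# Usage, from the project folder:
#   make_env(".venv")
# then activate it in the shell:
#   $ source .venv/bin/activate          (or .venv\Scripts\activate.bat in Windows cmd)
```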

LiClipse IDE

Within the sources there are two sample files for the comprehensive LiClipse IDE:

  • eclipse.project
  • eclipse.pydevproject

Remove the eclipse prefix from their names, but keep the leading dot (e.g. .project). Then import the folder as an "existing project" from Eclipse's File menu.

Another issue is that LiClipse contains its own Git implementation, EGit. EGit interacts badly with unix symbolic-links such as docs/docs, and it reports working-directory changes even after a fresh checkout. To work around this, right-click on that file and select Properties --> Team --> Advanced --> Assume Unchanged.

Then you can install all of the project's dependencies in development mode using the setup.py script:

$ python setup.py --help                           ## Get help for this script.
Common commands: (see '--help-commands' for more)

  setup.py build      will build the package underneath 'build/'
  setup.py install    will install the package

Global options:
  --verbose (-v)      run verbosely (default)
  --quiet (-q)        run quietly (turns verbosity off)
  --dry-run (-n)      don't actually do anything
...

$ python setup.py develop                           ## Also installs dependencies into project's folder.
$ python setup.py build                             ## Check that the project indeed builds ok.

You should now run the test-cases (see metrics) to check that the sources are in good shape:

$ python setup.py test

Note

The above commands installed the dependencies inside the project folder and for the virtual-environment. That is why all build and testing actions have to go through python setup.py {some_cmd}.

If you are dealing with installation problems and/or you want to permanently install dependent packages, you have to deactivate the virtual-environment and start installing them into your base python environment:

$ deactivate
$ python setup.py develop

or even try the more permanent installation-mode:

$ python setup.py install                # May require admin-rights

Development procedure

Authors

Design

See architecture live-document.

FAQ

Why another XXX? What about YYY?

  • These are the related python projects we know of:
    OpenMDAO:

    It has influenced pandalone's design, and we plan to interoperate with it by converting to and from its data-types. However, it works on python-2 only, and its architecture needs attention from programmers (no setup.py, no official test-cases).

    PyDSTool:

    It does not overlap, since it does not cover IO and dependencies of data. Also planned to interoperate with it (as soon as we have a better grasp of it :-). It has some issues with the documentation, but they are working on it.

    xray:

    pandas for higher dimensions; pandalone should in principle work with "xray" data-trees.

    netCDF4:

    Hierarchical file-data-format similar to hdf5.

    hdf5:

    Hierarchical file-data-format, supported natively by pandas.

Glossary

data-tree

The container of data that the calculations consume and produce. It is implemented by pandalone.pandata.Pandel as a mergeable stack of JSON-schema abiding trees of strings and numbers, formed with sequences, dictionaries, pandas-instances and URI-references.
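To illustrate what a "mergeable stack" of trees can mean, here is a minimal deep-merge sketch in which later trees override earlier ones, key by key. This is an assumption for illustration only. It is not how `pandalone.pandata.Pandel` is implemented, and `merge_stack()` is a hypothetical name.

```python
from collections.abc import Mapping


def _merge(base, top):
    """Merge `top` over `base`: dictionaries merge recursively, anything else is replaced."""
    if isinstance(base, Mapping) and isinstance(top, Mapping):
        merged = dict(base)  # shallow copy, so `base` itself is never modified
        for key, value in top.items():
            merged[key] = _merge(base[key], value) if key in base else value
        return merged
    return top


def merge_stack(*trees):
    """Hypothetical: deep-merge a stack of data-trees; later trees win on conflicts."""
    result = {}
    for tree in trees:
        result = _merge(result, tree)
    return result


defaults = {"params": {"gravity": 9.81, "samples": [0.5, 1.0]}, "name": "default"}
user = {"params": {"samples": [2.0]}, "name": "experiment-1"}
print(merge_stack(defaults, user))
# {'params': {'gravity': 9.81, 'samples': [2.0]}, 'name': 'experiment-1'}
```

In this sketch, lists are replaced as a whole rather than concatenated. That is one design choice among several, and a real implementation may differ.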

JSON-schema

The JSON schema is an IETF draft that provides a contract for what JSON-data is required for a given application and how to interact with it. JSON Schema is intended to define validation, documentation, hyperlink navigation, and interaction control of JSON data. You can learn more about it from this excellent guide, and experiment with this on-line validator.

JSON-pointer

JSON Pointer (RFC 6901) defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document. It aims to serve the same purpose as XPath from the XML world, but it is much simpler.
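The resolution rules of RFC 6901 are short enough to sketch:

  • split the pointer on `/`,
  • unescape `~1` to `/` and then `~0` to `~`, and
  • step into objects by key and into arrays by decimal index.

The following is a minimal stdlib-only sketch. `resolve_pointer()` is a hypothetical helper, not a pandalone function. It does not handle the `-` array token or the URI-fragment form of pointers.

```python
import re

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901: decimal, no leading zeros


def resolve_pointer(doc, pointer):
    """Return the value inside `doc` identified by the JSON-pointer string `pointer`."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("JSON pointer must be empty or start with '/': %r" % pointer)
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape in this order, so that "~01" becomes "~1" and not "/".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError("invalid array index %r in %r" % (token, pointer))
            node = node[int(token)]
        else:
            node = node[token]
    return node


doc = {"foo": ["bar", "baz"], "": 0, "a/b": 1, "m~n": 8}  # subset of RFC 6901's example
print(resolve_pointer(doc, "/foo/1"))  # baz
print(resolve_pointer(doc, "/a~1b"))   # 1
```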
