- git: $Id$
- Release: 0.0.1-dev.1
- Documentation
- Source
- PyPI repo
- Keywords: utility, library, data, tree, processing, calculation, dependencies, resolution, pandas, dictionaries, maps, lists, scientific, engineering
- Copyright: 2015 European Commission (JRC-IET)
- License
pandalone is a python library for processing hierarchical data (json, hdf5, pandas), for scientific and engineering exploration.
An "execution" or a "run" of a calculation is depicted in the following diagram:
.---------------------.       _____________       .----------------------------.
;      DataTree       ;      |             |      ;          DataTree          ;
;---------------------; ==>  | <some code> |  ==> ;----------------------------;
;                     ;      |_____________|      ;                            ;
'---------------------'                           '----------------------------'
The Input & Output Data are instances of data-tree: trees of strings and numbers, assembled with:
- sequences,
- dictionaries,
- pandas.DataFrame,
- pandas.Series, and
- URI-references to other data-trees/paths.
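To make the diagram and the list above concrete, here is a minimal sketch in plain Python and pandas that assumes nothing about pandalone's own API. The key names, the JSON-Reference-style `{"$ref": ...}` form of the URI-reference, the placeholder path and the `run()` function standing in for `<some code>` are all hypothetical.

```python
import pandas as pd

# Hypothetical input data-tree, built only from the container types listed above.
input_tree = {
    "name": "experiment-1",                            # string leaf
    "params": {"mass": 1200, "factors": [0.5, 1.0, 1.5]},  # dict + sequence
    "cycle": pd.DataFrame({"t": [0, 1, 2], "v": [0.0, 2.5, 5.0]}),
    "gears": pd.Series([1, 2, 3], name="gear"),
    # URI-reference to another data-tree/path; the "$ref" form and the
    # placeholder path are assumptions made for this illustration.
    "base": {"$ref": "file:///path/to/other-tree.json#/params"},
}


def run(tree):
    """Stand-in for <some code>: reads a data-tree and returns a new one.

    The input tree is not modified.
    """
    params = tree["params"]
    return {
        "mean_v": float(tree["cycle"]["v"].mean()),
        "scaled_mass": [f * params["mass"] for f in params["factors"]],
        "top_gear": int(tree["gears"].max()),
    }


output_tree = run(input_tree)
```

Both ends of the "run" are ordinary nested containers, which is what lets the output of one calculation be fed as the input of another.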
Note
The program runs on Python-2.7+ and Python-3.3+ (preferred) and requires numpy/scipy, pandas and win32 libraries along with their native backends to be installed. If you do not have such an environment already installed, please read the install section below for suitable distributions such as WinPython or Anaconda.
Assuming that you have a working python-environment, open a command-shell (in Windows use cmd.exe, BUT ensure python.exe is in its PATH) and try the following commands:
Tip
The commands beginning with $, below, imply a Unix-like operating system with a POSIX shell (Linux, OS X). Although the commands are simple and easy to translate into their Windows cmd.exe counterparts, it would be worthwhile to install Cygwin to get the same environment on Windows. If you choose to do that, also include the following packages in Cygwin's installation wizard:
* git, git-completion
* make, zip, unzip, bzip2, dos2unix
* openssh, curl, wget
But do not install or rely on Cygwin's outdated python environment.
- Install
$ pip install pandalone ## Use `--pre` if version-string has a build-suffix.
Or, in case you need the very latest from the master branch:
$ pip install git+https://github.com/pandalone/pandalone.git
See: install
- Run
$ pandalone --version
The current version runs on Python-2.7+ and Python-3.3+ and requires numpy/scipy, pandas and win32 libraries along with their native backends to be installed.
It has been tested under Windows and Linux, and Python-3.3+ is the preferred interpreter, i.e. the Excel interface and the desktop-UI run only with it.
It is distributed as Wheels.
Warning
On Windows it is strongly suggested NOT to install the standard CPython distribution, unless:
- you have administrative privileges,
- you are an experienced python programmer, so that
- you know how to hunt dependencies from the PyPI repository and/or the Unofficial Windows Binaries for Python Extension Packages.
As explained above, this project depends on packages with native-backends that require the use of C and Fortran compilers to build from sources. To avoid this hassle, you should choose one of the user-friendly distributions suggested below.
Below is a matrix of the two suggested self-wrapped python distributions for running this program (excluding the default python included in Linux). Both distributions:
- are free (as in freedom),
- do not require admin-rights for installation in Windows, and
- have been tested to run this program successfully (it has also been tested on default Linux distros).
Distributions | WinPython | Anaconda |
---|---|---|
Platform | Windows | Windows, Mac OS, Linux |
Ease of Installation | Fair (requires fiddling with the and the Registry after install) | |
Ease of Use | Easy | Moderate (should use depending on whether a package contains native libraries) |
# of Packages | Only what's included in the downloaded-archive | Many 3rd-party packages uploaded by users |
Notes | After installation, see | Check also the installation instructions from the pandas site (http://pandas.pydata.org/pandas-docs/stable/install.html). |
Before installing it, make sure that there are no older versions left over in the python installation you are using. To cleanly uninstall it, run this command repeatedly until pip no longer finds the project installed:
$ pip uninstall pandalone ## Use `pip3` if both python-2 & 3 are in PATH.
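The "repeat until nothing is left" step above can also be scripted. This is a minimal sketch that assumes Python 3.8+ (for the stdlib `importlib.metadata`, which is newer than the interpreters this README targets). The helper names `is_installed` and `uninstall_until_gone` are hypothetical, and `max_rounds` is an arbitrary safety limit.

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version


def is_installed(dist):
    """True if the distribution `dist` is visible to the *current* interpreter."""
    try:
        version(dist)
    except PackageNotFoundError:
        return False
    return True


def uninstall_until_gone(dist="pandalone", max_rounds=5):
    """Repeat `pip uninstall -y` until `dist` is no longer found.

    Returns the number of uninstall rounds that were executed.
    """
    rounds = 0
    while rounds < max_rounds and is_installed(dist):
        # Call the pip of this same interpreter, so the check and the
        # removal look at the same python installation.
        subprocess.run([sys.executable, "-m", "pip", "uninstall", "-y", dist],
                       check=False)
        rounds += 1
    return rounds
```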
You can install the project directly from the PyPI repo the "standard" way, by typing the pip command in the console:
$ pip install pandalone
- If you want to install a pre-release version (the version-string is not plain numbers, but ends with alpha, beta.2 or something else), additionally use --pre.
$ pip install pandalone --pre
Also, you can install the very latest version straight from the sources:
$ pip install git+https://github.com/pandalone/pandalone.git --pre
- If you want to upgrade an existing installation along with all its dependencies, also add --upgrade (or -U, equivalently), but then the build might take considerable time to finish. There is also the possibility that the upgraded libraries break existing programs(!), so use it with caution, or from within a virtualenv (isolated Python environment).
- To install it for different Python environments, repeat the procedure using the appropriate python.exe interpreter for each environment.
Tip
To debug installation problems, you can export a non-empty DISTUTILS_DEBUG environment variable, and distutils will print detailed information about what it is doing and/or print the whole command line when an external program (like a C compiler) fails.
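From Python, the same tip can be applied to a single pip run without touching your shell's environment. This is a sketch: `debug_env` and `pip_install_debug` are hypothetical helpers, and it assumes that the distutils/setuptools build invoked by pip still honors `DISTUTILS_DEBUG`.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of `base` (default: os.environ) with DISTUTILS_DEBUG set.

    Any non-empty value enables the debug output; `base` is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["DISTUTILS_DEBUG"] = "1"
    return env


def pip_install_debug(target="pandalone"):
    """Run `pip install <target>` with distutils debugging enabled."""
    cmd = [sys.executable, "-m", "pip", "install", target]
    return subprocess.run(cmd, env=debug_env()).returncode
```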
After installation, it is important that you check which version is visible in your PATH:
$ pandalone --version
0.0.1-dev.1
To install for different Python versions, repeat the procedure for every required version.
To install an older released version issue the console command:
$ pip install pandalone==0.0.1 ## Use `--pre` if version-string has a build-suffix.
or alternatively straight from the sources:
$ pip install git+https://github.com/pandalone/pandalone.git@v0.0.9-alpha.3.1 --pre
Of course, you can substitute v0.0.9-alpha.3.1 with any slug from "commits", "branches" or "releases" that you will find on the project's github-repo.
Note
If you have another version already installed, you have to use --ignore-installed (or -I). For using the specific version, check this (untested) stackoverflow question.
You can install each version in a separate virtualenv (isolated Python environment) and shy away from all this. Check
If you download the sources, you have more options for installation. There are various methods to get hold of them:
- Download the source distribution from the PyPI repo.
- Download a release-snapshot from github.
- Clone the git-repository at github.
Assuming you have a working installation of git you can fetch and install the latest version of the project with the following series of commands:
When working with sources, you need to have installed all libraries that the project depends on:
$ pip install -r requirements/execution.txt .
The previous command installs a "snapshot" of the project as it is found in the sources. If you wish to link the project's sources with your python environment, install the project in development mode:
$ python setup.py develop
Note
This last command installs any missing dependencies inside the project-folder.
The files and folders of the project are listed below:
+--pandalone/ ## (package) The python-code of the calculator
+--tests/ ## (package) Test-cases
+--docs/ ## Documentation folder
+--setup.py ## (script) The entry point for `setuptools`, installing, testing, etc
+--requirements/ ## (txt-files) Various pip-dependencies for tools.
+--README.rst
+--CHANGES.rst
+--LICENSE.txt
Warning
Not implemented yet.
The command-line usage below requires the Python environment to be installed. It lets you execute an experiment directly from the OS's shell (i.e. cmd in Windows or bash in POSIX) with a single command.
[TBD]
Attention
Desktop UI requires Python 3!
For a quick-'n-dirty method to explore the structure of the data-tree and run an experiment, just run:
$ pandalone gui
Attention
Excel-integration requires Python-3 and Windows or OS X!
In Windows and OS X you may utilize the excellent xlwings library to use Excel files for providing input and output to the experiment.
To create the necessary template-files in your current-directory you should enter:
$ pandalone excel
Instead, you could type pandalone excel {file_path} to specify a different destination path.
[TBD]
Example commands for the python REPL (Read-Eval-Print Loop) that set up and run an experiment are given below.
First run python or ipython and try to import the project to check its version:
>>> import pandalone
>>> pandalone.__version__ ## Check version once more.
'0.0.1-dev.1'
>>> pandalone.__file__ ## To check where it was installed. # doctest: +SKIP
/usr/local/lib/site-packages/pandalone-...
If everything works, create the data-tree to hold the input-data (strings and numbers). You assemble a data-tree by the use of:
- sequences,
- dictionaries,
- pandas.DataFrame,
- pandas.Series, and
- URI-references to other data-trees.
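The REPL walk-through is still to be written. In the meantime, the sketch below shows one way to inspect such a tree by listing every leaf together with a JSON-pointer path (see JSON-pointer in the glossary). The `iter_leaves` and `escape` helpers are hypothetical and not part of pandalone. They treat anything that is not a dict, list or tuple (including pandas objects) as an opaque leaf.

```python
def escape(token):
    """Escape one path token as RFC 6901 requires.

    '~' is replaced before '/', so the '~' inside a freshly made '~1'
    is not escaped a second time.
    """
    return str(token).replace("~", "~0").replace("/", "~1")


def iter_leaves(tree, path=""):
    """Yield (json_pointer, value) for every leaf of a dict/list data-tree."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield from iter_leaves(child, f"{path}/{escape(key)}")
    elif isinstance(tree, (list, tuple)):
        for i, child in enumerate(tree):
            yield from iter_leaves(child, f"{path}/{i}")
    else:
        # Strings, numbers, pandas objects, ...: stop descending here.
        yield path, tree


tree = {"name": "exp", "params": {"mass": 1200, "a/b": [1, 2]}}
leaves = dict(iter_leaves(tree))
```

For this `tree`, the key `"a/b"` shows up escaped as `a~1b` in the paths of its two list items.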
[TBD]
This project is hosted on github. To provide feedback about bugs and errors, or to ask questions and request enhancements, use github's Issue-tracker.
To get involved with development, you need a POSIX environment to fully build it (Linux, OSX or Cygwin on Windows).
First you need to download the latest sources:
$ git clone https://github.com/pandalone/pandalone.git pandalone.git
$ cd pandalone.git
Virtualenv
You may choose to work in a virtualenv (isolated Python environment), to install dependency libraries isolated from the system's ones, and/or without admin-rights (this is recommended for Linux/Mac OS).
Attention
If you decide to reuse system-installed packages using --system-site-packages with virtualenv <= 1.11.6 (to avoid, for instance, having to reinstall numpy and pandas, which require native-libraries), you may be bitten by bug #461, which prevents you from upgrading any of the pre-installed packages with pip.
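This README describes virtualenv. As an alternative, Python 3.3+ ships the stdlib `venv` module, whose `system_site_packages` flag plays the role of `--system-site-packages`. This is a minimal sketch, and `make_env` is a hypothetical helper. Note that `with_pip` needs Python 3.4+.

```python
import venv
from pathlib import Path


def make_env(env_dir, reuse_system_packages=False, with_pip=True):
    """Create an isolated environment with the stdlib `venv` module.

    reuse_system_packages=True is the analogue of virtualenv's
    --system-site-packages (e.g. to reuse an installed numpy/pandas).
    Returns the path of the generated pyvenv.cfg.
    """
    venv.create(env_dir, system_site_packages=reuse_system_packages,
                with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

Before running `pip`, activate the environment with its `bin/activate` script (or `Scripts\activate` on Windows).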
LiClipse IDE
Within the sources there are two sample files for the comprehensive LiClipse IDE:
eclipse.project
eclipse.pydevproject
Remove the eclipse prefix (but leave the dot) and import it as an "existing project" from Eclipse's File menu.
Another issue arises because LiClipse contains its own implementation of Git, EGit, which interacts badly with unix symbolic-links, such as the docs/docs, and it detects working-directory changes even after a fresh checkout. To work around this, right-click on the above file and select Properties --> Team --> Advanced --> Assume Unchanged.
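The prefix-removal step above can be scripted. This is a hedged sketch using the hypothetical helper `install_eclipse_files`. As a design choice, it copies rather than renames, so the tracked sample files stay unchanged. It also never overwrites an existing `.project`.

```python
import shutil
from pathlib import Path


def install_eclipse_files(project_dir="."):
    """Copy eclipse.project/eclipse.pydevproject to .project/.pydevproject.

    Returns the names of the files that were created.
    """
    root = Path(project_dir)
    created = []
    for sample in ("eclipse.project", "eclipse.pydevproject"):
        src = root / sample
        dst = root / sample[len("eclipse"):]  # drop the prefix, keep the dot
        if src.is_file() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(dst.name)
    return created
```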
Then you can install all of the project's dependencies in development mode using the setup.py script:
$ python setup.py --help ## Get help for this script.
Common commands: (see '--help-commands' for more)
setup.py build will build the package underneath 'build/'
setup.py install will install the package
Global options:
--verbose (-v) run verbosely (default)
--quiet (-q) run quietly (turns verbosity off)
--dry-run (-n) don't actually do anything
...
$ python setup.py develop ## Also installs dependencies into project's folder.
$ python setup.py build ## Check that the project indeed builds ok.
You should now run the test-cases (see metrics) to check that the sources are in good shape:
$ python setup.py test
Note
The above commands installed the dependencies inside the project folder and for the virtual-environment. That is why all build and testing actions have to go through python setup.py {some_cmd}.
If you are dealing with installation problems and/or you want to permanently install dependent packages, you have to deactivate the virtual-environment and start installing them into your base python environment:
$ deactivate
$ python setup.py develop
or even try the more permanent installation-mode:
$ python setup.py install # May require admin-rights
See architecture live-document.
- These are the python projects known to be related:
- OpenMDAO:
It has influenced pandalone's design. It is planned to interoperate with it by converting to and from its data-types. But it works on python-2 only, and its architecture needs attention from programmers (no setup.py, no official test-cases).
- PyDSTool:
It does not overlap, since it does not cover IO and dependencies of data. It is also planned to interoperate with it (as soon as we have a better grasp of it :-). It has some issues with the documentation, but they are working on it.
- xray:
pandas for higher dimensions; should in principle work with "xray" data-trees.
- netCDF4:
Hierarchical file-data-format similar to hdf5.
- hdf5:
Hierarchical file-data-format, supported natively by pandas.
- data-tree
The container of data that the gear-shift calculator consumes and produces. It is implemented by pandalone.pandata.Pandel as a mergeable stack of JSON-schema abiding trees of strings and numbers, formed with sequences, dictionaries, pandas-instances and URI-references.
- JSON-schema
The JSON schema is an IETF draft that provides a contract for what JSON-data is required for a given application and how to interact with it. JSON Schema is intended to define validation, documentation, hyperlink navigation, and interaction control of JSON data. You can learn more about it from this excellent guide, and experiment with this on-line validator.
- JSON-pointer
JSON Pointer (RFC 6901) defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document. It aims to serve the same purpose as XPath from the XML world, but it is much simpler.
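To illustrate the syntax, here is a minimal sketch of RFC 6901 evaluation over dict/list trees in plain Python. It is not pandalone's own implementation (`resolve_pointer` is a hypothetical name), and it does not support the `-` "past the end" array token.

```python
import re

# RFC 6901 array indices: "0" or a number without leading zeros.
_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(doc, pointer):
    """Return the value in `doc` addressed by an RFC 6901 JSON pointer."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape '~1' before '~0', in the order the RFC prescribes.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            if not _INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            node = node[int(token)]
        else:
            node = node[token]
    return node
```

With `doc = {"foo": ["bar", "baz"], "a/b": 1}`, the pointer `/foo/1` yields `"baz"` and `/a~1b` yields `1`.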