cbaenziger/squealer

 
 

Squealer is a Jython library for unit testing Apache Pig scripts.


Motivation
*****************************************
Starting with v0.8, Pig comes with a library known as PigUnit that can
be used for unit testing Pig scripts from Java (or any other JVM
compatible language).  PigUnit served as the inspiration and starting
point for Squealer, but it is suboptimal in several respects that
this library hopes to address:

1) All equality comparisons are done by converting data to strings and
checking for string equality.  This makes testing scripts that do floating
point arithmetic nearly impossible.  It also requires you to know the
ordering of records in a relation, which in practice can be difficult to
predict for a number of operations in Pig.

2) PigUnit keeps a static reference to the Pig run time, which prevents
the implementation of isolated tests.  This can result in passing tests
that should be failing and vice versa.  It can also cause tests that
should fail deterministically to fail only intermittently, as the
ordering of test execution changes.

3) PigUnit currently only works with Pig 0.8+ (though anyone is free
to backport it).  Due to the dynamic nature of Python, however, it
is trivial to write a single library which supports multiple versions
of Pig.

4) A general language preference for Python over Java.  Also, this yak
isn't going to shave itself.
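
To make point 1 concrete, here is a minimal sketch of the kind of
comparison that string equality cannot provide: floats are compared
within a tolerance, and record order is ignored.  This is an
illustration only, not Squealer's actual API; the names records_match
and _field_close and the default tolerance are hypothetical.

```python
def _field_close(x, y, tolerance):
    """Compare two fields, allowing a small difference for floats."""
    numeric = (int, float)
    if isinstance(x, float) or isinstance(y, float):
        # A float only matches another number, never a string or None.
        if not (isinstance(x, numeric) and isinstance(y, numeric)):
            return False
        return abs(x - y) <= tolerance
    return x == y


def _tuple_close(a, b, tolerance):
    """Compare two records field by field."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if not _field_close(x, y, tolerance):
            return False
    return True


def records_match(expected, actual, tolerance=1e-6):
    """Return True if both relations hold the same records in any order.

    Matching is greedy, which is adequate as long as distinct records
    differ by more than the tolerance.
    """
    if len(expected) != len(actual):
        return False
    remaining = list(actual)
    for exp in expected:
        for i, act in enumerate(remaining):
            if _tuple_close(exp, act, tolerance):
                del remaining[i]
                break
        else:
            # No unmatched record in `actual` corresponds to `exp`.
            return False
    return True


expected = [("a", 0.3), ("b", 1.0)]
actual = [("b", 1.0), ("a", 0.1 + 0.2)]  # different order, rounding error
print(records_match(expected, actual))
```

The two relations above differ both in record order and in the exact
representation of the float, so a string-based comparison would report
a mismatch even though the data is equivalent for testing purposes.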

Dependencies/Installation
*****************************************
Currently Squealer is tested against Pig 0.8.1 and Jython 2.5.2, as
well as Pig 0.9.3 and Jython 2.5.3.  If you can confirm or deny its
feasibility on other versions, this information would be highly appreciated.
A version matrix is planned for the future to document which combinations
of these libraries are supported.  Of note is that these versions are what
is available in the Ubuntu 10.04 and Cloudera apt repositories.

Option 1:
Once the dependencies are installed, installing Squealer is as simple
as dropping it into $JYTHONHOME/Lib.

Option 2:
Install JIP (http://pypi.python.org/pypi/jip)
Run: jython setup.py install 

Examples
*****************************************
tests/test_examples.py contains a test case that illustrates how to define
a unit test with Squealer and how to perform some common tasks you may
want in one.


Debugging
*****************************************
1) After performing parameter substitution and alias overriding, Squealer
writes out a temp file containing the actual Pig code to be executed.  In
the event of a test failure or error, the path to this temp file is
printed with the error report to make it easier to debug the actual
script that was executed.

2) Most of the output from the Pig run time that is useful for
debugging your scripts is provided via Apache logging mechanisms.
Unfortunately, most of the information provided via logging is
not useful.  For this reason, all logging is turned off by default.
However, if you see a failure message you do not understand, you can use
the '-v' option when running the Python file with your tests (assuming
you are using Squealer's main() function).


Common issues
*****************************************
1) You see the following error message:
ImportError: no module named squealer

This indicates that the Jython run time cannot find your copy of the Squealer
library.  A common mistake with Jython is to set the PYTHONPATH environment
variable and expect it to be used as it is with CPython.  To specify the
module search path for Jython, use the -Dpython.path=... command line
argument.  Alternatively, a common practice is to use a bash alias so that
the PYTHONPATH variable is used like it is in CPython:
alias jython="jython -Dpython.path=\$PYTHONPATH"

2) You see any of the following error messages:
ImportError: no module named org
ImportError: no module named apache
ImportError: no module named hadoop

The Jython run time cannot find the jars containing the required Pig/Hadoop
dependencies.  You will need to set your class path so that they can be
found.  On a system with Ubuntu/Cloudera packages installed, the following
should get you going:
export CLASSPATH=/usr/lib/pig/*:/usr/share/java/*:/usr/lib/hadoop/*
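
If changing the environment is inconvenient, both issues can in
principle also be worked around from inside the test file by extending
sys.path before importing squealer.  The sketch below is a hypothetical
helper (extend_sys_path is not part of Squealer).  It relies on Jython
treating jar files on sys.path as sources of Java packages, and it has
not been verified against the Pig/Hadoop jars; the directories are the
ones from the CLASSPATH example above.

```python
import glob
import os
import sys


def extend_sys_path(jar_dirs=(), env_var="PYTHONPATH"):
    """Append entries from env_var and jars found in jar_dirs to sys.path.

    Returns the list of entries that were actually added.
    """
    # Split the environment variable the same way CPython would.
    entries = [p for p in os.environ.get(env_var, "").split(os.pathsep) if p]
    # Collect every jar in the given directories (assumption: Jython
    # can import Java packages from jars that appear on sys.path).
    for directory in jar_dirs:
        entries.extend(sorted(glob.glob(os.path.join(directory, "*.jar"))))
    added = []
    for entry in entries:
        if entry not in sys.path:
            sys.path.append(entry)
            added.append(entry)
    return added


if __name__ == "__main__":
    # Directories taken from the CLASSPATH example; adjust for your system.
    extend_sys_path(["/usr/lib/pig", "/usr/share/java", "/usr/lib/hadoop"])
```

Call the helper at the top of the test file, before the first import
of squealer or of any org.apache class.  Setting CLASSPATH and
-Dpython.path as described above remains the more conventional route.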

Contrib
*****************************************
Thanks to the following people:
Ning Sun: for configuring setup.py

