seanjensengrey/hadoopy

 
 

Brandyn White <bwhite@dappervision.com>
Andrew Miller <amiller@dappervision.com>

Contributors


Source  https://github.com/bwhite/hadoopy/
Issues  https://github.com/bwhite/hadoopy/issues
Docs    http://bwhite.github.com/hadoopy/

IRC: #hadoopy @ freenode.net

Requirements
Python development headers (python-dev), build tools (build-essential)

Optional
Cython (>= 0.13); without it, the build falls back to the pregenerated .c files

Features
- Oozie support
- Typedbytes support (very fast)
- Critical path is in Cython
- Works on clusters without any extra installation, not even Python or Python libraries (uses the PyInstaller copy included in this source tree)
- Simple HDFS access (cat and ls) inside Python, even inside running jobs
- Unit test interface
- Reporting using status and counters
- Supports design patterns in the Lin/Dyer book (http://www.umiacs.umd.edu/~jimmylin/book.html)
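
As a rough illustration of the mapper/reducer style and of one Lin/Dyer pattern (in-mapper combining, per-record variant), here is a minimal pure-Python sketch. It does not import hadoopy, and every name in it (mapper, combining_mapper, reducer, run_local) is hypothetical rather than part of hadoopy's API. The run_local helper only imitates map, sort, group, and reduce in memory, which is roughly what a unit test for a job needs.

```python
# Illustrative sketch only: these names are hypothetical and are NOT hadoopy's API.
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Plain mapper: emit (word, 1) for every word in the input line."""
    for word in value.split():
        yield word, 1


def combining_mapper(key, value):
    """In-mapper combining: aggregate counts locally, then emit one pair per word."""
    counts = defaultdict(int)
    for word in value.split():
        counts[word] += 1
    for word, count in counts.items():
        yield word, count


def reducer(key, values):
    """Sum the partial counts for one word."""
    yield key, sum(values)


def run_local(map_func, reduce_func, records):
    """Imitate map -> sort -> group -> reduce in memory (no Hadoop involved)."""
    mapped = [kv for k, v in records for kv in map_func(k, v)]
    mapped.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reduce_func(key, (v for _, v in group)))
    return out


if __name__ == "__main__":
    lines = [(0, "a b a"), (1, "b c")]
    print(run_local(mapper, reducer, lines))
    print(run_local(combining_mapper, reducer, lines))
```

Both mappers give the same final counts. The combining version simply emits fewer intermediate pairs, which is the point of the pattern.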

Used in
- A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)
- Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)
- Vitrieve: visual search engine

Ubuntu Install (others are similar)
sudo apt-get install python-dev build-essential
sudo python setup.py install

About

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the Docs link above for documentation and tutorials.
