Skip to content

kn45/pymr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyMR

PyMR is a small python wrapper for map-reduce programming in hadoop streaming mode. It emulates many operations as Java MR framework. One can simply override map() or reduce() method to make code running in streaming mode.
https://github.com/kn45/PyMR

Features

Never bother with input, group by key, which have been handled by PyMR. Just override such method:

  • Mapper:
    _setup(self), _map(self, inl), _cleanup(self)
  • Reducer:
    _setup(self), _reduce(self, key, values), _cleanup(self)
    self.delim, self.key_fields

Example

cat test_in | python test.py m | sort | python test.py r

Todo

Lots. Add more functions such as counter, env_getter etc.

About

Python map-reduce framework used in hadoop streaming

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages