fhgebara/pyTeraSort
This package is a version of TeraSort written in Python for PySpark. The program produces validly sorted output, sufficient to pass the official valsort test. It must be run with spark-submit.

The directory contains 3 files:

1) README --> this file
2) pyTeraSort.py --> Python Spark implementation of TeraSort
3) teraval.py --> Python TeraSort validation

Usage:

    pyTeraSort.py <input file or directory> <output dir> -options

Options:

    -inputPartitions <int>     number of input partitions
    -outputPartitions <int>    number of output partitions

Example command line:

    spark-submit --master local[16] pyTeraSort.py foo.txt out_dir -inputPartition 64 -outputPartition 64
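To illustrate what "valid sorted output" means, here is a minimal sketch of an order check in plain Python. It is not the code in teraval.py and not the official valsort tool. It assumes the standard gensort record layout (100-byte records whose first 10 bytes are the key), and the names `iter_keys`, `is_sorted_by_key` and `make_record` are hypothetical helpers introduced only for this example.

```python
KEY_LEN = 10      # assumed: standard gensort key width in bytes
RECORD_LEN = 100  # assumed: standard gensort record width in bytes


def iter_keys(data: bytes):
    """Yield the leading key of each fixed-width record in `data`."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError("input length is not a multiple of the record length")
    for offset in range(0, len(data), RECORD_LEN):
        yield data[offset:offset + KEY_LEN]


def is_sorted_by_key(data: bytes) -> bool:
    """Return True if record keys appear in non-decreasing byte order."""
    previous = None
    for key in iter_keys(data):
        if previous is not None and key < previous:
            return False
        previous = key
    return True


def make_record(key: bytes) -> bytes:
    """Build a synthetic record: key padded to KEY_LEN, then filler payload."""
    padded_key = key[:KEY_LEN].ljust(KEY_LEN, b" ")
    return padded_key + b"x" * (RECORD_LEN - KEY_LEN)


if __name__ == "__main__":
    records = [make_record(b"aaa"), make_record(b"bbb"), make_record(b"ccc")]
    print(is_sorted_by_key(b"".join(records)))            # ascending keys
    print(is_sorted_by_key(b"".join(reversed(records))))  # descending keys
```

A complete validator would also have to confirm that no records were lost or altered (valsort does this with a checksum) and that order holds across output partition files, which this sketch does not attempt.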
About
python implementation of a spark terasort