fhgebara/pyTeraSort


This package is a version of TeraSort written in Python for PySpark. The
program produces sorted output sufficient to pass the official valsort
test. It must be launched with spark-submit.


The directory contains 3 files:
	1) README --> this file
	2) pyTeraSort.py --> python spark implementation of terasort 
	3) teraval.py --> python terasort validation
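
As a rough illustration of what validating sorted output involves, here is
a minimal sketch of an order check. It is not the code in teraval.py. It
assumes the standard gensort record layout (100-byte records whose first
10 bytes are the key), and the names RECORD_LEN, KEY_LEN, iter_records and
check_sorted are hypothetical. The official valsort tool also reports
checksums and duplicate keys, which this sketch leaves out.

```python
RECORD_LEN = 100  # assumed gensort record size in bytes
KEY_LEN = 10      # assumed key size: the leading bytes of each record


def iter_records(data: bytes):
    """Yield fixed-size records; reject data that is not a whole number of records."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError("data length is not a multiple of the record size")
    for offset in range(0, len(data), RECORD_LEN):
        yield data[offset:offset + RECORD_LEN]


def check_sorted(data: bytes):
    """Return (record_count, index of the first out-of-order record or None)."""
    previous_key = None
    first_unordered = None
    count = 0
    for record in iter_records(data):
        key = record[:KEY_LEN]
        # bytes objects compare lexicographically by unsigned byte value
        if previous_key is not None and key < previous_key and first_unordered is None:
            first_unordered = count
        previous_key = key
        count += 1
    return count, first_unordered
```

Reading an output file with open(path, "rb").read() and passing the bytes to
check_sorted would report the first record that breaks the ordering, if any.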

Usage
	pyTeraSort.py
		pyTeraSort.py <input file or directory> <output dir> [options]
		options:
			-inputPartitions <int>   number of input partitions
			-outputPartitions <int>  number of output partitions
		example command line:
			spark-submit --master local[16] pyTeraSort.py foo.txt out_dir -inputPartitions 64 -outputPartitions 64
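
To show why the number of output partitions matters, here is a minimal
pure-Python sketch of sampled range partitioning, the general technique
behind TeraSort-style sorts. It is not taken from pyTeraSort.py, and this
README does not say whether the script partitions the data itself or
delegates to Spark. The function name range_partition_sort and its
parameters are hypothetical, and the 10-byte key is an assumption based on
the gensort record format.

```python
import bisect
import random


def range_partition_sort(records, num_partitions, key_len=10, sample_size=1000, seed=0):
    """Split records into key ranges, then sort each range independently."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    keys = [record[:key_len] for record in records]

    # Sample the keys to estimate where the range boundaries should fall.
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))

    # Choose num_partitions - 1 split points spread evenly over the sample.
    splits = []
    if sample:
        for i in range(1, num_partitions):
            splits.append(sample[i * len(sample) // num_partitions])

    # Route each record to the partition whose key range contains its key.
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = bisect.bisect_right(splits, record[:key_len])
        partitions[index].append(record)

    # Sorting each partition on its own yields a globally sorted result
    # once the partitions are read back in order.
    return [sorted(part, key=lambda r: r[:key_len]) for part in partitions]
```

Each returned list corresponds to one output partition. Because the key
ranges do not overlap, concatenating the partitions in order gives the fully
sorted data set, which is the property an order check relies on when the
output is spread across several files.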
