Find the median of a large set of numbers when at most one tenth of the numbers can be processed at a time.

vinayjain404/find-median

Find the median of a given set of numbers without loading the entire set into memory.

Usage:
	python find_median.py <path-to-input-file>

Example:
	python find_median.py examples/input.data 
	
Input file format:
	The first line of the input file is the number of elements in the file.
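
A minimal sketch of reading this format, assuming that every line after the
count holds exactly one integer (only the first line is specified above). The
names read_header and iter_numbers are hypothetical and are not taken from
find_median.py.

```python
def read_header(f):
    """Read the first line, which holds the number of elements."""
    return int(f.readline().strip())


def iter_numbers(f):
    """Yield the remaining numbers one at a time.

    Assumption: one integer per line; blank lines are skipped.
    """
    for line in f:
        line = line.strip()
        if line:
            yield int(line)
```

Because iter_numbers is a generator, a caller can pull numbers in batches
instead of reading the whole file into memory.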

Constraint:
- None of the functions processes more than max_capacity numbers at a time
- This algorithm accepts a higher time complexity in exchange for meeting the space constraint

Algorithm:
- Calculate the max capacity for the system (number of elements / 10)
- Split the given input file of numbers into "m" smaller sorted files, each of size max
  capacity -- O(nlogn * n) 
- Create m input buffers each of size (max_capacity / m) corresponding to each split file -- O(n)
- Fill input buffers initially with (max_capacity / m) numbers from each file -- O(n)
- Merge the input buffers and write to the output files via a buffer of max capacity
  size -- O(n)
- If the input buffer for a file is empty, refill it from that file and continue until all the
  split files have been processed -- O(n * n)
- Return the median from the sorted numbers stored sequentially in the output files
  -- O(1)
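
The steps above can be sketched roughly as follows. This is not the code in
find_median.py: it is a simplified illustration that swaps the explicit input
and output buffers for the standard library's heapq.merge, which keeps only
one pending number per split file in memory, and it stops at the middle of the
merged stream instead of writing output files. All function names are
hypothetical. It assumes integer input and, for an even count, takes the
median to be the mean of the two middle values.

```python
import heapq
import os
import tempfile
from itertools import islice


def split_into_sorted_runs(numbers, max_capacity, workdir):
    """Write sorted files of at most max_capacity numbers; return their paths."""
    paths = []
    it = iter(numbers)
    while True:
        # Only max_capacity numbers are held and sorted at any one time.
        chunk = sorted(islice(it, max_capacity))
        if not chunk:
            break
        path = os.path.join(workdir, "run_%d.txt" % len(paths))
        with open(path, "w") as out:
            out.writelines("%d\n" % x for x in chunk)
        paths.append(path)
    return paths


def read_run(path):
    """Lazily yield the integers stored in one sorted split file."""
    with open(path) as f:
        for line in f:
            yield int(line)


def median_from_runs(paths, n):
    """Merge the sorted files lazily and return the median of n numbers."""
    merged = heapq.merge(*(read_run(p) for p in paths))
    lo_index, hi_index = (n - 1) // 2, n // 2
    lo = hi = None
    for i, x in enumerate(merged):
        if i == lo_index:
            lo = x
        if i == hi_index:
            hi = x
            break
    # Odd n: the single middle value. Even n: mean of the two middle values.
    return lo if lo_index == hi_index else (lo + hi) / 2


def find_median(numbers, n):
    """Median of an iterable of n integers, sorting n // 10 numbers at a time."""
    max_capacity = max(1, n // 10)
    with tempfile.TemporaryDirectory() as workdir:
        paths = split_into_sorted_runs(numbers, max_capacity, workdir)
        return median_from_runs(paths, n)
```

For example, find_median([5, 3, 1, 4, 2], 5) returns 3. With n // 10 as the
capacity, the split step produces about ten files, so the merge holds about
ten numbers in memory at once.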

Time complexity:
	O(nlogn * n)
