We implemented two parallel versions of the Max-Flow Min-Cut algorithm: one in MapReduce and one in MPI. Both implementations are based on the Ford-Fulkerson algorithm. We use the MapReduce version to perform binary image segmentation (separating background from foreground). We also wrote a serial implementation to give us a baseline for timing.
Team: Joshua Lee and Mona Huang
Teaching Fellow: Verena Kaynig-Fittkau
Class: CS205, Harvard University, Fall 2012
All dependencies are located locally inside the project directory. There should not be any required steps for setup. If you run into issues, see the dependencies section below.
Binary Image Segmentation. Relabels foreground as black and background as white using our MapReduce min-cut algorithm (outputs a binary image):
python segment.py <input_image_path> <output_image_path>
MapReduce Max-Flow (returns max-flow):
python driver.py <in_file_path>
MPI Max-Flow (returns max-flow):
mpirun -n 4 python mpi.py <in_file_path>
Serial Max-Flow (returns max-flow):
python serial.py <in_file_path>
- input_image_path is a path to a .jpg or .png file.
- in_file_path is a graph in adjacency list format (see "graph file format" section for an example).
- You can find test graphs in the graphs directory.
- See header comments in files for more in-depth description.
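To illustrate the value that the max-flow commands above return, here is a minimal serial sketch of Ford-Fulkerson using breadth-first search for augmenting paths (the Edmonds-Karp variant). It is not the code in serial.py. The function name max_flow, the dict-of-dicts graph layout, and the "s"/"t" source and sink labels are assumptions made for this example.

```python
from collections import deque


def max_flow(graph, source="s", sink="t"):
    """Return the max-flow value of a {vertex: {neighbor: capacity}} graph."""
    # Build residual capacities, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbors in graph.items():
        residual.setdefault(u, {})
        for v, cap in neighbors.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


example_graph = {
    "s": {"1": 2, "2": 2},
    "0": {"s": 7, "t": 4, "1": 3, "2": 1},
    "1": {"s": 6, "t": 2},
    "2": {"s": 10, "0": 8, "1": 2},
    "t": {},
}
print(max_flow(example_graph))  # 4
```

The example graph has only two edges leaving "s", each with capacity 2, and both can be saturated, so the flow is limited to 4.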
- driver.py is the driver for our MapReduce max-flow implementation. It handles file-to-graph conversion, reading and writing intermediate output between MapReduce iterations, and calculating the max flow and cut of the residual graph.
- max_flow.py is the MRJob class used by driver.py.
- accumulator.py is a helper class used by max_flow.py. It is responsible for ensuring we accept only valid paths that do not violate any capacity constraints.
- mpi.py is our MPI max-flow implementation.
- segment.py segments the foreground and background of an input image using our MapReduce max-flow implementation.
- image_processor.py is a helper file used by segment.py to convert an image into a graph file in adjacency list format.
- serial.py is our serial max-flow implementation.
- test.py uses the python-graph library to implement the max-flow algorithm. Used for testing.
- timing.py runs the serial, MapReduce, and MPI max-flow implementations on various graphs. Serves as a testing and timing module.
- graphs/ is the directory for sample input graphs for max flow.
- images/ is the directory for sample input images for segmentation.
- tmp/ is the directory for temporary files used by segment.py and driver.py (e.g. intermediate MapReduce input/output).
- scripts/ is the directory for helper scripts, e.g. test graph generation.
- docs/ is the directory for helper documents, e.g. papers describing the max-flow min-cut algorithm.
- library/ is the directory for external libraries.
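The role described for accumulator.py (accepting only paths that do not violate capacity constraints) can be sketched roughly as follows. This is not the actual accumulator.py interface, which is documented in that file's header comments. The class name PathAccumulator, the try_accept method, and the (u, v) edge-tuple layout are hypothetical and chosen only for illustration.

```python
class PathAccumulator:
    """Hypothetical helper: accept augmenting paths while capacity remains."""

    def __init__(self, capacities):
        # capacities maps (u, v) edge tuples to remaining capacity.
        self.remaining = dict(capacities)
        self.accepted = []

    def try_accept(self, path, flow):
        """Accept `path` carrying `flow` only if every edge can hold it."""
        edges = list(zip(path, path[1:]))
        if any(self.remaining.get(edge, 0) < flow for edge in edges):
            return False  # at least one edge would exceed its capacity
        for edge in edges:
            self.remaining[edge] -= flow
        self.accepted.append((path, flow))
        return True


acc = PathAccumulator({
    ("s", "1"): 2, ("1", "t"): 2,
    ("s", "2"): 2, ("2", "1"): 2,
})
print(acc.try_accept(["s", "1", "t"], 2))       # True
print(acc.try_accept(["s", "2", "1", "t"], 2))  # False: ("1", "t") is saturated
```

Checking all edges before subtracting anything keeps a rejected path from partially consuming capacity.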
These packages came preinstalled on the CS205 VirtualBox and are readily available on Resonance Nodes.
We have placed these dependencies in our project's library directory. Our project imports these local files, so you should not get any import errors.
If you do run into import errors from python-graph, you may have to run the following install command from our project root directory. You will need sudo permissions:
cd project_root
cd library/python-graph/core/
sudo python setup.py install
"vertex_id" \t [["neighbor_id_1", edge_capacity_1], ["neighbor_id_2", edge_capacity_2], ...]
Note that the key and value on each line must be separated by a tab character, or our JSON reader won't work.
"s" [["1", 2], ["2", 2]]
"0" [["s", 7], ["t", 4], ["1", 3], ["2", 1]]
"1" [["s", 6], ["t", 2]]
"2" [["s", 10], ["0", 8], ["1", 2]]
"t" []