A distributed computing library.
I wrote this library in January 2017, when I started learning about socket programming and distributed computing. It works reasonably well, but the implementation is naive and contains a serious security hole. For real work, MPI is a better choice. This project is now an antique that reminds me of what I went through during my education.
This library provides 4 main primitives: node_id, n_node, send and recv. With these four, you can parallelize a program across multiple machines.
You can send and receive many types of data (lists, dicts, sets, NumPy arrays, ...), as long as they are picklable.
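Because anything picklable can be sent, one common way to carry such objects over a TCP socket is to prefix each pickled payload with its length, so the receiver knows where one message ends. The Python 3 sketch below shows that idea; send_obj and recv_obj are hypothetical names, not the functions in message.py, and this library's actual wire format may differ. Keep in mind that unpickling data from an untrusted peer can execute arbitrary code.

```python
import pickle
import socket
import struct

# Hypothetical framing: an 8-byte big-endian length header, then the pickled payload.
HEADER = struct.Struct("!Q")


def send_obj(sock, obj):
    """Pickle obj and send it as one length-prefixed message."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_obj(sock):
    """Receive one length-prefixed message and unpickle it (trusted peers only)."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return pickle.loads(_recv_exact(sock, length))


if __name__ == "__main__":
    # Two connected sockets in one process stand in for two machines.
    a, b = socket.socketpair()
    send_obj(a, {"primes": [2, 3, 5], "tags": {"x", "y"}})
    print(recv_obj(b))
    a.close()
    b.close()
```

The explicit length header matters because TCP is a byte stream: a single recv() call can return part of a message or pieces of two messages.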
"""
Node i sends a string to the node (i+1) % number_of_nodes
and receive its string from node (i-1) % number_of_nodes (ie. a circle).
"""
import message as ms
ms.setup_connection()
myID = ms.node_id
nodes = ms.n_node
nxt = (myID+1)%nodes
pre = (myID-1+nodes)%nodes # avoid negative value
print "My ID is", myID
ms.send(nxt, "Next to %i is %i" % (myID, nxt))
msg = ms.recv(pre)
print msg
ms.close_connection()
Output on my cluster:

```
--------(Node localhost returns)--------
My ID is 2
Next to 1 is 2
--------(Node 192.168.0.113 returns)--------
My ID is 3
Next to 2 is 3
--------(Node 192.168.0.179 returns)--------
My ID is 1
Next to 0 is 1
--------(Node 192.168.0.169 returns)--------
My ID is 0
Next to 3 is 0
Total time : 1.523.
```
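To try the ring pattern without a cluster, the sketch below simulates it in a single process: each node is a thread, and one queue.Queue per (sender, receiver) pair stands in for the network. The send and recv helpers here are hypothetical stand-ins for ms.send and ms.recv, not the library's implementation, and N_NODES is an assumed cluster size.

```python
import queue
import threading

N_NODES = 4  # hypothetical cluster size, matching the 4-node output above

# channels[(src, dst)] carries messages from node src to node dst.
channels = {(s, d): queue.Queue() for s in range(N_NODES) for d in range(N_NODES)}
results = {}


def run_node(my_id):
    def send(dst, obj):  # stand-in for ms.send: never blocks (unbounded queue)
        channels[(my_id, dst)].put(obj)

    def recv(src):  # stand-in for ms.recv: blocks until src has sent something
        return channels[(src, my_id)].get()

    nxt = (my_id + 1) % N_NODES
    pre = (my_id - 1) % N_NODES  # Python's % with a positive modulus is never negative
    send(nxt, "Next to %i is %i" % (my_id, nxt))
    results[my_id] = recv(pre)


threads = [threading.Thread(target=run_node, args=(i,)) for i in range(N_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for i in sorted(results):
    print("Node", i, "received:", results[i])
# Node 0 received: Next to 3 is 0   (and likewise for nodes 1 to 3)
```

Each node sends before it receives, so no node waits on another in a cycle; with a blocking send this ordering would need more care.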
Suppose you have N machines: 1 of them will be the master and the other N-1 will be workers. You want to run myprogram.py on those machines.
On each worker machine:
- Put distrComp.py and worker.py into a folder.
- Run the command python -B worker.py inside that folder.
On the master machine:
- Create a file peers.txt containing the IPs of all machines.
- Place peers.txt, distrComp.py, message.py and master.py into the folder that contains myprogram.py.
- Tweak the constants in master.py to fit your purpose. Read the comments, and don't be afraid to read the code.
- When you're ready to go, run python -B master.py. Your myprogram.py will be executed automatically by all machines at the same time.
- If you're still confused, then read...
When you run worker.py, the machine listens on a fixed port (default: 6969). When you run master.py, the master connects to all workers whose IPs are specified in peers.txt or, more precisely, in the variable IPs inside master.py.
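The connection step can be sketched with Python's standard socket module: each worker binds a listening socket and waits, and the master opens one TCP connection per worker address. This is a minimal sketch, not the code in master.py or worker.py. The greeting in which the master tells each worker its node ID and the node count is a hypothetical handshake added for illustration, and the self-test runs everything on 127.0.0.1 with OS-chosen ports instead of 6969.

```python
import socket
import threading

DEFAULT_PORT = 6969  # default worker port mentioned above


def worker_listen(port=DEFAULT_PORT, host="0.0.0.0"):
    """Bind a listening socket; the worker then waits for the master."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def worker_accept(srv):
    """Accept the master's connection and read a hypothetical "<id> <count>" line."""
    conn, _addr = srv.accept()
    with conn.makefile("r") as f:  # closing the file object leaves conn open
        my_id, n_nodes = map(int, f.readline().split())
    return conn, my_id, n_nodes


def master_connect_all(addresses):
    """Connect to every (ip, port) pair and tell each worker its position."""
    conns = []
    for node_id, addr in enumerate(addresses):
        c = socket.create_connection(addr, timeout=5)
        c.sendall(("%d %d\n" % (node_id, len(addresses))).encode())
        conns.append(c)
    return conns


if __name__ == "__main__":
    # Self-test on one machine: 3 "workers" on OS-chosen ports.
    servers = [worker_listen(port=0, host="127.0.0.1") for _ in range(3)]
    seen = []

    def serve(srv):
        conn, my_id, n_nodes = worker_accept(srv)
        seen.append((my_id, n_nodes))
        conn.close()

    threads = [threading.Thread(target=serve, args=(s,)) for s in servers]
    for t in threads:
        t.start()
    conns = master_connect_all([s.getsockname() for s in servers])
    for t in threads:
        t.join()
    for c in conns:
        c.close()
    print(sorted(seen))  # [(0, 3), (1, 3), (2, 3)]
```

Reading the greeting with readline() rather than a single recv() avoids assuming the whole line arrives in one TCP segment.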
By tweaking master.py, you specify which files (e.g. source code, header files, ...) you want to send to workers and which terminal commands (e.g. python XXX.py, g++ main.cpp ; ./a.out, ...) you want to execute simultaneously. The master sends those files and commands, encoded as a binary string, to the workers.
Workers receive the string, decode it, save the files into a temporary folder, set that folder as the current working directory and spawn subprocesses to execute the commands.
Output from the STDOUT of those subprocesses is sent back to the master to be printed.
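The worker side of that pipeline can be sketched as follows, assuming the job is a pickled dict holding {filename: bytes} and a list of shell commands. encode_job and run_job are hypothetical names, and distrComp.py may encode jobs differently. Instead of changing the worker's own working directory, the sketch passes the temporary folder as cwd to each subprocess, which gives the commands the same view of the files.

```python
import os
import pickle
import subprocess
import sys
import tempfile


def encode_job(files, commands):
    """Master side: pack {filename: bytes} and shell commands into one binary string."""
    return pickle.dumps({"files": files, "commands": commands})


def run_job(blob):
    """Worker side: unpack the job, write the files to a temp folder, run the commands there."""
    job = pickle.loads(blob)  # only safe if the master is trusted
    with tempfile.TemporaryDirectory() as workdir:
        # Assumes flat file names (no subfolders).
        for name, data in job["files"].items():
            with open(os.path.join(workdir, name), "wb") as f:
                f.write(data)
        # Start every command first so they run concurrently, then collect STDOUT.
        procs = [
            subprocess.Popen(cmd, shell=True, cwd=workdir, stdout=subprocess.PIPE)
            for cmd in job["commands"]
        ]
        return [p.communicate()[0].decode() for p in procs]


if __name__ == "__main__":
    blob = encode_job(
        {"hello.py": b"print('hello from the worker')\n"},
        ['"%s" hello.py' % sys.executable],
    )
    for out in run_job(blob):
        print(out, end="")
```

shell=True is what allows commands such as g++ main.cpp ; ./a.out to be passed as a single string, at the cost of running whatever the master sends.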
- distrComp.py: required by master.py and worker.py.
- master.py and worker.py: the scripts to run on the master and on the workers.
- message.py: handles the connections between machines; implements message.send and message.recv.