                          MyDHT Distributed Hash Table

    1. What is it?
    --------------

    MyDHT is an implementation of a DHT written in Python.
    It supports getting, setting and deleting key/values.
    Values can be strings or files.

    A replication level n can be set for the DHT (the default is 3),
    so that every key/value pair is stored on n different nodes.

    Every node can be viewed with an ordinary web browser
    (lynx did not work for me).

    I have focused on implementing dynamic adding and removal of nodes,
    as well as load balancing and replication for increased protection
    against a server failure.

    If a node is removed in a nice fashion (SIGINT) it will hand over
    all its data to other nodes. If a node crashes, the REMOVE action
    can be sent to a node with the crashed node as the key; this will
    remove the node from all other nodes and perform a load balancing.

    During a load balancing action all nodes will compare their keys
    with the replica nodes' keys (using timestamps) and the newest
    version of a value will be copied to all nodes.

    After a load balancing the PURGE operation may be used to purge
    keys from a node that don't belong there anymore.

    
    2. Documentation
    ----------------

    The source code is documented with PyDoc, and for every source file
    there is an HTML file with the generated output.

    If anything is unclear, the source code contains more specific comments
    about the implementation.

    3. Installation
    ---------------

    There is no installation of the system,
    just unpack the tar.gz-archive with:

    $ tar xf mydht.tar.gz

    This will unpack all files in a directory called MyDHT.

    4. Dependencies
    ---------------

    MyDHT has been tested with Python 2.6 on Ubuntu Server 10.10,
    Ubuntu 10.04 Desktop and Windows 7 x64.

    If nodes are to be run on separate machines, the hostname must
    be set so that the other nodes can reach it. Running on
    localhost will not work in that case.

    5. Glossary
    -----------
    A brief explanation of some terms used below.

    ring      a ring refers to a hash ring (HashRing.py) where every node
              in the DHT gets an MD5 checksum based on hostname+port.

    node      a node is a MyDHT server.
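
    The ring idea can be illustrated with a minimal sketch. It is not the
    HashRing.py shipped with MyDHT: the class name SketchRing, the "host:port"
    string fed to MD5, and the clockwise replica walk are all assumptions made
    for illustration. It is written for Python 3, whereas MyDHT itself was
    tested with Python 2.6.

```python
import bisect
import hashlib


def md5_position(text):
    """Map a string to a position on the ring (its MD5 digest as an integer)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class SketchRing:
    """Hypothetical hash ring; NOT the HashRing.py that ships with MyDHT."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted list of (position, "host:port") pairs.
        # Hashing "host:port" is an assumption about the exact input format.
        self.ring = sorted(
            (md5_position("%s:%d" % (host, port)), "%s:%d" % (host, port))
            for host, port in nodes
        )

    def nodes_for(self, key):
        """Return the nodes that should hold `key`, walking clockwise."""
        if not self.ring:
            return []
        start = bisect.bisect_left(self.ring, (md5_position(key), ""))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = SketchRing([("localhost", 50140 + i) for i in range(5)])
owners = ring.nodes_for("mykey")
print(owners)
```

    With five nodes and the default replication level of 3, the lookup returns
    three distinct nodes, and the same key always maps to the same nodes as
    long as the node list does not change.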

    6. Running
    ----------

    A MyDHT server (sometimes called a node) is started from the command
    line in Linux. First go into the MyDHT directory where you extracted
    the files.

    $ python mydhtserver.py

    This will start a server on localhost running on port 50140.
    The hostname and port can be specified with -h and -p (or --host and --port).

    To check that the node is running, launch a web browser and point it to:
    http://localhost:50140

    This should give you a status page that shows that your ring consists
    of one node, has 0 keys and has size 0.

    This page is useful for checking the node status. It also lets you
    navigate to the other nodes via hyperlinks and download values
    by clicking their links.

        6.1 Adding (and removing) nodes
        -------------------------------
        To add an additional node to the system run the following command:

        $ python mydhtserver.py -p 50141 -s localhost:50140

        The -p flag tells the server to run on port 50141 and -s (or --server)
        tells the server to join an existing server.
        The server at 50141 will contact the server at 50140 and ask to join.
        50140 will return a list of servers that are already in this ring.

        Check the webpage once again (on either http://localhost:50140 or
        http://localhost:50141) to see that it reports that the ring consists
        of 2 servers.

        This step should be repeated for every new server that should be added.
        Remember that the hostname and port must be reachable by the other nodes
        (i.e. using localhost will not work when nodes are on other machines).

        To remove a node, just hit CTRL-C if it is running in a terminal, or else
        send it SIGINT (kill -2 PID). This will force the node to hand over
        its keys and values to another node (if there is one) and then quit.
        
        6.2 Running some test cases
        ---------------------------
        To upload some data to the DHT you can put a couple of files in the
        upload directory and then run the TestUpload.py test case.
        This will iterate over all files in the directory and upload them with
        the filename as the key. The filename will also be uploaded as a string
        value with the key "key for " + filename.

        This is only intended as a quick way to put some data into the DHT.

        After this is done, point a web browser to any server and check that
        all files are present. If there are fewer than 4 servers, all files will
        be present on all servers (with the default replication level of 3).

        To try downloading the files run the TestDownload.py test case.
        This will try to download all files in the upload directory and
        save them to the download directory.


        6.3 Interacting with the DHT
        ----------------------------

        If you would like to interact with the servers "for real", use
        mydhtclient.py.

        To put a string value:
        $ python mydhtclient.py -c put -k mykey -val myvalue
        > PUT OK mykey

        This will set the value of "mykey" to "myvalue" at the server
        localhost:50140. To use another server use -h/--host and -p/--port.

        To put a file:
        $ python mydhtclient.py -c put -k myfilekey -f myfile.bin
        > PUT OK myfilekey

        This will upload the file myfile.bin and save it at key "myfilekey".

        To get a value:
        $ python mydhtclient.py -c get -k mykey
        > myvalue

        To get a file:
        $ python mydhtclient.py -c get -k myfilekey -o myfile.bin.download
        No output will be shown, but the files can be compared with diff:
        $ diff myfile.bin myfile.bin.download

        To delete a value (string or file):
        $ python mydhtclient.py -c del -k myfilekey
        > DEL OK myfilekey

            6.3.1 Just for fun
            ------------------
            Since the web server is able to fetch keys from the key value store,
            it is possible to change the favicon.ico that the web browser keeps
            asking for like this:
            $ python mydhtclient.py -c put -k favicon.ico -f favicon.ico

        6.4 Other operations
        --------------------
        Besides get, put and del there are some more actions that can
        be performed:

            6.4.1 Haskey
            ------------
            Returns the timestamp of a value (or 0.0 if it is missing).
            This action is mostly used between servers when they are load
            balancing.

            Example:
                    $ python mydhtclient.py -c haskey -k myfilekey
                    > 1300386351.67

            HASKEY = 4

            6.4.2 Purge
            -----------
            Purge is used to tell a node to throw away all keys that don't belong
            on that node. This could be useful if new nodes have joined the network
            and a load balancing operation has occurred.

            The key should be the server where the command should be sent.

            NOTE: Nodes will never remove keys unless they are told to.

            Example:
            $ python mydhtclient.py -c purge
            > PURGE ok
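
            What a purge amounts to can be sketched as follows. This is an
            illustration only, not the MyDHT implementation: the function purge,
            the nodes_for callable (standing in for the ring lookup) and the toy
            placement rule are all hypothetical names.

```python
def purge(store, node_name, nodes_for):
    """Delete every key in `store` that `nodes_for(key)` does not assign
    to `node_name`. Returns the list of removed keys."""
    # Collect first, then delete, so the dict is not modified while iterating.
    stale = [key for key in store if node_name not in nodes_for(key)]
    for key in stale:
        del store[key]
    return stale


def toy_nodes_for(key):
    """Toy placement rule standing in for the ring lookup (an assumption)."""
    return ["a:1", "b:2"] if key.startswith("x") else ["c:3"]


store = {"x1": "v", "x2": "v", "y1": "v"}
removed = purge(store, "a:1", toy_nodes_for)
print(removed, sorted(store))
```

            Because nodes never remove keys on their own, running something like
            this only after a load balance has finished keeps the newest copy of
            every value alive on its rightful nodes.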

            6.4.3 Leave
            -----------
            This command is only used internally when a node wants to leave.
            If you have started a node from the command line and press CTRL-C
            (or kill it with SIGINT) it will send the LEAVE action to all other
            nodes.

            6.4.4 Remove
            ------------
            If a node left the ring without being able to send the leave message,
            the other nodes will continue to try to contact it. Remove is used to forcibly
            remove a node from the ring.

            When a node receives the remove action it will remove that server from its
            own ring and tell all other servers. After that a load balance is started
            and forwarded to all nodes.

            Example:
            $ python mydhtclient.py -c remove -k localhost:50141
            > REMOVE ok

            6.4.5 Join
            ----------
            This action is used by a server when it is started and joining an existing network.

            6.4.6 Addnode
            -------------
            This is an internal action sent to all other nodes when a node has joined the network.

            6.4.7 Whereis
            -------------
            Whereis tells you at which nodes a key is stored.

            Example:
            $ python mydhtclient.py -c whereis -k mykey
            > localhost:50142, localhost:50140, localhost:50143

            NOTE: Whereis only tells where the values should be, not whether they are
                  actually there. To check that, haskey may be used.

            6.4.8 Balance
            -------------
            Balance will tell the first node to go through its keys.
            For every key it will ask every server where the key should be (according to the ring)
            whether it has the key.

            If the other node has the key but with an older timestamp, the first node's version
            will be transferred to that node.

            If the other node has a newer version of the key it will be transferred to the first node.

            If they have the same timestamp nothing will happen.

            After this the balance action will be forwarded to all other nodes and they will do the same thing.
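
            The timestamp comparison described above can be sketched like this.
            It is a simplified model, not the MyDHT code: stores are plain dicts
            mapping key -> (timestamp, value), and reconcile, balance and
            replicas_for are hypothetical names. Treating a missing key as
            timestamp 0.0 mirrors what haskey reports for a missing value.

```python
def reconcile(local, remote, key):
    """Copy the newer version of `key` between two stores.

    Each store maps key -> (timestamp, value).
    Returns "pushed", "pulled" or "same".
    """
    local_ts = local.get(key, (0.0, None))[0]
    remote_ts = remote.get(key, (0.0, None))[0]
    if local_ts > remote_ts:
        remote[key] = local[key]   # the other node is older: send our copy
        return "pushed"
    if remote_ts > local_ts:
        local[key] = remote[key]   # the other node is newer: fetch its copy
        return "pulled"
    return "same"                  # equal timestamps: nothing to do


def balance(local, replicas_for):
    """Reconcile every local key with the stores from `replicas_for(key)`."""
    for key in list(local):
        for remote in replicas_for(key):
            reconcile(local, remote, key)


first = {"a": (2.0, "new"), "b": (1.0, "old")}
other = {"b": (5.0, "newer")}
balance(first, lambda key: [other])
print(first, other)
```

            Like the description above, the sketch only walks the keys of the
            node that runs it, which is why the action has to be forwarded to the
            other nodes as well.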

    7. Limitations
    --------------

        7.1 Only in memory storage
        --------------------------
        Right now all keys and values are stored in a Python dictionary. This will probably not be suitable
        for production use. I thought about moving values to disk after they reached a certain age but left
        it out. It might be better to write the values to disk all the time and keep the dictionary for
        fast access.
        
        This would make it possible for a node to be restarted after a crash and still have some or all values.
        The keys must be written to disk for this to work but it should not be that hard.
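
        A minimal sketch of that write-through idea follows. It is not part of MyDHT:
        the class WriteThroughStore, its method names and the one-file-per-key layout
        are assumptions made for illustration, and values are assumed to be bytes.

```python
import hashlib
import os
import tempfile


class WriteThroughStore:
    """Hypothetical store: a dict for fast access, plus a file per value."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any key string maps to a safe file name.
        name = hashlib.md5(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def put(self, key, value):
        # Write to disk first, then update the in-memory dictionary.
        with open(self._path(key), "wb") as handle:
            handle.write(value)
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:
            # Cache miss, e.g. after a restart: fall back to the file.
            with open(self._path(key), "rb") as handle:
                self.cache[key] = handle.read()
        return self.cache[key]


directory = tempfile.mkdtemp()
WriteThroughStore(directory).put("mykey", b"myvalue")
restarted = WriteThroughStore(directory)  # simulates a restart after a crash
print(restarted.get("mykey"))
```

        Because the file names are hashes, this sketch can only find a value when the
        key is already known. Listing all keys after a restart would also require the
        key names themselves to be persisted, as noted above.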

        Another thing about storing in memory is that at some point a limit will be reached and the OS will
        not give Python more memory. I have uploaded about 1 GB of data to the network without any
        problem, but I guess that multiplying that by a factor of 10 would cause trouble.

        7.2 No gossiping
        ----------------
        The nodes are not that social; they do not perform any small talk through which they could discover
        errors. For example, it is possible that 7 of 8 nodes have a HashRing with 8 servers but the last one
        only has 7. This is bad, since the last node then has a different opinion about where to put the keys
        and values.

        By looking at the web page for each node this could be discovered and the faulty node could be
        restarted.

        Also it would be good to have a timestamp on the hashring itself, this would make it easier to
        find and correct errors.
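
        One possible way to spot such a disagreement automatically is to compare a
        fingerprint of each node's view of the ring. The sketch below is an assumption,
        not something MyDHT does: ring_fingerprint and find_outliers are hypothetical
        helpers, and a fingerprint is used here in place of the timestamp suggested above.

```python
import hashlib


def ring_fingerprint(nodes):
    """Order-independent MD5 fingerprint of a list like ["host:port", ...]."""
    joined = ",".join(sorted(nodes))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_outliers(views):
    """Given {node: its list of ring members}, return the nodes whose view
    differs from the most common one."""
    if not views:
        return []
    prints = {node: ring_fingerprint(members) for node, members in views.items()}
    counts = {}
    for value in prints.values():
        counts[value] = counts.get(value, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(node for node, value in prints.items() if value != majority)


all_nodes = ["localhost:%d" % (50140 + i) for i in range(8)]
views = {node: list(all_nodes) for node in all_nodes[:7]}
views[all_nodes[7]] = all_nodes[1:]  # the last node only knows 7 servers
print(find_outliers(views))
```

        Nodes could exchange only the fingerprint during normal traffic and fall back to
        sending the full node list when two fingerprints differ.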

        7.3 Overhead in communication
        -----------------------------
        The nodes communicate mostly with messages of type DHTCommand. This is a homemade protocol that
        introduces some overhead. It would be good to replace this protocol with Google Protocol Buffers
        or Thrift.

        Also there is some unnecessary overhead when load balancing. Between the first load balancing
        node and other nodes holding the same keys there will be double checks for all values.

        7.4 No removal of presumably dead nodes
        ---------------------------------------
        If a node tries to contact a node that has crashed it will fail and continue with the next node
        that holds the same key if there is any.

        The MyDHTClient class that does most of the communication will try 3 times before giving up.
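
        The retry-then-move-on behaviour can be sketched as below. This is not the
        MyDHTClient code: send_with_fallback and the send callable are hypothetical,
        and it is an assumption that each node gets three attempts before the next
        replica is tried.

```python
def send_with_fallback(nodes, send, attempts=3):
    """Try each node in turn, up to `attempts` times each.

    `send(node)` is a hypothetical callable that returns a reply or raises
    OSError on a connection problem. Returns (node, reply) for the first
    success, or raises OSError if every node failed.
    """
    for node in nodes:
        for _ in range(attempts):
            try:
                return node, send(node)
            except OSError:
                continue  # retry the same node, then move on to the next one
    raise OSError("no replica node could be reached")


calls = []


def fake_send(node):
    """Stand-in for a network call: one node is 'crashed'."""
    calls.append(node)
    if node == "localhost:50141":
        raise ConnectionRefusedError("node is down")
    return "PUT OK mykey"


result = send_with_fallback(["localhost:50141", "localhost:50140"], fake_send)
print(result, len(calls))
```

        A crashed node stays in the ring in this sketch too, so every later request
        pays for the failed attempts until the node is taken out with the remove action.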

        7.5 Error handling
        ------------------
        I have tried to take care of most of the errors that I found during development, but I am
        sure that there are things that I've missed. Also, I have used a bare "except:" without an exception
        type in some places, which is not really good.

        7.6 Windows/Linux compatibility
        -------------------------------
        The system runs on Linux and Windows and you can use nodes from both worlds, but it is not possible
        to write a file from Windows and read it from Linux. Or at least I have not been able to do this.

        Also, there is something strange when sending packets from Windows to Linux, so it is not recommended
        to use nodes running on both operating systems at the same time.

        7.7 Every node holds the full ring
        ----------------------------------
        This is not a real limitation in smaller networks, but if you plan to use a lot of nodes it is not
        ideal to have them "multicast" everything to all other nodes.
        In that case every node should instead know only the next couple of nodes, and when a key is not found
        on one node the request would be redirected through the ring, eventually ending up in the right place.

        A good thing with the existing approach is that at most 1 hop will be needed to reach the correct node.

        7.8 Replication could be even better
        ------------------------------------
        If a put is sent to a node where a key should not be, it will be forwarded to one of the replica nodes.
        That replica node will perform the replication to the other nodes after it has downloaded the data.
        A nicer way to do this would be to replicate to all nodes from the first node, reading from the socket
        and writing to the replica nodes' sockets all at the same time.

        I tried to do this but the code did not turn out well, so I reverted to this simple solution.

        I tried to read from one incoming socket and forward it to another server, but this failed when I started
        testing with different machines, so I had to revert to the solution where the first node that receives
        the command will store the data and then forward it. This works, but it would be cleaner to just
        forward the request by using the select syscall on incoming and outgoing sockets.

        Storing data on nodes while forwarding a request is probably not a problem unless you transfer very large
        files. Memcached, for example, has a default limit of 1 MB, and with that amount it should not be a problem.

        Maybe there should be a way of changing the number of replica nodes after the ring has been started.
        This should be fairly easy: just let all the nodes change their replicas value in HashRing.py and
        then do a load balance.

        7.9 Distribution of keys
        ------------------------
        The distribution of keys is not even. As stated in HashRing.py, I borrowed it and changed it
        to fit this system. I tried changing from MD5 to SHA1 but it was not better. In a real system
        it would be good to have an even distribution.

        It would also be good to be able to add a node at a specific place in the ring (like in Apache
        Cassandra). The ring should also take into account the physical location of a node (distance etc).

        7.10 No bootstrapping for new nodes
        -----------------------------------
        When a new node is added to the network it is not able to bootstrap itself and
        download data. A load balance action has to be sent to any node.

        7.11 The web interface
        ----------------------
        I know that most DHTs don't have a web interface. I built it for debugging purposes and left it
        in because I think it is useful when reviewing this assignment, since it makes it much easier to get an
        overview of a node's view of the ring and which key/values are on it. It is also very easy to test that,
        for example, an image is found (and looks the same) on all replica nodes by just clicking a URL in the
        web browser.

        There is some ugly code in dhtclient for this to work. Since the web browser does not send exactly
        4096 bytes in a request I have to break the loop in read_from_socket when the received data starts
        with "GET /" and ends with "\r\n\r\n" (terminator for HTTP).
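
        The special case can be illustrated with the sketch below. It is not the actual
        read_from_socket: the function name read_request, the expected_length parameter
        and the idea that an ordinary command is exactly 4096 bytes are assumptions
        drawn from the description above. socket.socketpair() stands in for a real
        browser connection.

```python
import socket

BUFFER_SIZE = 4096        # assumed size of an ordinary command
HTTP_END = b"\r\n\r\n"    # blank line that terminates an HTTP header block


def read_request(sock, expected_length=BUFFER_SIZE):
    """Read until `expected_length` bytes have arrived, the peer closed the
    connection, or a complete HTTP GET header block has been seen."""
    data = b""
    while len(data) < expected_length:
        chunk = sock.recv(expected_length - len(data))
        if not chunk:
            break  # peer closed the connection
        data += chunk
        if data.startswith(b"GET /") and data.endswith(HTTP_END):
            break  # a browser request, shorter than a full command
    return data


browser, server = socket.socketpair()
browser.sendall(b"GET /favicon.ico HTTP/1.1\r\nHost: localhost:50140\r\n\r\n")
request = read_request(server)
print(request.split(b"\r\n")[0])
browser.close()
server.close()
```

        Without the GET check the loop would keep waiting for the remaining bytes,
        because a browser holds the connection open while it waits for the response.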

        7.12 Test cases
        ---------------
        The number of test cases is limited and they only test the GET, PUT and DEL actions. It would have been
        nice to have larger scale tests which started nodes in different processes. I wrote some of these
        initially, but since TCP/IP communication seems to work perfectly when everybody is on the same machine
        I removed them.

        I have, however, created a script for starting servers easily, createservers.py.

        Example:
        $ python createservers.py -h 192.168.0.150 -p 50140 -n 4
        > python mydhtserver.py -h 192.168.0.150 -p 50140 -v -l 192.168.0.150-50140.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50141 -s 192.168.0.150:50140 -v -l 192.168.0.150-50141.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50142 -s 192.168.0.150:50140 -v -l 192.168.0.150-50142.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50143 -s 192.168.0.150:50140 -v -l 192.168.0.150-50143.log &

        This will print 4 command lines that will start a new node for each line, all of them connecting to
        the first node (at port 50140).

        If the new servers should connect to an existing ring:
        $ python createservers.py -h 192.168.0.150 -p 50140 -n 4 -s 192.168.0.200:50400
        > python mydhtserver.py -h 192.168.0.150 -p 50140 -s 192.168.0.200:50400 -v -l 192.168.0.150-50140.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50141 -s 192.168.0.200:50400 -v -l 192.168.0.150-50141.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50142 -s 192.168.0.200:50400 -v -l 192.168.0.150-50142.log &
        > python mydhtserver.py -h 192.168.0.150 -p 50143 -s 192.168.0.200:50400 -v -l 192.168.0.150-50143.log &

        All output will also be logged to a logfile unless -l filename is removed.
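
        How such command lines can be generated is sketched below. This is a
        re-creation based only on the example output above, not the real
        createservers.py: the function server_commands and its parameters are
        hypothetical.

```python
def server_commands(host, first_port, count, join=None):
    """Build command lines in the shape shown in the examples above."""
    commands = []
    for port in range(first_port, first_port + count):
        parts = ["python mydhtserver.py", "-h", host, "-p", str(port)]
        # Without an existing ring, every node after the first joins the first.
        if join:
            target = join
        elif port != first_port:
            target = "%s:%d" % (host, first_port)
        else:
            target = None
        if target:
            parts += ["-s", target]
        parts += ["-v", "-l", "%s-%d.log" % (host, port), "&"]
        commands.append(" ".join(parts))
    return commands


new_ring = server_commands("192.168.0.150", 50140, 4)
joined = server_commands("192.168.0.150", 50140, 4, join="192.168.0.200:50400")
for line in new_ring:
    print(line)
```

        Printing the commands instead of running them makes it easy to review them
        first, or to paste only some of them into a shell.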

        7.13 Installation
        -----------------
        I have chosen not to write any fancy installation scripts.

        7.14 Examples
        -------------
        I have not provided any examples of using the classes in existing code. However, the test cases
        perform GET, PUT and DEL, and they could be used as a first example.

    8. Summary
    ----------
    This was by far the most fun I have had in a long time. I've learned a lot of Python by doing this.

    I have tried to keep the code as simple and clean as possible.
    The server (MyDHTServer) reuses a lot of code from MyDHTClient when communicating.

    I have tested the system on 2 different Linux machines with 5 nodes on each one.
    I uploaded about 120 MB to the ring at various ring sizes (not all 10 nodes were active from the start),
    added and removed nodes (both nicely and not so nicely),
    and load balanced, purged and deleted keys.
    During all these tests I regularly downloaded all keys from the ring (from different nodes)
    and verified with diff that the content of my upload directory was the same as that of my download directory.
    

About

Distributed Hash Table implemented in Python. (please keep in mind that I did this trying to learn Python, just for fun).
