Sample code for classic data structures and sorting algorithms implemented in Python by Ben Friedland.
Includes implementations of linked list, queue, doubly-linked list, stack, binary heap, priority queue, binary search tree, AVL-balanced binary search tree, a simple graph structure with depth- and breadth-first traversal demonstrations, the A* and Dijkstra's shortest path algorithms, and implementations of the insertion sort, merge sort, and quicksort algorithms.
Binary search trees are a quick and efficient way to store and search through data. Inserting a value into a binary search tree stores it as a node in the tree in a way that allows values to be looked up by walking from the 'root' node to nodes with closer and closer values via binary comparisons (greater-than versus less-than).
This comparison-indexed tree structure is very efficient to build and
search through given either sufficiently randomly distributed data values
or a balancing function, such as the balancing_tree's automatically-
invoked implementation of the AVL balancing algorithm.
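As a sketch of the idea (in Python 3-compatible syntax, whereas the repository itself targets Python 2.7; the names here are illustrative, not the modules' actual API, and no balancing is shown):

```python
class BSTNode(object):
    """Minimal binary-search-tree node; names are illustrative only."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Insert value, descending left on less-than and right on greater-than."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root  # duplicate values are ignored

def contains(root, value):
    """Walk the comparison path down from the root; O(log n) when balanced."""
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False
```

With randomly distributed inserts the tree stays roughly balanced on its own; the AVL rotations in balancing_tree guarantee it regardless of insertion order.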
Simple graphs behave as the computer science notion of a graph, containing vertices (nodes) and edges. The user interacts with the graph by value rather than interacting with the abstraction (this was an issue of some concern in the specifications).
Traversable graphs are simple graphs with the additional functionality of being "traversable" by depth- and breadth-first search algorithms. These algorithms return the full list of nodes connected to the node with the given value by any unbroken chain of edges; this is analogous to printing every node of the graph in a list. These functions may be modified to perform other duties as they traverse.
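The two traversals can be sketched roughly like this (Python 3-compatible syntax; the dict-of-neighbor-lists graph shape and the function names are illustrative assumptions, not the module's actual API):

```python
from collections import deque

def breadth_first(graph, start):
    """Return every node reachable from start, nearest-first.
    graph is assumed to map each value to a list of neighbor values."""
    seen, order, frontier = {start}, [], deque([start])
    while frontier:
        node = frontier.popleft()  # FIFO queue => visit level by level
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return order

def depth_first(graph, start):
    """Same reachable set, but following each branch as far as it goes first."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()  # LIFO stack => plunge down one branch
        if node not in seen:
            seen.add(node)
            order.append(node)
            # push in reverse so the first-listed neighbor is visited first
            stack.extend(reversed(graph.get(node, [])))
    return order
```

Swapping the queue for a stack is the only structural difference between the two; the visit callback (here just appending to a list) is the natural place to hang extra duties.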
Weighted graphs are just like traversable graphs, but they carry a weighting, which is largely useless unless some extra functionality is added; it is, however, necessary for implementing Dijkstra's shortest-path algorithm.
Shortest paths graphs are weighted graphs that utilize their nodes' weighting attribute to calculate the shortest path between the nodes of two given values.
There are two shortest path algorithms available to the ShortestPathsGraph:
dijkstra_algorithm: Finds the shortest path in the graph if it exists
and returns it in a list where the first element is the total
cost of following the path and the second element is a list of
the values of each node required to follow the shortest path,
ordered from the start to the end.
Returns None if no path is found.
a_star_algorithm: Finds the shortest path in the graph if it exists
and returns it in the same format as dijkstra_algorithm.
Uses a heuristic, defaulting to None, which can increase the
efficiency of this process when the heuristic is chosen based
on reasonable assumptions about the graph's geometry.
Returns None if no path is found.
Notes about heuristics: The Euclidean, Manhattan and Chebyshev
heuristics may only be used if nodes are provided by the
user with x_coordinate and y_coordinate values; otherwise,
the default heuristic may be used, making this algorithm's
performance roughly identical to Dijkstra's algorithm.
The provided heuristics only make sense on graphs with
certain properties; the Euclidean heuristic, for example,
is best used on graphs where traversal between nodes
resembles unrestricted movement in the real world, whereas
the Manhattan heuristic is best for graphs utilizing
'taxicab geometry', and the Chebyshev heuristic is better
for graphs where traversal costs resemble those of
the king in chess.
http://en.wikipedia.org/wiki/Taxicab_geometry
http://en.wikipedia.org/wiki/Chebyshev_distance
hash_table.py allows the construction of hash tables of user-defined size that accept only strings as keys.
The hashing algorithm used combines bit rotation and XOR hashing, and
was researched from:
http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
HashTables will accept any positive integer table size; to optimize
for performance, the user is expected to determine their own ideal
hash table size, which is likely to be around 1.6 times the size of
the anticipated inputs, according to people with evaluation criteria
I have not yet had time to research.
HashTable objects may be instantiated by calling:
HashTable(size)
HashTable methods include:
get(key)
Retrieve from the hash table the value associated with
the given key string.
set(key, value)
Associate the given key string with the given value in the hash table.
hash(key)
Return the hash of a given key string.
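As an illustration of the rotation-plus-XOR family of string hashes mentioned above (a generic sketch of the technique in Python 3-compatible syntax, not necessarily the exact function hash_table.py uses):

```python
def rot_xor_hash(key, table_size):
    """Combine a 32-bit left rotation of the running hash with an XOR of
    each character's byte value, then reduce to a table slot."""
    h = 0
    for char in key:
        # rotate the accumulated hash left by 4 bits within 32 bits
        h = ((h << 4) | (h >> 28)) & 0xFFFFFFFF
        # mix in the next character
        h ^= ord(char)
    return h % table_size
```

The rotation spreads each character's influence across the whole word, so similar keys land in different slots more often than with plain addition.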
Insertion sort is a simple sorting algorithm. It has somewhat poor performance, but it's very easy to implement. It will run in O(n) time in the best case and O(n^2) time in the worst case. Compared to more advanced sorting algorithms like merge sort, quicksort and radix sort, insertion sort is very inefficient on large lists.
Running insertion_sort.py from the command line will demonstrate the
performance characteristics of this implementation of insertion sort.
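The core loop can be sketched like so (Python 3-compatible syntax; a generic textbook version, not necessarily line-for-line what insertion_sort.py does):

```python
def insertion_sort(items):
    """Shift each element left past any larger neighbors, in place.
    O(n) on already-sorted input, O(n^2) on reversed input."""
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]  # slide the larger element right
            j -= 1
        items[j + 1] = value  # drop value into its gap
    return items
```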
Merge sort is a fairly simple sorting algorithm. It has very predictable performance, running in O(n log n) in every case. It recursively divides the list until only single elements remain (which are trivially sorted), then zips them back up, using first-element comparisons to sort the list as it merges.
Running merge_sort.py from the command line will demonstrate the
performance characteristics of this implementation of merge sort.
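The divide-and-merge shape can be sketched as follows (Python 3-compatible syntax; a generic version for illustration, not necessarily merge_sort.py's exact code):

```python
def merge_sort(items):
    """Split until single elements remain, then merge by first-element comparison."""
    if len(items) <= 1:
        return items  # a list of zero or one elements is trivially sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # zip the two sorted halves back together
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one half may have leftovers
```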
Quicksort is similar to merge sort in that it's a "divide and conquer" algorithm, dividing a list into pieces and re-merging them in sorted order. Its main difference is that it sorts during the process of division, ordering elements by comparison with a "pivot" value selected from the list on each pass, then recursing on the pair of lesser- and greater-valued lists.
Running quicksort.py from the command line will demonstrate the
performance characteristics of this implementation of quicksort.
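The pivot-and-recurse step described above can be sketched as (Python 3-compatible syntax; a simple first-element-pivot version for illustration, not necessarily quicksort.py's exact strategy):

```python
def quicksort(items):
    """Partition around a pivot, then recurse on the lesser and greater lists."""
    if len(items) <= 1:
        return items
    pivot = items[0]  # naive pivot choice; real implementations often pick better
    lesser = [x for x in items[1:] if x < pivot]
    greater = [x for x in items[1:] if x >= pivot]
    return quicksort(lesser) + [pivot] + quicksort(greater)
```

Note that unlike merge sort, the concatenation at the end needs no comparisons: the sorting already happened while partitioning.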
Radix sort is an unusual sorting algorithm in that it gains efficiency when the number of digits (or "keys" for non-numeric comparisons) is constant and small relative to the number of elements being sorted.
That is because the algorithm breaks the problem into a number of
passes equal to the number of keys each element has, and on each pass
sorts the elements by their value in that key alone. On the whole, this
can save time, because a small number of keys and a large number of
elements means the algorithm will scale according to the product of:
(the linear-time operation of bucketing every element by one key's value)
and
(the number of keys, i.e. how many times that pass is repeated)
The net result is a big O rating of O(k*n), or the number of keys times
the size of the dataset. This simplifies to O(n) when the number of
keys is small and the size of the dataset is large, but can be worse
than O(n^2) when the situation is inverted.
This algorithm will NOT work properly when handed numbers of differing
key length, and I'm not stopping Python from shaving off any leading
zeroes hypothetical users might try to use to skirt this requirement.
Running radix_sort.py from the command line will demonstrate the
performance characteristics of this implementation of radix sort.
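The pass-per-key scheme can be sketched as a least-significant-digit radix sort (Python 3-compatible syntax; a generic version for illustration, not necessarily radix_sort.py's exact code; it assumes non-negative integers that all share the same digit count, per the requirement above):

```python
def radix_sort(items, num_digits):
    """One bucketing pass per digit, least significant digit first."""
    for digit in range(num_digits):
        buckets = [[] for _ in range(10)]  # one bucket per digit value
        divisor = 10 ** digit
        for item in items:
            buckets[(item // divisor) % 10].append(item)
        # flatten, preserving bucket order; stability makes the passes compose
        items = [item for bucket in buckets for item in bucket]
    return items
```

Each pass is O(n), and there are k = num_digits passes, giving the O(k*n) rating described above.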
The only dependency is Python 2.7.
Collaborators: Jason Brokaw (binary_heap, priority queue, binary search tree, simple graph; especially bst deletion), Charlie Rode (priority queue, binary search tree), Casey MacPhee (binary search tree).
Unit tests were usefully informed by:
https://github.com/linsomniac/python-unittest-skeleton/blob/master/tests/test_skeleton.py
http://stackoverflow.com/questions/6103825/how-to-properly-use-unit-testings-assertraises-with-nonetype-objects
http://stackoverflow.com/questions/6181555/pass-a-unit-test-if-an-exception-isnt-thrown
https://github.com/charlieRode/data-structures/blob/bst/test_bst.py
Resources used include:
linked_list:
http://en.literateprograms.org/Singly_linked_list_%28Python%29
stack:
http://en.literateprograms.org/Singly_linked_list_%28Python%29
validate_parenthetics:
Own memory
binary_heap (and priority queue):
most helpful:
https://github.com/jbbrokaw/data-structures
also helpful:
http://domenicosolazzo.wordpress.com/2010/09/26/heapsort-a-python-example/
http://pravin.paratey.com/posts/binary-heaps-and-priority-queues
http://en.wikipedia.org/wiki/Binary_heap
http://interactivepython.org/runestone/static/pythonds/Trees/heap.html
bst:
https://github.com/jbbrokaw/data-structures/blob/master/bst.py
https://github.com/caseymacphee/Data-structures/blob/master/test_bst.py
hash_table:
http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
https://github.com/jbbrokaw/data-structures/blob/master/test_hashtable.py
traversable_graph:
http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/
weighted_graph:
Own memory
shortest_paths:
http://www.eoinbailey.com/content/dijkstras-algorithm-illustrated-explanation
http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
http://code.activestate.com/recipes/577519-a-star-shortest-path-algorithm/
http://en.wikipedia.org/wiki/A*_search_algorithm
insertion_sort:
http://en.wikipedia.org/wiki/Insertion_sort
http://www.geekviewpoint.com/python/sorting/insertionsort
merge_sort:
http://en.wikipedia.org/wiki/Merge_sort
quicksort:
http://en.wikipedia.org/wiki/Quicksort
http://en.literateprograms.org/Quicksort_%28Python%29
radix_sort:
http://en.wikipedia.org/wiki/Radix_sort
http://www.geekviewpoint.com/python/sorting/radixsort
http://en.wikibooks.org/wiki/Algorithm_Implementation/Sorting/Radix_sort
balancing_tree:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://interactivepython.org/courselib/static/pythonds/Trees/balanced.html