LamasAndGroves

Small project to solve anagrams.

LamasAndGroves is a program for finding the anagrams behind a hash. Given a phrase, a list of words, a hashing algorithm and a list of hashes, the program finds the anagrams of the phrase whose hash appears in the list and outputs each one with its hash counterpart:

# Output
8229b3735981c23fb122c6db1a2a09b9 --> Anagram found
...

Basic usage

$ python src/lamasandgroves.py "some search phrase" some-dictionary.txt "md5" hashes-to-find.txt
args {
    0 =>           lamasandgroves.py,
    1 => str:      phrase to find anagrams for,
    2 => filename: file of words that could be in the anagram,
    3 => str:      hash algorithm (md5, sha1, sha256, sha512, plain),
    4 => filename: file of hashes we should look for
}
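
As an illustration of argument 3, the sketch below maps the algorithm name to a hashing function using Python's standard `hashlib`. The helper name `make_hasher` is hypothetical, and treating "plain" as "compare the candidate text unhashed" is an assumption about what that option means, not something taken from the project's source.

```python
import hashlib

SUPPORTED = ("md5", "sha1", "sha256", "sha512")

def make_hasher(algorithm):
    """Return a function mapping a candidate string to its hex digest.

    Hypothetical helper, not the project's actual code.
    """
    if algorithm == "plain":
        # Assumption: "plain" means the hash file holds the phrases themselves.
        return lambda text: text
    if algorithm not in SUPPORTED:
        raise ValueError("unsupported hash algorithm: " + algorithm)
    return lambda text: hashlib.new(algorithm, text.encode("utf-8")).hexdigest()

md5 = make_hasher("md5")
print(md5("hello"))  # 5d41402abc4b2a76b9719d911017c592
```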

Theory

When running the program, a word list is provided, which forms the basis of the anagrams we need to find. To hold the words in memory in a form that is quick to parse and look up, we use an abstract syntax tree, where each branch in the tree represents a character.

We combine parsing the words into this internal structure with reducing the list of words, to avoid keeping too much in memory.

The program constructs an abstract syntax tree of the given dictionary of words.

Word list	Syntax Tree
and, app, apple, groves, lamas	(figure: syntax tree of the word list)
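
A minimal sketch of how such a character tree could be built for the word list above. It assumes that "reducing the list of words" means dropping words that cannot be spelled with the letters of the phrase; the names `build_tree`, `fits`, `contains` and the `END` marker are hypothetical.

```python
from collections import Counter

END = "$"  # hypothetical marker meaning "a word ends at this node"

def fits(word, phrase_letters):
    """True if `word` can be spelled using only the letters of the phrase."""
    return all(phrase_letters[ch] >= n for ch, n in Counter(word).items())

def build_tree(words, phrase):
    """Build a nested-dict character tree, skipping words the phrase cannot spell."""
    phrase_letters = Counter(phrase.replace(" ", ""))
    root = {}
    for word in words:
        if not fits(word, phrase_letters):
            continue  # reduce the word list while parsing
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # one branch per character
        node[END] = True
    return root

def contains(tree, word):
    """Look a word up by walking one branch per character."""
    node = tree
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

tree = build_tree(["and", "app", "apple", "groves", "lamas"], "an dlamasa pple")
print(contains(tree, "apple"), contains(tree, "groves"))  # True False
```

With this phrase, "groves" is dropped during parsing because the phrase contains no "g", while "app" and "apple" share the branches a, p, p.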

To find the anagrams of the words available, we create a tree of all the valid combinations, where each branch represents a word.

For the given phrase "an dlamasa pple", part of the tree we would produce is this:
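
The tree can also be walked in code. The sketch below is an assumption about how the combinations could be enumerated rather than the project's implementation: each recursive call picks one word that still fits into the remaining letters (one branch), and a path that uses up every letter is an anagram. The function name `combinations` is hypothetical.

```python
from collections import Counter

def combinations(remaining, words):
    """Yield every sequence of words that uses up exactly the letters in `remaining`."""
    if not remaining:  # no letters left: the path from the root is an anagram
        yield []
        return
    for word in words:
        need = Counter(word)
        if all(remaining[ch] >= n for ch, n in need.items()):
            rest = remaining - need  # Counter subtraction drops letters with count zero
            for tail in combinations(rest, words):
                yield [word] + tail

phrase = "an dlamasa pple"
letters = Counter(phrase.replace(" ", ""))
results = list(combinations(letters, ["and", "app", "apple", "lamas"]))
for combo in results:
    print(" ".join(combo))
```

For this phrase the walk yields the six orderings of "and", "lamas" and "apple"; branches starting with "app" dead-end because the leftover letters cannot be covered by the remaining words.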

To improve performance we implement a few heuristics that logically target the problem:

1. Avoid computing branches that have the same subproblem

When going down branches we might end up with subproblems that are the same, so we know those branches will create the same sub-branches.

We avoid this by creating a string representation of the remaining characters and keeping a table of {dict_str => WordBranch}.
All branches with the same subproblem therefore reference a single WordBranch.
In an example list of 1,100 words, 35% are permutations of other words in the list, so we can skip those computations on every level.
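
The table idea can be sketched as follows. Here a plain dict stands in for WordBranch and the sorted remaining letters stand in for dict_str; both choices, and the names `key_of` and `build_branch`, are assumptions made for illustration.

```python
from collections import Counter

def key_of(remaining):
    """Canonical string for a multiset of letters, e.g. Counter('banana') -> 'aaabnn'."""
    return "".join(sorted(remaining.elements()))

def build_branch(remaining, words, table):
    """Return the shared branch (word -> sub-branch) for the remaining letters."""
    key = key_of(remaining)
    if key in table:
        return table[key]  # same subproblem: reuse the existing branch
    branch = {}
    table[key] = branch
    for word in words:
        need = Counter(word)
        if all(remaining[ch] >= n for ch, n in need.items()):
            branch[word] = build_branch(remaining - need, words, table)
    return branch

table = {}
root = build_branch(Counter("andlamasapple"), ["and", "apple", "lamas"], table)
print(len(table))                                   # 8
print(root["and"]["lamas"] is root["lamas"]["and"])  # True
```

Without the table, this small example would create 16 branch nodes; with it, only 8 distinct subproblems are computed, and "and, lamas" and "lamas, and" point at the same object.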

2. Solve hashes as we go

Using a dictionary of 99,000 words, a small piece of text like "anagram" ends up having 34,668 valid anagrams.
With this many solutions, we would either spend a lot of time on IO or use a lot of resources keeping all solutions in memory. Therefore we compute the hash of each candidate and check whether it is one of the hashes we're looking for before we return the solutions.
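
A rough sketch of checking hashes during the search instead of collecting all anagrams first. It assumes MD5 and assumes a candidate is its words joined by single spaces; `find_matches` is a hypothetical name.

```python
import hashlib
from collections import Counter

def find_matches(remaining, words, targets, prefix=()):
    """Yield (digest, anagram) pairs whose MD5 digest is in `targets`."""
    if not remaining:
        candidate = " ".join(prefix)
        digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        if digest in targets:  # only matching candidates are kept
            yield digest, candidate
        return
    for word in words:
        need = Counter(word)
        if all(remaining[ch] >= n for ch, n in need.items()):
            yield from find_matches(remaining - need, words, targets, prefix + (word,))

wanted = {hashlib.md5(b"and lamas apple").hexdigest()}
matches = list(find_matches(Counter("andlamasapple"), ["and", "apple", "lamas"], wanted))
for digest, anagram in matches:
    print(digest, "-->", anagram)
```

Because `targets` is a set, each membership test takes constant time on average, and non-matching candidates are discarded immediately rather than stored.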

3. Terminate when hashes are found

When all hashes are found we might still have anagrams we haven't checked. Checking them gives no extra value, so we terminate the recursive loop.
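
One way early termination could look, as a sketch under the same assumptions as before (MD5, words joined by single spaces; `search` is a hypothetical name): the recursion returns True once no hashes are pending, and every caller stops iterating as soon as it sees that.

```python
import hashlib
from collections import Counter

def search(remaining, words, pending, found, prefix=()):
    """Depth-first search that returns True as soon as `pending` is empty."""
    if not pending:
        return True  # every hash is resolved: stop recursing
    if not remaining:
        candidate = " ".join(prefix)
        digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        if digest in pending:
            pending.remove(digest)
            found[digest] = candidate
        return not pending
    for word in words:
        need = Counter(word)
        if all(remaining[ch] >= n for ch, n in need.items()):
            if search(remaining - need, words, pending, found, prefix + (word,)):
                return True  # unwind without visiting the other branches
    return False

pending = {hashlib.md5(b"and apple lamas").hexdigest()}
found = {}
done = search(Counter("andlamasapple"), ["and", "apple", "lamas"], pending, found)
print(done, list(found.values()))
```

In this example the very first complete candidate matches, so the remaining five orderings are never hashed.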

4. Solve anagrams in levels

Once the AST of word combinations is constructed, we look for solutions one level at a time, because the phrase we're looking for is likely to contain words longer than one letter, so solutions with few words are the most probable. BFS would be the natural algorithm for this, but it is not space efficient, because it needs to store a queue with as many elements as the tree is wide.
We handle this by finding solutions using DFS for level 1, then for levels 1 + 2, then for levels 1 + 2 + ... + k (in effect, iterative deepening). This approach is not optimal, since the upper levels are revisited on every pass, but due to the performance gains from removing duplicate subproblems, we still save time overall.
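
The level-by-level search can be sketched as a depth-limited DFS that is rerun with a growing limit. This is an illustration with hypothetical names (`solutions_up_to`, `by_level`) and it leaves out the shared-subproblem table described under heuristic 1.

```python
from collections import Counter

def solutions_up_to(remaining, words, max_words, prefix=()):
    """DFS that yields anagrams consisting of at most `max_words` words."""
    if not remaining:
        yield prefix
        return
    if len(prefix) == max_words:
        return  # depth limit reached, give up on this branch
    for word in words:
        need = Counter(word)
        if all(remaining[ch] >= n for ch, n in need.items()):
            yield from solutions_up_to(remaining - need, words, max_words, prefix + (word,))

def by_level(letters, words, deepest):
    """Run the depth-limited DFS for limits 1, 2, ..., `deepest`."""
    seen = set()
    for limit in range(1, deepest + 1):
        for solution in solutions_up_to(letters, words, limit):
            if solution not in seen:  # shallower levels are revisited, so skip repeats
                seen.add(solution)
                yield limit, solution

words = ["notable", "not", "able", "no", "table"]
results = list(by_level(Counter("notable"), words, 3))
for level, solution in results:
    print(level, " ".join(solution))
```

The one-word solution "notable" is reported first, followed by the four two-word solutions. Memory use is bounded by the recursion depth plus the set of solutions already reported, instead of by the width of the tree.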

Test

Running time on instance: phrase length 18, valid words after parsing 1,659

Description	Simple	With heuristics
3-word combinations	463s	246s
4-word combinations	25,153s	8,580s
5-word combinations	unknown	unknown
