Homework: Anagram Finder

This is my solution to a homework problem for an; Anagram Finder, application.

Requirements

Input:
- <file>.txt as an argument.
Output:
- Stdout.
- No Anagrams in the file: "no anagrams found".
- All anagrams for a particular word should be on a single line of stdout.
- All anagrams for the next unique word should be on a new line.
- Assumption: Examples show no repeats for any repeated words (or any words that are found in the "found" anagrams list).
- Anagrams printed in file read order.

Notes

User does not provide a dictionary.
Anagrams are for each specific word on, not phrases / collections of words.

Expectation

INPUT	OUPUT
the quick brown fox	no anagrams found
eat my tea	eat tea
do or door no no	no anagrams found
pots stop pots spot stop	pots stop spot
on pots no stop eat\nate pots spot stop tea	on no\npots stop spot\neat ate tea

Solution

How to Run

Using pipenv to manage dependencies:

pip install pipenv
pipenv install

From: anagram_finder/, run Anagram Finder application via:

# run up the AnagramFinder server process:
pipenv run python main.py
# run the CLI Client:
pipenv run python client/cli.py [--dictionary=<anagram_dictionary>] </path/to/file.txt>
# example for anagram finding from source file:
pipenv run python client/cli.py --dictionary=source-file ../tests/data/example5.txt

From: project root, run pytest's via:

pipenv install -e .  # install package locally
pipenv run pytest

State at Point of Homework Submission

Submitted this homework piece at tag 2018-08-15_a31a8d9_homework_submitted, View that tag for the original state, including assumptions & PO questions.

Current State

Server

Synchronous AnagramFinder process (main.py) which exposes a REST API. Currently hardcoded to: http://127.0.0.1:5000/

/ - GET Hello World print.
/anagrams - GET List of dictionary sub-paths.
/anagrams/en-us-webster/<string> - GET anagrams from provided string against the EN US Webster dictionary.
/anagrams/source-file/<string> - GET anagrams from provided string against the provided string.

Clients

Clients built on the Server API are located in the clients/ folder.

cli.py - CLI client (as specified in the original homework piece).
- CLI can define the anagram dictionary via the optional --dictionary option.

Future Plans

...Or things I'd do if I had more time:

Find an appropriate GB English dictionary. Requires further investigation, but potential solutions include (Note: had rejected remote dictionaries due to Assumption listed above):
- Github: english-words - Local dictionary - tried initially. Has too many words I'd consider non-dictionary terms. Does have a lowercase JSON representation though. Unsure if it is GB or US English.
- Github: EnglishDictionaryAPI - Remote dictionary - Web scraping of Merriam-Webster and Thesaurus.com - not in pypy. Unsure if it is GB or US English.
- Oxford Dictionaries API - Remote dictionary - REST API. GB English.
- Brian Kelk: Wordlists - Local dictionary - Worth investigating. Found whilst compiling this list and thinking of client/server spell checkers like Aspell and [Hunspell. Loads of word list resources including an UK_English_wordlist.zip which includes frequency.
- pip search dictionary - local/remote dictionaries - 100 hits with ~1/3 at a quick glance that look like language dictionaries. Requires more investigation.
- OS level / package manager dictionary - local dictionary - ignored due to cross-platform worries over location/availability.
Update the backend AnagramFinder() to be an asynchronous process that exposes a REST API, so that it can handle multiple clients.
- Optimise the dictionary to improve searching. Most dictionaries are either flat text (word per line), CSV or JSON. Potentially changing the structure could improve look ups. Although a cursory glance shows that speed increases would be for searching the initial word eg. Wikipedia: Trie (prefix tree). For Anagrams, improvements could be done by: * Upfront linking all possible anagrams to each of the words in the anagram list. (eg. using a Trie to look up pots, which also returns post, pots, spot, stop. * Refining the search algorithm done on a flat dictionary.
First time using pytest and pipenv:
- pytest's fixtures are fairly powerful, like an easier to use version of C#'s xUnit Inline/Member data annotations. For speed (and consistency with the Company) I should have used unittest, but pytest seems to be interesting in a good way. Would like to dig more into it's feature set.
- pipenv seems good for local development to quickly spin up a consistent environment. Can see it being useful in CI/CD situations.

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
anagram_finder		anagram_finder
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
.travis.yml		.travis.yml
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

anagram_finder

anagram_finder

tests

tests

.gitignore

.gitignore

.gitmodules

.gitmodules

.travis.yml

.travis.yml

Pipfile

Pipfile

Pipfile.lock

Pipfile.lock

README.md

README.md

setup.py

setup.py

Repository files navigation

Homework: Anagram Finder

Requirements

Notes

Expectation

Solution

How to Run

State at Point of Homework Submission

Current State

Server

Clients

Future Plans

About

Releases

Packages

Contributors 2

Languages

jackson15j/python_homework_anagram_finder

Folders and files

Latest commit

History

Repository files navigation

Homework: Anagram Finder

Requirements

Notes

Expectation

Solution

How to Run

State at Point of Homework Submission

Current State

Server

Clients

Future Plans

About

Resources

Stars

Watchers

Forks

Languages