domain_count

The main script file (domain_count.py) accepts a single argument for the filename of the file to process.

The file is read line by line with each line passed to an EmailAddress constructor. EmailAddress abstracts the specifics of validating an email and splitting the local and domain information. The regex used is a fairly basic pattern, it does not perfectly encapsulate the RFC.

The Domains class contains the results storage and output. The output is split into domains that were parsed from valid email addresses and everything else. Parsing the results like this will allow the process to function while preserving possible useful data.

Non-unicode data is logged and skipped. Any valid data on the same line as non-unicode data will be lost. Python is handling the non-ascii character set email addresses (utf characters outside the ascii range). I have an implementation in the test_binary method of the included test suite that removes the jpeg data and isolates the indifference@nightclubHades.org address. While this was an interesting challenge, I did not include it in the implementation as the solution seemed to specific to jpeg binary data.

I included an empty requirements file in an attempt to indicate there are no module dependencies. It is written in python3.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
tests		tests
Domains.py		Domains.py
EmailAddress.py		EmailAddress.py
README.md		README.md
domain_count.py		domain_count.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests

tests

Domains.py

Domains.py

EmailAddress.py

EmailAddress.py

README.md

README.md

domain_count.py

domain_count.py

requirements.txt

requirements.txt

Repository files navigation

domain_count

About

Releases

Packages

Languages

kyledes/domain_count

Folders and files

Latest commit

History

Repository files navigation

domain_count

About

Resources

Stars

Watchers

Forks

Languages