tor_anomaly

Tor anomaly detection based on the detect.py script from the tor-web repository. A simple visualization page is included. The page is not finished yet, but it does the basic visualization of where censorship and spikes occur. I also added cute little date-range-specific Twitter and Google News searches that open when you click on one of the censorship or spike anomalies.

Website for viewing results:
Branch with website code:
Branch containing data:
- Note: Data is updated manually using the update data script
Repo with original code

Overview

I have been updating the detector scripts in metrics-web with the goal of making it easier for others (hopefully with more statistical knowledge than I have) to work with and build on the code. It has been a substantial rewrite that relies heavily on the Python pandas library. I have just reached the point where I can accurately duplicate the functionality of the original code as it is called in the 80-run-clients-stats.sh file. This code also removes the need for pre-processing the data as done by the userstats-detector.R script.

Sadly, my expertise is not in statistical analysis but in open source software development. This is why I focused on making the existing code cleaner, better documented, and better structured.

If you are a statistician with some experience in anomaly detection, I would be happy to work with you to implement a better algorithm. The current algorithm is overzealous in its classification.

I would also appreciate tickets on what restructuring would be needed for models to be more easily tested and implemented within this code base so that it is easier for any future statistician to implement new algorithms without my assistance.

Changes in output

Below is a quick overview of the changes in output that may impact other programs or consumers of this information. I will write up a much more in-depth overview of functionality when I submit the actual pull request. I am thinking of getting basic PT anomaly detection added before I submit the pull request. This should be much easier with the new code.

Comparison of the old and new output

NEW_ranges_file_SUBSET.csv
OLD_ranges_file_SUBSET.csv
- The output from the write_all function [now called write_censorship_analysis()] has had some fields added to it. The old code had some duplicate processing built into it. The new code identifies the censorship and spike events the first time it runs through the time series so that the other functions can just read from the ranges output.
- I have also changed the names of some of the fields. This will impact any code that is currently parsing this output. I can change the field names back, I can write a separate file that contains only the data and headings in the current format for further processing, or whatever code processes this output can be updated to parse the new format.
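For the last option above, a consumer could translate the new headings back to the ones it already expects while reading the file. The following is only a minimal sketch of that idea using the standard library. The field names in `FIELD_ALIASES` and the sample row are hypothetical placeholders, not the real headings of either ranges file, and `read_ranges` is an illustrative helper that does not exist in this repository.

```python
import csv
import io

# Hypothetical mapping from NEW heading -> OLD heading.
# These names are placeholders; substitute the real headings
# from the NEW and OLD ranges files.
FIELD_ALIASES = {
    "min_range": "min",
    "max_range": "max",
}


def read_ranges(handle, aliases=FIELD_ALIASES):
    """Yield each CSV row as a dict keyed by the old field names.

    Headings that have no alias (for example, newly added fields)
    are passed through unchanged.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        yield {aliases.get(name, name): value for name, value in row.items()}


# Made-up sample data, used only to show the renaming.
sample = io.StringIO(
    "date,country,min_range,max_range,event\n"
    "2014-01-01,xx,100,200,censorship\n"
)
rows = list(read_ranges(sample))
```

With a mapping like this, existing parsing code keeps working on the renamed fields, and the added fields are still available to any consumer that wants them.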
NEW_short_censorship_report.txt
OLD_short_censorship_report.txt
- I have slightly modified the short censorship report produced by write_ml_report() which is now called write_short_report(). The changes are merely cosmetic, but I think there is a lot that can be done to eventually make the short report a more useful document (e.g. putting it in a structured format that will allow others to scrape and incorporate it into a threat feed).
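As one illustration of what a structured short report could look like, the sketch below serializes a list of anomaly events to JSON so that other tools can consume it without scraping text. This is an assumption about a possible future format, not what write_short_report() produces today. The `events_to_json` helper, the keys `date`, `country`, and `type`, and the sample events are all hypothetical.

```python
import json


def events_to_json(events):
    """Serialize anomaly events as a JSON array, ordered by date then country."""
    ordered = sorted(events, key=lambda e: (e["date"], e["country"]))
    return json.dumps(ordered, indent=2, sort_keys=True)


# Made-up events, used only to show the output shape.
events = [
    {"date": "2014-01-02", "country": "yy", "type": "spike"},
    {"date": "2014-01-01", "country": "xx", "type": "censorship"},
]
report = events_to_json(events)
```

A stable ordering and sorted keys keep successive reports easy to diff, which may matter if the report is ever pulled into a threat feed.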
