Skip to content

ggpartagasyca/trafficcop-wat

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

	    TrafficCop WAT ad classifiers

Andrew Philpot 29 May 2015 philpot@isi.edu

This Github repository project contains two independently developed capabilities for classifying escort posts.

A. folder wat/escort implements a rule-based keyword scanner. Derived in large part from example data provided by FBI agents, it has been hand-crafted to match many common examples and derives all its capabilities from the knowledge encoded in the file by a human. It can indicate the part of a document that triggered the categorization. However, it is likely brittle and difficult to extend.

B. folder wat/learn implements a machine-learning Naive Bayes classifier. There is no embedded knowledge, so the capabilities of training new categories is much greater. The results are noisy and the system requires significant preparation including human annotation and data tokenization decisions, before one can deal with new rule classes or new sentences.

Releases

No releases published

Packages

No packages published

Languages

  • Python 85.4%
  • HTML 9.9%
  • Shell 4.7%