Feature generation for web page identification from network traffic with SVM.

sinofeng/wpi-svm

 
 

wpi-svm

A tool that generates features (as input to libsvm) for identifying web pages from network traffic with a Support Vector Machine.

Dependencies

[pyTree](https://github.com/caesar0301/pyTree): Python module implementing a tree data structure. This module must be placed in the lib folder so that it can be imported correctly.

libsvm: A Library for Support Vector Machines
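Since the generated features are meant as input to libsvm, it may help to recall libsvm's sparse text format: one sample per line, a label followed by `index:value` pairs with ascending 1-based indices. The sketch below only illustrates that format; the function name `to_libsvm_line` and the sample feature values are hypothetical and are not taken from this project's scripts.

```python
def to_libsvm_line(label, features):
    """Format one sample as a line of libsvm's sparse text format.

    label:    numeric class label, e.g. 1 or -1.
    features: dict mapping a 1-based feature index to a numeric value.
    """
    parts = [str(label)]
    # libsvm expects indices in ascending order; zero-valued features may be omitted.
    for index in sorted(features):
        value = features[index]
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


# Hypothetical feature vector: indices and values are made up for illustration.
sample = {1: 0.5, 3: 12, 2: 0}
print(to_libsvm_line(1, sample))
```

Lines produced this way can be written to a text file, one per sample, and passed to libsvm's training and prediction programs.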

Main Programs

FGen_log.py: generate features from logs

FGen_har.py: generate features from HAR files

har2log.py: convert HAR files into HTTP logs in the format described below
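As a rough illustration of what reading a HAR file involves, the following sketch pulls a few log-style fields out of each HAR entry. It is not the code of har2log.py or FGen_har.py. The function names are hypothetical, the mapping from HAR keys to log field names is an assumption, and so is the choice to report HAR's `-1` ("not applicable") timings as 'N/A'.

```python
import json


def har_entry_to_record(entry):
    """Pick a few log-style fields out of one HAR entry (a dict)."""
    request = entry.get("request", {})
    response = entry.get("response", {})
    timings = entry.get("timings", {})

    def timing(name):
        # HAR uses -1 for a timing that does not apply; assumed to map to 'N/A'.
        value = timings.get(name, -1)
        return "N/A" if value is None or value < 0 else value

    return {
        "time": entry.get("startedDateTime", "N/A"),
        "dns": timing("dns"),
        "connect": timing("connect"),
        "send": timing("send"),
        "wait": timing("wait"),
        "receive": timing("receive"),
        "request-method": request.get("method", "N/A"),
        "response-status": response.get("status", "N/A"),
        "response-content-type": response.get("content", {}).get("mimeType", "N/A"),
        "url": request.get("url", "N/A"),
    }


def load_har_records(path):
    """Read a HAR file and return one record per entry."""
    with open(path) as handle:
        har = json.load(handle)
    return [har_entry_to_record(entry) for entry in har["log"]["entries"]]
```

Calling `load_har_records("page.har")` on an exported file (the file name here is only an example) would return a list of dictionaries, one per requested page element.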

HTTP Logs

  • Log files are generated by another project, http-sniffer, which sniffs raw network traffic (trace files) and extracts HTTP logs.

  • If no raw network trace is on hand and only HAR files are available (exported by tools such as Firebug), you can convert these HAR files into logs using har2log.py; this method is intended only for validation. We also provide a tool, FGen_har.py, to generate features directly from HAR files without conversion.

  • Log format

Plain text; each line records one web page element. A missing item is replaced by 'N/A'.

[time]\t\t[dns]\t\t[connect]\t\t[send]\t\t[wait]\t\t[receive]\t\t[flow-id]\t\t[user-agent-id]\t\t[sourceip]\t\t[sourceport]\t\t[destip]\t\t[destport]\t\t[request-version]\t\t[response-version]\t\t[request-method]\t\t[response-status]\t\t[request-header-size]\t\t[request-body-size]\t\t[response-header-size]\t\t[response-body-size]\t\t[response-content-type]\t\t[url]\t\t[referrer]\t\t[redirect-url]
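A log line in this format could be read along the following lines. This is a minimal sketch, assuming that fields are separated by two tab characters (as the format string suggests) and that every line carries all 24 fields. The names `FIELDS`, `SEPARATOR` and `parse_log_line` are hypothetical, and the sample values are made up.

```python
# Field names in the order given by the log format above.
FIELDS = [
    "time", "dns", "connect", "send", "wait", "receive",
    "flow-id", "user-agent-id", "sourceip", "sourceport",
    "destip", "destport", "request-version", "response-version",
    "request-method", "response-status", "request-header-size",
    "request-body-size", "response-header-size", "response-body-size",
    "response-content-type", "url", "referrer", "redirect-url",
]

SEPARATOR = "\t\t"  # assumption: two tab characters between fields


def parse_log_line(line):
    """Split one log line into a dict; 'N/A' items become None."""
    values = line.rstrip("\r\n").split(SEPARATOR)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return {
        name: (None if value == "N/A" else value)
        for name, value in zip(FIELDS, values)
    }


# Made-up sample line: everything is missing except the method and the URL.
sample = ["N/A"] * len(FIELDS)
sample[FIELDS.index("request-method")] = "GET"
sample[FIELDS.index("url")] = "http://example.com/"
record = parse_log_line(SEPARATOR.join(sample) + "\n")
print(record["request-method"], record["url"], record["referrer"])
```

Values are kept as strings here; converting timings and sizes to numbers is left to the caller, since the exact numeric formats are not specified above.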

Tools Folder

A collection of tools to analyze data for a research paper.
