xinmeng1/MobileBotnetAnalysisLab

MobileBotnetAnalysisLab (MBAL)

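##0 Introduction

This package uses Python and Tshark to extract features from traffic files captured on a mobile phone and to process them, producing Machine Learning datasets suitable for WEKA.

The raw input to this package is pcap files, and the final output is CSV files that WEKA can read. The processing steps and the requirements on the raw files are described below.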

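##1 Raw Data Requirements

The goal of this package is to turn raw pcap traffic capture files into dataset files that Weka can use for Machine Learning analysis.

The program currently uses supervised Machine Learning, so the packets must be labeled. The raw pcap data therefore has to include at least two files:

  1. PD0: Normal pcap capture, in which all traffic is generated by normal software
  2. PD1: Infected pcap capture, in which the traffic contains both normal software traffic and traffic generated by malware applications

The job of this package is to separate out, as far as possible, the packets in PD1 that resemble those in PD0 and label them normal; the remaining packets are labeled infected.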

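Filtering is currently done mainly by IP address. Is this reasonable?

The filtering relies on the environments in which PD0 and PD1 were captured. Let S0 be the current phone environment, APP0 the set of normal applications, and APP1 the set of malware applications. PD0 is then captured in the environment S0+APP0, and PD1 in the environment S0+APP0+APP1. In both environments no operation is performed on S0, APP0, or APP1, so all captured traffic is background traffic.

This assumes that every malware application carries out its malware actions automatically, even when the user does nothing. (Is this reasonable? Intuitively, yes.)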

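##2 Data Processing Workflow

###2.1 Tshark Processing

We first use Tshark to extract information from the pcap files into space-separated TXT files. Each line represents one packet, and spaces separate the attributes of that packet, for example: ip protocol size ....

Commands for merging pcap files are also used at this stage.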
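The two command-line steps above can be sketched from Python as follows. This is a minimal sketch, not the repository's actual script: the helper names (`mergecap_cmd`, `tshark_cmd`, `extract_to_txt`) and the selected fields are assumptions chosen to match the "ip protocol size" example. Only standard `mergecap` and `tshark` options are used.

```python
import subprocess

# Assumed field selection, chosen to mirror the "ip protocol size" example.
FIELDS = ["ip.src", "ip.dst", "ip.proto", "frame.len"]


def mergecap_cmd(inputs, output):
    """Build a mergecap command that merges several pcap files into one."""
    return ["mergecap", "-w", output, *inputs]


def tshark_cmd(pcap_path, fields=FIELDS):
    """Build a tshark command that prints one space-separated line per packet."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields", "-E", "separator=/s"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def extract_to_txt(pcap_path, txt_path):
    """Run tshark and write its output to txt_path (tshark must be on PATH)."""
    with open(txt_path, "w") as out:
        subprocess.run(tshark_cmd(pcap_path), stdout=out, check=True)
```

Packets without an IP layer produce empty fields, so the resulting lines can be shorter than expected and the later parsing step should tolerate that.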

###2.2 Python Processing

This is the main processing step: it parses the TXT files and generates the CSV files. The key question is how to label each packet (Normal or Infected).
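As an illustration of this step, the sketch below converts space-separated packet lines into CSV rows with a trailing label column. The column names, the `txt_to_csv` helper, and the `label_for` callback are hypothetical; the real script may use different fields and a different labeling rule.

```python
import csv
import io

# Hypothetical column names for the fields assumed to be in the TXT file.
COLUMNS = ["src_ip", "dst_ip", "protocol", "size"]


def txt_to_csv(txt_lines, csv_file, label_for):
    """Write one CSV row per well-formed packet line and return the row count.

    label_for is a callable that takes a packet dict and returns a label
    such as "normal" or "infected". Lines that do not contain exactly
    len(COLUMNS) fields are skipped.
    """
    writer = csv.writer(csv_file)
    writer.writerow(COLUMNS + ["label"])
    written = 0
    for line in txt_lines:
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # malformed or non-IP packet line
        packet = dict(zip(COLUMNS, parts))
        writer.writerow(parts + [label_for(packet)])
        written += 1
    return written


# Usage: label everything "normal", as would be done for a PD0 capture.
demo = io.StringIO()
txt_to_csv(["10.0.0.2 8.8.8.8 17 74"], demo, lambda packet: "normal")
```

Passing the labeling rule in as a callback keeps the format conversion separate from the open question of how packets should be labeled.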

##3 Experiment Note

3.1. Datasets Management

We have 6 datasets now:

  1. [First scenario]: 2014_10_08 12pm-6pm. Install 12 normal applications and capture the traffic on the mobile. (The applications are not run; only the background traffic is captured.) File: normal_traffic_dataset_1.pcap
  2. [Second scenario]: 2014_10_09 12pm-6pm. Based on the first scenario, install 160+ Botnet malware applications and capture the traffic on the mobile. (The applications are not run; only the background traffic is captured.) File: abnormal_traffic_dataset1.pcap
  3. [Third scenario]: 2014_10_10 12pm-6pm. Based on the second scenario, run some applications on the mobile while the traffic is being captured. File: abnormal_traffic_dataset2.pcap
  4. [Fourth scenario]: 2014_10_14 12pm-6pm. Reset the mobile, install the same normal applications plus 50+ other Botnet malware applications, and capture the traffic on the mobile. (The applications are not run; only the background traffic is captured.) File: abnormal_traffic_dataset3.pcap
  5. [Fifth scenario]: Install 10 families of malware applications, run every application, and capture the traffic generated by each specific malware application. (This produced many traffic files, which were then combined into a single file.) File: abnormal_traffic_dataset4.pcap
  6. **[Sixth scenario]**: Also capture some normal traffic. The difference from the first scenario is that some normal applications are run and the traffic from those specific applications is captured. File: normal_traffic_dataset_2.pcap

Additional note:

Following the paper, I installed 10 categories of Botnet malware on the mobile device and ran each of them for approximately 10 minutes, capturing the traffic of every individual malware application in the meantime. This traffic therefore contains infected packets, but the problem is that we still need to clean it up and identify the normal packets in the infected traffic files.

2. Workflow for Data Processing

st=>start: Start Data Process
e=>end
op=>operation: [tPacketCapturepro] 
capture traffic on Mobile
to PCAP files
op2=>operation: [mergecap] merge 
PCAP the files
op3=>operation: [Tshark] extract 
features to TXT files
op4=>operation: [Python] convert
TXT files to CSV files
op5=>operation: [Weka] convert
CSV files to ARFF files
op6=>operation: [Weka] 
Machine Learning
cond=>condition: Yes or No?

st->op->op2->op3->op4->op5->op6->e

3. Process to Label the Packet

st=>start: Start Label Packet
[Python]
e=>end
op1=>operation: Process all packets 
in Normal PCAP traffic file
op2=>operation: Store the Packet in
Database [NPDB]
op3=>operation: Process Infected PCAP
traffic file
op4=>operation: matching packet with 
NPDB with algorithm [MatchNormal]
cond1=>condition: match or not?
op5=>operation: packet labeled with Normal
op6=>operation: packet labeled with Infected
cond2=>condition: process end?
st->op1->op2->op3->op4->cond1
cond1(yes)->op5
cond1(no)->op6
op5->cond2
op6->cond2
cond2(yes)->e
cond2(no)->op3

The [MatchNormal] algorithm can also be regarded as cleaning up the infected traffic. Implementing it well will require a more sophisticated and more accurate algorithm.
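The labeling loop in the flowchart can be sketched as below. This is only one possible reading of [MatchNormal], assuming the IP-based filtering mentioned earlier: the NPDB is modeled as an in-memory set of IP addresses seen in the normal capture, and a packet from the infected capture counts as a match when both of its endpoints appear in that set. The function names and the packet dict keys (`src_ip`, `dst_ip`) are hypothetical.

```python
def build_npdb(normal_packets):
    """Collect every IP address seen in the normal capture (PD0)."""
    npdb = set()
    for packet in normal_packets:
        npdb.add(packet["src_ip"])
        npdb.add(packet["dst_ip"])
    return npdb


def match_normal(packet, npdb):
    """Return True when both endpoints of the packet were seen in PD0."""
    return packet["src_ip"] in npdb and packet["dst_ip"] in npdb


def label_packets(infected_packets, npdb):
    """Yield (packet, label) pairs for the infected capture (PD1)."""
    for packet in infected_packets:
        label = "normal" if match_normal(packet, npdb) else "infected"
        yield packet, label
```

With this rule, a change of the phone's own IP address between the two captures would make every PD1 packet fail to match, which is one reason a more robust matching criterion is likely to be needed.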

4. Python Script Design

  1. Configuration file
