
Snowman

This project uses convolutional neural networks to identify malicious URLs. The modeling work is largely inspired by Yoon Kim's paper on convolutional neural networks for sentence classification (see the Modeling section below).

The resulting models are deployed through a REST API so that model scoring and prediction are exposed for general use.

Contents

  1. Use
  2. Installation
  3. Deployment
  4. Backend
  5. Modeling

Use

This model is deployed through Heroku and can be accessed through a REST API. By hitting the /model endpoint, you can supply a query domain that you suspect may be malware-related and receive a score from the model.

curl http://afternoon-bastion-75939.herokuapp.com/model -d "url=zis32msdi3.com" -X PUT

This request returns

{"input query": "zis32msdi3.com", "score": "0.19122"}

Because I'm using the Heroku free tier, the application goes into idle sleep after 30 minutes without any requests. If the application is not responsive, it is probably asleep. The good news is that your request should wake it back up, and it will be back in action within a few seconds.

Installation

Set up and activate a virtual environment.

cd some-file-path/snowman
virtualenv env
source env/bin/activate

Then use pip to install required packages.

pip install -r requirements.txt

Deployment

My deployment strategy uses Heroku. You can quickly get a free account, which gives access to their free-tier deployments. You will also need to download the Heroku CLI, which lets you deploy apps to Heroku and interact with your deployments from the command line (no ssh or scp of materials).

cd some-file-path/snowman
heroku create

This will kick off various setup scripts. Then deploy the code to Heroku by pushing to a git remote called heroku (which gets created for you).

git push heroku master

This pushes your materials out to your Heroku instance (called a dyno) and installs all required packages on the dyno. It also starts up your app according to the specifications set in the Procfile. You can check on the setup progress using

heroku logs

Once complete, your app will be serving on a randomly generated subdomain such as "afternoon-bastion-75939.herokuapp.com". You can then query whatever endpoints and services you have set up for yourself. One note about the Heroku free tier: because it's free, they will auto-hibernate your dyno after 30 minutes of no requests to your app. Once in hibernation, your app won't respond quickly to new requests, but any request that comes in while it is asleep has the effect of waking up your dyno. The app then gets kicked back into action and can handle requests again within a few seconds.

Backend

For deploying the model behind a REST API, I'm using Python's Flask; more specifically, I'm mostly using flask_restful for defining endpoints and actions. A previously trained model is deserialized and loaded. Then, when the /model endpoint is queried with a PUT or a POST, the provided query string is run against the model for scoring. As described above, hosting is done with Heroku.
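As a rough sketch of what that can look like with flask_restful (the model-loading and scoring helpers below are hypothetical stand-ins, not the exact code in this repo):

from flask import Flask
from flask_restful import Api, Resource, reqparse

app = Flask(__name__)
api = Api(app)

# Parse the "url" field supplied in the PUT/POST body.
parser = reqparse.RequestParser()
parser.add_argument("url", required=True)

# The previously trained CNN is deserialized once at startup.
# load_trained_model is a hypothetical stand-in for the actual deserialization.
model = load_trained_model()

class ModelEndpoint(Resource):
    def put(self):
        args = parser.parse_args()
        url = args["url"]
        score = score_url(model, url)  # hypothetical scoring helper
        return {"input query": url, "score": "%.5f" % score}

    # POST is handled the same way as PUT.
    post = put

api.add_resource(ModelEndpoint, "/model")

if __name__ == "__main__":
    app.run()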

Modeling

The model I'm using here is a convolutional neural network adapted for text classification. The convolutional kernels, pooling layers, and activations are all fairly standard. The slightly uncommon part is how convolution is applied to text data, which is why the first layer in the network is an embedding layer. Each character in a string is represented as a point in a low-dimensional vector space. Thus, a whole string (character sequence) gives us a numeric array, which I zero-pad to standardize the sizes across all strings in the training data. To this numeric array we can then apply all the convolutions we desire. More details about these ideas can be found in the Yoon Kim paper (above), where this kind of network was applied to movie review sentiment classification.
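For concreteness, here is a minimal sketch of this kind of architecture in Keras; the layer sizes, kernel width, character vocabulary, and the use of Keras itself are assumptions rather than the exact configuration used here:

from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from keras.preprocessing.sequence import pad_sequences

MAX_LEN = 75      # assumed common length after zero-padding
VOCAB_SIZE = 40   # assumed number of distinct characters, plus padding index 0
EMBED_DIM = 32    # assumed size of the character embedding space

# Turn raw strings into fixed-length integer arrays (the character set is assumed).
char_index = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-._")}
encoded = [[char_index.get(c, 0) for c in url] for url in ["zis32msdi3.com", "google.com"]]
padded = pad_sequences(encoded, maxlen=MAX_LEN)  # zero-pad to standardize sizes

model = Sequential([
    # Each character index is mapped to a point in a low-dimensional vector space.
    Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    # Standard convolution, pooling, and activation over the embedded sequence.
    Conv1D(filters=128, kernel_size=5, activation="relu"),
    GlobalMaxPooling1D(),
    Dense(1, activation="sigmoid"),  # score: probability the URL is malicious
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])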
