machinelearning

Experiments in machine learning for web logs, presented at BSides Vancouver 2014.

The IPython notebooks can be run via::

ipython notebook

in the directory where you place the notebooks.

The mltail.py program takes incoming logs via stdin and produces a report of bad actors once stdin closes.
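
The read-then-report pattern described above can be sketched in a few lines. This is only an illustration, not the code in mltail.py: the marker list, `looks_bad`, and `tally_bad_actors` are hypothetical names, and a simple substring check stands in for the trained model. It assumes Apache common or combined log format, where the first field is the client address.

```python
from collections import Counter

# Hypothetical stand-in for a trained model: a few well-known attack substrings.
SUSPICIOUS_MARKERS = ("../", "union select", "<script")


def looks_bad(line):
    """Return True if the log line contains any of the placeholder markers."""
    lowered = line.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)


def tally_bad_actors(lines):
    """Consume an iterable of log lines and count flagged requests per client."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields and looks_bad(line):
            # Assumption: the first whitespace-separated field is the client address.
            counts[fields[0]] += 1
    return counts


def format_report(counts):
    """Render one 'client<TAB>hits' row per client, most active first."""
    return [f"{client}\t{hits}" for client, hits in counts.most_common()]
```

To use it as a stdin filter, pass `sys.stdin` to `tally_bad_actors` and print each row of `format_report` once the input is exhausted; nothing is reported until stdin closes, which mirrors the behaviour described above.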

These tools all use the excellent topic modelling library gensim, available via::

pip install gensim

Setup

Ideally you should read the presentation and the IPython notebooks first. But if you can't wait:

Steps:

1. Separate your baseline Apache logs into good and bad via the goodFromBad.py script: ./goodFromBad.py
2. Send a sample log into mltail.py: cat sample.log | ./mltail.py -c options.conf
3. Bask in the glorious output of machine learning telling you who is attacking you.
4. Buy me a beer.

