
wildtreetech/forests-intro


Introduction to random forests

Trees and forests, robust estimators for the 99%.


This interactive tutorial takes about 60 minutes. By the end you will know:

  • the basics of scikit-learn
  • how to use Decision Trees and Random Forests
  • how to use cross-validation to measure performance
  • that there are many metrics by which to measure performance
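As a rough preview of these four points, the sketch below fits a decision tree and a random forest with scikit-learn and scores both with 5-fold cross-validation under two different metrics. It is not taken from the tutorial notebooks. The synthetic dataset from make_classification, the hyperparameters (max_depth=5, n_estimators=100) and the choice of accuracy and ROC AUC as metrics are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (an assumption; the notebooks use their own data)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

# One single tree and one forest of trees; hyperparameters are illustrative only
models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Two of the many possible metrics; cross_validate reports each one per fold
scoring = ["accuracy", "roc_auc"]

results = {}
for name, model in models.items():
    # 5-fold cross-validation: fit on 4/5 of the data, score on the other 1/5, five times
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    # Average each metric over the five folds
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, "
          f"ROC AUC={results[name]['roc_auc']:.3f}")
```

Printing two metrics side by side is a reminder that "performance" depends on which metric you choose; the exact numbers will vary with the data and settings.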

Presented at PyZurich in July 2016.

You can either install Python on your computer and run these notebooks locally, or run them in the cloud by clicking the "binder" button below:

Binder

(Binder is a free service, so it is occasionally unavailable during maintenance.)

Get started

Install instructions

Anaconda is a Python distribution that is easy to install and contains a large number of commonly used libraries. Download Anaconda, clone this repository, and then run the following from this directory:

conda env create -n forests-intro -f environment.yml

This will create an environment with all the dependencies for these examples.

Try it

After setting up the dependencies, activate your conda environment with conda activate forests-intro (older versions of conda use source activate forests-intro). To run the examples, start jupyter notebook from a terminal in this directory.

Additional material

Two very nice (and pretty) explanations of how decision trees and neural networks work:

How to get unbiased performance estimates: read this to find out why you need to keep some of your data secret and use it only once.
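One common way to follow this advice is sketched below, assuming scikit-learn. A test set is split off up front, all model selection is done with cross-validation on the remaining data, and the test set is scored exactly once at the end. The dataset, the 25% test split and the max_features grid are illustrative assumptions, not settings from the tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for a real problem (an assumption)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=1)

# Keep 25% of the data "secret": it is not touched during model selection
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Hyperparameter search uses 5-fold cross-validation on the development data only
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=1),
    param_grid={"max_features": [2, 4, 8]},
    cv=5,
)
search.fit(X_dev, y_dev)  # refits the best model on all of X_dev by default

# The best CV score is optimistically biased: it was used to choose max_features
print("best CV accuracy:", round(search.best_score_, 3))

# Use the secret test set once, with the refit best model, for the final estimate
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 3))
```

If the test set were also used to pick hyperparameters, its score would suffer from the same optimistic bias as best_score_, which is why it is only evaluated once.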

Gilles Louppe's well-written PhD thesis, Understanding Random Forests. Much more precise and formal than my descriptions.

License

Geneva's Humanitarian Big Data by Tim Head is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

🌲🌳🌴 Trees and forests, machine learning for the 99%. A 1-hour introduction.
