31st JULY - 4th AUGUST 2015

Pullman Brisbane - King George Square



Adventures in scikit-learn's Random Forest

Scikit-learn's Random Forests are a great first choice for tackling a machine-learning problem. They are easy to use, with only a handful of tuning parameters, but nevertheless produce good results. Additionally, a separate cross-validation step can be avoided by using the out-of-bag sample predictions generated during the construction of the forest. Finally, they make it relatively easy to identify and extract the most important features of the sample data.
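As a minimal sketch of the two points above, the snippet below fits a forest with `oob_score=True` so the out-of-bag samples double as a built-in validation set, then reads off the per-feature importances. The synthetic data from `make_classification` is an assumption standing in for a real data set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real data set
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=4, random_state=0)

# oob_score=True reuses each tree's out-of-bag samples as a
# validation set, so no separate cross-validation split is needed
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0)
forest.fit(X, y)

print(forest.oob_score_)            # accuracy estimated from OOB samples
print(forest.feature_importances_)  # one importance value per feature
```

The importances sum to one, so they can be read directly as relative weights.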

In this talk we’ll go through the process of using scikit-learn’s random forests, with a financial data set (of ASX equities) as the running example. We’ll begin with a basic overview of the random forest algorithm, the tuning parameters available, and their impact on the effectiveness of the forest. Then we’ll cover the basic usage of scikit-learn’s random forests and in the process troubleshoot some common problems, such as dealing with missing sample data. Next we’ll discuss the use of out-of-bag sample predictions as a method for quickly performing cross-validation and optimising the tuning parameters. Finally we’ll look at how to extract information from the model that scikit-learn has generated, most notably the relative importances of the features in the sample data.
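The steps outlined above can be sketched roughly as follows. This is not code from the talk: the data is synthetic (a stand-in for the ASX data set), missing values are handled here by median imputation with `SimpleImputer` (scikit-learn's trees reject NaNs in most releases, though very recent versions can handle them natively), and the out-of-bag score is used to compare a few `max_features` settings in place of a full cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Synthetic stand-in for the financial data set; punch some holes
# in it to simulate missing sample data
X, y = make_classification(n_samples=500, n_features=6,
                           n_informative=3, random_state=0)
rng = np.random.RandomState(0)
X[rng.rand(*X.shape) < 0.05] = np.nan

# Impute missing values (here: column medians) before fitting
X_filled = SimpleImputer(strategy="median").fit_transform(X)

# Use the out-of-bag score as a quick stand-in for cross-validation
# when comparing candidate tuning parameters
best = None
for max_features in ("sqrt", "log2", None):
    forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    max_features=max_features,
                                    random_state=0).fit(X_filled, y)
    if best is None or forest.oob_score_ > best.oob_score_:
        best = forest

# Rank the features by the importances the fitted model exposes
ranking = np.argsort(best.feature_importances_)[::-1]
for i in ranking:
    print(f"feature {i}: {best.feature_importances_[i]:.3f}")
```

Median imputation is only one option; the right treatment of missing values depends on the data, which is exactly the kind of trade-off the talk walks through.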

Gregory Saunders

Greg has been programming in Python since 1995. He has a PhD in Computer Science and has been working in the financial services industry for over ten years.