31st JULY - 4th AUGUST 2015

Pullman Brisbane - King George Square

  • Mini-Conferences
    July 31
  • Presentations
    August 1-2
  • Sprints
    August 3-4

Predicting sports winners using data analytics with pandas and scikit-learn

Project: scikit-learn; pandas

The pandas and scikit-learn packages combine to form a powerful toolkit for data analytics. In this talk, we will use them together to analyse NBA game results and try to predict the winner of each match. There is plenty of data out there to support good predictions; the key is getting it into the right format and building the right model.

We will go through importing data from the net, cleaning it up, creating new features, and building a predictive model. We then evaluate how well the model performs on recent NBA data. The model we use is a random forest, an ensemble of decision trees.
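
As a rough illustration of that workflow, here is a minimal sketch and not the code from the talk. It assumes a table of game results with hypothetical column names (`home_team`, `visitor_team`, `home_pts`, `visitor_pts`) and uses randomly generated scores in place of real NBA data, so the reported accuracy carries no meaning beyond showing that the pipeline runs. The "won previous game" features are likewise just one plausible choice of derived feature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a season of results (assumption: real data would
# be loaded with something like pd.read_csv). Column names are hypothetical.
rng = np.random.RandomState(14)
teams = ["A", "B", "C", "D", "E", "F"]
rows = []
for _ in range(300):
    home, visitor = rng.choice(teams, size=2, replace=False)
    rows.append({
        "home_team": str(home),
        "visitor_team": str(visitor),
        "home_pts": int(rng.randint(80, 121)),
        "visitor_pts": int(rng.randint(80, 121)),
    })
games = pd.DataFrame(rows)

# Cleaning: the random scores can tie, which a real game cannot, so drop ties.
games = games[games["home_pts"] != games["visitor_pts"]].reset_index(drop=True)

# Target: did the home team win?
games["home_win"] = games["home_pts"] > games["visitor_pts"]

# New features: did each team win its previous game? Rows are processed in
# order, so each feature only uses information from earlier games.
last_won = {}
home_last, visitor_last = [], []
for row in games.itertuples(index=False):
    home_last.append(last_won.get(row.home_team, False))
    visitor_last.append(last_won.get(row.visitor_team, False))
    last_won[row.home_team] = bool(row.home_win)
    last_won[row.visitor_team] = not bool(row.home_win)
games["home_last_win"] = home_last
games["visitor_last_win"] = visitor_last

X = games[["home_last_win", "visitor_last_win"]].astype(int).to_numpy()
y = games["home_win"].astype(int).to_numpy()

# Model: a random forest, evaluated with 5-fold cross-validation.
clf = RandomForestClassifier(n_estimators=100, random_state=14)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f}")
```

With real results, the same structure applies: load the data into a DataFrame, derive features from past games only, and compare the cross-validated accuracy against a simple baseline such as always picking the home team.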

Robert Layton

Robert is a data analyst with dataPipeline, providing consultancy, research and development to help businesses integrate data analysis into their organisations. He has worked with the financial and industry sectors, and has also worked with government and law enforcement in a research and development capacity.

Robert also writes regularly on security and data mining topics for a number of outlets, and contributes to several open source Python projects. He is also a member of the Ballarat Hackerspace, working with embedded devices and electronics.