Project: scikit-learn; pandas
The pandas and scikit-learn packages combine to form a powerful toolkit for data analytics. In this talk, we will use them together to analyse the outcomes of NBA games and try to predict the winner of each game. There is plenty of data available to support good predictions; the key is getting it into the right format and building the right model.
We will go through importing data from the web, cleaning it up, creating new features, and building a predictive model. We will then evaluate how well the model performs on recent NBA data. The model we use will be a random forest, an ensemble of decision trees.
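As a rough illustration of that workflow (a minimal sketch, not the code from the talk), the steps might look like the following. Everything specific here is an assumption: the results table is synthetic placeholder data rather than real NBA results, and the column names (`home`, `visitor`, `home_pts`, `visitor_pts`) and the "won previous game" feature are hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic placeholder results (NOT real NBA data); team names are made up.
# In practice this table would be loaded from a downloaded results file.
rng = np.random.default_rng(0)
teams = ["A", "B", "C", "D", "E", "F"]
n_games = 200
home_idx = rng.integers(0, len(teams), size=n_games)
# Shift by 1..5 positions so the visitor is always a different team.
shift = rng.integers(1, len(teams), size=n_games)
visitor_idx = (home_idx + shift) % len(teams)
games = pd.DataFrame({
    "home": [teams[i] for i in home_idx],
    "visitor": [teams[i] for i in visitor_idx],
    "home_pts": rng.integers(80, 130, size=n_games),
    "visitor_pts": rng.integers(80, 130, size=n_games),
})

# Target: did the home team win? (A tie in this random data counts as a
# home loss; real NBA games cannot end in a tie.)
games["home_win"] = games["home_pts"] > games["visitor_pts"]

# New features: did each team win its previous game?
# Rows are assumed to be in chronological order.
last_won = {}
home_last, visitor_last = [], []
for row in games.itertuples(index=False):
    home_last.append(last_won.get(row.home, False))
    visitor_last.append(last_won.get(row.visitor, False))
    last_won[row.home] = bool(row.home_win)
    last_won[row.visitor] = not bool(row.home_win)
games["home_last_win"] = home_last
games["visitor_last_win"] = visitor_last

X = games[["home_last_win", "visitor_last_win"]].to_numpy().astype(int)
y = games["home_win"].to_numpy().astype(int)

# Random forest: an ensemble of decision trees, scored by 5-fold CV accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=14)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f}")
```

Because the scores here are random, the accuracy should hover around chance; the point is only the shape of the pipeline. With real results, richer features (for example standings or head-to-head history) would replace the two placeholder columns.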
Robert is a data analyst with dataPipeline, providing consultancy, research and development to help businesses integrate data analysis into their organisations. He has worked with the financial and industry sectors, and has also worked with government and law enforcement in a research and development capacity.
Robert also writes regularly on security and data mining topics for a number of outlets, and contributes to several open source Python projects. He is a member of the Ballarat Hackerspace, where he works with embedded devices and electronics.