Google and the NCAA are staging a data science contest to see who can best pick a bracket for the upcoming March Madness college basketball tournament. The two organizations have ponied up some serious incentives for people to compete, with a total of $100,000 in prize money available.
The contest will be hosted by Kaggle, a platform for data science competitions that Google acquired last year. Teams will compete in separate contests for the Division I men’s and women’s tournaments. For each of those tournaments, the first-place team will receive $25,000, the second-place team will receive $15,000, and the third-place team will receive $10,000.
This is hardly the first time AI techniques have been used to tackle the question of how to build a March Madness bracket, nor even the first time Kaggle has hosted a competition to see which machine reigns supreme. Members of Kaggle’s machine learning competition community have competed in its March Madness contests since 2014, mostly for bragging rights and their rankings on the platform.
Microsoft’s Bing search engine and Cortana virtual assistant also offer that tech titan’s own predictions for which teams will advance.
Google is using this tournament as an opportunity to drum up interest in machine learning, especially in its own cloud services. The company will be co-hosting 20 events to teach college students machine learning skills and how to use Google Cloud Platform. Kaggle CEO Anthony Goldbloom will hold an “ask me anything” question-and-answer session on Reddit Wednesday at 10 a.m. Pacific to discuss how machine learning applies to topical events like this competition.
Right now, Kaggle teams can access historical performance data for individual schools to build their models for this year’s competition. On March 11, the NCAA will announce the 68 college teams that will participate in the tournament. Starting the following day, data scientists can submit their predictions until 7 a.m. Pacific on March 15.
Teams will be judged on log loss, a metric that evaluates both whether a model’s prediction was correct and how confident the model was in it. It is built to penalize confident incorrect predictions most heavily. The smaller a team’s log loss, the better.
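To make the scoring concrete, here is a minimal sketch of the standard binary log loss, assuming (as is typical for this kind of contest, though the article does not spell it out) that each entry is a probability that one team beats another and each outcome is recorded as 1 or 0. The function name `log_loss`, the clipping bound `eps`, and the three-game predictions are hypothetical and for illustration only; this is not Kaggle’s official scoring code.

```python
import math
from typing import Sequence


def log_loss(outcomes: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary log loss (hypothetical helper, not Kaggle's scorer).

    outcomes: 1 if the first-listed team won, else 0 (assumed encoding).
    probs: predicted probability that the first-listed team wins.
    eps: assumed clipping bound so that log(0) is never evaluated.
    """
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        # Only one term is nonzero: log(p) if the team won, log(1 - p) if it lost.
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


# Hypothetical predictions for three games; the first-listed team won games 1 and 3.
outcomes = [1, 0, 1]
hedged = [0.6, 0.4, 0.6]         # mildly confident, right every time
bold_wrong = [0.99, 0.99, 0.99]  # very confident, wrong on game 2
print(round(log_loss(outcomes, hedged), 4))      # → 0.5108
print(round(log_loss(outcomes, bold_wrong), 4))  # → 1.5418
```

In this toy example, the confident model loses badly despite getting two of three games right, because the single 99%-confident miss costs about 4.6 on its own. That asymmetry is what the scoring rule is designed to punish.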
Once the submission period closes, teams will be able to watch how their models fare against the live results of the games.