REDWOOD CITY, Calif. — When data scientists talk about their work, it isn’t always easy to understand. At the same time, many big data startups are trying to make the technology seem simple.
For BigML, machine learning — a common element of data science — can be a basic two-step process: making generalizations about the data that’s available and then applying those generalizations to new data.
David Gerster, vice president of data science at BigML, provided an example of the company's approach at work during a talk at the DataBeat/Data Science Summit event today. He sought to predict the species of flowers he picked up in a field, based on a set of 150 flowers from the same field that he already had on hand. First, he measured the length and width of the petals on his existing flowers and plotted the flowers accordingly on a graph: a bunch of dots that generally formed a diagonal line going up and to the right.
Then he divvied up the chart into sections, with vertical and horizontal lines, to show that the flowers could be categorized into three different species. He gave a name to this act of divvying up: He was training a machine-learning model. And that training is something BigML can do.
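For readers who want to see what "divvying up the chart" looks like in code, here is a minimal sketch of one common way to do it: a small decision tree that repeatedly picks the vertical or horizontal line that best separates the species, scored by Gini impurity. This is a generic textbook technique, not BigML's actual implementation, and the nine measurements and the labels "A", "B" and "C" are made up for illustration. They are not the 150 flowers from the talk.

```python
from collections import Counter

# Hypothetical toy data: (petal length, petal width) in cm, with made-up
# species labels "A", "B", "C". NOT the 150 flowers from the talk.
TRAIN = [
    ((1.4, 0.2), "A"), ((1.3, 0.2), "A"), ((1.5, 0.3), "A"),
    ((4.5, 1.5), "B"), ((4.0, 1.3), "B"), ((4.7, 1.4), "B"),
    ((5.8, 2.2), "C"), ((6.0, 2.5), "C"), ((5.5, 2.1), "C"),
]

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same species."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Try every vertical (feature 0) and horizontal (feature 1) line that falls
    between two observed values. Return the (feature, threshold) pair with the
    lowest weighted impurity, or None if no line improves on the current group.
    Ties keep the first candidate found."""
    best, best_score = None, gini([y for _, y in rows])
    for f in range(2):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # put the line halfway between neighboring points
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def train(rows, depth=0, max_depth=3):
    """Recursively cut the plane into rectangles (a small decision tree).
    A leaf is a species label; an internal node is (feature, threshold, left, right)."""
    labels = [y for _, y in rows]
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority species in this rectangle
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, train(left, depth + 1, max_depth), train(right, depth + 1, max_depth))

model = train(TRAIN)
```

On these toy points, the first line drawn should fall at a petal length of 2.75 cm, walling off species "A", and a second vertical line near 5.1 cm should separate "B" from "C". Each leaf of the resulting nested tuple corresponds to one of the rectangles on Gerster's chart.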
From there, he plotted newly picked flowers on the sectioned chart. Each of four new flowers landed in one of the species sections.
“Congratulations,” Gerster said. “We just scored four previously unseen flowers using this model, using this collection of rectangles, and we made a prediction about each one. That process is called scoring.”
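Scoring, in this picture, just means finding which rectangle a new point lands in. The sketch below assumes a model already trained into a nested-tuple tree; the thresholds (2.5 cm petal length, 1.75 cm petal width), the labels "A", "B", "C" and the four new measurements are hypothetical values chosen for illustration, not figures from the talk or from BigML.

```python
# A trained model in nested-tuple form: (feature, threshold, left, right).
# Feature 0 is petal length and feature 1 is petal width, in cm; leaves are
# species labels. Thresholds and labels are hypothetical.
MODEL = (0, 2.5,          # vertical line at petal length 2.5 cm
         "A",
         (1, 1.75,        # horizontal line at petal width 1.75 cm
          "B",
          "C"))

def score(model, point):
    """Walk down the tree until reaching a leaf, i.e. find the rectangle
    the point falls into, and return that rectangle's species label."""
    while isinstance(model, tuple):
        feature, threshold, left, right = model
        model = left if point[feature] <= threshold else right
    return model

# Four hypothetical, previously unseen flowers.
new_flowers = [(1.6, 0.4), (4.2, 1.2), (5.1, 2.0), (6.3, 1.9)]
predictions = [score(MODEL, f) for f in new_flowers]
print(predictions)  # ['A', 'B', 'C', 'C']
```

Note how little work scoring takes compared with training: each prediction is a handful of comparisons against lines that were fixed in advance.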
So, according to BigML, machine learning can be as simple as that: training, then scoring. (Other startups, including Wise.io and Nutonian, are also trying to take the complexity out of machine learning.)
Gerster argued that when people talk about wanting to do machine learning in real time, what they might mean is just scoring, or figuring out where new data points line up in relation to historical data, on the fly. The act of training might not need to be done so often.
The example is surely meant to make machine learning seem simple. All BigML has to do now is show the impact its flavor of machine learning can have on day-to-day business and the bottom line.