LinkedIn today is making its Pinot real-time analytics software available under an open-source license. It’s the latest major release of open-source software from LinkedIn, a company known for using large quantities of data to enrich its own applications.
The Pinot software, which LinkedIn talked about publicly for the first time in September, is now available on GitHub for people to download. LinkedIn engineers are bringing attention to the launch in a blog post today.
Pinot is designed to be both highly scalable and fault-tolerant, so that it can serve results to LinkedIn’s live web application almost immediately, Kishore Gopalakrishna, the technical lead for Pinot at LinkedIn, told VentureBeat in an interview. It offers low latency, high throughput, and an SQL-like interface.
Pinot has been powering LinkedIn’s Who’s Viewed Your Profile feature for the past year and a half or so, Gopalakrishna said.
“We used to deal with database technologies such as Oracle and MySQL, but these work well until a certain point,” he said. Running batch jobs on data stored in the open-source Hadoop Distributed File System (HDFS) and then loading the results into a relational database meant that it would take LinkedIn hours or even days to show profile views to users. With Pinot, it takes much less time.
The open-source tool that most closely resembles Pinot is Druid. One thing Gopalakrishna and his colleagues worried about with Druid was scale. “We wanted to do, like, 1 billion events per day,” he said. (Druid, to be fair, does scale. The Druid cluster at advertising analytics startup Metamarkets ingests 1.5 trillion events per month, cofounder and chief executive Mike Driscoll told VentureBeat. That works out to roughly 50 billion events per day. The system manages transactional data from programmatic advertising marketplaces, Driscoll said.)
LinkedIn is moving more and more analytics workloads, such as those for job postings and ads, to Pinot, Gopalakrishna said.
Over the years, LinkedIn has developed a reputation for employing many data scientists. They build data products (features made more intelligent by the digital breadcrumbs of data that LinkedIn collects) and track usage to improve the company’s applications and help executives make better decisions.
The company has previously open-sourced several tools it built in-house for handling data, including Azkaban, Kafka, Samza, and Voldemort. Some of the LinkedIn employees who developed Kafka recently set out to build a startup around it. Time will tell whether Pinot will gain enough adoption to persuade its creators to do the same thing.
But three fast-growing San Francisco startups have already shown interest in using Pinot to meet their own analytics needs, a LinkedIn spokesperson told VentureBeat in an email.
“Once people realize what this allows them to do, then I think adoption will naturally happen,” Gopalakrishna said.