Pinterest today open-sourced Querybook, a data management solution for enterprise-scale remote engineering collaboration. The company says the tool, which it uses internally, can help engineers compose queries, create analyses, and collaborate with one another via a notebook interface.
Querybook started in 2017 as an intern project at Pinterest. Early on, the development team settled on a document-like interface where users could write queries and analyses in one place, with co-located metadata and the simplicity of a note-taking app. Released internally in March 2018, Querybook became the go-to solution for big data analytics at Pinterest. It now averages 500 daily active users and 7,000 daily query runs.
“With Querybook, Pinterest engineers have brought together the power of metadata with the simplicity of a note-taking app for a better querying interface, where teams can compose queries and write analyses all in one place,” a spokesperson told VentureBeat. “Querybook can be set up and deployed in minutes.”
Every query executed on Querybook gets analyzed to extract metadata like referenced tables and query runners. Querybook uses this information to automatically update its data schema and search ranking, as well as to show a table’s frequent users and query examples. The more queries in Querybook, the better documented the tables become.
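To make the idea concrete, here is a minimal Python sketch of how executed queries could be mined for referenced tables and frequent users. It is not Querybook's actual implementation: the regex-based extraction, the function names `referenced_tables` and `build_table_stats`, and the sample log are all assumptions made for illustration, and a real system would use a proper SQL parser.

```python
import re
from collections import Counter, defaultdict

# Naive assumption: a table name is the identifier right after FROM or JOIN.
# This misses CTEs, subqueries, and quoted names; a real SQL parser is needed
# for production use.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(query):
    """Return the set of lowercased table names a query appears to reference."""
    return {name.lower() for name in TABLE_PATTERN.findall(query)}


def build_table_stats(query_log):
    """Aggregate per-table user counts and example queries.

    query_log is an iterable of (user, query) pairs (a hypothetical format).
    """
    users = defaultdict(Counter)   # table -> Counter of users who queried it
    examples = defaultdict(list)   # table -> queries that referenced it
    for user, query in query_log:
        for table in referenced_tables(query):
            users[table][user] += 1
            examples[table].append(query)
    return users, examples


# Hypothetical query log for illustration only.
log = [
    ("alice", "SELECT * FROM analytics.pins WHERE created > '2021-01-01'"),
    ("bob", "SELECT p.id FROM analytics.pins p JOIN analytics.boards b ON p.board_id = b.id"),
    ("alice", "SELECT count(*) FROM analytics.pins"),
]
users, examples = build_table_stats(log)
print(users["analytics.pins"].most_common(1))  # [('alice', 2)]
```

Each additional query adds to the per-table counts and examples, which mirrors the point that tables become better documented as usage grows.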
Querybook also features an admin interface that lets companies configure query engines, table metadata ingestion, and access permissions. From this interface, admins can make live Querybook changes without going through code or config files. They can also create visualizations, including line, bar, stacked-area, pie, donut, scatter, and table charts.
“The common starting point for any analysis at Pinterest is an ad-hoc query that gets executed on the internal Hadoop or Presto cluster. To continuously make these improvements, especially in an increasingly remote environment, it’s more important than ever for teams to be able to compose queries, create analyses, and collaborate with one another,” Pinterest wrote in a blog post. “We built Querybook to provide a responsive and simple web user interface for such analysis so data scientists, product managers, and engineers can discover the right data, compose their queries, and share their findings.”
Pinterest previously open-sourced Teletraan, a tool that can deploy code onto virtual machines, such as those available from the public cloud provider Amazon Web Services. Prior to this, the company released Terrapin, software designed to more efficiently push data out of Hadoop, the open source big data framework, and make it available for other systems to use.