Pinterest today announced the availability of Terrapin, a new piece of open-source software designed to more efficiently push data out of Hadoop, the open-source big data framework, and make it available for other systems to use.
Pinterest engineers designed Terrapin to replace the open-source HBase NoSQL database for this particular job, because HBase had proven slow and didn't perform well beyond 100GB of data. The company looked at the open-source key-value store ElephantDB as a possible alternative, but that wasn't a perfect fit either.
“Terrapin provides low latency random key-value access over such large data sets, which are immutable and (re)generated in entirety,” Varun Sharma, an engineer on Pinterest’s core infrastructure team, wrote in a blog post on the news. “Terrapin can ingest data from S3, HDFS or directly from a MapReduce job, and is elastic, fault tolerant and performant enough to be used for various online applications at Pinterest, such as Pinnability and discovery data.”
Pinterest has been using Terrapin in production for more than a year, and it now holds around 180TB of data, Sharma wrote. Other companies can now try it out: the code is live on GitHub.
Many major web companies make internal tools they've developed available to the public. Just a few hours ago, for instance, Facebook released the React Native framework for Android under an open-source license. Facebook, Twitter, LinkedIn, and Airbnb have been quicker than Pinterest to open-source their tools. Pinterest typically opts instead to explain how it built its software in extensive blog posts, which makes today's news notable.
Previous open-source releases from Pinterest include Pinball, PINCache, and Secor.
Sharma’s blog post provides much more detail on Terrapin.