Pinecone gears up to support next-gen web apps with AI-powered database

Today, Pinecone Systems Inc. announced they’re welcoming $28 million of new investment in a Series A round supporting further expansion of their vector database technology. The company believes that the next generation of web applications will need more sophisticated ways to search through large data collections, often guided by artificial intelligence (AI). Their database is optimized to speed up these queries and make complex search operations more efficient.

“Customers are demanding better searches and recommendations from applications because they’re trained by the high-quality search in Google or Amazon’s product recommendation,” explained Greg Kogan, Pinecone’s VP of marketing. “Probably you and everyone else had this experience where you search through something other than Google, maybe Wikipedia for example, and you realize, ‘Something’s off here. This doesn’t work like I expect.’”

Simplifying database searches

Pinecone’s vector database makes it simpler to search with a long sequence of numbers and find the stored entry that is closest to it, even if the two are not identical. Existing databases are optimized for exact matches on values like names, keywords or identification numbers, although several approaches exist for finding similar words.

A large part of the motivation comes from AI scientists, who create machine learning (ML) models that produce vectors that need to be stored and searched. These models often take a data sample, like a block of text or an image, and learn the best way to identify its salient features. This process reduces the often long or complex sample to a shorter series of numbers that describes it, a result that is sometimes called an embedding or a vector.

The embedding can be used for a number of different tasks, but Pinecone concentrates its efforts on speeding up these identification and sorting algorithms by delivering a fast way to find the best match in a big collection of vectors created by the same algorithm.
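The idea of finding the best match in a collection of vectors can be illustrated with a minimal sketch. It is not Pinecone’s implementation: it assumes cosine similarity as the closeness measure, scans every stored vector one by one, and uses made-up three-number embeddings and the hypothetical names `cosine_similarity`, `nearest` and `embeddings`. A production vector database exists precisely to avoid this kind of full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, collection):
    """Return (key, score) for the stored vector most similar to query."""
    return max(
        ((key, cosine_similarity(query, vec)) for key, vec in collection.items()),
        key=lambda pair: pair[1],
    )


# Toy embeddings; real models emit far longer vectors.
embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.0, 0.2, 0.9],
}

# The query matches no stored vector exactly, but doc_a is the closest.
best_key, best_score = nearest([0.8, 0.2, 0.1], embeddings)
print(best_key, round(best_score, 3))
```

The query here is not equal to any stored entry, which is the point: the search returns the closest vector rather than an exact match.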

Existing databases already search through multiple fields or columns, but these fields are often thought of as separate. Tabular databases can look for rows that match any number of criteria on the individual columns. Document-style search systems like Lucene also match on various records in their indexes. They also often support functions that can locate similar records using heuristics.

These algorithms, though, are focused on working with the data as it’s already represented. Lucene, for example, is based on keywords that are already found inside the data. Pinecone can offer better results by using a machine learning model that converts any text into a vector embedding.

Some of the common models already working with Pinecone, from sources like OpenAI, produce a vector of anywhere between 32 and 20,000 floating-point numbers. Pinecone relies on various compression-like approaches to turn this into a few numbers that can be easily indexed. These natural language models can produce better results than keyword-based solutions because they are able to work with synonyms and other linguistic ambiguities.

“Search technology has revolved around keywords for hundreds of years; books contained search indexes before the invention of the printing press. Amazingly, today’s predominant search infrastructure still works the same way,” said Edo Liberty, founder and CEO of Pinecone. “Today’s users expect more. They want search results that anticipate and understand their needs, and not just match keywords.”

To deliver this, Pinecone is exploring more sophisticated search algorithms that treat each vector as a point in a multidimensional space. They’re turning to mathematical approaches that reduce the number of dimensions in a way that simplifies searching without reducing accuracy.

“Without giving away too much detail, I would say you can think of it as a proprietary algorithm that does a form of compression based on random projections and it leads to a much smaller index that we can then scan over similar to Facebook’s Faiss,” explained Ram Sriharsha, VP of engineering at Pinecone.
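Pinecone’s algorithm is proprietary, so the following is only a generic sketch of the textbook random-projection technique Sriharsha alludes to, not the company’s method. It assumes a Gaussian projection matrix scaled by one over the square root of the output dimension, and the names `make_projection`, `project` and `distance` are hypothetical. It shrinks two 512-number vectors to 64 numbers each and compares the distance between them before and after.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of scaled Gaussian entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
        for _ in range(out_dim)
    ]


def project(matrix, vec):
    """Multiply the projection matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two arbitrary 512-dimensional vectors standing in for embeddings.
rng = random.Random(1)
u = [rng.uniform(-1, 1) for _ in range(512)]
v = [rng.uniform(-1, 1) for _ in range(512)]

matrix = make_projection(512, 64)
small_u, small_v = project(matrix, u), project(matrix, v)

# A ratio close to 1 means the distance roughly survived the compression.
ratio = distance(small_u, small_v) / distance(u, v)
print(len(small_u), round(ratio, 2))
```

The projected vectors are an eighth of the original size, yet the distance between them stays roughly the same, which is why an index built on the smaller vectors can still rank neighbors approximately correctly.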

The advantage of vector database search

Pinecone has added a number of other features to their algorithm that allow them to do a better job of supporting search, especially in fast-moving applications where the data changes rapidly. The team has worked on smoothly updating the index as new data arrives and old data is deleted.

“Why use a proprietary index and not build on top of Faiss or something else? Building our own index gives us the flexibility to add features, not on top of it but into it,” explained Kogan. “Metadata filtering, for example, is a commonly requested feature and a really important one.”
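To make the idea of metadata filtering concrete, here is a small brute-force sketch. It does not use Pinecone’s API or reflect how its index works internally; the record layout, the `lang` field and the function `filtered_search` are all assumptions made for illustration. The filter is applied while the candidates are being gathered, so a vector that is close to the query but has the wrong metadata never enters the ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(query, records, required, top_k=2):
    """Rank only the records whose metadata matches every required value."""
    candidates = [
        rec for rec in records
        if all(rec["metadata"].get(k) == v for k, v in required.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda rec: cosine_similarity(query, rec["vector"]),
        reverse=True,
    )
    return [rec["id"] for rec in ranked[:top_k]]


# Hypothetical records: an id, a toy vector and free-form metadata.
records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"lang": "fr"}},
    {"id": "c", "vector": [0.6, 0.8], "metadata": {"lang": "en"}},
    {"id": "d", "vector": [0.0, 1.0], "metadata": {"lang": "en"}},
]

# "b" is the second-closest vector overall but is excluded by the filter.
results = filtered_search([1.0, 0.0], records, {"lang": "en"})
print(results)
```

Filtering after the fact, by taking the overall top results and then discarding mismatches, could return fewer results than requested. That is one plausible reason to want the filter inside the index rather than layered on top of it.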

Menlo Ventures is leading the funding round with help from Wing Venture Capital, the firm that led the seed-stage financing. Tiger Global is one of the new investors joining this stage.

Tim Tully, a partner from Menlo Ventures, will be joining the board. Tully was formerly the CTO at Splunk, a firm that specialized in flagging potential security problems by combing through log files and other system alerts. Indeed, looking for potential security problems and performing complex monitoring are other tasks that Pinecone predicts will be a good fit for its database.

“We have a [client] company which uses Pinecone to compare and search through security alerts,” said Kogan. “There is no user query. It's not a search engine. The security alerts are not paragraphs of text, necessarily. But using a vector database like Pinecone you can see how similar one alert is to another. Even if they don't share the same IP address or other values.”

Pinecone plans to use the funding to expand both its customer-facing teams, which can nurture new applications, and its research and development efforts, which have a longer horizon.

“Many of the largest companies in the world have already embraced the use of vector search, which has given them a distinct advantage over their competitors,” said Tim Tully, partner at Menlo Ventures. “Pinecone is ensuring that regardless of size and budget, companies can integrate next-generation search capabilities into their applications and get to production without the need to develop and maintain an entirely new architecture.”
