Real-time database vendor Rockset is expanding the AI capabilities of its namesake database with enhanced vector search and scalability.
Rockset’s roots are in the open-source RocksDB key-value store created at Meta (formerly Facebook), and an evolution of that technology helps enable Rockset’s real-time indexing capabilities. The company has raised a total of $105 million in funding, including a $44 million round announced in August.
With the new update, Rockset is pushing further into the generative AI realm with the general availability (GA) of vector search as part of its real-time database platform. The capability was first previewed in April and has been expanded and improved in the months since. The technology has already found early success, with discount airline JetBlue among the early adopters that have shared details of their usage of Rockset. Alongside the vector search update, Rockset is also rolling out integrations with the widely used LangChain AI-orchestration tool and the LlamaIndex data framework.
“Our support for vector search is going GA and is getting very sophisticated; now you have an ability to build similarity indexes using approximate nearest neighbor (ANN),” Venkat Venkataramani, co-founder and CEO of Rockset, told VentureBeat. “You can do that at a massive scale while also having real-time updates on your vector embeddings and your metadata.”
How Rockset is bringing real time indexing to vector search
The market for vector search capabilities has become very competitive in 2023.
Vectors, which are numerical representations of data, play a crucial role in fueling large language models (LLMs). A variety of specialized vector databases, such as Pinecone and Milvus, have emerged, adding to the expanding array of existing database technologies like DataStax, MongoDB and Neo4j that now facilitate the use of vector embeddings.
Rockset has positioned itself as a real-time database, which is also where it aims to differentiate with vector search. Venkataramani said that as new data comes into a Rockset database, the index and vector embeddings are updated in real time, within single-digit-millisecond latencies. Rockset uses an approach the company calls compute-compute separation, in which the compute resources used to build indexes are separated from those used for queries, enabling both real-time data indexing and fast query performance.
“With almost all other vector databases you can’t update in real time; you have to rebuild your index periodically,” Venkataramani said.
Accelerating ANN vector similarity search
There are a number of ways to enable vector search, including approximate nearest neighbor (ANN) and the more precise k-nearest neighbors (KNN) approach.
Venkataramani explained the trade-off: KNN finds the top 10 or 20 most similar results exactly, but that precision can be very computationally expensive, especially for large datasets with billions of vectors. ANN, on the other hand, returns results that are close enough, though not necessarily the exact top matches.
Rockset uses both KNN and ANN, and its SQL queries allow combining vector search with other metadata filters. The query optimizer picks whether to use KNN or ANN under the hood, based on the query and the data, to provide the fastest results.
Since real-time updates to vector embeddings are foundational to Rockset, ANN indexes reflect the latest data within milliseconds.
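The exact-versus-approximate trade-off described above can be sketched at toy scale. The following is a minimal, hypothetical illustration in NumPy, not Rockset’s implementation: exact KNN scores every vector, while a crude IVF-style clustered index, one common ANN technique chosen here as an assumption, scans only the few clusters closest to the query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 10,000 unit vectors in 64 dimensions (sizes are illustrative).
docs = rng.normal(size=(10_000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Query: a lightly perturbed copy of document 42, re-normalized.
query = docs[42] + 0.01 * rng.normal(size=64).astype(np.float32)
query /= np.linalg.norm(query)

def knn_exact(q, vecs, k=10):
    """Exact KNN: score every vector (O(n*d)), then take the top k."""
    scores = vecs @ q  # cosine similarity, since all vectors are unit length
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

def ann_ivf(q, vecs, centroids, assign, k=10, nprobe=2):
    """Crude IVF-style ANN: scan only the nprobe clusters nearest the query."""
    probe = np.argsort(-(centroids @ q))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probe))
    scores = vecs[cand] @ q
    top = np.argpartition(-scores, min(k, len(cand) - 1))[:k]
    return cand[top[np.argsort(-scores[top])]]

# Build the toy index: 16 random documents act as centroids, and each
# document is assigned to its most similar centroid.
centroids = docs[rng.choice(len(docs), 16, replace=False)]
assign = np.argmax(docs @ centroids.T, axis=1)

exact = knn_exact(query, docs)
approx = ann_ivf(query, docs, centroids, assign)
# Document 42 should rank first in both: the query is a noisy copy of it.
print(exact[0], approx[0])
```

On real workloads, the candidate set an ANN index scans can be orders of magnitude smaller than the full corpus, which is where both the speedup and the approximation risk come from.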
Why vector databases aren’t going to disappear anytime soon
At OpenAI’s dev day earlier this month, the company announced a series of new services that have disrupted the overall generative AI market.
With the OpenAI GPT builder and Assistants API in particular, there has been some talk in the industry about whether vector database technologies will still be needed. In some respects, OpenAI’s new offerings have eliminated the need for an organization to run its own vector database to support AI applications.
Venkataramani isn’t too concerned about OpenAI’s dev day news. In his view, the OpenAI innovations will have the most impact on basic chatbots, not on larger enterprise applications and gen AI use cases.
“You still have very large companies having a lot of security and compliance needs where they can’t just send all of their data to a third party company to build their chatbots,” he said.
Venkataramani does not expect the need for vector database capabilities to diminish for very large and complex datasets, which will help power retrieval-augmented generation (RAG). He also noted that there are use cases beyond chatbots, such as enabling similarity search at scale.
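To make the RAG connection concrete, here is a minimal, hypothetical sketch of the retrieval step: vector search selects the chunks that ground the LLM’s answer. The chunk texts, sizes, and random embeddings below are invented stand-ins for real model embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "knowledge base": in a real RAG pipeline these would be text chunks
# paired with embeddings from a model; here both are invented stand-ins.
chunks = ["refund policy", "baggage rules", "loyalty program"]
chunk_vecs = rng.normal(size=(3, 8))
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    """RAG retrieval step: vector search picks the chunks most similar to
    the query, which are then handed to the LLM as grounding context."""
    scores = chunk_vecs @ query_vec  # cosine similarity (unit vectors)
    best = np.argsort(-scores)[:k]
    return [chunks[i] for i in best]

# Pretend the user asked about baggage: reuse that chunk's own vector as
# the query embedding, so it should retrieve itself first.
context = retrieve(chunk_vecs[1])
prompt = "Answer using only this context:\n" + "\n".join(context)
```

At enterprise scale, the brute-force scoring here is exactly what gets replaced by the indexed KNN/ANN search discussed earlier; the retrieval contract (query vector in, top-k chunks out) stays the same.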
“I don’t think vector databases are going anywhere, but I think the use cases are evolving,” Venkataramani said. “I think for every use case where there’s some kind of an AI application being built, you still need a workhorse behind the scenes.”