Vector Databases Take the Spotlight

By: Mary Jander


Data is fundamental to the creation of generative artificial intelligence (AI) applications, and databases designed to assist the training of large language models (LLMs) have taken off like rockets in 2023.

Some background: Vector databases use a technique in which the characteristics of unstructured data are represented as numerical arrays, or vectors, so that each item becomes a point in a multi-dimensional space. The advantage of this approach is that queries can quickly turn up items that are similar to one another, providing context for use in answering questions. A request for documents on a specific subject, for instance, can turn up the three most relevant articles out of thousands available on a single topic. It’s therefore easy to see why generative AI models can benefit from using these sources of information.
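To make the idea of similarity search concrete, here is a minimal sketch in Python. It is an illustration only, not how any vendor named in this article implements search: the three-dimensional vectors, the document names, and the `top_k` helper are all hypothetical, and the brute-force scan stands in for the approximate nearest-neighbor indexes that production systems typically rely on. Cosine similarity is used here as one common way to compare vectors.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=3):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings; real ones usually have hundreds of dimensions
# and are produced by an embedding model, not written by hand.
index = {
    "gpu-pricing":       [0.9, 0.1, 0.0],
    "llm-training":      [0.8, 0.3, 0.1],
    "vector-search":     [0.7, 0.6, 0.1],
    "office-lunch-menu": [0.0, 0.1, 0.9],
    "parking-policy":    [0.1, 0.0, 0.8],
}

query = [0.8, 0.4, 0.1]  # hypothetical embedding of the user's question
results = top_k(query, index, k=3)
print(results)
```

The three documents whose vectors point in roughly the same direction as the query come back first, while the unrelated ones are left out. In a generative AI application, those results would typically be passed to the model as context.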

If the technology behind vector databases is complex, funding for companies that offer the technology is not. Startups in this area have had no problem scoring significant rounds from influential investors. Following is a sampling of rounds this year:

Qdrant. Founded in 2021 in Berlin by Andre Zayarni (CEO) and Andrey Vasnetsov (CTO), Qdrant scored $7.5 million in seed funding led by Unusual Ventures in April 2023. “We had more than 20 VCs interested — almost all of them wanted to join as co-investors later — and we most probably would have received more offers,” Zayarni told TechCrunch. The startup chose Unusual because it aligned with their open source strategy.

Pinecone. Founded in 2019 by CEO Edo Liberty and headquartered in New York City and Tel Aviv, Pinecone raised $100 million in Series B funding in April 2023. The round was led by Andreessen Horowitz with participation from ICONIQ Growth, Menlo Ventures, and Wing Venture Capital.

TileDB. On October 10, TileDB closed $34 million in a Series B round to further develop its “multi-modal” database for use in a variety of analytics, as well as for LLMs. Back in August, TileDB announced support for vector search. The latest round was led by AlleyCorp with participation from Two Bear Capital, Nexus Venture Partners, Big Pi Ventures, Intel Capital, Uncorrelated, Lockheed Martin Ventures, Amgen Ventures, NTT Docomo Ventures, Verizon Ventures, S Ventures, LDV Partners, and Scale Asia Ventures. TileDB was founded in 2017 by CEO Stavros Papadopoulos and is headquartered in Cambridge, Mass.

Weaviate. Founded in 2019 in Amsterdam by CEO Bob van Luijt, Etienne Dilocker (CTO), and Micha Verhagen, Weaviate raised $50 million in Series B funding in April 2023. The round was led by Index Ventures with participation from Battery Ventures, NEA, Cortical Ventures, Zetta Venture Partners, and ING Ventures.

Other startups in this area include Chroma, LanceDB, Marqo, Vespa, and Zilliz. And there are more in the works, as vector capabilities are added to search engines, cloud services, and databases.

Large Providers Add Vector Search

In addition to a raft of startups, leading cloud tech players have been adding vector search capabilities to their database offerings.

Alibaba Cloud offers a vector engine in its AnalyticDB database with LLM connectivity.

AWS offers a preview of a vector engine for OpenSearch Serverless, its search and data visualization/analytics service.

Google Cloud offers vector search for its Cloud SQL for PostgreSQL and AlloyDB for PostgreSQL databases.

IBM has announced plans to add vector capabilities to its watsonx.data service. A preview is planned for the fourth quarter of 2023.

Microsoft offers vector search capabilities in preview for its Azure Cognitive Search service.

Oracle recently announced AI Vector Search for Oracle Database 23c.

Futuriom Take: Vector database technology is taking off as the popularity of generative AI models continues to grow. Expect to see vector capabilities added to existing search engines, databases, and cloud services.