Databricks and Snowflake Adapt for Generative AI


By: Mary Jander

When Databricks signed on to acquire MosaicML this week for approximately $1.3 billion, it heralded the next trend in generative AI – namely, enterprise ownership of data for training large language models (LLMs).

First, a word about the acquisition: Databricks, a mature startup founded in 2013, specializes in the data lakehouse, a model that combines data warehousing – the analysis of data from a common repository – with data lake functionality, meaning the storage of data in its original state. Databricks claims over 7,000 customers worldwide, including ABN Amro, AT&T, Comcast, Conde Nast, Rolls Royce, Scribd, Sega, Shell, TD Bank, Toyota, and Walgreens, among others. It has raised $3.6 billion in several funding rounds from many investors, including Counterpoint Global, Baillie Gifford, Clearbridge, and Andreessen Horowitz.

MosaicML offers LLMs that feature security, speed, and the ability to customize using enterprise-specific datasets as input rather than publicly available data. Founded in 2021 by Naveen Rao (ex-Intel and ex-CEO of Nervana Systems, which was sold to Intel for about $350 million in 2016) and Hanlin Tang (ex-Intel, ex-Nervana), MosaicML raised $64 million from investors including Lux Capital, DCVC, AME Cloud Ventures, Future Ventures, and Playground Global, among others. It claims its MPT-7B LLM has had more than 3.3 million downloads. Its customers include the Allen Institute for AI (AI2), Generally Intelligent, Hippocratic AI, Replit, and Scatter Labs.

MosaicML’s mission was summarized in a blog post by cofounders Rao and Tang this week:

“We started MosaicML to solve the hard engineering and research problems necessary to make large scale neural network training and inference more accessible to everyone. With the recent generative AI wave, this mission has taken center stage. We fundamentally believe in a better world where everyone is empowered to train their own models, imbued with their own data, wisdom, and creativity, rather than have this capability centralized in a few generic models.”

The acquisition of MosaicML should give Databricks customers the opportunity to generate LLMs using the data stored in Databricks’ lakehouse format. It’s a clever and prescient move toward enabling enterprises to benefit from generative AI in ways not possible using only chatbots and general-purpose models.

One of MosaicML’s customers, Hippocratic AI, for instance, claims to have outperformed OpenAI’s GPT-4 on a range of metrics, beating it on 105 of 114 tests and certifications. Hippocratic AI launched in May 2023 with $50 million in seed money from General Catalyst and Andreessen Horowitz.

Snowflake Too

While Databricks was busy acquiring MosaicML, its chief competitor, Snowflake (NYSE: SNOW), was also making a generative AI deal. At Snowflake’s Summit 2023 conference in Las Vegas this week, Snowflake and NVIDIA (Nasdaq: NVDA) announced a partnership in which Snowflake customers will use NVIDIA’s NeMo LLM platform to develop their own models – with their own data stored in Snowflake. In a press release, Snowflake CEO Frank Slootman stated:

“Snowflake’s partnership with NVIDIA will bring high-performance machine learning and artificial intelligence to our vast volumes of proprietary and structured enterprise data, a new frontier to bringing unprecedented insights, predictions and prescriptions to the global world of business.”

The two deals cited above indicate a solid trend toward making generative AI work for enterprises concerned with security, authenticity, and specificity. By leveraging data already stored in cloud-native, scalable accounts, large enterprise customers will be able to move quickly and economically to adopt the latest in AI. And who knows, maybe the Databricks/MosaicML pairing will bring Databricks closer to its anticipated IPO, market conditions allowing.