Komprise Streamlines Data for AI

Unstructured data management firm Komprise has added an intelligent “data ingestion engine” to its Smart Data Workflow solution, in a move that should eliminate waste in RAG and LLM pipelines.
Komprise’s Smart Data Workflow is part of its Intelligent Data Management platform, which furnishes a Global File Index to manage unstructured file and object data regardless of its storage location.
To clarify: Unlike a global file system, Komprise’s Intelligent Data Management platform doesn’t front all data sources with a single point of access. Instead, the platform builds an index by interacting with data sources over protocols such as NFS, SMB, and S3, so that data can be analyzed where it resides, whether on premises or in the cloud. Smart Data Workflow adds a point-and-click wizard to search all data and execute functions on data subsets while tagging data with additional metadata.
Now, at no extra charge, Komprise is offering Smart Data Workflow customers something new: an automated, dedicated tool specifically for AI ingestion. According to co-founder, president, and COO Krishna Subramanian, it was formerly possible to use Smart Data Workflow to ingest data into an AI bucket or pipeline, but it wasn’t simple. Customers had to use Komprise’s APIs or deploy data management policies to create an off-site copy of selected data. “Now, there is a wizard in Smart Data Workflows called AI Ingest that steps you through the entire workflow in one place,” she told Futuriom.
There’s more: The AI Ingest wizard streamlines data transfer to AI, doubling the speed compared to “copy and sync” or ETL approaches that copy a file or data item from one source to another, Subramanian said. It achieves this speed by removing file overhead and optimizing the data flow for RAG or LLM pipelines.
Bottom line? Komprise says that getting just the right data to AI saves costs. According to a company blog:
“AI consumes compute and storage resources for every operation; the more data you give it, the more data it needs to process for each answer. If 70% of this data is irrelevant, you are spending 70% more processing power on irrelevant data that you could save through precise curation. More importantly, your results could be less accurate when you give too much data because of the risk of incorrect or outdated information. You would incur this processing wastage regardless of whether you run the AI in the cloud or in the data center. Data curation eliminates this waste of resources on irrelevant data.”
Fixing Unstructured Data for AI
Komprise’s news is significant on a couple of fronts. First, it augments an already useful approach to managing unstructured data, which plays a major role in inferencing, the process of running AI models against corporate data to support enterprise-specific applications.
Second, the new AI Ingest wizard actively prepares data for AI. It filters out sensitive data, personally identifiable information (PII), duplicates, and irrelevant or outdated items. All of these can foul up RAG or AI workflows by cluttering context windows, which limit the amount of data an AI model can process at once. Clutter can also reduce accuracy and add latency to AI pipelines, Komprise says. According to a press release, “Studies show a 10% efficiency drop per 10,000 additional unstructured documents in typical RAG, leading to reduced accuracy and poor outcomes.” In contrast, Komprise claims it can cut 60% to 80% of AI costs by eliminating unnecessary and extraneous data.
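To make the idea of curation concrete, here is a minimal Python sketch of the kind of filtering described above: dropping sensitive, outdated, and duplicate items before anything reaches a RAG or LLM pipeline. It is not Komprise’s implementation or API. The FileRecord type, the curate function, the tag names, and the two-year age cutoff are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical index entry; the field names are assumptions, not a vendor schema."""
    path: str
    content_hash: str          # identical hash means duplicate content
    modified: datetime
    tags: frozenset = frozenset()


def curate(records, now, max_age_days=730, blocked_tags=frozenset({"pii", "sensitive"})):
    """Return only the records that pass all three filters, in input order."""
    cutoff = now - timedelta(days=max_age_days)
    seen_hashes = set()
    kept = []
    for record in records:
        if record.tags & blocked_tags:           # tagged as sensitive or PII
            continue
        if record.modified < cutoff:             # older than the age limit
            continue
        if record.content_hash in seen_hashes:   # duplicate of an earlier file
            continue
        seen_hashes.add(record.content_hash)
        kept.append(record)
    return kept


# Illustrative data only.
now = datetime(2025, 1, 1)
sample = [
    FileRecord("a.txt", "h1", datetime(2024, 6, 1)),
    FileRecord("b.txt", "h1", datetime(2024, 7, 1)),                      # duplicate of a.txt
    FileRecord("c.txt", "h2", datetime(2019, 1, 1)),                      # outdated
    FileRecord("d.txt", "h3", datetime(2024, 8, 1), frozenset({"pii"})),  # sensitive
    FileRecord("e.txt", "h4", datetime(2024, 9, 1)),
]
kept = curate(sample, now)
kept_paths = [record.path for record in kept]
```

In this toy dataset, three of the five files are filtered out before ingestion, so the downstream pipeline would only need to process the remaining two.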
Komprise, a Futuriom 50 company, was founded in 2014 by Kumar Goswami, Krishna Subramanian, and Mike Peercy. Its global file index approach competes with hybrid file systems from the likes of CTERA, Nasuni, and Panzura. The company has enjoyed its share of enterprise customers, including Pfizer, Carhartt, Duquesne University, and many others.
Komprise also has interesting partnerships. With NVIDIA, it can send curated data to storage via NVIDIA’s GPUDirect connectivity technology. It can also work with the NVIDIA NeMo platform. With SUSE Linux, Komprise catalogs unstructured data for SUSE Rancher customers and enriches metadata to help track and select exact datasets as part of AI data governance.
Futuriom Take: With AI Ingest, Komprise is solving a big problem for enterprise customers, namely, how to send only the best data to AI, avoiding inaccuracy and latency issues. If the vendor’s claims prove out, enterprises could use this tool to avoid substantial waste in RAG and AI pipelines.