How lakeFS is versioning the way to enterprise AI innovation

In the modern enterprise, vast amounts of data are continually gathered, refined, and then used for an ever-growing number of AI-driven internal and end-customer use cases. Right now, virtually every organization on the planet is trying to do this with greater speed, scale and security, across every data type and format it has.
Long before the “everything better with AI” fervor swept the enterprise, lakeFS co-founders Einat Orr and Oz Katz saw an opportunity to streamline data science and engineering workflows and ensure data integrity with a novel yet proven approach: manage data like code. “Git for data” was the idea, and it became the basis for the company’s eponymous open-source project.
When Orr and Katz launched lakeFS, version control was a widely used but siloed capability, largely confined to software development. But as the need for repeatable and reliable data delivery grew, lakeFS’s popularity surged. Now, version control is emerging as a fundamental layer in AI infrastructure. And lakeFS has just raised a $20 million funding round from DTC and other investors to help companies train better models and build better applications by focusing on the key ingredient: their multimodal data.
“When you are training AI models, you rely on multimodal data. And organizations that don’t have the right infrastructure to support this will struggle to get AI systems into production,” Einat Orr, who serves as CEO, said. “Everyone is rethinking their data architecture, because they need to navigate a very stormy world. Things are not stable. You want to build an infrastructure that will take you through that, and that serves you as the AI revolution is happening and evolving.”
As veteran engineers, Orr and Katz knew first-hand the struggle of manually recovering lost data assets. With lakeFS, data practitioners can easily version control data in a Git-like model. The technology takes the complexity out of managing data, regardless of its underlying format, whether text, video or images. The result for customers such as Lockheed Martin, Microsoft, NASA and Amazon is a much faster time-to-market for new data-intensive use cases.
“Versioning is no longer something that a tool does for someone. It is an organizational infrastructure for the data that anyone who uses data in the organization can enjoy. And that infrastructure needs to support all data types, from all data sources, for the company’s AI needs,” Orr said.
Scale, reproducibility and trust
With lakeFS, enterprises can record the state of every asset in a data repository at any moment, review past versions, and quickly revert to a prior state.
“That provides reproducibility. Since you can integrate lakeFS with other tools that you are using, you can have a synced reproducibility of your code, your data and your infrastructure,” said Orr.
Users can work with the data however they want without compromising the original data set. As they create isolated branches of data stored in object storage such as Amazon S3 and build on them, lakeFS records every change and maintains a lineage for easy auditing. Users can also share their changes with team members and merge them back into the source data, keeping the data set consistent throughout its lifecycle.
“Today, there are many issues with people modifying data while others are still using it. A lot of confusion arises because the data is stored in a single location, like a shared folder,” said Orr. “But with data versioning, collaboration and communication become much easier.”
As fresh data comes in, lakeFS can automatically block assets that fail to meet a defined quality threshold. This builds overall trust in data domains and highlights the vital role the versioning layer now plays in enterprise infrastructure.
Teams across an organization need access to different versions of their data. But when each team relies on the versioning built into its own tools, massive silos result. A Git-for-data approach gives the whole organization versioning with full lineage and context.
Open technology, better results
Like other popular open-source projects before it, lakeFS’s vibrant community helps quickly turn user feedback into new products and features. For example, with lakeFS Mount, users can quickly move experimental AI workloads from a laptop to GPU machines without time-consuming integration changes, further accelerating time-to-market.
In a market flooded with subpar, half-baked AI tools, lakeFS’s open-source roots and the community built around them are a powerful customer acquisition engine.
“Open technologies build trust. You know that a lot of people have tried it, you know how it was tested,” Orr shared. “The traction we have seen across our open source users has greatly and positively influenced the lakeFS roadmap. It’s also served as a foundation for strong relationships with customers like Amazon, Arm, Lockheed Martin, and Volvo. The fact that lakeFS is now cemented in enterprise AI tech stacks with customers of this caliber definitely fueled the demand for this round of growth funding.”
lakeFS announced a $20 million growth funding round, reported triple-digit growth, and shared plans to expand its engineering and go-to-market teams.