Inside Cartesia’s Jump from Research to Voice AI Leadership

In the race to create the most seamless, human-like voice interactions, developers are pushing the boundaries of what’s possible with voice AI. These technologies are quickly becoming the backbone of customer experiences, powering everything from personalized support to dynamic, interactive, voice-driven applications. With some industry analysts predicting the market for these technologies will surpass $30 billion by 2030, the opportunity has never been more real.
For developers, that opportunity is also a challenge. It’s about building smarter models and delivering systems that scale, adapt, and perform in the real world. That’s where Cartesia comes in. Born from Stanford’s research labs, Cartesia is bridging the gap between groundbreaking innovation and practical application. With tools like Sonic, its flagship text-to-speech model, and the newly launched Ink streaming speech-to-text models, the company is helping developers create the next generation of voice-driven experiences.
We spoke with Cartesia co-founder and CEO Karan Goel to learn more about what spurred him, along with co-founders Albert Gu, Arjun Desai, Brandon Yang, and Chris Ré, to make the jump from academic research to a commercial endeavor, how they’ve navigated the pivot, and the opportunity ahead for Cartesia.
A Founder Watching from Afar
Growing up in New Delhi, Karan was surrounded by entrepreneurs who worked for the family’s 125-year-old scientific equipment manufacturing business. He watched from afar as universities in the US announced transformative breakthroughs in deep learning and other technology that would ultimately prove critical to AI development.
“I never imagined building robots or anything, I was mostly busy playing video games. But in college, I got really excited about AI. But I was in India, while all this amazing stuff was happening in American universities,” he said. “There was so much interesting work happening and I wanted to be part of it.”
Karan came to the US to attend two of the most prominent schools for AI and deep learning research, earning a master’s at Carnegie Mellon and a PhD at Stanford. It was at Stanford that he met his co-founders, and in 2023 they spun their laboratory research out into Cartesia.
“Between when I joined my PhD program and left, the world was completely different,” said Karan. “When I was a grad student, there was a lot of optimism about the role of academia in shaping these systems. But with the speed of innovation, academia was becoming less relevant to the development of frontier systems.”
At Stanford, Karan and his co-founders were building in a hypercompetitive market, one turbocharged by the introduction of ChatGPT and the subsequent rush to create more intelligent frontier models.
“In academia, your only goal is to do interesting things. The more interesting the things, the better. There is no timeline. You can spend five years doing something, and that’s OK,” Karan said. “We were working on problems that were interesting and also very important for the world.”
But at the time, it was only getting harder for academics to secure the compute and financial backing needed to scale the models and technologies they were building. And then came the question of the potential impact of their work. “Having seen my dad run a business, and knowing the impact and scale you can achieve, it was a much more interesting way to develop and commercialize the technology.”
Ultimately, Karan, Albert, Arjun, Brandon, and Chris decided they needed the rigor of a commercial business, and the capital available to one, to help turn their research into real-world results. Unlike in academia, Karan added, “when you’re running a business, you have to be more structured. You have to set up milestones and go after them.”
Connecting Innovation to Application
For Karan, building cutting-edge technology was only part of the challenge. The bigger question was: how do you connect that technology to real-world customer needs? It was about solving not just hard problems, but the right ones for commercial success.
Fortunately, Karan brought two key skills from his academic journey to the startup world: the ability to think about and solve hard, interesting problems, and, as he shared somewhat jokingly, “a high tolerance for pain.” Those skills were put to the test as Cartesia tackled one of its first major challenges: rethinking how to build for customers.
The team broke down traditional silos, creating a structure where customers building on Cartesia’s platform could directly connect with the research, engineering, and sales teams behind the technology. “When there’s feedback that’s particularly relevant to how the model performs or operates, there has to be a direct line of communication,” Karan explained.
But listening to customers wasn’t enough. Cartesia also had to anticipate what customers didn’t yet know they needed. “We don’t just respond to customer needs; we help shape them,” said Aaron Melgar, Cartesia’s first go-to-market (GTM) leader, who joined from AWS. “Those use cases influence our model roadmap, but we’re also constantly balancing long-term research bets with immediate product demands, which comes with its own unique challenges.”
Hiring Aaron marked a turning point for Cartesia. As highly technical founders, Karan and his team knew they needed someone who could bridge the gap between their research-first approach and the realities of the market. But they also knew this wasn’t a typical sales role. “The sales process here requires a lot more education and enablement than typical SaaS models,” Aaron explained. “Customers often don’t know what to ask for yet. We need to guide them, show them what’s possible with voice AI, and co-create best practices as the space evolves.”
This approach has given Cartesia a unique perspective. By serving as the foundation for developers building voice AI experiences, the company gets a front-row seat to how applications are built and consumed. “You see all these applications being built, and you see how people are using them,” Karan said. “That feedback gives us a lot of opportunities to build on top of our own infrastructure and models.”
With Cartesia, companies like Forethought and Superdial are delivering powerful, agentic systems centered around voice AI. Forethought uses Cartesia’s technology to power conversational agents that resolve customer inquiries quickly and efficiently, while Superdial builds voice-first platforms enabling seamless, human-like interactions at scale. Together, they’re redefining outstanding customer experiences in the age of voice AI.
For Karan, however, the vision goes far beyond voice AI. “This is first and foremost a real-time intelligence company,” he said. “That means building the fastest, smartest models that never shut off, that can keep interacting with huge amounts of information and data from users.” Real-time intelligence is about creating systems that adapt and evolve in the moment, delivering insights and interactions that feel seamless and human. It’s this broader ambition that drives Cartesia’s innovation and sets it apart in a rapidly evolving market.
One exciting outcome of this vision is Ink, Cartesia’s recently launched family of speech-to-text models. Built on OpenAI’s Whisper, Ink was rearchitected specifically for real-time voice AI, addressing the limitations of Whisper’s original design. Whisper excels at processing large datasets, but it wasn’t optimized for live dialogue. Ink changes that by combining ultra-low latency with high accuracy. It was purpose-built for the challenges of real-world conditions: background noise, accents, telephony artifacts, and fragmented audio. It delivers transcription that feels natural and human, even in the most demanding environments. It’s this exact type of innovation, models built for real-world conditions, that provides a glimpse of real-time intelligence, where technology adapts to and excels at the complexities of human communication. It’s a future that is much closer thanks to Karan and the team at Cartesia.
Cartesia recently announced a $64 million Series A that includes DTC’s investment led by Radhika Malik.