There’s a lot of money in voice cloning.
Case in point: ElevenLabs, a startup that develops AI-powered tools for creating and editing synthetic voices, announced today that it closed an $80 million Series B round led by prominent investors, including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross.
The round, which also had participation from Sequoia Capital, Smash Capital, SV Angel, BroadLight Capital and Credo Ventures, brings ElevenLabs’ total raised to $101 million and values the company at over $1 billion (up from ~$100 million last June). CEO Mati Staniszewski says the new cash will be put toward product development, expanding ElevenLabs’ infrastructure and team, AI research and “enhancing safety measures to ensure responsible and ethical development of AI technology.”
“We raised the new money to cement ElevenLabs’ position as the global leader in voice AI research and product deployment,” Staniszewski told TechCrunch in an email interview.
ElevenLabs, co-founded by Piotr Dabkowski, an ex-Google machine learning engineer, and Staniszewski, a former Palantir strategist, launched in beta around a month ago. Staniszewski says that he and Dabkowski, who grew up in Poland, were inspired to create voice-cloning tools after watching poorly dubbed American films. They thought that AI could do a better job.
ElevenLabs is perhaps best known today for its browser-based speech generation app, which can create lifelike voices with adjustable toggles for emotion, cadence and other vocal characteristics. Users can enter text for free and have it read aloud by one of several default voices. Paying customers can upload voice samples to craft new styles using ElevenLabs’ voice cloning.
ElevenLabs is increasingly investing in versions of its speech-generating technology aimed at creating audiobooks, dubbing films and TV shows, and generating character voices for games and marketing activities.
Last year, the company released a “speech to speech” tool that attempts to preserve a speaker’s voice, prosody and intonation while automatically removing background noise and, in the case of movies and TV shows, translating and synchronizing speech with the source material. The company is planning to release a new workflow for dubbing studios that includes tools to create and edit translations, as well as a mobile application that narrates text and webpages using ElevenLabs’ voices.
ElevenLabs’ innovations have won the startup customers in Paradox Interactive, the game developer whose recent projects include Cities: Skylines 2 and Stellaris, and The Washington Post, among other publishing, media and entertainment companies. Staniszewski claims that ElevenLabs users have generated the equivalent of more than 100 years’ worth of audio, and that employees at 41% of Fortune 500 companies use the platform.
But the publicity hasn’t been totally positive.
Users of the notorious message board 4chan, known for its conspiracy-laden content, used ElevenLabs’ tools to share hateful messages mimicking celebrities like actress Emma Watson. The Verge’s James Vincent was able to tap ElevenLabs to maliciously clone voices in a matter of seconds, generating samples containing everything from racist and transphobic remarks to threats of violence. And Vice reporter Joseph Cox documented generating a clone convincing enough to fool a bank’s authentication system.
ElevenLabs responded by launching a tool that detects speech generated by its platform. It also tried to track down users who repeatedly violated its terms of service, which prohibit abuse. This year, ElevenLabs plans to improve the detection tool to flag audio from other voice-generating AI models and to partner with unnamed “distribution players” to make the tool available on third-party platforms, Staniszewski says.
ElevenLabs has also faced criticism from voice actors who claim that the company uses samples of their voices without their consent, samples that could be leveraged to promote content they don’t endorse or to spread mis- and disinformation. In a recent Vice article, victims recount how ElevenLabs was used in harassment campaigns against them, in one example to share an actor’s private information (their home address) using a cloned voice.
Then there’s the elephant in the room: the existential threat platforms like ElevenLabs pose to the voice acting industry.
Motherboard writes about how voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them, sometimes without commensurate compensation. The fear is that voice work, particularly cheap, entry-level work, will eventually be replaced by AI-generated vocals, and that actors will have no recourse.
Some platforms are attempting to strike a balance. Earlier this month, Replica Studios, an ElevenLabs competitor, signed a deal with SAG-AFTRA to create and license digital replicas of the media artist union’s members’ voices. In a press release, the organizations said that the arrangement established “fair” and “ethical” terms and conditions to ensure performer consent and to negotiate terms for uses of digital voice doubles in new works.
Even this didn’t please some voice actors, however, including SAG-AFTRA’s own members.
ElevenLabs’ solution is a marketplace for voices. Currently in beta and set to open to the public in the coming weeks, the marketplace lets users create, verify and share a voice. Staniszewski claims that the original creators of a voice receive compensation when others use it.
“Users always retain control over their voice’s availability and compensation terms,” he added. “The marketplace is designed as a step towards harmonizing AI advancements with established industry practices, while also bringing a diverse set of voices to ElevenLabs’ platform.”
Voice actors may take issue with the fact that ElevenLabs isn’t paying in cash, though — at least not at present. The current setup has creators receiving credit toward ElevenLabs’ premium services (which some find ironic, I’d wager).
Perhaps that’ll change in the future as ElevenLabs, which is now among the best-funded synthetic voice startups, attempts to beat back upstart competition like Papercup, Deepdub, Acapela, Respeecher and Voice.ai as well as Big Tech incumbents such as Amazon, Microsoft and Google. In any case, ElevenLabs, which plans to grow its headcount from 40 people to 100 by the end of the year, intends to stick around, and make waves, in the fast-growing synthetic voice market.