Here’s all you need to know about Meta’s generative AI ‘Voicebox’ for speech tasks

The AI model can produce audio clips, synthesize speech across six languages, and perform noise removal and content editing

What’s the news: On June 16, 2023, Meta announced ‘Voicebox’, a generative AI model that can carry out speech-generation tasks “it was not specifically trained” for. Although there have been reports of Mozilla working on an open-source Common Voice project in the past, Meta claims that Voicebox is the first speech model of its kind.

What can Voicebox do?

The AI model can produce audio clips, synthesize speech across six languages, and perform noise removal, content editing, style conversion, and diverse sample generation. These functions can be explained with the following use cases:

In-context text-to-speech synthesis: Voicebox can take an input audio sample at least two seconds long, match its audio style, and use that style for text-to-speech generation. According to Meta, this can ‘bring speech’ to people unable to speak and customize the voices of non-player characters and virtual assistants.

Cross-lingual style transfer: The model can help people speaking different languages communicate with each other, provided it gets a sample of speech and a text of the relevant content in the required language. So far, Voicebox has been trained in English, French, German, Spanish, Polish and Portuguese.

Speech denoising and editing: Voicebox’s in-context learning can edit segments within audio recordings.

“It can resynthesize the portion of speech corrupted by short-duration noise or replace misspoken words without having to rerecord the entire speech,” said Meta.

Diverse speech sampling: “Voicebox can generate speech that is more representative of how people talk in the real world and across the six languages listed above. In the future, this capability could be used to generate synthetic data to help better train a speech assistant model,” said Meta.

Meta said that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech. Models trained on Voicebox output show a one percent error rate degradation, whereas models trained on synthetic speech from previous text-to-speech systems showed 45 to 70 percent error rate degradation.

Why it matters: While Mozilla’s Common Voice looked to create an open audio bank for easier editing and innovative voice-enabled technologies, Meta’s researchers have gone for a more ambitious approach with this new AI model. However, whether they are making these strides with basic privacy and data rights in mind remains to be seen. The past year has seen many emerging technologies like ChatGPT, yet we still know very little about the characteristics of such models. So, it will be interesting to see how Meta handles this model in the future while trying to adhere to the ethical standards around generative AI systems.

Voicebox code not publicly available: While Meta researchers said it is important to be open with the AI community, they decided not to make the Voicebox model or code publicly available at this time.

Classifier to distinguish real and synthetic speech: Researchers working on the AI model said that they recognized the model’s “potential for misuse and unintended harm.” As such, their research paper describes a classifier that can distinguish between authentic speech and audio generated with Voicebox.

This post is released under a CC-BY-SA 4.0 license. Please feel free to republish on your site, with attribution and a link. Adaptation and rewriting, though allowed, should be true to the original.

MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.

© 2008-2021 Mixed Bag Media Pvt. Ltd. Developed By PixelVJ
