Date: 2024-04-01
OpenAI has revealed plans to further develop its voice cloning model called ‘Voice Engine’, which uses text input and a 15-second audio sample to generate a copy of the original speaker’s voice. In a blogpost on March 29, OpenAI shared preliminary results from a small-scale preview model of Voice Engine, which was first developed in 2022 and is used to power the company’s text-to-speech API as well as ChatGPT Voice and Read Aloud.

Amidst deepfake proliferation and concerns about disinformation through misuse of synthetic voices in an election year, OpenAI wants to tread cautiously and ensure responsible deployment of Voice Engine at scale. The company stated that it is not releasing the model for public use as of now.

How is Voice Engine being tested currently?

According to the blogpost, Voice Engine is being tested as a reading assistant by ed-tech platforms, as a tool for translating content, as an aid for people who are non-verbal, and as a way to assist community health workers in their native language.

OpenAI stated that the organisations testing the model are prohibited from impersonating another individual or organisation without legal permission.

“…our terms with these partners require explicit and informed consent from the original speaker and we don’t allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audience that the voices they’re hearing are AI-generated. Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used,” the blog added.

OpenAI has also recommended establishing voice authentication mechanisms to verify that the original speaker is knowingly adding their voice to the service, alongside a “no-go voice list” to detect and prevent the creation of voices that resemble prominent figures.

Recent developments in the US to tackle deepfakes:

In February 2024, US Federal Communications Commission (FCC) chairwoman Jessica Rosenworcel proposed making AI-generated robocalls illegal. Rosenworcel urged the FCC to recognize calls made with AI-generated voices as “artificial” under the Telephone Consumer Protection Act (TCPA), which would make the voice-cloning technology (deepfakes) used in common robocall scams targeting consumers illegal.

Additionally, US lawmakers on January 30 introduced a new bill called the Disrupt Explicit Forged Images and Non-Consensual Edits (DEFIANCE) Act of 2024, which would allow victims of AI-generated porn and deepfakes to sue for compensation.

Separately, US lawmakers María Elvira Salazar and Madeleine Dean introduced the No Artificial Intelligence Fake Replicas And Unauthorized Duplications (No AI FRAUD) Act on January 10, a bill aimed at protecting an individual’s likeness and voice from being used to generate AI fakes. Likeness refers to any distinguishable feature of an individual, such as their face. Additionally, 20 tech companies including Adobe, Microsoft, Meta, Google, and TikTok have signed an accord to tackle the deceptive use of AI in 2024 elections.

While positive use cases of voice cloning technology cannot be ruled out, it is important to consider the risks of misinformation and deception through impersonation that can cause user harm, especially during elections. Moreover, given that detecting deepfakes that spread widely on social media platforms is still a major problem, it is important for tech companies to exercise restraint and employ adequate safeguards before releasing newer voice cloning tech for public use.

