Meta introduces ‘Voicebox’, a next-gen AI model for speech generation


Meta has unveiled Voicebox, a versatile generative AI model for speech. Through in-context learning, it can perform tasks such as speech editing, sampling, and stylization.

Voicebox – AI for Speech Generation

Voicebox can produce high-quality audio clips and edit pre-recorded audio. For example, it can remove unwanted background noise while preserving the original content and style.

It is also multilingual and can produce speech in six languages. Voicebox follows “Make-A-Video,” a system Meta unveiled in September 2022 that turns text into high-quality video clips.

Voicebox is part of a new wave of generative AI models with various potential applications. For instance, it could enhance virtual assistants and non-player characters in the metaverse by providing natural-sounding voices.

It could also help visually impaired people by letting AI read written messages aloud in their friends’ voices. Creators, meanwhile, could use Voicebox to create and edit audio tracks for videos, among other uses.

Voicebox can handle several tasks, including:

  • In-context text-to-speech synthesis: Given an audio sample as short as two seconds, Voicebox can synthesize speech from text that matches the style of the sample.
  • Speech editing and noise reduction: It can regenerate portions of speech that were interrupted by noise, or replace misspoken words, without the speaker having to re-record. For example, a user can cut out a segment where a dog is barking in the background and have Voicebox regenerate it seamlessly, much like an eraser for audio editing.
  • Cross-lingual style transfer: Given a speech sample and a text passage, even in different languages, Voicebox can produce a reading of the text in any of the supported languages (English, French, German, Spanish, Polish, and Portuguese). This could help people who speak different languages communicate more naturally.
  • Diverse speech sampling: Because Voicebox was trained on diverse data, it can generate speech that better reflects how people actually talk in the real world across its six supported languages.

To learn more about Meta’s Voicebox speech generation model, visit the Voicebox page.

Announcing the model, Meta wrote:

We are happy with Voicebox, our new project that makes sounds with AI. We want to keep learning more about sounds and AI, and we hope other people will use our work to make new things.