• Meta unveils speech generation AI: Voicebox
• Voicebox can generate speech from text and match an audio style based on a sample
• The AI model can also convert text to another language and edit existing recordings

Meta Unveils Speech Generation AI: Voicebox

Meta, the parent company of Facebook and Instagram, announced a speech-generation AI model called Voicebox on June 16. This tool could allow virtual assistants, non-player characters in its metaverse, content creators, and users with accessibility needs to utilize realistic voices.

Voicebox Features

Voicebox has various features that make it a powerful tool for many users. It can generate speech from text and match an audio style based on a sample just two seconds long. It can also convert a text sample to another language and read the translated text in the speaker’s original voice. In addition, it can edit existing recordings to remove background noise, and it can create speech modeled on diverse samples.

Benefits for Users

Voicebox provides many benefits for its users. Content creators could use it to create realistic voices for their characters or narrations without having to pay an actor or voiceover artist. It could also be useful for virtual assistants, as they would be able to communicate more naturally with their users using different accents or dialects. Additionally, Voicebox could help those with accessibility needs by providing them with easier access to content in their native languages.

Supporting Languages

At the moment, Voicebox supports six languages: English, French, German, Spanish, Polish, and Portuguese. However, Meta is planning to add more languages in the future, depending on user feedback and on demand from markets in other countries.


Meta’s new speech-generation AI model, Voicebox, has numerous features that make it useful for many kinds of users, ranging from content creators to people with accessibility needs who want access to content in their native language. This tool has the potential to change how we interact with technology by making our conversations more natural than ever before!