Meta launches Voicebox, a world first in generative AI for speech
Meta AI researchers have made a major advance in generative AI for speech with Voicebox. Unlike previous models, Voicebox can generalize to speech-generation tasks it was not specifically trained for, and it does so with state-of-the-art performance.
Voicebox is a generative audio system that produces high-quality audio clips. It can generate new outputs from scratch or modify existing samples. It can synthesize speech in six languages and also handles noise removal, style conversion and sample generation.
In the past, generative AI speech models required task-specific training for every task, using carefully selected training data. Voicebox is built on a new technique called Flow Matching, which outperforms diffusion models. It is up to 20x faster than existing models such as VALL-E and achieves better results on English text-to-speech tasks. In cross-lingual style transfer, Voicebox outperforms YourTTS, cutting the word error rate from 10.9% to 5.2% and raising audio similarity to 0.481.
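To put the error-rate figures in perspective, the drop from 10.9% to 5.2% can be expressed as a relative reduction. The short Python sketch below does that arithmetic; the helper name `relative_reduction` is our own illustration and is not part of Voicebox or any Meta tool.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Error rates quoted above for cross-lingual style transfer (in percent).
yourtts_error = 10.9
voicebox_error = 5.2

drop = relative_reduction(yourtts_error, voicebox_error)
print(f"Relative error reduction: {drop:.1%}")  # prints "Relative error reduction: 52.3%"
```

In other words, on the figures reported here, Voicebox makes roughly half as many errors as YourTTS on this task.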
Source:
https://tech.hindustantimes.com/tech/news/meta-introduces-voicebox-does-a-first-on-generative-ai-speech-71687025962593.html