Voicebox is a generative AI model for speech developed by Meta AI researchers. It achieves state-of-the-art performance and generalizes to speech-generation tasks it was not specifically trained for. Voicebox builds on a method called Flow Matching and learns from raw audio paired with a transcription, which lets it modify any part of a given sample, not just the end as in autoregressive models. The model can synthesize speech in six languages and perform tasks such as noise removal, content editing, style conversion, and diverse sample generation, and it outperforms existing models on reported performance metrics.
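To make the Flow Matching idea more concrete, here is a minimal sketch of a flow-matching training objective with a mask over the frames to be generated. It is not Meta's implementation, since Voicebox has not been released. The function names, the `SIGMA_MIN` value, the use of one number per audio frame, and the omission of the transcript input are all assumptions made for illustration. The path between noise and data is the straight-line form commonly used in the Flow Matching literature.

```python
import random

SIGMA_MIN = 1e-5  # assumed small constant; not stated in the text above


def flow_matching_pair(x1, x0, t, sigma_min=SIGMA_MIN):
    """Return a point on the straight-line path from noise x0 to data x1
    at time t in [0, 1], together with the path's velocity at that point."""
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * n + t * d for n, d in zip(x0, x1)]
    velocity = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return x_t, velocity


def masked_loss(predict_velocity, x1, mask, rng=random):
    """Mean squared velocity error over the masked frames only.

    x1   : list of floats, one (hypothetical) feature per audio frame
    mask : list of bools, True where the model must generate audio
    predict_velocity : hypothetical model, called as f(x_t, context, t)
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]       # noise sample
    t = rng.random()                             # random time step
    x_t, target = flow_matching_pair(x1, x0, t)
    # The model only sees the unmasked frames as context.
    context = [0.0 if m else d for d, m in zip(x1, mask)]
    pred = predict_velocity(x_t, context, t)
    errors = [(p - u) ** 2 for p, u, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)             # assumes at least one masked frame
```

Because the mask can cover any span of frames, the same objective trains the model to fill in the middle of a clip as readily as the end, which is the property the description above attributes to Voicebox.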
Voicebox's versatility makes it suitable for applications such as in-context text-to-speech synthesis, cross-lingual style transfer, speech denoising, and editing. It was trained on more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in six languages. Meta has not made Voicebox publicly available, citing the risk of misuse, but the model shows promise for tasks ranging from aiding those who cannot speak to enhancing virtual assistant interactions and supporting the training of speech assistant models.
Voicebox was created by a team of researchers including Matt Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, and Wei-Ning Hsu. It was announced on June 16, 2023.