
Meta introduces Voicebox, does a first on Generative AI speech

Meta introduces Voicebox, which can produce high-quality audio clips in a wide variety of styles.

By: MEDHA JHA
Updated on: Jun 17 2023, 23:59 IST
Voicebox can generalize to speech-generation tasks (Meta AI)

Meta AI researchers have taken a step forward in generative AI for speech with the development of Voicebox. Unlike previous models, Voicebox can generalize to speech-generation tasks it was not specifically trained for, and it delivers state-of-the-art performance.

Voicebox is a versatile generative system for speech that can produce high-quality audio clips in a wide variety of styles. It can create outputs from scratch or modify existing samples. The model supports speech synthesis in six languages, as well as noise removal, content editing, style conversion, and diverse sample generation.

Traditionally, generative AI models for speech had to be trained separately for each task on carefully prepared data. Voicebox instead uses a new approach called Flow Matching, which has been shown to improve upon diffusion models. On English text-to-speech, it outperforms the state-of-the-art model VALL-E in both intelligibility (a word error rate of 1.9% versus VALL-E's 5.9%) and audio similarity (0.681 versus 0.580), while being up to 20 times faster. In cross-lingual style transfer, Voicebox outperforms YourTTS, cutting the word error rate from 10.9% to 5.2% and raising audio similarity from 0.335 to 0.481.
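The relative size of these gains can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; the dictionary layout and helper names are illustrative, not anything published by Meta.

```python
# Figures quoted in the article, stored as (baseline, Voicebox).
# "wer": word error rate in percent (lower is better).
# "sim": audio similarity score (higher is better).
RESULTS = {
    "English TTS vs. VALL-E": {"wer": (5.9, 1.9), "sim": (0.580, 0.681)},
    "Cross-lingual style transfer vs. YourTTS": {"wer": (10.9, 5.2), "sim": (0.335, 0.481)},
}


def relative_wer_reduction(baseline, new):
    """Fraction by which the word error rate drops relative to the baseline."""
    return (baseline - new) / baseline


def relative_sim_gain(baseline, new):
    """Fraction by which audio similarity rises relative to the baseline."""
    return (new - baseline) / baseline


for task, r in RESULTS.items():
    wer_drop = relative_wer_reduction(*r["wer"])
    sim_gain = relative_sim_gain(*r["sim"])
    print(f"{task}: WER down {wer_drop:.1%}, similarity up {sim_gain:.1%}")
```

Run as-is, it reports roughly a 68% relative word-error-rate reduction against VALL-E and about 52% against YourTTS.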

One of the main limitations of existing speech synthesizers is that they can only be trained on monotonic, clean data, which is difficult to produce and therefore limited in quantity. Voicebox overcomes this limitation by leveraging the Flow Matching model's ability to learn a non-deterministic mapping between text and speech. This allows Voicebox to learn from a diverse range of speech data without meticulous labeling. The model was trained on more than 50,000 hours of recorded speech and transcripts from public-domain audiobooks in multiple languages.
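Since Meta has not released Voicebox's code, the following is only a minimal sketch of the general Flow Matching idea, assuming the commonly used optimal-transport path from the Flow Matching literature, not Voicebox's actual implementation. A network is trained to predict a velocity that carries Gaussian noise (t = 0) toward data (t = 1), and generation integrates that velocity with an ODE solver. The text conditioning, audio features and neural network are all omitted; `SIGMA_MIN`, the helper names and the single-point `toy_velocity` field are hypothetical stand-ins chosen so the example runs.

```python
import numpy as np

SIGMA_MIN = 1e-5  # assumed small noise floor at t = 1


def ot_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point x_t on the optimal-transport path from noise x0 to data x1,
    plus the target velocity u_t that the network learns to predict."""
    t = t[:, None]  # one time value per sample, broadcast over features
    xt = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    ut = x1 - (1.0 - sigma_min) * x0
    return xt, ut


def flow_matching_loss(velocity_fn, x1, rng, sigma_min=SIGMA_MIN):
    """Monte Carlo estimate of the flow-matching regression loss."""
    x0 = rng.standard_normal(x1.shape)           # Gaussian noise sample
    t = rng.uniform(0.0, 1.0, size=x1.shape[0])  # random time per sample
    xt, ut = ot_path(x0, x1, t, sigma_min)
    vt = velocity_fn(xt, t)                      # model's predicted velocity
    return float(np.mean((vt - ut) ** 2))


def euler_sample(velocity_fn, x0, num_steps):
    """Generate by integrating dx/dt = v(x, t) from t = 0 to 1 (Euler)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check: if the "dataset" is the single point mu and sigma_min = 0,
# the exact velocity field is (mu - x) / (1 - t); it stands in for a
# trained network here.
mu = np.array([2.0, -1.0])


def toy_velocity(x, t):
    return (mu - x) / (1.0 - t[:, None])


rng = np.random.default_rng(0)
samples = euler_sample(toy_velocity, rng.standard_normal((4, 2)), num_steps=8)
loss = flow_matching_loss(toy_velocity, np.tile(mu, (4, 1)), rng, sigma_min=0.0)
```

With the exact toy velocity, the Euler samples land on `mu` (up to floating-point rounding) and the loss is essentially zero. In a real system, `num_steps` trades generation speed against quality.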

Voicebox can perform a variety of tasks, including:

1-In-context text-to-speech synthesis: Voicebox can match the audio style of a given input sample and use it to generate speech from text. This could help people who are unable to speak, or be used to customize voices for non-player characters and virtual assistants.

2-Cross-lingual style transfer: Given a sample of speech and a text passage in one of the supported languages (English, French, German, Spanish, Polish, or Portuguese), Voicebox can produce a reading of the text in that language. This could enable more natural and authentic communication between people who speak different languages.

3-Speech denoising and editing: Using in-context learning, Voicebox can generate speech that seamlessly replaces segments within an audio recording. It can swap out misspoken words or resynthesize portions corrupted by short bursts of noise without re-recording the entire speech, making audio cleanup and editing as straightforward as adjusting a photo with popular image-editing tools.

4-Diverse speech sampling: Because it learns from diverse, real-world data, Voicebox can generate speech that better represents how people naturally talk in the six supported languages. This capability can be used to generate synthetic data for training speech assistant models. Models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with only a 1% error rate degradation, whereas synthetic speech from previous text-to-speech models caused significant degradation.
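The editing task in item 3 regenerates one segment of a recording while using the surrounding audio as context. Below is a minimal sketch of the bookkeeping around such an edit, not Voicebox's interface. It assumes audio is represented as frame-level features (for example, spectrogram frames arranged as frames × feature bins). `fake_generator` is a hypothetical placeholder for the speech model, and the frame counts and span are arbitrary.

```python
import numpy as np


def span_mask(num_frames, start, end):
    """Boolean mask marking frames [start, end) as the region to regenerate."""
    if not 0 <= start < end <= num_frames:
        raise ValueError("need 0 <= start < end <= num_frames")
    mask = np.zeros(num_frames, dtype=bool)
    mask[start:end] = True
    return mask


def masked_context(features, mask):
    """Blank out the span being edited; the remaining frames are the context."""
    return np.where(mask[:, None], 0.0, features)


def splice(features, generated, mask):
    """Keep original frames outside the span and generated frames inside it."""
    return np.where(mask[:, None], generated, features)


def fake_generator(context, rng):
    """Hypothetical stand-in for a speech model: returns random frames.

    A real model would condition on `context` and on the edited transcript.
    """
    return rng.standard_normal(context.shape)


rng = np.random.default_rng(0)
features = rng.standard_normal((100, 80))  # 100 frames x 80 bins (assumed)
mask = span_mask(len(features), 40, 55)    # e.g. a misspoken word
context = masked_context(features, mask)
generated = fake_generator(context, rng)
edited = splice(features, generated, mask)
```

Splicing only inside the mask leaves every frame outside the edited span untouched. This is what makes it possible to fix a misspoken word without re-recording the whole clip.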

Although the researchers acknowledge the exciting use cases for generative speech models, they have decided not to make the Voicebox model or code publicly available for now because of the potential for misuse. Responsible development and use of AI are paramount, and striking a balance between openness and responsibility is crucial. Instead, the researchers have shared audio samples and a research paper detailing the approach and results, including the creation of an effective classifier that distinguishes authentic speech from audio generated with Voicebox.

First Published Date: 17 Jun, 23:58 IST
