Meta unveils speech-to-text, text-to-speech AI models for over 1,100 languages; even shares open source data

Meta has unveiled its speech-to-text, text-to-speech AI models for over 1,100 languages.

By: HT TECH | Updated on: May 23 2023, 20:30 IST
Meta says it will make the models open source, allowing developers to freely make new speech apps. (REUTERS)

All the tech majors are locked in a fierce fight to deliver utility to users in the form of artificial intelligence (AI)-boosted products. While everyone knows about OpenAI's ChatGPT and Google's Bard, comparatively little had emerged from Facebook co-founder Mark Zuckerberg's Meta Platforms. Until today, that is. The company has now launched speech-to-text and text-to-speech AI models for over 1,100 languages, and notably, they are not linked to ChatGPT. Check out the Massively Multilingual Speech (MMS) project.

The biggest takeaway is that Meta has open-sourced the models and data, which could lead to a skyrocketing of the number of speech apps created across the world.
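To give a sense of what building a speech app on the release could look like, here is a minimal transcription sketch in Python. It assumes the Hugging Face transformers library and the "facebook/mms-1b-all" checkpoint name, neither of which the article itself specifies.

```python
# A minimal MMS transcription sketch, assuming the Hugging Face transformers
# library and the "facebook/mms-1b-all" checkpoint (not stated in the article).
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

# One second of 16 kHz silence as placeholder audio; a real app would load a file.
audio = torch.zeros(16000)
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (batch, time, vocabulary)

ids = torch.argmax(logits, dim=-1)         # greedy CTC decoding
print(processor.batch_decode(ids))         # best-guess transcription
```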

How useful this could prove in the real world is clear from Meta's statement: "Existing speech recognition models only cover approximately 100 languages — a fraction of the 7,000+ known languages spoken on the planet."

Data Crunching

Now, good machine-learning models require large amounts of labeled data — in this case, many thousands of hours of audio, along with transcriptions. For most languages, this data simply does not exist.

However, Meta has overcome that through its MMS project, which combined wav2vec 2.0, its pioneering work in self-supervised learning, and a new dataset that provides labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages.
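The self-supervised part means the encoder first learns from raw, untranscribed audio and only sees labels during fine-tuning. A rough illustration using the publicly available wav2vec 2.0 base checkpoint (the exact model Meta used here is an assumption):

```python
# Sketch of the self-supervised idea behind wav2vec 2.0: contextual speech
# representations are learned from unlabeled audio alone. The checkpoint
# choice is illustrative, not Meta's actual MMS configuration.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

unlabeled_audio = torch.randn(16000)  # stand-in for raw 16 kHz speech
inputs = extractor(unlabeled_audio.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    features = encoder(**inputs).last_hidden_state  # (1, frames, hidden_size)

print(features.shape)  # representations learned without any transcripts
```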

Patting itself on the back, Meta said in a statement, "Our results show that the Massively Multilingual Speech models outperform existing models and cover 10 times as many languages."

It added, "Today, we are publicly sharing our models and code so that others in the research community can build upon our work. Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world."

How Meta did it

The MMS project's first job was to collect audio data for thousands of languages, but the largest existing speech datasets covered at most 100 languages. The challenge was overcome by "turning to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research".

The MMS project even created a dataset of readings of the New Testament in over 1,100 languages.

Sensing that the approach could be pushed much further, the project also took in unlabeled recordings of various other Christian religious readings, increasing the number of languages covered to over 4,000.

Bias, what bias?

Even though the data comes from a specific domain, bias does not appear to have crept into the system. Although such texts are often read by male speakers, Meta's analysis showed that its MMS models perform equally well for male and female voices.

And, importantly, though the content of the audio recordings is religious, Meta's analysis shows that this does not overly bias the model toward producing more religious language.
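A check like the gender analysis can be reproduced in outline: compute the word error rate separately per speaker group and compare. The sketch below uses the jiwer library and invented data; Meta's actual evaluation setup is not described in the article.

```python
# Hypothetical per-gender word error rate check using jiwer; the sample data
# and gender tags are invented for illustration.
from jiwer import wer

# (reference transcript, model output, speaker gender) triples
samples = [
    ("in the beginning", "in the beginning", "male"),
    ("and it came to pass", "and it came to pass", "female"),
    ("let there be light", "let their be light", "female"),
]

for gender in ("male", "female"):
    refs = [r for r, _, g in samples if g == gender]
    hyps = [h for _, h, g in samples if g == gender]
    print(gender, wer(refs, hyps))  # similar scores suggest little gender bias
```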

Meta credits this success to the use of the Connectionist Temporal Classification (CTC) approach, which it found to be better suited than large language models (LLMs) or sequence-to-sequence models for speech recognition.
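CTC trains a model to emit a character (or a blank) per audio frame and sums over every possible frame-to-text alignment, which sidesteps the need for exact word timings. PyTorch ships the loss directly; the toy dimensions below are illustrative, not Meta's configuration.

```python
# Minimal Connectionist Temporal Classification (CTC) loss example with
# PyTorch's built-in nn.CTCLoss; sizes are toy values.
import torch
import torch.nn as nn

T, N, C = 50, 1, 30                       # audio frames, batch, vocab (incl. blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, 12))    # 12 target characters, 0 reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                 # sums over all alignments of text to frames
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```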

How it was made usable

To make the data usable by machine-learning algorithms, Meta trained an alignment model on existing data in over 100 languages and used it to match each audio recording to its corresponding text.

To reduce the error rate, Meta said, "We applied multiple rounds of this process and performed a final cross-validation filtering step based on model accuracy to remove potentially misaligned data."
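In outline, that filtering step amounts to re-transcribing each aligned segment with a model trained on other folds and discarding segments where the aligned text and the model's output disagree too much. A sketch, with an assumed character-error-rate threshold (Meta does not publish its exact criterion here):

```python
# Hypothetical sketch of cross-validation filtering of misaligned segments;
# the 0.5 CER threshold and the scoring choice are assumptions.
from jiwer import cer

def filter_misaligned(segments, transcribe, max_cer=0.5):
    """Keep segments whose aligned text roughly matches a recognizer's output.

    segments   -- list of (audio, aligned_text) pairs
    transcribe -- a speech recognizer trained on held-out folds
    """
    kept = []
    for audio, text in segments:
        hypothesis = transcribe(audio)
        if cer(text, hypothesis) <= max_cer:   # low CER -> alignment looks sound
            kept.append((audio, text))
    return kept
```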

Results obtained

Meta trained multilingual speech recognition models on over 1,100 languages. It explained the consequence this way: "As the number of languages increases, performance does decrease, but only very slightly: Moving from 61 to 1,107 languages increases the character error rate by only about 0.4 percent but increases the language coverage by over 18 times."
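For reference, 1,107 ÷ 61 ≈ 18.1, hence "over 18 times" the coverage. Character error rate itself is just character-level edit distance divided by reference length, as a toy check with jiwer shows:

```python
# Toy character error rate (CER) computation with jiwer: edits needed to turn
# the hypothesis into the reference, divided by the reference length.
from jiwer import cer

reference = "massively multilingual speech"
hypothesis = "massivly multilingual speach"   # two character-level mistakes
print(cer(reference, hypothesis))              # small fraction of characters wrong
```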

MMS vs OpenAI Whisper

In a like-for-like comparison with OpenAI's Whisper, Meta said that models trained on the Massively Multilingual Speech data achieve only half the word error rate while covering 11 times as many languages.


First Published Date: 23 May, 20:20 IST