Microsoft launches AI text-to-speech avatar at Ignite 2023

At its Ignite 2023 conference, Microsoft announced a text-to-speech avatar feature in Azure AI Speech for creating talking avatar videos.

Updated on: Nov 17 2023, 15:56 IST
Microsoft Azure AI Speech was unveiled at the Ignite 2023 conference. (Microsoft)

In the last few months, Microsoft has embarked on a mission to incorporate artificial intelligence (AI) into its suite of products, ranging from consumer-focused Microsoft Office to Microsoft 365 Copilot for businesses. At its Ignite 2023 conference, the technology giant announced several new AI-based products, such as Copilot Studio and Windows AI Studio, while also renaming Bing Chat to simply Copilot. The company also launched a text-to-speech avatar feature in Azure AI Speech, which can help create talking avatar videos. It is being rolled out in public preview. Here is all you need to know about this new feature.

Microsoft Azure AI Speech

The Azure AI Speech text-to-speech avatar allows you to convert text into a 2D video of a human-like speaking avatar. Microsoft says the neural text-to-speech avatar models are trained by deep neural networks on video recordings of humans, and the voice of the avatar is provided by a text-to-speech voice model. Users can turn text input into training videos, product introductions, customer testimonials, and more, enabling more digital interactions.

How it works

The Azure AI Speech avatar content generation workflow involves three components: the text analyzer, the TTS audio synthesizer, and the TTS avatar video synthesizer. First, the user provides the text input, and the text analyzer converts it into a phoneme sequence. Then, the TTS audio synthesizer predicts the acoustic features of the input text and synthesizes the voice. Both of these components are powered by text-to-speech voice models.

Lastly, the neural text-to-speech avatar model uses the acoustic features to predict lip-synced images, from which the synthetic video is generated.
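To make the data flow between these three stages concrete, here is a minimal toy sketch in Python. It is not Microsoft's implementation: every function and class name is hypothetical, and each stage is a trivial stand-in (letters act as "phonemes", a fixed duration acts as the "acoustic feature", and a mouth-shape label acts as the predicted lip-sync image). It shows only how the output of one stage feeds the next.

```python
from dataclasses import dataclass
from typing import List

VOWELS = "aeiou"


@dataclass
class AcousticFrame:
    phoneme: str
    duration_ms: int  # placeholder for real acoustic features


@dataclass
class VideoFrame:
    time_ms: int
    mouth_shape: str  # placeholder for a predicted lip-sync image


def analyze_text(text: str) -> List[str]:
    """Toy text analyzer: treats each letter as one 'phoneme'."""
    return [ch for ch in text.lower() if ch.isalpha()]


def synthesize_audio(phonemes: List[str]) -> List[AcousticFrame]:
    """Toy audio synthesizer: assigns a longer duration to vowels."""
    return [AcousticFrame(p, 120 if p in VOWELS else 80) for p in phonemes]


def synthesize_avatar(frames: List[AcousticFrame]) -> List[VideoFrame]:
    """Toy avatar synthesizer: maps each acoustic frame to a mouth shape."""
    video: List[VideoFrame] = []
    elapsed_ms = 0
    for frame in frames:
        shape = "open" if frame.phoneme in VOWELS else "closed"
        video.append(VideoFrame(elapsed_ms, shape))
        elapsed_ms += frame.duration_ms
    return video


def text_to_avatar(text: str) -> List[VideoFrame]:
    """Chains the three stages: text -> phonemes -> acoustics -> video."""
    return synthesize_avatar(synthesize_audio(analyze_text(text)))


if __name__ == "__main__":
    for video_frame in text_to_avatar("Hi!"):
        print(video_frame)
```

In the real service, each stage is a trained neural model rather than a rule, but the ordering is the same: the video stage consumes acoustic features, not the raw text.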

Voices in the Azure AI Speech service are offered in two tiers. The first is prebuilt neural voice, which provides natural-sounding voices out of the box. To access it, users can create an Azure account and subscribe to the Speech service. Then, they can use the Speech SDK or visit the Speech Studio portal to select prebuilt voices.
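One common way to choose a voice is through Speech Synthesis Markup Language (SSML), which the Speech service accepts as input and in which a voice element names the voice to use. The sketch below builds such a document with only the Python standard library and does not call Azure. The helper name build_ssml is hypothetical, and en-US-JennyNeural is used here only as an example of a prebuilt voice name; check Speech Studio for the voices available to your subscription.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str,
               voice: str = "en-US-JennyNeural",  # example voice name (assumption)
               lang: str = "en-US") -> str:
    """Return a minimal SSML document that requests the given voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Welcome to our product tour."))
```

The text is escaped so that characters such as & and < in the script do not break the XML. The resulting string would then be passed to whichever synthesis call your client uses.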

The second is Custom Neural Voice, an easy-to-use self-service feature for creating a natural-sounding brand voice. To support responsible use, Microsoft is currently offering only limited access to this feature.


First Published Date: 16 Nov, 16:37 IST