Researchers acknowledge potential misuse but highlight benefits and artistic uses
Microsoft Research Asia has unveiled an experimental AI tool called VASA-1 that can take a still image of a person, or a drawing of one, plus an existing audio file and turn them into a lifelike talking face in real time. It generates facial expressions and head motions for the still image, along with lip movements that match a given speech or song. The researchers uploaded a ton of examples to the project page, and the results look convincing enough to fool people into thinking they’re real.
Potential for Misuse Acknowledged by Creators
While the lip and head motions in the examples can look a bit robotic and out of sync on closer inspection, it’s clear that the technology could be misused to quickly and easily create deepfake videos of real people. The researchers themselves are aware of that potential and have decided not to release “an online demo, API, product, additional implementation details, or any related offerings” until they’re sure that their technology “will be used responsibly and in accordance with proper regulations.” They didn’t, however, say whether they’re planning to implement safeguards to prevent bad actors from using it for nefarious purposes, such as creating deepfake porn or running misinformation campaigns.
Benefits and Potential Uses
The researchers believe their technology has a ton of benefits despite its potential for misuse. They said it can be used to enhance educational equity, as well as to improve accessibility for people with communication challenges, perhaps by giving them an avatar that can communicate on their behalf. It can also provide companionship and therapeutic support for those who need it, they said, suggesting that VASA-1 could be used in programs that offer access to AI characters people can talk to.
Training and Artistic Uses
According to the paper published alongside the announcement, VASA-1 was trained on the VoxCeleb2 dataset, which contains “over 1 million utterances for 6,112 celebrities” extracted from YouTube videos. Even though the tool was trained on real faces, it also works on artistic images like the Mona Lisa, which the researchers amusingly combined with audio of Anne Hathaway’s viral Lil Wayne-style “Paparazzi” rap. It’s so delightful that it’s worth a watch, even if you doubt what good a technology like this can do.