Microsoft's New AI App Makes Mona Lisa Sing, Internet Calls It ''Crazy''

Hypophrenia

Microsoft recently introduced a new artificial intelligence (AI) model that can generate hyper-realistic videos of talking human faces. Dubbed VASA-1, the AI image-to-video model can transform still photos of people's faces into lively animations. The company says the generated videos have lip movements synchronised to the audio, along with facial expressions and head movements that make them appear natural.

Recently, a video demonstrating the app's capabilities went viral on social media, leaving people amazed. The AI-generated video shows Mona Lisa, the iconic painting by Leonardo da Vinci, lip-syncing to Anne Hathaway's 'Paparazzi'.

''Microsoft just dropped VASA-1. This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba. 10 wild examples: 1. Mona Lisa rapping Paparazzi,'' reads the caption of the thread shared by Min Choi.

Watch the video here:


Microsoft just dropped VASA-1.

This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba

10 wild examples:

1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD

— Min Choi (@minchoi) April 18, 2024

The video has gone viral, and many users found the clip amusing. One user wrote, ''The Mona Lisa clip had me rolling on the floor laughing.'' Another commented, ''Oh, man. If only Da Vinci could witness this.''

Others raised concerns about potential misuse of the technology, especially for creating deepfakes.

A third wrote, ''Creepy? Fascinating? For one thing, deepfake potential just grew exponentially…but opens up some interesting creative possibilities as well.''

A fourth added, ''Deepfake Tech Just Took a Terrifying Leap Forward and it's more convincingly deceptive than we ever imagined.''

According to Microsoft, VASA is a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS).

''Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos,'' the company wrote.

''We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations,'' Microsoft added.