Newswise — TikTok dances have taken the world by storm, emerging as a fun way to pass the time during the COVID-19 pandemic. But for the last year, University of Minnesota Twin Cities Ph.D. student Yasamin Jafarian has been using dance videos from the viral social media platform for a different purpose: as training data for a computer algorithm that uses the frame-by-frame footage to construct lifelike 3D avatars of real people.

Jafarian is studying computer science, and more specifically the field of computer vision, which involves training artificial intelligence systems to interpret visual data from images and video. She’s a member of Assistant Professor Hyun-Soo Park’s lab in the Department of Computer Science and Engineering.

Jafarian’s interest lies in using machine learning and artificial intelligence to generate realistic 3D avatars for people to use in virtual reality settings. Right now, there are ways to create 3D avatars in VR, but most are cartoonish—not avatars that look exactly like the real person using them. 

The entertainment industry is able to achieve this through CGI (computer-generated imagery), where a lifelike avatar of an actor or actress is created for use in a film or video game. However, movie productions have a lot of resources that the average person doesn’t have. To generate these avatars, film crews often use thousands of cameras to scan a person’s body so that the computers can replicate it on screen. 

“The problem with movie technology is that it’s not accessible to everybody,” Jafarian explained. “It’s only accessible to those few people in the movie industry. With this research, I wanted to generate the same opportunity for the average person so they can just use their phone camera and be able to create a 3D avatar of themselves.”

Jafarian’s goal was to design an algorithm that only needed one photo or video of a person in order to generate their realistic avatar. To do this, she needed a large dataset of videos to “train” the algorithm. TikTok dance videos—which often feature only one person showing the full length of their body in multiple different poses—provided the perfect solution.

By the end of summer 2021, Jafarian had watched about 1,000 TikToks. She ended up using 340 of the videos in her dataset, each one 10-15 seconds long. At a video rate of 30 frames per second, she had amassed more than 100,000 images of people dancing.
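The frame count follows directly from the dataset parameters reported above. A quick back-of-the-envelope check (assuming the stated 340 clips of 10–15 seconds each at 30 frames per second):

```python
# Sanity check of the dataset size described in the article.
CLIPS = 340                         # TikTok videos kept in the final dataset
FPS = 30                            # standard video frame rate
MIN_SECONDS, MAX_SECONDS = 10, 15   # stated length of each clip

min_frames = CLIPS * MIN_SECONDS * FPS   # shortest-case total
max_frames = CLIPS * MAX_SECONDS * FPS   # longest-case total

print(f"{min_frames:,} to {max_frames:,} frames")  # 102,000 to 153,000 frames
```

Even at the low end, that is 102,000 frames, consistent with the article’s figure of more than 100,000 images of people dancing.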

So far, Jafarian has been able to successfully use her algorithm to generate a 3D avatar of a person from the front view. She published a paper on the subject, and it won a Best Paper Honorable Mention Award at the 2021 Conference on Computer Vision and Pattern Recognition (CVPR).

Jafarian plans to continue refining the algorithm so that it can generate an entire person’s body using only one or two images. Her hope is that the technology could eventually be used by real people in virtual reality. 

“The most important application of this is in online social presence,” Jafarian said. “We’ve seen this need throughout these COVID times when all of our interactions were over Zoom. But, it doesn’t need to be just Zoom. We can have virtual environments, using VR goggles like Oculus for example, where we can see and interact with each other. If we can make those digital avatars realistic, it would make those interactions deeper and more interesting.”

Another future application of Jafarian’s research is customizing the clothing of your realistic avatar. Imagine a situation in which you see a t-shirt online, and you can “try it on” in a virtual environment and skip a trip to the store.


Published in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021