Memoji on steroids: This AI model can reconstruct 3D avatars from videos


We see digital avatars everywhere, from our favorite chat apps to virtual marketing assistants on e-commerce sites. They are becoming more and more popular and are quickly being integrated into our everyday lives. You open an avatar editor, choose a skin color, eye shape, accessories, and so on, and you have an avatar ready to represent you in the digital world.

Manually constructing a digital avatar face and using it as a living emoji can be fun, but it only scratches the surface of what’s possible. The true potential of digital avatars lies in their ability to become a clone of our entire body. This type of avatar has become an increasingly popular technology in video games and virtual reality (VR) applications.

Generating high-fidelity 3D avatars requires expensive, specialized equipment. As a result, we only see them in a limited number of applications, such as video games that feature digitized professional actors.


What if we could simplify this process? Imagine being able to create a high-definition, full-body 3D avatar simply by using some videos captured in the wild. No professional equipment, no complicated sensor setup to capture every little detail, just a simple clip shot with a smartphone camera. This breakthrough in avatar technology could revolutionize many applications in VR, robotics, video games, movies, sports, and more.

The time has come. We have a tool that can generate high-definition 3D avatars from videos shot in the wild. Time to meet Vid2Avatar.

Vid2Avatar learns 3D human avatars from in-the-wild videos. It does not require ground-truth supervision, priors extracted from large datasets, or external segmentation modules. You just give it a video of someone and it will generate a robust 3D avatar for you.


Vid2Avatar has some clever tricks up its sleeve to achieve this. The first step is to separate the human in a scene from the background and model the human as a neural field. Vid2Avatar solves the tasks of scene separation and surface reconstruction directly in 3D: it models two separate neural fields to implicitly learn both the human body and the background. This is usually a challenging task, as you need to associate 3D points with the human body without relying on 2D segmentation.

The human body is modeled using a single, temporally consistent representation of human shape and texture in canonical space. This representation is learned from the deformed (posed) observations by mapping them back to canonical space through the inverse of a parametric body model's deformation. In addition, Vid2Avatar uses an optimization algorithm to jointly adjust the parameters of the background, the human subject, and the subject's poses so that they best fit the available video frames.
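To make the idea of mapping posed observations back to canonical space more concrete, here is a minimal sketch. It assumes the deformation is linear blend skinning, in which a point is moved by a weighted blend of per-bone rigid transforms. The function names, the two-bone setup, and the hand-picked skinning weights are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def rotation_z(angle_rad):
    """4x4 homogeneous rotation about the z-axis (stands in for a bone pose)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def blend_transforms(weights, bone_transforms):
    """Weighted sum of per-bone transforms: (num_bones,) x (num_bones, 4, 4) -> (4, 4)."""
    return np.tensordot(weights, bone_transforms, axes=1)

def canonical_to_deformed(x_canonical, weights, bone_transforms):
    """Forward skinning: move a canonical point into the posed frame."""
    T = blend_transforms(weights, bone_transforms)
    return (T @ np.append(x_canonical, 1.0))[:3]

def deformed_to_canonical(x_deformed, weights, bone_transforms):
    """Inverse skinning: undo the blended transform to recover the canonical point."""
    T = blend_transforms(weights, bone_transforms)
    return np.linalg.solve(T, np.append(x_deformed, 1.0))[:3]

# Hypothetical example: one bone stays put, the other rotates by 60 degrees.
bones = np.stack([np.eye(4), rotation_z(np.pi / 3)])
skin_weights = np.array([0.7, 0.3])  # assumed weights; they must sum to 1
x_canonical = np.array([0.3, 0.1, 0.2])

x_posed = canonical_to_deformed(x_canonical, skin_weights, bones)
x_recovered = deformed_to_canonical(x_posed, skin_weights, bones)
```

Because every frame's points are mapped back into the same canonical space, a single shape and texture representation can explain the whole video. In this sketch the skinning weights are known for each point; for an arbitrary 3D sample along a camera ray they are not known up front and would have to be estimated, for example from the nearest points on the body model.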

To further improve the separation, Vid2Avatar uses a special technique to render the scene in 3D, separating the human body from the background in such a way that it becomes easier to analyze the movement and appearance of each one individually. It also uses novel objectives, such as encouraging a clear boundary between the human body and the background, to guide the optimization process toward more accurate and detailed reconstructions of the scene.
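The rendering and boundary ideas can be illustrated with a small sketch of volume rendering over two fields. It assumes both fields are queried at the same sample points along a camera ray and that each returns a density and a color; the actual method may sample and compose the two fields differently. The boundary objective is written here as a binary entropy penalty on the ray's human opacity, which is one plausible way to encourage a clear boundary and is an assumption of this example. All names are hypothetical.

```python
import numpy as np

def composite_ray(sigma_human, rgb_human, sigma_bg, rgb_bg, deltas):
    """Render one ray through two fields sharing the same sample points.

    sigma_*: (N,) non-negative densities; rgb_*: (N, 3) colors;
    deltas: (N,) distances between consecutive samples.
    Returns the pixel color and the opacity contributed by the human field.
    """
    sigma = sigma_human + sigma_bg
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    # Fraction of each sample's density that belongs to the human field.
    human_share = sigma_human / np.maximum(sigma, 1e-10)
    rgb = human_share[:, None] * rgb_human + (1.0 - human_share)[:, None] * rgb_bg
    color = (weights[:, None] * rgb).sum(axis=0)
    human_opacity = (weights * human_share).sum()
    return color, human_opacity

def boundary_loss(human_opacity, eps=1e-6):
    """Binary entropy: near zero for opacity 0 or 1, largest at 0.5."""
    a = np.clip(human_opacity, eps, 1.0 - eps)
    return -(a * np.log(a) + (1.0 - a) * np.log(1.0 - a))

# Hypothetical ray: empty space first, then a dense red "human" surface.
deltas = np.ones(3)
red = np.tile([1.0, 0.0, 0.0], (3, 1))
blue = np.tile([0.0, 0.0, 1.0], (3, 1))
color_h, opacity_h = composite_ray(np.array([0.0, 50.0, 50.0]), red, np.zeros(3), blue, deltas)
# Hypothetical ray that only hits the blue background.
color_b, opacity_b = composite_ray(np.zeros(3), red, np.array([0.0, 0.0, 50.0]), blue, deltas)
```

A ray that clearly hits the person ends up with a human opacity near 1, and a ray that clearly hits the background ends up near 0. Minimizing the entropy term pushes ambiguous rays toward one of those two extremes, which is what produces a crisp silhouette without any 2D segmentation mask.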

Overall, Vid2Avatar proposes a global optimization approach for robust and faithful reconstruction of the human body. The method uses videos captured in the wild without requiring any additional information. Its carefully designed components achieve robust modeling, and in the end we get 3D avatars that could be used in many applications.


Check out the paper and the project page. All credit for this research goes to the researchers on this project.

Ekrem Çetinkaya received his B.Sc. in 2018 and his M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and works as a researcher in the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.