A new artificial intelligence (AI) study proposes a 3D-aware blending technique using generative NeRFs

Source: https://arxiv.org/pdf/2302.06608.pdf

Image blending is a core technique in computer vision, one of the best-known branches of artificial intelligence. The goal is to combine two or more images into a single composition that captures the finest aspects of each input. The technique is used extensively across application areas such as photo editing, computational imaging, and medical imaging.

Image blending also supports downstream artificial intelligence tasks such as image segmentation, object detection, and image super-resolution. It is critical for improving image clarity, which matters in many applications such as robotics, autonomous driving, and surveillance.

Several image blending techniques have been developed over the years, most relying on warping an image through a 2D affine transformation. However, these approaches cannot account for discrepancies in 3D geometry, such as pose or shape. 3D alignment is much harder to achieve because it requires inferring 3D structure from a single view.
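To make the limitation concrete, here is a minimal sketch (not from the paper) of 2D affine alignment: a single 2x3 matrix rotates, scales, and translates points in the image plane, but it has no notion of out-of-plane pose or 3D shape.

```python
import numpy as np

def affine_warp_points(points, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Apply a 2D affine (similarity) transform to an (N, 2) array of points.
    This operates purely in the image plane and cannot compensate for
    3D pose or shape differences between two photos."""
    theta = np.deg2rad(angle_deg)
    A = scale * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
    t = np.array([tx, ty])
    return points @ A.T + t

# Warp the corners of a unit square: rotate 90 degrees, double the scale,
# and shift right by one unit.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
warped = affine_warp_points(corners, angle_deg=90.0, scale=2.0, tx=1.0, ty=0.0)
```

Any transform of this family maps straight lines to straight lines in the image plane, which is precisely why it cannot reconcile two faces photographed from different viewpoints.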


To address this problem, the authors propose a 3D-aware image blending method based on generative neural radiance fields (NeRFs).

Generative NeRFs learn to synthesize images in 3D using only collections of 2D single-view images. The authors therefore project the input images onto the volume density representation of a generative NeRF. To reduce the dimensionality and complexity of the data and operations, 3D-aware blending is then performed in the latent representation space of these NeRFs.
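As a quick refresher on the underlying machinery (illustrative code, not the paper's implementation), a NeRF renders a pixel by compositing predicted volume densities and colors along a camera ray:

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Standard NeRF volume rendering along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i = exp(-sum_{j<i} sigma_j * delta_j) is the transmittance.

    sigmas: (N,) volume densities at the samples along the ray
    colors: (N, 3) RGB colors at the samples
    deltas: (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights

# A single, fully opaque sample returns its own color with weight ~1.
color, weights = composite_ray(np.array([1e9]),
                               np.array([[1.0, 0.5, 0.25]]),
                               np.array([1.0]))
```

In a generative NeRF, the densities and colors are produced by a network conditioned on a latent code, which is what makes blending in latent space possible.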

Specifically, the formulated optimization problem accounts for the influence of the latent code when synthesizing the blended image. The goal is to modify the foreground based on the reference image while preserving the background of the original image. For example, if the two input images are faces, the framework must replace the facial features of the original image with those of the reference image while leaving the rest (hair, neck, ears, surroundings, etc.) untouched.
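A hypothetical sketch of such an objective (the names and exact loss terms are illustrative, not the paper's API): the synthesized image should match the reference inside a foreground mask and the original everywhere else.

```python
import numpy as np

def blending_loss(synth, original, reference, fg_mask):
    """Masked reconstruction objective for latent-code optimization.

    synth:     (H, W, 3) image synthesized from the current latent code
    original:  (H, W, 3) image whose background must be preserved
    reference: (H, W, 3) image supplying the foreground (e.g., a face)
    fg_mask:   (H, W) mask, 1 inside the foreground region
    """
    fg = fg_mask[..., None]
    # Match the reference inside the mask...
    fg_term = (((synth - reference) ** 2) * fg).mean()
    # ...and the original outside it.
    bg_term = (((synth - original) ** 2) * (1.0 - fg)).mean()
    return fg_term + bg_term

original = np.zeros((4, 4, 3))
reference = np.ones((4, 4, 3))
mask = np.zeros((4, 4))  # empty foreground: the original alone is optimal
loss = blending_loss(original, original, reference, mask)
```

In practice a generative pipeline would minimize such a loss over the latent code with gradient descent, typically adding perceptual terms; this sketch only shows the masked-foreground/background split described above.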


The image below provides an overview of the architecture compared to previous strategies.

The first method performs 2D-only blending of two 2D images without alignment. An improvement comes from supporting this 2D blending with 3D-aware alignment using generative NeRFs. To exploit 3D information further, the final architecture blends the two images in the latent representation space of the NeRFs instead of in 2D pixel space.
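The pixel-space versus latent-space distinction can be sketched as follows (illustrative only; `generator` stands in for a generative NeRF's decoder and is an assumption, not the paper's interface):

```python
import numpy as np

def pixel_blend(img_a, img_b, mask):
    """Naive 2D blending: mix raw pixel colors under a mask.
    Misaligned 3D pose or shape produces visible seams and ghosting."""
    m = mask[..., None]
    return m * img_b + (1.0 - m) * img_a

def latent_blend(w_a, w_b, alpha, generator):
    """Latent-space blending: mix the codes, then decode once.
    The generator maps the mixed code back to a coherent image, which
    tends to preserve semantics and view consistency."""
    return generator((1.0 - alpha) * w_a + alpha * w_b)

img_a = np.zeros((2, 2, 3))
img_b = np.ones((2, 2, 3))
pix = pixel_blend(img_a, img_b, np.ones((2, 2)))        # mask fully selects img_b
lat = latent_blend(np.zeros(4), np.ones(4), 0.5, lambda w: w)  # identity decoder
```

The key difference: pixel blending can only average what is already rendered, while latent blending lets the generator resynthesize a plausible image from the combined code.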

3D alignment is achieved via a CNN encoder that estimates the camera pose of each input image along with its latent code. Once the reference image is rotated to match the pose of the original image, the NeRF representations of both images are computed. Finally, a 3D transformation (scaling and translation) is estimated from the original image and applied to the reference image to obtain a semantically accurate blend.
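A rough sketch of that alignment step (illustrative, not the paper's code): compose the estimated pose rotation with the scaling and translation into a single homogeneous transform and apply it to reference-image points.

```python
import numpy as np

def make_transform(yaw_deg, scale, translation):
    """Build a 4x4 homogeneous matrix: scale * R_yaw followed by a
    translation. Yaw stands in for the camera-pose rotation estimated
    by the encoder."""
    th = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(th), 0.0, np.sin(th)],
                  [0.0,        1.0, 0.0],
                  [-np.sin(th), 0.0, np.cos(th)]])
    T = np.eye(4)
    T[:3, :3] = scale * R
    T[:3, 3] = np.asarray(translation)
    return T

def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ T.T)[:, :3]

# Rotating a point on the x-axis by 90 degrees of yaw moves it onto -z.
T = make_transform(90.0, 1.0, [0.0, 0.0, 0.0])
aligned = apply_transform(T, np.array([[1.0, 0.0, 0.0]]))
```

Working in 3D like this is what lets the method reconcile pose and scale differences that a 2D affine warp cannot express.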

The results for misaligned images with different poses and scales are shown below.

According to the authors and their experiments, this method outperforms both classical and learning-based methods in terms of photorealism and fidelity to the input images. Additionally, by leveraging latent representations, the method can disentangle color and geometry changes during blending and produce view-consistent results.

This was the summary of a novel AI framework for 3D-aware blending using Generative Neural Radiance Fields (NeRFs).

If you are interested or want to learn more about this framework, below is a link to the paper and project page.

Check out the Paper, GitHub, and Project page. All credit for this research goes to the researchers of this project.


Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. Candidate at the Institute for Computer Science (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning and QoS/QoE evaluation.