A novel AI tool promises that, with just a few mouse clicks, anyone can easily perform photo edits that were previously difficult to achieve.
The technique is being developed by a research group led by the Max Planck Institute for Informatics in Saarbrücken, in particular by the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence (VIA) based there.
This innovative technique could revolutionize digital image processing.
With ‘DragGAN,’ we are currently creating a user-friendly tool that allows even non-professionals to perform complex image editing. All you need to do is mark the areas in the photo that you want to change and specify the desired edits in a menu. Thanks to the support of AI, with just a few clicks of the mouse anyone can adjust things like the pose, facial expression, direction of gaze, or viewing angle, for example in a pet photo.
Christian Theobalt, Managing Director, Max Planck Institute for Informatics, Director, Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence, and Professor, Saarland University
This is made possible by artificial intelligence, specifically a type of model known as “Generative Adversarial Networks,” or GANs. “As the name suggests, GANs are capable of generating new content, such as images. The term ‘adversarial’ refers to the fact that GANs involve two networks competing against each other,” explains Xingang Pan, a postdoctoral researcher at the MPI for Informatics and the first author of the paper.
A GAN comprises a generator, which is responsible for creating images, and a discriminator, whose task is to determine whether an image is real or produced by the generator.
These two networks are trained against each other until the generator produces images that the discriminator can no longer distinguish from real ones.
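The competition between the two networks can be sketched in a few lines of code. The following is a deliberately minimal illustration, not the architecture used in DragGAN (which builds on a large pretrained GAN): a one-parameter "generator" learns to imitate a one-dimensional Gaussian while a logistic-regression "discriminator" tries to tell real samples from generated ones; all names and hyperparameters here are illustrative choices.

```python
# Minimal sketch of adversarial training: a one-parameter generator vs. a
# logistic-regression discriminator on a 1-D toy distribution.
import math
import random

random.seed(0)

REAL_MEAN = 4.0   # the data distribution the generator must imitate

g_mean = 0.0      # generator parameter: shifts its noise toward the data
d_w = d_b = 0.0   # discriminator parameters (logistic regression)

def d_prob(x):
    """Discriminator's estimated probability that x is a real sample."""
    return 1.0 / (1.0 + math.exp(-(d_w * x + d_b)))

lr, batch = 0.05, 64
for step in range(2000):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
    fake = [random.gauss(g_mean, 1.0) for _ in range(batch)]
    p_real = [d_prob(x) for x in real]
    p_fake = [d_prob(x) for x in fake]

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (gradient ascent on log D(real) + log(1 - D(fake))).
    d_w += lr * (sum((1 - p) * x for p, x in zip(p_real, real)) / batch
                 - sum(p * x for p, x in zip(p_fake, fake)) / batch)
    d_b += lr * (sum(1 - p for p in p_real) / batch
                 - sum(p for p in p_fake) / batch)

    # Generator step: shift generated samples so the discriminator
    # rates them as real (gradient ascent on log D(fake)).
    fake = [random.gauss(g_mean, 1.0) for _ in range(batch)]
    g_mean += lr * sum((1 - d_prob(x)) * d_w for x in fake) / batch

print(g_mean)  # drifts toward the real mean as the two networks compete
```

As training progresses, the generator's samples become statistically harder to separate from the real ones, which is exactly the equilibrium described above.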
GANs are used for many purposes. Beyond the obvious application of image generation, they are well suited to predicting images, which enables video frame prediction.
This has the potential to reduce the data needed for video streaming by anticipating the next frame of a video. GANs can also upscale low-resolution images, improving image quality by computing where the additional pixels of the enlarged image should go.
In our case, this property of GANs proves advantageous when, for example, the direction of a dog's gaze is to be changed in an image. The GAN then essentially recalculates the entire image, predicting where each pixel must land given the new viewing direction.
Xingang Pan, Study First Author and Postdoctoral Researcher, MPI for Informatics, Saarland University
Pan added, “A side effect of this is that DragGAN can calculate things that were previously occluded by the dog's head position, for example. Or if users want to show the dog's teeth, they can open the dog’s muzzle in the image.”
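The drag interaction described above can be illustrated with a heavily simplified sketch: the user marks a handle point and a target point, and an optimizer iteratively nudges a latent parameter until the handle reaches the target. The real DragGAN instead optimizes a StyleGAN latent code using motion supervision and point tracking on deep feature maps; this toy (all names and numbers are made up for illustration) only mirrors the shape of that optimization loop.

```python
# Toy sketch of point-based dragging: move a handle point to a target by
# optimizing a latent parameter, re-tracking the point after each step.

def render(latent):
    # Toy "generator": the latent directly controls where a feature sits.
    # In DragGAN this would be a full image synthesized from the latent code.
    return list(latent)

handle = [10.0, 20.0]   # user-selected point to drag (e.g. the dog's nose)
target = [30.0, 25.0]   # where the user drags it

latent = list(handle)   # latent that currently places the feature at the handle
lr = 0.1
for step in range(200):
    pos = render(latent)
    # "Motion supervision": descend on the squared distance between the
    # tracked point and the target.
    latent = [l - lr * 2.0 * (p - t) for l, p, t in zip(latent, pos, target)]
    # "Point tracking": re-locate the handle so the next step supervises
    # from its updated position.
    handle = render(latent)

print(handle)  # converges to the target point
```

Because the full image is re-synthesized from the latent at every step, intermediate edits stay on the manifold of plausible images, which is why occluded regions (such as the dog's teeth) can be filled in convincingly.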
DragGAN could also find applications in professional settings. For example, fashion designers could use it to adjust the cut of clothing in photographs after the initial capture.
Vehicle manufacturers could likewise explore different design configurations for planned vehicles. While DragGAN works on various object categories such as cars, animals, people, and landscapes, most of the results are achieved on GAN-generated synthetic images.
How to apply it to any user-input images is still a challenging problem that we are looking into.
Xingang Pan, Study First Author and Postdoctoral Researcher, MPI for Informatics, Saarland University
Just a few days after its release, the new tool developed by the Saarbrücken-based computer scientists is already causing a stir in the international tech community and is regarded by many as the next big step in AI-assisted image processing. While tools like Midjourney can be used to create entirely new images, DragGAN could greatly simplify their post-processing.
The new technique is being developed at the Max Planck Institute for Informatics together with the “Saarbrücken Research Center for Visual Computing, Interaction, and Artificial Intelligence (VIA),” which was opened there in collaboration with Google. The research consortium also includes experts from the Massachusetts Institute of Technology (MIT) and the University of Pennsylvania.
Besides Professor Christian Theobalt and Xingang Pan, contributors to the paper entitled “Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold” were: Thomas Leimkuehler (MPI INF), Lingjie Liu (MPI INF and University of Pennsylvania), Abhimitra Meka (Google), and Ayush Tewari (MIT CSAIL). The paper has been accepted to the ACM SIGGRAPH conference, the world’s largest professional conference on computer graphics and interactive technologies, to be held in Los Angeles, August 6-10, 2023.