Imagine for a moment that we are on safari, watching a giraffe graze. We look away for a second; when we look back, the animal is lowering its head and sitting down. What happened in the meantime? Computer scientists at the University of Konstanz’s Centre for the Advanced Study of Collective Behavior have found a way to encode an animal’s pose and appearance so as to reconstruct the intermediate movements that were statistically likely to have taken place.
A key problem in computer vision is that images are incredibly complex: a giraffe alone can take many different poses. Missing part of a movement trajectory is usually no problem on safari, but such gaps can be crucial for studying collective behavior. This is where the computer scientists’ new model, “Neural Puppeteer”, comes into play.
Predicting silhouettes from 3D key points
“One idea of computer vision is to describe the very complex image space with as few parameters as possible,” explains Bastian Goldlücke, Professor of Computer Vision at the University of Konstanz. A frequently used representation to date is the skeleton. In a new article published in the Proceedings of the 16th Asian Conference on Computer Vision, Bastian Goldlücke and doctoral students Urs Waldmann and Simon Giebenhain present a neural network model that makes it possible to represent movement trajectories and the full appearance of animals, from any viewing perspective, based on just a few key points. This 3D representation is more flexible and precise than the existing skeleton models.
“The idea was to be able to predict 3D key points and to track them independently of texture,” says doctoral student Urs Waldmann. “So we developed an AI system that uses 3D key points to predict silhouette images from any camera perspective.” By reversing the process, it is also possible to determine skeletal points from silhouette images. Based on the key points, the AI system can calculate the statistically probable intermediate steps. Working with the individual silhouette can be important: from skeletal points alone, you cannot tell whether the animal you are looking at is massive or close to starving.
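The pipeline Waldmann describes can be pictured with a short sketch. The snippet below is purely illustrative and not the authors’ code: it linearly blends two observed 3D key-point sets to stand in for the “statistically probable intermediate steps” (Neural Puppeteer learns these from data rather than interpolating linearly), and it projects each pose into an arbitrary camera view with a simple pinhole model, where the real system would instead run its learned silhouette decoder. All names and parameters here are hypothetical.

```python
import numpy as np

def interpolate_poses(kp_start, kp_end, n_steps):
    """Linearly blend two (K, 3) key-point sets.

    A stand-in for the statistically grounded in-betweening described
    in the article; the actual model predicts intermediate poses with
    a neural network, not with linear blending.
    """
    ts = np.linspace(0.0, 1.0, n_steps)
    return [(1.0 - t) * kp_start + t * kp_end for t in ts]

def project_to_camera(kp_3d, rotation, translation, focal_length=1000.0):
    """Pinhole projection of (K, 3) world points into a chosen view.

    Where the real model renders a full silhouette from any camera
    perspective, this sketch only projects the key points themselves.
    """
    cam = kp_3d @ rotation.T + translation          # world -> camera frame
    return focal_length * cam[:, :2] / cam[:, 2:3]  # perspective divide

# Toy example: 19 key points (the lower end of the 19-33 range mentioned
# in the article), two observed poses, five in-between frames.
rng = np.random.default_rng(0)
pose_grazing = rng.normal(size=(19, 3))
pose_sitting = pose_grazing + rng.normal(scale=0.3, size=(19, 3))

camera_R = np.eye(3)                   # arbitrary viewing direction
camera_t = np.array([0.0, 0.0, 10.0])  # place the scene in front of the camera

for kp in interpolate_poses(pose_grazing, pose_sitting, n_steps=5):
    points_2d = project_to_camera(kp, camera_R, camera_t)
    print(points_2d.shape)             # (19, 2) projected key points per frame
```

Because the camera rotation and translation are free parameters, the same interpolated 3D poses can be re-rendered from any viewpoint, which is the property that lets the model decouple pose from texture and camera placement.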
There are applications for this model in biology in particular: “At the Cluster of Excellence ‘Centre for the Advanced Study of Collective Behavior’, we see that many different animal species are tracked and that their poses also have to be predicted in this context,” says Waldmann.
Long-term goal: applying the system to as much wildlife data as possible
The team started by predicting the silhouette movements of humans, pigeons, giraffes and cows. Humans are often used as test cases in computer science, Waldmann notes. His colleagues at the Cluster of Excellence work with pigeons, whose fine claws are a real challenge. Good model data was available for cows, and the giraffe’s extremely long neck was a challenge Waldmann was happy to take on. The team created silhouettes from a small number of key points: between 19 and 33 in total.
Now the computer scientists are ready for real-world use: in the future, data from insects and birds will be collected in the imaging hangar at the University of Konstanz, the university’s largest laboratory for researching collective behavior. In the imaging hangar, environmental factors such as lighting and background can be controlled better than in the wild. The long-term goal, however, is to train the model for as many wild animal species as possible in order to gain new insights into animal behavior.