Meta AI Researchers Getting Closer to Realistic Avatar Legs Without Extra Tracking Hardware
Meta AI researchers have been conducting ‘Avatars Grow Legs’ research that demonstrates real-time AI-powered body pose estimation. The work brings them closer to creating realistic avatar legs without the need for extra tracking hardware.
Current out-of-the-box virtual reality systems can only track the position of the user’s head and hands. The position of the wearer’s torso, elbows, and legs can only be estimated using a class of algorithms known as inverse kinematics (IK). This type of tracking is not entirely accurate: it is sometimes accurate for elbows but rarely for legs, because there are many potential leg poses for any given set of head and hand positions.
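To see why IK is underdetermined, consider a minimal, hypothetical sketch of a 2D two-bone solver (this is illustrative only, not the algorithm any VR runtime actually uses). Even in this simplified case the joint can bend two mirror-image ways to reach the same target, and a real 3D leg chain driven only by head and hand data has far more freedom:

```python
import math

def two_bone_ik(target_x, target_y, upper_len=0.3, lower_len=0.3):
    """Solve a 2D two-bone chain (e.g. shoulder -> elbow -> hand) for the
    joint angles that place the end effector at a target position.

    Returns (shoulder_angle, elbow_bend) in radians. The elbow could bend
    either way; this picks one of the two mirror solutions, which is
    exactly the ambiguity IK must resolve with extra heuristics.
    """
    dist = math.hypot(target_x, target_y)
    # Clamp the target into the reachable annulus (and away from zero).
    dist = min(dist, upper_len + lower_len)
    dist = max(dist, abs(upper_len - lower_len), 1e-6)
    # Law of cosines gives the interior angle at the elbow;
    # the "bend" is its deviation from a fully straightened arm.
    cos_elbow = (upper_len**2 + lower_len**2 - dist**2) / (2 * upper_len * lower_len)
    elbow_bend = math.pi - math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Shoulder angle = direction to target minus the triangle's offset angle.
    cos_offset = (upper_len**2 + dist**2 - lower_len**2) / (2 * upper_len * dist)
    shoulder = math.atan2(target_y, target_x) - math.acos(max(-1.0, min(1.0, cos_offset)))
    return shoulder, elbow_bend
```

A target at full reach, e.g. `two_bone_ik(0.6, 0.0)`, yields a straight arm (both angles near zero), while closer targets force a bend whose direction the solver must simply guess.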
Inverse kinematics therefore has real limitations, and as a result many virtual reality apps show only the user’s hands or upper body, commonly called legless avatars.
PC headsets that use SteamVR tracking can support extra trackers such as HTC’s Vive Tracker. However, body tracking requires several of these trackers, which together cost hundreds of dollars. Because of this cost, most games don’t support these extra trackers.
In September last year, Meta AI researchers unveiled QuestSim, a neural network trained with reinforcement learning that can estimate plausible body poses using just the tracking data from the Quest 2 headset and its controllers. However, QuestSim has a latency of 160 ms, which is more than 11 frames at 72 Hz. QuestSim would thus be better suited to rendering other people’s avatar bodies than your own when you look down. The paper also doesn’t state the system’s runtime performance or what GPU it runs on.
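The “more than 11 frames” figure follows directly from converting the reported latency into display frames at the Quest 2’s 72 Hz refresh rate:

```python
def latency_in_frames(latency_ms, refresh_hz):
    """Convert pipeline latency to the number of display frames it spans."""
    return latency_ms / 1000.0 * refresh_hz

# QuestSim's reported 160 ms latency at the Quest 2's 72 Hz refresh rate:
print(round(latency_in_frames(160, 72), 2))  # 11.52 frames of lag
```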
A New Approach
A new paper titled Avatars Grow Legs (AGRoL), from other Meta AI researchers and intern Yuming Du, demonstrates a new approach that the researchers claim achieves “state-of-the-art performance” with lower computational requirements than previous AI approaches.
The AGRoL approach is a diffusion model, the same class of technique behind recent AI image generation systems like Stable Diffusion and OpenAI’s DALL-E 2.
Unlike the systems in most AI research papers, and unlike most diffusion models, the researchers say AGRoL “can run in real-time”: on an NVIDIA V100 it runs at approximately 41 FPS. The V100 is a roughly $15,000 data-center GPU, but machine learning algorithms typically require this kind of high-spec hardware at first and eventually run on smartphones after a few years of optimization advances. Speech recognition and synthesis models such as those powering Google Assistant and Siri followed the same path.
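As a rough sanity check on what 41 FPS means per predicted pose (simple arithmetic, not a figure from the paper):

```python
# Per-pose compute time implied by the reported ~41 FPS on a V100.
fps = 41
frame_time_ms = 1000 / fps
print(round(frame_time_ms, 1))  # ~24.4 ms to produce each body pose
```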
However, AGRoL’s body pose estimation is unlikely to arrive in Meta Quest products any time soon. Meta has announced that it will introduce avatars with legs this year, but this will likely use a less technically advanced algorithm and apply only to other users’ avatars, not your own.