WorldGaze uses phone cameras as gaze trackers, helping AIs see context

Under ideal circumstances, a smartphone's AI assistant can understand not only the words you're saying but also the context in which they're meant, such as whether "when will 2020 be over" should be answered with a number of days or with the hours remaining until a 24-hour clock strikes 20:20. But context awareness can be challenging for AIs, which is why researchers are hoping to leverage a new trick to determine the user's intent -- using a smartphone's front- and rear-facing cameras together for gaze tracking.

Currently offered in research prototype form, WorldGaze software from Carnegie Mellon University's Future Interfaces Group uses the front camera to track the user's head position in 3D, then references that estimate against live rear camera footage to determine specifically what's being looked at. The researchers suggest that gaze will help on-device AIs deliver more "rapid, natural, and precise interactions," such that a user could direct remote commands to the AI while looking at individual objects in a densely packed room, or ask questions about the specific real-world locations they're seeing: What are the hours of operation for this GameStop? What kinds of things does this store sell?
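To make the idea concrete, here is a minimal Python sketch of how a head-gaze ray measured by the front camera might be mapped into the rear camera's image and matched to a detected object. It is an illustration under stated assumptions, not the researchers' implementation: it assumes the two cameras share one position and face exactly opposite directions, a simple pinhole camera model, a known scene depth, and object bounding boxes already supplied by some detector. The names `Box`, `front_to_rear`, `gaze_pixel`, and `looked_at` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A detected object in the rear camera image, in pixel coordinates."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def front_to_rear(v):
    """Rotate a front-camera vector 180 degrees about the vertical axis.

    Assumption: both cameras sit at the same point and face opposite ways,
    so x and z flip sign while y (vertical) is unchanged.
    """
    x, y, z = v
    return (-x, y, -z)


def gaze_pixel(head_pos, gaze_dir, depth, fx, fy, cx, cy):
    """Project the gaze ray onto the rear image at an assumed scene depth.

    head_pos and gaze_dir are in front-camera coordinates (meters);
    fx, fy, cx, cy are assumed pinhole intrinsics of the rear camera.
    Returns (u, v) in pixels, or None if the user looks away from the scene.
    """
    hx, hy, hz = front_to_rear(head_pos)
    dx, dy, dz = front_to_rear(gaze_dir)
    if dz <= 0:
        return None  # ray never reaches the scene behind the phone
    t = (depth - hz) / dz  # ray parameter where it hits the depth plane
    x, y = hx + t * dx, hy + t * dy
    return (fx * x / depth + cx, fy * y / depth + cy)


def looked_at(pixel, boxes):
    """Return the label of the smallest box containing the gaze pixel."""
    if pixel is None:
        return None
    u, v = pixel
    hits = [b for b in boxes
            if b.left <= u <= b.right and b.top <= v <= b.bottom]
    if not hits:
        return None
    smallest = min(hits, key=lambda b: (b.right - b.left) * (b.bottom - b.top))
    return smallest.label


# Example: head 40 cm in front of the screen, scene 2 m behind the phone.
boxes = [Box("lamp", 200, 200, 299, 300), Box("door", 300, 200, 400, 300)]
straight = gaze_pixel((0, 0, 0.4), (0, 0, -1), 2.0, 500, 500, 320, 240)
print(looked_at(straight, boxes))
```

A real system would also need to account for the physical offset between the two cameras, lens distortion, and unknown scene depth, all of which this sketch ignores.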

Apart from benefiting business users by increasing the responsiveness of their personal AI assistants, the future implications for companies are clear -- broader, faster access to whatever information has been published about them online. This could be positive, if the AI assistant grabs information directly from the company or a friendly resource, or less predictable if data comes from an aggregation site such as Yelp.

While the researchers suggest the WorldGaze system is a "software-only" solution, it realistically will require at least current-generation smartphone hardware and updated OS-level support before it's ready for broad use. Lower-end devices may lack the image processing bandwidth to handle two camera streams simultaneously, and the prototype app currently has a seven-second startup time, enough to dampen first-time use. There's also the question of whether users will really want to hold up their phones all the time to help their AI assistants react faster, or whether the feature will just be an occasional magic trick.

WorldGaze was developed by Carnegie Mellon's Sven Mayer, Chris Harrison, and Gierad Laput, though it's worth noting that Laput was a Google Ph.D. Fellow and now leads the Interactive Sensing + ML research group at Apple -- in other words, phone-based gaze tracking is certainly on leading smartphone and OS makers' radars. Similar technologies are already being built into augmented reality and virtual reality headsets, utilizing even more precise eye tracking for everything from UI controls to differential graphics rendering technologies, without the need to hold smartphones up for camera sensing.
