Facebook has revealed additional details about the development of Portal, its first in-house video calling device, including the tidbit that early prototypes included a motor that let Portal swivel to face video subjects.
If the plan was to sell lots of video calling devices, it's probably best that a swiveling Portal was never released. Given Facebook's persistent parade of controversies, the privacy concerns raised by Portal were bad enough without the device also physically turning to face you whenever it sensed your presence.
A motorized camera was also seen as impractical because it did not improve Portal's reliability, Facebook engineers said in a blog post today.
Work to build Portal began two years ago as part of the secretive Building 8 project for exploring hardware products at Facebook's headquarters in Menlo Park, California, Portal team lead and Facebook VP Rafa Camargo told VentureBeat ahead of the device's debut last fall.
The Portal team is now part of Facebook's AR/VR division.
Portal video calls over Facebook Messenger rely on Smart Camera computer vision, zooming and panning to frame shots and account for each person in a room, even people up to 20 feet from the camera. The fish-eye wide-angle lens has a 140-degree field of view, which means Portal doesn't need to move in order to see what's happening.
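How that digital framing might work can be sketched in a few lines. This is an illustrative toy, not Facebook's actual implementation: given bounding boxes for each detected person, compute one crop that keeps everyone in view with some breathing room. The function names and the 15 percent margin are assumptions for the sake of the example.

```python
# Hypothetical "smart framing" sketch: find the smallest crop of the
# wide-angle frame that contains every detected subject, plus a margin.

def union_box(boxes):
    """Smallest rectangle containing every (x1, y1, x2, y2) box."""
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

def framing_crop(boxes, frame_w, frame_h, margin=0.15):
    """Expand the union of subject boxes by a margin, clamped to the frame."""
    x1, y1, x2, y2 = union_box(boxes)
    mx = (x2 - x1) * margin
    my = (y2 - y1) * margin
    return (max(0, x1 - mx), max(0, y1 - my),
            min(frame_w, x2 + mx), min(frame_h, y2 + my))

# Two people detected in a 1920x1080 wide-angle frame:
crop = framing_crop([(300, 200, 500, 900), (1200, 250, 1450, 950)], 1920, 1080)
```

As subjects enter, leave, or move, recomputing (and smoothing) this crop each frame produces the pan-and-zoom effect without any physical camera motion.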
Smart Volume, meanwhile, amplifies, reduces, and otherwise modulates audio to optimize call sound.
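The article doesn't detail how Smart Volume works, but loudness leveling of this kind is commonly built on automatic gain control: measure a block's signal level and scale it toward a target. The sketch below assumes a simple RMS-based scheme with made-up target and gain-cap values; it is not Facebook's algorithm.

```python
# Illustrative automatic-gain-control sketch for smart-volume-style
# loudness normalization of float audio samples in [-1.0, 1.0].

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def agc_gain(samples, target_rms=0.1, max_gain=8.0):
    """Gain that brings a block toward the target RMS, capped to avoid
    blowing up near-silent blocks (assumed parameter values)."""
    level = rms(samples)
    if level == 0:
        return 1.0
    return min(max_gain, target_rms / level)

quiet = [0.01, -0.02, 0.015, -0.01]     # a quiet speaker far from the mic
boosted = [s * agc_gain(quiet) for s in quiet]
```

A production system would additionally smooth the gain over time so volume doesn't pump audibly between blocks.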
Smart Camera runs on-device machine learning with Mask R-CNN2Go, a computer vision system derived from Mask R-CNN, which won the Best Paper award at the International Conference on Computer Vision (ICCV) in 2017.
The system uses pose recognition, scanning 30 frames per second for human subjects in order to properly frame each shot. Beyond the AI itself, the camera's behavior takes into account how people respond to camera movement, along with advice from filmmakers about how to frame shots.
The Portal AI and Mobile Vision teams at Facebook had to shrink a system built to run on GPUs down to a few megabytes so it could run on-device with Qualcomm's Snapdragon Neural Processing Engine.
"To compensate, we developed several strategies, including improving low-light performance by applying data augmentation on low-light examples in the training data set and balancing multiple pose-detection approaches (such as detecting a subject's head, trunk, and entire body). And we used additional preprocessing to differentiate between multiple people in proximity to one another," said the blog post written by software engineers Rahul Nallamothu and Eric Hwang, research scientist Peter Vajda, and engineering director Matt Uyttendaele.
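The low-light data augmentation the engineers describe can be illustrated with a toy version: synthetically darken training images so the model sees more dim examples. The gain and gamma values below, and the function names, are assumptions for illustration only.

```python
import random

# Hedged sketch of low-light data augmentation: randomly replace some
# training images with artificially darkened versions.

def darken(pixels, gain=0.4, gamma=2.0):
    """Simulate low light on 8-bit pixels: scale brightness down, then
    apply a gamma curve to crush the shadows nonlinearly."""
    out = []
    for p in pixels:
        v = (p / 255.0) * gain
        v = v ** gamma
        out.append(int(round(v * 255)))
    return out

def augment_batch(images, prob=0.5, seed=0):
    """Darken each image with probability `prob` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [darken(img) if rng.random() < prob else img for img in images]
```

Training on a mix of original and darkened images is a standard way to make a detector more robust in dim rooms without collecting new low-light footage.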