Computer vision is an exceedingly useful subfield of machine learning that’s been applied to everything from facial recognition to tuberculosis diagnosis, and Microsoft wants to streamline its deployment on Windows. The company today released a preview of Windows Vision Skills, a set of packages that enable a range of AI-driven photo and video analysis tasks.
Three prebuilt skills are available at launch: Object Detector, Skeletal Detector, and Emotion Recognizer.
“Implementing and integrating efficient machine learning and computer vision solutions is a hard task for developers. The industry is moving at a fast pace, and the amount of custom-tailored solutions coming out makes it strenuous for application developers to keep up,” wrote Microsoft developer writer Eliot Cowley in an article. “The Windows Vision Skills framework is meant to make it easier to utilize computer vision. It standardizes the way computer vision modules are put to use within a Windows application, running on the local device.”
Developers can add the skills, which are modular bits of code that process inputs and produce outputs, to any .NET, Win32, or UWP application, courtesy of out-of-the-box WinRT APIs that don’t require prior machine learning or computer vision knowledge to use. Meanwhile, computer vision developers can take advantage of hardware acceleration frameworks like DirectX and DirectML on Windows devices by packaging their solutions as skills.
Microsoft says the Windows Vision Skills framework can be extended to work with existing machine learning frameworks and libraries such as OpenCV, and that skills can be pieced together within an application to address a complex scenario or bundled together in a single package.
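To make the idea of a skill concrete, here is a minimal conceptual sketch in Python of modular units that take an input, produce an output, and can be pieced together. It is not the Windows Vision Skills API, which is exposed through WinRT; every name below (`Frame`, `Detection`, `Skill`, `chain`, `fake_detect`, `keep_people`) is hypothetical and exists only for illustration, and the detector returns canned results rather than running a model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


# All types here are hypothetical stand-ins, not part of any Microsoft API.
@dataclass(frozen=True)
class Frame:
    """Stand-in for a video frame; a real skill would receive image data."""
    width: int
    height: int


@dataclass(frozen=True)
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # x, y, width, height


@dataclass(frozen=True)
class Skill:
    """A named, versioned unit that maps one input to one output."""
    name: str
    version: str
    run: Callable[[Any], Any]

    def __call__(self, value: Any) -> Any:
        return self.run(value)


def chain(*skills: Skill) -> Skill:
    """Compose skills so that each one's output feeds the next."""
    def run(value: Any) -> Any:
        for skill in skills:
            value = skill(value)
        return value

    name = " -> ".join(s.name for s in skills)
    return Skill(name=name, version="0.1", run=run)


def fake_detect(frame: Frame) -> List[Detection]:
    # Placeholder logic: a real detector would run a model on the frame.
    half = frame.width // 2
    return [
        Detection("person", (0, 0, half, frame.height)),
        Detection("dog", (half, 0, half, frame.height)),
    ]


def keep_people(detections: List[Detection]) -> List[Detection]:
    # Second stage: keep only the detections labeled "person".
    return [d for d in detections if d.label == "person"]


object_detector = Skill("object_detector", "0.1", fake_detect)
people_filter = Skill("people_filter", "0.1", keep_people)

# Two skills pieced together into one pipeline.
pipeline = chain(object_detector, people_filter)
result = pipeline(Frame(width=640, height=480))
```

Because the composed pipeline is itself a `Skill`, an application could call it the same way it calls a single skill, which is the property the framework's standardization is described as aiming for.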
Windows Vision Skills complements existing Windows support for inference of ONNX models by utilizing WinML for local inferencing. The framework allows you to build intelligent applications while leveraging platform optimization.
“Skills are strongly versioned to ease iteration without breaking existing applications,” said Cowley, “[and they’re] easy to ingest, easy to update, and they preserve intellectual property through licensing.”
Microsoft isn’t the only company that’s made computer vision tools available in open source recently. Last week, Google debuted AI image segmentation models optimized for its Cloud TPU hardware platform, and in March, Intel made generally available CVAT, a toolkit for image data labeling. Last March saw the launch of Intel’s OpenVINO, a computer vision toolkit for edge computing that’s compatible with open source frameworks like Facebook’s Caffe2 and Google’s TensorFlow. And two years ago, Facebook rolled out a trio of tools for segmenting objects within images.