Robot-assisted surgery promises nothing short of a paradigm shift in medicine. In subfields from urology and gynecology to cardiothoracic and pediatric surgery, it’s enabling surgeons to perform complex procedures without having to resort to laparotomy (surgical incisions into the abdomen). Better still, surgical robots contain cameras that capture every knife and needle movement and suture stitch, contributing to a video library that could be used to train gesture recognition systems for skills assessments, step-by-step instruction, and automation of pre- and post-operative tasks.

The trouble is, state-of-the-art methods for action recognition require samples like videos to be manually labeled, which tends to be both time-consuming and error-prone. Perhaps that’s why researchers at the Robotics Institute at UCL in London, the Polytechnic University of Milan, and the University of Verona recently explored a method that requires only a few annotated demonstrations to train a recognition neural network algorithm, described in a preprint paper on arXiv.org (“Weakly Supervised Recognition of Surgical Gestures”).

By way of background, neural networks consist of neurons that are arranged in layers and transmit signals to other neurons. Those signals — the product of data, or inputs, fed into the neural network — travel from layer to layer and slowly “tune” the network by adjusting the synaptic strength (weights) of each connection. Over time, the network extracts features from the data set and identifies cross-sample trends, eventually learning to make predictions.
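To make the idea of “tuning” weights concrete, here is a minimal sketch of a single artificial neuron learning by gradient descent. The function name, learning rate, and toy data are illustrative assumptions and have nothing to do with the paper’s own setup.

```python
# Toy illustration: one linear "neuron" learns the rule y = 2x.
# Every name and number here is an assumption chosen for the example.

def train_neuron(samples, learning_rate=0.05, epochs=200):
    weight = 0.0  # connection strength, adjusted a little on every sample
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x
            error = prediction - target
            # The gradient of 0.5 * error**2 with respect to the weight is error * x.
            weight -= learning_rate * error * x
    return weight


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train_neuron(samples)
```

Each pass nudges the weight in the direction that shrinks the prediction error, which is the same principle, at a far larger scale, behind training a full network.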

In this study, the researchers leveraged an unsupervised recognition algorithm based on a classical Gaussian mixture model (GMM) to find the mixture of multi-dimensional probability distributions that best modeled their corpus of surgical demonstrations. It’s an ideal model architecture for tasks that don’t rigidly influence each other, wrote the researchers, like simultaneous segmentation and classification. It’s also intuitive, because GMM-based algorithms represent each action class with its own Gaussian component, defined by its own parameters such as a mean and a covariance.
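As a rough illustration of how a GMM ties data points to action classes, the sketch below computes, for a single one-dimensional reading, the probability that each Gaussian component produced it. The two “gesture” components and their parameters are hypothetical, and the real kinematic data is multi-dimensional; this is not the researchers’ implementation.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability that x was generated by each mixture component."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical gesture classes, each with its own mean and variance.
weights = [0.5, 0.5]
means = [0.0, 5.0]
variances = [1.0, 1.0]

resp = responsibilities(0.5, weights, means, variances)
```

A reading close to one component’s mean is assigned almost entirely to that component, while a reading halfway between the two is split evenly.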

The researchers tapped three surgical demonstrations — two from expert users and one from an intermediate user — along with ground truth annotations to initialize the GMM-based algorithm’s parameters. To validate it, the team sourced a public data set — JIGSAWS — containing labeled video and kinematic data captured during demos by eight surgeons with Intuitive Surgical’s da Vinci Surgical System. The paper’s coauthors say that in a set of experiments — with the proposed annotations and by redefining the actions and optimizing the inputs — they managed to boost overall recognition accuracy by 25% compared with the baseline and improve action class recognition.
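The weakly supervised idea can be sketched as follows: estimate each class’s starting parameters from a small annotated demonstration, then refine them on unlabeled data with expectation-maximization (EM). This is a simplified one-dimensional sketch on synthetic data; the function names, the toy data, and the use of plain EM are assumptions rather than the authors’ code.

```python
import math
import random


def gaussian_pdf(x, mean, variance):
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def init_from_labels(values, labels, n_classes):
    """Per-class weight, mean, and variance from one annotated demonstration."""
    weights, means, variances = [], [], []
    for k in range(n_classes):
        members = [v for v, label in zip(values, labels) if label == k]
        mean = sum(members) / len(members)
        var = sum((v - mean) ** 2 for v in members) / len(members)
        weights.append(len(members) / len(values))
        means.append(mean)
        variances.append(max(var, 1e-6))  # floor avoids a zero variance
    return weights, means, variances


def em_fit(values, weights, means, variances, iterations=50):
    """Refine the initial parameters on unlabeled data with EM."""
    weights, means, variances = list(weights), list(means), list(variances)
    n_classes = len(weights)
    for _ in range(iterations):
        # E-step: soft-assign every sample to every component.
        resp = []
        for x in values:
            joint = [
                weights[k] * gaussian_pdf(x, means[k], variances[k])
                for k in range(n_classes)
            ]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its soft assignments.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)
    return weights, means, variances


def predict(x, weights, means, variances):
    """Index of the most likely component for x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return joint.index(max(joint))


random.seed(0)
# Small annotated "demonstration": 20 samples per hypothetical gesture class.
demo_values = [random.gauss(0.0, 1.0) for _ in range(20)] + [
    random.gauss(6.0, 1.0) for _ in range(20)
]
demo_labels = [0] * 20 + [1] * 20
# Larger unlabeled pool drawn from the same two synthetic classes.
unlabeled = [random.gauss(0.0, 1.0) for _ in range(200)] + [
    random.gauss(6.0, 1.0) for _ in range(200)
]

init = init_from_labels(demo_values, demo_labels, n_classes=2)
fit_weights, fit_means, fit_variances = em_fit(unlabeled, *init)
```

Because the starting point comes from labeled examples, component 0 keeps corresponding to class 0 and component 1 to class 1 after fitting, so predictions map straight back to the annotated gesture names. A task-agnostic initialization, such as random starting means, offers no such correspondence.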

“Experimental results on real surgical kinematic trajectories during a training exercise confirm that weakly supervised initialization significantly outperforms standard task-agnostic initialization methods,” wrote the coauthors.

That said, they concede that their experimental data sets were relatively small and that GMM approaches aren’t generally robust against “increasingly variable” data. But they say that in future work they intend to further explore the effects of weak supervision on the initialization of probability distributions in unsupervised GMM-based approaches.