Intel researchers create AI system that rates similarity of 2 pieces of code

In partnership with researchers at MIT and the Georgia Institute of Technology, Intel scientists say they've developed an automated engine -- Machine Inferred Code Similarity (MISIM) -- that can determine when two pieces of code perform similar tasks, even when they use different structures and algorithms. MISIM ostensibly outperforms current state-of-the-art systems by up to 40 times, showing promise for applications from code recommendation to automated bug fixing.

With the rise of heterogeneous computing -- i.e., systems that use more than one kind of processor -- software platforms are becoming increasingly complex. Machine programming (a term coined by Intel Labs and MIT) aims to tackle this with automated, AI-driven tools. A key technology is code similarity: systems that attempt to determine whether two code snippets show similar characteristics or achieve similar goals. Yet building accurate code similarity systems remains a largely unsolved problem.

MISIM works because of its novel context-aware semantic structure (CASS), which infers the purpose of a given bit of source code using AI and machine learning algorithms. Once the structure of the code is captured in CASS, algorithms assign similarity scores based on the jobs the code is designed to perform. If two pieces of code look different but perform the same function, the models rate them as similar. Conversely, code that looks alike but performs different jobs is rated as dissimilar.
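To make the idea of scoring code by structure rather than by surface text more concrete, here is a toy Python sketch. It is not MISIM and not CASS: it simply counts syntax-tree node types with Python's standard `ast` module and compares the counts with cosine similarity. The function names (`structure_features`, `similarity_score`) and the sample snippets are assumptions made up for illustration. Unlike MISIM as described by Intel, this sketch has no learned model, cannot recognize two different algorithms that do the same job, and needs syntactically complete input.

```python
import ast
import math
from collections import Counter


def structure_features(source: str) -> Counter:
    """Count AST node types, ignoring identifier names and literal values."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(src_a: str, src_b: str) -> float:
    """Toy similarity score in [0, 1] based only on syntax-tree shape."""
    return cosine_similarity(structure_features(src_a), structure_features(src_b))


# Two snippets that differ in every name but do the same job.
SUM_LOOP = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

SUM_RENAMED = """
def add_all(items):
    acc = 0
    for item in items:
        acc += item
    return acc
"""

# A snippet that does something unrelated.
GREET = """
def greet(name):
    return "hello " + name
"""

if __name__ == "__main__":
    # Same tree shape, so the score is 1.0 up to floating-point rounding.
    print(similarity_score(SUM_LOOP, SUM_RENAMED))
    # Different tree shape, so the score is lower.
    print(similarity_score(SUM_LOOP, GREET))
```

The renamed summation loop scores the same as the original because the names never enter the feature vector, while the unrelated snippet scores lower. A system like the one the article describes would presumably go further by learning which structural differences matter for what the code does.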

CASS can be configured to a specific context, enabling it to capture information that describes the code at a higher level. And it can rate code without using a compiler, a program that translates human-readable source code into computer-executable machine code. This confers a usability advantage, according to Intel: developers can run it on incomplete snippets of code.

Intel says it's expanding MISIM's feature set and moving it from the research to the demonstration phase, with the goal of creating a code recommendation engine to assist internal and external researchers programming across its architectures. The proposed system would be able to recognize the intent behind an algorithm and offer candidate codes that are semantically similar but with improved performance.

That could save employers a few headaches -- not to mention helping developers themselves. According to a study published by the University of Cambridge's Judge Business School, programmers spend 50.1% of their work time not programming and half of their programming time debugging. And the total estimated cost of debugging is $312 billion per year. AI-powered code suggestion and review tools like MISIM promise to cut development costs substantially while enabling coders to focus on more creative, less repetitive tasks.

"If we're successful with machine programming, one of the end goals is to enable the global population to be able to create software," Justin Gottschlich, Intel Labs principal scientist and director of machine programming research, told VentureBeat in a previous interview. "One of the key things you want to do is enable people to simply specify the intention of what they're trying to express or trying to construct. Once the intention is understood, with machine programming, the machine will handle the creation of the software -- the actual programming."
