DeepCode learns from GitHub project data to give developers AI-powered code reviews

We're fast approaching a point where every company is effectively a software company, a notion proffered by some of tech's top people such as Microsoft CEO Satya Nadella. This is partly why we've seen a slew of major investments into tools that help developers operate -- last year, for example, Microsoft went all-in and snapped up code-hosting and collaboration platform GitHub in a $7.5 billion deal, while GitLab raised huge funds from big-name investors including Alphabet's GV.

With more software, however, comes more code, which requires more checking, testing, and debugging. And that is why automated developer operations (DevOps) testing tools are also attractive targets for investors, with the likes of Functionize, Testim, and Mabl all raising big bucks over the past year. In fact, the automated software testing market was pegged at $8.5 billion in 2018, a figure that could grow to nearly $20 billion within five years.

And it's against that backdrop that Swiss startup DeepCode today announced a $4 million seed round of funding to expand its machine learning systems for code reviews. The round was led by Earlybird, with participation from 3VC and Btov Partners.

Code review

The code review process is broadly concerned with finding bugs, vulnerabilities, style violations, and more in the earlier stages of software development, before code is merged or deployed -- it usually happens before software testing takes place. "Software testing looks at code from the outside, but code reviews enable you to get an inside look at the DNA of the code," DeepCode cofounder and CEO Boris Paskalev told VentureBeat.

Code reviews often involve collaboration between the original code authors, their peers, and managers, with a view toward finding obvious errors before the code reaches a more advanced phase. The bigger a project is, the more lines of code there are to review, which makes the process time-consuming. There are options out there for analyzing source code for errors, such as static analysis tool Lint, but these are often narrow in scope: they focus on a smaller, targeted set of "annoying and repeatable stylistic issues, formatting and minor issues," according to Paskalev.

DeepCode's selling point is that it covers a broader range of problems, including vulnerabilities such as cross-site scripting and SQL injection, and that it promises to establish the intent behind the code rather than merely spotting simple syntax mistakes. Underpinning all this are machine learning (ML) systems trained on billions of lines of code from public open source projects, which constantly learn and update their knowledge base.
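
To make that class of vulnerability concrete, here is a minimal sketch of SQL injection using Python's built-in sqlite3 module. It is an illustration only, not DeepCode output: the table, the function names, and the payload are all invented for the example.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = find_user_unsafe(conn, payload)   # matches every row in the table
blocked = find_user_safe(conn, payload)    # matches nothing
```

The two functions differ by a single line, which is roughly the kind of small but consequential change a reviewer, human or automated, would be expected to flag.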

Though DeepCode can ingest code from any source code repositories, Paskalev told VentureBeat that the public knowledge base today contains mostly GitHub repositories.

"Knowledge gained from open source software helps developers write clean and secure code in a fraction of the time that it would normally take," Paskalev said.

These learnings from existing software projects include previous versions of code and subsequent changes made, through which DeepCode learns patterns, figures out the intent behind a specific piece of code, and establishes where bugs existed and how they were fixed.
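
The general idea of learning from before-and-after versions can be sketched in a few lines of Python. To be clear, this is a toy built on assumptions, not DeepCode's method: it counts exact one-for-one line replacements across a made-up commit history using the standard difflib module, whereas the company describes ML systems that infer intent. Every name here (`mine_fix_patterns`, `suggest_fixes`, `min_support`) is hypothetical.

```python
import difflib
from collections import Counter


def changed_line_pairs(before, after):
    """Yield (old_line, new_line) for lines replaced one-for-one."""
    old, new = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only keep replacements where old and new spans have equal length.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for o, n in zip(old[i1:i2], new[j1:j2]):
                yield o.strip(), n.strip()


def mine_fix_patterns(history):
    """Count how often each (old_line, new_line) change appears."""
    counts = Counter()
    for before, after in history:
        counts.update(changed_line_pairs(before, after))
    return counts


def suggest_fixes(source, patterns, min_support=2):
    """Flag lines that exactly match a change seen at least min_support times."""
    suggestions = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for (old, new), seen in patterns.items():
            if line.strip() == old and seen >= min_support:
                suggestions.append((lineno, old, new))
    return suggestions


# Invented history: two commits that made the same correction.
history = [
    ("x = load()\nif x == None:\n    return",
     "x = load()\nif x is None:\n    return"),
    ("y = fetch()\nif x == None:\n    pass",
     "y = fetch()\nif x is None:\n    pass"),
]
patterns = mine_fix_patterns(history)
suggestions = suggest_fixes("x = get()\nif x == None:\n    print('missing')", patterns)
```

Exact string matching breaks as soon as a variable is renamed, which hints at why a production system would need to reason about what the code does rather than how it is spelled.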

"This creates a live knowledge base of all known bugs and their appropriate solutions, which is then used to identify any bugs in your code before they actually happen," Paskalev explained. "The DeepCode AI engine will immediately identify and suggest the best possible solution for your software code."

Additionally, DeepCode uses various predictive and inference algorithms to expand on known issues and to dig into code issues that may not have been fixed in the original sample projects.

Developers can connect DeepCode to their GitHub or Bitbucket accounts in the cloud, and it also supports on-premises GitLab. Once connected, it reviews each commit and flags potential issues. DeepCode also offers an API that lets developers integrate it into their internal workflows however they want, though that is usually aimed at larger enterprises with the necessary resources.

All this raises one important question, though: how reliable are DeepCode's automated code review smarts? Anything less than 100% accuracy means that developers will still have to manually pore over their code, in which case how much time does this actually save? According to Paskalev, DeepCode can save developers around 50% of the time they currently spend on bugs.

"On average, developers waste about 30% of their time finding and fixing bugs, but DeepCode can save half of that time now, and more in the future," he said. "Since DeepCode learns from the global development community, it finds more issues than any single reviewer or group of reviewers could ever identify."

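Taken at face value, Paskalev's two figures combine into a simple back-of-the-envelope estimate. The sketch below uses his 30% and 50% numbers; the 40-hour working week is an assumption added purely for illustration, and the function name is hypothetical.

```python
def hours_saved(weekly_hours, bug_share=0.30, reduction=0.50):
    """Estimate weekly hours saved, given the share of time spent on bugs
    and the fraction of that time a tool is claimed to save."""
    bug_hours = weekly_hours * bug_share
    return bug_hours * reduction


WEEKLY_HOURS = 40  # assumed working week, not a figure from DeepCode

saved = hours_saved(WEEKLY_HOURS)
share_of_week = saved / WEEKLY_HOURS  # half of 30% is 15% of total time
```

In other words, the claim amounts to about 15% of a developer's total time, or roughly six hours of an assumed 40-hour week.
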
The story so far

Founded in 2016, DeepCode is a spin-out from ETH Zurich (Swiss Federal Institute of Technology), which is sometimes referred to as Europe's answer to the Massachusetts Institute of Technology (MIT). The company claims three cofounders, including CTO Veselin Raychev, who was previously a software engineer at Google before joining ETH for his PhD, and adviser Martin Vechev, a professor at ETH.

Alongside today's funding news, DeepCode also announced a new pricing structure. Up until now, the DeepCode bot was free only for open source software development projects. Now, it will be free for educational use and for enterprise teams of up to 30 developers -- clearly, this is a move designed to expedite uptake in small teams. Beyond that, DeepCode charges $20 per developer each month when deployed in the cloud, and $50 per developer for on-premises support.

Paskalev wouldn't divulge any of DeepCode's corporate clients, but he did mention some of the open source repositories that are using DeepCode, including Embark Framework, the European Environment Agency (EEA), and PyMedusa.

Previously, DeepCode had raised a small $1 million seed round of funding. With another $4 million in the bank, the company said that it plans to expand its supported languages beyond Java, JavaScript, and Python, adding support for C#, PHP, and C/C++, among others. It also confirmed that it's working on its first integrated development environment (IDE) integration.

"For all industries and almost every business model, the performance and quality of coding has become key," added Earlybird cofounder and partner Christian Nagel. "DeepCode provides a platform that enhances the development capabilities of programmers. The team has a deep scientific understanding of code optimization and uses artificial intelligence to deliver the next breakthrough in software development."
