Google study: Typos are the top reason your code won't compile

A major empirical study of programming errors concludes that typographical errors loom large in failed software builds created by experienced programmers.

Produced over a period of several months, the published study looked at data from over 26 million builds by Google programmers working in the Java and C++ programming languages.

Experienced devs may sing, “I told you so.” Still, there are many complex reasons for failure far beyond misspelled commands in software development, according to the study.

Titled Programmers’ Build Errors: A Case Study (at Google), the study set out to learn why and how software build or compilation processes fail, with the aim of improving programmer productivity. A paper based on the study was presented last month at the 36th International Conference on Software Engineering in Hyderabad, India.

Three Google researchers joined the principal investigator from Hong Kong University of Science and Technology and another from the University of Nebraska. The final report spans 11 pages. Download the PDF here: http://goo.gl/YSA0uX.

Software developers creating executable programs use compilers, linkers, build files and scripts running on specialized computer hardware to generate executable code. The process is often complex, involving numerous subsystems and interdependencies among application programming interfaces (APIs). Lacking hard data on productivity-killing compilation failures, Google set out to quantify how often the build process fails, why it fails, and how much effort it takes to fix the errors.

Although typos top the reasons for failed compilations, the study documents many other common and recurring issues. For Java, typos generate a “can’t resolve” error message. The study tracks 24 other errors (Figure 1).

For C++, the top reason for build failure is “undeclared_var_use,” along with a closely related variant of the same error. Again, typos are often the culprit. The compiler generates a failure message when it doesn't recognize the named symbol.

The investigators combined these 25 errors into a more useful picture (Figure 2). As shown, the build or compile errors fall into five categories, each tracked as a percentage of the total: dependency, type mismatch, syntax, semantic, and other. Dependency errors account for roughly half of all problems and for the lion’s share of the cost of fixing them. The study includes detailed explanations of the categories.
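As a rough illustration of this kind of grouping, the sketch below maps individual error kinds to the five categories and computes each category's share of a build log. Only the two error names and the five category names come from the article; the other error names, the category assignments, and the sample log are invented for illustration and are not the study's data or method.

```python
from collections import Counter

# Hypothetical mapping from compiler error kinds to the five categories.
# "can't resolve" and "undeclared_var_use" are named in the article; the other
# keys and all category assignments here are illustrative assumptions.
CATEGORY_BY_ERROR = {
    "can't resolve": "dependency",
    "undeclared_var_use": "dependency",
    "incompatible types": "type mismatch",
    "missing semicolon": "syntax",
}


def categorize(error_kind):
    """Return the category for an error kind, defaulting to 'other'."""
    return CATEGORY_BY_ERROR.get(error_kind, "other")


def category_shares(error_kinds):
    """Return each category's percentage share of the given errors."""
    counts = Counter(categorize(kind) for kind in error_kinds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100 * n / total for category, n in counts.items()}


# Invented sample log, not data from the study.
SAMPLE_LOG = [
    "can't resolve",
    "undeclared_var_use",
    "incompatible types",
    "missing semicolon",
    "some unrecognized error",
]

if __name__ == "__main__":
    for category, share in sorted(category_shares(SAMPLE_LOG).items()):
        print(f"{category}: {share:.1f}%")
```

Anything the mapping does not recognize falls into "other," so new error kinds never break the tally.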

VentureBeat asked long-time developer Fred Simon (co-founder and chief architect at JFrog) about the significance of the Google case study. His company counts Google as a client:

VB: Is this an important case study? Why?
Simon: Finding the most common and recurring issues that impair developer productivity is always good. We continue pushing our machines and software to help us create more and more software, faster and with higher quality. I focused a recent keynote titled The Revenge of the Machines on this very question.

VB: As a developer, what are the study’s chief takeaways for other developers and development companies?
Simon: Using dependencies is a huge part of today's software making. As shown in this study, dependencies have costs, but the benefits of getting them right outweigh them. At JFrog, we are writing software that helps a lot in the quest to automate and manage program dependencies and to resolve dependency issues beforehand. More can be done, and we are really happy to see Google targeting this.

VB: Do you know of any other studies like this, or is this one unique?
Simon: To my knowledge, the study is unique. There are many studies looking at dependencies, management and build automation, and they are also relevant for this domain. One example is research from Forrester earlier this year, titled Navigate the Modern Application Delivery Landscape.

VB: What is the single chief message out of this study?
Simon: Playing with dependencies (changing, adding and removing) generates many build errors in static languages like Java and C++. These errors could be "pre-resolved" or aided with automation and tools. Also, in dynamic languages these errors do not appear most of the time.

VB: The authors caution that the span of the study (a single company and a limited set of development environments) is a threat to the data. Even so, do you think the findings are broadly valid?
Simon: From my personal experience, the study is totally valid. We as developers have a good mental picture of the code we wrote and also the impact and interaction between the classes. Issues appear when integrating code that is external. It is impossible to have a picture of all the components that any given piece of software is using today. The amount is overwhelming, so the compiler is helping here; errors are actually very helpful when the APIs we are interacting with are well designed.

VB: Is Google a paying JFrog customer?
Simon: Yes. We provide our binary repository, Artifactory, to different teams; and we provide our social platform, Bintray (JCenter), to Android Developers. You can follow this work on Twitter.
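Simon's remark that such errors mostly do not appear in dynamic languages can be read as saying they do not appear at build time. The sketch below, in Python, shows one such case: a misspelled variable name passes compilation and only fails when the offending line runs. The function and variable names are invented for illustration.

```python
# Source text containing a typo: "totl" instead of "total".
SOURCE = """
def total_price(prices):
    total = sum(prices)
    return totl  # typo: should be "total"
"""

# Compiling succeeds because Python does not resolve names at compile time.
code = compile(SOURCE, "<example>", "exec")

# Executing the module body only defines the function; still no error.
namespace = {}
exec(code, namespace)


def run_with_typo():
    """Call the function and return the error message the typo triggers."""
    try:
        namespace["total_price"]([1, 2, 3])
    except NameError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("compiled without error; failure at call time:", run_with_typo())
```

In Java or C++, the equivalent misspelling would stop the build with an unresolved-symbol error of the kind the study counts.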