Could AI be used to cheat on programming tests?

Plagiarism isn't limited to essays. Programming plagiarism — where a developer copies code deliberately without attribution — is an increasing trend. According to a New York Times article, at Brown University, more than half of the 49 allegations of academic code violations in 2016 involved cheating in computer science. At Stanford, as many as 20% of the students in a single 2015 computer science course were flagged for possible cheating, the same piece reports.

Measure of Software Similarity, or MOSS, has remained one of the most popular systems to detect plagiarism in software since its development in 1994. MOSS can analyze code in a range of languages including C, C++, and Java, automatically listing pairs of programs with similar code and highlighting individual passages in programs that appear to be the same.
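To make the idea of "listing pairs of programs with similar code" concrete, here is a minimal sketch of one simple way to score pairwise similarity: split each submission into tokens, collect overlapping runs of k tokens, and compare the resulting sets. This is an illustration under stated assumptions, not MOSS's actual algorithm; the function names, the tokenizer, and the choice of k = 4 are all hypothetical.

```python
import re
from itertools import combinations


def tokens(source: str) -> list[str]:
    """Split source text into identifiers, digit runs, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def kgrams(source: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of all runs of k consecutive tokens."""
    toks = tokens(source)
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two k-gram sets (0.0 = disjoint, 1.0 = identical)."""
    ga, gb = kgrams(a, k), kgrams(b, k)
    if not ga or not gb:
        return 0.0  # too short to compare
    return len(ga & gb) / len(ga | gb)


def rank_pairs(submissions: dict[str, str], k: int = 4) -> list[tuple[float, str, str]]:
    """Score every pair of submissions and list the most similar pairs first."""
    scored = [
        (similarity(submissions[x], submissions[y], k), x, y)
        for x, y in combinations(sorted(submissions), 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical submissions, for illustration only.
    demo = {
        "alice": "int x = a + b;",
        "bob": "int x = a + b;",
        "carol": "print('hello world')",
    }
    for score, first, second in rank_pairs(demo):
        print(f"{first} vs {second}: {score:.2f}")
```

A scheme this simple is easy to defeat by renaming variables, which is one reason production detectors do more than compare raw tokens. Code written from scratch by a language model would share few token runs with any other submission, so a comparison of this kind would have nothing to flag.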

But a new study finds that freely available AI systems could be used to complete introductory-level programming assignments without triggering MOSS. In a paper coauthored by researchers at Booz Allen Hamilton and EleutherAI, a language model called GPT-J was used to generate code "lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code."

"The main goal of the paper was to contextualize the fact that GPT-J can solve introductory computer science exercises in a realistic threat model for plagiarism in an education setting," Stella Biderman, an AI researcher at Booz Allen Hamilton and coauthor of the study, told VentureBeat via email. "[Our] findings demonstrated that a student with access to GPT-J and very minimal knowledge of computer science can deliver introductory-level assignments without triggering MOSS."

Biderman and Edward Raff -- the other coauthor -- had GPT-J answer questions that required it to write programs that could create conversion tables from miles to kilometers, calculate a person's BMI given weight and height, and more. GPT-J made minor mistakes that needed correction in most cases, but fixing them often didn't require programming knowledge beyond the ability to run code and search the web for error codes.
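For a sense of scale, the exercises described are the kind that fit in a few lines. The sketch below is a hand-written illustration of two such tasks. It is not GPT-J output and does not reproduce the paper's prompts; the function names and the metric units for BMI are assumptions.

```python
KM_PER_MILE = 1.609344  # one international mile in kilometers


def miles_to_km_table(max_miles: int) -> list[tuple[int, float]]:
    """Return (miles, kilometers) rows for 1 through max_miles."""
    return [(m, round(m * KM_PER_MILE, 3)) for m in range(1, max_miles + 1)]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    print(f"{'Miles':>5} {'Kilometers':>10}")
    for miles, km in miles_to_km_table(5):
        print(f"{miles:>5} {km:>10.3f}")
    print(f"BMI: {bmi(70.0, 1.75):.1f}")
```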

While Biderman didn't find evidence that GPT-J is, in fact, being used to cheat on assignments, the work raises questions about whether it (or tools like it) might be abused in professional coding tests. Many tech companies rely on exams, either in-house or third-party, to assess the knowledge of software hires. Depending on the design, these could be susceptible -- at least in theory -- to AI-generated code.

"MOSS was developed long before things like GPT were a possibility, but this illustrates the importance of understanding the way digital tools evolve over time to introduce new risks and limitations," Biderman added.

Rick Brownlow, the CEO and cofounder of Geektastic, a technical assessment platform, says he hasn't seen any evidence of plagiarism by a test-taker using AI. He notes that for most companies, a coding test forms only a part of a hiring process. Candidates are generally expected to be able to explain their solutions in a way that makes it apparent whether they were dishonest about their programming abilities.

"[O]ur plagiarism tools will pick up when someone has copied another solution either outright or in part, [even spotting] when someone has obfuscated some of the copied code to try and avoid detection. If -- and this is a big if -- AI could write a 'good' solution to one of our take home-challenges and this was original (i.e., didn't trawl and copy the solution from the web), then this is going to be as hard to spot as someone using their developer friend from Google to help," Brownlow told VentureBeat. "I think when we get to a point where AI is solving take home coding challenges, we'll be at the point where you won't be hiring software engineers anymore."

Qualified.io's CEO Jake Hoffner says that his company, too, detects cheating based on aspects like "lack of coding effort (e.g., copy-paste, minimal editing)" and recommends that customers have candidates walk through their code. But he sees a future in which AI changes the nature of programming assessments, shifting the focus away from actual coding to code management skills.

Emerging AI-powered suggestion and review tools, indeed, promise to cut development costs while allowing coders to focus on less repetitive tasks. During its Build developer conference in May 2021, Microsoft detailed a feature in Power Apps that taps OpenAI’s GPT-3 language model to assist people in choosing formulas. OpenAI's Codex system, which powers GitHub's Copilot service, can suggest whole lines of code. Intel’s ControlFlag can automatically detect coding errors. And Facebook’s TransCoder converts code from one programming language into another.

"[At] the point that AI starts to write more quality code, the industry as a whole starts to move towards developers .... directing machines to write code but less involvement in the actual coding," Hoffner said. "[T]he need for any code to be involved starts to take a back seat for many of the 'reinvent the wheel' tasks that developers still perform today, such as assembling a mobile app that retrieves and writes data. Coders move on from these common tasks and onto things that are less defined and that are novel. These are areas where there won't be enough existing code for AI systems to learn from, so coders will still need to perform it -- and these are the tasks that we will begin to test on assessment wise."

Nis Frome, GM at coding challenge and tutorial platform Coderbyte, says he sees less of a risk in AI being used to cheat on coding exams than in employers "[sacrificing] great candidate experiences for honest candidates." Too much of a focus on preventing cheating typically comes at the expense of recruitment and sourcing, he says, with the consequence of turning candidates away.

A 2022 survey from CoderPad and CodinGame puts the problem into sharp relief. Nearly half of recruiters cite finding qualified developers as their number one challenge, with 39% claiming that they've now broadened their applicant pool to developers from non-academic backgrounds -- up from 23% in 2021.

"We see countless techniques for cheating, from sending another person the assessment to copying answers online. We have little doubt that candidates have tried to use GPT-J or copilot when taking code assessments on Coderbyte," Frome told VentureBeat via email. "[But] cheating will always be a game of cat-and-mouse ... Odds are that if most of your candidates are cheating, you have a sourcing problem! Perhaps you need more senior candidates and shouldn't be posting roles on university job boards. The solution isn't to make an authoritarian and tedious experience for all candidates."

Biderman points out that policing integrity, whether it involves AI or not, isn't a new endeavor. In the same vein as Hoffner's prediction, the advent of easy-to-use code-generating AI might simply require new kinds of evaluation in which students debug AI-generated solutions, she says.

"We can still teach students the important computer science skills they need and find new applications for [AI]. These structural changes could deliver better outcomes to mitigate plagiarism and shortcuts, while paving the way for a future in which more AI-powered development tools are in the hands of a wider set of users," Biderman added. This also helps us prepare for a potential future in which AI and machine learning might be able to do more than just introductory level assignments, and we should begin to prepare for it."
