OpenAI and Google DeepMind demonstrated that their foundation models could outperform human coders, showing that large language models (LLMs) can solve complex algorithmic problems, including one that no human team managed to crack.

OpenAI’s GPT-5 and Google’s Gemini 2.5 Deep Think participated in the 2025 International Collegiate Programming Contest (ICPC) World Finals. The competition brings together coding teams from universities to compete in answering complex algorithmic questions.

Although neither model officially competed alongside the human teams (their participation was governed by ICPC rules and supervised by the organizations), the LLMs successfully solved problems that some contestants could not.

GPT-5 achieved a perfect score, solving all 12 problems, a performance akin to winning a gold medal in the event. Gemini 2.5 Deep Think solved 10 of the 12 algorithmic problems in 677 minutes, which Google DeepMind said in a blog post would rank second overall in the competition.

OpenAI noted that it did not train a version of GPT-5 specifically to answer ICPC questions. Google indicated that it entered an “advanced version” of Gemini 2.5 Deep Think.

If you were wondering, the actual human gold medal winners of ICPC are teams from St. Petersburg State University, the University of Tokyo, Beijing Jiaotong University and Tsinghua University. (Harvard and MIT were the top-ranking American universities, finishing at the silver-medal level.)

None of the human teams scored a 12 out of 12.

The competition 

ICPC attracts thousands of participants, with 139 universities from at least 103 countries competing in the World Finals this year.

During the finals, all teams must solve an identical set of algorithmic problems within a five-hour window. Final rankings depend on how many problems each team solves and how quickly the correct solutions are submitted.
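Under standard ICPC rules, teams are ranked first by number of problems solved, with ties broken by total penalty time (minutes elapsed until each accepted solution, plus 20 penalty minutes per rejected attempt on problems eventually solved). A minimal sketch of that ranking logic, using invented team data rather than actual 2025 results:

```python
# Hypothetical team records: (name, problems_solved, penalty_minutes).
# Penalty minutes = sum over solved problems of time-to-accept, plus
# 20 minutes per rejected submission on problems eventually solved.
teams = [
    ("Team A", 10, 1450),
    ("Team B", 11, 1700),
    ("Team C", 11, 1500),
]

# Rank by most problems solved, breaking ties by lowest penalty time.
ranking = sorted(teams, key=lambda t: (-t[1], t[2]))

for place, (name, solved, penalty) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {solved} solved, {penalty} penalty minutes")
```

Team C edges out Team B here despite solving the same number of problems, because it finished them faster.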

“We officially competed in the onsite AI track of the ICPC, with the same 5-hour time limit to solve all twelve problems, submitting to the ICPC World Finals Local Judge - judged identically and concurrently to the ICPC World Championship submissions. We received the problems in the exact same PDF form, and the reasoning system selected which answers to submit with no bespoke test-time harness whatsoever. For 11 of the 12 problems, the system’s first answer was correct. For the hardest problem, it succeeded on the 9th submission. Notably, the best human team achieved 11/12,” OpenAI said in a post on X. 

Google, on the other hand, said Gemini “solved eight problems within just 45 minutes and two more problems within three hours.” 


Additionally, Google said Gemini solved one problem that none of the university teams could figure out. It involved finding a way to distribute liquid through a series of ducts. 

“Gemini found an effective solution with a clever insight: it first assumed each reservoir has a 'priority value' representing how much each reservoir should be favored compared to the others. When given a set of priority values, the best configuration of the ducts can be found using a dynamic programming algorithm. Gemini discovered that by applying the minimax theorem, the original problem can be approached by finding the priority values that make the resulting flow most constrained. Leveraging the relationship between priority values and optimal flows, Gemini used nested ternary searches to quickly find optimal priority values in the bowl-like convex solution space, and solved Problem C,” Google said. 
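Google described the approach but did not publish Gemini's code. The final step it mentions, nested ternary searches over a "bowl-like" convex space, works by running one ternary search inside another: for each candidate value of the first variable, an inner search finds the best second variable, and an outer search optimizes over the first. The sketch below illustrates that technique on an invented convex function; the objective and bounds are stand-ins, not the actual ICPC Problem C:

```python
def ternary_min(f, lo, hi, iters=60):
    """Minimize a unimodal (e.g. convex) function f on [lo, hi].

    Each iteration discards one third of the interval, so the
    search interval shrinks geometrically.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # minimum lies left of m2
        else:
            lo = m1  # minimum lies right of m1
    return (lo + hi) / 2

def nested_ternary_min(f, xlo, xhi, ylo, yhi, iters=60):
    """Minimize a convex f(x, y) by nesting one ternary search in another."""
    def best_over_y(x):
        # For a fixed x, find the best y; the resulting value is convex in x.
        y = ternary_min(lambda y: f(x, y), ylo, yhi, iters)
        return f(x, y)

    x = ternary_min(best_over_y, xlo, xhi, iters)
    y = ternary_min(lambda y: f(x, y), ylo, yhi, iters)
    return x, y, f(x, y)

# Toy convex objective with its minimum at (1, -2).
x, y, val = nested_ternary_min(
    lambda x, y: (x - 1) ** 2 + (y + 2) ** 2, -10, 10, -10, 10
)
```

In the actual problem, the inner evaluation would be the dynamic-programming step over duct configurations rather than a closed-form function, but the nesting structure is the same.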

LLMs and complex problems

There’s no doubt that foundation models like GPT-5 and Gemini 2.5 can solve general knowledge questions; after all, these LLMs constantly prove their knowledge base on the more common benchmark tests available.

What the performance at ICPC shows is that, given complex algorithmic problems in a competitive coding event pitted directly against human coders, the models can beat humans.

The gap has been narrowing for a while. Earlier this year, Google announced that Gemini won a gold medal at the International Mathematical Olympiad, one of the world's toughest math competitions. That performance came just months after LLMs proved unable to answer complex math problems on the FrontierMath benchmark.

Admittedly, some enterprise use cases do not need a model that can answer the world’s hardest programming questions. However, as enterprises delegate increasingly complex workflows to AI systems and organizations seek more AI-powered analysis, LLMs with proven coding and mathematical skills will be extremely useful. The results also show how far foundation models have come: they are now capable of the deep abstract reasoning and creative problem-solving that may prove beneficial for enterprise issues in the future.

A path to AGI

Many believe that models displaying this level of reasoning and problem-solving represent a strong move towards artificial general intelligence. Closing the gap between human reasoning and LLMs via a programming competition certainly shows that the current crop of models is slowly marching down that path.

These gold medal-winning performances have caught the attention of AI power users on social media.