DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude Opus, and Google's Gemini Pro have clustered within a narrow band on Scale AI's SWE-Bench Pro leaderboard, making it nearly impossible for engineering leaders to determine which agent will actually perform best inside their codebases.

AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial intelligence, assigning estimated intelligence quotients to more than 50 of the world's most powerful language models and plotting them on a standard bell curve.

OpenAI turns its sold-out GPT-5.5 party into a monthlong Codex giveaway for 8,000 developers

"We had over 8,000 people express interest in just 24 hours, and while we wish our office was big enough to welcome everyone, we weren't able to make space for every person who applied," the company wrote in an email obtained by VentureBeat. "As a small token of appreciation, we've 10x'ed your Codex rate limits until June 5th on your personal ChatGPT account."