
Google finds that AI agents learn to cooperate when trained against unpredictable opponents
Google finds diverse opponent training beats hardcoded orchestration for getting AI agents to cooperate in enterprise deployments.
Ben Dickson
New KV cache compaction technique cuts LLM memory 50x without accuracy loss
Enterprise AI hits a memory ceiling with long documents and complex tasks. MIT's new Attention Matching compresses the KV cache by 50x without accuracy loss — in seconds, not hours.
Ben Dickson
Microsoft's new AI training method eliminates bloated system prompts without sacrificing model performance
Microsoft's new OPCD framework trains AI models to internalize long system prompts directly into their weights, cutting inference overhead without losing general capability.
Ben Dickson
Researchers baked 3x inference speedups directly into LLM weights — without speculative decoding
This training technique triples LLM inference speed without auxiliary models or infrastructure changes — using just a single special token added to the model's existing architecture.
Ben Dickson
New agent framework matches human-engineered AI systems — and adds zero inference cost to deploy
A new group-evolving agent framework from UC Santa Barbara matches human-engineered AI systems on SWE-bench — and adds zero inference cost to deploy. Here's how it works.
Ben Dickson
Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy
Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be retrofitted onto existing models in hours.
Ben Dickson
MIT's new fine-tuning method lets LLMs learn new skills without losing old ones
MIT researchers unveil a new fine-tuning method that lets enterprises consolidate their "model zoos" into a single, continuously learning agent.
Ben Dickson
TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference
A new technique from Stanford, Nvidia, and Together AI lets models learn during inference rather than relying on static weights. It costs ~$500 per run but has already produced a GPU kernel 2x faster than the prior state of the art.
Ben Dickson
This tree search framework hits 98.7% on documents where vector search fails
PageIndex, a new open-source framework, achieves 98.7% accuracy on complex document retrieval by using tree search instead of vector embeddings. The approach eliminates the need for dedicated vector databases.
Ben Dickson
AI models that simulate internal debate dramatically improve accuracy on complex tasks
A new study reveals that top models like DeepSeek-R1 succeed by simulating internal debates. Here is how enterprises can harness this "society of thought" to build more robust, self-correcting agents.
Ben Dickson
MemRL outperforms RAG on complex agent benchmarks without fine-tuning
MemRL, a new technique developed by researchers at Shanghai Jiao Tong University and other institutions, enables large language model agents to learn new skills without the need for expensive fine-tuning.

MIT’s new ‘recursive’ framework lets LLMs process 10 million tokens without context rot
Rather than expanding context windows or summarizing old information, the MIT team reframes long-context reasoning as a systems problem. By letting models treat prompts as something they can inspect with code, recursive language models allow LLMs to reason over millions of tokens without retraining. This offers enterprises a practical path to long-horizon tasks like codebase analysis, legal review, and multi-step reasoning that routinely break today’s models.
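The core idea above, letting a model inspect and subdivide its own prompt with code instead of ingesting it all at once, can be illustrated with a minimal sketch. This is not the MIT implementation; `llm` is a hypothetical stand-in for any chat-model call (stubbed here with a simple line search so the example runs without an API), and the chunking scheme is an assumption for illustration.

```python
def llm(prompt: str) -> str:
    """Stub for a language-model call. A real system would query a model;
    this stub 'answers' by returning the first context line that mentions
    the query term, mimicking a grounded lookup."""
    query, _, context = prompt.partition("\n")
    for line in context.splitlines():
        if query.lower() in line.lower():
            return line
    return "not found"


def recursive_query(query: str, lines: list[str], max_lines: int = 100) -> str:
    """Answer `query` over a long document without ever passing the full
    text to the model: recursively split the lines into chunks small
    enough to fit, query each chunk, then combine the per-chunk answers
    with one final short call."""
    if len(lines) <= max_lines:
        return llm(query + "\n" + "\n".join(lines))
    partials = [
        recursive_query(query, lines[i:i + max_lines], max_lines)
        for i in range(0, len(lines), max_lines)
    ]
    # The final call reasons only over the short per-chunk answers.
    return llm(query + "\n" + "\n".join(partials))


# A 2,000-line "document" far larger than any single model call sees.
doc_lines = [f"record {i}: value {i * i}" for i in range(2000)]
print(recursive_query("record 1500", doc_lines))  # → record 1500: value 2250000
```

Each model call here sees at most `max_lines` lines, so the total context can grow to millions of tokens while per-call cost stays bounded, which is the systems-level reframing the teaser describes.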