Chinese search giant Baidu has introduced a new addition to its ERNIE 4.5 series of large language models: ERNIE-4.5-21B-A3B-Thinking. While its benchmark performance remains below that of top U.S. models such as OpenAI's GPT-5, Google's Gemini 2.5 Pro, and Anthropic's Claude Opus 4, it performs LLM reasoning with impressive efficiency, activating only 3 billion of its 21 billion total parameters per input token. It is also open source, making it a good choice for organizations on a budget or with specific ownership and customizability needs.

Parameters are the internal settings that govern a model's behavior, and a higher count typically denotes a more powerful model (OpenAI's older GPT-4 was speculated to have 1.8 trillion, which the company has neither confirmed nor denied). Increasingly, though, AI labs have been squeezing higher performance out of lower-parameter-count models in a bid to bring down costs both for themselves and for the customers running the models.

ERNIE-4.5-21B-A3B-Thinking is available now on Hugging Face under an enterprise-friendly Apache 2.0 license — allowing for commercial usage — and is specifically optimized for advanced reasoning, tool usage, and extended context tasks, marking a clear focus on the challenges of deep thinking for enterprises and individual developers.

Specifically, Baidu says that ERNIE-4.5-21B-A3B-Thinking is tailored to high-complexity tasks in areas such as logical reasoning, math, science, and code generation.

According to Baidu, the model delivers significantly improved reasoning performance compared to earlier ERNIE 4.5 lightweight variants (described in its technical white paper on the model family here), particularly on tasks requiring logical deduction, mathematical problem solving, and structured academic reasoning.

These improvements are a result of enhancements in post-training, which include supervised fine-tuning and reinforcement learning designed specifically for reasoning workloads.

The model also introduces enhanced tool usage, including support for structured function calling. This enables use cases where models must interact with external tools or APIs—for example, providing weather forecasts by invoking a function call with specific parameters.
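To illustrate the pattern, here is a minimal sketch of how an application might handle a structured function call like the weather example above. The function name, arguments, and JSON shape are illustrative assumptions, not Baidu's actual tool schema; in practice the model's emitted call would replace the hard-coded string.

```python
import json

# Hypothetical local tool the model is allowed to call.
def get_weather(city: str, unit: str = "celsius") -> str:
    # Stubbed lookup; a real implementation would query a weather API.
    fake_db = {"Beijing": 22, "Paris": 18}
    temp = fake_db.get(city, 20)
    return f"{temp} degrees {unit} in {city}"

TOOLS = {"get_weather": get_weather}

# With structured function calling, the model emits a JSON object naming a
# function and its arguments rather than a free-form answer. This string
# stands in for that model output.
model_output = '{"name": "get_weather", "arguments": {"city": "Beijing", "unit": "celsius"}}'

def dispatch(raw: str) -> str:
    """Parse a model-emitted function call and invoke the matching tool."""
    call = json.loads(raw)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch(model_output)
print(result)  # "22 degrees celsius in Beijing"
```

The application then feeds the tool's return value back to the model, which composes the final user-facing answer.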

A Lightweight Model Designed for Heavyweight Tasks

While the ERNIE 4.5 family includes models with up to 424 billion total parameters, the 21B-A3B variant is positioned as a more accessible option, offering a balance between performance and resource efficiency.

Only 3 billion parameters are activated per token, making it a more viable candidate for organizations with constrained compute resources.

Its context length is 128,000 tokens, equivalent to the previous generation of flagship models from U.S. labs (GPT-5 now reaches 256,000, or double that, while Gemini, Qwen, and other labs extend to a million-token context length for some models). A context window is the amount of information that can be exchanged in a single input/output exchange between a user and an LLM. It is measured in tokens: the LLM's numerical representations of words, word parts, concepts, and mathematical formulations — essentially its "native language." A larger context means more information can be exchanged at once; 128,000 tokens is roughly equivalent to a 300-page book.
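The "300-page book" comparison can be sanity-checked with some back-of-envelope arithmetic, using the common rough assumptions of about 0.75 English words per token and about 300 words per printed page:

```python
# Rough sizing of a 128K-token context window.
# Assumptions (both approximate): ~0.75 English words per token,
# ~300 words per printed page.
context_tokens = 128_000
words_per_token = 0.75
words_per_page = 300

words = context_tokens * words_per_token  # ~96,000 words
pages = words / words_per_page            # ~320 pages

print(f"~{words:,.0f} words, ~{pages:.0f} pages")  # ~96,000 words, ~320 pages
```

Actual token-to-word ratios vary by language and tokenizer, so the figure is an estimate, not a guarantee.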

Despite its relatively compact size compared to flagship models in the ERNIE lineup, this version is tuned to excel in tasks that demand structured reasoning.

The improvements stem in part from its post-training phase, where the model is further refined through supervised fine-tuning and reinforcement learning on reasoning-focused datasets.

Importantly, Baidu marked this release with a “thinking” designation, distinguishing it from non-thinking versions that may prioritize faster inference or simpler task execution.

The “thinking” variant is recommended specifically for scenarios where complex, step-by-step logical reasoning is essential, such as tough math, science, research and development problems.

Decent Benchmarks

ERNIE-4.5-21B-A3B-Thinking demonstrates strong results across several reasoning-heavy tasks, notably scoring 89.8 on ZebraLogic, 87.77 on BBH (Big-Bench Hard), and 86.5 on WritingBench—tasks that require multi-step logical processing and structured thinking.

On HumanEval+ and MBPP, two commonly used programming benchmarks, it scores over 90 and 80 respectively, indicating solid code synthesis and function generation capabilities.


Graphic of benchmark results shared by Baidu.

In academic-style math tasks (AIME 2025) and benchmarks such as BFCL (function calling) and MUSR (multistep reasoning), the model trails slightly behind Gemini 2.5 Pro but remains competitive.

It leads on WritingBench and IF-Eval (Instruction Following Evaluation), highlighting its strength in following structured prompts and generating coherent responses. However, it shows relatively lower performance in ChineseSimpleQA at 49.06, suggesting a tradeoff between deep reasoning specialization and general multilingual performance. These results reflect Baidu’s design emphasis on “thinking” tasks, where the model’s long-context support and expert routing architecture are most impactful.

Deployment and Ecosystem Compatibility

The model is available under the Apache 2.0 license, making it freely accessible for both research and commercial use. Baidu has published it through Hugging Face and GitHub, with support for deployment via a range of popular frameworks:

  • FastDeploy (version 2.2 or higher), which supports launching the model on a single 80GB GPU

  • vLLM, with ongoing development for reasoning-specific parsers

  • Transformers 4.54.0+, including full support for tokenization and text generation workflows

The model card, technical documentation, and citation guidelines are all publicly available, and the company also offers a GitHub repository for development toolkits.

It's also already hooked up to AnyCoder, the open source "vibe coding" application from Hugging Face's ML Growth Lead Ahsen Khaliq (@_akhaliq on X).

This broad compatibility enables researchers and developers to incorporate the model into existing AI infrastructure without extensive adaptation.

For inference, users can interact with the model via REST API or Python-based generation pipelines, including function-call support for structured outputs.
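As a concrete sketch, the snippet below builds an OpenAI-style chat-completions payload of the kind a vLLM server accepts. The model identifier, port, and temperature are assumptions for illustration; check the model card for the exact serving configuration Baidu recommends.

```python
import json

def build_chat_request(user_prompt, model="baidu/ERNIE-4.5-21B-A3B-Thinking", tools=None):
    """Build an OpenAI-style chat-completions payload for a vLLM server.

    The model id and sampling settings here are illustrative assumptions.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.6,
    }
    if tools:
        payload["tools"] = tools  # structured function-call definitions
    return payload

payload = build_chat_request("Prove that sqrt(2) is irrational.")
print(json.dumps(payload, indent=2))

# To send it to a locally running server (vLLM's default port is 8000):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because vLLM exposes an OpenAI-compatible endpoint, existing client code written against that API shape can typically be pointed at the local server with minimal changes.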

Architecture and Design Details

The ERNIE-4.5-21B-A3B-Thinking model is built on a Transformer-based architecture that leverages 64 text and 64 vision experts, with six of each activated per token.

The architecture supports Mixture-of-Experts routing for efficient specialization, along with dense attention layers that facilitate cross-modal interaction when used in broader multimodal systems.

While this specific variant is language-only, it inherits the modular design from the full ERNIE 4.5 system. This design allows teams to swap or remove vision-related modules for more efficient deployment in pure language applications.
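For readers unfamiliar with Mixture-of-Experts routing, the toy sketch below shows the core idea: a router scores all experts for each token, keeps only the top-k, and normalizes their weights. This is a simplified illustration of the general technique, not Baidu's actual router (production MoE routers add load balancing and other machinery).

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 64  # text experts in the full ERNIE 4.5 design
TOP_K = 6         # experts activated per token

def route(token_logits, k=TOP_K):
    """Pick the top-k experts by router logit, softmax-normalizing their weights."""
    top = sorted(range(len(token_logits)),
                 key=lambda i: token_logits[i], reverse=True)[:k]
    exps = [math.exp(token_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# One token's router scores over 64 experts (random stand-ins).
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(logits)

print(len(chosen))  # 6 experts active for this token
print(sum(w for _, w in chosen))  # weights sum to ~1.0
```

Because only 6 of 64 experts run per token, the compute cost per token tracks the ~3B active parameters rather than the full 21B, which is the source of the efficiency claim.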

As with other ERNIE models, training was carried out using the PaddlePaddle deep learning framework, which enables optimizations such as FP8 mixed-precision training and inference-friendly quantization.

Baidu reports that models in the ERNIE 4.5 family are capable of high-throughput inference with relatively low hardware requirements. For example, the largest model in the lineup can achieve up to 56,000 input tokens per second (TPS) and 18,000 output TPS per node using four 80GB GPUs and 4-bit quantization.

While the 21B model operates at a smaller scale, it benefits from these same engineering optimizations.

AI Community Reception

Among AI experts, researchers, and third-party company personnel on X, the reaction to the new ERNIE 4.5 model was, from my anecdotal observation, largely positive.

User Petri Kuittinen (@KuittinenPetri) noted that with only 3 billion active parameters, it was "optimized for fast inference," and offers "competition for Qwen3-30B-A3B-2507," which has been one of the most cost-efficient models ever.

Content strategist Knut Jägersberg wrote on X that "it's fast, on my modest hardware getting 90 tokens/s for the 8 bit gguf."

User Girish lelouch (@girish_lelouch) called it "a true powerhouse for complex tasks."

Open Release Strategy

Baidu continues its open-source approach with this release, encouraging researchers and developers to experiment, adapt, and build upon the ERNIE 4.5 models.

It follows on the heels of many other leading Chinese AI labs — from DeepSeek to Moonshot, Z.ai, and Alibaba's Qwen — also pursuing a strategy of releasing powerful models under open source licenses that developers can take, use, modify and deploy freely, as opposed to the U.S. lab offerings which are often paid and proprietary (OpenAI's new gpt-oss models were a notable attempt by the U.S. AI leader to compete in this category, though initial reception has been mixed).

With this release, Baidu positions the ERNIE-4.5-21B-A3B-Thinking model as a publicly accessible, high-capability language model aimed at researchers, AI developers, and enterprises tackling logic-intensive or context-heavy applications. The model is now live for download and deployment.

Implications for Enterprise Technical Decision Makers

The release of ERNIE-4.5-21B-A3B-Thinking offers an advanced open-source alternative for organizations aiming to deploy LLMs capable of deep reasoning and extended context handling.

For AI engineers responsible for managing the lifecycle of language models, this version of ERNIE presents a compelling option where lightweight architecture meets high reasoning performance. Its tool-use integration and 128K context window unlock capabilities such as multi-turn logic chains, long document analysis, and function call orchestration — all useful for enterprise-level applications ranging from customer support automation to internal RAG systems.

From a platform orchestration perspective, the model’s compatibility with FastDeploy, Transformers, and vLLM makes it feasible to integrate within existing pipelines—particularly in hybrid cloud environments where modular deployment and fine-grained GPU scheduling are critical. For organizations balancing tight engineering resources with growing model deployment needs, ERNIE-4.5-21B-A3B-Thinking represents an option that scales technically without demanding a flagship-class budget.

Data and infrastructure leads may find the modular design advantageous. The model’s architecture supports separate deployment of vision or text experts, making it easier to fine-tune or prune unnecessary components depending on the enterprise data mix. This allows for more efficient use of hardware—especially important when trying to optimize throughput under memory or compute constraints.

However, security and compliance stakeholders must also weigh geopolitical realities. ERNIE-4.5 is developed by Baidu, a major Chinese technology company, and while the model is open-source under Apache 2.0, deploying it —particularly in production settings — may raise internal or external scrutiny in certain regions. For U.S. and EU-based enterprises, depending on organizational policy or industry-specific regulation, there may be concerns about supply chain transparency, latent dependencies, or national security standards and perceptions.

This tension doesn’t make ERNIE unusable, but it does make it politically complex — particularly for firms operating in sensitive sectors (e.g. defense, healthcare, or finance), those contracting with government agencies, or those subject to export control considerations.

Ultimately, the decision to adopt ERNIE-4.5-21B-A3B-Thinking is not purely technical. It must be weighed in the context of deployment flexibility, performance requirements, and geopolitical risk tolerance. For teams focused on research, prototyping, or internal tools where exposure is limited, the model could be a powerful and cost-effective option. For production systems serving regulated or high-trust environments, risk management policies should be consulted before adoption.