The debate over neural network complexity: Does bigger mean better?

Artificial intelligence (AI) has made tremendous progress since its inception, and neural networks have driven much of that advancement. These networks, which apply learned weights to their inputs, are an integral part of modern AI.

Research is ongoing, and experts still debate whether bigger is better in terms of neural network complexity.

Traditionally, researchers have focused on constructing neural networks with a large number of parameters to achieve high accuracy on benchmark datasets. This approach has produced some of the most intricate neural networks to date, such as GPT-3, with 175 billion parameters, and its successor GPT-4, but it also comes with significant challenges.

For example, these models require enormous amounts of computing power, storage, and time to train, and they may be challenging to integrate into real-world applications.

Experts in the AI community have differing opinions on the importance of neural network complexity. Some argue that smaller networks can achieve results comparable to those of larger models if they are trained effectively.

For instance, newer models such as Chinchilla by Google DeepMind, comprising “just” 70 billion parameters, are reported to outperform Gopher, GPT-3, Jurassic-1 and Megatron-Turing NLG across a large set of language benchmarks. Likewise, LLaMA by Meta, comprising up to 65 billion parameters, shows that smaller models can achieve strong performance.

Nevertheless, the ideal size and intricacy of neural networks remain a matter of debate in the AI community, raising the question: Does neural network complexity matter?

The essence of neural network complexity

Neural networks are built from interconnected layers of artificial neurons that can recognize patterns in data and perform various tasks such as image classification, speech recognition, and natural language processing (NLP). The number of layers, the number of nodes in each layer and the number of weighted connections between those nodes determine the complexity of the neural network. The more nodes and layers a neural network has, the more complex it is.
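To make the link between size and complexity concrete, here is a minimal Python sketch that counts the trainable values in a plain fully connected network. The function name and the layer sizes are illustrative assumptions, not figures from any model discussed in this article.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between two layers
        total += n_out         # one bias per node in the receiving layer
    return total


# Hypothetical networks for a 784-input, 10-class problem.
small = count_parameters([784, 32, 10])
wide = count_parameters([784, 512, 10])
deep = count_parameters([784, 512, 512, 512, 10])
print(small, wide, deep)
```

Widening a single hidden layer or stacking more layers both drive the count up quickly, which is why parameter count is the usual shorthand for a network's complexity.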

With the advent of deep learning techniques that require more layers and parameters, the complexity of neural networks has increased significantly. Deep learning algorithms have enabled neural networks to serve in a spectrum of applications, including image and speech recognition and NLP. The idea is that more complex neural networks can learn more intricate patterns from the input data and achieve higher accuracy.

“A complex model can reason better and pick up nuanced differences,” said Ujwal Krothapalli, data science manager at EY. “However, a complex model can also ‘memorize’ the training samples and not work well on data that is very different from the training set.”

Larger is better

A paper presented in 2021 at the leading AI conference NeurIPS by Sébastien Bubeck of Microsoft Research and Mark Sellke of Stanford University explained why scaling an artificial neural network’s size leads to better results. They found that neural networks must be larger than conventionally expected to fit their training data robustly, that is, without their outputs changing sharply when the inputs change slightly.

However, this approach also comes with a few drawbacks. One of the main challenges of developing large neural networks is the amount of computing power and time required to train them. Additionally, large neural networks are often challenging to deploy in real-world scenarios, requiring significant resources.

“The larger the model, the more difficult it is to train and infer,” Kari Briski, VP of product management for AI software at Nvidia, told VentureBeat. “For training, you must have the expertise to scale algorithms to thousands of GPUs and for inference, you have to optimize for desired latency and retain the model’s accuracy.”

Briski explained that complex AI models such as large language models (LLMs) are autoregressive: the context supplied as input determines which character or word is generated next, one step at a time. As a result, generation can be challenging to serve, depending on the application’s requirements.

“Multi-GPU, multi-node inference are required to make these models generate responses in real-time,” she said. “Also, reducing precision but maintaining accuracy and quality can be challenging, as training and inference with the same precision are preferred.”
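The trade-off behind reduced precision can be illustrated with a toy example. The sketch below is not Nvidia’s method; it assumes a simple symmetric 8-bit scheme with one shared scale, and the weight values are made up. It stores each weight as a small integer, restores it, and measures how much was lost to rounding.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from their integer form."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.003, -1.27, 0.56]  # hypothetical weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The largest gap between an original weight and its restored value.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"largest rounding error: {worst_error:.4f}")
```

Each weight now fits in one byte instead of four, but very small weights can collapse to zero. Keeping that kind of loss from degrading output quality is the difficulty Briski describes.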

Best results from training techniques

Researchers are exploring new techniques for optimizing neural networks for deployment in resource-constrained environments. Another paper presented at NeurIPS 2021, by Stefanie Jegelka of MIT and researchers Andreas Loukas and Marinos Poiitis, argued that neural networks do not need to be complex and that the best results can be achieved through training techniques alone.

The paper revealed that the benefits of smaller-sized models are numerous. They are faster to train and easier to integrate into real-world applications. Moreover, they can be more interpretable, enabling researchers to understand how they make predictions and identify potential data biases.

Juan Jose Lopez Murphy, head of data science and artificial intelligence at software development firm Globant, said he believes that the relationship between network complexity and performance is, well, complex.

“With the development of ‘scaling laws,’ we’ve discovered that many models are heavily undertrained,” Murphy told VentureBeat. “You need to leverage scaling laws for general known architectures and experiment on the performance from smaller models to find the suitable combination. Then you can scale the complexity for the expected performance.”

He says that smaller models like Chinchilla or LLaMA, which outperformed larger predecessors, make an interesting case that some of the potential embedded in larger networks might be wasted, and that part of the performance potential of more complex models is lost to undertraining.
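A rough calculation shows what “undertrained” means in practice. The sketch below is an illustration, not part of Murphy’s remarks: the parameter and training-token counts are assumed from the models’ published reports, and the six-operations-per-parameter-per-token estimate is a common rule of thumb rather than an exact figure.

```python
# Parameter and training-token counts, assumed from the published model reports.
MODELS = {
    "GPT-3": {"params": 175e9, "tokens": 300e9},
    "Gopher": {"params": 280e9, "tokens": 300e9},
    "Chinchilla": {"params": 70e9, "tokens": 1.4e12},
}


def tokens_per_parameter(params, tokens):
    """How much training text the model saw relative to its size."""
    return tokens / params


def approx_training_flops(params, tokens):
    """Rule-of-thumb estimate: about 6 operations per parameter per token."""
    return 6 * params * tokens


for name, m in MODELS.items():
    ratio = tokens_per_parameter(m["params"], m["tokens"])
    flops = approx_training_flops(m["params"], m["tokens"])
    print(f"{name}: {ratio:.1f} tokens per parameter, ~{flops:.2e} training FLOPs")
```

Under these assumptions, Chinchilla saw about 20 tokens of text per parameter, against fewer than two for GPT-3 and Gopher, while using a training budget of the same order as Gopher’s. The smaller model simply spent its compute on more data rather than more parameters.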

“With larger models, what you gain in the specificity, you may lose in reliability,” he said. “We don’t yet fully understand how and why this happens, but a huge amount of research in the sector is going into answering those questions. We are learning more every day.”

Different jobs require different neural schemes

Developing the ideal neural architecture for AI models is a complex and ongoing process. There is no one-size-fits-all solution, as different tasks and datasets require different architectures. However, several key principles can guide the development process.

These include designing scalable, modular and efficient architectures, using techniques such as transfer learning to leverage pre-trained models and optimizing hyperparameters to improve performance. Another approach is to design specialized hardware, such as TPUs and GPUs, that can accelerate the training and inference of neural networks.

Ellen Campana, leader of enterprise AI at KPMG U.S., suggests that the ideal neural network architecture should be based on the data size, the problem to be solved and the available computing resources, ensuring that it can learn the relevant features efficiently and effectively.

“For most problems, it is best to consider incorporating already trained large models and fine-tuning them to do well with your use case,” Campana told VentureBeat. “Training these models from scratch, especially for generative uses, is very costly in terms of compute. So smaller, simpler models are more suitable when data is an issue. Using pre-trained models can be another way to get around data limitations.”

More efficient architectures

The future of neural networks, Campana said, lies in developing more efficient architectures. Creating an optimized neural network architecture is crucial for achieving high performance.

“I think it’s going to continue with the trend toward larger models, but more and more they will be reusable,” said Campana. “So they are trained by one company and then licensed for use like we are seeing with OpenAI’s Davinci models. This makes both the cost and the footprint very manageable for people who want to use AI, yet they get the complexity that is needed for using AI to solve challenging problems.”

Kjell Carlsson, head of data science strategy and evangelism at enterprise MLOps platform Domino Data Lab, believes that smaller, simpler models are more suitable for real-world applications.

“None of the headline-grabbing generative AI models is suitable for real-world applications in their raw state,” said Carlsson. “For real-world applications, they need to be optimized for a narrow set of use cases, which in turn reduces their size and the cost of using them. A successful example is GitHub Copilot, a version of OpenAI’s Codex model optimized for auto-completing code.”

The future of neural network architectures

Carlsson says that OpenAI is making models like ChatGPT and GPT-4 available because we do not yet know more than a tiny fraction of the potential use cases.

“Once we know the use cases, we can train optimized versions of these models for them,” he said. “As the cost of computing continues to come down, we can expect folks to continue the ‘brute force-ish’ approach of leveraging existing neural network architectures trained with more and more parameters.”

He believes that we should also expect breakthroughs where developers may come up with improvements and new architectures that dramatically improve these models’ efficiency while enabling them to perform an ever-increasing range of complex, human-like tasks.

Likewise, Amit Prakash, cofounder and CTO at AI-powered analytics platform ThoughtSpot, says that we will routinely see larger and larger models show up with stronger capabilities. But then there will be smaller versions of those models that try to approximate the output quality of the larger ones.

“We will see these larger models used to teach smaller models to emulate similar behavior,” Prakash told VentureBeat. “One exception to this could be sparse models or a mixture of expert models where a large model has layers that decide which part of the neural network should be used and which part should be turned off, and then only a small part of the model gets activated.”
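The sparse approach Prakash describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not any production design: the “experts” are simple functions standing in for sub-networks, the gate scores are made up rather than learned, and the `route` helper is hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_scores, experts, x, k=1):
    """Run only the k experts with the highest gate scores and blend their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Four toy experts; in a real model each would be a sub-network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -0.3]  # hypothetical output of a gating layer

output, active = route(gate_scores, experts, x=3.0, k=2)
print(f"active experts: {active}, output: {output:.3f}")
```

Only two of the four experts do any work for this input; the others are skipped entirely. That is how such a model can hold many parameters while activating only a small part of them for each request.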

He said that ultimately, the key to developing successful AI models would be striking the right balance between complexity, efficiency and interpretability.