Large language models (LLMs) such as OpenAI’s GPT-3, GPT-3.5, and GPT-4 encode a wealth of information about how we live, communicate, and behave, and researchers are constantly finding new ways to put this knowledge to use.
A recent study conducted by Stanford University researchers has demonstrated that, with the right design, LLMs can be harnessed to simulate human behavior in a dynamic and convincingly realistic manner.
The study, titled “Generative Agents: Interactive Simulacra of Human Behavior,” explores the potential of generative models in creating an AI agent architecture that remembers its interactions, reflects on the information it receives, and plans long- and short-term goals based on an ever-expanding memory stream. These AI agents can simulate a person’s daily life, from mundane tasks to complex decision-making.
Moreover, when these agents are combined, they can emulate the more intricate social behaviors that emerge from the interactions of a large population. This work opens up many possibilities, particularly in simulating population dynamics, offering valuable insights into societal behaviors and interactions.
A virtual environment for generative agents
In the study, the researchers simulated the generative agents in Smallville, a sandbox game environment composed of various objects such as buffets, schools, bars, and more.
The environment is inhabited by 25 generative agents powered by an LLM. The LLM is initialized with a prompt that includes a detailed description of the agent’s behavior, occupation, preferences, memories, and relationships with other agents. The LLM’s output is the agent’s behavior.
The agents interact with their environment through actions. Initially, they generate an action statement in natural language, such as “Isabella is drinking coffee.” This statement is then translated into concrete movements within Smallville.
Moreover, the agents communicate with each other through natural language dialog. Their conversations are influenced by their previous memories and past interactions.
Human users can also interact with the agents by speaking to them through a narrator’s voice, altering the state of the environment, or directly controlling an agent. The interactive design is meant to create a dynamic environment with many possibilities.
Remembering and reflecting
Each agent in the Smallville environment is equipped with a memory stream, a comprehensive database that records the agent’s experiences in natural language. This memory stream plays a crucial role in the agent’s behavior.
For each action, the agent retrieves relevant memory records to aid in its planning. For instance, if an agent encounters another agent for the second time, it retrieves records of past interactions with that agent. This allows the agent to pick up on previous conversations or follow up on tasks that need to be completed together.
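As a rough illustration of the idea, a memory stream can be sketched as an append-only log of timestamped natural-language records. The class names, the integer timestamps, and the keyword lookup below are all assumptions made for this example; the lookup is a simple stand-in for retrieving past interactions with a given agent and is not how the researchers implement retrieval.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryRecord:
    time: int   # hypothetical simulation step at which the event was observed
    text: str   # the experience, stored as plain natural language


@dataclass
class MemoryStream:
    records: List[MemoryRecord] = field(default_factory=list)

    def add(self, time: int, text: str) -> None:
        """Append a new experience to the end of the stream."""
        self.records.append(MemoryRecord(time, text))

    def mentioning(self, name: str) -> List[MemoryRecord]:
        """Return records whose text mentions `name` (case-insensitive)."""
        needle = name.lower()
        return [r for r in self.records if needle in r.text.lower()]


# Illustrative usage with made-up events.
stream = MemoryStream()
stream.add(1, "Isabella is drinking coffee")
stream.add(2, "Isabella talked with Klaus about a Valentine's Day party")
stream.add(3, "Isabella is tidying the counter")

past_with_klaus = stream.mentioning("Klaus")
```

When the agent meets the other agent again, the records returned by `mentioning` could be added to the prompt so the conversation picks up where it left off.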
However, memory retrieval presents a significant challenge. As the simulation runs longer, the agent’s memory stream grows. Feeding the entire memory stream into the LLM’s context can distract the model, and once the stream becomes too long, it no longer fits in the context window at all. Therefore, for each interaction with the LLM, the agent must retrieve the most relevant bits from the memory stream and provide them to the model as context.
To address this, the researchers designed a retrieval function that weighs the relevance of each piece of the agent’s memory to its current situation. Relevance is measured by comparing a memory’s embedding with that of the current situation (embeddings are numerical vectors that capture the meaning of text and are used for similarity search). Recency also matters: more recent memories receive a higher score.
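A minimal sketch of such a scoring function might look like the following. It combines cosine similarity between embeddings with an exponential recency decay. The decay rate, the equal weights, the time unit, and all names here are assumptions chosen for illustration, and the two-dimensional vectors are toy stand-ins for embeddings that a real embedding model would produce.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]  # assumed to come from a text-embedding model
    created_at: float           # hypothetical simulation time, in hours


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    memories: List[Memory],
    query_embedding: Sequence[float],
    now: float,
    top_k: int = 3,
    decay: float = 0.99,        # assumed per-hour decay factor
    w_relevance: float = 1.0,   # assumed weights, not taken from the study
    w_recency: float = 1.0,
) -> List[Memory]:
    """Return the top_k memories ranked by relevance plus recency."""

    def score(m: Memory) -> float:
        relevance = cosine_similarity(m.embedding, query_embedding)
        recency = decay ** max(0.0, now - m.created_at)
        return w_relevance * relevance + w_recency * recency

    return sorted(memories, key=score, reverse=True)[:top_k]


# Illustrative usage with toy embeddings.
memories = [
    Memory("chatted with Klaus about research", [1.0, 0.0], created_at=10.0),
    Memory("ate breakfast", [0.0, 1.0], created_at=11.0),
    Memory("first met Klaus at the cafe", [1.0, 0.0], created_at=0.0),
]
context = retrieve(memories, query_embedding=[1.0, 0.0], now=12.0, top_k=2)
```

Only the memories in `context` would then be placed in the prompt, which keeps the prompt short no matter how long the simulation has been running.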
In addition to this, the researchers designed a function that periodically summarizes parts of the memory stream into higher-level abstract thoughts, referred to as “reflections.” These reflections form layers on top of each other, contributing to a more nuanced picture of the agent’s personality and preferences, and enhancing the quality of memory retrieval for future actions.
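One way to picture the reflection step is a memory store that, after every few new records, asks a summarizer to condense what has accumulated and writes the result back into the stream. In this sketch the fixed-count trigger, the `[reflection]` tag, and the class name are assumptions, and `summarize` is a hypothetical callable standing in for an LLM call. Because each new window begins at the previous reflection, later reflections build on earlier ones, which loosely mirrors the layering described above.

```python
from typing import Callable, List


class ReflectingMemory:
    def __init__(self, summarize: Callable[[List[str]], str], every: int = 3):
        self.summarize = summarize   # stand-in for an LLM summarization call
        self.every = every           # assumed trigger: reflect every N records
        self.records: List[str] = []
        self._window_start = 0       # index where the current window begins
        self._since_reflection = 0

    def add(self, text: str) -> None:
        """Store an observation and reflect once enough have accumulated."""
        self.records.append(text)
        self._since_reflection += 1
        if self._since_reflection >= self.every:
            self._reflect()

    def _reflect(self) -> None:
        # The window holds the previous reflection (if any) plus everything
        # observed since, so new reflections are layered on older ones.
        window = self.records[self._window_start:]
        reflection = "[reflection] " + self.summarize(window)
        self._window_start = len(self.records)  # index of the new reflection
        self.records.append(reflection)
        self._since_reflection = 0


# Illustrative usage with a trivial summarizer in place of an LLM.
def toy_summarize(items: List[str]) -> str:
    return "summary of %d items" % len(items)


memory = ReflectingMemory(toy_summarize, every=3)
for event in ["woke up", "made coffee", "opened the cafe"]:
    memory.add(event)
```

Since reflections are stored alongside ordinary records, they can be retrieved later like any other memory.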
Memory and reflections enable the AI system to craft a rich prompt for the LLM, which then uses it to plan each agent’s actions.
Putting agents into action
Planning is another intriguing aspect of the project. The researchers had to devise a system that enabled the agents to perform direct actions while also being able to plan for the long term. To achieve this, they adopted a hierarchical approach to planning.
The model first receives a summary of the agent’s status and is prompted to generate a high-level plan for a long-term goal. It then recursively takes each step and creates more detailed actions, first in hourly schedules, and then in 5-15 minute tasks. Agents also update their plans as their environment changes and they observe new situations or interact with other agents. This dynamic approach to planning ensures that the agents can adapt to their environment and interact with it in a realistic and believable manner.
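The recursive decomposition can be sketched as follows. The `decompose` argument is a hypothetical stand-in for an LLM call that breaks one step into finer steps at a given level of detail; the level names, the plan dictionary layout, and the canned responses are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Sequence

Decompose = Callable[[str, str], List[str]]


def plan_hierarchically(
    step: str,
    decompose: Decompose,
    levels: Sequence[str] = ("hourly", "minutes"),
) -> Dict:
    """Recursively expand `step` into finer substeps, one level at a time."""
    if not levels:
        return {"step": step, "substeps": []}
    level, rest = levels[0], levels[1:]
    substeps = [
        plan_hierarchically(sub, decompose, rest)
        for sub in decompose(step, level)
    ]
    return {"step": step, "substeps": substeps}


def leaves(plan: Dict) -> List[str]:
    """Flatten a plan into its most detailed, directly executable steps."""
    if not plan["substeps"]:
        return [plan["step"]]
    result: List[str] = []
    for sub in plan["substeps"]:
        result.extend(leaves(sub))
    return result


# Illustrative usage: canned responses play the role of the LLM.
def toy_decompose(step: str, level: str) -> List[str]:
    canned = {
        ("work on a research paper", "hourly"): [
            "9am: read sources",
            "10am: draft outline",
        ],
        ("9am: read sources", "minutes"): ["open notes", "skim two papers"],
        ("10am: draft outline", "minutes"): ["list section titles"],
    }
    return canned.get((step, level), [])


plan = plan_hierarchically("work on a research paper", toy_decompose)
tasks = leaves(plan)
```

Replanning could reuse the same function: when an agent observes something new, it would call `plan_hierarchically` again on the affected step rather than rebuilding the whole day.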
What happens when the simulation is run? Each agent starts with some basic knowledge, daily routines, and goals to accomplish. They plan and carry out those goals and interact with each other. Through these interactions, agents might pass on information to each other. As new information is diffused across the population, the community’s behavior changes. Agents react by changing or adjusting their plans and goals as they become aware of the behavior of other agents.
The researchers’ experiments show that the generative agents learn to coordinate among themselves without being explicitly instructed to do so. For example, one of the agents started out with the goal of holding a Valentine’s Day party. This information eventually reached other agents and several ended up attending the party. (A demo has been released online.)
Despite the impressive results of the study, it’s important to acknowledge the limitations of the technique. The generative agents, while surpassing other LLM-based methods in simulating human behavior, occasionally falter in memory retrieval. They may overlook relevant memories or, conversely, “hallucinate” by adding non-existent details to their recollections. This can lead to inconsistencies in their behavior and interactions.
Furthermore, the researchers noted an unexpected quirk in the agents’ behavior: they were excessively polite and cooperative. While these traits might be desirable in an AI assistant, they don’t accurately reflect the full spectrum of human behavior, which includes conflict and disagreement.
Simulacra of human behavior
The study has sparked interest within the research community. The Stanford researchers recently released the source code for their virtual environment and generative agents.
This has allowed other researchers to build upon their work, with notable players such as the venture capital firm Andreessen Horowitz (a16z) creating their own versions of the environment.
While the virtual agents of Smallville are entertaining, the researchers believe their work has far-reaching, practical applications.
One such application is prototyping the dynamics in mass-user products such as social networks. The researchers hope that these generative models could help predict and mitigate negative outcomes, such as the spread of misinformation or trolling. By creating a diverse population of agents and observing their interactions within the context of a product, researchers can study emerging behaviors, both positive and negative. The agents can also be used to experiment with counterfactuals and simulate how different policies and modifications in behavior can change outcomes. This concept forms the basis of social simulacra.
However, the potential of generative agents is not without its risks. They could be used to create bots that convincingly imitate real humans, potentially amplifying malicious activities like spreading misinformation on a large scale. To counteract this, the researchers propose maintaining audit logs of the agents’ behaviors to provide a level of transparency and accountability.
“Looking ahead, we suggest that generative agents can play roles in many interactive applications, ranging from design tools to social computing systems to immersive environments,” the researchers write.