LlamaSim is a multi-LLM framework that aims to simulate human behavior at scale. Given a specific environment (e.g., voters in Pennsylvania, students at CMU), we simulate how target groups would respond to important questions and events. This allows us to predict the outcomes of events and changing conditions more accurately.

This project also includes contributions from Thomas Bahmandeji, Sahra Mohamad, and Dhruvi Kadhiwala.

The challenge – simulating human behavior with complex scenarios

One use case of LlamaSim is predicting elections. Traditional polls and static models often fail to capture dynamic voter behavior, especially in battleground states where every news cycle can shift the race. Simulating LLMs en masse offers a way to predict elections more accurately than conventional methods.

For example, to simulate election results in battleground states, we can create hundreds of LLM agents that behave and role-play like real voters.

  1. Replicate voter districts using US census data
  2. Develop agents with rich backstories, programming them to “think” like voters across various demographics
  3. Simulate conversations between agents to see how they would react to hypothetical scenarios (e.g., Donald Trump launching his own cryptocurrency)
  4. Quantitatively predict how voters change their answers to true/false questions (e.g., are you voting for Donald Trump?) based on the aforementioned hypothetical scenarios (a runnable skeleton of these four steps follows just below)
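To make the flow concrete, here is a minimal, runnable skeleton of those four steps. Everything model-related is stubbed out with random placeholders so the control flow runs as-is; all function and field names here are illustrative, not LlamaSim's actual API.

```python
import random

def generate_identity(i: int) -> dict:
    # Steps 1-2: in LlamaSim, demographics come from US census data and the
    # backstory from an LLM; both are placeholders here.
    return {"id": i, "party": random.choice(["D", "R", "I"]), "backstory": "..."}

def run_group_chat(voters: list[dict], scenario: str) -> None:
    # Step 3: agents would exchange messages reacting to the scenario.
    pass

def poll(voters: list[dict], question: str) -> list[bool]:
    # Step 4: each agent answers a true/false question; stubbed as a coin flip.
    return [random.random() < 0.5 for _ in voters]

voters = [generate_identity(i) for i in range(100)]
before = poll(voters, "Are you voting for Donald Trump?")
run_group_chat(voters, "Donald Trump launches his own cryptocurrency.")
after = poll(voters, "Are you voting for Donald Trump?")
```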

How we built this

Simulating this type of interaction has traditionally been impractical due to slow inference speeds, which make group chat-style conversations cumbersome – especially at scale. This is very noticeable when using AutoGen or CrewAI with typical model providers. Moreover, simulating large groups of voters in real time demands significant computational resources, with complexity only increasing as we scale to hundreds or even thousands of identities representing specific populations, such as voters from Pennsylvania.

With Cerebras, we were able to simulate 100 voters (from generation to group chat to prediction) in less than 4 minutes. Specifically, for each voter, we generate a synthetic identity with over 20 socioeconomic characteristics, then add a unique backstory about their life experiences. These simulated individuals are then placed in a group chat with the other 99 voters to engage in a conversation around a specific prompt. Afterward, we conduct a true/false poll on a given question, both before and after presenting them with the same hypothetical scenario, to measure any shifts in their opinions.
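Measuring the shift itself is simple arithmetic: compare the share of "true" answers before and after the scenario. Continuing the hypothetical skeleton above:

```python
def opinion_shift(before: list[bool], after: list[bool]) -> float:
    """Percentage-point change in 'true' responses after the scenario."""
    def share(answers: list[bool]) -> float:
        return 100.0 * sum(answers) / len(answers)
    return share(after) - share(before)

# e.g., 52% "true" before and 46% after gives a shift of -6.0 points
```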

LlamaSim leverages Llama 3.1 8B, hosted on Cerebras, which delivers over 2,000 tokens/s, significantly outperforming other model providers. LlamaSim currently uses GPT-4o-mini for voter characteristic generation and Cerebras for backstory generation, each producing 100 identities. The performance comparison is striking: in a recent benchmark, GPT-4o-mini completed the task in 98.78 seconds, while Cerebras achieved the same result in just 41.53 seconds (nearly a 60% reduction!). Needless to say, I’m super excited to be pushing an update to LlamaSim that uses Cerebras for characteristic generation as well!
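For reference, a backstory-generation call against Cerebras looks roughly like this. This is a sketch using the cerebras_cloud_sdk package; the model identifier and the prompt are assumptions based on Cerebras's public naming, so check their docs.

```python
from cerebras.cloud.sdk import Cerebras  # pip install cerebras_cloud_sdk

# The SDK reads CEREBRAS_API_KEY from the environment by default.
client = Cerebras()

# Illustrative prompt -- LlamaSim's actual prompts are richer.
resp = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "system", "content": "You write short, realistic life backstories."},
        {"role": "user", "content": "Write a backstory for a 45-year-old union electrician from Scranton, PA."},
    ],
)
print(resp.choices[0].message.content)
```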

One of the key challenges in this project was ensuring that the output adhered to a structured schema, particularly for identity generation. This was crucial for maintaining consistency and realism across the simulated agents. We used the Cerebras integration with Instructor to efficiently generate structured outputs that met the required specifications, ensuring precise and coherent agent identities.
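As a sketch of that pattern: define the identity schema as a Pydantic model and let Instructor validate the LLM's output against it. The fields below are a small illustrative subset of the 20+ characteristics, and instructor.from_cerebras assumes a recent Instructor version with the Cerebras extra installed.

```python
import instructor
from cerebras.cloud.sdk import Cerebras
from pydantic import BaseModel

# Illustrative subset of the 20+ socioeconomic fields LlamaSim generates.
class VoterIdentity(BaseModel):
    age: int
    occupation: str
    education: str
    county: str
    party_affiliation: str

client = instructor.from_cerebras(Cerebras())

voter = client.chat.completions.create(
    model="llama3.1-8b",
    response_model=VoterIdentity,  # Instructor validates against the schema
    messages=[{"role": "user", "content": "Generate a plausible Pennsylvania voter."}],
)
print(voter.model_dump())
```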

Another challenge we faced was managing conversation context with a sliding window. Since the model’s context length is limited, we had to dynamically remove older messages so that the most relevant and recent information stayed within the context window. This required careful balancing to maintain the flow of conversation while preventing the loss of context critical for coherent interactions. Super excited to be re-implementing this using mem0.ai!
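The eviction logic itself can stay simple. Here is a minimal sketch, assuming a crude characters-per-token heuristic; LlamaSim's actual implementation may count tokens differently.

```python
def trim_context(messages: list[dict], max_tokens: int = 6000) -> list[dict]:
    """Keep the system prompt pinned; evict the oldest chat messages
    once a rough token estimate exceeds the budget."""
    system, chat = messages[:1], messages[1:]

    def est_tokens(msgs: list[dict]) -> int:
        # Crude heuristic: roughly 4 characters per token.
        return sum(len(m["content"]) for m in msgs) // 4

    while chat and est_tokens(system + chat) > max_tokens:
        chat.pop(0)  # drop the oldest message first
    return system + chat
```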

What’s next?

I’m excited to continue enhancing LlamaSim by implementing both long-term and short-term memory storage using mem0.ai, and developing live news feeds to keep our agents updated with real-time information. Looking ahead, I’ll be expanding the framework to dynamically generate demographically aligned agents for any population, instantly, and will be launching a live public demo that anyone can interact with.

Try LlamaSim now on GitHub.

About Jet Wu

The Cerebras Fellows program has been invaluable in providing both technical support and increased rate limits for LlamaSim, allowing me to scale and optimize my project in ways I couldn’t have done alone. Beyond the technical resources, the mentorship and guidance I’ve received have been incredible, helping me navigate challenges and continuously refine my approach.

I’m currently a junior at Carnegie Mellon University studying Statistics & Machine Learning. Lately, I’ve been diving deep into behavioral economics, exploring how human decision-making intersects with data and AI. In my free time, I enjoy learning new things—most recently, trying to master golf (emphasis on trying) and plotting world domination through Catan.