AI agents are among the most exciting advancements in the field of large language models (LLMs). By integrating agentic workflows, these models can now better handle planning, reasoning, and decision-making. These workflows make it possible to build applications that were once considered out of reach. Take, for example, the task of generating marketing copy—a process that involves multiple complex decisions: understanding a product’s value propositions, identifying target markets, selecting the right channels, and tailoring content for each platform. Traditionally, automating such decisions through software alone would have been impossible. However, by developing hybrid systems that combine the reliability of traditional software engineering with the adaptive capabilities of LLMs, we can now automate this process in a way that is both robust and flexible.

In this blog post, we will guide you through the process of building such an AI agent. You’ll learn how to design a system that can handle the decision-making and execution required in marketing copy generation. We’ll also explore how fast inference, provided by the Cerebras Inference SDK, can significantly enhance the performance of these agents, enabling them to deliver results swiftly and efficiently.

This post is primarily aimed at developers looking to build AI agents; however, it also offers valuable insights for product managers who want to understand how these technologies can be leveraged to unlock new possibilities in product development.

Before diving in, let’s outline what we want our workflow for our marketing AI agent to be. It should include:

    • Brainstorming the key value propositions for a given product.
    • Using a search engine, identifying the main markets and audience demographics that the product would appeal to.
    • Determining which marketing channels are appropriate for reaching the target market segments.
    • Creating a strategy for generating content appropriate to each marketing channel.
    • Generating draft copy for each marketing channel based on the strategy.
    • Iterating on the draft copy based on evaluation criteria that are relevant to the corresponding strategy.

Designing the agent

Let’s start by considering the design of our agent. At a high level, we’ll split the agent design into three components:

      1. The workflow that the agent needs to run through.
      2. Utility functions that will be used across the whole workflow.
      3. A plugin model to encode expert knowledge on each kind of copy we’ll want to generate (blog posts, tweets, and so on).

Each of these three components offers a way to make LLM use more robust and flexible.

    • The workflow lets us tune agent behavior across the whole process.
    • The utility functions let us tune agent behavior on each LLM invocation.
    • The plugin model lets us tune agent behavior for each copy type.

Encoding an Expert Workflow

The agent will work through the task in five phases.

    • Phase 1: Brainstorm the key value propositions of any product it’s given.
    • Phase 2: Use the generated value propositions to identify the main markets and audience demographics that the product would appeal to.
    • Phase 3: Identify appropriate marketing channels to reach the identified markets and audiences.
    • Phase 4: Generate a content strategy along with evaluation criteria to check if the generated copy is of high quality.
    • Phase 5: Generate and iterate on marketing copy.

At each phase of the workflow, it’s crucial to ensure that the AI agent approaches the problem with the same structured decision-making used by experts. We can achieve this in LLMs by enforcing structured outputs, which we’ll explore further below.

Some phases require market information from the web—data that a standard LLM API cannot provide. To address this, we’ll leverage Perplexity, an LLM interface that also functions as a search engine. Since we’ll be using multiple LLMs throughout the process, it’s essential to have a common interface that works seamlessly across them. Our utility function for generating structured outputs must be compatible with different LLMs, including Perplexity.

This multi-phase approach involves numerous sequential LLM calls, particularly in Phases 4 and 5, where content is generated and refined. Here, fast inference becomes critical. A high-speed LLM API, like the one offered by Cerebras, can dramatically reduce the agent’s overall execution time. Faster inference allows for more iterations in refining copy, broader exploration of marketing strategies, and quicker adaptation to different channels. By utilizing fast inference, we can build an AI agent that not only adheres to expert workflows but also operates at the speed required for practical, real-world marketing applications.

Structured Outputs

Structured outputs ensure that the LLM’s responses adhere to a predefined specification. For instance, when determining a product’s value proposition, it’s essential to first identify the problems the product addresses. We can encode this expert behavior by requiring the LLM to generate the list of problems before producing the value proposition. In pseudocode, this approach would look like this:

class LLMResponse:
     problem_addressed
     value_proposition

To ensure expert behavior is consistently encoded across all five phases, we’ll need a utility function that can pass structure specifications to the LLM. Ideally, this utility function should have the following form: def query_structured(output_specification, inputs)

Since we require structured output support across different LLMs, we’ll encapsulate this utility function within an LLMEngine class. This approach will automatically enable structured output capabilities for any LLM, providing a unified interface to handle structured data seamlessly across various models.

class LLMEngine:
    def query(prompt):
        """The implementation will be specific to each API."""

    def query_structured(output_specification, inputs):
        """This will convert the output_specification and inputs into prompts, and it will use query to generate the actual output."""

With this, we can create an LLMEngine for Cerebras and for Perplexity, which will allow us to get structured outputs from both.

Copy Plugin Model

Abstractly, we can model each instance of copy as consisting of two main components: content and metadata. The content represents the core information that the audience will receive, while the metadata includes additional data that constrains or shapes the content, either literally or metaphorically. For instance, the content of a blog post should be influenced by its title, which is part of the metadata.
For each type of copy we want to generate, our copy plugin model needs to define how both the metadata and the content are produced. We will achieve this using structured outputs, providing distinct specifications for both the metadata and content.

Ideally, each plugin should be able to specify the following:

class CopyPlugin:
    metadata_specification
    content_specification

When determining the appropriate marketing channels, the agent should check all available CopyPlugins to identify any that are relevant to the task. Additionally, when generating each piece of copy, the agent should reference the relevant CopyPlugin to guide the generation of both metadata and content specific to that copy type.

Overall design

We now have all the components necessary to define our agent’s workflow. The agent will:

    • Iterate through each phase of the workflow.
    • Whenever it needs to invoke the LLM, it will use the utility function for generating structured outputs with whichever LLM is appropriate for the task, whether it’s one hosted by Cerebras or Perplexity.
    • When deciding which kinds of copy to generate, it will check which copy types are supported by the copy plugin.
    • To generate the final copy, it will find the appropriate copy plugin and use that to guide the LLM in copy generation.

Building the Agent

Now that we have the design for our agent, let’s dive into the code.

The code we will be looking at throughout this section is a simplified version of what’s in the repository, and we’ll focus only on the core functionality. Additionally, we won’t use asynchronous API calls in this blog post; please refer to the repository to see the asynchronous version.

Before starting, please ensure that:

Generating structured outputs

We have two types of outputs we need to support: dataclasses, which are typed key-value pairs, and markdown blocks. Since both can be implemented in a similar way, we’ll focus on just dataclass support in this blog. You can find the code for both in base_engine.py.

To create our desired interface for dataclass outputs, our function needs to:

    • Accept a dataclass, along with any inputs that should be passed to the LLM
    • Convert the dataclass and inputs into LLM messages
    • Invoke the LLM with the resulting messages
    • Populate an instance of the dataclass with the response

To convert dataclasses to messages, we’ll generate JSON Schema specifications from them.

JSON Schema is a specification language that lets us indicate what fields a JSON-compatible object should contain. For example, the following dataclass:

from pydantic import BaseModel

class Response(BaseModel):
	header: str
	hashtags: list[str]

would get converted to the following JSON Schema:

{
  "properties": {
    "header": {
      "title": "Header",
      "type": "string"
    },
    "hashtags": {
      "items": {
        "type": "string"
      },
      "title": "Hashtags",
      "type": "array"
    }
  },
  "required": [
    "header",
    "hashtags"
  ],
  "title": "Response",
  "type": "object"
}

JSON Schema works particularly well for our use case since modern LLMs are great at handling JSON Schema.

To convert between dataclasses and JSON Schema, we’ll use the Pydantic library. Pydantic is a popular Python library that supports creating OpenAPI-compatible interfaces. As part of that, Pydantic provides utility functions for converting dataclasses to JSON Schema and converting compliant JSON back to instances of the dataclass.

json_schema = Response.model_json_schema()
response_obj = Response(**{
    "header": "Hello Cerebras",
    "hashtags":["#fastinference", "#llm"]
})

With this setup, we have everything needed to generate prompts for structured outputs. For each task, we’ll generate two types of messages: a system message and a user message. LLMs typically use system messages to establish constraints and guidelines for the output, while user messages specify the task at hand. To ensure robustness in our output generation, we’ll use both message types to clearly indicate to the LLM that structured outputs are required.

from pydantic import BaseModel
import json
from xml.sax.saxutils import escape as xml_escape


def generate_obj_query_messages(response_model: type[BaseModel], prompt_args: dict):
    user_prompt = compile_user_prompt(**prompt_args) + (
        "\n\nReturn the correct JSON response within a ```json codeblock, not the "
        "JSON_SCHEMA. Use only fields specified by the JSON_SCHEMA and nothing else."
    )
    system_prompt = compile_system_prompt(response_model)

    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def compile_system_prompt(response_model: type[BaseModel]):
    schema = response_model.model_json_schema()
    return (
        "Your task is to understand the content and provide "
        "the parsed objects in json that matches the following json_schema:\n\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        "Make sure to return an instance of the JSON, not the schema itself."
    )


def compile_user_prompt(**kwargs):
    prompt_pieces = []
    for key, value in kwargs.items():
        # Serialize each input and escape it so it can be wrapped in XML-style tags.
        value = xml_escape(json.dumps(value))
        prompt_pieces.append(f"<{key}>{value}</{key}>")

    return "\n\n".join(prompt_pieces)

The LLM prompt text in the above code was taken from instructor, a popular library for generating structured outputs.

Once we pass those messages to the LLM and get a response, we can convert the response back into a dataclass like this:

def parse_obj_response(response_model: type[BaseModel], content: str):
    json_start = content.find("```json") + 7
    json_end = content.find("```", json_start)
    obj = json.loads(content[json_start:json_end].strip())
    return response_model(**obj)
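
For reference, the markdown-block variant (used later by copy plugins whose content class is a string such as "md") follows the same pattern. The sketch below uses hypothetical helper names; see base_engine.py in the repository for the actual implementation.

def generate_block_query_messages(block_type: str, prompt_args: dict):
    # Hypothetical counterpart to generate_obj_query_messages: ask the LLM for a
    # single fenced block of the requested type instead of a JSON object.
    user_prompt = compile_user_prompt(**prompt_args) + (
        f"\n\nReturn the response within a ```{block_type} codeblock."
    )
    system_prompt = (
        f"Your task is to provide a response as a single ```{block_type} codeblock. "
        "Do not include anything outside of the codeblock."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def parse_block_response(block_type: str, content: str):
    # Extract the text between the opening and closing fences.
    start = content.find(f"```{block_type}") + len(f"```{block_type}")
    end = content.find("```", start)
    return content[start:end].strip()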

With these functions, we can create the LLMEngine class to add this functionality to any LLM.

from abc import ABC, abstractmethod

class LLMEngine(ABC):
    @abstractmethod
    def query(self, **kwargs):
        pass

    def query_object(self, response_model: type[BaseModel], **prompt_args):
        response = self.query(
            messages=generate_obj_query_messages(response_model, prompt_args)
        )

        return parse_obj_response(response_model, response)

We’ll use this class to add support for structured outputs to both the Cerebras API and the Perplexity API. As an example, here’s how it works for the Cerebras API:

from cerebras.cloud.sdk import Cerebras
from .base_engine import LLMEngine

class CerebrasEngine(LLMEngine):
   def __init__(self, model: str):
       self.model = model
       self.client = Cerebras()

   def query(self, **kwargs):
       response = self.client.chat.completions.create(model=self.model, **kwargs)
       return response.choices[0].message.content
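
The Perplexity engine follows the same pattern. Since Perplexity exposes an OpenAI-compatible chat completions endpoint, a minimal sketch could reuse the OpenAI client pointed at Perplexity’s API; the repository’s implementation may differ in detail.

import os
from openai import OpenAI
from .base_engine import LLMEngine

class PerplexityEngine(LLMEngine):
    def __init__(self, model: str):
        self.model = model
        # Sketch: Perplexity's chat completions API is OpenAI-compatible,
        # so we point the OpenAI client at Perplexity's endpoint.
        self.client = OpenAI(
            api_key=os.environ["PERPLEXITY_API_KEY"],
            base_url="https://api.perplexity.ai",
        )

    def query(self, **kwargs):
        response = self.client.chat.completions.create(model=self.model, **kwargs)
        return response.choices[0].message.content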

Following a Workflow

Since we have a clear workflow in mind, we should manage it using traditional software control flow. By using structured outputs to bridge the gap between traditional software and LLMs, the workflow becomes straightforward and familiar to any developer. The majority of the effort will involve defining the appropriate dataclasses to ensure that the LLM generates outputs in a structured and predictable manner.

Let’s first define the LLMs we’ll need for the full workflow.

reasoning_llm = CerebrasEngine("llama3.1-70b")
search_llm = PerplexityEngine("llama-3.1-sonar-large-128k-online")

In the first phase, we need to generate candidate “angles” to help identify the product’s value propositions, which will then lead into the second phase. As mentioned earlier, we’ll follow an expert-driven approach by having the LLM first consider the problem that the product addresses before generating the actual value proposition. Additionally, we’ll take into account the product’s interface, which is essential for determining the potential users of the product.

When field names might be ambiguous, we can leverage Pydantic and JSON Schema to add descriptions to those fields. We’ll use this feature to indicate to the LLM what “usage” means below.

class ProductAngle(BaseModel):
   problem_addressed: str
   value_proposition: str
   usage: str = Field(
       ...,
       description=(
           "How does someone use the product? Web interface, API, "
           "consulting services, or anything else."
       ),
   )

def generate_copy_for_campaign(product_description: str):
   # Get candidate angles for the product description
   class Response(BaseModel):
       candidates: list[ProductAngle]

   response = reasoning_llm.query_object(
       Response,
       PRODUCT_DESCRIPTION=product_description,
       TASK="List some candidate angles for PRODUCT_DESCRIPTION.",
   )

   product_angles = response.candidates

   # Kick off phase 2
   for angle in product_angles:
       create_copy_for_angle(product_description, angle)

In the second phase, our goal is to identify the markets and audiences that might be interested in the product. Once we have this information, we’ll iterate over the identified markets and audiences to transition into Phase 3, where we will determine the appropriate marketing channels for each segment.

def create_copy_for_angle(product_description: str, angle: ProductAngle):
   # Identify candidate markets and audiences for the value proposition
   known_markets = generate_market_analysis(angle)
   audiences = generate_audience_analysis(angle)


   # Kick off phase 3
   for market in known_markets:
       for audience in audiences:
           create_copy_for_market_audience(
               product_description, angle, market, audience
           )

class Market(BaseModel):
   market_description: str
   example_products: List[str]
   capturable_market_size_dollars: str
   market_growth_yoy: str

def generate_market_analysis(angle: ProductAngle) -> list[Market]:
   # Identify candidate markets for the value proposition
   class Response(BaseModel):
       markets: list[Market]

   response = search_llm.query_object(
       Response,
       VALUE_PROPOSITION=angle.value_proposition,
       USAGE_MODEL=angle.usage,
       TASK=(
           "Using available market research, suggest some markets where "
           "VALUE_PROPOSITION through USAGE would be useful."
       ),
   )

   return response.markets

class Audience(BaseModel):
   description: str
   profile: str
   profile_name: str
   decision_maker: str
   demographics: list[str]

def generate_audience_analysis(angle: ProductAngle) -> list[Audience]:
   # Identify candidate audiences for the value proposition
   class Response(BaseModel):
       audiences: list[Audience]

   response = reasoning_llm.query_object(
       Response,
       PROBLEM_STATEMENT=angle.problem_addressed,
       USAGE=angle.usage,
       TASK=(
           "Suggest some target audiences for the PROBLEM_STATEMENT with the "
           "USAGE model."
       ),
   )

   return response.audiences

In Phase 3, we identify the marketing channels that are best suited for reaching the target market and audience. Then we iterate over each channel to kick off Phase 4.

class Channel(BaseModel):
   name: str
   description: str
   # The CopyFormat contains all supported copy types according to
   # the copy plugin, which we’ll discuss later.
   copy_format: Union[CopyFormat, str]
   pros: List[str]
   cons: List[str]

def create_copy_for_market_audience(
   product_description: str,
   angle: ProductAngle,
   market: Market,
   audience: Audience,
):
   # Identify suitable channels for reaching the audience in the market
   class Response(BaseModel):
       channels: list[Channel]

   response = reasoning_llm.query_object(
       Response,
       VALUE_PROPOSITION=angle.value_proposition,
       AUDIENCE_PROFILE=audience.profile,
       MARKET=market,
       DEMOGRAPHICS=audience.demographics,
       TASK=(
           "Suggest some channels for reaching the DEMOGRAPHICS with "
           "VALUE_PROPOSITION in MARKET. Include various social media and "
           "physical channels where appropriate."
       ),
   )

   # Kick off phase 4
   for channel in response.channels:
       create_copy_for_channel(
           product_description, angle, market, audience, channel
       )

In Phase 4, we develop a specific strategy for each identified marketing channel, along with any criteria we can use to evaluate copy generated for the channel.

class CopyStrategy(BaseModel):
   strategy: str
   product_positioning: str
   competitive_claim: str
   review_criteria: list[str]

def create_copy_for_channel(
   product_description,
   angle: ProductAngle,
   market: Market,
   audience: Audience,
   channel: Channel,
):
   strategy = reasoning_llm.query_object(
       CopyStrategy,
       PROBLEM_STATEMENT=angle.problem_addressed,
       VALUE_PROPOSITION=angle.value_proposition,
       MARKETS=market.model_dump(),
       DEMOGRAPHICS=audience.demographics,
       CHANNEL=channel.name,
       TASK=(
           "Generate a strategy for generating a COPY_FORMAT for the "
           "VALUE_PROPOSITION targeting the DEMOGRAPHICS through the CHANNEL. "
           "Suggest whatever content format is appropriate for the CHANNEL, "
           "and suggest review criteria for making sure COPY_FORMAT is good."
       ),
   )

   # Kick off phase 5
   create_copy_for_strategy(
      product_description, angle, audience, channel, strategy
   )

And finally in Phase 5, we generate the actual copy. We do this by first generating a “draft” of the copy, then iteratively improving it. On each iteration for improving the copy, we use the LLM to generate criticism and suggestions based on the previously-generated evaluation criteria, then use the LLM again to generate an update based on the feedback.

def create_copy_for_strategy(
   product_description: str,
   angle: ProductAngle,
   audience: Audience,
   channel: Channel,
   copy_strategy: CopyStrategy,
):
   copy = Copy(product_description, angle, audience, channel, copy_strategy)
   copy.initialize()
   for _ in range(NUM_REVISIONS):
       copy.improve()

   # Save the copy
   submit_copy(channel, copy.metadata, copy.content)

class Copy:
   def __init__(self, ...):
       # After saving the parameters to self...

       # Retrieve the copy classes based on the channel's copy format
       copy_classes = copy_plugins[
           channel.copy_format
       ]

       self._copy_class_name = copy_classes.name
       self._metadata_class = copy_classes.metadata_class
       self._content_class = copy_classes.content_class

   def initialize(self):
       # Generate the metadata
       metadata = reasoning_llm.query_structured(
           self._metadata_class,
           ..., # Inputs for generating the copy
           TASK=(
               f"Generate a {self._copy_class_name} for the PROBLEM_STATEMENT and "
               "VALUE_PROPOSITION targeting the AUDIENCE_PROFILE with "
               "the DEMOGRAPHICS."
           ),
       )

       # Generate the content
        content = reasoning_llm.query_structured(
           self._content_class,
           ..., # Inputs for generating the copy
           METADATA=metadata.model_dump(),
           TASK=(
               f"Generate a {self._copy_class_name} with METADATA for the PROBLEM_STATEMENT "
               "and VALUE_PROPOSITION targeting the AUDIENCE_PROFILE with the "
               "DEMOGRAPHICS."
           ),
       )

       self.metadata = metadata
       self.content = content

   def improve(self):
       class Evaluation(BaseModel):
           pros: List[str]
           cons: List[str]
           suggestions: List[str]

       evaluation = reasoning_llm.query_object(
           Evaluation,
           PRODUCT=self._product,
           METADATA=self.metadata.model_dump(),
           CONTENT=self.content,
           TASK=(
               f"The METADATA and CONTENT are for a {self._copy_class_name} for Evaluate the "
               "METADATA and CONTENT. List the pros and cons of each. Then suggest "
               "improvements."
           ),
       )

       # Generate the updated metadata
       metadata = reasoning_llm.query_structured(
           self._metadata_class,
           ..., # Inputs for generating the copy
           METADATA=self.metadata.model_dump(),
           EVALUATION=evaluation,
           TASK=(
               f"The METADATA describes a {self._copy_class_name}. Improve the METADATA based on the "
               "EVALUATION."
           ),
       )

       # Generate the updated content
        content = reasoning_llm.query_structured(
            self._content_class,
            ..., # Inputs for generating the copy
            METADATA=metadata.model_dump(),
            CONTENT=self.content,
            EVALUATION=evaluation,
            TASK=(
                f"The CONTENT is for a {self._copy_class_name} with the METADATA. Improve the CONTENT "
                "based on the EVALUATION."
            ),
        )

       self.metadata = metadata
       self.content = content

Since copy generation itself potentially requires a large number of iterations, we can realize large performance gains by taking advantage of faster inference.
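
To make the cost of this sequencing concrete, here is a rough back-of-the-envelope count of the sequential LLM calls implied by the simplified code above; the parameter values plugged in at the end are purely illustrative.

def total_llm_calls(angles, markets, audiences, channels, revisions):
    # Strategy call, plus metadata + content generation, plus three calls
    # (evaluation, metadata, content) per revision.
    per_channel = 1 + 2 + 3 * revisions
    # Channel selection, plus the work for each suggested channel.
    per_market_audience = 1 + channels * per_channel
    # Market and audience analysis, plus the work for each market/audience pair.
    per_angle = 2 + markets * audiences * per_market_audience
    # Initial angle brainstorm, plus the work for each angle.
    return 1 + angles * per_angle

print(total_llm_calls(angles=3, markets=2, audiences=2, channels=3, revisions=3))  # 451

Even a modest campaign quickly adds up to hundreds of sequential calls, so per-call latency dominates the agent’s end-to-end runtime.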

In this implementation, we leveraged the Cerebras API for reasoning tasks and the Perplexity API for search-related operations. The workflow is primarily implemented using traditional software techniques, which allows us to encode expert knowledge and decision-making processes in a robust manner. This approach proves far more reliable than deferring all control flow to the LLM. To bridge our traditional software components and the LLMs, we use structured outputs, which keep the interface between the LLM and the rest of the software clear and consistent. One of the key advantages of this hybrid approach is the ability to specify tasks, inputs, and outputs using natural language. This flexibility enables our workflow to handle the generation of marketing copy for a diverse range of products, making it a versatile tool for various marketing scenarios.

Copy plugins

In a few of the workflow phases above, we made reference to the copy plugins. In total, we needed two things from our copy plugins:

    • A list of supported copy formats, which was used to identify which channels we should target.
    • The metadata and content specifications, which were used by the Copy class to generate and iterate on the copy.

Since both requirements can be met with a simple dictionary, let’s use that.

from dataclasses import dataclass
from typing import Union

from pydantic import BaseModel

@dataclass
class CopyPlugin:
   name: str
   metadata_class: Union[type[BaseModel], str]
   content_class: Union[type[BaseModel], str]

copy_plugins: dict[str, CopyPlugin] = {}

This takes advantage of the fact that our LLM can generate both objects and markdown blocks. If the class specifies a BaseModel, we’ll have it generate an object. If it specifies a string, we’ll have it generate a markdown block of the specified type.
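
On the engine side, the query_structured method used by the Copy class can simply dispatch on the specification type. Here's a rough sketch, where query_block stands in for the hypothetical markdown-block counterpart of query_object sketched earlier:

    def query_structured(self, spec, **prompt_args):
        # Sketch: Pydantic models are parsed into objects; strings such as
        # "md" request a fenced block of that type instead.
        if isinstance(spec, type) and issubclass(spec, BaseModel):
            return self.query_object(spec, **prompt_args)
        return self.query_block(spec, **prompt_args)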

We’ll walk through only a few examples so it’s clear how plugins are created. You can find the full list in the repo.

Here’s the specification for LinkedIn posts. The content class here takes advantage of the fact that the LLM can generate markdown using the markdown type “md”.

class LinkedInPost(BaseModel):
   title: str
   hashtags: List[str]
   mentions: List[str]
   image_description: Optional[str]

copy_plugins["LINKEDIN_POST"] = CopyPlugin(
   name="LinkedIn Post", metadata_class=LinkedInPost, content_class="md"
)

And for Twitter threads:

class Tweets(BaseModel):
   tweets: List[str]

class TwitterThread(BaseModel):
   image_description: Optional[str] = Field(
       ..., description="Describe a catchy image to include in the thread."
   )
   hashtags: List[str]
   mentions: List[str]

copy_plugins["TWITTER_THREAD"] = CopyPlugin(
   name="Twitter Thread", metadata_class=TwitterThread, content_class=Tweets
)
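
As an aside, the CopyFormat type referenced in the Channel class earlier can be derived from the keys of this registry once the plugins are registered. One possible sketch; the repository may define it differently.

from enum import Enum

# Build an enum of supported copy formats from the plugin registry, so the
# LLM can only pick formats we actually know how to generate.
CopyFormat = Enum("CopyFormat", {key: key for key in copy_plugins})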

This structure gives us the flexibility we need to generate content for a wide variety of marketing channels while still allowing us to guide the LLM with hints to generate high-quality results.

Conclusion

In this blog post, we’ve walked through the process of building an AI agent for marketing tasks, and learned about how to orchestrate an agentic workflow with multiple LLMs.

Throughout this project, the benefits of fast inference solutions, such as Cerebras’, have been evident. In our marketing agent’s workflow, which involves multiple sequential LLM calls and iterative refinement processes, the speed of inference directly impacts the agent’s effectiveness. Faster inference allows for more comprehensive brainstorming, quicker market analysis, and crucially, more rounds of copy improvement. This translates to higher quality outputs and greater adaptability in real-world marketing scenarios, where time constraints often apply. As AI agents become more complex and are applied to time-sensitive tasks like marketing, the importance of fast, reliable inference will only grow.

Here are a few more important takeaways:

    • While we’ve demonstrated how to construct applications on top of LLMs, it’s important to note that this implementation is not production-ready. A comprehensive solution would require more extensive information beyond a brief product description.
    • Our marketing agent is designed around a hybrid approach, blending traditional software engineering with the capabilities of LLMs. By using conventional software control flow and structured outputs, we’ve encoded expert knowledge into the system. This approach allows us to leverage LLMs for a wide range of tasks while maintaining a structured and predictable workflow.
    • While we’ve covered the core components of each part of the system in this blog post, the actual repository implements everything asynchronously and expands on areas we’ve only partially explored here.

For developers, we hope this blog post has provided insights into creating your own AI agents, showcasing how to combine traditional software engineering practices with the power of LLMs. Similarly, we aim for product managers to gain a deeper understanding of how LLMs can unlock new possibilities in the product development process.

This project is just the beginning of exploring the potential of AI agents in various domains. Stay tuned for more blog posts as we continue to delve into the exciting world of AI-assisted software development and product creation.