OpenCall gives businesses an enterprise-grade AI contact center, helping them handle complex phone workflows – like appointment booking or identity verification – for their customers. By integrating OpenCall, businesses can answer customer calls 24/7, triage callers, route them appropriately, and even process payments.

Challenges

To provide a seamless and reliable customer experience, OpenCall’s AI systems must process and respond to calls in real time. This requires handling complex workflows in which the LLM behind the scenes runs a growing “stack” of tool calls and subprocesses.

When a user connects (inbound or outbound), OpenCall kicks off with a standard greeting. From there, it enters the primary inference loop where a lightweight model analyzes speech patterns to optimize response timing. 
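The loop described above can be sketched roughly as an end-of-turn detector: a cheap scoring function decides, from the transcript and the length of the current pause, whether the caller has finished speaking and a response should be generated. The function names, scoring heuristic, and threshold below are illustrative assumptions, not OpenCall's actual model.

```python
# Hypothetical sketch of a turn-timing check: a lightweight scorer estimates
# whether the caller has finished their turn, so the main model only responds
# at natural pauses. Heuristic and threshold are illustrative assumptions.

END_OF_TURN_THRESHOLD = 0.8

def end_of_turn_score(transcript: str, silence_ms: int) -> float:
    """Toy stand-in for a lightweight speech-pattern model: longer silence
    after a complete-sounding utterance means the turn is likely over."""
    score = min(silence_ms / 1000.0, 1.0) * 0.7
    if transcript.rstrip().endswith((".", "?", "!")):
        score += 0.3
    return min(score, 1.0)

def should_respond(transcript: str, silence_ms: int) -> bool:
    """True when the system should stop listening and generate a reply."""
    return end_of_turn_score(transcript, silence_ms) >= END_OF_TURN_THRESHOLD
```

In a real system the scorer would be a small trained model rather than a heuristic, but the shape of the loop – score every audio frame, respond only past a confidence threshold – stays the same.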

There are three critical decision points: human transfer triggers, context switching detection, and conversation flow management. These decision points determine when to reset workflows for topic changes or transition to human support when needed.
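One way to picture these three decision points is as a router evaluated on every caller turn. The sketch below uses keyword matching purely for illustration – a production system would classify with a model – and all names are assumptions rather than OpenCall's API.

```python
from enum import Enum, auto

class Action(Enum):
    TRANSFER_TO_HUMAN = auto()  # human transfer trigger fired
    RESET_WORKFLOW = auto()     # context switch detected; restart the flow
    CONTINUE = auto()           # normal conversation flow management

# Illustrative trigger phrases; a real system would use model classification.
TRANSFER_PHRASES = ("speak to a person", "talk to a human", "agent please")
TOPIC_SHIFT_PHRASES = ("actually", "never mind", "different question")

def route(utterance: str) -> Action:
    """Check the two interrupting decision points before continuing the flow."""
    text = utterance.lower()
    if any(p in text for p in TRANSFER_PHRASES):
        return Action.TRANSFER_TO_HUMAN
    if any(p in text for p in TOPIC_SHIFT_PHRASES):
        return Action.RESET_WORKFLOW
    return Action.CONTINUE
```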

One level up sits the state machine architecture. Rather than one monolithic process, specialized states are chained together to achieve extraordinary reliability and behavioral specificity.

For example, sending a simple confirmation message (“Thank you for being one of our valued patients”) actually runs multiple inference chains.
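A minimal skeleton of that chaining might look like the following: each state does one narrow, checkable job (draft the message, verify it, send it) and then names its successor. The state names, checks, and context fields are assumptions for illustration, not OpenCall's internal architecture.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class CallContext:
    patient_name: str
    messages: List[str] = field(default_factory=list)

def draft_confirmation(ctx: CallContext) -> Optional[str]:
    # In production this step would be its own inference call.
    ctx.messages.append(
        f"Thank you for being one of our valued patients, {ctx.patient_name}."
    )
    return "verify_tone"

def verify_tone(ctx: CallContext) -> Optional[str]:
    # A second, independent check of the draft before anything is sent.
    if not ctx.messages[-1].endswith("."):
        return "draft"  # failed the check: loop back and re-draft
    return "send"

def send(ctx: CallContext) -> Optional[str]:
    return None  # terminal state: the message leaves the system here

STATES: Dict[str, Callable[[CallContext], Optional[str]]] = {
    "draft": draft_confirmation,
    "verify_tone": verify_tone,
    "send": send,
}

def run(ctx: CallContext, start: str = "draft") -> CallContext:
    """Drive the chain until a state returns None (terminal)."""
    state: Optional[str] = start
    while state is not None:
        state = STATES[state](ctx)
    return ctx
```

The point of the pattern is that each hop in the chain can be validated in isolation, which is why even a one-line confirmation message can fan out into several inference calls.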

Running these extensive processes requires many calls to the LLM inference endpoint. If that endpoint is slow or unreliable, one response may arrive instantly while the next takes an eternity, giving callers a wildly inconsistent experience. Optimizing for speed and reliability is therefore core to OpenCall’s architecture.

Solution

OpenCall achieved significant improvements to speed and reliability after switching to Cerebras’ LLM inference API.

While OpenCall could technically perform tasks like collecting and formatting a street address in one shot, breaking the task into atomic operations (street, city, state, etc.) provides bulletproof reliability. Before Cerebras, this kind of granular processing would have meant long processing times, but now OpenCall can run extensive validation chains without users even noticing the complexity.
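The value of atomic collection is that a single bad field can be re-asked without discarding the rest. A minimal sketch, assuming US-style address fields and simple regex validation (field names and patterns are illustrative, not OpenCall's):

```python
import re

# Each address component is validated independently, so one failure only
# triggers a re-ask for that field. Patterns here are simplified assumptions.
US_STATE = re.compile(r"^[A-Z]{2}$")
ZIP_CODE = re.compile(r"^\d{5}(-\d{4})?$")

def validate_field(name: str, value: str) -> bool:
    value = value.strip()
    if name == "state":
        return bool(US_STATE.match(value.upper()))
    if name == "zip":
        return bool(ZIP_CODE.match(value))
    return bool(value)  # street/city: non-empty after trimming

def collect_address(raw: dict) -> tuple:
    """Return (validated fields, fields that must be asked again)."""
    ok, retry = {}, []
    for name in ("street", "city", "state", "zip"):
        value = raw.get(name, "")
        if validate_field(name, value):
            ok[name] = value.strip()
        else:
            retry.append(name)
    return ok, retry
```

Run in one shot, a malformed ZIP code would force re-extracting the whole address; run atomically, only the ZIP field is re-asked.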

With Cerebras, OpenCall reduced overall latency by a staggering 90%, enabling customers to get faster answers and a more reliable experience every time they call in. Resolutions which once took minutes now happen in seconds. This all ultimately contributed to customer satisfaction scores jumping by roughly 25%, from 7.5 to 9.4.

With reliable and fast inference from Cerebras, OpenCall continues to redefine what’s possible in the world of customer service. Whether it’s booking an appointment, rescheduling a meeting, or answering questions, OpenCall makes intelligent, real-time support effortless for businesses and their customers alike.

Experience Cerebras’ super-fast inference speeds at cloud.cerebras.ai.