Workflows At Scale: How Conversion Powers Marketing Automation

Workflows are the new chat: they are changing how enterprise teams interact with AI and unlock advanced reasoning. At Conversion, we are building the infrastructure that lets marketers build and run these workflows at massive scale: millions of contacts and hundreds of automations, with no room for error. It’s a hard problem with deep technical challenges, and we want to share how we’re tackling it.

This feature has three main components:

The UI workflow builder
Workflow triggering
Workflow execution

The UI builder and workflow triggering deserve their own blog posts, so here we focus only on the third component: actually running a workflow for a contact.

The Problem Statement

Each Conversion workflow that you see in the UI is stored as a list of nodes and edges, so at its core, executing a workflow means walking the DAG (directed acyclic graph) and executing each node in turn.

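import "go.temporal.io/sdk/workflow"

// RunAutomation walks the workflow graph for one contact, starting at nodeId.
func RunAutomation(ctx workflow.Context, nodeId, contactId string) error {
	// Assumption: ExecuteNode returns an empty ID after the last node.
	for nodeId != "" {
		// Fetch the node
		node := fetchNode(nodeId)

		// Execute the node, which returns the ID of the next node to run
		next, err := ExecuteNode(ctx, contactId, node)
		if err != nil {
			return err
		}
		nodeId = next
	}
	return nil
}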

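To make "a list of nodes and edges" concrete, here is a minimal, standard-library-only sketch of how such a graph could be stored and walked. The `Node`, `Edge`, `Workflow`, and `Walk` names, and the `Branch` label on edges, are assumptions made for illustration and are not Conversion's actual schema.

```go
package main

import "fmt"

// Node is a hypothetical workflow step; Kind might be "send_email" or "wait".
type Node struct {
	ID   string
	Kind string
}

// Edge links two nodes. Branch is a hypothetical label ("", "yes", "no")
// that selects which outgoing edge to follow after a node runs.
type Edge struct {
	From, To, Branch string
}

// Workflow is the stored form: a flat list of nodes and edges.
type Workflow struct {
	Nodes []Node
	Edges []Edge
}

// Next returns the node reached from `from` along the edge labelled `branch`,
// or "" when no such edge exists (the end of the workflow).
func (w Workflow) Next(from, branch string) string {
	for _, e := range w.Edges {
		if e.From == from && e.Branch == branch {
			return e.To
		}
	}
	return ""
}

// Walk starts at `start`, asks `decide` which branch each node took, and
// returns the IDs of the visited nodes in order.
func Walk(w Workflow, start string, decide func(nodeID string) string) []string {
	var visited []string
	for id := start; id != ""; id = w.Next(id, decide(id)) {
		visited = append(visited, id)
	}
	return visited
}

// exampleWorkflow builds a tiny graph: trigger -> opened -> followup | reminder.
func exampleWorkflow() Workflow {
	return Workflow{
		Nodes: []Node{
			{ID: "trigger", Kind: "trigger"},
			{ID: "opened", Kind: "condition"},
			{ID: "followup", Kind: "send_email"},
			{ID: "reminder", Kind: "send_email"},
		},
		Edges: []Edge{
			{From: "trigger", To: "opened"},
			{From: "opened", To: "followup", Branch: "yes"},
			{From: "opened", To: "reminder", Branch: "no"},
		},
	}
}

// branchFor pretends every contact opened the email.
func branchFor(nodeID string) string {
	if nodeID == "opened" {
		return "yes"
	}
	return ""
}

func main() {
	fmt.Println(Walk(exampleWorkflow(), "trigger", branchFor))
}
```

In this sketch the walk ends when a node has no matching outgoing edge. The hard part, covered next, is that a real node can take months to "execute".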

Three functional requirements make executing the nodes challenging:

Users can interact with running workflows, pausing them and updating the configuration of each node.
The “wait” node requires that an execution be able to run for weeks or even months.
The “wait for” node requires that an execution be able to wait indefinitely, repeatedly checking a condition and continuing only once it is met.

Your first instinct might be a job system: a cron job and some microservices to maintain state while Kubernetes or Airflow jobs run. In reality, the first requirement already rules out any job-based solution.

With short-lived jobs, it is fine for the code to be pulled once when the job starts, effectively taking a snapshot of the binary. With long-lived, user-facing jobs like these, we cannot limit the workflow builder to the code that existed when an execution first started. This revealed an underlying DevOps requirement: a job’s code needs to be able to change while the job is running.

Introducing: Temporal

Luckily, some of the biggest tech companies have already tackled this problem, and most recently so has Temporal, an open-source workflow engine.

The key idea behind Temporal is to abstract away the concept of a long-lived job so that you can change the code underneath without breaking ongoing work. Temporal does this by breaking a long job into a series of smaller, independent steps called “activities” (things like API calls, database writes, etc.) and separating them from the surrounding workflow logic that decides what happens next.

Here’s the magic part: Temporal records the results of each activity. So, if your workflow is paused — maybe waiting days, weeks, or even months — you can “resume” it simply by replaying the logic. Temporal re-runs your workflow code from the beginning, but instead of actually calling the APIs again, it just returns the same results from before.

Imagine you have a workflow that runs Activity A, then B, then C, and then waits five days. When you come back later, Temporal replays A → B → C instantly using the stored results, skips the waiting period, and picks up right where it left off.

This makes Temporal agnostic to the underlying binary: it doesn’t care about your code version as long as the timeline of inputs and outputs stays consistent. As long as your workflow is deterministic (same inputs → same outputs → same decisions), the replay will be identical. This is exactly what our DevOps requirement needs, and something a cron job simply couldn’t provide.
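
The replay idea can be illustrated with a toy, standard-library-only sketch. It does not use the Temporal SDK, and Temporal's real event history is far richer (typed events, timers, signals). The `History` type and its `Execute` and `Rewind` methods are hypothetical names invented for this example.

```go
package main

import "fmt"

// History stores activity results in the order they were first produced.
type History struct {
	results []string
	cursor  int // position of the next recorded result during a replay
}

// Execute returns the recorded result when one exists (replay); otherwise it
// runs the activity for real and appends its result to the history.
func (h *History) Execute(activity func() string) string {
	if h.cursor < len(h.results) {
		r := h.results[h.cursor]
		h.cursor++
		return r
	}
	r := activity()
	h.results = append(h.results, r)
	h.cursor++
	return r
}

// Rewind restarts the workflow code from the top, as a replay does.
func (h *History) Rewind() { h.cursor = 0 }

// calls counts real side effects so we can see what a replay skips.
var calls int

// activity returns a fake side-effecting step that reports "<name>-done".
func activity(name string) func() string {
	return func() string {
		calls++
		return name + "-done"
	}
}

// workflowCode is deterministic: given the same recorded results it makes
// the same decisions in the same order.
func workflowCode(h *History, steps ...string) []string {
	var out []string
	for _, s := range steps {
		out = append(out, h.Execute(activity(s)))
	}
	return out
}

func main() {
	h := &History{}
	workflowCode(h, "A", "B", "C") // first run: three real calls

	h.Rewind()
	// Replay: A, B, and C come from the history; only D runs for real.
	out := workflowCode(h, "A", "B", "C", "D")
	fmt.Println(out, "real calls:", calls)
}
```

After the rewind, the first three steps return instantly from the recorded results, and the real-call counter only increases for the new step D.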

Temporal also comes with a built-in signaling system, which makes it easy to interact with running workflows. We can signal all running Temporal workflows that the user wishes to pause execution, or that the user updated a node's data. This solves our first functional requirement.

Temporal timers cover the second functional requirement: they let a workflow sleep for an arbitrary amount of time, and when the timer fires, Temporal replays the event history described above. To make the timer pausable, we wrapped it with the signal system. The resulting interface is shown below; it lets our timer handle UI state changes.

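// PausableTimer wraps a Temporal timer for one timer node so that it can
// react to UI changes while the workflow is waiting.
type PausableTimer[T NodeData] interface {
	// Update applies a node configuration edited by the user.
	Update(ctx workflow.Context, node TypedNode[T])
	// Pause and Resume stop and restart the wait.
	Pause(ctx workflow.Context)
	Resume(ctx workflow.Context)
	// IsComplete reports whether the wait has finished; Complete ends it.
	IsComplete() bool
	Complete(ctx workflow.Context)
	// Listen registers the timer's handlers on the given selector.
	Listen(ctx workflow.Context, selector workflow.Selector)
}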

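The bookkeeping behind a pausable timer can be sketched without Temporal at all. The version below is a simplified, hypothetical state machine, not our production implementation: it assumes that paused time should not count toward the wait, and it takes the current time as a parameter instead of using Temporal timers and signals. The `pausableTimer` type and `scenario` helper are names invented for this example.

```go
package main

import (
	"fmt"
	"time"
)

// pausableTimer counts only the time spent running, so pauses do not eat
// into the wait. Callers pass the current time in, which keeps it deterministic.
type pausableTimer struct {
	total     time.Duration // configured wait; can be edited mid-flight
	elapsed   time.Duration // running time banked before the current stretch
	resumedAt time.Time     // when the current running stretch began
	paused    bool
}

func newPausableTimer(total time.Duration, now time.Time) *pausableTimer {
	return &pausableTimer{total: total, resumedAt: now}
}

// waited returns the running time accumulated as of now.
func (t *pausableTimer) waited(now time.Time) time.Duration {
	if t.paused {
		return t.elapsed
	}
	return t.elapsed + now.Sub(t.resumedAt)
}

// Pause banks the time waited so far and stops the clock.
func (t *pausableTimer) Pause(now time.Time) {
	if !t.paused {
		t.elapsed = t.waited(now)
		t.paused = true
	}
}

// Resume restarts the clock from now.
func (t *pausableTimer) Resume(now time.Time) {
	if t.paused {
		t.resumedAt = now
		t.paused = false
	}
}

// Update applies an edited wait duration without losing banked time.
func (t *pausableTimer) Update(total time.Duration) { t.total = total }

// IsComplete reports whether the configured wait has been served.
func (t *pausableTimer) IsComplete(now time.Time) bool {
	return t.waited(now) >= t.total
}

// scenario: a 5-day wait is paused after 2 days, resumed 3 days later,
// checked 2 days after that, then shortened to 4 days by the user.
func scenario() (whilePaused, afterResume, afterUpdate bool) {
	day := 24 * time.Hour
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	t := newPausableTimer(5*day, start)

	t.Pause(start.Add(2 * day))
	whilePaused = t.IsComplete(start.Add(5 * day)) // only 2 days banked
	t.Resume(start.Add(5 * day))
	afterResume = t.IsComplete(start.Add(7 * day)) // 2 + 2 = 4 days < 5
	t.Update(4 * day)
	afterUpdate = t.IsComplete(start.Add(7 * day)) // 4 days >= 4
	return whilePaused, afterResume, afterUpdate
}

func main() {
	fmt.Println(scenario())
}
```

Keeping the clock outside the timer is a deliberate choice in this sketch: workflow code has to be deterministic, so anything time-dependent needs to come from a source that can be replayed.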

With this in place, the different timer nodes can handle a user updating the node, pausing the workflow, resuming it, and so on. Both users and developers can change how a node behaves while it is running.

The final functional requirement also leverages the signaling system. Workflows can include logic such as “wait until a contact’s record is updated” and must resume once that change occurs. To support this, we implemented an event notification mechanism that alerts all relevant workflows whenever relevant contact activity takes place. When notified, each workflow wakes up and evaluates whether its continuation condition is met; if it is not, the workflow returns to a waiting state until the next notification.

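The wake, check, and wait-again loop can be sketched with a plain Go channel standing in for the notifications. This is an illustration under stated assumptions rather than our actual code: the `Contact` type and `waitFor` function are hypothetical, and a real workflow would receive Temporal signals instead of reading a channel.

```go
package main

import "fmt"

// Contact is a hypothetical snapshot of a contact record.
type Contact struct {
	Email string
	Plan  string
}

// waitFor wakes up on every notification and re-evaluates cond. It returns
// the contact that satisfied the condition and how many wake-ups it took.
// ok is false if the notifications ended before the condition was met.
func waitFor(events <-chan Contact, cond func(Contact) bool) (c Contact, wakeups int, ok bool) {
	for c = range events {
		wakeups++
		if cond(c) {
			return c, wakeups, true
		}
		// Condition not met yet: go back to waiting for the next event.
	}
	return Contact{}, wakeups, false
}

func main() {
	events := make(chan Contact, 3)
	events <- Contact{Email: "a@example.com", Plan: "free"}
	events <- Contact{Email: "a@example.com", Plan: "trial"}
	events <- Contact{Email: "a@example.com", Plan: "paid"}
	close(events)

	isPaid := func(c Contact) bool { return c.Plan == "paid" }
	c, wakeups, ok := waitFor(events, isPaid)
	fmt.Println(c.Plan, wakeups, ok)
}
```

Every notification costs a wake-up even when the condition is not met, which is why sending fewer, better-targeted notifications matters at scale.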

Hosting

Temporal is written in Go and has a nice SDK that plays well with our backend. We host our own instance backed by a Postgres database, which has been easy to manage with the open-source Helm chart and has given us exactly the scale we require.

Temporal has many limits and guardrails that are easy to miss, but once we found these gotchas we have been very happy with the service overall. We deployed Elasticsearch with custom search attributes for easy debugging and exposed the UI behind Google authentication so our backend team can access it easily. A key decision was to deploy a separate worker for each task queue, so that if one task queue gets bogged down, the workers for the other queues keep processing.

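The isolation argument can be shown with a small analogy in plain Go, where each queue is a channel with its own dedicated goroutine. This is only a sketch of the idea, not how Temporal workers are deployed: `startWorker` and `run` are hypothetical names, and the "slow" queue is stalled on purpose.

```go
package main

import (
	"fmt"
	"sync"
)

// startWorker runs one dedicated goroutine that drains a single queue.
func startWorker(queue <-chan func(), wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for task := range queue {
			task()
		}
	}()
}

// run stalls the "slow" queue and reports what finished in the meantime.
func run() []string {
	var (
		mu       sync.Mutex
		finished []string
	)
	record := func(name string) {
		mu.Lock()
		defer mu.Unlock()
		finished = append(finished, name)
	}

	slow := make(chan func(), 1)
	fast := make(chan func(), 2)
	var slowWG, fastWG sync.WaitGroup
	startWorker(slow, &slowWG)
	startWorker(fast, &fastWG)

	unblock := make(chan struct{})
	slow <- func() { <-unblock; record("slow-1") } // a bogged-down task
	fast <- func() { record("fast-1") }
	fast <- func() { record("fast-2") }
	close(fast)
	fastWG.Wait() // the fast queue drains while the slow one is still stuck

	mu.Lock()
	snapshot := append([]string(nil), finished...)
	mu.Unlock()

	close(unblock) // let the slow task finish so the program exits cleanly
	close(slow)
	slowWG.Wait()
	return snapshot
}

func main() {
	fmt.Println(run())
}
```

Had both queues shared a single worker, the stalled task could have held up the fast tasks queued behind it.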

Drawbacks and Challenges

Temporal (and similar workflow engines) solves one of the hardest problems in distributed systems, reliably managing long-lived processes, but it comes with its own set of challenges. The biggest hurdle is enforcing backward compatibility across workflow versions. A single non-deterministic code change or incompatible deployment could break every running workflow in production. We’ve built a rigorous history replay testing system to constantly sample and replay current histories to catch these issues early, but this area will demand even more infrastructure investment as we scale.
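
To show what a replay test is checking, here is a toy, standard-library-only sketch. It is not our testing system and does not use the Temporal SDK; the `Event` type, the `Replay` function, and both workflow versions are hypothetical. The idea is to run new workflow code against a recorded history and fail at the first step where the code asks for something the history did not record.

```go
package main

import "fmt"

// Event is one recorded step: which activity ran and what it returned.
type Event struct {
	Activity string
	Result   string
}

// Replay runs workflow code against a recorded history. It returns an error
// at the first step where the code asks for a different activity than the
// one recorded, which is the kind of change that breaks in-flight runs.
func Replay(history []Event, workflow func(call func(string) string)) error {
	var diverged error
	pos := 0
	call := func(activity string) string {
		if diverged != nil || pos >= len(history) {
			return "" // already diverged, or past the recorded part
		}
		if recorded := history[pos].Activity; recorded != activity {
			diverged = fmt.Errorf("step %d: history has %q, code called %q", pos, recorded, activity)
			return ""
		}
		pos++
		return history[pos-1].Result
	}
	workflow(call)
	return diverged
}

// sampleHistory was "recorded" by workflowV1 for one contact.
var sampleHistory = []Event{
	{Activity: "load_contact", Result: "ok"},
	{Activity: "check_opt_in", Result: "yes"},
	{Activity: "send_email", Result: "sent"},
}

func workflowV1(call func(string) string) {
	call("load_contact")
	if call("check_opt_in") == "yes" {
		call("send_email")
	}
}

// workflowV2 inserts a new step in the middle, which no longer matches
// histories recorded by workflowV1.
func workflowV2(call func(string) string) {
	call("load_contact")
	call("score_lead")
	if call("check_opt_in") == "yes" {
		call("send_email")
	}
}

func main() {
	fmt.Println("v1:", Replay(sampleHistory, workflowV1))
	fmt.Println("v2:", Replay(sampleHistory, workflowV2))
}
```

Replaying the old version reports no error, while the version with the inserted step is flagged at the point where it departs from the history. Running this kind of check over sampled production histories before a deploy is what catches incompatible changes early.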

Another pain point is Temporal’s signal system, which isn’t designed for the high-throughput signaling we need. Signal rate limits make large-scale fan-out difficult, and the batch signal API proved too slow and limited in parallelism. To work around this, we use Kafka to stream signals efficiently, both for workflow triggers and CRM event propagation, while caching workflow configurations to determine when a signal actually needs to be sent. This ensures we stay accurate without overloading the system.
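
The filtering step can be sketched as an in-memory index from "what changed" to "who is waiting on it". This is a simplified illustration under stated assumptions: the `CRMEvent`, `Subscription`, and `Router` types are hypothetical, the cache is a plain map, and the Kafka consumer and the actual signal call are left out.

```go
package main

import "fmt"

// CRMEvent is a hypothetical change notification, e.g. consumed from a stream.
type CRMEvent struct {
	ContactID string
	Field     string
}

// Subscription is a hypothetical cache entry: a waiting workflow run and the
// contact field its "wait for" condition depends on.
type Subscription struct {
	WorkflowID string
	ContactID  string
	Field      string
}

// watchKey identifies one (contact, field) pair in the index.
type watchKey struct {
	contactID string
	field     string
}

// Router indexes cached subscriptions so each event maps to its listeners.
type Router struct {
	index map[watchKey][]string
}

// NewRouter builds the index from the cached subscriptions.
func NewRouter(subs []Subscription) *Router {
	r := &Router{index: make(map[watchKey][]string)}
	for _, s := range subs {
		k := watchKey{contactID: s.ContactID, field: s.Field}
		r.index[k] = append(r.index[k], s.WorkflowID)
	}
	return r
}

// Targets returns the workflow IDs that need a signal for ev, in cache order.
// An empty result means the event can be dropped without signaling anyone.
func (r *Router) Targets(ev CRMEvent) []string {
	return r.index[watchKey{contactID: ev.ContactID, field: ev.Field}]
}

// exampleRouter caches four waiting workflow runs across two contacts.
func exampleRouter() *Router {
	return NewRouter([]Subscription{
		{WorkflowID: "wf-1", ContactID: "c1", Field: "plan"},
		{WorkflowID: "wf-2", ContactID: "c1", Field: "email"},
		{WorkflowID: "wf-3", ContactID: "c2", Field: "plan"},
		{WorkflowID: "wf-4", ContactID: "c1", Field: "plan"},
	})
}

func main() {
	r := exampleRouter()
	fmt.Println(r.Targets(CRMEvent{ContactID: "c1", Field: "plan"}))
	fmt.Println(len(r.Targets(CRMEvent{ContactID: "c1", Field: "phone"})))
}
```

In this sketch, a change to a field nobody is waiting on produces no targets, so no signal is sent at all.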

Looking ahead, we’ll face new scaling challenges: managing more task queues, namespaces, and workers will require full infrastructure-as-code automation. We’ll also need reliable recovery systems for failed workflows, and worker versioning to reduce risk during large updates. Most of our future optimization will happen at the event-sending layer: making signaling faster, more selective, and more reliable so that workflows start and progress exactly when and how they should.

If any of these challenges sound interesting to you, Conversion is hiring! We’re constantly working to build new features and to improve reliability and scalability. Please reach out if you’d like to join us on the mission of building the future of marketing.

Tayler Dunn

Founding Engineer

October 9, 2025
