adjoe Engineers’ Blog
Breaking the Scale Barrier: Preparing Our Infrastructure for Super Bowl Traffic

When millions of users surge during a Super Bowl ad, autoscaling alone isn’t enough. Your infrastructure must be resilient and scalable to handle massive traffic spikes.

This is the story of how adjoe’s architecture went from 1,500 requests per second (RPS) to 100,000 and how we ensured our backend would live to tell the tale. 

Last year, one of our publishers reached out to us with an interesting challenge. They would be running an ad during the Super Bowl, the largest sporting event in the US, and our backend would need to be able to handle that amount of traffic.  

In this post, we’ll share how we scaled adjoe’s AWS infrastructure to handle 100k RPS and the lessons you can apply to prepare for high-traffic events.

How Big Is Super Bowl Traffic?

Let’s talk numbers. The Super Bowl in 2025 had 128 million viewers. Our app publisher expected their ad campaign to attract around 10 million users and planned to funnel them onto our backend.

That translates to roughly 100,000 RPS we would need to handle continuously for several minutes.  

To put those numbers into perspective, we also need to look at how many requests we handle on a normal day. That tells us whether this is just another day on the job or whether it’s going to hit us like a truck.

Normal Traffic vs. 100x Super Bowl Traffic


As you can see, on a “business as usual” day, we handle somewhere from 1,400 to 1,600 RPS. 

During the Super Bowl event, we would need to handle 100,000 RPS; that’s not a 2x or 5x increase, it’s nearly 100x our normal traffic. We seriously needed to start preparing our systems immediately.  

All of our services can generally scale out automatically when traffic starts to increase, but this mechanism only works for a gradual ramp-up, where the system has time to adapt.

In this case, our systems would be down, and the event would be a failure before auto-scaling would even start to kick in. We needed to manually scale up our systems beforehand to be ready to handle the traffic. 

Which leads to the question: how do we know that we are ready?

The Preparations Begin

We had a little over three months before the Super Bowl would take place, and by then, we had to be ready to handle 100x our usual RPS. We already figured that we would not be able to rely on auto-scaling to handle this. We had to provision the resources we would need beforehand. 

Now the question is: what resources do we need? There was only one way to know for sure.

Testing: Simulate Super Bowl Traffic Using k6 and Grafana 

Being confident that our application could handle this amount of traffic is great, but we needed to be sure and to prove to our publisher that we would not falter during this event. 

We decided to use k6, a load-testing tool that simulates traffic from many concurrent users while being relatively resource-efficient. You can generate a surprising amount of traffic from just one person’s laptop, but we still hit that limit during our Super Bowl preparation.

That’s where the k6 operator came in handy; it allowed us to run multiple instances of k6 on our Kubernetes cluster. With this setup, there was basically no limit to how much traffic we could generate. 

We also hooked the operator up to Grafana and created a neat dashboard to visualize how our application performs during the load tests. It immediately shows us if response times start to climb or error rates increase.


With this setup, we could enter a stage of rapid iteration: run a test, see which part of our infrastructure fails to handle it, fix the bottleneck, and run the next test.

Over the three months before the event, we went through countless iterations of these tests, and the test-fix-retest cycle became our daily routine.

All that was to ensure that we had found every last potential bottleneck before the real event.


In doing so, we identified the following four components as the main bottlenecks we needed to improve on.

  1. Load Balancing – ALB
  2. Compute Resources – EC2
  3. Message Bus – SNS / SQS
  4. Data Storage – DynamoDB

Let’s take a deeper look at the actions we had to take for each component to have them ready for the event.

1. Load Balancing the AWS Load Balancer

For the most part, the Application Load Balancer (ALB) has been a component of our system that “just worked”. But when we started running our load tests, we noticed that not only could our backend services be overwhelmed, but so could the load balancer itself.

To understand why, it helps to know how AWS measures ALB capacity. 

AWS uses something called Load Balancer Capacity Units (LCUs), which are calculated across four dimensions: new connections per second, active connections per minute, processed bytes per hour, and rule evaluations per second. 

Under normal traffic, a single ALB comfortably handles our load, but at 100K RPS, the LCU requirements skyrocket. And just like every other AWS resource, ALBs scale behind the scenes and cannot keep up with an instant traffic spike.

Which raises the question: how do you load balance the load balancer?

The answer is to do it on the DNS level. Users trying to reach our backend first reach out to Route 53 to resolve our domain name to the IP address of a load balancer. 


What we can do is configure Route 53 with weighted routing to answer with different IP addresses that belong to different load balancers, splitting the traffic between them. 

This is usually referred to as load balancer sharding.
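As a sketch of this DNS-level sharding, here is how the weighted record sets could be built with boto3. The domain, ALB DNS names, and hosted zone IDs below are hypothetical placeholders, not our real configuration:

```python
def weighted_alias_changes(record_name, albs, weight=1):
    """Build a Route 53 change batch that splits traffic evenly across ALBs.

    `albs` is a list of (set_identifier, alb_dns_name, alb_hosted_zone_id)
    tuples; equal weights mean each ALB shard receives roughly 1/len(albs)
    of all DNS lookups.
    """
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "SetIdentifier": set_id,   # distinguishes the shards
                    "Weight": weight,          # equal weights -> even split
                    "AliasTarget": {
                        "DNSName": dns_name,
                        "HostedZoneId": zone_id,
                        "EvaluateTargetHealth": True,
                    },
                },
            }
            for set_id, dns_name, zone_id in albs
        ]
    }

# Hypothetical values -- replace with your own domain and ALB details.
changes = weighted_alias_changes(
    "api.example.com.",
    [
        ("shard-1", "alb-1-1234.eu-central-1.elb.amazonaws.com", "ZEXAMPLE1"),
        ("shard-2", "alb-2-5678.eu-central-1.elb.amazonaws.com", "ZEXAMPLE1"),
        ("shard-3", "alb-3-9012.eu-central-1.elb.amazonaws.com", "ZEXAMPLE1"),
    ],
)

# Applying the change batch (requires AWS credentials):
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="ZMYZONE", ChangeBatch=changes)
```

Because the weights are relative, you can also shift traffic gradually between shards by adjusting them, which is handy for draining one ALB.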

(Diagram: Route 53 weighted routing splitting traffic across three ALBs)

Based on our stress test measurements, we determined that we needed three ALBs to handle the expected traffic. As you can see from the diagram above, users are split equally between the three instances of our load balancer, each with its own set of registered ECS targets behind it. One more thing: just adding more ALBs isn’t enough on its own. 

We also had to pre-warm each load balancer by reserving LCUs ahead of time. Without pre-warming, the ALBs would start cold and need time to scale up internally, exactly the kind of delay we couldn’t afford. 

AWS lets you reserve capacity for anticipated traffic spikes, which ensures the load balancers are ready to handle full throughput from the very first request.
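To know how much capacity to reserve, you can estimate LCUs from the four dimensions. The per-LCU rates below are AWS’s published values for ALBs with EC2 targets; the traffic numbers in the example are hypothetical, roughly sized for one of three shards:

```python
def estimate_lcus(new_conns_per_sec, active_conns_per_min,
                  gb_per_hour, rule_evals_per_sec):
    """Estimate ALB LCUs; AWS bills on the highest of the four dimensions."""
    return max(
        new_conns_per_sec / 25,        # 1 LCU = 25 new connections/second
        active_conns_per_min / 3000,   # 1 LCU = 3,000 active connections/minute
        gb_per_hour / 1.0,             # 1 LCU = 1 GB/hour (EC2 targets)
        rule_evals_per_sec / 1000,     # 1 LCU = 1,000 rule evaluations/second
    )

# Hypothetical numbers for one shard handling ~33K RPS:
lcus = estimate_lcus(
    new_conns_per_sec=5000,
    active_conns_per_min=100_000,
    gb_per_hour=120,
    rule_evals_per_sec=33_000,
)

# Reserving that capacity ahead of the spike (boto3; the LCU Reservation
# API was introduced in late 2024):
# boto3.client("elbv2").modify_capacity_reservation(
#     LoadBalancerArn=alb_arn,
#     MinimumLoadBalancerCapacity={"CapacityUnits": math.ceil(lcus)},
# )
```

Note that a reservation only sets a floor: the ALB can still scale above it if real traffic exceeds the estimate.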

2. EC2 Compute Resources

Usually, we run ~90-100% of our backend on spot instances (yes, even Kafka), depending on the availability of AWS spot instances at the time.

💡 Learn more about how we use spot instances with Karpenter at adjoe’s talk at SREday 2025.

For traffic from the Super Bowl ad, we would need a lot more computing resources than usual, though, which raised two concerns:

  • There won’t be enough spot instances available to run our backend at the scale we need
  • It could even happen that there aren’t enough instances available in the eu-central-1 region at all

We had several calls with AWS in preparation for this event. While they assured us that they had enough instances available, we had already scaled up two days in advance on Friday, to the level that we would need for the event. 

This way, we knew that we would be able to get all the necessary resources and still have time for any last adjustments if required.

Of course, this means that our backend was running at Super Bowl scale the whole weekend, using only on-demand nodes rather than our usual spot instances. 

This cost us a lot more than simply scaling up a few minutes before the event. But it’s a trade-off you have to make between cost and stability. When the alternative is your system falling over in front of millions of users, the extra cloud spend is easy to justify.

3. Resolving Communication Bottlenecks

Our backend services communicate via SNS/SQS, and throughout our testing cycles, we repeatedly found ourselves hitting AWS quota limits. 

Every time this happened, we requested an increased quota from AWS, only to hit the new limit again soon after.

To really resolve this situation, we had to improve our backend services to batch messages more efficiently.

We went from sending mostly single messages and small batches to consistently using the SendMessageBatch API to its full potential: packing up to 10 messages into each API call and compressing the message payloads. This drastically reduced the number of API calls we made for the same volume of events.

Once we had proper batching and compression of messages, we were actually able to serve the amount of traffic we wanted without getting close to our new quota limit. 

The key takeaway here is that before blindly requesting more resources, we also need to pause and think about how we can do more with the resources we have.
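A minimal sketch of the batching and compression idea, assuming JSON event payloads (the actual SQS call, which needs credentials and a queue URL, is shown commented out):

```python
import base64
import gzip
import json

def to_batches(messages, batch_size=10):
    """Chunk messages into SQS-sized batches (SendMessageBatch caps at 10 entries)."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]

def compress_payload(event: dict) -> str:
    """Gzip + base64 the payload so it fits SQS's text-only message body."""
    return base64.b64encode(gzip.compress(json.dumps(event).encode())).decode()

def decompress_payload(body: str) -> dict:
    """Inverse of compress_payload, used by the consuming service."""
    return json.loads(gzip.decompress(base64.b64decode(body)))

# 25 events become 3 API calls instead of 25:
events = [{"event": "install", "user": i} for i in range(25)]
entries = [
    [{"Id": str(i), "MessageBody": compress_payload(e)} for i, e in enumerate(batch)]
    for batch in to_batches(events)
]

# sqs = boto3.client("sqs")
# for batch in entries:
#     sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```

Compression pays off twice here: smaller payloads stay under the SQS message size limit, and more events fit into each 10-entry batch when you aggregate several events per message body.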

4. Avoiding Hot Partitions and Pre-warming the DynamoDB tables

The last part I want to focus on is DynamoDB. During our load tests, we quite frequently ran into problems with throttling on DynamoDB, which would then slowly grind our backend to a halt. 

What you need to know first is that DynamoDB can be run in two different modes.

  • On-Demand
  • Provisioned

In On-Demand mode, you have (kind of) limitless capacity, whereas in Provisioned mode, you have to set limits yourself for how much read and write capacity you’ll need.

The reason people choose Provisioned mode over On-Demand is because of the cost. On-Demand mode is roughly 5–7x more expensive than the provisioned per-unit price.

Throttling on DynamoDB occurs when you consume more capacity units than you have available, be it read capacity units (RCU) or write capacity units (WCU). 

For this event, we ran some tables in On-Demand mode and kept others in Provisioned mode, depending on how critical and unpredictable their access patterns were. 

But regardless of which mode a table was in, we ran into a problem that neither mode protects you from automatically: hot partitions.

DynamoDB Partitions and How Data Is Distributed

To understand this, you need to know a little about how DynamoDB stores data under the hood. DynamoDB spreads your data across multiple partitions, and the partition key of each item determines which partition it lands on. 


Each partition can independently handle up to 1,000 WCU and 3,000 RCU. So even if your table as a whole has plenty of capacity, a single partition can become a bottleneck if too many requests target the same partition key.
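Those per-partition limits make it easy to estimate how many partitions a given throughput requires. A quick back-of-the-envelope helper:

```python
import math

# DynamoDB's hard per-partition throughput limits:
MAX_WCU_PER_PARTITION = 1000
MAX_RCU_PER_PARTITION = 3000

def min_partitions(rcu: int, wcu: int) -> int:
    """Minimum number of partitions DynamoDB needs for the requested throughput."""
    return max(
        math.ceil(rcu / MAX_RCU_PER_PARTITION),
        math.ceil(wcu / MAX_WCU_PER_PARTITION),
    )

# A table provisioned for 60,000 RCU and 15,000 WCU needs at least
# max(60000/3000, 15000/1000) = 20 partitions.
min_partitions(60_000, 15_000)
```

The catch is that this math only helps when reads are spread across partition keys: a single hot key is still capped at one partition’s 3,000 RCU no matter how many partitions the table has.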

That’s exactly what happened to us. During our stress tests, we noticed that one particular table kept throttling, even though its overall capacity looked fine. The tricky part is that AWS doesn’t expose per-partition metrics by default, so from the standard CloudWatch dashboard, everything appeared healthy. 

It wasn’t until we enabled CloudWatch Contributor Insights on the affected table that we could see which specific partition keys were being hammered.

The culprit turned out to be a publisher-specific configuration record. Since we were simulating traffic for a single publisher, all 100K requests per second were reading the same config entry, funneling all those reads into the same partition.

Rather than trying to work around DynamoDB’s partition limits, we moved this configuration data into ElastiCache (Valkey).

This solved the problem on multiple levels: it eliminated the hot partition, and it actually improved our application’s response times, since reading from an in-memory cache is significantly faster than a DynamoDB read.

To keep things consistent, we set a TTL on the cached config entries. If a config changes, the stale entry gets evicted after a short window, and subsequent requests pick up the updated version. For our use case, this eventual consistency was perfectly acceptable.
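A sketch of this cache-aside pattern with TTL-based eviction. The in-memory stand-in below exists only to illustrate the Redis-style `get`/`set(..., ex=ttl)` interface that a real redis-py client against ElastiCache (Valkey) provides; the key format, TTL, and loader function are assumptions for illustration:

```python
import time

class FakeCache:
    """In-memory stand-in mimicking the Redis get/set-with-TTL interface.

    In production this would be a redis-py client connected to ElastiCache
    (Valkey); the calling code stays identical.
    """
    def __init__(self):
        self._data = {}

    def set(self, key, value, ex):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, time.monotonic() + ex)

    def get(self, key):
        hit = self._data.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry expired -> evict and miss
            return None
        return value

def get_publisher_config(cache, publisher_id, load_from_db, ttl=30):
    """Cache-aside read: serve from cache, fall back to the database on a miss."""
    key = f"publisher-config:{publisher_id}"
    config = cache.get(key)
    if config is None:
        config = load_from_db(publisher_id)  # the formerly hot DynamoDB read
        cache.set(key, config, ex=ttl)       # stale entries age out after `ttl`
    return config
```

With a 30-second TTL, a config change propagates within half a minute, while the database sees at most one read per key per TTL window instead of 100K per second.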

Even after solving the hot partition issue, we still needed to deal with another subtlety: DynamoDB itself scales behind the scenes. 

Just like our ALBs, if you suddenly hit a DynamoDB table with a massive spike in traffic, the underlying infrastructure may not scale fast enough. To address this, DynamoDB supports a feature called warm throughput.

When you pre-warm a table, you are essentially forcing DynamoDB to split its partitions in advance. As you raise the warm throughput values, DynamoDB allocates more physical storage nodes and spreads your data across them, ensuring that no single partition becomes a bottleneck during the actual event.
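A sketch of how warm throughput targets might be derived and applied. The per-request read/write ratios and the 1.5x headroom factor are assumptions for illustration, not our measured values; the commented boto3 call uses the WarmThroughput parameter that was added to UpdateTable in late 2024:

```python
import math

def warm_throughput_targets(expected_rps, reads_per_request=2,
                            writes_per_request=1, headroom=1.5):
    """Translate expected request load into warm read/write units per second.

    The ratios and headroom are illustrative defaults -- measure your own
    access patterns before relying on them.
    """
    return {
        "ReadUnitsPerSecond": math.ceil(expected_rps * reads_per_request * headroom),
        "WriteUnitsPerSecond": math.ceil(expected_rps * writes_per_request * headroom),
    }

# Target warm throughput for the full 100K RPS event:
targets = warm_throughput_targets(expected_rps=100_000)

# Applying it to a table (requires AWS credentials; "events" is hypothetical):
# boto3.client("dynamodb").update_table(
#     TableName="events",
#     WarmThroughput=targets,
# )
```

Warm throughput can take a while to take effect on large tables, since DynamoDB has to physically repartition the data, so this is another step worth doing days rather than minutes before the event.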

High-traffic Ready for Super Bowl Sunday

At this point, you know how much effort we put into our preparations for this event, how we went through countless testing cycles to make sure we found every last bottleneck we possibly could. But now you might wonder – was it all worth it?

The Super Bowl kicked off at 6:30 PM Eastern Time, which meant it was 12:30 AM in Hamburg. By 4 AM, our entire team was in the office, eyes on dashboards, waiting for the ad to drop.

And then it did. Traffic spiked to 100,000 RPS. Our P99 response time stayed at 30–50 milliseconds. No throttling, no crashes, no panicked messages. The system just… handled it.

All the preparation, all the stress tests, all the late nights debugging hot partitions and tweaking batch sizes – it paid off. The backend didn’t even flinch.

What We Learned

If there’s one thing I’d want anyone preparing for a large-scale event to take away from this, it’s that you cannot rely on auto-scaling alone when you know a massive traffic spike is coming.

Reactive scaling is great for organic growth and the occasional promotion, but when you’re looking at a 100x increase hitting you all at once, you need to be proactive.

Here’s what worked for us:

  • Start early. We had three months, and it was barely enough. Every round of stress testing uncovered new bottlenecks that took time to fix and then retest.
  • Stress test relentlessly. You don’t know what will break until you break it. The test-fix-retest loop was the single most valuable thing we did.
  • Invest in observability. Without proper metrics, we would never have found issues like the DynamoDB hot partition. You can’t fix what you can’t see.
  • Think about efficiency, not just capacity. Requesting higher quotas from AWS is the easy answer. Batching and compressing our SQS messages was the better one.
  • Accept the cost trade-off. Running on-demand instances for an entire weekend isn’t cheap. But when the alternative is failing in front of millions of users, it’s an easy call.

EndNote

In the end, this wasn’t just a story about scaling infrastructure. It was about our team coming together, working through problems methodically, and not stopping until we were confident the system could handle whatever came its way. 

For more behind-the-scenes stories on how we build for scale, check out the adjoe Engineers’ Blog.
