Billing Slowdowns? A Hidden ClickHouse Bottleneck and Three Critical Fixes

Posted by u/Tiobasil · 2026-05-19 01:10:38

Introduction

At Cloudflare, we run millions of queries against ClickHouse every day to calculate usage-based billing for our customers. This pipeline handles hundreds of millions of dollars in revenue and also powers fraud detection systems. When a routine migration slowed our daily aggregation jobs significantly, it threatened to delay invoices and create reconciliation nightmares. We investigated the typical culprits (I/O, memory, rows scanned, parts read), but everything appeared normal. The real cause was a subtle bottleneck deep within ClickHouse’s internals. Here’s how we found it and the three patches we wrote to fix it.

Source: blog.cloudflare.com

The Ready-Analytics Platform: Petabyte-Scale Data, Simple Onboarding

We run dozens of ClickHouse clusters storing over a hundred petabytes of data. To reduce complexity for internal teams, we built “Ready-Analytics” in early 2022. Instead of designing custom tables, teams stream data into a single massive table. Each dataset is identified by a namespace, and every record follows a standard schema: 20 float fields, 20 string fields, a timestamp, and an indexID.

ClickHouse’s performance depends heavily on how data is sorted. The primary key for this table is (namespace, indexID, timestamp). Because the string indexID sits in the primary key, each namespace can tune its own sort order for the queries it runs. By December 2024, Ready-Analytics had grown to over 2 PiB of data, ingesting millions of rows per second. But the system had one critical flaw: a rigid retention policy.
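To see why the sort order matters, here is a minimal Python sketch (not ClickHouse code) of rows kept sorted by (namespace, indexID, timestamp). The `Row` type, the `rows_for_prefix` helper, and the sample values are all hypothetical. The point is only that a query filtering on the leading key columns can jump to one contiguous range with binary search instead of reading everything.

```python
import math
from bisect import bisect_left
from typing import NamedTuple


class Row(NamedTuple):
    namespace: str
    index_id: str
    timestamp: int  # seconds since epoch (illustrative values only)


def rows_for_prefix(sorted_rows, namespace, index_id):
    """Return the contiguous run of rows matching (namespace, index_id).

    Rows are tuples, so a shorter tuple that is a prefix of a row sorts
    just before it; padding the prefix with +inf sorts just after every
    row that shares the prefix.
    """
    lo = bisect_left(sorted_rows, (namespace, index_id))
    hi = bisect_left(sorted_rows, (namespace, index_id, math.inf))
    return sorted_rows[lo:hi]


# Sorting the rows mimics the table's primary-key order.
rows = sorted([
    Row("billing", "zone-42", 1_700_000_300),
    Row("fraud", "acct-7", 1_700_000_100),
    Row("billing", "zone-42", 1_700_000_100),
    Row("billing", "zone-17", 1_700_000_200),
])
matches = rows_for_prefix(rows, "billing", "zone-42")
```

Because the timestamp is the last key column, the matching rows also come back in time order, which is convenient for time-range queries within one indexID.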

The One-Size-Fits-All Retention Problem

Cloudflare has used ClickHouse since before native Time-to-Live (TTL) features existed. We built our own retention system based on partitioning: the Ready-Analytics table was partitioned by day, and a retention job simply dropped partitions older than 31 days. This “31-day-for-all” approach was a major limitation. Some teams needed to store data for years (due to legal or contractual obligations), while others needed only a few days. Because of this restriction, many teams couldn’t use Ready-Analytics and were forced into a conventional, more complex onboarding process. We needed a system that allowed per-namespace retention.
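The original retention rule can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual retention job: `partitions_to_drop` and the sample dates are hypothetical, and a real job would issue the partition drops against ClickHouse rather than return a list.

```python
from datetime import date, timedelta

RETENTION_DAYS = 31  # the single window applied to every namespace


def partitions_to_drop(partitions, today):
    """Return the daily partitions that are older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(p for p in partitions if p < cutoff)


today = date(2024, 12, 31)
# Hypothetical table state: one partition per day for the last 35 days.
partitions = [today - timedelta(days=n) for n in range(35)]
stale = partitions_to_drop(partitions, today)
```

Under this day-only scheme, each partition mixes rows from every namespace, so dropping a whole partition cannot give different namespaces different windows.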

The Slowdown Investigation: When Nothing Looks Wrong

Following the migration, we noticed that the daily aggregation jobs (which ensure bills go out) had become painfully slow. We checked all the usual suspects: I/O utilization was normal, memory pressure was low, and rows scanned and parts read were within expected ranges. Everything looked clean, yet queries were taking orders of magnitude longer. This was a classic case of a hidden bottleneck. It turned out that supporting per-namespace retention had introduced a new, non-obvious overhead in ClickHouse’s query planning and execution.

A deep dive revealed that the bottleneck was buried in the way ClickHouse handles primary key indexes for tables with many distinct namespaces and frequent partition drops. The combination of a large, high-cardinality string field (indexID) and daily partition deletions caused index traversal to become extremely expensive. We isolated three specific areas and wrote patches to address them.

Three Patches to Unblock the Pipeline

Patch 1: Optimizing Index Lookup for High-Cardinality Keys

The first patch improved the efficiency of index lookups when the primary key includes a high-cardinality string field. Instead of scanning all index marks for a given namespace, we introduced a skip list that can quickly isolate the relevant ranges. This cut index traversal time by over 80% for queries that filter on a specific namespace.
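The difference between visiting every mark and jumping straight to a range can be illustrated with a small Python sketch. It does not reproduce the patch or its skip list: binary search from the standard library stands in for the skip structure, and the `marks` list and both helper functions are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sparse index: the namespace of the first row in each granule,
# in sort order. Simplification: each granule holds a single namespace. A real
# sparse index must also consider the granule just before the first match.
marks = ["ads"] * 3 + ["billing"] * 4 + ["fraud"] * 2 + ["logs"] * 5


def mark_range_linear(marks, namespace):
    """Visit every mark and return the [start, end) range that matches."""
    hits = [i for i, m in enumerate(marks) if m == namespace]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)


def mark_range_bisect(marks, namespace):
    """Find the same [start, end) range with two binary searches."""
    return bisect_left(marks, namespace), bisect_right(marks, namespace)


linear = mark_range_linear(marks, "billing")
fast = mark_range_bisect(marks, "billing")
```

Both functions return the same range for a namespace that exists; the second one touches a logarithmic number of marks rather than all of them, which is the kind of saving that matters when the index is large.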

Patch 2: Improving Partition Pruning After Dropped Partitions

Dropping partitions (as required by per-namespace retention) left behind metadata that confused ClickHouse’s partition pruning logic. The second patch ensures that after a partition drop, the system immediately recomputes partition boundaries for the affected namespace, reducing unnecessary scans of empty or irrelevant partitions.
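As a rough illustration of the idea, and not the patch itself, the following Python sketch keeps per-namespace partition boundaries and refreshes them on every drop. The `NamespaceBounds` class and its methods are hypothetical names; ClickHouse's internal metadata is organized differently.

```python
from datetime import date, timedelta


class NamespaceBounds:
    """Hypothetical per-namespace (oldest, newest) partition boundaries."""

    def __init__(self):
        self._days = {}    # namespace -> set of partition days that still exist
        self._bounds = {}  # namespace -> (oldest_day, newest_day)

    def _recompute(self, namespace):
        days = self._days.get(namespace)
        if days:
            self._bounds[namespace] = (min(days), max(days))
        else:
            self._bounds.pop(namespace, None)

    def add_partition(self, namespace, day):
        self._days.setdefault(namespace, set()).add(day)
        self._recompute(namespace)

    def drop_partition(self, namespace, day):
        self._days.get(namespace, set()).discard(day)
        # Refresh boundaries immediately so pruning never uses stale values.
        self._recompute(namespace)

    def days_to_scan(self, namespace, start, end):
        """Clamp the query range [start, end] to the namespace's live boundaries."""
        if namespace not in self._bounds:
            return []
        oldest, newest = self._bounds[namespace]
        first, last = max(start, oldest), min(end, newest)
        return [first + timedelta(days=n) for n in range((last - first).days + 1)]


bounds = NamespaceBounds()
for n in range(1, 6):
    bounds.add_partition("billing", date(2024, 12, n))
bounds.drop_partition("billing", date(2024, 12, 1))
bounds.drop_partition("billing", date(2024, 12, 2))
scan = bounds.days_to_scan("billing", date(2024, 12, 1), date(2024, 12, 4))
```

After the two drops, a query covering December 1 through 4 only considers the two days that still exist for that namespace.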

Patch 3: Reducing Lock Contention in Background Merges

The third fix targeted lock contention during background merges. With many namespaces and frequent partition operations, the merge scheduler was spending excessive time waiting for locks on small parts. By introducing a more granular locking scheme, we allowed merges to proceed in parallel without blocking query execution.
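The general idea of finer-grained locking can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual scheme: the `PartLocks` class, the `merge` function, and the lock keys are hypothetical, and summing integers stands in for merging parts.

```python
import threading
from collections import defaultdict


class PartLocks:
    """One lock per key instead of a single table-wide lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]


def merge(locks, key, parts, results):
    # Only work on the same key is serialized; other keys proceed in parallel.
    with locks.lock_for(key):
        results[key] = sum(parts)  # stand-in for merging small parts into one


locks = PartLocks()
results = {}
work = {"billing/2024-12-01": [3, 4], "fraud/2024-12-01": [10, 1, 2]}
threads = [
    threading.Thread(target=merge, args=(locks, key, parts, results))
    for key, parts in work.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The short-lived `_guard` lock is held only while looking up a per-key lock, so long-running merges on different keys never wait on each other.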

Conclusion

These three patches restored the billing pipeline’s performance and, in some cases, made it even faster than before the migration. More importantly, they enabled per-namespace retention in Ready-Analytics, unlocking the platform for dozens of new use cases. The experience reinforced a key lesson: the most elusive bottlenecks sometimes hide not in obvious resources like I/O or memory, but in the subtle interactions between data characteristics and query engine internals. For any team running large-scale ClickHouse workloads, it’s worth looking beyond the usual metrics when performance degrades, especially after schema or retention changes.