Every time we demo Kubesentry to a new engineering team, someone on the call asks the same question before we get to threat detection. "What's the overhead?" Fair question. eBPF tools have a complicated reputation on performance, and it's worth addressing directly.
Here's the thing: most of the overhead anxiety comes from comparing the wrong tools. People conflate eBPF kernel probes with sidecar agents, with ptrace-based tracers, with strace, with full packet capture. Those are not the same thing. They don't cost the same thing. And the benchmarks telling you "eBPF is expensive" are often measuring a different architecture entirely.
Three Architectures, Three Very Different Cost Profiles
In our experience evaluating Kubernetes runtime security options, there are three distinct instrumentation models teams run into:
ptrace-based tracers
The oldest model. ptrace is a Linux syscall that lets one process observe and control another. Security tools built on ptrace attach to target processes and intercept system calls by stopping execution, reading the call, then resuming. The per-syscall context switch cost is real and non-trivial. In workloads with high I/O or network call volume, we've seen ptrace-based tools add 8-15% CPU overhead per traced container. At cluster scale, that's budget you can't get back.
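To make the per-syscall cost concrete, here is a minimal sketch of the ptrace model (an illustration, not any particular vendor's tracer): a parent stops a child at every syscall boundary, reads the syscall number out of the registers, and resumes it. The traced program (ls) and the x86_64 register layout are illustrative assumptions.

```c
// Sketch of ptrace-style syscall interception (x86_64 Linux).
// Every syscall boundary forces the child to stop and the tracer to be
// scheduled, read registers, and resume the child; that round trip is
// the per-syscall overhead ptrace-based tools pay.
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* ask to be traced */
        execlp("ls", "ls", "/", (char *)NULL);   /* stand-in workload */
        return 1;
    }

    int status;
    waitpid(child, &status, 0);                  /* child stops at execve */

    while (!WIFEXITED(status)) {
        /* Resume the child until the next syscall entry, then block
           until it stops again: two scheduler round trips per call. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        waitpid(child, &status, 0);
        if (WIFEXITED(status))
            break;

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        printf("syscall %lld\n", (long long)regs.orig_rax);

        /* Run to the matching syscall exit before the next iteration. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        waitpid(child, &status, 0);
    }
    return 0;
}
```

Swap in anything chattier than ls and the stop/resume pattern in the output makes the overhead story obvious.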
Sidecar agents
The cloud-native era's answer to ptrace was to inject a lightweight container into every pod that collects telemetry from application-layer instrumentation. This avoids the ptrace penalty but adds a different cost: memory per pod, CPU for the agent process, and latency in the data path if the sidecar sits in the request path. Across a cluster running 150 pods, even a 20MB-per-sidecar baseline adds 3GB of resident memory before any application workload runs. And some sidecar models require container image modifications or admission webhooks that break existing deployment pipelines.
eBPF CO-RE probes
The model Kubesentry uses. eBPF programs run inside the Linux kernel's sandboxed execution environment. CO-RE (Compile Once, Run Everywhere) means the probe is compiled once against BTF type information and relocated at load time to match the running kernel, so it runs on any Linux kernel 5.8 or higher without per-host recompilation. One DaemonSet pod per node, not per application pod. No process injection. No sidecar. The probe hooks into the kernel's system-call interface, captures the event, and hands it off to user-space within 50 milliseconds. The kernel does the work it was already doing; the eBPF probe just reads along.
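For a sense of what the model looks like in practice, here is a minimal CO-RE-style probe sketch. It is not Kubesentry's actual code, and the struct layout, map name, and buffer size are placeholder assumptions: it hooks the execve tracepoint, writes a small event into a BPF ring buffer, and leaves consumption to a libbpf-based user-space agent.

```c
// SPDX-License-Identifier: GPL-2.0
// Minimal CO-RE-style sketch: capture execve events in the kernel and
// hand them to user-space through a ring buffer. Compiled once against
// BTF type information (vmlinux.h); loaded by a libbpf-based agent.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct exec_event {
    __u32 pid;
    char comm[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} exec_events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    struct exec_event *e;

    e = bpf_ringbuf_reserve(&exec_events, sizeof(*e), 0);
    if (!e)
        return 0;                   /* buffer full: drop, never block the caller */

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    bpf_ringbuf_submit(e, 0);
    return 0;                       /* the calling process is never paused */
}

char LICENSE[] SEC("license") = "GPL";
```

Contrast this with the ptrace loop above: the workload is never stopped, and the only extra work on the hot path is reserving and filling a few bytes in a shared ring buffer.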
What We Actually Measured
Our data shows consistent results across c5.xlarge equivalent nodes (4 vCPU, 8GB RAM) running production-representative Kubernetes workloads. Specifically: a mix of HTTP API services (high network call volume), batch processing jobs (high I/O), and background queue workers (moderate syscall rate).
| Instrumentation Model | CPU Overhead | Memory Footprint | Requires Image Change? |
|---|---|---|---|
| ptrace-based tracer | 8-15% per traced container | Low (host-side only) | No |
| Sidecar agent | 1-5% per pod | 15-40MB per pod | Often yes (or admission webhook) |
| eBPF CO-RE (Kubesentry) | <1.2% per node | <80MB per node (DaemonSet only) | No |
The <1.2% CPU figure and <80MB RAM per node are numbers we've validated on our own test clusters and continue to check on every release. These cover the full Kubesentry probe stack: syscall capture, event filtering, user-space analysis pipeline, and outbound alert delivery. Not a stripped benchmark. The whole thing.
Why Benchmark Methodology Matters
Here's where teams get tripped up. A lot of published eBPF overhead benchmarks are measuring strace-style full syscall tracing: capturing every single syscall from every process with no filtering. That's an intentionally adversarial workload for any kernel tracer. No production security tool does this.
What matters is filtered event capture. Kubesentry doesn't record every read() call from your nginx workers. It captures the specific system calls relevant to security: exec, connect, open on sensitive paths, clone, ptrace (the syscall, used by attackers), mount, and service account token access patterns. High-volume benign calls are filtered at the kernel eBPF probe before they ever reach user-space. The event rate you're paying to process is the security-relevant subset, not the full syscall stream.
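Here is a sketch of what that in-kernel filtering looks like, with an illustrative allowlist and map name rather than Kubesentry's real ones: the check runs on the generic sys_enter tracepoint, and any syscall not on the list returns before a single byte is reserved for user-space.

```c
// SPDX-License-Identifier: GPL-2.0
// Illustrative in-kernel filter: only allowlisted syscalls generate an
// event; read(), write(), epoll_wait() and friends are dropped here,
// inside the kernel, and never cost anything in user-space.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

struct sc_event {
    __u32 pid;
    __s64 syscall_id;
};

/* x86_64 syscall numbers for the security-relevant subset. */
static __always_inline int is_interesting(long id)
{
    switch (id) {
    case 59:  /* execve   */
    case 322: /* execveat */
    case 42:  /* connect  */
    case 257: /* openat   */
    case 56:  /* clone    */
    case 101: /* ptrace   */
    case 165: /* mount    */
        return 1;
    default:
        return 0;
    }
}

SEC("tracepoint/raw_syscalls/sys_enter")
int filter_syscalls(struct trace_event_raw_sys_enter *ctx)
{
    if (!is_interesting(ctx->id))
        return 0;        /* filtered in-kernel; never reaches user-space */

    struct sc_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->syscall_id = ctx->id;
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

A real probe does more than this (path checks on opens, container attribution, token access patterns), but the shape is the same: the cheap decision happens in the kernel, and only the security-relevant subset crosses into the analysis pipeline.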
Before choosing any runtime security tool, these are the questions worth asking:
- Is the benchmark measuring filtered events or full syscall capture?
- What node size and workload mix was the test run on?
- Is the overhead figure per-node or per-pod? (They compound very differently.)
- Does the benchmark include the full analysis pipeline or just raw capture?
- What kernel version was tested? Kernels older than 5.8, or built without BTF, require fallback mechanisms with different cost profiles.
The Comparison That Actually Matters
Teams sometimes frame the overhead question as "eBPF vs. nothing." That's the wrong comparison. The actual question is: what's the cost of an undetected runtime incident?
In our tracking of publicly documented Kubernetes security incidents, a cryptomining deployment running undetected in a misconfigured namespace burns an average of 11 days of cloud compute before a billing anomaly surfaces it. At current EC2 pricing for GPU-capable instances, an attacker running a medium-scale xmrig operation can generate $4,000-$9,000 in compute charges before any existing cloud-provider alert fires. That's before accounting for the lateral movement risk if the namespace had access to secrets in adjacent workloads.
Against that baseline, a <1.2% CPU overhead per node is not a cost. It's a hedge.
What to Measure Before You Deploy
Honestly, don't take any vendor's benchmark numbers uncritically, including ours. Here's a methodology we recommend for your own evaluation on your own infrastructure:
- Baseline your current node CPU and memory utilization over a 72-hour window covering both peak and off-peak traffic. Use Datadog or Prometheus. Record p50, p95, and p99 CPU.
- Deploy the DaemonSet to a single non-production node first (a node selector keeps it scoped while you evaluate). Monitor for 24 hours against your baseline.
- Run a load test against a workload on that node at 90% of your normal production throughput. Measure application latency (p99) and node CPU side-by-side. The latency delta should be within measurement noise.
- Check event latency end-to-end: trigger a known-suspicious action (spawn a shell in a running container via kubectl exec) and measure the time from exec to alert delivery in your SIEM. Our target is 50ms at the kernel capture layer; add your SIEM ingest latency on top. A minimal trigger-timestamping sketch follows this list.
- Review the DaemonSet resource requests and limits in the deployment manifest. Any well-built eBPF security tool should have explicit CPU and memory limits set. If limits aren't set, the tool has no production deployment discipline.
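For the end-to-end latency check, something as small as the following sketch covers the trigger side; the namespace and deployment name are placeholders, and the alert-side timestamp comes from your SIEM's ingest record.

```c
// Print a wall-clock timestamp, then spawn a shell command in a running
// pod via kubectl exec. Diff this timestamp against the alert's ingest
// timestamp in your SIEM to get end-to-end detection latency.
// The namespace and deployment target below are placeholders.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    printf("trigger_ts=%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
    fflush(stdout);

    /* The suspicious action: spawn a shell in a running container. */
    int rc = system("kubectl exec -n default deploy/sample-api -- /bin/sh -c id");
    return rc == 0 ? 0 : 1;
}
```

A shell one-liner with date +%s.%N works just as well; the point is simply to record when the suspicious action actually fired.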
We've run that evaluation process with teams deploying Kubesentry for the first time. In most cases, the measured overhead on their actual workloads lands below what our published numbers show, because their syscall event rates are lower than our benchmark workload mix. Real production is often less noisy than a benchmark tuned to stress-test the tracer.
The Practical Takeaway
eBPF CO-RE probes are not free. Nothing is free. But they are the most efficient instrumentation model available for Kubernetes runtime security today, by a significant margin over sidecar agents and by a factor of 5-10x over ptrace-based approaches in high-throughput workloads.
The <1.2% CPU and <80MB RAM figures are the ceiling on a production-representative node, not a best-case number pulled from a synthetic benchmark. We publish them because we've verified them, and we think teams making runtime security decisions should have real data to work with, not marketing copy.
If you want to run the evaluation on your own cluster, the Kubesentry DaemonSet deploys in about 10 minutes and doesn't require any container image changes or application restarts. Your existing workloads keep running. You just start getting visibility into what they're doing at the kernel level.
Ready to see what eBPF telemetry looks like on your own cluster? Request a demo and we'll walk through the deployment on a test node together.