eBPF and Kubernetes Runtime Security: What Mid-Size SaaS Teams Need to Know

[Figure: eBPF kernel probe architecture for Kubernetes runtime security]

Kubernetes gave mid-size SaaS teams the ability to run production workloads at a scale that would have required a dedicated infrastructure team five years ago. That is genuinely good. What it also did, though, is create a detection gap that most teams do not discover until they are already in the middle of an incident.

At Kubesentry, we talk to engineering and DevSecOps leads from SaaS companies in the 15-150 engineer range every week. The same pattern comes up. They have a CSPM tool. They have cloud-provider security scores. They might even have Falco running somewhere, generating noise no one has time to tune. And they have zero visibility into what is actually running inside their containers right now.

This post explains what eBPF-based runtime telemetry is, why it matters specifically for Kubernetes, and what a mid-size team should think about before deploying it.

Why Configuration Scanners Miss the Attack

CSPM tools are good at what they do. They tell you that your S3 bucket is public, that your security group allows 0.0.0.0/0 on port 22, that your EKS nodes have not been patched. Configuration drift detection is valuable.

The problem is that runtime attacks do not look like configuration drift. An attacker who deploys a cryptominer inside a legitimately configured container, under a valid service account, using a real container image that passed your registry scan, is invisible to a configuration scanner. The configuration is fine. The behavior is not.

In our experience, this is exactly how the worst incidents unfold. The container image is clean. The RBAC policy is technically correct. The namespace has no obvious misconfigurations. And the attacker is running xmrig inside your data-processing workload, pinning two CPU cores, and slowly listing Secrets in adjacent namespaces.

In the incidents we see, mean time to detect a cryptomining deployment in an unmonitored namespace runs around 11 days when billing anomalies are the primary signal. Eleven days. That is not a detection gap; that is an open door.

What eBPF Actually Does at the Kernel Level

eBPF (extended Berkeley Packet Filter) is a Linux kernel feature that lets you attach small programs to kernel events without modifying kernel source code or loading a kernel module. It has been shipping in production Linux kernels since 3.18, but the modern safety-checked, CO-RE (Compile Once, Run Everywhere) variant that makes it practical for production security use requires Linux kernel 5.8 or higher.

For Kubernetes runtime security, the relevant kernel events are system calls. Every exec, connect, open, clone, mount, and ptrace call issued from inside any container on a node passes through the kernel. An eBPF probe attached to those syscall entry and exit points can record who made the call, what arguments were passed, and what the result was, in near real time.
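
To make that concrete, here is a minimal sketch of syscall-level tracing using the BCC toolkit. It is illustrative rather than how Kubesentry's probe is built: BCC compiles the program at load time instead of shipping a CO-RE object, but the kernel-side mechanics are the same. The probe attaches to the execve tracepoint and emits the PID, cgroup id, and command name for every exec on the node.

    from bcc import BPF

    # eBPF program (C) attached to the execve tracepoint entry point.
    bpf_text = r"""
    #include <linux/sched.h>

    struct event_t {
        u32 pid;
        u64 cgroup_id;
        char comm[TASK_COMM_LEN];
    };
    BPF_PERF_OUTPUT(events);

    TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
        struct event_t event = {};
        event.pid = bpf_get_current_pid_tgid() >> 32;
        event.cgroup_id = bpf_get_current_cgroup_id();
        bpf_get_current_comm(&event.comm, sizeof(event.comm));
        events.perf_submit(args, &event, sizeof(event));
        return 0;
    }
    """

    b = BPF(text=bpf_text)

    def handle(cpu, data, size):
        e = b["events"].event(data)
        # The cgroup id is the join key from a kernel event back to a
        # specific pod and container via the kubelet's cgroup hierarchy.
        print(f"exec pid={e.pid} cgroup={e.cgroup_id} comm={e.comm.decode()}")

    b["events"].open_perf_buffer(handle)
    while True:
        b.perf_buffer_poll()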

No sidecar container. No application-level agent. No container image modification. Nothing that requires a pod restart.

This matters a lot in practice. The sidecar model, which some earlier container security tools used, requires injecting a container into every pod spec and convincing your cluster administrators that a third-party process should run alongside every application. That is a hard organizational sell. eBPF via DaemonSet is a single deployment that places one agent pod on each node, owned by the security team, invisible to application developers.
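
For the curious, the shape of that per-node agent can be sketched with the Kubernetes Python client. The names, namespace, and image below are hypothetical; the structural points are what matter: one DaemonSet object for the whole cluster, privileged access to the host kernel, and no changes to any application pod spec.

    from kubernetes import client

    # Hypothetical per-node probe DaemonSet (names and image are examples).
    agent = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="runtime-probe", namespace="security"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "runtime-probe"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "runtime-probe"}),
                spec=client.V1PodSpec(
                    host_pid=True,  # observe host processes, not just the pod's
                    containers=[client.V1Container(
                        name="probe",
                        image="example.com/runtime-probe:latest",
                        security_context=client.V1SecurityContext(privileged=True),
                        volume_mounts=[client.V1VolumeMount(
                            name="sys", mount_path="/sys", read_only=True)],
                    )],
                    volumes=[client.V1Volume(
                        name="sys",
                        host_path=client.V1HostPathVolumeSource(path="/sys"),
                    )],
                ),
            ),
        ),
    )
    # client.AppsV1Api().create_namespaced_daemon_set("security", agent)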

The Behavioral Baseline: What Normal Looks Like

Raw syscall streams are noise. A busy Kubernetes node generates thousands of system calls per second across dozens of containers. Without a model of what is expected, every event is meaningless.

The useful signal comes from behavioral profiling: learning what a specific workload normally does, then flagging deviations. During an initial 7-14 day learning window, we build a profile for each Deployment, DaemonSet, or StatefulSet in your cluster. The profile captures the typical syscall patterns, the expected outbound network destinations, the process-ancestry trees (which parent process spawns which child), and the Kubernetes API calls made by the workload's service account.

After that baseline is established, the detection model looks for deviations, not just known-bad signatures. A web-tier pod that has never made an outbound connection outside your AWS VPC suddenly issuing a connect() to an external IP is anomalous. Full stop. Whether or not that IP is on a threat-intelligence blocklist. Whether or not the pod's image has a known CVE.
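
A toy sketch of that idea, using hypothetical structures rather than Kubesentry's actual model, makes the mechanics clear: record outbound destinations during the learning window, then flag any connect() outside the recorded set.

    import ipaddress
    from dataclasses import dataclass, field

    @dataclass
    class WorkloadBaseline:
        """Outbound-connection profile for one Deployment/StatefulSet."""
        learning: bool = True
        outbound_networks: set = field(default_factory=set)

        def observe_connect(self, dst_ip: str):
            dst = ipaddress.ip_address(dst_ip)
            if self.learning:
                # Record the /24 around each destination; a real profile
                # would aggregate more carefully (VPC CIDRs, DNS names).
                self.outbound_networks.add(
                    ipaddress.ip_network(f"{dst_ip}/24", strict=False))
                return None
            if not any(dst in net for net in self.outbound_networks):
                return f"anomalous outbound connect() to {dst_ip}"
            return None

    web_tier = WorkloadBaseline()
    web_tier.observe_connect("10.0.12.7")    # learning-window traffic
    web_tier.learning = False                # baseline established
    print(web_tier.observe_connect("185.220.101.34"))  # external IP -> alert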

Signature-only detection misses novel attack variants almost by definition. Behavioral detection catches them because the behavior is wrong, regardless of whether the threat actor's tooling is already documented.

What Changes When a New Image Version Deploys

One concern teams raise when we explain behavioral baselining: what happens at deploy time? If you push a new container image every day, does the baseline become stale overnight?

Kubesentry watches the Kubernetes API for deployment rollouts and automatically transitions the affected workload into a short re-learning window when a new image version is detected. The old baseline is retained for comparison during the transition, so you get alerts on the new version's deviations, not false positives caused by the version change itself. Once the new image's behavior stabilizes, the baseline updates. The whole process takes 2-4 hours for a typical workload, not another full 14-day window.
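
The rollout-detection half of this is a plain watch on the Kubernetes API. Here is a sketch with the official Python client; the re-learning hook itself is left as a hypothetical comment.

    from kubernetes import client, config, watch

    config.load_incluster_config()  # or load_kube_config() outside a cluster
    apps = client.AppsV1Api()

    last_images = {}  # (namespace, name) -> set of container images

    w = watch.Watch()
    for event in w.stream(apps.list_deployment_for_all_namespaces):
        dep = event["object"]
        key = (dep.metadata.namespace, dep.metadata.name)
        images = {c.image for c in dep.spec.template.spec.containers}
        if key in last_images and images != last_images[key]:
            # New image version detected: retain the old baseline for
            # comparison and open a short re-learning window.
            print(f"image change on {key}: {last_images[key]} -> {images}")
            # relearn(key, previous_baseline=baselines[key])  # hypothetical
        last_images[key] = images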

MITRE ATT&CK for Containers: Why Alert-Time Context Matters

Here is the thing about runtime alerts without tactic context: they require your analyst to do the kill-chain mapping manually, under pressure, during an active incident.

Every alert Kubesentry fires includes the applicable MITRE ATT&CK for Containers tactic and technique, computed inline during detection, not added as a post-processing enrichment step. So when a kubectl exec drops an interactive shell into a production pod, the alert that lands in your Slack or Datadog channel says: Execution / T1609 (Container Administration Command), pod name, namespace, node, and the syscall sequence that triggered it.
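
In sketch form, inline enrichment just means the detection-to-ATT&CK mapping is consulted while the alert payload is built, not by a downstream pipeline. The mapping table and alert schema below are hypothetical; the tactic and technique IDs are real ATT&CK for Containers entries.

    import json
    import time

    # Illustrative subset of a detection -> ATT&CK for Containers mapping.
    ATTACK_MAP = {
        "interactive_exec_in_pod": ("Execution", "T1609", "Container Administration Command"),
        "service_account_secret_list": ("Credential Access", "T1552.007", "Container API"),
    }

    def build_alert(detection, pod, namespace, node, syscalls):
        tactic, technique_id, technique = ATTACK_MAP[detection]
        # Tactic context is part of the payload from the start.
        return json.dumps({
            "ts": time.time(),
            "tactic": tactic,
            "technique": f"{technique_id} ({technique})",
            "pod": pod,
            "namespace": namespace,
            "node": node,
            "syscall_sequence": syscalls,
        })

    print(build_alert("interactive_exec_in_pod",
                      pod="payments-7f9c", namespace="prod", node="ip-10-0-3-41",
                      syscalls=["execve(/bin/sh)", "dup2", "ioctl(TIOCSCTTY)"]))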

That context is not cosmetic. It tells your on-call engineer whether this is an isolated exec anomaly or the beginning of a lateral-movement chain. It lets you write tactic-level correlation rules in your SIEM without a separate enrichment pipeline. And it means a DevSecOps team of two can maintain detection coverage across the full ATT&CK matrix without a dedicated threat-analyst role.

We've seen teams go from "we have no idea what tactic classification our Kubernetes alerts map to" to "we have coverage gaps documented against the ATT&CK matrix" in the first week after deployment. That is a real shift in posture.

Service Account Abuse: The Kubernetes-Specific Attack You Probably Are Not Watching

Service accounts are the identity system inside Kubernetes. Every pod gets one. Most teams set up RBAC reasonably and then stop thinking about service accounts, because RBAC policies do not change that often.

What changes constantly is how pods use those service accounts at runtime. A compromised application container can use its mounted service account token to make Kubernetes API calls that are technically within the account's RBAC permissions but are far outside what the workload normally does. A frontend-api pod that lists Secrets in the database namespace has valid credentials for that operation if someone misconfigured the RoleBinding. But it has never done that in six months of production traffic. That deviation is the signal.

Container-escape-to-cluster-admin privilege escalation can take as little as 4-8 minutes in an unmonitored Kubernetes environment. By the time that shows up in your cloud provider's audit logs, the attacker has already pivoted. Runtime correlation between syscall events and Kubernetes API audit logs catches the escalation chain as it is forming, not after the fact.
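
A sketch of the service-account half of that correlation: compare each Kubernetes audit event against a learned per-account baseline. The user.username, verb, and objectRef fields follow the real audit log schema; the baseline store itself is hypothetical.

    # Per-service-account calls observed during the learning window.
    baseline = {
        "system:serviceaccount:prod:frontend-api": {
            ("get", "configmaps", "prod"),
            ("list", "endpoints", "prod"),
        },
    }

    def check_audit_event(event):
        user = event["user"]["username"]
        ref = event.get("objectRef", {})
        call = (event["verb"], ref.get("resource"), ref.get("namespace"))
        if call not in baseline.get(user, set()):
            return f"{user} made previously unseen API call {call}"
        return None

    # RBAC permits this call, but the workload has never made it before.
    print(check_audit_event({
        "user": {"username": "system:serviceaccount:prod:frontend-api"},
        "verb": "list",
        "objectRef": {"resource": "secrets", "namespace": "database"},
    }))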

Practical Deployment Notes for Mid-Size Teams

A few things worth knowing before you start a proof of concept.

  • Node requirements: Linux kernel 5.8+ for CO-RE probes. Amazon Linux 2023, Ubuntu 22.04, and Debian 12 all meet this requirement out of the box. Amazon Linux 2 (AL2) does not; if you are running AL2 node groups on EKS, plan a node group migration as part of your rollout. (A quick preflight check is sketched after this list.)
  • Overhead: Our benchmarks show under 1.2% CPU and 80MB RAM on a c5.xlarge-equivalent node under typical SaaS production workloads. That is the overhead of the DaemonSet probe itself, not the total observability stack. If you are running close to node capacity, account for it.
  • Baseline window: Plan 7-14 days before you start tuning alert thresholds. Deploying on a Monday and expecting tuned alerts within the same week is unrealistic. Deploy before a quiet period, if possible, to get a clean baseline uncontaminated by deployment churn.
  • Alert routing: Decide before you deploy which alert classes go to PagerDuty versus Slack versus SIEM-only. Not every MITRE ATT&CK tactic warrants a 3 AM page. Privilege Escalation and Lateral Movement should. Discovery and Execution require analyst review but not necessarily an immediate page.
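
The preflight check mentioned in the node-requirements bullet can be as simple as this sketch: CO-RE probes need both the kernel-version floor and BTF type information, which compatible kernels expose at /sys/kernel/btf/vmlinux.

    import os
    import platform

    MIN_KERNEL = (5, 8)

    def node_supports_core_probes():
        # platform.release() returns the kernel release, e.g. "6.1.0-13-amd64"
        major, minor = (int(x) for x in platform.release().split(".")[:2])
        # CO-RE relocation needs the kernel's BTF type info to be exported.
        has_btf = os.path.exists("/sys/kernel/btf/vmlinux")
        return (major, minor) >= MIN_KERNEL and has_btf

    print("CO-RE capable:", node_supports_core_probes())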

Honestly, the hardest part of deploying eBPF-based runtime detection is not the technical setup. It is aligning your team on what a runtime alert means and who is responsible for responding to it. That organizational question should be resolved before the first alert fires.

The Gap Nobody Budgets For

Our data shows that 60-75% of runtime security events at mid-size SaaS companies are first discovered during post-incident forensics. That number is not a failure of individual engineers. It is a structural problem: the detection tooling most teams can afford is configuration-focused, and runtime visibility has historically required a dedicated security engineering team to build and maintain.

eBPF changed the technical feasibility side of that equation. A single DaemonSet deployment, a 14-day baseline window, and tactic-enriched alerts that route to the tools your team already uses (Datadog, Splunk, PagerDuty) get a 2-person DevSecOps team to a meaningfully different detection posture without a six-month implementation project.

That is the gap we built Kubesentry to close. Not the gap at large enterprises with 50-person security orgs, but the one facing the mid-size SaaS team running EKS in production, with one or two people who own security outcomes and need runtime visibility that matches what attackers actually do.

If your team is evaluating Kubernetes runtime detection options, we are happy to walk through a proof-of-concept deployment in your environment. Reach out and request a demo.