In 2019, I was running a global trading platform at Deutsche Bank. We had 200+ microservices on Kubernetes across three data centers. Our observability stack? Datadog agents on every node, Jaeger sidecars injected into every pod, custom StatsD instrumentation in every service. The result: 15% of our cluster capacity consumed purely by monitoring infrastructure. When something went wrong at 3 AM, we were drowning in telemetry but still couldn't answer the most basic questions in under 30 minutes.
That was the state of the art five years ago. Today, teams running eBPF-based observability get the same — often better — visibility for a fraction of the resource cost. And they don't change a single line of application code to get it.
eBPF (extended Berkeley Packet Filter) has evolved from a niche networking hack into the foundational technology behind a new generation of Kubernetes observability tools. In 2026, if you're still relying exclusively on sidecar-based tracing and log scraping to understand your cluster, you're operating at a significant competitive disadvantage.
What Is eBPF and Why Does It Matter for Kubernetes?
eBPF is a revolutionary Linux kernel technology that allows you to run sandboxed programs inside the kernel — safely, efficiently, and without modifying kernel source code or loading kernel modules. Think of it as a programmable hook into the operating system's most fundamental layer.
Originally designed for packet filtering (hence the name), eBPF has expanded into a general-purpose kernel extension mechanism. An eBPF program can attach to kernel events — system calls, network packet processing, function entry/exit points, hardware performance counters — and execute logic when those events fire. The kernel verifier ensures eBPF programs can't crash the kernel, loop infinitely, or access unauthorized memory.
Why This Is Revolutionary for Kubernetes
Every container in your Kubernetes cluster is ultimately just a set of Linux processes running on a host operating system. When those processes make network connections, read files, execute system calls, or consume CPU, those events pass through the kernel. eBPF sits at that chokepoint and watches everything — for every container, for every pod — from a single program running per node.
The implications are significant:
- No application changes required. The kernel sees everything regardless of whether your application is instrumented. Rust services, Go binaries, Python scripts, legacy Java monoliths — eBPF observes them all equally.
- No sidecar overhead. One eBPF program per node replaces dozens of proxy containers, each consuming 50–200MB of memory.
- Kernel-accurate timestamps. Latency measured at nanosecond resolution, not subject to user-space clock drift.
- Security observability. Every syscall, every file access, every network connection — visible and enforceable at the kernel layer.
eBPF in 2026: From Experimental to Production Standard
Three years ago, eBPF was still considered adventurous for production use. In 2026, it's table stakes. The CNCF Observability Technical Advisory Group's annual survey shows that 67% of teams running Kubernetes at scale have adopted at least one eBPF-based observability tool. Cilium — the most prominent eBPF-based CNI — is now the default networking plugin for GKE, EKS, and AKS (all with eBPF-mode enabled). The question is no longer "should we use eBPF?" but "how do we build the right stack around it?"
The Hidden Observability Tax: Why Traditional Approaches Fall Short
To appreciate what eBPF solves, you need to understand the resource cost of traditional Kubernetes observability. Let me put some real numbers on it.
The Sidecar Sprawl Problem
A typical enterprise Kubernetes cluster running a service mesh (Istio or Linkerd) with distributed tracing injects a sidecar proxy into every pod. Each Envoy proxy consumes approximately:
- 50–150 MB RAM baseline, scaling with connection count
- 0.5–2% CPU overhead per pod, scaling with request rate
- Additional latency: 1–5ms per service hop (two proxies per connection)
On a cluster with 500 pods, that's potentially 75 GB of RAM consumed purely by proxy sidecars — before a single line of your business application code executes. For high-throughput, low-latency services (financial trading systems, real-time ML inference), the added latency alone is unacceptable.
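The arithmetic behind that claim is worth making explicit. Here is a back-of-the-envelope sketch in Python using the per-pod and per-node figures quoted in this article; the 430 MB/node eBPF figure is a rough sum of the Cilium, Beyla, and Tetragon footprints listed later (excluding Pixie's configurable buffer), not a benchmark result:

```python
# Rough overhead comparison: per-pod sidecars vs. per-node eBPF agents.
# Figures are this article's estimates, not measurements (1 GB = 1000 MB).

def sidecar_overhead_gb(pods: int, mb_per_sidecar: int = 150) -> float:
    """One Envoy-style proxy per pod."""
    return pods * mb_per_sidecar / 1000

def ebpf_overhead_gb(nodes: int, mb_per_node: int = 430) -> float:
    """Cilium/Hubble (~200 MB) + Beyla (~50 MB) + Tetragon (~80 MB)
    plus ~100 MB headroom, per node."""
    return nodes * mb_per_node / 1000

print(f"500 pods with sidecars:    {sidecar_overhead_gb(500):.1f} GB RAM")  # 75.0 GB
print(f"20 nodes with eBPF agents: {ebpf_overhead_gb(20):.1f} GB RAM")      # 8.6 GB
```

The key structural difference: sidecar cost grows with pod count, eBPF agent cost grows only with node count.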
The Log Cardinality Explosion
Traditional pod-level logging with FluentBit or Fluentd was designed for monoliths. In a microservices cluster, log volume scales with the product of services × instances × request rate. Teams routinely hit:
- 10–50 TB/day of raw log data in large deployments
- $50,000–$200,000/month in log ingestion costs (Datadog, Splunk, Elastic)
- Signal-to-noise ratio below 5% — most logs are noise
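To see how quickly that multiplication gets out of hand, consider a hypothetical worked example (all inputs below are illustrative, not figures from this article): 200 services at 5 replicas each, 50 requests/s per replica, about 5 log lines per request at roughly 300 bytes per line.

```python
# Daily log volume = services x replicas x request rate x lines/request
# x bytes/line x seconds/day. All inputs are illustrative assumptions.

def daily_log_volume_tb(services: int, replicas: int, rps: float,
                        lines_per_request: float, bytes_per_line: int) -> float:
    lines_per_second = services * replicas * rps * lines_per_request
    return lines_per_second * bytes_per_line * 86_400 / 1e12  # decimal TB

volume = daily_log_volume_tb(200, 5, 50, 5, 300)
print(f"{volume:.2f} TB/day")  # 6.48 TB/day from a mid-sized cluster
```

Even this modest cluster lands in terabytes per day; doubling replicas or request rate doubles the bill.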
The Instrumentation Coverage Gap
SDK-based tracing (OpenTelemetry, Jaeger) requires developers to instrument their code. In practice, this never happens uniformly. Legacy services written before distributed tracing existed are black boxes. Third-party dependencies never expose spans. The result: your service map has gaps exactly where incidents tend to originate.
eBPF eliminates all three of these problems. It observes everything, from the kernel up, without requiring developer action.
The 2026 eBPF Observability Toolkit: Cilium, Pixie, Tetragon & Beyla
The eBPF ecosystem has matured significantly. These four tools form a complete, complementary observability stack for Kubernetes in 2026.
1. Cilium + Hubble: Network Observability
Cilium is an eBPF-based CNI (Container Network Interface) plugin that replaces iptables with eBPF programs for network policy enforcement. Hubble is its built-in observability layer, providing real-time flow visibility across the cluster.
What Hubble gives you:
- Every TCP/UDP flow between pods, with source/destination, latency, and drop reason
- HTTP/gRPC layer-7 visibility (path, method, response code, latency) without sidecars
- DNS request tracking — see what external domains every pod is resolving
- Network policy drop logs with full context
# Install Cilium with Hubble enabled
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--version 1.16.0 \
--namespace kube-system \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
# Verify eBPF datapath is active
cilium status --brief
cilium connectivity test
2. Pixie: Auto-Instrumented Application Telemetry
Pixie (a CNCF sandbox project, contributed by New Relic) goes beyond network flows into application-layer telemetry. Using eBPF uprobes and uretprobes, Pixie automatically traces:
- HTTP/1.1, HTTP/2, gRPC request/response bodies and headers
- PostgreSQL, MySQL, Redis, Cassandra, Kafka protocol messages
- JVM, Python, Go, Node.js runtime metrics
- Full-body request capture (configurable, filtered at edge for PII)
The killer feature: Pixie captures 100% of requests for 60 seconds in a rolling in-memory buffer per node (8 GB default), then aggregates statistics for long-term storage. You get full request visibility for debugging without the storage cost of traditional APM.
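Conceptually, that rolling-window design looks like the following toy Python model (a sketch of the idea, not Pixie's actual data structures): full records are kept inside the window, and anything older survives only as an aggregate count.

```python
from collections import deque

class RollingTraceBuffer:
    """Toy model of a rolling full-fidelity window: complete records for
    the last `window_s` seconds, aggregate counters for everything older."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self._buf: deque = deque()   # (timestamp, record) pairs, oldest first
        self.evicted = 0             # aggregate count of aged-out records

    def add(self, ts: float, record: dict) -> None:
        self._buf.append((ts, record))
        cutoff = ts - self.window_s
        while self._buf and self._buf[0][0] < cutoff:
            self._buf.popleft()
            self.evicted += 1

    def recent(self) -> list:
        """Full records still inside the window."""
        return [rec for _, rec in self._buf]

buf = RollingTraceBuffer(window_s=60.0)
buf.add(0.0, {"path": "/checkout"})
buf.add(30.0, {"path": "/cart"})
buf.add(90.0, {"path": "/pay"})        # ages the t=0 record out of the window
print(len(buf.recent()), buf.evicted)  # 2 1
```

The trade-off is deliberate: full detail when you are actively debugging recent traffic, cheap aggregates for everything else.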
# Deploy Pixie (requires Linux kernel ≥ 4.14, recommended ≥ 5.8)
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
px deploy
# Run a pre-built script to see HTTP traffic
px run px/http_data -- -start_time '-5m' -namespace 'production'
# Custom PxL script for service latency breakdown
px run px/service_stats -- -service 'checkout-service' -start_time '-10m'
3. Tetragon: Security Observability and Runtime Enforcement
Tetragon (from Isovalent, the Cilium company) adds security observability and policy enforcement using eBPF. Unlike network-layer tools, Tetragon operates at the syscall level — giving you visibility into what processes are doing inside containers.
Tetragon catches what other tools miss:
- A compromised container executing curl to exfiltrate data
- Privilege escalation attempts via setuid syscalls
- Unexpected file access patterns (reading /etc/passwd, writing to /tmp)
- Cryptominer detection via CPU time + network pattern analysis
# Tetragon TracingPolicy: Alert on shell execution in containers
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-shell-execution
spec:
  kprobes:
  - call: "sys_execve"
    syscall: true
    args:
    - index: 0
      type: "string"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Postfix"
        values:
        - "/sh"
        - "/bash"
        - "/dash"
        - "/zsh"
      matchActions:
      - action: Sigkill  # Block it entirely, or use Post for audit-only
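For clarity on the selector: the Postfix operator matches the end of the captured argument, so any binary path ending in /sh, /bash, /dash, or /zsh triggers the action. In plain Python terms (an illustrative sketch of the matching semantics, not Tetragon's implementation):

```python
# Suffix matching as performed by a "Postfix" matchArgs selector (sketch).
SHELL_SUFFIXES = ["/sh", "/bash", "/dash", "/zsh"]

def postfix_match(arg: str, suffixes=SHELL_SUFFIXES) -> bool:
    return any(arg.endswith(s) for s in suffixes)

print(postfix_match("/bin/bash"))      # True  -> policy fires (Sigkill)
print(postfix_match("/usr/bin/curl"))  # False -> ignored
```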
4. Grafana Beyla: OpenTelemetry Spans Without SDK
Beyla is Grafana's eBPF-based auto-instrumentation agent. Where Pixie keeps telemetry in-cluster, Beyla emits standard OpenTelemetry traces and metrics — making it a drop-in replacement for manual OpenTelemetry SDK instrumentation.
Beyla uses uprobes to hook into Go, Python, Node.js, Java, and Rust runtime internals, automatically generating OTLP spans that flow directly into your Grafana Tempo, Jaeger, or any OTLP-compatible backend.
# Deploy Beyla as a DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: beyla
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: beyla
  template:
    metadata:
      labels:
        app: beyla
    spec:
      hostPID: true  # Required for eBPF process visibility
      containers:
      - name: beyla
        image: grafana/beyla:1.8.0
        env:
        - name: BEYLA_TRACE_PRINTER
          value: "otel"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://tempo.monitoring.svc.cluster.local:4317"
        - name: BEYLA_OPEN_PORT
          value: "8080,3000,9090"
        securityContext:
          privileged: false
          capabilities:
            add:
            - BPF
            - PERFMON
            - NET_ADMIN
            - SYS_PTRACE
Building a Production eBPF Observability Stack
Each tool above is powerful alone — but together, they form a layered observability architecture that covers every tier of your stack.
The Reference Architecture
┌─────────────────────────────────────────────────────────────────┐
│ VISUALIZATION LAYER │
│ Grafana (dashboards) + Hubble UI (flows) + Pixie UI (APM) │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────────┐
│ STORAGE LAYER │
│ Prometheus (metrics) │ Grafana Tempo (traces) │ Loki (logs) │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────────┐
│ COLLECTION LAYER │
│ Cilium/Hubble │ Pixie Agents │ Beyla │ Tetragon │
│ (L3/L4/L7 net) │ (APM traces) │ (OTEL) │ (syscalls/sec) │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────────┐
│ KERNEL LAYER (eBPF) │
│ kprobes │ uprobes │ tracepoints │ XDP │ TC hooks │
└─────────────────────────────────────────────────────────────────┘
Resource Requirements and Node Sizing
One of the most common questions I get in training: "How much overhead does this actually add?" Here's what to expect per node running the full stack:
- Cilium/Hubble: ~100–200 MB RAM, 0.1–0.3 CPU cores (negligible per pod)
- Pixie node agent: 8 GB RAM per node (for rolling telemetry buffer), 0.5–1 CPU core
- Beyla: ~50 MB RAM, <0.05 CPU cores
- Tetragon: ~80 MB RAM, 0.05–0.1 CPU cores
Compare this to the sidecar approach: as calculated earlier, a 500-pod cluster running Istio sidecars plus per-node Datadog agents can consume upwards of 75 GB of RAM and dozens of CPU cores in monitoring infrastructure. The eBPF stack for the same cluster: approximately 12 GB RAM and 3–4 CPU cores total — across the entire cluster, not per pod.
Kernel Version Requirements
eBPF capabilities are gated by kernel version. Here's the minimum matrix:
- Kernel 5.4+: Cilium, basic Hubble metrics
- Kernel 5.8+: Pixie (requires BTF support), Beyla auto-instrumentation
- Kernel 5.10+ (LTS): Recommended baseline — full feature set for all tools
- Kernel 6.x: Best performance, CO-RE (Compile Once, Run Everywhere) portability
In AWS EKS, this means Amazon Linux 2023 (kernel 6.1) or Bottlerocket. In GKE, Container-Optimized OS ships kernel 6.1+. For on-premises clusters, Ubuntu 22.04 LTS ships 5.15; Ubuntu 24.04 ships 6.8.
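When auditing a mixed node fleet against that matrix, it helps to check releases programmatically. A small helper (illustrative; feed it the uname -r output collected from each node):

```python
import re

def kernel_at_least(release: str, minimum: tuple) -> bool:
    """True if a uname -r string like '6.1.94-99.176.amzn2023.x86_64'
    meets a (major, minor) minimum version."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unparseable kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= minimum

print(kernel_at_least("5.15.0-105-generic", (5, 10)))  # True: meets the LTS baseline
print(kernel_at_least("5.4.0-150-generic", (5, 8)))    # False: too old for Pixie/Beyla
```

Note that version checks alone don't confirm BTF support; also verify that /sys/kernel/btf/vmlinux exists on each node before rolling out Pixie or Beyla.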
Step-by-Step Implementation Guide for Production Clusters
Here's the battle-tested approach I use when deploying eBPF observability for enterprise teams during our Kubernetes and DevOps training engagements.
Phase 1: Replace Your CNI with Cilium (Week 1)
If you're running flannel, calico, or weave, migration to Cilium is the highest-leverage first step. Cilium gives you eBPF networking AND Hubble observability in one move.
# 1. Drain nodes and migrate CNI (use cluster-managed migration in EKS/GKE)
# For self-managed clusters:
kubectl -n kube-system delete ds kube-flannel-ds
kubectl delete clusterrole flannel
kubectl delete clusterrolebinding flannel
# 2. Install Cilium with Hubble and Prometheus metrics
helm install cilium cilium/cilium \
--version 1.16.0 \
--namespace kube-system \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
# 3. Verify flows are being captured
cilium hubble port-forward &
hubble observe --follow --type l7
Phase 2: Deploy Beyla for Auto-Instrumented Traces (Week 2)
Once Cilium is stable, add Beyla to auto-generate OpenTelemetry spans from your existing services — no SDK required:
# Add Grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Deploy Beyla with Tempo backend
helm install beyla grafana/beyla \
--namespace monitoring \
--set beyla.config.data.otel_traces_export.endpoint="http://tempo.monitoring.svc:4317" \
--set beyla.config.data.open_port="8080,3000,9090,50051" \
--set beyla.config.data.service_name_attribute="auto"
# Verify traces appearing in Tempo
kubectl logs -n monitoring -l app=beyla --tail=50 | grep "span"
Phase 3: Deploy Tetragon for Security Observability (Week 3)
# Install Tetragon
helm repo add cilium https://helm.cilium.io/
helm install tetragon cilium/tetragon \
--namespace kube-system \
--set tetragon.enableProcessCred=true \
--set tetragon.enableProcessNs=true
# Apply the shell-execution detection policy (from earlier example)
kubectl apply -f detect-shell-execution.yaml
# Stream security events in real time
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --pods checkout-service
Phase 4: Unified Dashboards in Grafana (Week 4)
Import these Grafana dashboard IDs to get started immediately:
- 16611 — Cilium/Hubble network flows overview
- 16612 — Hubble DNS metrics
- 18814 — Beyla RED metrics (Rate, Errors, Duration) per service
- 20193 — Tetragon security events
# Import dashboards via the Grafana API. The import endpoint needs the full
# dashboard JSON, so download each dashboard from grafana.com first.
for dashboard_id in 16611 16612 18814 20193; do
dashboard_json=$(curl -s "https://grafana.com/api/dashboards/${dashboard_id}/revisions/latest/download")
curl -s -X POST \
-H "Content-Type: application/json" \
-d "{\"dashboard\": ${dashboard_json}, \"folderId\": 0, \"overwrite\": false, \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\", \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}" \
"http://admin:admin@grafana.monitoring.svc.cluster.local:3000/api/dashboards/import"
done
Gotchas and Production Considerations
In my experience deploying this stack across enterprise clients, here are the three issues you'll hit and how to solve them:
1. PodSecurityAdmission (PSA) conflicts. Beyla and Tetragon require elevated capabilities (BPF, PERFMON, SYS_PTRACE). In clusters enforcing the restricted PSA profile, label the monitoring namespace with pod-security.kubernetes.io/enforce: privileged to exempt it.

2. Pixie memory pressure on small nodes. Pixie's 8 GB per-node buffer will OOMKill its agents on t3.medium instances. Either set table_store_data_limit_mb: 2048 in the Pixie config or ensure nodes have ≥ 16 GB RAM.

3. Cilium migration causing brief connectivity drops. Never migrate the CNI on a production cluster during business hours. Use a canary node group, validate with cilium connectivity test, and migrate one node pool at a time.
Integration with Existing Prometheus/Grafana Stack
If you're already running the kube-prometheus-stack, Cilium and Beyla both expose native Prometheus metrics. Create these ServiceMonitor resources so the Prometheus Operator scrapes them:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-hubble
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: hubble-metrics
    interval: 15s
    path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: beyla
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: beyla
  endpoints:
  - port: metrics
    interval: 15s
Frequently Asked Questions
What is eBPF and why is it important for Kubernetes observability?
eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that runs sandboxed programs in kernel space without modifying kernel source code or loading kernel modules. For Kubernetes observability, it captures network flows, system calls, and performance metrics at the kernel level with near-zero overhead — eliminating the need to instrument individual applications or inject sidecar proxies. Because every container on a Linux node shares the same kernel, a single eBPF program can observe all containers simultaneously.
What is the difference between eBPF observability and traditional sidecar-based monitoring?
Traditional sidecar-based monitoring (like Envoy in a service mesh) injects a proxy container into every pod, adding memory overhead of 50–150 MB per pod and CPU overhead on every network packet. eBPF observability runs a single kernel-level program per node, capturing the same data for all pods with under 1% CPU overhead per node, no code changes, and no pod restarts required. For a 500-pod cluster, that's the difference between >75 GB RAM for sidecars vs. ~12 GB for the entire eBPF stack.
Which eBPF tools should I use for Kubernetes monitoring in production?
The most production-ready eBPF stack for Kubernetes in 2026 combines: Cilium + Hubble for CNI and L3-L7 network visibility, Pixie for auto-instrumented APM traces and service maps, Tetragon for security observability and runtime enforcement, and Grafana Beyla for standard OpenTelemetry span emission without SDK instrumentation. All four integrate with Prometheus and Grafana; Cilium, Tetragon, and Pixie are CNCF projects, while Beyla is maintained by Grafana Labs.
Do I need to modify my application code to use eBPF monitoring?
No — this is the primary advantage of eBPF observability. Tools like Cilium/Hubble, Pixie, and Beyla observe your applications purely from the kernel and runtime layers. Your Go, Python, Java, or Node.js services require zero code changes. This is especially valuable for teams with legacy services that cannot easily add OpenTelemetry SDKs, and for third-party containers where you don't have source code access.
What Linux kernel version is required for eBPF Kubernetes observability?
The minimum practical kernel for the full stack is 5.10 (LTS). For Cilium basics, 5.4 is sufficient. For Pixie auto-instrumentation, 5.8+ with BTF (BPF Type Format) enabled is required. The recommended kernel for 2026 production use is 6.1+ (Ubuntu 24.04, Amazon Linux 2023, Bottlerocket, Container-Optimized OS), which supports CO-RE (Compile Once, Run Everywhere) portability and the latest eBPF map types.
Conclusion: eBPF Is the Observability Standard for Kubernetes in 2026
The shift to eBPF-based Kubernetes observability is not a future trend — it's happening now, in production, at scale. The four-tool stack I've outlined here (Cilium + Hubble, Pixie, Tetragon, Beyla) gives you more comprehensive observability than traditional sidecar-based approaches at a fraction of the resource cost, with zero application changes required.
From my years running infrastructure at JPMorgan Chase and Deutsche Bank, I've seen firsthand how observability gaps cost companies millions in incident resolution time, regulatory exposure, and customer trust. eBPF closes those gaps at the kernel level — where the truth actually lives.
The migration path is phased and manageable: start with Cilium as your CNI (this is increasingly the default anyway), add Beyla for auto-instrumented traces, Pixie for deep APM sessions, and Tetragon for security compliance. Within four weeks, you can have a production-grade eBPF observability stack that outperforms systems that took years to build with the old approach.
If your team is ready to master Kubernetes observability, security, and platform engineering — including hands-on eBPF labs — explore our enterprise training programs. Our Kubernetes Advanced certification program includes dedicated modules on eBPF tooling and production deployment patterns.