Quick take: Linux capabilities are checked by the host kernel. With Docker’s default namespaces, many actions are contained to the container’s view of the system. Some capabilities, however, still impact the whole host and should be treated as near-root. This guide keeps your container perms tight and your host safe.
Understanding namespaces & capabilities (and why it matters)
Docker isolates processes using Linux namespaces (PID, NET, MNT, IPC, UTS, USER…), while capabilities split the all-powerful root into fine-grained privileges (e.g., CAP_NET_ADMIN, CAP_SYS_ADMIN). Namespaces define where an action applies; capabilities define what the process may do. Together, they allow “least privilege” containers instead of full root.
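To see which privileges a process actually holds, you can decode the capability bitmask the kernel exposes in /proc/<pid>/status. The sketch below is a minimal helper, assuming a POSIX shell with 64-bit arithmetic and the bit numbering from <linux/capability.h>; the function name decode_caps is made up for this example.

```shell
#!/bin/sh
# Capability names in bit order (bit 0 first), as numbered in <linux/capability.h>.
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE"

# decode_caps HEXMASK: print the names of all capabilities whose bit is set.
# (Hypothetical helper name; not part of Docker or libcap.)
decode_caps() {
  mask=$((0x$1))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out }$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
}

# Decode the effective set of the current process (Linux only).
if [ -r /proc/self/status ]; then
  decode_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
fi
```

Run inside a container, this prints the effective set of the shell itself. For a mask such as 00000000a80425fb it lists 14 names, from CHOWN through SETFCAP, which matches Docker's documented default set.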
Important: some capabilities remain effectively global in common Docker setups. For example, the wall clock is shared by the whole host and is not virtualized per container, so CAP_SYS_TIME changes the host's time. Treat these like sharp tools.
Security baseline: Drop everything and add back only what you need (--cap-drop ALL, then selective --cap-add), and avoid --privileged entirely unless you truly need host-level access.
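One way to make that baseline hard to forget is to build the flags from an explicit allow-list. This is only a sketch: cap_flags is a hypothetical helper name, and your/image:stable is a placeholder image.

```shell
#!/bin/sh
# cap_flags CAP...: print `docker run` flags that drop every capability
# and add back only the ones listed as arguments.
cap_flags() {
  printf '%s' '--cap-drop ALL'
  for cap in "$@"; do
    printf ' --cap-add %s' "$cap"
  done
  printf '\n'
}

# Show the flags for a web server that only needs to bind a low port.
cap_flags NET_BIND_SERVICE

# Usage (placeholder image name):
#   docker run --rm $(cap_flags NET_BIND_SERVICE) your/image:stable
```

Calling the helper with no arguments yields just --cap-drop ALL, which is a good starting point: run the workload, see what fails, and add capabilities one at a time.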
Bonus hardening: Consider User Namespace Remapping so “root” inside the container maps to an unprivileged UID on the host.
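To check whether remapping is actually in effect, you can read the UID map of a process. The sketch below assumes a single level of user namespace and the kernel's uid_map format (first ID inside, first ID outside, range length); root_host_uid is a hypothetical helper name.

```shell
#!/bin/sh
# root_host_uid [UID_MAP_FILE]: print the outside (host) UID that UID 0 maps to.
# Each uid_map line reads: <first-id-inside> <first-id-outside> <count>.
# Exits non-zero if UID 0 has no mapping at all.
root_host_uid() {
  awk '$1 == 0 { print $2; found = 1; exit }
       END { if (!found) exit 1 }' "${1:-/proc/self/uid_map}"
}

# Report on the current process (Linux only).
if [ -r /proc/self/uid_map ]; then
  host_uid=$(root_host_uid) || host_uid=""
  if [ "$host_uid" = "0" ]; then
    echo "root here is host root (no remapping)"
  elif [ -n "$host_uid" ]; then
    echo "root here maps to host UID $host_uid"
  else
    echo "UID 0 is not mapped in this namespace"
  fi
fi
```

Without remapping, the map typically reads "0 0 4294967295". With remapping enabled, the second column shows an unprivileged host UID instead.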
Mostly namespaced / safer (with Docker’s default isolation)
Capability | What it allows | Scope (typical) | Risk | Typical uses |
---|---|---|---|---|
CAP_NET_ADMIN | Change IPs, routes, firewall rules, links. | Network namespace | 🟧 Medium | VPN clients, routing, tc/iptables tweaks inside container netns. |
CAP_NET_RAW | Raw sockets (ping, packet capture). | Network namespace | 🟧 Medium | Ping, tcpdump, IDS/packet tools. |
CAP_NET_BIND_SERVICE | Bind to ports < 1024. | Network namespace | 🟩 Low | Listening on :80/:443 as non-root. |
CAP_NET_BROADCAST | Send broadcast/multicast. | Network namespace | 🟧 Medium | Service discovery, custom L2/L3 testing. |
CAP_CHOWN | Change file owners. | Mount namespace | 🟧 Medium | Installers, packaging steps. |
CAP_DAC_OVERRIDE | Bypass file R/W/X perms. | Mount namespace | 🟧 Medium | Backup/restore tools, init scripts. |
CAP_DAC_READ_SEARCH | Bypass read/search perms. | Mount namespace | 🟧 Medium | Indexing, scanning. |
CAP_FOWNER | Bypass file ownership checks. | Mount namespace | 🟧 Medium | Fixing perms across app files. |
CAP_FSETID | Preserve/set setuid/setgid bits. | Mount namespace | 🟧 Medium | Packaging special binaries. |
CAP_MKNOD | Create special device files. | Mount namespace | 🟧 Medium | Init systems, device simulations. |
CAP_LINUX_IMMUTABLE | Set immutable/append-only flags. | Mount namespace | 🟧 Medium | Hardening files (careful with bind mounts!). |
CAP_SETUID | Set process UID arbitrarily. | PID namespace | 🟧 Medium | Daemons dropping/raising privileges. |
CAP_SETGID | Set process GID arbitrarily. | PID namespace | 🟧 Medium | Same as above (groups). |
CAP_SETPCAP | Grant/remove process capabilities. | PID namespace | 🟧 Medium | Init wrappers, launchers. |
CAP_SETFCAP | Set file xattrs for capabilities. | Mount namespace | 🟧 Medium | Packaging binaries with caps. |
CAP_KILL | Signal any process (in same PID ns). | PID namespace | 🟧 Medium | Supervisors, debuggers. |
CAP_SYS_NICE | Change priorities/scheduling. | PID namespace | 🟧 Medium | Low-latency apps. |
CAP_SYS_RESOURCE | Override resource limits (ulimits). | PID namespace | 🟧 Medium | DBs, HPC, large RLIMITs. |
CAP_SYS_TTY_CONFIG | Configure TTYs. | PID/UTS namespaces | 🟧 Medium | Interactive daemons, serial tools. |
CAP_IPC_LOCK | Lock memory into RAM. | IPC/PID namespaces | 🟧 Medium | Crypto, low-latency apps. |
CAP_IPC_OWNER | Bypass IPC ownership checks. | IPC namespace | 🟧 Medium | Legacy IPC mgmt. |
CAP_SYS_CHROOT | Use chroot. | Mount/PID namespaces | 🟧 Medium | Build systems, init tools. |
CAP_SYS_PTRACE | Trace/debug processes (same PID ns). | PID namespace | 🟧 Medium | Debuggers, profilers. |
CAP_AUDIT_READ | Read audit logs (where available). | Namespaced/filtered | 🟧 Medium | Security tooling (limited in containers). |
CAP_CHECKPOINT_RESTORE | CRIU checkpoint/restore operations. | PID/NET/MNT interplay | 🟧 Medium | Live-migrate processes, fast restarts. |
Dangerous / global-effect (host-level impact; avoid unless you truly need them)
Capability | What it allows | Why dangerous | Risk | Typical uses |
---|---|---|---|---|
CAP_SYS_ADMIN | Mounts, many ioctls, namespace ops; huge surface. | Swiss-army knife; many actions leak outside namespaces, often near-root. | 🟥 High | FUSE, loop mounts, advanced storage/network — prefer alternatives. |
CAP_SYS_MODULE | Load/unload kernel modules. | Direct kernel modification on the host. | 🟥 High | Kernel dev (not for typical containers). |
CAP_SYS_BOOT | Reboot the machine. | Affects entire host. | 🟥 High | Almost never in containers. |
CAP_SYS_TIME | Set system clock/timers. | Host time (often) isn’t namespaced in Docker; changes the host clock. | 🟥 High | Time-sync daemons (not recommended in containers). |
CAP_SYS_RAWIO | Direct hardware I/O port access. | Bypasses driver abstractions. | 🟥 High | Specialized HW tooling. |
CAP_SYSLOG | Read kernel logs (dmesg), control klog. | Leaky visibility into host kernel state. | 🟥 High | Kernel troubleshooting (host tools preferred). |
CAP_AUDIT_CONTROL | Configure audit subsystem. | Global security policy changes. | 🟥 High | Host security mgmt, not containers. |
CAP_AUDIT_WRITE | Write to audit logs. | Global audit channel abuse possible. | 🟥 High | Part of Docker's default capability set; needed by tools that emit audit records (e.g., sshd or sudo via PAM). |
CAP_MAC_ADMIN | Configure MAC (e.g., SELinux/Smack). | Alters host security policy. | 🟥 High | Security frameworks (host-level). |
CAP_MAC_OVERRIDE | Bypass MAC restrictions. | Defeats host MAC confinement. | 🟥 High | Almost never safe. |
CAP_PERFMON | Advanced perf events tracing. | Can observe host kernel/other tasks. | 🟥 High | Low-level perf analysis (host tools better). |
CAP_BPF | Load/manage (e)BPF programs/maps. | Hooks into host kernel paths; potential escapes/DoS. | 🟥 High | Net observability/firewalls — prefer dedicated agents. |
CAP_BLOCK_SUSPEND | Block system suspend. | Impacts host power mgmt. | 🟥 High | Power daemons (host-level). |
CAP_WAKE_ALARM | Schedule RTC wakeups. | Host power/timing side effects. | 🟥 High | Embedded/power mgmt; avoid in containers. |
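The table above can be turned into a quick review aid. The following sketch flags any requested capability that appears in the dangerous list; the list is copied from this table, and flag_dangerous is a hypothetical helper name, not a Docker feature.

```shell
#!/bin/sh
# Capabilities from the "dangerous / global-effect" table.
DANGEROUS_CAPS="SYS_ADMIN SYS_MODULE SYS_BOOT SYS_TIME SYS_RAWIO SYSLOG
AUDIT_CONTROL AUDIT_WRITE MAC_ADMIN MAC_OVERRIDE PERFMON BPF BLOCK_SUSPEND
WAKE_ALARM"

# flag_dangerous CAP...: print, one per line, every argument that is on the
# dangerous list. Names may carry the CAP_ prefix; ALL is always flagged.
flag_dangerous() {
  # Unquoted expansion collapses the newlines in the list into single spaces.
  list=" $(echo $DANGEROUS_CAPS) ALL "
  for cap in "$@"; do
    name=${cap#CAP_}
    case "$list" in
      *" $name "*) printf '%s\n' "$name" ;;
    esac
  done
}

# Example: review the capabilities a service asks for.
flag_dangerous NET_BIND_SERVICE CAP_SYS_ADMIN SYS_TIME
```

Feed it the cap_add entries from a Compose file or a docker run command line. Anything it prints deserves a written justification before the change ships.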
Practical tips (production-ready)
- Prefer least privilege: --cap-drop ALL, then add only what you truly need with --cap-add.
- Avoid --privileged: it adds all capabilities and lifts device cgroup restrictions, which is near host-root.
- Beware bind mounts: file-system caps (CHOWN, LINUX_IMMUTABLE, etc.) can modify host files if you’ve mounted host paths.
- Network safety: if you grant strong NET caps, avoid network_mode: host so you keep the network namespace boundary.
- Layer controls: combine capabilities with seccomp (syscall allow-list) and AppArmor/SELinux for defense-in-depth.
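Two of these tips are easy to check on running containers. The sketch below reads one line per container in the form "name privileged network-mode cap-add-list" and prints a warning for privileged containers and for NET_ADMIN or NET_RAW combined with host networking. The function name audit_containers is hypothetical; the docker inspect template in the usage comment relies on the HostConfig fields Privileged, NetworkMode and CapAdd.

```shell
#!/bin/sh
# audit_containers: read "NAME PRIVILEGED NETMODE CAPADD" lines on stdin
# and print a warning line for each risky combination found.
audit_containers() {
  while read -r name privileged netmode caps; do
    if [ "$privileged" = "true" ]; then
      echo "$name: runs privileged"
    fi
    case "$netmode:$caps" in
      host:*NET_ADMIN*|host:*NET_RAW*)
        echo "$name: NET capability combined with host networking" ;;
    esac
  done
}

# Sample input in the same shape that docker inspect would produce.
printf '%s\n' \
  '/web false bridge [NET_BIND_SERVICE]' \
  '/vpn false host [NET_ADMIN]' \
  '/dbg true bridge []' | audit_containers

# Usage against running containers:
#   docker inspect --format \
#     '{{.Name}} {{.HostConfig.Privileged}} {{.HostConfig.NetworkMode}} {{.HostConfig.CapAdd}}' \
#     $(docker ps -q) | audit_containers
```

The input is piped through read rather than passed as arguments so that the bracketed capability list is never subject to shell globbing.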
Docker Compose example (safe pattern)
services:
  app:
    image: your/image:stable
    # 1) Drop everything, add back only what you need
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
      - NET_ADMIN  # only if you really need it
    # 2) Keep default bridged networking to preserve NET namespace
    # network_mode: bridge
    # 3) Optional: run as non-root user inside the container
    user: "1000:1000"
    # 4) Optional: Docker applies its default seccomp profile when none is set.
    #    To use a stricter one, point at a JSON profile (path is a placeholder):
    # security_opt:
    #   - seccomp:./seccomp-profile.json