Security Checklist

    This checklist aims at providing a basic list of guidance with links to more comprehensive documentation on each topic. It does not claim to be exhaustive and is meant to evolve.

    On how to read and use this document:

    • The order of topics does not reflect an order of priority.
    • Some checklist items are detailed in the paragraph below the list of each section.

    Caution: Checklists are not sufficient for attaining a good security posture on their own. A good security posture requires constant attention and improvement, but a checklist can be the first step on the never-ending journey towards security preparedness. Some of the recommendations in this checklist may be too restrictive or too lax for your specific security needs. Since Kubernetes security is not “one size fits all”, each category of checklist items should be evaluated on its merits.

    • group is not used for user or component authentication after bootstrapping.
    • The kube-controller-manager is running with --use-service-account-credentials enabled.
    • Intermediate and leaf certificates have an expiry date no more than 3 years in the future.
    • A process exists for periodic access review, and reviews occur no more than 24 months apart.
    • The is followed for guidance related to authentication and authorization.

    After bootstrapping, neither users nor components should authenticate to the Kubernetes API as system:masters. Similarly, running all of kube-controller-manager as system:masters should be avoided. In fact, system:masters should only be used as a break-glass mechanism, as opposed to an admin user.

    Network security

    • CNI plugins in-use supports network policies.
    • Ingress and egress network policies are applied to all workloads in the cluster.
    • Default network policies within each namespace, selecting all pods, denying everything, are in place.
    • If appropriate, a service mesh is used to encrypt all communications inside of the cluster.
    • The Kubernetes API, kubelet API and etcd are not exposed publicly on Internet.
    • Access from the workloads to the cloud metadata API is filtered.
    • Use of LoadBalancer and ExternalIPs is restricted.

    A number of plugins provide the functionality to restrict network resources that pods may communicate with. This is most commonly done through Network Policies which provide a namespaced resource to define rules. Default network policies blocking everything egress and ingress, in each namespace, selecting all the pods, can be useful to adopt an allow list approach, ensuring that no workloads is missed.

    Not all CNI plugins provide encryption in transit. If the chosen plugin lacks this feature, an alternative solution could be to use a service mesh to provide that functionality.

    The etcd datastore of the control plane should have controls to limit access and not be publicly exposed on the Internet. Furthermore, mutual TLS (mTLS) should be used to communicate securely with it. The certificate authority for this should be unique to etcd.

    External Internet access to the Kubernetes API server should be restricted to not expose the API publicly. Be careful as many managed Kubernetes distribution are publicly exposing the API server by default. You can then use a bastion host to access the server.

    The API access should be restricted and not publicly exposed, the defaults authentication and authorization settings, when no configuration file specified with the --config flag, are overly permissive.

    If a cloud provider is used for hosting Kubernetes, the access from pods to the cloud metadata API 169.254.169.254 should also be restricted or blocked if not needed because it may leak information.

    For restricted LoadBalancer and ExternalIPs use, see CVE-2020-8554: Man in the middle using LoadBalancer or ExternalIPs and the for further information.

    Pod security

    • RBAC rights to create, update, patch, workloads is only granted if necessary.
    • Appropriate Pod Security Standards policy is applied for all namespaces and enforced.
    • Memory limit is set for the workloads with a limit equal or inferior to the request.
    • CPU limit might be set on sensitive workloads.
    • For nodes that support it, Seccomp is enabled with appropriate syscalls profile for programs.
    • For nodes that support it, AppArmor or SELinux is enabled with appropriate profile for programs.

    RBAC authorization is crucial but (or on any resource that manages Pods). The only granularity is the API verbs on the resource itself, for example, create on Pods. Without additional admission, the authorization to create these resources allows direct unrestricted access to the schedulable nodes of a cluster.

    The Pod Security Standards define three different policies, privileged, baseline and restricted that limit how fields can be set in the PodSpec regarding security. These standards can be enforced at the namespace level with the new admission, enabled by default, or by third-party admission webhook. Please note that, contrary to the removed PodSecurityPolicy admission it replaces, Pod Security admission can be easily combined with admission webhooks and external services.

    Pod Security admission restricted policy, the most restrictive policy of the set, can operate in several modes, warn, audit or enforce to gradually apply the most appropriate according to security best practices. Nevertheless, pods’ security context should be separately investigated to limit the privileges and access pods may have on top of the predefined security standards, for specific use cases.

    For a hands-on tutorial on , see the blog post Kubernetes 1.23: Pod Security Graduates to Beta.

    should be set in order to restrict the memory and CPU resources a pod can consume on a node, and therefore prevent potential DoS attacks from malicious or breached workloads. Such policy can be enforced by an admission controller. Please note that CPU limits will throttle usage and thus can have unintended effects on auto-scaling features or efficiency i.e. running the process in best effort with the CPU resource available.

    Caution: Memory limit superior to request can expose the whole node to OOM issues.

    Seccomp can improve the security of your workloads by reducing the Linux kernel syscall attack surface available inside containers. The seccomp filter mode leverages BPF to create an allow or deny list of specific syscalls, named profiles. Those seccomp profiles can be enabled on individual workloads, . In addition, the Kubernetes Security Profiles Operator is a project to facilitate the management and use of seccomp in clusters.

    For historical context, please note that Docker has been using to only allow a restricted set of syscalls since 2016 from Docker Engine 1.10, but Kubernetes is still not confining workloads by default. The default seccomp profile can be found as well. Fortunately, Seccomp Default, a new alpha feature to use a default seccomp profile for all workloads can now be enabled and tested.

    Note: Seccomp is only available on Linux nodes.

    Enabling AppArmor or SELinux

    AppArmor

    is a Linux kernel security module that can provide an easy way to implement Mandatory Access Control (MAC) and better auditing through system logs. To enable AppArmor in Kubernetes, at least version 1.4 is required. Like seccomp, AppArmor is also configured through profiles, where each profile is either running in enforcing mode, which blocks access to disallowed resources or complain mode, which only reports violations. AppArmor profiles are enforced on a per-container basis, with an annotation, allowing for processes to gain just the right privileges.

    Note: AppArmor is only available on Linux nodes, and enabled in .

    SELinux

    is also a Linux kernel security module that can provide a mechanism for supporting access control security policies, including Mandatory Access Controls (MAC). SELinux labels can be assigned to containers or pods via their securityContext section.

    • Audit logs, if enabled, are protected from general access.

    Allowing broad access to Kubernetes logs can make security information available to a potential attacker.

    As a good practice, set up a separate means to collect and aggregate control plane logs, and do not use the /logs API endpoint. Alternatively, if you run your control plane with the /logs API endpoint and limit the content of /var/log (within the host or container where the API server is running) to Kubernetes API server logs only.

    Pod placement

    • Pod placement is done in accordance with the tiers of sensitivity of the application.
    • Sensitive applications are running isolated on nodes or with specific sandboxed runtimes.

    Pods that are on different tiers of sensitivity, for example, an application pod and the Kubernetes API server, should be deployed onto separate nodes. The purpose of node isolation is to prevent an application container breakout to directly providing access to applications with higher level of sensitivity to easily pivot within the cluster. This separation should be enforced to prevent pods accidentally being deployed onto the same node. This could be enforced with the following features:

    Key-value pairs, as part of the pod specification, that specify which nodes to deploy onto. These can be enforced at the namespace and cluster level with the PodNodeSelector admission controller.

    An admission controller that allows administrators to restrict permitted tolerations within a namespace. Pods within a namespace may only utilize the tolerations specified on the namespace object annotation keys that provide a set of default and allowed tolerations.

    RuntimeClass is a feature for selecting the container runtime configuration. The container runtime configuration is used to run a Pod’s containers and can provide more or less isolation from the host at the cost of performance overhead.

    Secrets

    • ConfigMaps are not used to hold confidential data.
    • Encryption at rest is configured for the Secret API.
    • If appropriate, a mechanism to inject secrets stored in third-party storage is deployed and available.
    • Service account tokens are not mounted in pods that don’t require them.
    • is in-use instead of non-expiring tokens.

    Secrets required for pods should be stored within Kubernetes Secrets as opposed to alternatives such as ConfigMap. Secret resources stored within etcd should be encrypted at rest.

    Pods needing secrets should have these automatically mounted through volumes, preferably stored in memory like with the . Mechanism can be used to also inject secrets from third-party storages as volume, like the Secrets Store CSI Driver. This should be done preferentially as compared to providing the pods service account RBAC access to secrets. This would allow adding secrets into the pod as environment variables or files. Please note that the environment variable method might be more prone to leakage due to crash dumps in logs and the non-confidential nature of environment variable in Linux, as opposed to the permission mechanism on files.

    Service account tokens should not be mounted into pods that do not require them. This can be configured by setting to false either within the service account to apply throughout the namespace or specifically for a pod. For Kubernetes v1.22 and above, use Bound Service Accounts for time-bound service account credentials.

    • Minimize unnecessary content in container images.
    • Container images are configured to be run as unprivileged user.
    • References to container images are made by sha256 digests (rather than tags) or the provenance of the image is validated by verifying the image’s digital signature at deploy time via admission control.
    • Container images are regularly scanned during creation and in deployment, and known vulnerable software is patched.

    Container image should contain the bare minimum to run the program they package. Preferably, only the program and its dependencies, building the image from the minimal possible base. In particular, image used in production should not contain shells or debugging utilities, as an can be used for troubleshooting.

    Build images to directly start with an unprivileged user by using the USER instruction in Dockerfile. The allows a container image to be started with a specific user and group with runAsUser and runAsGroup, even if not specified in the image manifest. However, the file permissions in the image layers might make it impossible to just start the process with a new unprivileged user without image modification.

    Avoid using image tags to reference an image, especially the latest tag, the image behind a tag can be easily modified in a registry. Prefer using the complete sha256 digest which is unique to the image manifest. This policy can be enforced via an ImagePolicyWebhook. Image signatures can also be automatically at deploy time to validate their authenticity and integrity.

    Scanning a container image can prevent critical vulnerabilities from being deployed to the cluster alongside the container image. Image scanning should be completed before deploying a container image to a cluster and is usually done as part of the deployment process in a CI/CD pipeline. The purpose of an image scan is to obtain information about possible vulnerabilities and their prevention in the container image, such as a Common Vulnerability Scoring System (CVSS) score. If the result of the image scans is combined with the pipeline compliance rules, only properly patched container images will end up in Production.

    Admission controllers

    • An appropriate selection of admission controllers is enabled.
    • A pod security policy is enforced by the Pod Security Admission or/and a webhook admission controller.
    • The admission chain plugins and webhooks are securely configured.

    Admission controllers can help to improve the security of the cluster. However, they can present risks themselves as they extend the API server and should be properly secured.

    The following lists present a number of admission controllers that could be considered to enhance the security posture of your cluster and application. It includes controllers that may be referenced in other parts of this document.

    This first group of admission controllers includes plugins , consider to leave them enabled unless you know what you are doing:

    CertificateApproval

    Performs additional authorization checks to ensure the signing user has permission to sign certificate requests.

    CertificateSubjectRestriction

    Rejects any certificate request that specifies a ‘group’ (or ‘organization attribute’) of system:masters.

    Enforce the LimitRange API constraints.

    MutatingAdmissionWebhook

    Allows the use of custom controllers through webhooks, these controllers may mutate requests that it reviews.

    Replacement for Pod Security Policy, restricts security contexts of deployed Pods.

    ResourceQuota

    Enforces resource quotas to prevent over-usage of resources.

    Allows the use of custom controllers through webhooks, these controllers do not mutate requests that it reviews.

    The second group includes plugin that are not enabled by default but in general availability state and recommended to improve your security posture:

    DenyServiceExternalIPs

    Rejects all net-new usage of the field. This is a mitigation for .

    NodeRestriction

    Restricts kubelet’s permissions to only modify the pods API resources they own or the node API ressource that represent themselves. It also prevents kubelet from using the node-restriction.kubernetes.io/ annotation, which can be used by an attacker with access to the kubelet’s credentials to influence pod placement to the controlled node.

    The third group includes plugins that are not enabled by default but could be considered for certain use cases:

    Enforces the usage of the latest version of a tagged image and ensures that the deployer has permissions to use the image.

    ImagePolicyWebhook

    What’s next