Pod Security Policies

    Caution: PodSecurityPolicy is deprecated as of Kubernetes v1.21 and will be removed in v1.25. We recommend migrating to Pod Security Admission or a third-party admission plugin. For a migration guide, see Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller. For more information on the deprecation, see PodSecurityPolicy Deprecation: Past, Present, and Future.

    Pod Security Policies enable fine-grained authorization of pod creation and updates.

    A Pod Security Policy is a cluster-level resource that controls security-sensitive aspects of the pod specification. PodSecurityPolicy objects define a set of conditions that a pod must run with in order to be accepted into the system, as well as defaults for the related fields. The aspects an administrator can control are described under Policy Reference.

    Enabling Pod Security Policies

    Pod security policy control is implemented as an optional admission controller. PodSecurityPolicies are enforced by enabling the admission controller, but doing so without authorizing any policies will prevent any pods from being created in the cluster.

    Since the pod security policy API (policy/v1beta1/podsecuritypolicy) is enabled independently of the admission controller, for existing clusters it is recommended that policies are added and authorized before enabling the admission controller.

    When a PodSecurityPolicy resource is created, it does nothing. In order to use it, the requesting user or target pod’s service account must be authorized to use the policy, by allowing the use verb on the policy.

    Most Kubernetes pods are not created directly by users. Instead, they are typically created indirectly as part of a Deployment, ReplicaSet, or other templated controller via the controller manager. Granting the controller access to the policy would grant access for all pods created by that controller, so the preferred method for authorizing policies is to grant access to the pod’s service account (see example).

    RBAC is a standard Kubernetes authorization mode, and can easily be used to authorize use of policies.

    First, a Role or ClusterRole needs to grant access to use the desired policies, by allowing the use verb on the podsecuritypolicies resource in the policy API group.
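
    A minimal sketch of such a role follows. The angle-bracket values are placeholders to fill in, not literal names:

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: <role name>
    rules:
    - apiGroups: ['policy']
      resources: ['podsecuritypolicies']
      verbs: ['use']
      # Restrict the grant to the named policies only.
      resourceNames:
      - <list of policies to authorize>
    ```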

    Then the (Cluster)Role is bound to the authorized user(s):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: <binding name>
    roleRef:
      kind: ClusterRole
      name: <role name>
      apiGroup: rbac.authorization.k8s.io
    subjects:
    # Authorize all service accounts in a namespace (recommended):
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts:<authorized namespace>
    # Authorize specific service accounts (not recommended):
    - kind: ServiceAccount
      name: <authorized service account name>
      namespace: <authorized pod namespace>
    # Authorize specific users (not recommended):
    - kind: User
      apiGroup: rbac.authorization.k8s.io
      name: <authorized user name>

    If a RoleBinding (not a ClusterRoleBinding) is used, it will only grant usage for pods being run in the same namespace as the binding. This can be paired with system groups to grant access to all pods run in the namespace:

    # Authorize all service accounts in a namespace:
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts
    # Or equivalently, all authenticated users in a namespace:
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:authenticated
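
    For context, the subjects above might sit in a complete RoleBinding like the following sketch. The binding name and namespace are hypothetical, and <role name> is a placeholder for a ClusterRole that grants the use verb:

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: psp-namespace-binding   # hypothetical name
      namespace: my-namespace       # hypothetical namespace; the grant applies only here
    roleRef:
      kind: ClusterRole
      name: <role name>
      apiGroup: rbac.authorization.k8s.io
    subjects:
    # All service accounts; scoped to my-namespace because this is a RoleBinding.
    - kind: Group
      apiGroup: rbac.authorization.k8s.io
      name: system:serviceaccounts
    ```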

    For more examples of RBAC bindings, see RoleBinding examples. For a complete example of authorizing a PodSecurityPolicy, see below.

    PodSecurityPolicy is being replaced by a new, simplified PodSecurity admission controller. For more details on this change, see PodSecurityPolicy Deprecation: Past, Present, and Future. Follow these guidelines to simplify migration from PodSecurityPolicy to the new admission controller:

    1. Limit your PodSecurityPolicies to the policies defined by the Pod Security Standards.

    2. Only bind PSPs to entire namespaces, by using the system:serviceaccounts:<namespace> group (where <namespace> is the target namespace). For example:

      apiVersion: rbac.authorization.k8s.io/v1
      # This cluster role binding allows all pods in the "development" and "canary" namespaces to use the baseline PSP.
      kind: ClusterRoleBinding
      metadata:
        name: psp-baseline-namespaces
      roleRef:
        kind: ClusterRole
        name: psp-baseline
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - kind: Group
        name: system:serviceaccounts:development
        apiGroup: rbac.authorization.k8s.io
      - kind: Group
        name: system:serviceaccounts:canary
        apiGroup: rbac.authorization.k8s.io

    Troubleshooting

    • The controller manager must be run against the secured API port and must not have superuser permissions. See Controlling Access to the Kubernetes API to learn about API server access controls.
      If the controller manager connected through the trusted API port (also known as the localhost listener), requests would bypass authentication and authorization modules; all PodSecurityPolicy objects would be allowed, and users would be able to grant themselves the ability to create privileged containers.

      For more details on configuring controller manager authorization, see Controller Roles.

    Policy Order

    In addition to restricting pod creation and update, pod security policies can also be used to provide default values for many of the fields that they control. When multiple policies are available, the pod security policy controller selects policies according to the following criteria:

    1. PodSecurityPolicies which allow the pod as-is, without changing defaults or mutating the pod, are preferred. The order of these non-mutating PodSecurityPolicies doesn’t matter.
    2. If the pod must be defaulted or mutated, the first PodSecurityPolicy (ordered by name) to allow the pod is selected.

    Note: During update operations (during which mutations to pod specs are disallowed) only non-mutating PodSecurityPolicies are used to validate the pod.
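
    As a hedged illustration of these criteria, consider two hypothetical policies, both authorized for the same pod (the names a-defaulting and b-permissive are invented for this sketch). If the pod leaves allowPrivilegeEscalation unset, a-defaulting would have to mutate the pod to apply its default, while b-permissive accepts the pod as-is. Under criterion 1, b-permissive would be expected to win even though a-defaulting sorts first by name.

    ```yaml
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: a-defaulting   # hypothetical; sorts first by name
    spec:
      privileged: false
      allowPrivilegeEscalation: false
      # Applying this default to a pod that leaves the field unset is a mutation.
      defaultAllowPrivilegeEscalation: false
      seLinux:
        rule: RunAsAny
      runAsUser:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny
      volumes:
      - '*'
    ---
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: b-permissive   # hypothetical; sorts second by name
    spec:
      privileged: false
      # No defaults set, so an otherwise valid pod is accepted without mutation.
      seLinux:
        rule: RunAsAny
      runAsUser:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny
      volumes:
      - '*'
    ```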

    Example

    This example assumes you have a running cluster with the PodSecurityPolicy admission controller enabled and you have cluster admin privileges.

    Set up

    Set up a namespace and a service account to use for this example. We’ll use this service account to mock a non-admin user.

    kubectl create namespace psp-example
    kubectl create serviceaccount -n psp-example fake-user
    kubectl create rolebinding -n psp-example fake-editor --clusterrole=edit --serviceaccount=psp-example:fake-user

    To make it clear which user we’re acting as and save some typing, create 2 aliases:

    alias kubectl-admin='kubectl -n psp-example'
    alias kubectl-user='kubectl --as=system:serviceaccount:psp-example:fake-user -n psp-example'

    Create a policy and a pod

    Define the example PodSecurityPolicy object in a file named example-psp.yaml. This is a policy that prevents the creation of privileged pods. The name of a PodSecurityPolicy object must be a valid DNS subdomain name.

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: example
    spec:
      privileged: false  # Don't allow privileged pods!
      seLinux:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      runAsUser:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny
      volumes:
      - '*'

    And create it with kubectl:

    kubectl-admin create -f example-psp.yaml

    Now, as the unprivileged user, try to create a simple pod:

    kubectl-user create -f- <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: pause
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause
    EOF

    The output is similar to this:

    Error from server (Forbidden): error when creating "STDIN": pods "pause" is forbidden: unable to validate against any pod security policy: []

    What happened? Although the PodSecurityPolicy was created, neither the pod’s service account nor fake-user have permission to use the new policy:

    kubectl-user auth can-i use podsecuritypolicy/example
    no

    Create a role named psp:unprivileged that allows the use verb on the example policy, and bind it to fake-user with a rolebinding.
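
    One way to express this grant is as a pair of RBAC manifests, sketched below. The role name psp:unprivileged matches the name used later in this example; the binding name fake-user:psp:unprivileged is an assumption, modeled on the naming of the binding created later for the default service account.

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: psp:unprivileged
      namespace: psp-example
    rules:
    # Allow the "use" verb on the single policy named "example".
    - apiGroups: ['policy']
      resources: ['podsecuritypolicies']
      verbs: ['use']
      resourceNames: ['example']
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: fake-user:psp:unprivileged   # assumed binding name
      namespace: psp-example
    roleRef:
      kind: Role
      name: psp:unprivileged
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: ServiceAccount
      name: fake-user
      namespace: psp-example
    ```

    Saved to a file (the name is up to you), these manifests could be applied with kubectl-admin apply -f <file>.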

    Now retry creating the pod:

    kubectl-user create -f- <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: pause
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause
    EOF

    The output is similar to this:

    pod "pause" created

    It works as expected! But any attempts to create a privileged pod should still be denied:

    kubectl-user create -f- <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: privileged
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause
          securityContext:
            privileged: true
    EOF

    The output is similar to this:

    Error from server (Forbidden): error when creating "STDIN": pods "privileged" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

    Delete the pod before moving on:

    kubectl-user delete pod pause

    Run another pod

    Let’s try that again, slightly differently:

    kubectl-user create deployment pause --image=k8s.gcr.io/pause
    deployment "pause" created

    kubectl-user get pods
    No resources found.

    kubectl-user get events | head -n 2

    What happened? We already bound the psp:unprivileged role for our fake-user, so why are we getting the error Error creating: pods "pause-7774d79b5-" is forbidden: no providers available to validate pod request? The answer lies in the source of the event: replicaset-controller. Fake-user successfully created the deployment (which successfully created a replicaset), but when the replicaset went to create the pod it was not authorized to use the example podsecuritypolicy.

    In order to fix this, bind the psp:unprivileged role to the pod’s service account instead. In this case (since we didn’t specify it) the service account is default:

    kubectl-admin create rolebinding default:psp:unprivileged \
        --role=psp:unprivileged \
        --serviceaccount=psp-example:default
    rolebinding "default:psp:unprivileged" created

    Now if you give it a minute to retry, the replicaset-controller should eventually succeed in creating the pod:

    kubectl-user get pods --watch
    NAME                    READY     STATUS              RESTARTS   AGE
    pause-7774d79b5-qrgcb   0/1       Pending             0          1s
    pause-7774d79b5-qrgcb   0/1       Pending             0          1s
    pause-7774d79b5-qrgcb   0/1       ContainerCreating   0          1s
    pause-7774d79b5-qrgcb   1/1       Running             0          2s

    Delete the namespace to clean up most of the example resources:

    kubectl-admin delete ns psp-example
    namespace "psp-example" deleted

    Note that PodSecurityPolicy resources are not namespaced, and must be cleaned up separately:

    kubectl-admin delete psp example
    podsecuritypolicy "example" deleted

    Example Policies

    This is the least restrictive policy you can create, equivalent to not using the pod security policy admission controller:

    policy/privileged-psp.yaml

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: privileged
      annotations:
        seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
    spec:
      privileged: true
      allowPrivilegeEscalation: true
      allowedCapabilities:
      - '*'
      volumes:
      - '*'
      hostNetwork: true
      hostPorts:
      - min: 0
        max: 65535
      hostIPC: true
      hostPID: true
      runAsUser:
        rule: 'RunAsAny'
      seLinux:
        rule: 'RunAsAny'
      supplementalGroups:
        rule: 'RunAsAny'
      fsGroup:
        rule: 'RunAsAny'

    This is an example of a restrictive policy that requires users to run as an unprivileged user, blocks possible escalations to root, and requires use of several security mechanisms.

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: restricted
      annotations:
        # docker/default identifies a profile for seccomp, but it is not particularly tied to the Docker runtime
        seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
        apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
        apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
    spec:
      privileged: false
      # Required to prevent escalations to root.
      allowPrivilegeEscalation: false
      requiredDropCapabilities:
        - ALL
      # Allow core volume types.
      volumes:
        - 'configMap'
        - 'emptyDir'
        - 'projected'
        - 'secret'
        - 'downwardAPI'
        # Assume that ephemeral CSI drivers & persistentVolumes set up by the cluster admin are safe to use.
        - 'csi'
        - 'persistentVolumeClaim'
        - 'ephemeral'
      hostNetwork: false
      hostIPC: false
      hostPID: false
      runAsUser:
        # Require the container to run without root privileges.
        rule: 'MustRunAsNonRoot'
      seLinux:
        # This policy assumes the nodes are using AppArmor rather than SELinux.
        rule: 'RunAsAny'
      supplementalGroups:
        rule: 'MustRunAs'
        ranges:
          # Forbid adding the root group.
          - min: 1
            max: 65535
      fsGroup:
        rule: 'MustRunAs'
        ranges:
          # Forbid adding the root group.
          - min: 1
            max: 65535
      readOnlyRootFilesystem: false

    See Pod Security Standards for more examples.

    Policy Reference

    Privileged

    Privileged - Determines if any container in a pod can enable privileged mode. By default a container is not allowed to access any devices on the host, but a “privileged” container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use Linux capabilities like manipulating the network stack and accessing devices.

    Host namespaces

    HostPID - Controls whether the pod containers can share the host process ID namespace. Note that when paired with ptrace this can be used to escalate privileges outside of the container (ptrace is forbidden by default).

    HostIPC - Controls whether the pod containers can share the host IPC namespace.

    HostNetwork - Controls whether the pod may use the node network namespace. Doing so gives the pod access to the loopback device, services listening on localhost, and could be used to snoop on network activity of other pods on the same node.

    HostPorts - Provides a list of ranges of allowable ports in the host network namespace. Defined as a list of HostPortRange, with min (inclusive) and max (inclusive). Defaults to no allowed host ports.
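
    As a small sketch of the host namespace fields, the fragment below keeps pods out of the host namespaces while allowing a narrow host port range. The range 8000-8080 is an arbitrary value chosen for illustration:

    ```yaml
    spec:
      # ... other spec fields
      hostPID: false
      hostIPC: false
      hostNetwork: false
      hostPorts:
      # Both bounds are inclusive.
      - min: 8000
        max: 8080
    ```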

    Volumes and file systems

    Volumes - Provides a list of allowed volume types. The allowable values correspond to the volume sources that are defined when creating a volume. For the complete list of volume types, see Types of Volumes. Additionally, * may be used to allow all volume types.

    The recommended minimum set of allowed volumes for new PSPs is:

    • configMap
    • downwardAPI
    • emptyDir
    • persistentVolumeClaim
    • secret
    • projected

    Warning: PodSecurityPolicy does not limit the types of PersistentVolume objects that may be referenced by a PersistentVolumeClaim, and hostPath type PersistentVolumes do not support read-only access mode. Only trusted users should be granted permission to create PersistentVolume objects.

    FSGroup - Controls the supplemental group applied to some volumes.

    • MustRunAs - Requires at least one range to be specified. Uses the minimum value of the first range as the default. Validates against all ranges.
    • MayRunAs - Requires at least one range to be specified. Allows FSGroups to be left unset without providing a default. Validates against all ranges if FSGroups is set.
    • RunAsAny - No default provided. Allows any fsGroup ID to be specified.

    AllowedHostPaths - This specifies a list of host paths that are allowed to be used by hostPath volumes. An empty list means there is no restriction on host paths used. Each entry is an object with a pathPrefix field, which allows hostPath volumes to mount a path that begins with that prefix, and an optional readOnly field indicating that the path must be mounted read-only.
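
    A minimal sketch of such a list, using /foo as an illustrative prefix:

    ```yaml
    allowedHostPaths:
      # This allows "/foo", "/foo/", "/foo/bar" etc., but
      # disallows "/fool", "/etc/foo" etc.
      # "/foo/../" is never valid.
      - pathPrefix: "/foo"
        readOnly: true  # only allow read-only mounts
    ```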

    Warning:

    There are many ways a container with unrestricted access to the host filesystem can escalate privileges, including reading data from other containers, and abusing the credentials of system services, such as Kubelet.

    Writeable hostPath directory volumes allow containers to write to the filesystem in ways that let them traverse the host filesystem outside the pathPrefix. readOnly: true, available in Kubernetes 1.11+, must be used on all allowedHostPaths to effectively limit access to the specified pathPrefix.

    FlexVolume drivers

    This specifies a list of FlexVolume drivers that are allowed to be used by flexvolume. An empty list or nil means there is no restriction on the drivers. Make sure the volumes field contains the flexVolume volume type; no FlexVolume driver is allowed otherwise.

    For example:

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: allow-flex-volumes
    spec:
      # ... other spec fields
      volumes:
        - flexVolume
      allowedFlexVolumes:
        - driver: example/lvm
        - driver: example/cifs

    RunAsUser - Controls which user ID the containers are run with.

    • MustRunAs - Requires at least one range to be specified. Uses the minimum value of the first range as the default. Validates against all ranges.
    • MustRunAsNonRoot - Requires that the pod be submitted with a non-zero runAsUser or have the USER directive defined (using a numeric UID) in the image. Pods which have specified neither runAsNonRoot nor runAsUser settings will be mutated to set runAsNonRoot=true, thus requiring a defined non-zero numeric USER directive in the container. No default provided. Setting allowPrivilegeEscalation=false is strongly recommended with this strategy.
    • RunAsAny - No default provided. Allows any runAsUser to be specified.

    RunAsGroup - Controls which primary group ID the containers are run with.

    • MustRunAs - Requires at least one range to be specified. Uses the minimum value of the first range as the default. Validates against all ranges.
    • MayRunAs - Does not require that RunAsGroup be specified. However, when RunAsGroup is specified, it has to fall in the defined range.
    • RunAsAny - No default provided. Allows any runAsGroup to be specified.

    SupplementalGroups - Controls which group IDs containers add.

    • MustRunAs - Requires at least one range to be specified. Uses the minimum value of the first range as the default. Validates against all ranges.
    • MayRunAs - Requires at least one range to be specified. Allows supplementalGroups to be left unset without providing a default. Validates against all ranges if supplementalGroups is set.
    • RunAsAny - No default provided. Allows any supplementalGroups to be specified.
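
    The user and group strategies above can be combined in one policy. The fragment below is a sketch with arbitrary ID ranges chosen for illustration. Under MustRunAs, the minimum of the first range (1000 here) would serve as the default user ID.

    ```yaml
    spec:
      # ... other spec fields
      runAsUser:
        rule: MustRunAs
        ranges:
        - min: 1000
          max: 1999
      runAsGroup:
        # May be left unset; if set, must fall within the range.
        rule: MayRunAs
        ranges:
        - min: 1000
          max: 1999
      supplementalGroups:
        # May be left unset; if set, the root group (0) is excluded.
        rule: MayRunAs
        ranges:
        - min: 1
          max: 65535
    ```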

    Privilege Escalation

    These options control the allowPrivilegeEscalation container option. This bool directly controls whether the no_new_privs flag gets set on the container process. This flag will prevent setuid binaries from changing the effective user ID, and prevent files from enabling extra capabilities (e.g. it will prevent the use of the ping tool). This behavior is required to effectively enforce MustRunAsNonRoot.

    AllowPrivilegeEscalation - Gates whether or not a user is allowed to set the security context of a container to allowPrivilegeEscalation=true. This defaults to allowed so as to not break setuid binaries. Setting it to false ensures that no child process of a container can gain more privileges than its parent.

    DefaultAllowPrivilegeEscalation - Sets the default for the allowPrivilegeEscalation option. The default behavior without this is to allow privilege escalation so as to not break setuid binaries. If that behavior is not desired, this field can be used to default to disallow, while still permitting pods to request allowPrivilegeEscalation explicitly.
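
    A sketch of the combination described above, where escalation is disallowed by default but a pod may still opt in explicitly:

    ```yaml
    spec:
      # ... other spec fields
      # Pods may explicitly request allowPrivilegeEscalation: true.
      allowPrivilegeEscalation: true
      # Containers that leave the option unset are defaulted to false.
      defaultAllowPrivilegeEscalation: false
    ```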

    Capabilities

    Linux capabilities provide a finer grained breakdown of the privileges traditionally associated with the superuser. Some of these capabilities can be used to escalate privileges or for container breakout, and may be restricted by the PodSecurityPolicy. For more details on Linux capabilities, see capabilities(7).

    The following fields take a list of capabilities, specified as the capability name in ALL_CAPS without the CAP_ prefix.

    AllowedCapabilities - Provides a list of capabilities that are allowed to be added to a container. The default set of capabilities are implicitly allowed. The empty set means that no additional capabilities may be added beyond the default set. * can be used to allow all capabilities.

    RequiredDropCapabilities - The capabilities which must be dropped from containers. These capabilities are removed from the default set, and must not be added. Capabilities listed in RequiredDropCapabilities must not be included in AllowedCapabilities or DefaultAddCapabilities.

    DefaultAddCapabilities - The capabilities which are added to containers by default, in addition to the runtime defaults. See the documentation for your container runtime for information on working with Linux capabilities.
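
    As an illustration of the naming convention (ALL_CAPS, no CAP_ prefix), the fragment below lets containers add one capability beyond the default set and forces another to be dropped. The particular capabilities are arbitrary choices for this sketch:

    ```yaml
    spec:
      # ... other spec fields
      allowedCapabilities:
      - NET_BIND_SERVICE
      # Must not also appear in allowedCapabilities or defaultAddCapabilities.
      requiredDropCapabilities:
      - NET_RAW
    ```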

    SELinux

    • MustRunAs - Requires seLinuxOptions to be configured. Uses seLinuxOptions as the default. Validates against seLinuxOptions.
    • RunAsAny - No default provided. Allows any seLinuxOptions to be specified.
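
    A sketch of the MustRunAs strategy follows. The level value is a made-up example, and the seLinuxOptions shown act as both the default and the value pods are validated against:

    ```yaml
    spec:
      # ... other spec fields
      seLinux:
        rule: MustRunAs
        seLinuxOptions:
          level: "s0:c123,c456"   # hypothetical MCS level
    ```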

    AllowedProcMountTypes

    allowedProcMountTypes is a list of allowed ProcMountTypes. Empty or nil indicates that only the DefaultProcMountType may be used.

    DefaultProcMount uses the container runtime defaults for readonly and masked paths for /proc. Most container runtimes mask certain paths in /proc to avoid accidental security exposure of special devices or information. This is denoted as the string Default.

    The only other ProcMountType is UnmaskedProcMount, which bypasses the default masking behavior of the container runtime and ensures the newly created /proc for the container stays intact with no modifications. This is denoted as the string Unmasked.

    AppArmor

    Controlled via annotations on the PodSecurityPolicy. Refer to the AppArmor documentation.

    Seccomp

    As of Kubernetes v1.19, you can use the seccompProfile field in the securityContext of Pods or containers to control use of seccomp profiles. In prior versions, seccomp was controlled by adding annotations to a Pod. The same PodSecurityPolicies can be used with either version to enforce how these fields or annotations are applied.

    seccomp.security.alpha.kubernetes.io/defaultProfileName - Annotation that specifies the default seccomp profile to apply to containers. Possible values are:

    • unconfined - Seccomp is not applied to the container processes. This is the default in Kubernetes if no alternative is provided.

    • runtime/default - The default container runtime profile is used.

    • docker/default - The Docker default seccomp profile is used. Deprecated as of Kubernetes 1.11. Use runtime/default instead.

    • localhost/<path> - Specify a profile as a file on the node located at <seccomp_root>/<path>, where <seccomp_root> is defined via the --seccomp-profile-root flag on the Kubelet. If the --seccomp-profile-root flag is not defined, the default path will be used, which is <root-dir>/seccomp where <root-dir> is specified by the --root-dir flag.

      Note: The --seccomp-profile-root flag is deprecated since Kubernetes v1.19. Users are encouraged to use the default path.

    seccomp.security.alpha.kubernetes.io/allowedProfileNames - Annotation that specifies which values are allowed for the pod seccomp annotations. Specified as a comma-delimited list of allowed values. Possible values are those listed above, plus * to allow all profiles. Absence of this annotation means that the default cannot be changed.
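
    Put together, the two annotations might look like the following sketch. The policy name and the localhost profile path are hypothetical:

    ```yaml
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: seccomp-example   # hypothetical name
      annotations:
        # Applied to containers that do not request a profile.
        seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
        # Comma-delimited list of profiles that pods may request.
        seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default,localhost/profiles/audit.json'
    spec:
      # ... other spec fields
    ```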

    Sysctl

    By default, all safe sysctls are allowed.

    • forbiddenSysctls - excludes specific sysctls. You can forbid a combination of safe and unsafe sysctls in the list. To forbid setting any sysctls, use * on its own.
    • allowedUnsafeSysctls - allows specific sysctls that had been disallowed by the default list, so long as these are not listed in forbiddenSysctls.
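
    A short sketch of both fields in use. The specific sysctl names are illustrative choices, not recommendations:

    ```yaml
    spec:
      # ... other spec fields
      # Permit an unsafe sysctl pattern that the default list disallows.
      allowedUnsafeSysctls:
      - kernel.msg*
      # Forbid a sysctl even though it is otherwise considered safe.
      forbiddenSysctls:
      - kernel.shm_rmid_forced
    ```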

    Refer to the Sysctl documentation.

    • See PodSecurityPolicy Deprecation: Past, Present, and Future to learn about the future of pod security policy.

    • See Pod Security Standards for policy recommendations.

    Last modified April 26, 2022 at 1:39 PM PST: