Configure a Security Context for a Pod or Container

    • Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).

    • Security Enhanced Linux (SELinux): Objects are assigned security labels.

    • Running as privileged or unprivileged.

    • Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.

    • AppArmor: Use program profiles to restrict the capabilities of individual programs.

    • Seccomp: Filter a process’s system calls.

    • allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This boolean directly controls whether the no_new_privs flag is set on the container process. allowPrivilegeEscalation is always true when the container:

      • is run as privileged, or
      • has CAP_SYS_ADMIN
    • readOnlyRootFilesystem: Mounts the container’s root filesystem as read-only.

    The above bullets are not a complete set of security context settings; see SecurityContext for a comprehensive list.
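
    As a rough illustration of how several of these settings sit together in one manifest, here is a minimal sketch. The Pod name, container name, and image are hypothetical placeholders, and the combination of values is an assumption chosen for illustration, not a recommended baseline.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: settings-overview-demo        # hypothetical name
    spec:
      securityContext:
        runAsUser: 1000                   # discretionary access control: UID
        runAsGroup: 3000                  # discretionary access control: GID
        seccompProfile:
          type: RuntimeDefault            # filter system calls with the runtime's default profile
      containers:
      - name: overview                    # hypothetical name
        image: busybox:1.28               # assumed image
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          privileged: false               # run unprivileged
          allowPrivilegeEscalation: false # do not allow gaining more privileges than the parent
          readOnlyRootFilesystem: true    # mount the root filesystem read-only
          capabilities:
            drop: ["ALL"]                 # remove all Linux capabilities
    ```

    Pod-level fields apply to every container, while the container-level block applies only to that container.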

    You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of the Kubernetes playgrounds.

    To check the version, enter kubectl version.

    Set the security context for a Pod

    To specify security settings for a Pod, include the securityContext field in the Pod specification. The securityContext field is a PodSecurityContext object. The security settings that you specify for a Pod apply to all Containers in the Pod. Here is a configuration file for a Pod that has a securityContext and an emptyDir volume:
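
    A manifest matching the description below might look like the following sketch. The container name, volume name, and image are assumptions; the Pod name, the user, group, and fsGroup IDs, the /data/demo mount path, and the sleep command are taken from the rest of this page.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: security-context-demo
    spec:
      securityContext:
        runAsUser: 1000          # UID for all processes in all containers
        runAsGroup: 3000         # primary GID for all processes
        fsGroup: 2000            # supplementary GID, also applied to the volume
      volumes:
      - name: sec-ctx-vol        # assumed volume name
        emptyDir: {}
      containers:
      - name: sec-ctx-demo       # assumed container name
        image: busybox:1.28      # assumed image
        command: ["sh", "-c", "sleep 1h"]
        volumeMounts:
        - name: sec-ctx-vol
          mountPath: /data/demo
    ```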

    In the configuration file, the runAsUser field specifies that for any Containers in the Pod, all processes run with user ID 1000. The runAsGroup field specifies the primary group ID of 3000 for all processes within any containers of the Pod. If this field is omitted, the primary group ID of the containers will be root (0). Any files created will also be owned by user 1000 and group 3000 when runAsGroup is specified. Because the fsGroup field is specified, all processes of the container are also part of the supplementary group ID 2000. The volume /data/demo and any files created in that volume will be owned by group ID 2000.

    Create the Pod:

        kubectl apply -f https://k8s.io/examples/pods/security/security-context.yaml

    Verify that the Pod’s Container is running:

        kubectl get pod security-context-demo

    Get a shell to the running Container:

        kubectl exec -it security-context-demo -- sh

    In your shell, list the running processes:

        ps

    The output shows that the processes are running as user 1000, which is the value of runAsUser:

        PID   USER     TIME  COMMAND
            1 1000      0:00 sleep 1h
            6 1000      0:00 sh
        ...

    In your shell, navigate to /data, and list the one directory:

        cd /data
        ls -l

    The output shows that the /data/demo directory has group ID 2000, which is the value of fsGroup.

        drwxrwsrwx 2 root 2000 4096 Jun  6 20:08 demo

    In your shell, navigate to /data/demo, and create a file:

        cd demo
        echo hello > testfile

    List the file in the /data/demo directory:

        ls -l

    The output shows that testfile has group ID 2000, which is the value of fsGroup.

        -rw-r--r-- 1 1000 2000 6 Jun  6 20:08 testfile

    Run the following command:

        id

    The output is similar to this:

        uid=1000 gid=3000 groups=2000

    Exit your shell:

        exit

    Configure volume permission and ownership change policy for Pods

    FEATURE STATE: Kubernetes v1.23 [stable]

    By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod’s securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.

    fsGroupChangePolicy: Defines the behavior for changing ownership and permissions of the volume before it is exposed inside a Pod. This field only applies to volume types that support fsGroup controlled ownership and permissions. This field has two possible values:

    • OnRootMismatch: Only change permissions and ownership if the permissions and ownership of the root directory do not match the expected permissions of the volume. This can shorten the time it takes to change ownership and permissions of a volume.
    • Always: Always change permissions and ownership of the volume when the volume is mounted.

    For example:

        securityContext:
          runAsUser: 1000
          runAsGroup: 3000
          fsGroup: 2000
          fsGroupChangePolicy: "OnRootMismatch"

    Note: This field has no effect on ephemeral volume types such as secret, configMap, and emptyDir.

    FEATURE STATE: Kubernetes v1.26 [stable]

    If you deploy a Container Storage Interface (CSI) driver which supports the VOLUME_MOUNT_GROUP NodeServiceCapability, the process of setting file ownership and permissions based on the fsGroup specified in the securityContext will be performed by the CSI driver instead of Kubernetes. In this case, since Kubernetes doesn’t perform any ownership and permission change, fsGroupChangePolicy does not take effect, and as specified by CSI, the driver is expected to mount the volume with the provided fsGroup, resulting in a volume that is readable/writable by the fsGroup.

    Set the security context for a Container

    To specify security settings for a Container, include the securityContext field in the Container manifest. The securityContext field is a SecurityContext object. Security settings that you specify for a Container apply only to the individual Container, and they override settings made at the Pod level when there is overlap. Container settings do not affect the Pod’s Volumes.

    Here is the configuration file for a Pod that has one Container. Both the Pod and the Container have a securityContext field:

    pods/security/security-context-2.yaml

        apiVersion: v1
        kind: Pod
        metadata:
          name: security-context-demo-2
        spec:
          securityContext:
            runAsUser: 1000
          containers:
          - name: sec-ctx-demo-2
            image: gcr.io/google-samples/node-hello:1.0
            securityContext:
              runAsUser: 2000
              allowPrivilegeEscalation: false
    Create the Pod:

        kubectl apply -f https://k8s.io/examples/pods/security/security-context-2.yaml

    Verify that the Pod’s Container is running:

        kubectl get pod security-context-demo-2

    Get a shell into the running Container:

        kubectl exec -it security-context-demo-2 -- sh

    In your shell, list the running processes:

        ps aux

    The output shows that the processes are running as user 2000. This is the value of runAsUser specified for the Container. It overrides the value 1000 that is specified for the Pod.

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        2000         1  0.0  0.0   4336   764 ?        Ss   20:36   0:00 /bin/sh -c node server.js
        2000         8  0.1  0.5 772124 22604 ?        Sl   20:36   0:00 node server.js
        ...

    Exit your shell:

        exit

    Set capabilities for a Container

    With Linux capabilities, you can grant certain privileges to a process without granting all the privileges of the root user. To add or remove Linux capabilities for a Container, include the capabilities field in the securityContext section of the Container manifest.

    First, see what happens when you don’t include a capabilities field. Here is a configuration file that does not add or remove any Container capabilities:

        apiVersion: v1
        kind: Pod
        metadata:
          name: security-context-demo-3
        spec:
          containers:
          - name: sec-ctx-3
            image: gcr.io/google-samples/node-hello:1.0

    Create the Pod:

        kubectl apply -f https://k8s.io/examples/pods/security/security-context-3.yaml

    Verify that the Pod’s Container is running:

        kubectl get pod security-context-demo-3

    Get a shell into the running Container:

        kubectl exec -it security-context-demo-3 -- sh

    In your shell, list the running processes:

        ps aux

    The output shows the process IDs (PIDs) for the Container:

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root         1  0.0  0.0   4336   796 ?        Ss   18:17   0:00 /bin/sh -c node server.js
        root         5  0.1  0.5 772124 22700 ?        Sl   18:17   0:00 node server.js

    In your shell, view the status for process 1:

        cd /proc/1
        cat status

    The output shows the capabilities bitmap for the process:

        ...
        CapPrm: 00000000a80425fb
        CapEff: 00000000a80425fb
        ...

    Make a note of the capabilities bitmap, and then exit your shell:

        exit

    Next, run a Container that is the same as the preceding container, except that it has additional capabilities set.

    Here is the configuration file for a Pod that runs one Container. The configuration adds the CAP_NET_ADMIN and CAP_SYS_TIME capabilities:

    pods/security/security-context-4.yaml

        apiVersion: v1
        kind: Pod
        metadata:
          name: security-context-demo-4
        spec:
          containers:
          - name: sec-ctx-4
            image: gcr.io/google-samples/node-hello:1.0
            securityContext:
              capabilities:
                add: ["NET_ADMIN", "SYS_TIME"]

    Create the Pod:

        kubectl apply -f https://k8s.io/examples/pods/security/security-context-4.yaml

    Get a shell into the running Container:

        kubectl exec -it security-context-demo-4 -- sh

    In your shell, view the capabilities for process 1:

        cd /proc/1
        cat status

    The output shows the capabilities bitmap for the process:

        ...
        CapPrm: 00000000aa0435fb
        CapEff: 00000000aa0435fb
        ...

    Compare the capabilities of the two Containers:

        00000000a80425fb
        00000000aa0435fb

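    In the capability bitmap of the first container, bits 12 and 25 are clear. In the second container, bits 12 and 25 are set. Bit 12 is CAP_NET_ADMIN, and bit 25 is CAP_SYS_TIME. The two values differ in exactly those bits: 0xaa0435fb XOR 0xa80425fb = 0x02001000, which is 2^25 + 2^12. See capability.h for definitions of the capability constants.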

    Note: Linux capability constants have the form CAP_XXX. But when you list capabilities in your container manifest, you must omit the CAP_ portion of the constant. For example, to add CAP_SYS_TIME, include SYS_TIME in your list of capabilities.
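
    The capabilities field can also remove capabilities. The following is a minimal sketch, not one of the example files used on this page: the Pod and container names are hypothetical, and the choice of NET_BIND_SERVICE is only an illustration of the naming rule in the note above.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: capabilities-drop-demo        # hypothetical name
    spec:
      containers:
      - name: cap-drop                    # hypothetical name
        image: gcr.io/google-samples/node-hello:1.0
        securityContext:
          capabilities:
            drop: ["ALL"]                 # remove every capability granted by default
            add: ["NET_BIND_SERVICE"]     # add back CAP_NET_BIND_SERVICE, written without CAP_
    ```

    Reading CapPrm and CapEff from /proc/1/status in such a container, as in the steps above, is one way to check which bits remain set.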

    Set the Seccomp profile for a Container

    To set the Seccomp profile for a Container, include the seccompProfile field in the securityContext section of your Pod or Container manifest. The seccompProfile field is a SeccompProfile object consisting of type and localhostProfile. Valid options for type include RuntimeDefault, Unconfined, and Localhost. localhostProfile must be set only if type is Localhost. It indicates the path of the pre-configured profile on the node, relative to the kubelet’s configured Seccomp profile location (configured with the --root-dir flag).

    Here is an example that sets the Seccomp profile to the node’s container runtime default profile:

        ...
        securityContext:
          seccompProfile:
            type: RuntimeDefault

    Here is an example that sets the Seccomp profile to a pre-configured file at <kubelet-root-dir>/seccomp/my-profiles/profile-allow.json:

        ...
        securityContext:
          seccompProfile:
            type: Localhost
            localhostProfile: my-profiles/profile-allow.json

    Assign SELinux labels to a Container

    To assign SELinux labels to a Container, include the seLinuxOptions field in the securityContext section of your Pod or Container manifest. The seLinuxOptions field is an SELinuxOptions object. Here’s an example that applies an SELinux level:
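
    A minimal sketch of such a fragment follows. The level value is only illustrative; a real value depends on the SELinux policy on your nodes.

    ```yaml
    ...
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"   # illustrative MCS level, an assumption
    ```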

    Note: To assign SELinux labels, the SELinux security module must be loaded on the host operating system.

    FEATURE STATE: Kubernetes v1.25 [alpha]

    By default, the container runtime recursively assigns the SELinux label to all files on all Pod volumes. To speed up this process, Kubernetes can change the SELinux label of a volume instantly by using a mount option -o context=<label>.

    To benefit from this speedup, all these conditions must be met:

    • Alpha feature gates ReadWriteOncePod and SELinuxMountReadWriteOncePod must be enabled.
    • Pod must use PersistentVolumeClaim with accessModes: ["ReadWriteOncePod"].
    • Pod (or all its Containers that use the PersistentVolumeClaim) must have seLinuxOptions set.
    • The corresponding PersistentVolume must be either a volume that uses a CSI driver, or a volume that uses the legacy iscsi volume type.
      • If you use a volume backed by a CSI driver, that CSI driver must announce that it supports mounting with -o context by setting spec.seLinuxMount: true in its CSIDriver instance.
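
    The conditions above could be met with objects along the following lines. This is a sketch under stated assumptions: every name, the storage size, the image, and the SELinux level are hypothetical, and the CSIDriver object is normally installed by the driver vendor, not written by hand.

    ```yaml
    # PersistentVolumeClaim that only a single Pod may use
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: selinux-demo-claim            # hypothetical name
    spec:
      accessModes: ["ReadWriteOncePod"]
      resources:
        requests:
          storage: 1Gi                    # assumed size
    ---
    # Pod that sets seLinuxOptions and mounts the claim
    apiVersion: v1
    kind: Pod
    metadata:
      name: selinux-demo                  # hypothetical name
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c456"           # illustrative level
      containers:
      - name: app                         # hypothetical name
        image: busybox:1.28               # assumed image
        command: ["sh", "-c", "sleep 1h"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: selinux-demo-claim
    ---
    # CSIDriver instance announcing support for mounting with -o context
    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
      name: csi.example.com               # hypothetical driver name
    spec:
      seLinuxMount: true
    ```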

    For any other volume types, SELinux relabelling happens another way: the container runtime recursively changes the SELinux label for all inodes (files and directories) in the volume. The more files and directories in the volume, the longer that relabelling takes.

    Note: In Kubernetes 1.25, the kubelet loses track of volume labels after restart. In other words, the kubelet may then refuse to start Pods with errors similar to “conflicting SELinux labels of volume”, even though there are no conflicting labels in Pods. Make sure nodes are fully drained before restarting the kubelet.

    Discussion

    The security context for a Pod applies to the Pod’s Containers and also to the Pod’s Volumes when applicable. Specifically fsGroup and seLinuxOptions are applied to Volumes as follows:

    • fsGroup: Volumes that support ownership management are modified to be owned and writable by the GID specified in fsGroup. See the Ownership Management design document for more details.

    • seLinuxOptions: Volumes that support SELinux labeling are relabeled to be accessible by the label specified under seLinuxOptions. Usually you only need to set the level section. This sets the label given to all Containers in the Pod as well as the Volumes.

    Delete the Pods:

        kubectl delete pod security-context-demo
        kubectl delete pod security-context-demo-2
        kubectl delete pod security-context-demo-3
        kubectl delete pod security-context-demo-4

    What’s next