Tang server encryption key management

    You must perform the rekeying operation for every node before you can delete the old key from the Tang server. The following sections provide procedures for rekeying and deleting old keys.

    The Tang server uses /usr/libexec/tangd-keygen to generate new keys and stores them in the /var/db/tang directory by default. To recover the Tang server in the event of a failure, back up this directory. The keys are sensitive: because they can decrypt the boot disks of all hosts that have used them, they must be protected accordingly.

    Procedure

    • Copy the keys from the /var/db/tang directory to a temporary directory from which you can later restore them.
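    The copy step can be scripted. The following is a minimal sketch, assuming a Bash-compatible shell. backup_tang_keys is a hypothetical helper name, not part of Tang, and the destination path is your choice. The sketch restricts the backup directory to its owner because the keys can decrypt every host bound to them.

```shell
# Sketch: copy Tang keys into a private backup directory.
# backup_tang_keys is a hypothetical helper, not a Tang command.
backup_tang_keys() {
    local src="${1:-/var/db/tang}"   # key directory (Tang default)
    local dest="$2"                  # backup location chosen by you
    [ -d "$src" ] && [ -n "$dest" ] || return 1
    # Make the backup directory accessible to its owner only.
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    # -a preserves modes and timestamps; the "/." form also copies
    # hidden (unadvertised) key files.
    cp -a "$src/." "$dest/"
}
```

    For example, running backup_tang_keys /var/db/tang /root/tang-backup as root copies both the advertised and the hidden key files.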

    You can recover the keys for a Tang server by accessing the keys from a backup.

    Procedure

    • Restore the key from your backup folder to the /var/db/tang/ directory.

      When the Tang server starts up, it advertises and uses these restored keys.

    This procedure uses a set of three Tang servers, each with unique keys, as an example.

    Using redundant Tang servers reduces the chances of nodes failing to boot automatically.

    Rekeying a Tang server, and all associated NBDE-encrypted nodes, is a three-step procedure.

    Prerequisites

    • A working Network-Bound Disk Encryption (NBDE) installation on one or more nodes.

    Procedure

    1. Generate a new Tang server key.

    2. Rekey all NBDE-encrypted nodes so they use the new key.

    3. Delete the old Tang server key.

    Figure 1. Example workflow for rekeying a Tang server

    Prerequisites

    • A root shell on the Linux machine running the Tang server.

    • To facilitate verification of the Tang server key rotation, encrypt a small test file with the old key:

    • Verify that the encryption succeeded and that the file decrypts to the original plaintext string:

      # clevis decrypt < /tmp/encrypted.oldkey
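    One way to create the test file is sketched below. It assumes the Tang server listens on http://localhost:7500, matching the port passed to tang-show-keys in this document. TANG_URL and make_oldkey_testfile are names introduced for this example only.

```shell
# Sketch: encrypt a short string against the currently advertised key.
# TANG_URL and make_oldkey_testfile are assumptions for illustration.
TANG_URL="${TANG_URL:-http://localhost:7500}"

make_oldkey_testfile() {
    local out="${1:-/tmp/encrypted.oldkey}"
    # -y accepts the advertised keys without an interactive trust prompt.
    echo plaintext | clevis encrypt tang "{\"url\":\"$TANG_URL\"}" -y > "$out"
}
```

    After running make_oldkey_testfile, the clevis decrypt command shown above should print the string plaintext.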

    Procedure

    1. Locate and access the directory that stores the Tang server key. This is usually the /var/db/tang directory. Check the currently advertised key thumbprint:

      # tang-show-keys 7500

      Example output

      36AHjNH3NZDSnlONLz1-V4ie6t8

    2. Enter the Tang server key directory:

      # cd /var/db/tang/

    3. List the current Tang server keys:

      # ls -A1

      Example output

      36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
      gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk

      During normal Tang server operations, there are two .jwk files in this directory: one for signing and verification, and another for key derivation.

    4. Disable advertisement of the old keys:

      # for key in *.jwk; do \
          mv -- "$key" ".$key"; \
        done

      New clients setting up Network-Bound Disk Encryption (NBDE) or requesting keys will no longer see the old keys. Existing clients can still access and use the old keys until they are deleted. The Tang server reads but does not advertise keys stored in UNIX hidden files, which start with the . character.

    5. Generate a new key:

      # /usr/libexec/tangd-keygen /var/db/tang

    6. List the current Tang server keys to verify the old keys are no longer advertised, as they are now hidden files, and new keys are present:

      # ls -A1

      Example output

      .36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
      .gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk
      Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
      WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk

      More recent Tang server installations include a helper script, /usr/libexec/tangd-rotate-keys, that disables advertisement of the old keys and generates the new keys in a single step.

    7. If you are running multiple Tang servers behind a load balancer that share the same key material, ensure the changes made here are properly synchronized across the entire set of servers before proceeding.
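    Steps 4 and 5 of this procedure can be combined into one function. The following is a sketch under stated assumptions: rotate_tang_keys and the TANG_KEYGEN variable are names introduced here, and the function skips the rename when no advertised keys exist, a case the plain loop does not handle.

```shell
# Sketch: hide the advertised keys, then generate new ones.
# rotate_tang_keys and TANG_KEYGEN are hypothetical names.
TANG_KEYGEN="${TANG_KEYGEN:-/usr/libexec/tangd-keygen}"

rotate_tang_keys() {
    local dir="${1:-/var/db/tang}" key base
    # *.jwk does not match dotfiles, so keys hidden earlier are left alone.
    for key in "$dir"/*.jwk; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$key" ] || continue
        base=$(basename "$key")
        mv -- "$key" "$dir/.$base" || return 1
    done
    # Generate the new signing and derivation keys.
    "$TANG_KEYGEN" "$dir"
}
```

    Run rotate_tang_keys /var/db/tang as root, then list the directory with ls -A1 to confirm the result.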

    Verification

    1. Verify that the Tang server is advertising the new key, and not advertising the old key:

      # tang-show-keys 7500

      Example output

      WOjQYkyK7DxY_T5pMncMO5w0f6E

    2. Verify that the old key, while not advertised, is still available to decryption requests:

      # clevis decrypt < /tmp/encrypted.oldkey
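    The advertisement check can also be scripted, which is convenient when several Tang servers must be verified. This is a minimal sketch; is_advertised and TANG_SHOW_KEYS are names introduced for this example, and the default port of 7500 follows the commands above.

```shell
# Sketch: succeed only if the given thumbprint is currently advertised.
# is_advertised and TANG_SHOW_KEYS are hypothetical names.
TANG_SHOW_KEYS="${TANG_SHOW_KEYS:-tang-show-keys}"

is_advertised() {
    local thp="$1" port="${2:-7500}"
    # -x matches whole lines, -F treats the thumbprint literally, and
    # "--" protects thumbprints that begin with a dash.
    "$TANG_SHOW_KEYS" "$port" | grep -qxF -- "$thp"
}
```

    After a rotation, is_advertised should succeed for the new thumbprint and fail for the old one.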

    You can rekey all of the nodes on a remote cluster by using a DaemonSet object without incurring any downtime to the remote cluster.

    Prerequisites

    • cluster-admin access to all clusters with Network-Bound Disk Encryption (NBDE) nodes.

    • All Tang servers must be accessible to every NBDE node undergoing rekeying, even if the keys of a Tang server have not changed.

    • Obtain the Tang server URL and key thumbprint for every Tang server.

    Procedure

    1. Create a DaemonSet object based on the following template. This template sets up three redundant Tang servers, but can be easily adapted to other situations. Change the Tang server URLs and thumbprints in the NEW_TANG_PIN environment variable to suit your environment:

      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: tang-rekey
        namespace: openshift-machine-config-operator
      spec:
        selector:
          matchLabels:
            name: tang-rekey
        template:
          metadata:
            labels:
              name: tang-rekey
          spec:
            containers:
            - name: tang-rekey
              image: registry.access.redhat.com/ubi9/ubi-minimal:latest
              command:
              - "/sbin/chroot"
              - "/host"
              - "/bin/bash"
              - "-ec"
              args:
              - |
                rm -f /tmp/rekey-complete || true
                echo "Current tang pin:"
                clevis-luks-list -d $ROOT_DEV -s 1
                echo "Applying new tang pin: $NEW_TANG_PIN"
                clevis-luks-edit -f -d $ROOT_DEV -s 1 -c "$NEW_TANG_PIN"
                echo "Pin applied successfully"
                touch /tmp/rekey-complete
                sleep infinity
              readinessProbe:
                exec:
                  command:
                  - cat
                  - /host/tmp/rekey-complete
                initialDelaySeconds: 30
                periodSeconds: 10
              env:
              - name: ROOT_DEV
                value: /dev/disk/by-partlabel/root
              - name: NEW_TANG_PIN
                value: >-
                  {"t":1,"pins":{"tang":[
                    {"url":"http://tangserver01:7500","thp":"WOjQYkyK7DxY_T5pMncMO5w0f6E"},
                    {"url":"http://tangserver02:7500","thp":"I5Ynh2JefoAO3tNH9TgI4obIaXI"},
                    {"url":"http://tangserver03:7500","thp":"38qWZVeDKzCPG9pHLqKzs6k1ons"}
                  ]}}
              volumeMounts:
              - name: hostroot
                mountPath: /host
              securityContext:
                privileged: true
            volumes:
            - name: hostroot
              hostPath:
                path: /
            nodeSelector:
              kubernetes.io/os: linux
            priorityClassName: system-node-critical
            restartPolicy: Always
            serviceAccount: machine-config-daemon
            serviceAccountName: machine-config-daemon

      In this case, even though you are rekeying tangserver01, you must specify not only the new thumbprint for tangserver01, but also the current thumbprints for all other Tang servers. Omitting any thumbprint from a rekeying operation opens up the opportunity for a man-in-the-middle attack.

    2. To distribute the daemon set to every cluster that must be rekeyed, run the following command:

      $ oc apply -f tang-rekey.yaml

      However, to run at scale, wrap the daemon set in an ACM policy. This ACM configuration must contain one policy to deploy the daemon set, a second policy to check that all the daemon set pods are READY, and a placement rule to apply it to the appropriate set of clusters.

    After validating that the daemon set has successfully rekeyed all nodes, delete the daemon set. If you leave the daemon set in place, you must delete it before the next rekeying operation.
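    Because every Tang server and thumbprint must appear in NEW_TANG_PIN, it can help to generate the value rather than edit it by hand. The following sketch builds the pin from URL and thumbprint pairs. build_tang_pin is a hypothetical helper, and the threshold t is fixed at 1 to mirror the template.

```shell
# Sketch: build an SSS pin from "URL THUMBPRINT" argument pairs.
# build_tang_pin is a hypothetical helper; t=1 mirrors the template.
build_tang_pin() {
    local sep="" url thp
    printf '{"t":1,"pins":{"tang":['
    while [ "$#" -ge 2 ]; do
        url="$1"; thp="$2"; shift 2
        # Separate entries with a comma after the first one.
        printf '%s{"url":"%s","thp":"%s"}' "$sep" "$url" "$thp"
        sep=","
    done
    printf ']}}\n'
}
```

    For example, build_tang_pin http://tangserver01:7500 WOjQYkyK7DxY_T5pMncMO5w0f6E http://tangserver02:7500 I5Ynh2JefoAO3tNH9TgI4obIaXI prints a single-line pin with both servers.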

    Verification

    After you distribute the daemon set, monitor the daemon sets to ensure that the rekeying has completed successfully. The script in the example daemon set terminates with an error if the rekeying failed, and remains in the CURRENT state if successful. There is also a readiness probe that marks the pod as READY when the rekeying has completed successfully.

    • This is an example of the output listing for the daemon set before the rekeying has completed:

      $ oc get -n openshift-machine-config-operator ds tang-rekey

      Example output

      NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
      tang-rekey   1         1         0       1            0           kubernetes.io/os=linux   11s

    Rekeying usually takes a few minutes to complete.
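    Instead of rereading the listing by hand, you can compare the READY and DESIRED counts in a script. This is a sketch only: rekey_complete and the OC variable are names introduced here, and the check reads the numberReady and desiredNumberScheduled fields of the daemon set status.

```shell
# Sketch: succeed when every scheduled rekey pod reports READY.
# rekey_complete and OC are hypothetical names; OC may point at kubectl.
OC="${OC:-oc}"

rekey_complete() {
    local status desired ready
    status=$("$OC" get ds tang-rekey -n openshift-machine-config-operator \
        -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 1
    desired=${status% *}
    ready=${status#* }
    # Require at least one scheduled pod and equal counts.
    [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}
```

    A simple polling loop such as until rekey_complete; do sleep 10; done then waits for the rekeying to finish.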

    To determine if the error condition from rekeying the Tang servers is temporary, perform the following procedure. Temporary error conditions might include:

    • Temporary network outages

    • Tang server maintenance

    Generally, when these types of temporary error conditions occur, you can wait until the daemon set succeeds in resolving the error or you can delete the daemon set and not try again until the temporary error condition has been resolved.

    Procedure

    1. Restart the pod that performs the rekeying operation using the normal Kubernetes pod restart policy.

    2. If any of the associated Tang servers are unavailable, try rekeying until all the servers are back online.

    If, after rekeying the Tang servers, the READY count does not equal the DESIRED count after an extended period of time, it might indicate a permanent failure condition. In this case, the following conditions might apply:

    • The Tang server is decommissioned or the keys are permanently lost.

    Prerequisites

    • The commands shown in this procedure can be run on the Tang server or on any Linux system that has network access to the Tang server.

    Procedure

    1. Validate the Tang server configuration by performing a simple encrypt and decrypt operation on each Tang server’s configuration as defined in the daemon set.

      This is an example of an encryption and decryption attempt with a bad thumbprint:

      $ echo "okay" | clevis encrypt tang \
        '{"url":"http://tangserver02:7500","thp":"badthumbprint"}' | \
        clevis decrypt

      Example output

      Unable to fetch advertisement: 'http://tangserver02:7500/adv/badthumbprint'!

      This is an example of an encryption and decryption attempt with a good thumbprint:

      $ echo "okay" | clevis encrypt tang \
        '{"url":"http://tangserver03:7500","thp":"goodthumbprint"}' | \
        clevis decrypt

      Example output

      okay

    2. After you identify the root cause, remedy the underlying situation:

      1. Delete the non-working daemon set.

      2. Edit the daemon set definition to fix the underlying issue. This might include any of the following actions:

        • Edit a Tang server entry to correct the URL and thumbprint.

        • Remove a Tang server that is no longer in service.

        • Add a new Tang server that is a replacement for a decommissioned server.

    3. Distribute the updated daemon set again.

    When you replace, remove, or add a Tang server in a configuration, the rekeying operation succeeds as long as at least one original server is still functional, including the server currently being rekeyed. If none of the original Tang servers are functional or can be recovered, recovery of the system is impossible and you must redeploy the affected nodes.
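    The encrypt and decrypt check from step 1 can be wrapped in a function and run once per Tang server entry. The following is a minimal sketch; check_tang_pin is a hypothetical helper built only from the clevis commands shown in this procedure.

```shell
# Sketch: round-trip a string through one Tang server entry.
# check_tang_pin is a hypothetical helper around clevis.
check_tang_pin() {
    local url="$1" thp="$2" result
    result=$(echo okay |
        clevis encrypt tang "{\"url\":\"$url\",\"thp\":\"$thp\"}" |
        clevis decrypt) || return 1
    # Succeed only if the original string comes back.
    [ "$result" = "okay" ]
}
```

    Call check_tang_pin with each URL and thumbprint pair from NEW_TANG_PIN. A failing pair points to the entry that needs to be corrected in the daemon set definition.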

    Verification

    Check the logs from each pod in the daemon set to determine whether the rekeying completed successfully. If the rekeying is not successful, the logs might indicate the failure condition.

    1. Locate the name of the pod that was created by the daemon set:

      $ oc get pods -A | grep tang-rekey

      Example output

      openshift-machine-config-operator tang-rekey-7ks6h 1/1 Running 20 (8m39s ago) 89m

    2. Print the logs from the pod. The following log is from a completed successful rekeying operation:

      $ oc logs -n openshift-machine-config-operator tang-rekey-7ks6h

      Example output

      Current tang pin:
      1: sss '{"t":1,"pins":{"tang":[{"url":"http://10.46.55.192:7500"},{"url":"http://10.46.55.192:7501"},{"url":"http://10.46.55.192:7502"}]}}'
      Applying new tang pin: {"t":1,"pins":{"tang":[
      {"url":"http://tangserver01:7500","thp":"WOjQYkyK7DxY_T5pMncMO5w0f6E"},
      {"url":"http://tangserver02:7500","thp":"I5Ynh2JefoAO3tNH9TgI4obIaXI"},
      {"url":"http://tangserver03:7500","thp":"38qWZVeDKzCPG9pHLqKzs6k1ons"}
      ]}}
      Updating binding...
      Binding edited successfully
      Pin applied successfully
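    On clusters with many nodes, checking each pod by name is tedious. The following sketch prints the log of every pod in the daemon set. It assumes the pods carry the name=tang-rekey label used by the daemon set selector; show_rekey_logs, OC, and NS are names introduced for this example.

```shell
# Sketch: print the rekey log of every pod in the daemon set.
# show_rekey_logs, OC, and NS are hypothetical names.
OC="${OC:-oc}"
NS=openshift-machine-config-operator

show_rekey_logs() {
    local pod
    # "-o name" prints entries such as pod/tang-rekey-7ks6h.
    for pod in $("$OC" get pods -n "$NS" -l name=tang-rekey -o name); do
        echo "== $pod =="
        "$OC" logs -n "$NS" "$pod"
    done
}
```

    A pod whose log does not end with "Pin applied successfully" did not complete the rekeying.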

    Prerequisites

    • A root shell on the Linux machine running the Tang server.

    Procedure

    1. Locate and access the directory where the Tang server key is stored. This is usually the /var/db/tang directory:

      # cd /var/db/tang/

    2. List the current Tang server keys, showing the advertised and unadvertised keys:

      # ls -A1

      Example output

      .36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
      .gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk
      Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
      WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk

    3. Delete the old keys:

      # rm .*.jwk

    4. List the current Tang server keys to verify the unadvertised keys are no longer present:

      # ls -A1

      Example output

      Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
      WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk

    Verification

    At this point, the server still advertises the new keys, but an attempt to decrypt based on the old key will fail.

    1. Query the Tang server for the current advertised key thumbprints:

      # tang-show-keys 7500

      Example output

      WOjQYkyK7DxY_T5pMncMO5w0f6E

    2. Decrypt the test file created earlier to verify decryption against the old keys fails:

      # clevis decrypt < /tmp/encrypted.oldkey

      If you are running multiple Tang servers behind a load balancer that share the same key material, ensure the changes made are properly synchronized across the entire set of servers before proceeding.
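    The final check can be expressed as a function that succeeds only when the old key is truly gone. This is a sketch under stated assumptions: old_key_rejected is a hypothetical helper, and it treats any failure of clevis decrypt as rejection, so confirm separately that the Tang server is reachable.

```shell
# Sketch: succeed only if decryption with the deleted key now fails.
# old_key_rejected is a hypothetical helper.
old_key_rejected() {
    local file="${1:-/tmp/encrypted.oldkey}"
    # A missing test file would look like a failed decryption; rule it out.
    [ -r "$file" ] || return 2
    if clevis decrypt < "$file" > /dev/null 2>&1; then
        echo "old key still usable" >&2
        return 1
    fi
    return 0
}
```

    When Tang servers share key material behind a load balancer, running old_key_rejected several times helps confirm that every server in the set has dropped the old keys.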