Troubleshooting the control plane machine set

    You can verify the existence and state of the ControlPlaneMachineSet custom resource (CR).

    Procedure

    • Determine the state of the CR by running the following command:

      • A result of Active indicates that the ControlPlaneMachineSet CR exists and is activated. No administrator action is required.

      • A result of Inactive indicates that a ControlPlaneMachineSet CR exists but is not activated.

      • A result of NotFound indicates that there is no existing ControlPlaneMachineSet CR.
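
As a minimal sketch, assuming a logged-in oc client, the state check and its three outcomes might look like the following. The helper name describe_cpms_state is hypothetical, and the query in the comment is an assumption that reuses the resource name and namespace shown later in this section.

```shell
# Hypothetical helper: map a reported state to the guidance in the list above.
describe_cpms_state() {
  case "$1" in
    Active)   echo "CR exists and is activated; no administrator action is required" ;;
    Inactive) echo "CR exists but is not activated" ;;
    NotFound) echo "no ControlPlaneMachineSet CR exists" ;;
    *)        echo "unrecognized state: $1" ;;
  esac
}

# Assumed query on a live cluster (not run here); the reported state, or a
# NotFound error, supplies the value to pass to the helper:
#   oc get controlplanemachineset.machine.openshift.io cluster \
#     -n openshift-machine-api
```

For example, describe_cpms_state Inactive prints the second message, which signals that the CR still has to be reviewed and activated.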

    Next steps

    To use the control plane machine set, you must ensure that a ControlPlaneMachineSet CR with the correct settings for your cluster exists.

    • If your cluster has an existing CR, you must verify that the configuration in the CR is correct for your cluster.

    • If your cluster does not have an existing CR, you must create one with the correct configuration for your cluster.

    Additional resources

    The internalLoadBalancer parameter is required in both the ControlPlaneMachineSet and control plane Machine custom resources (CRs) for Azure. If this parameter is not preconfigured on your cluster, you must add it to both CRs.

    For more information about where this parameter is located in the Azure provider specification, see the sample Azure provider specification. The placement in the control plane Machine CR is similar.

    Procedure

    1. List the control plane machines in your cluster by running the following command:

      $ oc get machines \
        -l machine.openshift.io/cluster-api-machine-role==master \
        -n openshift-machine-api

    2. For each control plane machine, edit the CR by running the following command:

    3. Add the internalLoadBalancer parameter with the correct details for your cluster and save your changes.

    4. Edit your control plane machine set CR by running the following command:

      $ oc edit controlplanemachineset.machine.openshift.io cluster \
        -n openshift-machine-api

    5. Add the internalLoadBalancer parameter with the correct details for your cluster and save your changes.
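
Before editing, it can help to see which control plane machines actually lack the parameter. The following is a sketch under stated assumptions: the function name machines_missing_ilb is hypothetical, and the field path .spec.providerSpec.value.internalLoadBalancer is assumed to be where the Azure provider specification keeps the parameter in a Machine CR.

```shell
# Hypothetical helper: print each control plane machine whose provider spec
# has no internalLoadBalancer value (field path is an assumption).
machines_missing_ilb() {
  oc get machines \
    -l machine.openshift.io/cluster-api-machine-role==master \
    -n openshift-machine-api -o name |
  while read -r machine; do
    ilb=$(oc get "$machine" -n openshift-machine-api \
      -o jsonpath='{.spec.providerSpec.value.internalLoadBalancer}')
    # An empty result means the parameter is absent on this machine.
    [ -n "$ilb" ] || echo "$machine"
  done
}

# Assumed follow-up for each name printed (opens an editor, so not run here):
#   oc edit <printed_machine_name> -n openshift-machine-api
```

The function only reads from the cluster. Each machine it prints still has to be edited by hand, as does the control plane machine set CR.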

    Next steps

    • For clusters that are configured to use the OnDelete update strategy, you must replace your control plane machines manually.
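
To find out whether this step applies, the update strategy can be read from the control plane machine set CR. This is a minimal sketch, assuming the strategy is stored at .spec.strategy.type; both function names are hypothetical.

```shell
# Hypothetical helper: print the update strategy of the control plane
# machine set (field path is an assumption).
cpms_update_strategy() {
  oc get controlplanemachineset.machine.openshift.io cluster \
    -n openshift-machine-api \
    -o jsonpath='{.spec.strategy.type}'
}

# Succeeds only when the strategy is OnDelete, in which case the control
# plane machines must be replaced manually.
needs_manual_replacement() {
  [ "$(cpms_update_strategy)" = "OnDelete" ]
}
```

A possible use is: needs_manual_replacement && echo "replace control plane machines manually".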

    Additional resources

    While performing remediation, the machine health check might delete a control plane machine that is hosting etcd. If the etcd member is not reachable at that time, the etcd Operator becomes degraded.

    When the etcd Operator is degraded, manual intervention is required to force the Operator to remove the failed member and restore the cluster state.

    Procedure

    1. List the control plane machines in your cluster by running the following command:

      Any of the following conditions might indicate a failed control plane machine:

      • The STATE value is stopped.

      • The PHASE value is Failed.

      • The PHASE value is Deleting for more than ten minutes.

    2. Edit the machine CR for the failed control plane machine by running the following command:

    3. Remove the contents of the lifecycleHooks parameter from the failed control plane machine and save your changes.
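
The phase checks in step 1 can be sketched as a filter over the machine list. This is an illustration under stated assumptions: the function name suspect_control_plane_machines is hypothetical, the phase is assumed to be at .status.phase, and the sketch checks neither the STATE value nor how long a machine has been in the Deleting phase.

```shell
# Hypothetical helper: print control plane machines whose phase is Failed
# or Deleting (candidates only; the ten-minute condition is not checked).
suspect_control_plane_machines() {
  oc get machines \
    -l machine.openshift.io/cluster-api-machine-role==master \
    -n openshift-machine-api \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
    --no-headers |
  awk '$2 == "Failed" || $2 == "Deleting" { print $1, $2 }'
}

# Assumed follow-up for a confirmed failed machine (opens an editor, so not
# run here); remove the contents of lifecycleHooks and save:
#   oc edit machine <control_plane_machine_name> -n openshift-machine-api
```

A machine in the Deleting phase is only a candidate. Confirm that it has been in that phase for more than ten minutes before editing its CR.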
