Troubleshooting node network configuration

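    If the node network configuration encounters an issue, the policy is automatically rolled back and the enactments report failure. Such issues include the following:

    • The configuration fails to be applied on the host.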

    • The host loses connection to the default gateway.

    • The host loses connection to the API server.

    You can apply changes to the node network configuration across your entire cluster by applying a node network configuration policy. If you apply an incorrect configuration, you can use the following example to troubleshoot and correct the failed policy.

    In this example, a Linux bridge policy is applied to an example cluster that has three control plane nodes (master) and three compute (worker) nodes. The policy fails to be applied because it references an incorrect interface. To find the error, investigate the available NMState resources. You can then update the policy with the correct configuration.

    Procedure

    1. Create a policy and apply it to your cluster. The following example applies a policy that creates a simple bridge on the ens01 interface:

        $ oc apply -f ens01-bridge-testfail.yaml

      Example output

        nodenetworkconfigurationpolicy.nmstate.io/ens01-bridge-testfail created

      Check the status of the policy:

        $ oc get nncp

      The output shows that the policy failed:

      Example output

        NAME                    STATUS
        ens01-bridge-testfail   FailedToConfigure

      However, the policy status alone does not indicate if it failed on all nodes or a subset of nodes.
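      For reference, a minimal manifest for ens01-bridge-testfail.yaml that is consistent with the desired state nmstate reports in the traceback in step 3 might look like the following sketch. The apiVersion and the exact field layout are assumptions reconstructed from that output, not the original file:

```yaml
# Sketch only: reconstructed from the "desired" state in the step 3 traceback.
# The apiVersion value is an assumption.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens01-bridge-testfail
spec:
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with the wrong port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens01   # incorrect: the interface on the nodes is named ens1
```

      The only deliberate error in this sketch is the port name, which is what the rest of this procedure tracks down.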

    2. List the node network configuration enactments to see if the policy was successful on any of the nodes. If the policy failed on only a subset of nodes, the problem is likely with a specific node configuration. If the policy failed on all nodes, the problem is likely with the policy itself:

        $ oc get nnce

      The output shows that the policy failed on all nodes:

      Example output

        NAME                                    STATUS
        control-plane-1.ens01-bridge-testfail   FailedToConfigure
        control-plane-2.ens01-bridge-testfail   FailedToConfigure
        control-plane-3.ens01-bridge-testfail   FailedToConfigure
        compute-1.ens01-bridge-testfail         FailedToConfigure
        compute-2.ens01-bridge-testfail         FailedToConfigure
        compute-3.ens01-bridge-testfail         FailedToConfigure
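
      The all-nodes versus subset-of-nodes check can be scripted. The following sketch defines a hypothetical shell function, classify_policy_failure (it is not part of oc), that reads the table printed by oc get nnce on standard input and reports all-failed, some-failed, none-failed, or no-enactments for one policy. It assumes the two-column NAME/STATUS output and the <node>.<policy> enactment naming shown above:

```shell
# Hypothetical helper, not part of oc: classify how a policy's enactments fared.
# Assumes the two-column NAME/STATUS output of `oc get nnce` shown above and
# enactment names of the form <node>.<policy>.
classify_policy_failure() {
  awk -v policy="$1" '
    BEGIN { total = 0; failed = 0; suffix = "." policy }
    NR == 1 { next }                       # skip the NAME/STATUS header row
    {
      # Keep only enactments whose name ends in ".<policy>".
      start = length($1) - length(suffix) + 1
      if (start < 1 || substr($1, start) != suffix) next
      total++
      if ($2 == "FailedToConfigure") failed++
    }
    END {
      if (total == 0)           print "no-enactments"
      else if (failed == total) print "all-failed"
      else if (failed > 0)      print "some-failed"
      else                      print "none-failed"
    }'
}
```

      For example, piping oc get nnce into classify_policy_failure ens01-bridge-testfail prints all-failed for the enactments listed above, which points at the policy rather than at a single node.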

    3. View one of the failed enactments and look at the traceback. The following command uses the jsonpath output format to print only the message of the Failing condition:

        $ oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'

      This command returns a large traceback that has been edited for brevity:

        error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' ''
        ...
        libnmstate.error.NmstateVerificationError:
        desired
        =======
        ---
        name: br1
        type: linux-bridge
        state: up
        bridge:
          options:
            group-forward-mask: 0
            mac-ageing-time: 300
            multicast-snooping: true
            stp:
              enabled: false
              forward-delay: 15
              hello-time: 2
              max-age: 20
              priority: 32768
          port:
          - name: ens01
        description: Linux bridge with the wrong port
        ipv4:
          address: []
          auto-dns: true
          auto-gateway: true
          auto-routes: true
          dhcp: true
          enabled: true
        ipv6:
          enabled: false
        mac-address: 01-23-45-67-89-AB
        mtu: 1500

        current
        =======
        ---
        name: br1
        type: linux-bridge
        state: up
        bridge:
          options:
            group-forward-mask: 0
            mac-ageing-time: 300
            multicast-snooping: true
            stp:
              enabled: false
              forward-delay: 15
              hello-time: 2
              max-age: 20
              priority: 32768
          port: []
        description: Linux bridge with the wrong port
        ipv4:
          address: []
          auto-dns: true
          auto-gateway: true
          auto-routes: true
          dhcp: true
          enabled: true
        ipv6:
          enabled: false
        mac-address: 01-23-45-67-89-AB
        mtu: 1500

        difference
        ==========
        --- desired
        +++ current
        @@ -13,8 +13,7 @@
               hello-time: 2
               max-age: 20
               priority: 32768
        -  port:
        -  - name: ens01
        +  port: []
         description: Linux bridge with the wrong port
         ipv4:
           address: []
          line 651, in _assert_interfaces_equal\n    current_state.interfaces[ifname],\nlibnmstate.error.NmstateVerificationError:

      The NmstateVerificationError lists the desired policy configuration, the current configuration of the policy on the node, and the difference, which highlights the parameters that do not match. In this example, the desired state attaches port ens01 to the bridge, but the current state shows port: [], so the bridge came up without any port. This suggests that the problem is the port configuration in the policy.
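
      Because the traceback is long, it can help to print only the changed lines of its difference section. The following sketch defines a hypothetical shell function, show_changed_lines, that reads the message on standard input and prints the lines that start with a single - or + after the difference heading. It assumes the message is printed with real line breaks, as in the excerpt above:

```shell
# Hypothetical helper: print only the removed (-) and added (+) lines that
# follow the "difference" heading of an NmstateVerificationError message.
show_changed_lines() {
  awk '
    /^difference$/ { in_diff = 1; next }   # start of the diff section
    in_diff && /^[-+][^-+]/ { print }      # skips the "--- desired" and "+++ current" headers
  '
}
```

      Piping the output of the oc get nnce ... -o jsonpath command from this step into show_changed_lines would print the two removed port lines and the added port: [] line, which is the part of the message that identifies the faulty parameter.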

    4. To find the correct configuration for the policy, view the network configuration of one or all of the nodes by requesting the NodeNetworkState object. The following command returns the network configuration for the control-plane-1 node:

        $ oc get nns control-plane-1 -o yaml

      In the output, check the name fields of the entries under status.currentState.interfaces. The output shows that the interface name on the nodes is ens1, but the failed policy incorrectly uses ens01.
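
      To compare the port named in a policy with the interfaces that actually exist on a node, you can print only the interface names, for example with oc get nns control-plane-1 -o jsonpath='{.status.currentState.interfaces[*].name}', which returns a space-separated list. The following sketch defines a hypothetical shell function, interface_present, that checks whether a given name appears in such a list:

```shell
# Hypothetical helper: succeed (exit status 0) if the interface name in $1
# appears in the whitespace-separated list in $2, for example the output of
#   oc get nns control-plane-1 -o jsonpath='{.status.currentState.interfaces[*].name}'
interface_present() {
  wanted="$1"
  for name in $2; do          # unquoted on purpose: split the list on whitespace
    if [ "$name" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}
```

      For the nodes in this example, interface_present ens1 succeeds against that list and interface_present ens01 fails, which matches the mismatch described above.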

    5. Correct the error by editing the existing policy:

        $ oc edit nncp ens01-bridge-testfail

      In the editor, change the bridge port name from ens01 to ens1:

        ...
        port:
        - name: ens1

      Save the policy to apply the correction.

    6. Check the status of the policy to ensure it updated successfully:

        $ oc get nncp

      Example output

        NAME                    STATUS
        ens01-bridge-testfail   SuccessfullyConfigured
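
      If the nodes take a while to apply the corrected policy, you can poll its status instead of rerunning the command by hand. The following sketch defines a hypothetical shell function, wait_for_policy, that repeatedly runs oc get nncp with the --no-headers flag and reads the STATUS column. It assumes the two-column output format shown in this procedure, and the number of attempts and the polling interval are assumed defaults:

```shell
# Hypothetical helper: poll a policy until it reports SuccessfullyConfigured or
# FailedToConfigure. Assumes the NAME/STATUS columns shown in this procedure.
# Exit status: 0 on success, 1 on failure, 2 if no final status was seen.
wait_for_policy() {
  policy="$1"
  attempts="${2:-30}"   # number of polls before giving up (assumed default)
  interval="${3:-10}"   # seconds to sleep between polls (assumed default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$(oc get nncp "$policy" --no-headers | awk '{ print $2 }')"
    case "$status" in
      SuccessfullyConfigured) echo "$status"; return 0 ;;
      FailedToConfigure)      echo "$status"; return 1 ;;
    esac
    i=$((i + 1))
    # Sleep only if another poll will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "timed out waiting for $policy" >&2
  return 2
}
```

      For example, wait_for_policy ens01-bridge-testfail prints SuccessfullyConfigured and returns 0 once the corrected policy has been applied on every node.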