MetalLB logging, troubleshooting, and support

    MetalLB uses FRRouting (FRR) in a container, and with the default logging setting it generates a lot of log output. You can control the verbosity of the logs by setting the logLevel, as illustrated in this example.

    Gain a deeper insight into MetalLB by setting the logLevel to debug as follows:
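The procedure below creates a file named setdebugloglevel.yaml. As a rough sketch, the file could hold a MetalLB custom resource like the one written by this snippet. The apiVersion, the resource name metallb, and the metallb-system namespace are assumptions based on a default MetalLB Operator installation; adjust them to match your cluster.

```shell
# Write an assumed MetalLB custom resource that sets the log level to debug.
# The apiVersion, name, and namespace are assumptions; check them against
# the MetalLB resource that already exists in your cluster.
cat > setdebugloglevel.yaml <<'EOF'
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  logLevel: debug
EOF
```

Because the procedure applies the file with oc replace, the resource must already exist, and any other fields that are set in its spec need to be carried over into this file.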

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.

    • You have installed the OpenShift CLI (oc).

    Procedure

    1. Create a file, such as setdebugloglevel.yaml, with content like the following example:

    2. Apply the configuration:

      $ oc replace -f setdebugloglevel.yaml

    3. Display the names of the speaker pods:

      $ oc get -n metallb-system pods -l component=speaker

      Example output

      NAME            READY   STATUS    RESTARTS   AGE
      speaker-2m9pm   4/4     Running   0          9m19s
      speaker-7m4qw   3/4     Running   0          19s
      speaker-szlmx   4/4     Running   0          9m19s

      Speaker and controller pods are recreated to ensure the updated logging level is applied. The logging level is modified for all the components of MetalLB.

    4. View the speaker logs:

      $ oc logs -n metallb-system speaker-7m4qw -c speaker

      Example output

      {"branch":"main","caller":"main.go:92","commit":"3d052535","goversion":"gc / go1.17.1 / amd64","level":"info","msg":"MetalLB speaker starting (commit 3d052535, branch main)","ts":"2022-05-17T09:55:05Z","version":""}
      {"caller":"announcer.go:110","event":"createARPResponder","interface":"ens4","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
      {"caller":"announcer.go:119","event":"createNDPResponder","interface":"ens4","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
      {"caller":"announcer.go:110","event":"createARPResponder","interface":"tun0","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
      {"caller":"announcer.go:119","event":"createNDPResponder","interface":"tun0","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
      I0517 09:55:06.515686 95 request.go:665] Waited for 1.026500832s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
      {"Starting Manager":"(MISSING)","caller":"k8s.go:389","level":"info","ts":"2022-05-17T09:55:08Z"}
      {"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.0.128.4","node event":"NodeJoin","node name":"ci-ln-qb8t3mb-72292-7s7rh-worker-a-vvznj","ts":"2022-05-17T09:55:08Z"}
      {"caller":"service_controller.go:113","controller":"ServiceReconciler","enqueueing":"openshift-kube-controller-manager-operator/metrics","epslice":"{\"metadata\":{\"name\":\"metrics-xtsxr\",\"generateName\":\"metrics-\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"uid\":\"ac6766d7-8504-492c-9d1e-4ae8897990ad\",\"resourceVersion\":\"9041\",\"generation\":4,\"creationTimestamp\":\"2022-05-17T07:16:53Z\",\"labels\":{\"app\":\"kube-controller-manager-operator\",\"endpointslice.kubernetes.io/managed-by\":\"endpointslice-controller.k8s.io\",\"kubernetes.io/service-name\":\"metrics\"},\"annotations\":{\"endpoints.kubernetes.io/last-change-trigger-time\":\"2022-05-17T07:21:34Z\"},\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"name\":\"metrics\",\"uid\":\"0518eed3-6152-42be-b566-0bd00a60faf8\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"discovery.k8s.io/v1\",\"time\":\"2022-05-17T07:20:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:addressType\":{},\"f:endpoints\":{},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:endpoints.kubernetes.io/last-change-trigger-time\":{}},\"f:generateName\":{},\"f:labels\":{\".\":{},\"f:app\":{},\"f:endpointslice.kubernetes.io/managed-by\":{},\"f:kubernetes.io/service-name\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"0518eed3-6152-42be-b566-0bd00a60faf8\\\"}\":{}}},\"f:ports\":{}}}]},\"addressType\":\"IPv4\",\"endpoints\":[{\"addresses\":[\"10.129.0.7\"],\"conditions\":{\"ready\":true,\"serving\":true,\"terminating\":false},\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"name\":\"kube-controller-manager-operator-6b98b89ddd-8d4nf\",\"uid\":\"dd5139b8-e41c-4946-a31b-1a629314e844\",\"resourceVersion\":\"9038\"},\"nodeName\":\"ci-ln-qb8t3mb-72292-7s7rh-master-0\",\"zone\":\"us-central1-a\"}],\"ports\":[{\"name\":\"https\",\"protocol\":\"TCP\",\"port\":8443}]}","level":"debug","ts":"2022-05-17T09:55:08Z"}

    5. View the FRR logs:

      Example output

      Started watchfrr
      2022/05/17 09:55:05 ZEBRA: client 16 says hello and bids fair to announce only bgp routes vrf=0
      2022/05/17 09:55:05 ZEBRA: client 31 says hello and bids fair to announce only vnc routes vrf=0
      2022/05/17 09:55:05 ZEBRA: client 38 says hello and bids fair to announce only static routes vrf=0
      2022/05/17 09:55:05 ZEBRA: client 43 says hello and bids fair to announce only bfd routes vrf=0
      2022/05/17 09:57:25.089 BGP: Creating Default VRF, AS 64500
      2022/05/17 09:57:25.090 BGP: dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0
      2022/05/17 09:57:25.090 BGP: bgp_get: Registering BGP instance (null) to zebra
      2022/05/17 09:57:25.090 BGP: Registering VRF 0
      2022/05/17 09:57:25.091 BGP: Rx Router Id update VRF 0 Id 10.131.0.1/32
      2022/05/17 09:57:25.091 BGP: RID change : vrf VRF default(0), RTR ID 10.131.0.1
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF br0
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ens4
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr 10.0.128.4/32
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr fe80::c9d:84da:4d86:5618/64
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF lo
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ovs-system
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF tun0
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr 10.131.0.1/23
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr fe80::40f1:d1ff:feb6:5322/64
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2da49fed
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2da49fed addr fe80::24bd:d1ff:fec1:d88/64
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2fa08c8c
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2fa08c8c addr fe80::6870:ff:fe96:efc8/64
      2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth41e356b7
      2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth41e356b7 addr fe80::48ff:37ff:fede:eb4b/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth1295c6e2
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth1295c6e2 addr fe80::b827:a2ff:feed:637/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth9733c6dc
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth9733c6dc addr fe80::3cf4:15ff:fe11:e541/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth336680ea
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth336680ea addr fe80::94b1:8bff:fe7e:488c/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vetha0a907b7
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vetha0a907b7 addr fe80::3855:a6ff:fe73:46c3/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf35a4398
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf35a4398 addr fe80::40ef:2fff:fe57:4c4d/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf831b7f4
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf831b7f4 addr fe80::f0d9:89ff:fe7c:1d32/64
      2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vxlan_sys_4789
      2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vxlan_sys_4789 addr fe80::80c1:82ff:fe4b:f078/64
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Timer (start timer expire).
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] BGP_Start (Idle->Connect), fd -1
      2022/05/17 09:57:26.094 BGP: Allocated bnc 10.0.0.1/32(0)(VRF default) peer 0x7f807f7631a0
      2022/05/17 09:57:26.094 BGP: sendmsg_zebra_rnh: sending cmd ZEBRA_NEXTHOP_REGISTER for 10.0.0.1/32 (vrf VRF default)
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Waiting for NHT
      2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Idle to Connect
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] TCP_connection_open_failed (Connect->Active), fd -1
      2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Connect to Active
      2022/05/17 09:57:26.094 ZEBRA: rnh_register msg from client bgp: hdr->length=8, type=nexthop vrf=0
      2022/05/17 09:57:26.094 ZEBRA: 0: Add RNH 10.0.0.1/32 type Nexthop
      2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: Evaluate RNH, type Nexthop (force)
      2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: NH has become unresolved
      2022/05/17 09:57:26.094 ZEBRA: 0: Client bgp registers for RNH 10.0.0.1/32 type Nexthop
      2022/05/17 09:57:26.094 BGP: VRF default(0): Rcvd NH update 10.0.0.1/32(0) - metric 0/0 #nhops 0/0 flags 0x6
      2022/05/17 09:57:26.094 BGP: NH update for 10.0.0.1/32(0)(VRF default) - flags 0x6 chgflags 0x0 - evaluate paths
      2022/05/17 09:57:26.094 BGP: evaluate_paths: Updating peer (10.0.0.1(VRF default)) status with NHT
      2022/05/17 09:57:30.081 ZEBRA: Event driven route-map update triggered
      2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-out
      2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-in
      2022/05/17 09:57:31.104 ZEBRA: Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring
      2022/05/17 09:57:31.105 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
      2022/05/17 09:57:31.105 ZEBRA: Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring
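
Step 5 above shows FRR output without the command that produced it. A likely form, assuming the FRR container is named frr (the name used with oc exec elsewhere in this document), is sketched here together with a small filter for debug-level speaker entries. The function names are hypothetical helpers, not part of oc.

```shell
# Hypothetical helpers; the speaker pod name is passed as the first argument.

# Print the logs of the FRR container in a speaker pod.
# Assumes the container is named "frr".
frr_logs() {
  oc logs -n metallb-system "$1" -c frr
}

# Print only the debug-level JSON entries from the speaker container.
speaker_debug_logs() {
  oc logs -n metallb-system "$1" -c speaker | grep -F '"level":"debug"'
}

# Usage:
#   frr_logs speaker-7m4qw
#   speaker_debug_logs speaker-7m4qw
```

The filter matches the "level":"debug" field literally, so plain-text lines that are not JSON, such as the client-side throttling message in the example output, are dropped as well.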

    The following table describes the FRR logging levels.

    The BGP implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. As a cluster administrator, if you need to troubleshoot BGP configuration issues, you need to run commands in the FRR container.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.

    • You have installed the OpenShift CLI (oc).

    Procedure

    1. Display the names of the speaker pods:

      $ oc get -n metallb-system pods -l component=speaker

      Example output

      NAME            READY   STATUS    RESTARTS   AGE
      speaker-66bth   4/4     Running   0          56m
      speaker-gvfnf   4/4     Running   0          56m
      ...
    2. Display the running configuration for FRR:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show running-config"

      Example output

      Building configuration...

      Current configuration:
      !
      frr version 7.5.1_git
      frr defaults traditional
      hostname some-hostname
      log file /etc/frr/frr.log informational
      log timestamp precision 3
      service integrated-vtysh-config
      !
      router bgp 64500 (1)
       bgp router-id 10.0.1.2
       no bgp ebgp-requires-policy
       no bgp default ipv4-unicast
       no bgp network import-check
       neighbor 10.0.2.3 remote-as 64500 (2)
       neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full (3)
       neighbor 10.0.2.3 timers 5 15
       neighbor 10.0.2.4 remote-as 64500 (2)
       neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full (3)
       neighbor 10.0.2.4 timers 5 15
       !
       address-family ipv4 unicast
        network 203.0.113.200/30 (4)
        neighbor 10.0.2.3 activate
        neighbor 10.0.2.3 route-map 10.0.2.3-in in
        neighbor 10.0.2.4 activate
        neighbor 10.0.2.4 route-map 10.0.2.4-in in
       exit-address-family
       !
       address-family ipv6 unicast
        network fc00:f853:ccd:e799::/124 (4)
        neighbor 10.0.2.3 activate
        neighbor 10.0.2.3 route-map 10.0.2.3-in in
        neighbor 10.0.2.4 activate
        neighbor 10.0.2.4 route-map 10.0.2.4-in in
       exit-address-family
      !
      route-map 10.0.2.3-in deny 20
      !
      route-map 10.0.2.4-in deny 20
      !
      ip nht resolve-via-default
      !
      ipv6 nht resolve-via-default
      !
      line vty
      !
      bfd
       profile doc-example-bfd-profile-full (3)
        receive-interval 35
        passive-mode
        echo-mode
        echo-interval 35
        minimum-ttl 10
       !
      !
      end

      (1) The router bgp section indicates the ASN for MetalLB.
      (2) Confirm that a neighbor <ip-address> remote-as <peer-ASN> line exists for each BGP peer custom resource that you added.
      (3) If you configured BFD, confirm that the BFD profile is associated with the correct BGP peer and that the BFD profile appears in the command output.
      (4) Confirm that the network <ip-address-range> lines match the IP address ranges that you specified in address pool custom resources that you added.

    3. Display the BGP summary:

      Example output

      IPv4 Unicast Summary:
      BGP table version 1
      RIB entries 1, using 192 bytes of memory
      Peers 2, using 29 KiB of memory

      Neighbor   V   AS      MsgRcvd   MsgSent   TblVer   InQ   OutQ   Up/Down    State/PfxRcd   PfxSnt
      10.0.2.3   4   64500   387       389       0        0     0      00:32:02   0              1        (1)
      10.0.2.4   4   64500   0         0         0        0     0      never      Active         0        (2)

      Total number of neighbors 2

      IPv6 Unicast Summary:
      BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
      BGP table version 1
      RIB entries 1, using 192 bytes of memory
      Peers 2, using 29 KiB of memory

      Neighbor   V   AS      MsgRcvd   MsgSent   TblVer   InQ   OutQ   Up/Down    State/PfxRcd   PfxSnt
      10.0.2.3   4   64500   387       389       0        0     0      00:32:02   NoNeg                   (1)
      10.0.2.4   4   64500   0         0         0        0     0      never      Active         0        (2)

      Total number of neighbors 2

    4. Display the BGP peers that received an address pool:

      $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30"

      Replace ipv4 with ipv6 to display the BGP peers that received an IPv6 address pool. Replace 203.0.113.200/30 with an IPv4 or IPv6 IP address range from an address pool.

      Example output

      BGP routing table entry for 203.0.113.200/30
      Paths: (1 available, best #1, table default)
        Advertised to non peer-group peers:
        10.0.2.3 (1)
        Local
          0.0.0.0 from 0.0.0.0 (10.0.1.2)
            Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received)
            Last update: Mon Jan 10 19:49:07 2022

      (1) Confirm that the output includes an IP address for a BGP peer.
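
Step 3 of this procedure lists the summary output but not the command that produced it. What follows is a minimal sketch, assuming the vtysh command show bgp summary generated that output. It is paired with an awk filter that prints the neighbors whose State/PfxRcd column holds a state name (such as Active) instead of a received-prefix count. Both function names are hypothetical.

```shell
# Hypothetical helper: run the assumed summary command in the frr container.
bgp_summary() {
  oc exec -n metallb-system "$1" -c frr -- vtysh -c "show bgp summary"
}

# Print "<neighbor> <state>" for rows whose 10th column (State/PfxRcd)
# is not a plain number, which means no prefix count is being reported.
bgp_peers_without_prefix_count() {
  awk '$1 ~ /^[0-9a-fA-F:.]+$/ && $2 == 4 && NF >= 10 && $10 !~ /^[0-9]+$/ { print $1, $10 }'
}

# Usage:
#   bgp_summary speaker-66bth | bgp_peers_without_prefix_count
```

Run against the IPv4 rows of the example output, the filter would report 10.0.2.4 with the state Active, the neighbor that shows never in the Up/Down column.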

    The Bidirectional Forwarding Detection (BFD) implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. The BFD implementation relies on BFD peers also being configured as BGP peers with an established BGP session. As a cluster administrator, if you need to troubleshoot BFD configuration issues, you need to run commands in the FRR container.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.

    • You have installed the OpenShift CLI (oc).

    Procedure

    1. Display the names of the speaker pods:

      $ oc get -n metallb-system pods -l component=speaker

      Example output

      NAME            READY   STATUS    RESTARTS   AGE
      speaker-66bth   4/4     Running   0          26m
      speaker-gvfnf   4/4     Running   0          26m
      ...
    2. Display the BFD peers:

      Example output

      Session count: 2

      SessionId    LocalAddress   PeerAddress   Status
      =========    ============   ===========   ======
      3909139637   10.0.1.2       10.0.2.3      up (1)
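
The command for step 2 is not shown with its output. A plausible sketch, assuming the output came from the vtysh command show bfd peers brief, is given here along with a filter that lists sessions whose Status column is anything other than up. The function names are hypothetical.

```shell
# Hypothetical helper: run the assumed BFD command in the frr container.
bfd_peers() {
  oc exec -n metallb-system "$1" -c frr -- vtysh -c "show bfd peers brief"
}

# Print "<peer address> <status>" for sessions whose Status column is not "up".
# Data rows are recognized by a numeric SessionId in the first column.
bfd_peers_not_up() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 4 && $4 != "up" { print $3, $4 }'
}

# Usage:
#   bfd_peers speaker-66bth | bfd_peers_not_up
```

An empty result means every listed session reports up. It does not show that every expected peer is listed, so compare the PeerAddress column against your configured peers as well.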

    OKD captures the following metrics that are related to MetalLB, BGP peers, and BFD profiles:

    • metallb_bfd_control_packet_output counts the number of BFD control packets sent to each BFD peer.

    • metallb_bfd_echo_packet_input counts the number of BFD echo packets received from each BFD peer.

    • metallb_bfd_echo_packet_output counts the number of BFD echo packets sent to each BFD peer.

    • metallb_bfd_session_down_events counts the number of times the BFD session with a peer entered the down state.

    • metallb_bfd_session_up indicates the connection state with a BFD peer. 1 indicates the session is up and 0 indicates the session is down.

    • metallb_bfd_session_up_events counts the number of times the BFD session with a peer entered the up state.

    • metallb_bfd_zebra_notifications counts the number of BFD Zebra notifications for each BFD peer.

    • metallb_bgp_announced_prefixes_total counts the number of load balancer IP address prefixes that are advertised to BGP peers. The terms prefix and aggregated route have the same meaning.

    • metallb_bgp_session_up indicates the connection state with a BGP peer. 1 indicates the session is up and 0 indicates the session is down.

    • metallb_bgp_updates_total counts the number of BGP update messages that were sent to a BGP peer.
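
To show how the two session gauges in this list might be checked, here is a minimal sketch that filters Prometheus text-format samples for sessions reporting 0. It reads the samples from standard input; how you obtain them is outside this sketch, and the peer label in the usage example is an assumption rather than a documented label name.

```shell
# Print metallb_bgp_session_up and metallb_bfd_session_up samples whose value is 0.
# The character class after the metric name keeps *_session_up_events out.
# Assumes each sample line ends with its value (no trailing timestamp).
sessions_down() {
  awk '/^metallb_(bgp|bfd)_session_up[{ ]/ && $NF == 0 { print }'
}

# Usage with hypothetical samples (the "peer" label name is an assumption):
#   printf '%s\n' 'metallb_bgp_session_up{peer="10.0.2.3"} 1' \
#                 'metallb_bfd_session_up{peer="10.0.2.4"} 0' | sessions_down
```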

    Additional resources

    • See for information about using the monitoring dashboard.

    You can use the oc adm must-gather CLI command to collect information about your cluster, your MetalLB configuration, and the MetalLB Operator. The following features and objects are associated with MetalLB and the MetalLB Operator:

    • The namespace and child objects that the MetalLB Operator is deployed in

    • All MetalLB Operator custom resource definitions (CRDs)

    The oc adm must-gather CLI command collects the following information from FRRouting (FRR) that Red Hat uses to implement BGP and BFD:

    • /etc/frr/frr.conf

    • /etc/frr/frr.log

    • /etc/frr/daemons configuration file

    • /etc/frr/vtysh.conf

    The log and configuration files in the preceding list are collected from the frr container in each speaker pod.

    In addition to the log and configuration files, the oc adm must-gather CLI command collects the output from the following vtysh commands:

    • show running-config

    • show bgp ipv4

    • show bgp ipv6

    • show bgp neighbor
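
As a sketch of how the collected FRR files might be located afterwards, the helper below searches a must-gather output directory for the four file names listed earlier in this section. The directory layout inside the output is not described here, so the search goes by file name only. The --dest-dir option of oc adm must-gather selects the output directory; the directory name and the function name are placeholders.

```shell
# Hypothetical helper: list the FRR files named in this section
# anywhere under a must-gather output directory.
find_frr_files() {
  find "$1" -type f \( -name frr.conf -o -name frr.log \
    -o -name daemons -o -name vtysh.conf \) | sort
}

# Usage (the directory name is a placeholder):
#   oc adm must-gather --dest-dir=./must-gather-metallb
#   find_frr_files ./must-gather-metallb
```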

    Additional resources