Monitoring bare-metal events with the Bare Metal Event Relay

    Use the Bare Metal Event Relay to subscribe applications that run in your OKD cluster to events that are generated on the underlying bare-metal host. The Redfish service publishes events on a node and transmits them on an advanced message queue to subscribed applications.

    Bare-metal events are based on the open Redfish standard that is developed under the guidance of the Distributed Management Task Force (DMTF). Redfish provides a secure industry-standard protocol with a REST API. The protocol is used for the management of distributed, converged or software-defined resources and infrastructure.

    Hardware-related events published through Redfish include:

    • Breaches of temperature limits

    • Server status

    • Fan status

    Begin using bare-metal events by deploying the Bare Metal Event Relay Operator and subscribing your application to the service. The Bare Metal Event Relay Operator installs and manages the lifecycle of the Redfish bare-metal event service.

    The Bare Metal Event Relay works only with Redfish-capable devices on single-node clusters provisioned on bare-metal infrastructure.

    How bare-metal events work

    The Bare Metal Event Relay enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure. These hardware events are delivered over a reliable low-latency transport channel based on Advanced Message Queuing Protocol (AMQP). The latency of the messaging service is between 10 and 20 milliseconds.

    The Bare Metal Event Relay provides a publish-subscribe service for the hardware events. Applications can use a REST API to subscribe to the events. The Bare Metal Event Relay supports hardware that complies with Redfish OpenAPI v1.8 or later.

    The following figure illustrates an example bare-metal events data flow:

    Figure 1. Bare Metal Event Relay data flow

    Operator-managed pod

    The Operator uses a custom resource (CR) to manage the pod that contains the Bare Metal Event Relay and its components.

    Bare Metal Event Relay

    At startup, the Bare Metal Event Relay queries the Redfish API and downloads all the message registries, including custom registries. The Bare Metal Event Relay then begins to receive subscribed events from the Redfish hardware.

    The Bare Metal Event Relay enables applications running on bare-metal clusters to respond quickly to Redfish hardware changes and failures such as breaches of temperature thresholds, fan failure, disk loss, power outages, and memory failure. The events are reported using the HardwareEvent CR.

    Cloud native event

    Cloud native events (CNE) is a REST API specification for defining the format of event data.

    CNCF CloudEvents

    CloudEvents is a vendor-neutral specification developed by the Cloud Native Computing Foundation (CNCF) for defining the format of event data.

    HTTP transport or AMQP dispatch router

    The HTTP transport or AMQP dispatch router is responsible for the message delivery service between publisher and subscriber.

    Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect reached end of life (EOL) on 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends on 29 November 2029.

    Cloud event proxy sidecar

    The cloud event proxy sidecar container image is based on the O-RAN API specification and provides a publish-subscribe event framework for hardware events.

    Redfish message parsing service

    In addition to handling Redfish events, the Bare Metal Event Relay provides message parsing for events without a Message property. When it starts, the proxy downloads all the Redfish message registries, including vendor-specific registries, from the hardware. If an event does not contain a Message property, the proxy uses the Redfish message registries to construct the Message and Resolution properties and adds them to the event before passing the event to the cloud events framework. This service allows Redfish events to have a smaller message size and lower transmission latency.
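
    The parsing step described above can be sketched in Python. This is only an illustration under stated assumptions, not the proxy's actual implementation: it assumes the Redfish convention of numbered %1, %2 placeholders that are filled from the event's MessageArgs, and the function name fill_message, the in-memory REGISTRY dictionary, and the sample MessageId are hypothetical.

```python
import re

# Hypothetical in-memory registry keyed by MessageId. A real registry is
# downloaded from the BMC and contains many more entries and fields.
REGISTRY = {
    "EventLog.1.0.ThresholdExceeded": {
        "Message": "Sensor %1 reading of %2 exceeded the upper critical threshold.",
        "Resolution": "Check the cooling system and airflow for the affected sensor.",
    }
}


def fill_message(event: dict, registry: dict) -> dict:
    """Return a copy of the event, adding Message and Resolution if Message is missing."""
    if event.get("Message"):
        return dict(event)  # already complete, pass through unchanged
    entry = registry.get(event.get("MessageId", ""))
    if entry is None:
        return dict(event)  # unknown MessageId, pass the original event on
    args = event.get("MessageArgs", [])

    def repl(match: re.Match) -> str:
        # Replace %N with the N-th argument (1-based); keep unknown indexes as-is.
        index = int(match.group(1)) - 1
        return str(args[index]) if 0 <= index < len(args) else match.group(0)

    enriched = dict(event)
    enriched["Message"] = re.sub(r"%(\d+)", repl, entry["Message"])
    enriched["Resolution"] = entry["Resolution"]
    return enriched


event = {
    "MessageId": "EventLog.1.0.ThresholdExceeded",
    "MessageArgs": ["CPU1 Temp", "95"],
}
print(fill_message(event, REGISTRY)["Message"])
```

    Because the event on the wire carries only the MessageId and its arguments, the full text is reconstructed on the receiving side, which is where the smaller message size comes from.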

    Installing the Bare Metal Event Relay using the CLI

    As a cluster administrator, you can install the Bare Metal Event Relay Operator by using the CLI.

    Prerequisites

    • A cluster that is installed on bare-metal hardware with nodes that have a Redfish-enabled Baseboard Management Controller (BMC).

    • Install the OpenShift CLI (oc).

    • Log in as a user with cluster-admin privileges.

    Procedure

    1. Create a namespace for the Bare Metal Event Relay.

      1. Save the following YAML in the bare-metal-events-namespace.yaml file:

      2. Create the Namespace CR:

        $ oc create -f bare-metal-events-namespace.yaml

    2. Create an Operator group for the Bare Metal Event Relay Operator.

      1. Save the following YAML in the bare-metal-events-operatorgroup.yaml file:

        apiVersion: operators.coreos.com/v1
        kind: OperatorGroup
        metadata:
          name: bare-metal-event-relay-group
          namespace: openshift-bare-metal-events
        spec:
          targetNamespaces:
          - openshift-bare-metal-events

      2. Create the OperatorGroup CR:

        $ oc create -f bare-metal-events-operatorgroup.yaml

    3. Subscribe to the Bare Metal Event Relay.

      1. Save the following YAML in the bare-metal-events-sub.yaml file:

        apiVersion: operators.coreos.com/v1alpha1
        kind: Subscription
        metadata:
          name: bare-metal-event-relay-subscription
          namespace: openshift-bare-metal-events
        spec:
          channel: "stable"
          name: bare-metal-event-relay
          source: redhat-operators
          sourceNamespace: openshift-marketplace

      2. Create the Subscription CR:

        $ oc create -f bare-metal-events-sub.yaml

    Verification

    To verify that the Bare Metal Event Relay Operator is installed, run the following command:

    $ oc get csv -n openshift-bare-metal-events -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Installing the Bare Metal Event Relay using the web console

    As a cluster administrator, you can install the Bare Metal Event Relay Operator by using the web console.

    Prerequisites

    • A cluster that is installed on bare-metal hardware with nodes that have a Redfish-enabled Baseboard Management Controller (BMC).

    • Log in as a user with cluster-admin privileges.

    Procedure

    1. Install the Bare Metal Event Relay using the OKD web console:

      1. In the OKD web console, click Operators → OperatorHub.

      2. Choose Bare Metal Event Relay from the list of available Operators, and then click Install.

      3. On the Install Operator page, select or create a Namespace, select openshift-bare-metal-events, and then click Install.

    Verification

    Optional: You can verify that the Operator installed successfully by performing the following check:

    1. Switch to the Operators → Installed Operators page.

    2. Ensure that Bare Metal Event Relay is listed in the project with a Status of InstallSucceeded.

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

    If the Operator does not appear as installed, to troubleshoot further:

    • Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.

    • Go to the Workloads → Pods page and check the logs for pods in the project namespace.

    Installing the AMQ messaging bus

    To pass Redfish bare-metal event notifications between publisher and subscriber on a node, you can install and configure an AMQ messaging bus to run locally on the node. You do this by installing the AMQ Interconnect Operator for use in the cluster.

    Prerequisites

    • Install the OKD CLI (oc).

    • Log in as a user with cluster-admin privileges.

    Procedure

    • Install the AMQ Interconnect Operator to its own amq-interconnect namespace.

    Verification

    1. Verify that the AMQ Interconnect Operator is available and the required pods are running:

      $ oc get pods -n amq-interconnect

      Example output

      NAME                                    READY   STATUS    RESTARTS   AGE
      interconnect-operator-5cb5fc7cc-4v7qm   1/1     Running   0          23h

    2. Verify that the required bare-metal-event-relay bare-metal event producer pod is running in the openshift-bare-metal-events namespace:

      $ oc get pods -n openshift-bare-metal-events

      Example output

      NAME                                                         READY   STATUS    RESTARTS   AGE
      hw-event-proxy-operator-controller-manager-74d5649b7c-dzgtl   2/2     Running   0          25s

    Subscribing to Redfish BMC bare-metal events for a cluster node

    You can subscribe to Redfish BMC events generated on a node in your cluster by creating a BMCEventSubscription custom resource (CR) for the node, creating a HardwareEvent CR for the event, and creating a Secret CR for the BMC.

    Subscribing to bare-metal events

    You can configure the baseboard management controller (BMC) to send bare-metal events to subscribed applications running in an OKD cluster. Example Redfish bare-metal events include an increase in device temperature, or removal of a device. You subscribe applications to bare-metal events using a REST API.

    You can only create a BMCEventSubscription custom resource (CR) for physical hardware that supports Redfish and has a vendor interface set to redfish or idrac-redfish.

    Perform the following procedure to subscribe to bare-metal events for the node using a BMCEventSubscription CR.

    Prerequisites

    • Install the OpenShift CLI (oc).

    • Log in as a user with cluster-admin privileges.

    • Get the user name and password for the BMC.

    • Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish events on the BMC.

      Enabling Redfish events on specific hardware is outside the scope of this information. For more information about enabling Redfish events for your specific hardware, consult the BMC manufacturer documentation.

    Procedure

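    1. Confirm that the node hardware has the Redfish EventService enabled by running a curl command such as the following, specifying the BMC username and password:

      $ curl -k -H "Content-Type: application/json" -u <bmc_username>:<password> https://<bmc_ip_address>/redfish/v1/EventService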

      where:

      bmc_ip_address

      is the IP address of the BMC where the Redfish events are generated.

      Example output

      {
        "@odata.context": "/redfish/v1/$metadata#EventService.EventService",
        "@odata.id": "/redfish/v1/EventService",
        "@odata.type": "#EventService.v1_0_2.EventService",
        "Actions": {
          "#EventService.SubmitTestEvent": {
            "EventType@Redfish.AllowableValues": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"],
            "target": "/redfish/v1/EventService/Actions/EventService.SubmitTestEvent"
          }
        },
        "DeliveryRetryAttempts": 3,
        "DeliveryRetryIntervalSeconds": 30,
        "Description": "Event Service represents the properties for the service",
        "EventTypesForSubscription": ["StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert"],
        "EventTypesForSubscription@odata.count": 5,
        "Id": "EventService",
        "Name": "Event Service",
        "ServiceEnabled": true,
        "Status": {
          "Health": "OK",
          "HealthRollup": "OK",
          "State": "Enabled"
        },
        "Subscriptions": {
          "@odata.id": "/redfish/v1/EventService/Subscriptions"
        }
      }

    2. Get the Bare Metal Event Relay service route for the cluster by running the following command:

      $ oc get route -n openshift-bare-metal-events

      Example output

      NAME             HOST/PORT                                                                PATH   SERVICES                 PORT   TERMINATION   WILDCARD
      hw-event-proxy   hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com          hw-event-proxy-service   9087   edge          None

    3. Create a BMCEventSubscription resource to subscribe to the Redfish events:

      1. Save the following YAML in the bmc_sub.yaml file:

        apiVersion: metal3.io/v1alpha1
        kind: BMCEventSubscription
        metadata:
          name: <subscription_name> # placeholder added here; metadata.name is required to create the CR
          namespace: openshift-machine-api
        spec:
          hostName: <hostname> # (1)
          destination: <proxy_service_url> # (2)
          context: ''

        (1) Specifies the name or UUID of the worker node where the Redfish events are generated.
        (2) Specifies the bare-metal event proxy service, for example, https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook.

      2. Create the BMCEventSubscription CR:

        $ oc create -f bmc_sub.yaml

    4. Optional: To delete the BMC event subscription, run the following command:

      $ oc delete -f bmc_sub.yaml

    5. Optional: To manually create a Redfish event subscription without creating a BMCEventSubscription CR, run the following curl command, specifying the BMC username and password.

      $ curl -i -k -v -X POST -H "Content-Type: application/json" -d '{"Destination": "<proxy_service_url>", "Protocol": "Redfish", "EventTypes": ["Alert"], "Context": "root"}' -u <bmc_username>:<password> 'https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions'

      where:

      proxy_service_url

      is the bare-metal event proxy service, for example, https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook.

      bmc_ip_address

      is the IP address of the BMC where the Redfish events are generated.

      Example output

      HTTP/1.1 201 Created
      Server: AMI MegaRAC Redfish Service
      Location: /redfish/v1/EventService/Subscriptions/1
      Allow: GET, POST
      Access-Control-Allow-Origin: *
      Access-Control-Expose-Headers: X-Auth-Token
      Access-Control-Allow-Headers: X-Auth-Token
      Access-Control-Allow-Credentials: true
      Cache-Control: no-cache, must-revalidate
      Link: <http://redfish.dmtf.org/schemas/v1/EventDestination.v1_6_0.json>; rel=describedby
      Link: <http://redfish.dmtf.org/schemas/v1/EventDestination.v1_6_0.json>
      Link: </redfish/v1/EventService/Subscriptions>; path=
      ETag: "1651135676"
      Content-Type: application/json; charset=UTF-8
      OData-Version: 4.0
      Content-Length: 614
      Date: Thu, 28 Apr 2022 08:47:57 GMT
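
    The check in step 1 and the request body in step 5 can also be expressed programmatically. The following Python sketch is illustrative only: the helper names supports_alerts and subscription_payload are hypothetical, the sample dictionary is a trimmed copy of the EventService output shown earlier, and nothing is sent to a BMC.

```python
import json


def supports_alerts(event_service: dict) -> bool:
    """Return True if the EventService is enabled and offers the Alert event type."""
    enabled = bool(event_service.get("ServiceEnabled"))
    return enabled and "Alert" in event_service.get("EventTypesForSubscription", [])


def subscription_payload(proxy_service_url: str, context: str = "") -> str:
    """Build the JSON body for a POST to /redfish/v1/EventService/Subscriptions."""
    body = {
        "Destination": proxy_service_url,
        "Protocol": "Redfish",
        "EventTypes": ["Alert"],
        "Context": context,
    }
    return json.dumps(body)


# Trimmed copy of the EventService response from step 1.
sample = {
    "ServiceEnabled": True,
    "EventTypesForSubscription": [
        "StatusChange", "ResourceUpdated", "ResourceAdded", "ResourceRemoved", "Alert",
    ],
}

if supports_alerts(sample):
    print(subscription_payload(
        "https://hw-event-proxy-openshift-bare-metal-events.apps.compute-1.example.com/webhook",
        context="root",
    ))
```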

    Querying Redfish bare-metal event subscriptions with curl

    Some hardware vendors limit the number of Redfish hardware event subscriptions. You can query the number of Redfish event subscriptions by using curl.

    Prerequisites

    • Get the user name and password for the BMC.

    • Deploy a bare-metal node with a Redfish-enabled Baseboard Management Controller (BMC) in your cluster, and enable Redfish hardware events on the BMC.

    Procedure

    1. Check the current subscriptions for the BMC by running the following curl command:

      $ curl --globoff -H "Content-Type: application/json" -k -X GET --user <bmc_username>:<password> https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions

      where:

      bmc_ip_address

      is the IP address of the BMC where the Redfish events are generated.

      Example output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      100   435  100   435    0     0    399      0  0:00:01  0:00:01 --:--:--   399
      {
        "@odata.context": "/redfish/v1/$metadata#EventDestinationCollection.EventDestinationCollection",
        "@odata.etag": "\"1651137375\"",
        "@odata.type": "#EventDestinationCollection.EventDestinationCollection",
        "Description": "Collection for Event Subscriptions",
        "Members": [
          {
            "@odata.id": "/redfish/v1/EventService/Subscriptions/1"
          }
        ],
        "Members@odata.count": 1,
        "Name": "Event Subscriptions Collection"
      }

      In this example, a single subscription is configured: /redfish/v1/EventService/Subscriptions/1.

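    2. Optional: To remove the /redfish/v1/EventService/Subscriptions/1 subscription with curl, run a command such as the following, specifying the BMC username and password:

      $ curl -k -X DELETE -u <bmc_username>:<password> https://<bmc_ip_address>/redfish/v1/EventService/Subscriptions/1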

      where:

      bmc_ip_address

      is the IP address of the BMC where the Redfish events are generated.
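
    Because some vendors cap the number of subscriptions, it can help to inspect the collection before creating another one. The following Python sketch parses a response shaped like the example output above. The helper names and the vendor_limit parameter are hypothetical; the actual limit, if any, comes from the BMC manufacturer documentation.

```python
import json


def list_subscriptions(collection: dict) -> list:
    """Return the @odata.id of every member in an EventDestinationCollection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def has_free_slot(collection: dict, vendor_limit: int) -> bool:
    """Return True if another subscription fits under the assumed vendor limit."""
    count = collection.get("Members@odata.count", len(collection.get("Members", [])))
    return count < vendor_limit


# Trimmed copy of the example collection response.
sample = json.loads("""
{
  "Members": [
    {"@odata.id": "/redfish/v1/EventService/Subscriptions/1"}
  ],
  "Members@odata.count": 1,
  "Name": "Event Subscriptions Collection"
}
""")

print(list_subscriptions(sample))
print(has_free_slot(sample, vendor_limit=4))  # 4 is an arbitrary illustrative limit
```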

    Creating the bare-metal event and Secret CRs

    To start using bare-metal events, create the HardwareEvent custom resource (CR) for the host where the Redfish hardware is present. Hardware events and faults are reported in the hw-event-proxy logs.

    Prerequisites

    • You have installed the OKD CLI (oc).

    • You have logged in as a user with cluster-admin privileges.

    • You have created a BMCEventSubscription CR for the BMC Redfish hardware.

    • You have configured dynamic volume provisioning in the cluster or you have manually created StorageClass, LocalVolume, and PersistentVolume CRs to persist the events subscription.

      When you enable dynamic volume provisioning in the cluster, a PersistentVolume resource is automatically created for the PersistentVolumeClaim that the Bare Metal Event Relay deploys.

      For more information about manually creating persistent storage in the cluster, see “Persistent storage using local volumes”.

    Procedure

    1. Create the HardwareEvent custom resource (CR):

      Multiple HardwareEvent resources are not permitted.

      1. Save the following YAML in the hw-event.yaml file:

        apiVersion: "event.redhat-cne.org/v1alpha1"
        kind: "HardwareEvent"
        metadata:
          name: "hardware-event"
        spec:
          nodeSelector:
            node-role.kubernetes.io/hw-event: "" # (1)
          storageType: "example-storage-class" # (2)
          logLevel: "debug" # (3)
          msgParserTimeout: "10" # (4)

        (1) Required. Use the nodeSelector field to target nodes with the specified label, for example, node-role.kubernetes.io/hw-event: "".
        (2) The value of storageType is used to populate the StorageClassName field for the PersistentVolumeClaim (PVC) resource that the Bare Metal Event Relay automatically deploys. The PVC resource is used to persist consumer event subscriptions.
        (3) Optional. The default value is debug. Sets the log level in hw-event-proxy logs. The following log levels are available: fatal, error, warning, info, debug, trace.
        (4) Optional. Sets the timeout value in milliseconds for the Message Parser. If a message parsing request is not responded to within the timeout duration, the original hardware event message is passed to the cloud native event framework. The default value is 10.

      2. Apply the HardwareEvent CR in the cluster:

        $ oc create -f hw-event.yaml

    2. Create a BMC username and password Secret CR that enables the hardware events proxy to access the Redfish message registry for the bare-metal host.

      1. Save the following YAML in the hw-event-bmc-secret.yaml file:

        apiVersion: v1
        kind: Secret
        metadata:
          name: redfish-basic-auth
        type: Opaque
        stringData: # (1)
          username: <bmc_username>
          password: <bmc_password>
          # BMC host DNS or IP address
          hostaddr: <bmc_host_ip_address>

        (1) Enter plain text values for the various items under stringData.

      2. Create the Secret CR:

        $ oc create -f hw-event-bmc-secret.yaml


    Subscribing applications to bare-metal events REST API reference

    Use the bare-metal events REST API to subscribe an application to the bare-metal events that are generated on the parent node.

    Subscribe applications to Redfish events by using the resource address /cluster/node/<node_name>/redfish/event, where <node_name> is the cluster node running the application.

    Deploy your cloud-event-consumer application container and cloud-event-proxy sidecar container in a separate application pod. The cloud-event-consumer application subscribes to the cloud-event-proxy container in the application pod.

    Use the following API endpoints to subscribe the cloud-event-consumer application to Redfish events posted by the cloud-event-proxy container at http://localhost:8089/api/ocloudNotifications/v1/ in the application pod:

    • /api/ocloudNotifications/v1/subscriptions

      • POST: Creates a new subscription

      • GET: Retrieves a list of subscriptions

    • /api/ocloudNotifications/v1/subscriptions/<subscription_id>

      • GET: Returns details for the specified subscription ID

    • /api/ocloudNotifications/v1/subscriptions/status/<subscription_id>

      • PUT: Creates a new status ping request for the specified subscription ID
    • /api/ocloudNotifications/v1/health

      • GET: Returns the health status of ocloudNotifications API

    9089 is the default port for the cloud-event-consumer container deployed in the application pod. You can configure a different port for your application as required.

    api/ocloudNotifications/v1/subscriptions

    HTTP method

    GET api/ocloudNotifications/v1/subscriptions

    Description

    Returns a list of subscriptions. If subscriptions exist, a 200 OK status code is returned along with the list of subscriptions.

    Example API response

    [
      {
        "id": "ca11ab76-86f9-428c-8d3a-666c24e34d32",
        "endpointUri": "http://localhost:9089/api/ocloudNotifications/v1/dummy",
        "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
        "resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
      }
    ]

    HTTP method

    POST api/ocloudNotifications/v1/subscriptions

    Description

    Creates a new subscription. If a subscription is successfully created, or if it already exists, a 201 Created status code is returned.

    Table 1. Query parameters

    Parameter       Type
    subscription    data

    Example payload

    {
      "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions",
      "resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
    }

    api/ocloudNotifications/v1/subscriptions/<subscription_id>

    HTTP method

    GET api/ocloudNotifications/v1/subscriptions/<subscription_id>

    Description

    Returns details for the subscription with ID <subscription_id>

    Table 2. Query parameters

    Parameter            Type
    <subscription_id>    string

    Example API response

    {
      "id": "ca11ab76-86f9-428c-8d3a-666c24e34d32",
      "endpointUri": "http://localhost:9089/api/ocloudNotifications/v1/dummy",
      "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/ca11ab76-86f9-428c-8d3a-666c24e34d32",
      "resource": "/cluster/node/openshift-worker-0.openshift.example.com/redfish/event"
    }

    api/ocloudNotifications/v1/subscriptions/status/<subscription_id>

    HTTP method

    PUT api/ocloudNotifications/v1/subscriptions/status/<subscription_id>

    Description

    Creates a new status ping request for subscription with ID <subscription_id>. If a subscription is present, the status request is successful and a 202 Accepted status code is returned.

    Table 3. Query parameters

    Parameter            Type
    <subscription_id>    string

    Example API response

    {"status":"ping sent"}

    api/ocloudNotifications/v1/health/

    HTTP method

    GET api/ocloudNotifications/v1/health/

    Description

    Returns the health status for the ocloudNotifications REST API.

    Example API response

    OK
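
    As a rough illustration of how a consumer might call these endpoints, the following Python sketch builds the subscription and health requests with the standard library. It is a sketch under assumptions, not a reference client: the helper names are hypothetical, the payload mirrors the example payload shown earlier, and the requests are only constructed, not sent, so the code runs without a cloud-event-proxy sidecar.

```python
import json
import urllib.request

# Base address of the cloud-event-proxy sidecar inside the application pod.
BASE_URL = "http://localhost:8089/api/ocloudNotifications/v1"


def resource_address(node_name: str) -> str:
    """Build the Redfish event resource address for a cluster node."""
    return f"/cluster/node/{node_name}/redfish/event"


def build_subscribe_request(node_name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that creates a subscription for the node's Redfish events."""
    payload = {
        "uriLocation": f"{base_url}/subscriptions",
        "resource": resource_address(node_name),
    }
    return urllib.request.Request(
        f"{base_url}/subscriptions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the health endpoint."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


request = build_subscribe_request("openshift-worker-0.openshift.example.com")
print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```

    Keeping request construction separate from sending makes it straightforward to log or inspect the payload before it reaches the sidecar.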

    Migrating consumer applications to use HTTP transport for PTP or bare-metal events

    If you have previously deployed PTP or bare-metal events consumer applications, you need to update the applications to use HTTP message transport.

    Prerequisites

    • You have installed the OpenShift CLI (oc).

    • You have logged in as a user with cluster-admin privileges.

    • You have updated the PTP Operator or Bare Metal Event Relay to version 4.13 or later, which uses HTTP transport by default.

    • You have configured dynamic volume provisioning in the cluster or manually created StorageClass, LocalVolume, and PersistentVolume resources to persist the events subscription.

    Procedure

    1. Update your events consumer application to use HTTP transport. Set the http-event-publishers variable for the cloud event sidecar deployment.

      For example, in a cluster with PTP events configured, the following YAML snippet illustrates a cloud event sidecar deployment:

      containers:
        - name: cloud-event-sidecar
          image: cloud-event-sidecar
          args:
            - "--metrics-addr=127.0.0.1:9091"
            - "--store-path=/store"
            - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
            - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043" # (1)

      (1) The PTP Operator automatically resolves NODE_NAME to the host that is generating the PTP events. For example, compute-1.example.com.

      In a cluster with bare-metal events configured, set the http-event-publishers field to hw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043 in the cloud event sidecar deployment CR.
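
    If the sidecar arguments are generated by tooling, the difference between the two event sources comes down to the publisher address. The following Python sketch illustrates this using only the values shown above. The function name sidecar_args and the PUBLISHERS mapping are hypothetical and are not part of any Operator API.

```python
# Publisher service addresses taken from the examples above.
PUBLISHERS = {
    "ptp": "ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043",
    "bare-metal": "hw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043",
}


def sidecar_args(kind: str, transport_host: str) -> list:
    """Return the cloud event sidecar arguments for the given event source."""
    if kind not in PUBLISHERS:
        raise ValueError(f"unknown event source: {kind!r}")
    return [
        "--metrics-addr=127.0.0.1:9091",
        "--store-path=/store",
        f"--transport-host={transport_host}",
        f"--http-event-publishers={PUBLISHERS[kind]}",
    ]


for arg in sidecar_args(
    "bare-metal",
    "consumer-events-subscription-service.cloud-events.svc.cluster.local:9043",
):
    print(arg)
```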