Monitoring cluster events and logs
There are two main sources of cluster-level information that are useful for monitoring and auditing a cluster: events and logging.
Cluster administrators are encouraged to familiarize themselves with the Event resource type and review the list of system events to determine which events are of interest. Events are associated with a namespace, either the namespace of the resource they are related to or, for cluster events, the default namespace. The default namespace holds relevant events for monitoring or auditing a cluster, such as node events and resource events related to infrastructure components.
The master API and the oc command do not provide parameters to scope a listing of events to only those related to nodes. A simple approach would be to use grep:
Example output
1h 20h 3 origin-node-1.example.local Node Normal NodeHasDiskPressure ...
A more flexible approach is to output the events in a form that other tools can process. For example, the jq tool can be run against JSON output to extract only NodeHasDiskPressure events:
{
  "apiVersion": "v1",
  "involvedObject": {
    "kind": "Node",
    "name": "origin-node-1.example.local",
    "uid": "origin-node-1.example.local"
  },
  "kind": "Event",
  ...
}
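Where jq is not available, the same kind of filtering can be sketched in Python. This is only an illustrative sketch, not the documented method: it assumes the event list has been saved as JSON (for instance from `oc get events -n default -o json`), and the function name `filter_events` and the inline sample data are hypothetical.

```python
import json

# Hypothetical sample standing in for the JSON output of
# `oc get events -n default -o json`.
SAMPLE = """
{
  "kind": "List",
  "items": [
    {"kind": "Event", "reason": "NodeHasDiskPressure",
     "involvedObject": {"kind": "Node", "name": "origin-node-1.example.local"}},
    {"kind": "Event", "reason": "Pulling",
     "involvedObject": {"kind": "Pod", "name": "example-pod"}}
  ]
}
"""


def filter_events(event_list, kind, reason):
    """Return events whose involved object has the given kind and whose reason matches."""
    return [
        event
        for event in event_list.get("items", [])
        if event.get("involvedObject", {}).get("kind") == kind
        and event.get("reason") == reason
    ]


events = json.loads(SAMPLE)
for event in filter_events(events, "Node", "NodeHasDiskPressure"):
    # Print the node name and the reason for each matching event.
    print(event["involvedObject"]["name"], event["reason"])
```

Matching on the structured `involvedObject.kind` and `reason` fields avoids the false positives that a plain text search for "Node" can produce.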
Events related to resource creation, modification, or deletion can also be good candidates for detecting misuse of the cluster. The following query, for example, can be used to look for excessive pulling of images:
Example output
4
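A count like this can also be broken down by namespace to see where the pulls come from. The following Python sketch is an assumption-laden illustration rather than the query used above: it assumes image pulls appear as Pod events with the reason "Pulling", that the input is JSON such as `oc get events --all-namespaces -o json` would produce, and the function name `count_pulls` and the sample data are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample standing in for the JSON output of
# `oc get events --all-namespaces -o json`.
SAMPLE = """
{
  "kind": "List",
  "items": [
    {"kind": "Event", "reason": "Pulling",
     "involvedObject": {"kind": "Pod", "namespace": "project-a", "name": "web-1"}},
    {"kind": "Event", "reason": "Pulling",
     "involvedObject": {"kind": "Pod", "namespace": "project-a", "name": "web-2"}},
    {"kind": "Event", "reason": "Pulling",
     "involvedObject": {"kind": "Pod", "namespace": "project-b", "name": "job-1"}},
    {"kind": "Event", "reason": "Created",
     "involvedObject": {"kind": "Pod", "namespace": "project-b", "name": "job-1"}}
  ]
}
"""


def count_pulls(event_list):
    """Count Pod events with reason "Pulling" (assumed to mark image pulls), per namespace."""
    counts = Counter()
    for event in event_list.get("items", []):
        involved = event.get("involvedObject", {})
        if involved.get("kind") == "Pod" and event.get("reason") == "Pulling":
            counts[involved.get("namespace", "unknown")] += 1
    return counts


pulls = count_pulls(json.loads(SAMPLE))
print("total pulls:", sum(pulls.values()))
for namespace, count in pulls.most_common():
    # Namespaces with the most pull events are listed first.
    print(namespace, count)
```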
Using the oc logs command, you can view container logs, build configs and deployments in real time. Different users can have different access to logs:
Users who have access to a project are able to see the logs for that project by default.
To save your logs for further audit and analysis, you can enable the OpenShift Logging add-on feature to collect, manage, and view system, container, and audit logs. You can deploy, manage, and upgrade OpenShift Logging through the OpenShift Elasticsearch Operator and Red Hat OpenShift Logging Operator.
With audit logs, you can follow a sequence of activities associated with how a user, administrator, or other OKD component is behaving. API audit logging is done on each server.
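Following one user's sequence of activities could be sketched as below. This is a hedged illustration only: it assumes audit entries are stored one JSON object per line and use the Kubernetes audit event field names (`user.username`, `verb`, `requestURI`, `requestReceivedTimestamp`). The function name `activity_for_user`, the user names, and the sample lines are hypothetical.

```python
import json

# Hypothetical audit log lines; field names assume the Kubernetes audit event format.
SAMPLE_LINES = [
    '{"kind": "Event", "verb": "delete", "requestURI": "/api/v1/namespaces/demo/pods/web-1", '
    '"user": {"username": "alice"}, "requestReceivedTimestamp": "2024-01-01T10:00:02.000000Z"}',
    '{"kind": "Event", "verb": "list", "requestURI": "/api/v1/namespaces/demo/secrets", '
    '"user": {"username": "bob"}, "requestReceivedTimestamp": "2024-01-01T10:00:01.000000Z"}',
    '{"kind": "Event", "verb": "get", "requestURI": "/api/v1/namespaces/demo/pods", '
    '"user": {"username": "alice"}, "requestReceivedTimestamp": "2024-01-01T10:00:00.000000Z"}',
]


def activity_for_user(lines, username):
    """Return (timestamp, verb, requestURI) tuples for one user, oldest first."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("user", {}).get("username") != username:
            continue
        entries.append((
            event.get("requestReceivedTimestamp", ""),
            event.get("verb", ""),
            event.get("requestURI", ""),
        ))
    # Timestamps in the same UTC format sort chronologically as plain strings.
    return sorted(entries)


for timestamp, verb, uri in activity_for_user(SAMPLE_LINES, "alice"):
    print(timestamp, verb, uri)
```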