Monitoring and alarming

    Click to download the dashboard template.

    Dashboard templates are updated from time to time. The way to update the template is shown in the last section.

    Contributions of better dashboards are welcome.

    Doris uses Prometheus and Grafana to collect and display monitoring items.

    1. Prometheus

      Prometheus is an open source system monitoring and alarm suite. It can collect monitored items by Pull or Push and store them in its own time series database. Its rich multi-dimensional query language meets users' different data display needs.

    2. Grafana

      Grafana is an open source data analysis and display platform. It supports multiple mainstream time series data sources, including Prometheus. It obtains the data to display from the data source through the corresponding query statements. With flexible, configurable dashboards, this data can be quickly presented to users as graphs.

    Note: This document only describes one way to collect and display Doris monitoring data using Prometheus and Grafana. In principle, Doris does not develop or maintain these components. For more details on them, please refer to the corresponding official documents.

    Monitoring data

    Doris’s monitoring data is exposed through the HTTP interfaces of Frontend and Backend. It is presented as key-value text, and each Key may be further distinguished by different Labels. Once Doris has been deployed, the monitoring data of a node can be accessed in a browser through the following interfaces:

    • Frontend:
    • Backend: be_host:be_web_server_port/metrics
    • Broker: Not available for now
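
    As a minimal sketch of reading these interfaces from a script rather than a browser, the Python snippet below builds a node's metrics URL and fetches it with the standard library. The function names are hypothetical, and the host name and port 8040 in the usage note are placeholders chosen for illustration; substitute the be_web_server_port of your own deployment.

```python
from urllib.request import urlopen


def metrics_url(host: str, port: int) -> str:
    """Build the metrics URL of a node from its host and web server port."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str, port: int, timeout: float = 5.0) -> str:
    """Fetch the raw key-value monitoring text exposed by one node."""
    with urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

    For example, print(fetch_metrics("be_host", 8040)) would print the monitoring text of that Backend, assuming the node is reachable at that address.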

    Users will see monitoring item results like the following (for example, some of the FE monitoring items):

    This monitoring data is presented in the Prometheus text format. We take one of these monitoring items as an example:

    1. Lines beginning with “#” are comment lines. HELP is the description of the monitored item; TYPE is its data type, which is Gauge (a scalar value) in the example. Other data types include Counter and Histogram. For details, see the official Prometheus documentation.
    2. jvm_heap_size_bytes is the name of the monitored item (Key); type="max" is a Label named type with the value max. A monitoring item can have multiple Labels.
    3. The final number, such as 41661235200, is the monitored value.
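
    The three parts described above can be picked apart with a few lines of Python. This is only a sketch: the sample text is reconstructed from the description (the HELP wording is illustrative), the function name is hypothetical, and escaped quotes inside label values are not handled.

```python
import re

# Sample reconstructed from the description above; the HELP text is illustrative.
SAMPLE = '''# HELP jvm_heap_size_bytes jvm heap stat
# TYPE jvm_heap_size_bytes gauge
jvm_heap_size_bytes{type="max"} 41661235200
'''

# name, optional {labels}, then the value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (types, samples) parsed from Prometheus-style text.

    types maps a metric name to its declared TYPE; samples is a list of
    (name, labels_dict, value) tuples.
    """
    types, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment line: only TYPE declarations are recorded here.
            parts = line.split(None, 3)
            if len(parts) >= 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        m = LINE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            samples.append((m.group("name"), labels, float(m.group("value"))))
    return types, samples


types, samples = parse_metrics(SAMPLE)
print(types)
print(samples)
```

    Here the Key, the Label type="max", and the value end up as separate fields of one sample, and the TYPE comment is kept in a separate mapping.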

    The entire monitoring architecture is shown in the following figure:

    Monitoring and alarming - Figure 2

    1. The yellow part is the Prometheus related components. Prometheus Server is the main process of Prometheus. At present, Prometheus accesses the monitoring interfaces of Doris nodes by Pull and stores the time series data in the time series database TSDB (TSDB is included in the Prometheus process and need not be deployed separately). Prometheus also supports setting up a Push Gateway: the monitored system pushes its data to the Push Gateway, and Prometheus Server then pulls the data from the Push Gateway.
    2. Alert Manager is the Prometheus alarm component. It needs to be deployed separately (no solution is provided yet, but it can be built by referring to the official documents). Through Alert Manager, users can configure alarm strategies and receive alarms by mail, short message and other channels.
    3. The green part is Grafana related components. Grafana Server is the main process of Grafana. After startup, users can configure Grafana through Web pages, including data source settings, user settings, Dashboard drawing, etc. This is also where end users view monitoring data.

    Start building

    Please start building the monitoring system after you have completed the deployment of Doris.

    1. Download the latest version of Prometheus from the Prometheus official website. Here we take version 2.3.2-linux-amd64 as an example.

    2. Unzip the downloaded tar file on the machine that is ready to run the monitoring service.

    3. Open the configuration file prometheus.yml. Here we provide an example configuration and explain it (the configuration file is in YAML format; pay attention to consistent indentation and spaces):

      Here we use the simplest approach, static configuration in the file, to specify the monitoring targets. Prometheus also supports a variety of service discovery mechanisms, which can dynamically detect the addition and removal of nodes.

    4. Start Prometheus

      Start Prometheus with the following command:

      nohup ./prometheus --web.listen-address="0.0.0.0:8181" &

      This command will run Prometheus in the background and specify its Web port as 8181. After startup, data is collected and stored in the data directory.

    5. Access Prometheus

      Prometheus can be accessed through its web page: open port 8181 of the Prometheus host in a browser. In the navigation bar, click Status -> Targets to see all monitored nodes, grouped by Job. Normally, all nodes should be UP, indicating that data collection is working. Click an Endpoint to see its current monitoring values. If a node's state is not UP, first access Doris’s metrics interface (see the Monitoring data section above) to check whether it is reachable, or consult the Prometheus documentation to resolve the problem.

    6. So far, a simple Prometheus has been built and configured. For more advanced usage, see the official Prometheus documentation.
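
    Step 3 above describes a static prometheus.yml. As a hedged sketch of what such a file might contain, the Python function below renders a minimal static scrape configuration for Frontend and Backend nodes. The function name, the job name, the group labels, the host names, the ports and the 15s scrape interval are all assumptions made for illustration; check the dashboard template for any job or label names it expects.

```python
def render_prometheus_config(job_name, fe_targets, be_targets, scrape_interval="15s"):
    """Render a minimal prometheus.yml that scrapes FE and BE nodes statically.

    All names and values passed in are placeholders chosen by the caller.
    """
    def target_list(targets):
        return ", ".join(f"'{t}'" for t in targets)

    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "",
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    metrics_path: '/metrics'",
        "    static_configs:",
        f"      - targets: [{target_list(fe_targets)}]",
        "        labels:",
        "          group: fe",  # assumed label separating Frontends
        f"      - targets: [{target_list(be_targets)}]",
        "        labels:",
        "          group: be",  # assumed label separating Backends
    ]
    return "\n".join(lines) + "\n"


config_text = render_prometheus_config(
    "doris_cluster",                      # hypothetical job name
    ["fe_host1:8030"],                    # hypothetical Frontend address
    ["be_host1:8040", "be_host2:8040"],   # hypothetical Backend addresses
)
print(config_text)
```

    Generating the file from a node list keeps the indentation uniform, which matters because the configuration is whitespace-sensitive.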

    1. Download the latest version of Grafana from the Grafana official website. Here we take version 5.2.1.linux-amd64 as an example.

    2. Unzip the downloaded tar file on the machine that is ready to run the monitoring service.

    3. Open the configuration file conf/defaults.ini. Only a few configuration items need to be changed, such as the HTTP port (8182 in this example); the other items can keep their default values.

    4. Start Grafana

      nohup ./bin/grafana-server &

      This command runs Grafana in the background; the access port is the 8182 configured above.

    5. Stop Grafana

      At present, there is no formal way to stop the process; kill it directly with kill -9. Alternatively, you can set Grafana up as a service and start and stop it as a service.

    6. Access Grafana

      Open port 8182 in a browser to access the Grafana page. The default username and password are both admin.

    7. Configure Grafana

      On first login, you need to set up the data source as prompted. Our data source here is Prometheus, which was configured in the previous steps.

      The Setting page of the data source configuration is described as follows:

      1. Name: Name of the data source, customized, such as doris_monitor_data_source
      2. Type: Select Prometheus
      3. URL: Fill in the web address of Prometheus, such as http://host:8181
      4. Access: Here we choose the Server mode, which is to access Prometheus through the server where the Grafana process is located.
      5. The other options can keep their default values.
      6. Click Save & Test at the bottom. If “Data source is working” appears, the data source is available.
      7. After confirming that the data source is available, click the + sign in the left navigation bar and start adding a Dashboard. Here we have prepared Doris’s dashboard template (at the beginning of this document). When the download is complete, click New dashboard -> Import dashboard -> Upload .json File above to import the downloaded JSON file.
      8. After importing, you can name the Dashboard or keep the default name. You also need to select the data source; choose the doris_monitor_data_source you created earlier.
      9. Click Import to complete the import. You can then see the Doris dashboard.
    8. So far, a simple Grafana has been built and configured. For more advanced usage, see the official Grafana documentation.
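
    The data source settings listed in step 7 can also be written down as data. The Python sketch below builds a JSON body of the shape accepted by Grafana's data source HTTP API (POST /api/datasources), where the access value "proxy" corresponds to Server mode. The function name is hypothetical, sending the request and authenticating are left out, and whether this fits your Grafana version is an assumption to verify against the Grafana documentation.

```python
import json


def prometheus_datasource(name: str, prometheus_url: str) -> dict:
    """Build a data source definition matching the settings listed above."""
    return {
        "name": name,            # 1. Name
        "type": "prometheus",    # 2. Type
        "url": prometheus_url,   # 3. URL
        "access": "proxy",       # 4. Access: "proxy" is the API value for Server mode
    }


payload = prometheus_datasource("doris_monitor_data_source", "http://host:8181")
print(json.dumps(payload, indent=2))
```

    Keeping the definition in a script makes it easy to recreate the same data source on another Grafana instance.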

    Here we briefly introduce the Doris Dashboard. The content of the Dashboard may change as versions are upgraded, so this document is not guaranteed to describe the latest Dashboard.

    1. Top Bar

      • The upper left corner is the name of Dashboard.
      • The upper right corner shows the current monitoring time range. You can choose a different time range from the drop-down menu. You can also specify an interval for automatically refreshing the page.
      • fe_master: The Master Frontend node corresponding to the cluster.
      • fe_instance: All Frontend nodes corresponding to the cluster. Select a different Frontend, and the chart below shows the monitoring information for the Frontend.
      • be_instance: All Backend nodes corresponding to the cluster. Select a different Backend, and the chart below shows the monitoring information for the Backend.
      • Interval: Some charts show rate-related monitoring items. Here you can choose the interval over which the rate is sampled and calculated (Note: a 15s interval may cause some charts to fail to display).
    2. Row

      Monitoring and alarming - Figure 4

      In Grafana, a Row is a set of graphs. As shown in the figure above, Overview and Cluster Overview are two different Rows. A Row can be collapsed by clicking it. Currently the Dashboard has the following Rows (continuously updated):

      1. Overview: A summary display of all Doris clusters.
      2. Cluster Overview: A summary display of selected clusters.
      3. Query Statistic: Query-related monitoring of selected clusters.
      4. FE JVM: JVM monitoring of the selected Frontend.
      5. BE: A summary display of the Backends of the selected cluster.
      6. BE Task: Task information of the Backends of the selected cluster.
    3. Charts

      1. Hover the mouse over the i icon in the upper left corner to see the description of the chart.
      2. Click a legend entry below the chart to view that monitoring item alone. Click again to display all.
      3. Drag within the chart to select a time range.
      4. The selected cluster name is displayed in [] of the title.
      5. Some values correspond to the Y-axis on the left and some to the right, which can be distinguished by the -right at the end of the legend.
      6. Click on the name of the chart -> Edit to edit the chart.
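
    One plausible reason for the Interval note under Top Bar: a rate needs at least two samples inside the chosen interval, so an interval equal to the collection interval can leave a chart empty. The sketch below illustrates this with a simplified rate calculation. It is not Prometheus's actual rate() implementation; the left-open window, the 15-second collection interval and the sample values are assumptions made for illustration.

```python
def rate_per_second(samples, window_end, window):
    """Per-second increase of a counter over (window_end - window, window_end].

    samples is a list of (timestamp_seconds, counter_value) pairs sorted by
    time. Returns None when the window holds fewer than two samples, in
    which case no rate can be computed. Counter resets are ignored here.
    """
    in_window = [(t, v) for t, v in samples if window_end - window < t <= window_end]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter collected every 15 seconds.
samples = [(0, 100), (15, 130), (30, 190), (45, 220)]

# A 15s window contains a single sample, so there is nothing to plot.
print(rate_per_second(samples, window_end=45, window=15))
# A 60s window contains all four samples and yields a rate.
print(rate_per_second(samples, window_end=45, window=60))
```

    Under these assumptions, choosing an Interval several times larger than the collection interval gives each chart point enough samples to work with.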

    Dashboard Update

    1. Click + in the left column of Grafana, then Dashboard.
    2. Click New dashboard in the upper left corner, and Import dashboard appears on the right.
    3. Click Upload .json File and select the latest template file.
    4. Select the data source.