Multi-tenancy

The scheme is mainly divided into two parts, one is the division of resource groups at the node level in the cluster, and the other is the resource limit for a single query.

First, let’s briefly introduce the node composition of Doris. There are two types of nodes in a Doris cluster: Frontend (FE) and Backend (BE).

FE is mainly responsible for metadata management, cluster management, user request access and query plan analysis.

BE is mainly responsible for data storage and execution of query plans.

FE does not participate in the processing and calculation of user data, so it is a node with low resource consumption. The BE is responsible for all data calculations and task processing, and is a resource-consuming node. Therefore, the resource division and resource restriction schemes introduced in this article are all aimed at BE nodes. Because the FE node consumes relatively low resources and can also be scaled horizontally, there is usually no need to isolate and restrict resources, and the FE node can be shared by all users.

Node resource division refers to setting tags for BE nodes in a Doris cluster, and the BE nodes with the same tags form a resource group. Resource group can be regarded as a management unit of data storage and calculation. Below we use a specific example to introduce the use of resource groups.

Set labels for BE nodes

Assume that the current Doris cluster has 6 BE nodes. They are host[1-6] respectively. In the initial situation, all nodes belong to a default resource group (Default).

We can use the following command to divide these 6 nodes into 3 resource groups: group_a, group_b, group_c:

Here we combine to form a resource group group_a, host[3-4] to form a resource group group_b, and host[5-6] to form a resource group group_c.

Distribution of data according to resource groups

After the resource group is divided. We can distribute different copies of user data in different resource groups. Assume a user table UserTable. We want to store a copy in each of the three resource groups, which can be achieved by the following table creation statement:

create table UserTable
(k1 int, k2 int)
distributed by hash(k1) buckets 1
properties(
    "replication_allocation"
    =
    "tag.location.group_a:1, tag.location.group_b:1, tag.location.group_c:1"
)

The following figure shows the current node division and data distribution:

 ┌────────────────────────────────────────────────────┐
 │                                                    │
 │         ┌──────────────────┐  ┌──────────────────┐ │
 │         │ host1            │  │ host2            │ │
 │         │  ┌─────────────┐ │  │                  │ │
 │ group_a │  │   replica1  │ │  │                  │ │
 │         │  └─────────────┘ │  │                  │ │
 │         │                  │  │                  │ │
 │         └──────────────────┘  └──────────────────┘ │
 ├────────────────────────────────────────────────────┤
 ├────────────────────────────────────────────────────┤
 │                                                    │
 │         ┌──────────────────┐  ┌──────────────────┐ │
 │         │                  │  │  ┌─────────────┐ │ │
 │ group_b │                  │  │  │   replica2  │ │ │
 │         │                  │  │  └─────────────┘ │ │
 │         │                  │  │                  │ │
 │         └──────────────────┘  └──────────────────┘ │
 │                                                    │
 ├────────────────────────────────────────────────────┤
 ├────────────────────────────────────────────────────┤
 │                                                    │
 │         ┌──────────────────┐  ┌──────────────────┐ │
 │         │ host5            │  │ host6            │ │
 │         │                  │  │  ┌─────────────┐ │ │
 │ group_c │                  │  │  │   replica3  │ │ │
 │         │                  │  │  └─────────────┘ │ │
 │         │                  │  │                  │ │
 │         └──────────────────┘  └──────────────────┘ │
 │                                                    │
 └────────────────────────────────────────────────────┘

The resource group method mentioned earlier is resource isolation and restriction at the node level. In the resource group, resource preemption problems may still occur. For example, as mentioned above, the three business departments are arranged in the same resource group. Although the degree of resource competition is reduced, the queries of these three departments may still affect each other.

Therefore, in addition to the resource group solution, Doris also provides a single query resource restriction function.

At present, Doris’s resource restrictions on single queries are mainly divided into two aspects: CPU and memory restrictions.

Memory Limitation

Doris can limit the maximum memory overhead that a query is allowed to use. To ensure that the memory resources of the cluster will not be fully occupied by a query. We can set the memory limit in the following ways:
```
// Set the session variable exec_mem_limit. Then all subsequent queries in the session (within the connection) use this memory limit.
set exec_mem_limit=1G;
// Set the global variable exec_mem_limit. Then all subsequent queries of all new sessions (new connections) use this memory limit.
set global exec_mem_limit=1G;
// Set the variable exec_mem_limit in SQL. Then the variable only affects this SQL.
```
Because Doris’ query engine is based on the full-memory MPP query framework. Therefore, when the memory usage of a query exceeds the limit, the query will be terminated. Therefore, when a query cannot run under a reasonable memory limit, we need to solve it through some SQL optimization methods or cluster expansion.
Users can limit the CPU resources of the query in the following ways:
```
// Set the session variable cpu_resource_limit. Then all queries in the session (within the connection) will use this CPU limit.
set cpu_resource_limit = 2
// Set the user's attribute cpu_resource_limit, then all queries of this user will use this CPU limit. The priority of this attribute is higher than the session variable cpu_resource_limit
set property for'user1''cpu_resource_limit' = '3';
```
The value of cpu_resource_limit is a relative value. The larger the value, the more CPU resources can be used. However, the upper limit of the CPU that can be used by a query also depends on the number of partitions and buckets of the table. In principle, the maximum CPU usage of a query is positively related to the number of tablets involved in the query. In extreme cases, assuming that a query involves only one tablet, even if cpu_resource_limit is set to a larger value, only 1 CPU resource can be used.

Through memory and CPU resource limits. We can divide user queries into more fine-grained resources within a resource group. For example, we can make some offline tasks with low timeliness requirements, but with a large amount of calculation, use less CPU resources and more memory resources. Some delay-sensitive online tasks use more CPU resources and reasonable memory resources.

Tag division and CPU limitation are new features in version 0.15. In order to ensure a smooth upgrade from the old version, Doris has made the following forward compatibility:

Each BE node will have a default Tag: "tag.location": "default".
The BE node added through the alter system add backend statement will also set Tag: "tag.location": "default" by default.
The copy distribution of all tables is modified by default to: "tag.location.default:xx. xx is the number of original copies.
Users can still specify the number of replicas in the table creation statement by "replication_num" = "xx", this attribute will be automatically converted to: "tag.location.default:xx. This ensures that there is no need to modify the original creation. Table statement.
By default, the memory limit for a single query is 2GB for a single node, and the CPU resources are unlimited, which is consistent with the original behavior. And the user’s resource_tags.location attribute is empty, that is, by default, the user can access the BE of any Tag, which is consistent with the original behavior.

Here we give an example of the steps to start using the resource division function after upgrading from the original cluster to version 0.15:

Turn off data repair and balance logic

After the upgrade, the default Tag of BE is "tag.location": "default", and the default copy distribution of the table is: "tag.location.default:xx. So if you directly modify the Tag of BE, the system will Automatically detect changes in the distribution of copies, and start data redistribution. This may occupy some system resources. So we can turn off the data repair and balance logic before modifying the tag to ensure that there will be no copies when we plan resources Redistribution operation.
Set Tag and table copy distribution

Next, you can use the alter system modify backend statement to set the BE Tag. And through the alter table statement to modify the copy distribution strategy of the table. Examples are as follows:
```
alter system modify backend "host1:9050, 1212:9050" set ("tag.location" = "group_a");
alter table my_table modify partition p1 set ("replication_allocation" = "tag.location.group_a:2");
```
Turn on data repair and balance logic

After the tag and copy distribution are set, we can turn on the data repair and equalization logic to trigger data redistribution.
```
ADMIN SET FRONTEND CONFIG ("disable_balance" = "false");
```
This process will continue for a period of time depending on the amount of data involved. And it will cause some colocation tables to fail colocation planning (because the copy is being migrated). You can view the progress by show proc "/cluster_balance/". You can also judge the progress by the number of UnhealthyTabletNum in show proc "/statistic". When UnhealthyTabletNum drops to 0, it means that the data redistribution is completed. .
Set the user’s resource label permissions.

Through the above 4 steps, we can smoothly use the resource division function after the original cluster is upgraded.