Broker

    Broker provides services through an RPC service port. It is a stateless JVM process that encapsulates POSIX-like file operations, such as open, pread, and pwrite, for reading and writing on remote storage. Because the Broker records no other information, the connection information, file information, permission information, and so on for the remote storage must be passed to the Broker process as parameters of each RPC call so that the Broker can read and write files correctly.

    Broker only acts as a data channel and does not participate in any computation, so it uses little memory. Usually one or more Broker processes are deployed in a Doris system. Brokers of the same type form a group that shares a Broker name.

    Broker’s position in the Doris system architecture is as follows:

    This document mainly introduces the parameters that Broker needs when accessing different remote storage systems, such as connection information and authorization information.

    Different types of brokers support different storage systems.

    1. Community HDFS

      • Support simple authentication access
      • Support kerberos authentication access
      • Support HDFS HA mode access
    2. Object storage

      • All object stores that support the S3 protocol

    Broker is used by the following Doris operations:

    1. Broker Load
    2. Backup

    Broker information includes two parts: the Broker name and the certification information. The general syntax is as follows:

    WITH BROKER "broker_name"
    (
        "username" = "xxx",
        "other_prop" = "prop_value",
        ...
    );

    Usually the user specifies an existing Broker name through the WITH BROKER "broker_name" clause in the operation command. The Broker name is the name that the user assigns when adding a Broker process through the ALTER SYSTEM ADD BROKER command. A name usually corresponds to one or more Broker processes, and Doris selects an available Broker process based on the name. You can use the SHOW BROKER command to view the Brokers that currently exist in the cluster.
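
    As a rough sketch of how a Broker name comes into existence, the statements below register two Broker processes under one name and then list the registered Brokers. The name my_broker, the host addresses, and port 8000 are placeholders chosen for illustration; use the hosts and the RPC port that your Broker processes actually listen on.

    ```sql
    -- Register two Broker processes under the same name (hypothetical hosts and port).
    ALTER SYSTEM ADD BROKER my_broker "192.168.0.1:8000", "192.168.0.2:8000";

    -- List the Brokers currently known to the cluster.
    SHOW BROKER;
    ```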

    Certification Information

    Different Broker types and different access methods require different authentication information. Authentication information is usually provided as key-value pairs in the property map that follows WITH BROKER "broker_name".
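
    To show where the property map sits in a complete command, here is a minimal sketch of a Broker Load statement. The database, label, table, file path, Broker name, and user are all placeholders assumed for illustration, and the property map shown is the simplest possible one.

    ```sql
    -- Hypothetical Broker Load: every identifier and path below is a placeholder.
    LOAD LABEL example_db.example_label
    (
        DATA INFILE("hdfs://hdfs_host:hdfs_port/path/to/file.csv")
        INTO TABLE example_table
        COLUMNS TERMINATED BY ","
    )
    WITH BROKER "my_broker"
    (
        -- Authentication information is passed here as key-value pairs.
        "username" = "hdfs_user",
        "password" = ""
    );
    ```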

    Community HDFS

    1. Simple Authentication

      Simple authentication means that hadoop.security.authentication is set to simple in the Hadoop configuration.

      HDFS is accessed as a system user. Alternatively, set the HADOOP_USER_NAME environment variable in the environment in which the Broker is started.

      The password can be left blank.
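
      A minimal sketch of a property map for simple authentication, assuming the username and password keys and a placeholder user name hdfs_user:

      ```sql
      (
          -- hdfs_user is a placeholder for an existing HDFS user.
          "username" = "hdfs_user",
          -- The password is left blank for simple authentication.
          "password" = ""
      )
      ```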

    2. Kerberos Authentication

      This authentication method requires the following information:

      • hadoop.security.authentication: Specify the authentication method as kerberos.
      • kerberos_principal: Specify the Kerberos principal.
      • kerberos_keytab: Specify the path to the Kerberos keytab file. The path must be an absolute path to a file on the server where the Broker process runs, and the file must be readable by the Broker process.
      • : Specify the base64-encoded content of the Kerberos keytab file. Provide either this property or kerberos_keytab.

      Examples are as follows:

      (
          "hadoop.security.authentication" = "kerberos",
          "kerberos_principal" = "doris@YOUR.COM",
          "kerberos_keytab" = "/home/doris/my.keytab"
      )

      An example of krb5.conf content:

      [libdefaults]
          default_realm = DORIS.HADOOP
          default_tkt_enctypes = des3-hmac-sha1 des-cbc-crc
          default_tgs_enctypes = des3-hmac-sha1 des-cbc-crc
          dns_lookup_realm = false
      [realms]
          DORIS.HADOOP = {
          }
    3. HDFS HA Mode

      This configuration is used to access HDFS clusters deployed in HA mode.

      • dfs.nameservices: Specify a custom name for the HDFS service, such as "dfs.nameservices" = "my_ha".

      • dfs.ha.namenodes.xxx: Specify custom namenode names, separated by commas, where xxx is the custom name set in dfs.nameservices, such as "dfs.ha.namenodes.my_ha" = "my_nn".

      • dfs.namenode.rpc-address.xxx.nn: Specify the RPC address of a namenode, where nn is a namenode name configured in dfs.ha.namenodes.xxx, such as "dfs.namenode.rpc-address.my_ha.my_nn" = "host:port".

      • dfs.client.failover.proxy.provider: Specify the provider that the client uses to connect to the namenode. The default is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.

        The HA mode can be combined with either of the previous two authentication methods for cluster access. For example, to access HA HDFS with simple authentication:

        (
            "username"="user",
            "password"="passwd",
            "dfs.nameservices" = "my_ha",
            "dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
            "dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
            "dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",