Pulsar SQL configuration and deployment

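    You can configure the Presto Pulsar connector in the connector's properties file. The connector settings and their default values are as follows.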

    Presto can connect to a Pulsar cluster through multiple hosts. To configure multiple hosts for brokers, add multiple URLs to pulsar.web-service-url. To configure multiple hosts for ZooKeeper, add multiple URIs to pulsar.zookeeper-uri. The following is an example.

    pulsar.web-service-url=http://localhost:8080,localhost:8081,localhost:8082
    pulsar.zookeeper-uri=localhost1,localhost2:2181
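
    If you generate this file from a list of hosts, a small helper can assemble both values. The following is a hypothetical sketch, not part of Pulsar: the function names are illustrative. It writes the scheme once, in front of the first broker host, as in the example above, and gives every ZooKeeper host an explicit port.

```python
# Hypothetical helper (not part of Pulsar): builds the multi-host
# pulsar.web-service-url and pulsar.zookeeper-uri values from host lists.


def multi_host_properties(broker_hosts, zk_hosts, scheme="http"):
    """Return connector settings listing several brokers and ZooKeeper hosts.

    broker_hosts: ["host:port", ...] for the broker web service.
    zk_hosts:     ["host:port", ...] for ZooKeeper.
    """
    if not broker_hosts or not zk_hosts:
        raise ValueError("need at least one broker host and one ZooKeeper host")
    # The scheme appears once, in front of the first broker host,
    # matching the example in the documentation.
    web_service_url = f"{scheme}://" + ",".join(broker_hosts)
    # A ZooKeeper connection string is a comma-separated list of host:port pairs.
    zookeeper_uri = ",".join(zk_hosts)
    return {
        "pulsar.web-service-url": web_service_url,
        "pulsar.zookeeper-uri": zookeeper_uri,
    }


def to_properties(settings):
    """Render a settings dict as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    props = multi_host_properties(
        ["localhost:8080", "localhost:8081", "localhost:8082"],
        ["localhost1:2181", "localhost2:2181"],
    )
    print(to_properties(props))
```

    Running the script prints two key=value lines that you can paste into the properties file.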

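    Note: By default, Pulsar SQL does not get the last message in a topic. This is by design and is controlled by settings. By default, the BookKeeper LAC (Last Add Confirmed) advances only when subsequent entries are added. If no subsequent entry is added, the last written entry is not visible to readers until the ledger is closed. This is not a problem for Pulsar, which reads through the managed ledger, but Pulsar SQL reads directly from BookKeeper ledgers.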
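
    The visibility rule above can be illustrated with a toy model. This is a simplified sketch, not BookKeeper code, and it assumes each add is acknowledged before the next one is sent. Every new entry piggybacks the LAC that was known when it was written, so the newest entry becomes readable only after another add, an explicit LAC update, or a ledger close.

```python
# Toy model (illustration only, not BookKeeper code) of piggybacked LAC.
# Assumption: each add is acknowledged before the next add is sent.


class LedgerModel:
    def __init__(self):
        self.entries = []
        self.lac = -1  # highest Last Add Confirmed readers can learn; -1 = none

    def add(self, payload):
        # The new entry carries the id of the previous, already confirmed entry,
        # so it advances the LAC only up to the entry before itself.
        piggybacked_lac = len(self.entries) - 1
        self.entries.append(payload)
        self.lac = max(self.lac, piggybacked_lac)

    def send_explicit_lac(self):
        # Models a periodic explicit LAC update from the writer.
        self.lac = len(self.entries) - 1

    def close(self):
        # Closing the ledger fixes its last entry, so every entry becomes readable.
        self.lac = len(self.entries) - 1

    def visible_to_reader(self):
        # A reader that reads the ledger directly stops at the LAC.
        return self.entries[: self.lac + 1]


if __name__ == "__main__":
    ledger = LedgerModel()
    for i in range(3):
        ledger.add(f"msg-{i}")
    print(ledger.visible_to_reader())  # ['msg-0', 'msg-1']: the last entry is hidden
    ledger.send_explicit_lac()
    print(ledger.visible_to_reader())  # ['msg-0', 'msg-1', 'msg-2']
```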

    If you want to get the last message in a topic, set the following configurations:

    1. For the broker configuration, set bookkeeperExplicitLacIntervalInMills > 0 in broker.conf or standalone.conf.

    2. For the Presto configuration, set pulsar.bookkeeper-explicit-interval > 0 and pulsar.bookkeeper-use-v2-protocol=false.
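
    Because the two settings live in different files, it can help to check them together before querying. The following is a hypothetical pre-flight check, not a Pulsar tool; only the three setting names come from this page, and the value 100 in the example is an arbitrary placeholder.

```python
# Hypothetical pre-flight check (not a Pulsar tool). Only the three setting
# names come from the documentation; the parser handles simple key=value lines.


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def _is_positive_int(value):
    try:
        return int(value) > 0
    except (TypeError, ValueError):  # missing (None) or not a number
        return False


def last_message_problems(broker_conf_text, connector_props_text):
    """Return a list of problems; an empty list means both steps are applied."""
    broker = parse_properties(broker_conf_text)
    presto = parse_properties(connector_props_text)
    problems = []
    if not _is_positive_int(broker.get("bookkeeperExplicitLacIntervalInMills")):
        problems.append("broker: bookkeeperExplicitLacIntervalInMills must be > 0")
    if not _is_positive_int(presto.get("pulsar.bookkeeper-explicit-interval")):
        problems.append("presto: pulsar.bookkeeper-explicit-interval must be > 0")
    # Step 2 asks for an explicit false, so a missing value is reported too.
    if presto.get("pulsar.bookkeeper-use-v2-protocol", "").lower() != "false":
        problems.append("presto: pulsar.bookkeeper-use-v2-protocol must be false")
    return problems


if __name__ == "__main__":
    broker_conf = "bookkeeperExplicitLacIntervalInMills=100\n"  # 100 is a placeholder
    connector_props = (
        "pulsar.bookkeeper-explicit-interval=100\n"
        "pulsar.bookkeeper-use-v2-protocol=false\n"
    )
    print(last_message_problems(broker_conf, connector_props) or "OK")
```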

    If you already have a Presto cluster, you can copy the Presto Pulsar connector plugin to your existing cluster. Download the archived plugin package with the following command.

    $ wget https://archive.apache.org/dist/pulsar/pulsar-2.9.2/apache-pulsar-2.9.2-bin.tar.gz
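
    The URL above pins release 2.9.2. If you need a different release, a helper like the following builds the matching archive URL. It assumes, without checking, that other releases use the same directory and file naming as 2.9.2; the function name is hypothetical.

```python
# Hypothetical helper: builds an Apache archive URL for a Pulsar binary
# release, assuming the same layout as the 2.9.2 URL in the documentation.
import re

ARCHIVE_BASE = "https://archive.apache.org/dist/pulsar"


def pulsar_binary_url(version):
    """Return the archive URL of apache-pulsar-<version>-bin.tar.gz."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a version like 2.9.2, got {version!r}")
    return f"{ARCHIVE_BASE}/pulsar-{version}/apache-pulsar-{version}-bin.tar.gz"


if __name__ == "__main__":
    print(pulsar_binary_url("2.9.2"))
```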

    Because Pulsar SQL is powered by Trino (formerly Presto SQL), the deployment configuration is the same for the Pulsar SQL worker.

    You can use the same CLI arguments as the Presto launcher:

    $ ./bin/pulsar sql-worker --help
    Usage: launcher [options] command
    Commands: run, start, stop, restart, kill, status
    Options:
      -h, --help            show this help message and exit
      -v, --verbose         Run verbosely
      --etc-dir=DIR         Defaults to INSTALL_PATH/etc
      --node-config=FILE    Defaults to ETC_DIR/node.properties
      --jvm-config=FILE     Defaults to ETC_DIR/jvm.config
      --config=FILE         Defaults to ETC_DIR/config.properties
      --log-levels-file=FILE
                            Defaults to ETC_DIR/log.properties
      --data-dir=DIR        Defaults to INSTALL_PATH
      --pid-file=FILE       Defaults to DATA_DIR/var/run/launcher.pid
      --launcher-log-file=FILE
                            Defaults to DATA_DIR/var/log/launcher.log (only in
                            daemon mode)
      --server-log-file=FILE
                            Defaults to DATA_DIR/var/log/server.log (only in
                            daemon mode)
      -D NAME=VALUE         Set a Java system property

    The default configuration for the cluster is located in ${project.root}/conf/presto. You can customize your deployment by modifying the default configuration.

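    You can set the worker to read its configuration from a different directory (--etc-dir) or to write data to a different directory (--data-dir).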

    $ ./bin/pulsar sql-worker start
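
    The --etc-dir and --data-dir options from the launcher help output are how you point a worker at other directories. The sketch below only assembles the command line; the wrapper function and the /tmp paths are hypothetical, and the options are placed before the command, following the usage line `launcher [options] command`.

```python
# Hypothetical wrapper around the launcher options shown in the help output.
# It only builds the argument list; nothing is executed here.
import shlex

COMMANDS = {"run", "start", "stop", "restart", "kill", "status"}


def sql_worker_command(command="run", etc_dir=None, data_dir=None,
                       pulsar_bin="./bin/pulsar"):
    """Return argv for `pulsar sql-worker [options] command`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown launcher command: {command!r}")
    argv = [pulsar_bin, "sql-worker"]
    if etc_dir:
        argv.append(f"--etc-dir={etc_dir}")    # read configuration from here
    if data_dir:
        argv.append(f"--data-dir={data_dir}")  # pid file and logs default to paths under here
    argv.append(command)                       # options come before the command
    return argv


if __name__ == "__main__":
    argv = sql_worker_command("run", etc_dir="/tmp/presto-etc",
                              data_dir="/tmp/presto-data")
    print(shlex.join(argv))
```

    To actually launch the worker, you could pass the returned list to subprocess.run from the Pulsar installation directory.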

    You can deploy a Pulsar SQL cluster or a Presto cluster on multiple nodes. The following example shows how to deploy a three-node cluster.

    1. Copy the Pulsar binary distribution to all three nodes.

    The first node runs as the Presto coordinator. The minimal configuration required in the ${project.root}/conf/presto/config.properties file is as follows.

    coordinator=true
    http-server.http.port=8080
    query.max-memory-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=<coordinator-url>

    The other two nodes serve as worker nodes, and you can use the following configuration for them:

    coordinator=false
    http-server.http.port=8080
    query.max-memory=50GB
    query.max-memory-per-node=1GB
    discovery.uri=<coordinator-url>
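
    The coordinator and worker configurations differ in only a few keys. As a hypothetical sketch (the helper is not part of Pulsar or Presto), the function below renders either file from exactly the keys and values listed above, so every node gets the same discovery.uri. The coordinator address in the example is a placeholder.

```python
# Hypothetical helper: renders the minimal config.properties shown in the
# documentation for either node role. The coordinator URL is a placeholder.


def presto_config(role, coordinator_url):
    """Return config.properties text for a 'coordinator' or 'worker' node."""
    if role == "coordinator":
        settings = [
            ("coordinator", "true"),
            ("http-server.http.port", "8080"),
            ("query.max-memory-per-node", "1GB"),
            ("discovery-server.enabled", "true"),  # coordinator hosts discovery
        ]
    elif role == "worker":
        settings = [
            ("coordinator", "false"),
            ("http-server.http.port", "8080"),
            ("query.max-memory", "50GB"),
            ("query.max-memory-per-node", "1GB"),
        ]
    else:
        raise ValueError("role must be 'coordinator' or 'worker'")
    # Every node, including the coordinator, points discovery at the coordinator.
    settings.append(("discovery.uri", coordinator_url))
    return "\n".join(f"{key}={value}" for key, value in settings) + "\n"


if __name__ == "__main__":
    print(presto_config("worker", "http://192.168.2.1:8080"), end="")
```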

    2. Modify the pulsar.web-service-url and pulsar.zookeeper-uri configurations in the ${project.root}/conf/presto/catalog/pulsar.properties file accordingly for the three nodes.

    3. Start the coordinator node.

    4. Start the worker nodes.

    $ ./bin/pulsar sql-worker run

    5. Start the SQL CLI and check the status of your cluster.

    $ ./bin/pulsar sql --server <coordinator-url>

    6. Check the status of your nodes.

    presto> SELECT * FROM system.runtime.nodes;
     node_id |        http_uri         | node_version | coordinator | state
    ---------+-------------------------+--------------+-------------+--------
     1       | http://192.168.2.1:8081 | testversion  | true        | active
     2       | http://192.168.2.3:8081 | testversion  | false       | active
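
    To script the final check, you can parse the text table that the query prints. This hypothetical helper assumes the pipe-separated layout shown above. It checks that the expected number of nodes is present, that exactly one of them is the coordinator, and that all of them are active.

```python
# Hypothetical health check: parses the text table printed by
# `SELECT * FROM system.runtime.nodes;` (layout as in the documentation).

SAMPLE = """\
 node_id |        http_uri         | node_version | coordinator | state
---------+-------------------------+--------------+-------------+--------
 1       | http://192.168.2.1:8081 | testversion  | true        | active
 2       | http://192.168.2.3:8081 | testversion  | false       | active
"""


def parse_nodes_table(text):
    """Turn the pipe-separated table into a list of {column: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= set("-+"):  # the separator row
            continue
        cells = [cell.strip() for cell in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def cluster_healthy(rows, expected_nodes):
    """True if all expected nodes are active and exactly one is the coordinator."""
    coordinators = [r for r in rows if r.get("coordinator") == "true"]
    inactive = [r for r in rows if r.get("state") != "active"]
    return len(rows) == expected_nodes and len(coordinators) == 1 and not inactive


if __name__ == "__main__":
    rows = parse_nodes_table(SAMPLE)
    print(cluster_healthy(rows, expected_nodes=2))  # True for the two-row sample
```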

    For more information about Presto deployment, see the Presto deployment documentation.