How to debug Pulsar connectors
部署一个 Mongo sink 环境
启动一个 Mongo 服务。
创建数据库和集合(Collection)。
docker exec -it pulsar-mongo /bin/bash
mongo
> use pulsar
> db.createCollection('messages')
> exit
启动 Pulsar 单机模式。
docker pull apachepulsar/pulsar:2.4.0
docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --link pulsar-mongo --name pulsar-mongo-standalone apachepulsar/pulsar:2.4.0 bin/pulsar standalone
使用
mongo-sink-config.yaml
文件配置 Mongo sink。configs:
mongoUri: "mongodb://pulsar-mongo:27017"
database: "pulsar"
collection: "messages"
batchSize: 2
batchTimeMs: 500
docker cp mongo-sink-config.yaml pulsar-mongo-standalone:/pulsar/
下载 Mongo sink nar 包。
使用 localrun
命令以本地模式启动 Mongo sink。
./bin/pulsar-admin sinks localrun \
--archive pulsar-io-mongo-2.4.0.nar \
--tenant public --namespace default \
--inputs test-mongo \
--name pulsar-mongo-sink \
--sink-config-file mongo-sink-config.yaml \
--parallelism 1
Use one of the following methods to get a connector log in localrun mode:
After executing the
localrun
command, the log is automatically printed on the console.The log is located at:
logs/functions/tenant/namespace/function-name/function-name-instance-id.log
示例
The path of the Mongo sink connector is:
logs/functions/public/default/pulsar-mongo-sink/pulsar-mongo-sink-0.log
为了清楚地解释日志信息,此处将大块信息分解成小块,并为每个块添加描述。
-
08:21:54.132 [main] INFO org.apache.pulsar.common.nar.NarClassLoader - Created class loader with paths: [file:/tmp/pulsar-nar/pulsar-io-mongo-2.4.0.nar-unpacked/, file:/tmp/pulsar-nar/pulsar-io-mongo-2.4.0.nar-unpacked/META-INF/bundled-dependencies/,
This piece of log information illustrates the basic information about the Mongo sink connector, such as tenant, namespace, name, parallelism, resources, and so on, which can be used to check whether the Mongo sink connector is configured correctly or not.
此日志信息显示与 Mongo 连接和配置信息的状态。
08:21:56.231 [cluster-ClusterId{value='5d6396a3c9e77c0569ff00eb', description='null'}-pulsar-mongo:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:8}] to pulsar-mongo:27017
08:21:56.326 [cluster-ClusterId{value='5d6396a3c9e77c0569ff00eb', description='null'}-pulsar-mongo:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=pulsar-mongo:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 0]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=89058800}
该日志信息说明了消费者和客户端配置,包括主题名称、订阅名称、订阅类型等等。
08:21:56.719 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Starting Pulsar consumer status recorder with config: {
"topicNames" : [ "test-mongo" ],
"topicsPattern" : null,
"subscriptionName" : "public/default/pulsar-mongo-sink",
"subscriptionType" : "Shared",
"receiverQueueSize" : 1000,
"acknowledgementsGroupTimeMicros" : 100000,
"negativeAckRedeliveryDelayMicros" : 60000000,
"maxTotalReceiverQueueSizeAcrossPartitions" : 50000,
"consumerName" : null,
"ackTimeoutMillis" : 0,
"tickDurationMillis" : 1000,
"cryptoFailureAction" : "CONSUME",
"properties" : {
"application" : "pulsar-sink",
"id" : "public/default/pulsar-mongo-sink",
"instance_id" : "0"
},
"readCompacted" : false,
"subscriptionInitialPosition" : "Latest",
"patternAutoDiscoveryPeriod" : 1,
"regexSubscriptionMode" : "PersistentOnly",
"autoUpdatePartitions" : true,
"replicateSubscriptionState" : false,
"resetIncludeHead" : false
}
08:21:56.726 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Pulsar client config: {
"serviceUrl" : "pulsar://localhost:6650",
"authPluginClassName" : null,
"authParams" : null,
"operationTimeoutMs" : 30000,
"statsIntervalSeconds" : 60,
"numIoThreads" : 1,
"numListenerThreads" : 1,
"connectionsPerBroker" : 1,
"useTcpNoDelay" : true,
"useTls" : false,
"tlsTrustCertsFilePath" : null,
"tlsAllowInsecureConnection" : false,
"tlsHostnameVerificationEnable" : false,
"concurrentLookupRequest" : 5000,
"maxLookupRequest" : 50000,
"maxNumberOfRejectedRequestPerConnection" : 50,
"keepAliveIntervalSeconds" : 30,
"connectionTimeoutMs" : 10000,
"requestTimeoutMs" : 60000,
"defaultBackoffIntervalNanos" : 100000000,
"maxBackoffIntervalNanos" : 30000000000
}
You can use the following methods to debug a connector in cluster mode:
使用连接器日志
在集群模式下,多个连接器可以运行在一个 worker 上。 要找到指定连接器的日志路径,请使用 workerId
来定位连接器日志。
Pulsar admin CLI helps you debug Pulsar connectors with the following subcommands:
创建 Mongo sink
./bin/pulsar-admin sinks create \
--archive pulsar-io-mongo-2.4.0.nar \
--tenant public \
--namespace default \
--inputs test-mongo \
--name pulsar-mongo-sink \
--sink-config-file mongo-sink-config.yaml \
--parallelism 1
get
使用 get
命令,获取 Mongo sink 连接器的基本信息,例如租户、命名空间、名称、并行度等等。
./bin/pulsar-admin sinks get --tenant public --namespace default --name pulsar-mongo-sink
{
"tenant": "public",
"namespace": "default",
"name": "pulsar-mongo-sink",
"className": "org.apache.pulsar.io.mongodb.MongoSink",
"inputSpecs": {
"test-mongo": {
"isRegexPattern": false
}
},
"configs": {
"database": "pulsar",
"collection": "messages",
"batchSize": 2.0,
"batchTimeMs": 500.0
},
"parallelism": 1,
"retainOrdering": false,
"autoAck": true
}
使用 status
命令获取 Mongo sink 连接器的当前状态,例如实例数量、正在运行实例的数量、instanceId、workerId 等。
topics stats
使用 topics stats
命令获取主题及其关联的生产者、消费者的统计信息,如主题是否已收到消息,或者是否有消息积压,或可用权限以及其它关键信息。 All rates are computed over a 1-minute window and are relative to the last completed 1-minute period.
./bin/pulsar-admin topics stats test-mongo
{
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"averageMsgSize" : 0.0,
"storageSize" : 1,
"publishers" : [ ],
"subscriptions" : {
"public/default/pulsar-mongo-sink" : {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateRedeliver" : 0.0,
"msgBacklog" : 0,
"blockedSubscriptionOnUnackedMsgs" : false,
"msgDelayed" : 0,
"unackedMessages" : 0,
"type" : "Shared",
"msgRateExpired" : 0.0,
"consumers" : [ {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateRedeliver" : 0.0,
"consumerName" : "dffdd",
"availablePermits" : 999,
"unackedMessages" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"metadata" : {
"instance_id" : "0",
"application" : "pulsar-sink",
"id" : "public/default/pulsar-mongo-sink"
},
"connectedSince" : "2019-08-26T08:48:07.582Z",
"clientVersion" : "2.4.0",
"address" : "/172.17.0.3:57790"
} ],
"isReplicated" : false
}
},
"replication" : { },
"deduplicationStatus" : "Disabled"
}
此清单列出了调试连接器时要检查的主要事项。 该清单提醒我们应该注意什么,以确保彻底检查,并作为评估工具获取连接器状态。
Pulsar 是否成功启动?
外部服务运行正常吗?
nar 包是否完整?
连接器配置文件正确吗?
在本地运行模式下,运行连接器并检查控制台输出的信息(连接器日志)。
在集群模式中:
使用
get
命令获取基本信息。使用
status
命令获取当前状态。使用
topics stats
命令获取特定主题及其关联的生产者和消费者的统计信息。检查连接器日志。