- 从一个或多个 Pulsar topic 中消费消息;
- 将用户提供的处理逻辑应用于每条消息;
- 将运行结果发布到另一个 topic。
Pulsar Functions背后的核心目标是使您能够轻松创建各种级别的复杂的的处理逻辑,而无需部署单独的类似系统(例如 , Apache Heron, , 等等) Pulsar Functions are computing infrastructure of Pulsar messaging system. The core goal is tied to a series of other goals:
- Developer productivity (language-native vs Pulsar Functions SDK functions)
- 简单的故障排查
- 操作简单(不需要外部处理系统)
Inspirations
Pulsar Functions are inspired by (and take cues from) several systems and paradigms:
- “无服务器(Serverless)”和“Function as a Service”(FaaS)云平台,如:、Google Cloud Functions、 等
Pulsar Functions can be described as
- Lambda 样式的 functions
- specifically designed to use Pulsar as a message bus.
Pulsar Functions provide a wide range of functionality, and the core programming model is simple. Functions receive messages from one or more input . Each time a message is received, the function will complete the following tasks.
- 将某些处理逻辑应用到输入并写入到输出:
- An output topic in Pulsar
- Apache BookKeeper
- Write logs to a log topic (potentially for debugging purposes)
- 增量
You can use Pulsar Functions to set up the following processing chain:
- A Python function listens for the topic and “sanitizes” incoming strings (removing extraneous whitespace and converting all characters to lowercase) and then publishes the results to a
sanitized-sentences
topic. - A Java function listens for the
sanitized-sentences
topic, counts the number of times each word appears within a specified time window, and publishes the results to aresults
topic - Finally, a Python function listens for the
results
topic and writes the results to a MySQL table.
If you implement the classic word count example using Pulsar Functions, it looks something like this:
To write the function in Java with Pulsar Functions SDK for Java, you can write the function as follows.
Bundle and build the JAR file to be deployed, and then deploy it in your Pulsar cluster using the as follows.
$ bin/pulsar-admin functions create \
--jar target/my-jar-with-dependencies.jar \
--classname org.example.functions.WordCountFunction \
--tenant public \
--namespace default \
--output persistent://public/default/count
Pulsar Functions are used in many cases. The following is a sophisticated example that involves content-based routing.
For example, a function takes items (strings) as input and publishes them to either a fruits
or vegetables
topic, depending on the item. Or, if an item is neither fruit nor vegetable, a warning is logged to a log topic. The following is a visual representation.
If you implement this routing functionality in Python, it looks something like this:
If this code is stored in ~/router.py
, then you can deploy it in your Pulsar cluster using the as follows.
$ bin/pulsar-admin functions create \
--py ~/router.py \
--classname router.RoutingFunction \
--tenant public \
--namespace default \
--name route-fruit-veg \
Fully Qualified Function Name (FQFN)
Each Pulsar Function has a Fully Qualified Function Name (FQFN) that consists of three elements: the function tenant, namespace, and function name. FQFN looks like this:
FQFNs enable you to create multiple functions with the same name provided that they are in different namespaces.
Currently, you can write Pulsar Functions in Java, Python, and Go. For details, refer to .
Processing guarantees
Pulsar Functions provide three different messaging semantics that you can apply to any function.
创建 Function 时,您可以为 Pulsar Function 设置 processing guarantees 。 The following command creates a function with effectively-once guarantees applied.
$ bin/pulsar-admin functions create \
--name my-effectively-once-function \
--processing-guarantees EFFECTIVELY_ONCE \
# Other function configs
The available options for --processing-guarantees
are:
ATMOST_ONCE
EFFECTIVELY_ONCE