Pulsar Functions overview

consume messages from one or more Pulsar topics,
apply a user-supplied processing logic to each message,
publish the results of the computation to another topic.

With Pulsar Functions, you can create complex processing logic without deploying a separate neighboring system (such as Apache Storm, , Apache Flink). Pulsar Functions are computing infrastructure of Pulsar messaging system. The core goal is tied to a series of other goals:

Developer productivity (language-native vs Pulsar Functions SDK functions)
Easy troubleshooting
Operational simplicity (no need for an external processing system)

Inspirations

Pulsar Functions are inspired by (and take cues from) several systems and paradigms:

Stream processing engines such as Apache Storm, , and Apache Flink

Pulsar Functions can be described as

-style functions that are
specifically designed to use Pulsar as a message bus.

Pulsar Functions provide a wide range of functionality, and the core programming model is simple. Functions receive messages from one or more input topics. Each time a message is received, the function will complete the following tasks.

Apply some processing logic to the input and write output to:
- An output topic in Pulsar
Write logs to a log topic (potentially for debugging purposes)
Increment a counter

You can use Pulsar Functions to set up the following processing chain:

A Python function listens for the topic and “sanitizes” incoming strings (removing extraneous whitespace and converting all characters to lowercase) and then publishes the results to a sanitized-sentences topic.
A Java function listens for the sanitized-sentences topic, counts the number of times each word appears within a specified time window, and publishes the results to a results topic
Finally, a Python function listens for the results topic and writes the results to a MySQL table.

If you implement the classic word count example using Pulsar Functions, it looks something like this:

To write the function in Java with , you can write the function as follows.

Bundle and build the JAR file to be deployed, and then deploy it in your Pulsar cluster using the command line as follows.


$ bin/pulsar-admin functions create \
  --jar target/my-jar-with-dependencies.jar \
  --classname org.example.functions.WordCountFunction \
  --tenant public \
  --namespace default \
  --inputs persistent://public/default/sentences \

Pulsar Functions are used in many cases. The following is a sophisticated example that involves content-based routing.

For example, a function takes items (strings) as input and publishes them to either a fruits or vegetables topic, depending on the item. Or, if an item is neither fruit nor vegetable, a warning is logged to a . The following is a visual representation.

If you implement this routing functionality in Python, it looks something like this:

If this code is stored in ~/router.py, then you can deploy it in your Pulsar cluster using the command line as follows.


$ bin/pulsar-admin functions create \
  --py ~/router.py \
  --classname router.RoutingFunction \
  --tenant public \
  --name route-fruit-veg \
  --inputs persistent://public/default/basket-items

SerDe

Fully Qualified Function Name (FQFN)

Each Pulsar Function has a Fully Qualified Function Name (FQFN) that consists of three elements: the function tenant, namespace, and function name. FQFN looks like this:

FQFNs enable you to create multiple functions with the same name provided that they are in different namespaces.

Currently, you can write Pulsar Functions in Java, Python, and Go. For details, refer to Develop Pulsar Functions.

Processing guarantees

Pulsar Functions provide three different messaging semantics that you can apply to any function.

You can set the processing guarantees for a Pulsar Function when you create the Function. The following pulsar-function create command creates a function with effectively-once guarantees applied.


$ bin/pulsar-admin functions create \
  --name my-effectively-once-function \
  --processing-guarantees EFFECTIVELY_ONCE \
  # Other function configs

The available options for --processing-guarantees are:

ATMOST_ONCE
ATLEAST_ONCE

Overview

Pulsar Functions overview

Inspirations

Fully Qualified Function Name (FQFN)

Processing guarantees