Get started
Type safety is extremely important in any application built around a message bus like Pulsar.
Producers and consumers need a mechanism for coordinating types at the topic level to avoid problems such as serialization and deserialization failures.
Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you’re free to adopt one or the other or to mix and match on a per-topic basis.
Note
Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also “knowing” which types are being transmitted via which topics.
If a producer is sending temperature sensor data on a given topic, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
In the first approach, producers and consumers send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an “out-of-band” basis.
In the second approach, the messaging system itself enforces type safety and ensures that producers and consumers remain in sync.
Pulsar has a built-in schema registry that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
Why use schema
When a schema is enabled, Pulsar parses the data; otherwise it only takes bytes as input and sends bytes as output. Because data has meaning beyond bytes, the application then has to parse it and might encounter parse exceptions, which mainly occur in the following situations:
The field type has changed.
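A parse exception caused by a changed field type can be sketched with the Java standard library alone. This is a minimal illustration, not Pulsar code: the class name ParseFailureDemo, the parseAge helper, and the payloads are all hypothetical. The consumer assumes the payload is an integer written as UTF-8 text, and fails once the producer starts writing the value differently.

```java
import java.nio.charset.StandardCharsets;

public class ParseFailureDemo {
    // Hypothetical consumer-side parser: assumes the payload is a
    // decimal integer encoded as UTF-8 text.
    static int parseAge(byte[] payload) {
        return Integer.parseInt(new String(payload, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Payload in the format the consumer expects.
        byte[] asNumber = "28".getBytes(StandardCharsets.UTF_8);
        // Payload after the producer changed the field type without telling anyone.
        byte[] asText = "twenty-eight".getBytes(StandardCharsets.UTF_8);

        System.out.println("parsed age: " + parseAge(asNumber));
        try {
            parseAge(asText);
        } catch (NumberFormatException e) {
            // Only the consumer notices the mismatch, and only at read time.
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```

Nothing in the raw bytes tells the consumer that the format changed, which is why the failure surfaces only when parsing is attempted.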
There are a few ways to prevent and overcome these exceptions. For example, you can catch exceptions when parsing fails, which makes code hard to maintain, or you can adopt a schema management system that performs schema evolution without breaking downstream applications and enforces type safety to the maximum extent possible in the language you are using. Pulsar Schema is the latter kind of solution.
Pulsar schema enables you to use language-specific types of data when constructing and handling messages, from simple types to more complex application-specific types.
For example, you can use a User class to define the messages sent to Pulsar topics.
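A User class of this kind might look like the sketch below. The fields name and age are assumptions chosen for illustration; the source does not say what the class contains. A no-argument constructor and accessors are included because POJO-based serializers commonly expect them.

```java
public class User {
    // Hypothetical fields; substitute whatever the application actually needs.
    private String name;
    private int age;

    // No-argument constructor, commonly required by POJO-based serializers.
    public User() {
    }

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
```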
When constructing a producer for the User class, you can either specify a schema or omit it.
If you construct a producer without specifying a schema, the producer can only produce messages as raw bytes. If you have a POJO class, you need to serialize the POJO into bytes before sending messages.
Example
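The serialization burden in the schema-less case can be sketched with the standard library alone. Everything here is an assumption made for illustration: the nested User class, its fields, and the "name,age" wire format are hypothetical, and the producer call appears only in a comment because it needs the Pulsar client library.

```java
import java.nio.charset.StandardCharsets;

public class ManualSerializationExample {
    // Hypothetical POJO used only for this sketch.
    static class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Hand-rolled wire format (an assumption): "name,age" as UTF-8 text.
    static byte[] serialize(User user) {
        return (user.name + "," + user.age).getBytes(StandardCharsets.UTF_8);
    }

    // Every consumer has to know and repeat the same convention.
    static User deserialize(byte[] payload) {
        String text = new String(payload, StandardCharsets.UTF_8);
        int comma = text.lastIndexOf(',');
        return new User(text.substring(0, comma),
                Integer.parseInt(text.substring(comma + 1)));
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new User("Tom", 28));
        // With the Pulsar Java client, a producer created without a schema
        // is a Producer<byte[]>, so this array is what would be passed to
        // producer.send(bytes).
        User roundTrip = deserialize(bytes);
        System.out.println(roundTrip.name + " " + roundTrip.age);
    }
}
```

The format lives only in application code, so producers and consumers must be kept in agreement by hand.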
If you construct a producer and specify a schema, you can send objects of a class to a topic directly without worrying about how to serialize POJOs into bytes.
Example
This example constructs a producer with the JSONSchema, and you can send the User class to topics directly without worrying about how to serialize it into bytes.
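A minimal sketch of such a producer, using the Pulsar Java client, could look like the following. It assumes the client library is on the classpath and a broker is reachable; the service URL pulsar://localhost:6650, the topic name my-topic, and the fields of User are placeholders, not values taken from this document.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.impl.schema.JSONSchema;

public class SchemaProducerExample {
    // Hypothetical POJO; the field names are assumptions for illustration.
    public static class User {
        public String name;
        public int age;

        public User() {
        }

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) throws PulsarClientException {
        // Placeholder service URL and topic; adjust for your deployment.
        try (PulsarClient client = PulsarClient.builder()
                     .serviceUrl("pulsar://localhost:6650")
                     .build();
             Producer<User> producer = client
                     .newProducer(JSONSchema.of(User.class))
                     .topic("my-topic")
                     .create()) {
            // The schema handles serialization, so the object is sent as-is.
            producer.send(new User("Tom", 28));
        }
    }
}
```

The producer is typed as Producer<User>, so the compiler rejects attempts to send any other type on this producer.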