[Doc] Add contents for *Get Started (Schema)* (apache#4859)
Anonymitaet authored and sijie committed Aug 2, 2019
1 parent e7195eb commit 9c68f19
Showing 1 changed file with 29 additions and 3 deletions.
32 changes: 29 additions & 3 deletions site2/docs/schema-get-started.md
@@ -4,6 +4,32 @@ title: Get started
sidebar_label: Get started
---

## Schema Registry

Type safety is extremely important in any application built around a message bus like Pulsar.

Producers and consumers need some mechanism for coordinating types at the topic level to avoid potential problems such as serialization and deserialization issues.

Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.

### Client-side approach

Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics.

If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.

Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
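To see why this out-of-band agreement matters, consider the following plain-JDK Java sketch. It uses no Pulsar API, and `RawBytesDemo` and its methods are hypothetical names. A producer encodes a temperature reading as 8 raw bytes; a consumer that wrongly assumes a moisture reading stored as a `long` decodes the same bytes without any error and simply gets a meaningless number.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers illustrating the temperature/moisture mismatch.
// The payload is just bytes: nothing in it records which type it holds.
final class RawBytesDemo {

    // Producer side: encode a temperature (degrees Celsius) as 8 big-endian bytes.
    static byte[] encodeTemperature(double celsius) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(celsius).array();
    }

    // Consumer side, correct assumption: read the 8 bytes back as a double.
    static double decodeTemperature(byte[] payload) {
        return ByteBuffer.wrap(payload).getDouble();
    }

    // Consumer side, wrong assumption: read the same 8 bytes as a moisture
    // value stored as a long. This does not throw; it returns the raw bit
    // pattern of the double, which is meaningless as a moisture reading.
    static long decodeMoistureAsLong(byte[] payload) {
        return ByteBuffer.wrap(payload).getLong();
    }
}
```

Because both decoders accept any 8-byte array, only the application's own conventions can catch the mismatch.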

### Server-side approach

Producers and consumers inform the system which data types can be transmitted via the topic.

With this approach, the messaging system enforces type safety and ensures that producers and consumers remain in sync.

Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
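Conceptually, a per-topic registry maps each topic to a schema and rejects producers whose declared schema does not match. The Java sketch below is a toy illustration of that idea under simplifying assumptions: `ToySchemaRegistry` is hypothetical, schemas are plain strings, and "compatible" means exactly equal. It is not how Pulsar implements its registry, which versions schemas rather than requiring an exact match.

```java
import java.util.HashMap;
import java.util.Map;

// Toy, in-memory stand-in for a schema registry (not Pulsar's API).
final class ToySchemaRegistry {
    // topic name -> schema definition (a plain string in this sketch)
    private final Map<String, String> schemasByTopic = new HashMap<>();

    // The first producer on a topic registers its schema; afterwards only
    // producers declaring the same schema are accepted.
    // Returns true if the producer may publish to the topic.
    boolean checkProducer(String topic, String declaredSchema) {
        String existing = schemasByTopic.putIfAbsent(topic, declaredSchema);
        return existing == null || existing.equals(declaredSchema);
    }

    // Returns the schema registered for the topic, or null if none.
    String schemaFor(String topic) {
        return schemasByTopic.get(topic);
    }
}
```

With this check on the server side, a producer that declares a moisture schema on a topic already registered for temperature data is rejected before any bytes reach consumers.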

## Why use schema

Without a schema, Pulsar does not parse data: it takes bytes as input and sends bytes as output. Because data has meaning beyond bytes, your application needs to parse the data itself and might encounter parse exceptions, which mainly occur in the following situations:

* The field does not exist
@@ -27,7 +53,7 @@ public class User {

When constructing a producer for the _User_ class, you can either specify a schema or not, as shown below.

### Without schema

If you construct a producer without specifying a schema, then the producer can only produce messages of type `byte[]`. If you have a POJO class, you need to serialize the POJO into bytes before sending messages.

@@ -41,7 +67,7 @@ User user = new User("Tom", 28);
byte[] message = …; // serialize `user` into bytes yourself
producer.send(message);
```
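The `…` placeholder above is left to your application. One possible hand-written serialization, using only JDK streams, is sketched below. The `User` fields (`name`, `age`) are assumed from the `new User("Tom", 28)` call, since the full class definition is elided, and `UserCodec` is a hypothetical helper, not part of Pulsar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Assumed shape of the User POJO: a name and an age.
final class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hypothetical hand-written codec. Producers and all consumers must agree
// on this exact byte layout out of band.
final class UserCodec {
    static byte[] serialize(User user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(user.name); // 2-byte length prefix + modified UTF-8
            out.writeInt(user.age);  // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static User deserialize(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            String name = in.readUTF(); // fields must be read in the same order
            int age = in.readInt();
            return new User(name, age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a codec like this, `byte[] message = UserCodec.serialize(user);` fills the placeholder, and every consumer must call a matching `deserialize`; if either side changes the field order or types, the other side breaks.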
### With schema

If you construct a producer with a schema, you can send instances of a class to a topic directly without worrying about how to serialize POJOs into bytes.

@@ -57,6 +83,6 @@ User user = new User("Tom", 28);
producer.send(user);
```
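To illustrate how a schema can take over serialization, here is a hypothetical Java sketch with no Pulsar dependency: a `SimpleSchema<T>` interface converts between a type and bytes, and a `TypedProducer<T>` calls it inside `send`. In the real Java client, built-in schemas such as `Schema.JSON(User.class)` play this role when passed to `client.newProducer(...)`; the classes below only mimic the idea, and the `User` fields and the `name,age` text encoding are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a schema: converts between T and bytes.
interface SimpleSchema<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
}

// Assumed shape of the User POJO (its full definition is elided above).
final class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Naive text encoding "name,age". Splitting on the LAST comma keeps names
// that themselves contain commas intact.
final class UserSchema implements SimpleSchema<User> {
    @Override
    public byte[] encode(User user) {
        return (user.name + "," + user.age).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public User decode(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        int comma = text.lastIndexOf(',');
        return new User(text.substring(0, comma), Integer.parseInt(text.substring(comma + 1)));
    }
}

// Hypothetical typed producer: callers pass a User, and the schema turns it
// into bytes, so application code never serializes by hand.
final class TypedProducer<T> {
    private final SimpleSchema<T> schema;
    final List<byte[]> sent = new ArrayList<>(); // stands in for the topic

    TypedProducer(SimpleSchema<T> schema) {
        this.schema = schema;
    }

    void send(T value) {
        sent.add(schema.encode(value));
    }
}
```

For example, `new TypedProducer<>(new UserSchema()).send(new User("Tom", 28))` stores the bytes of `Tom,28`, and a consumer holding the same schema gets a `User` back from `decode` without writing any parsing code.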

### Summary

When you construct a producer with a schema, you do not need to serialize messages into bytes; instead, Pulsar schema does this job in the background.
