[Issue 5562][docs]Modify schema compatibility check doc (apache#5757)
Fixes apache#5562 

Update the schema compatibility check documentation to follow the latest changes from apache#5227
congbobo184 authored and sijie committed Dec 3, 2019
1 parent 9ad9535 commit 180e28a
Showing 5 changed files with 238 additions and 30 deletions.
Binary file modified site2/docs/assets/schema-consumer.png
Binary file modified site2/docs/assets/schema-producer.png
171 changes: 170 additions & 1 deletion site2/docs/schema-evolution-compatibility.md
@@ -440,7 +440,7 @@ Disable schema evolution, that is, any schema change is rejected.

</tr>

</table>
</table>

#### Example

@@ -626,6 +626,171 @@ None

In some data formats, for example, Avro, you can define fields with default values. Consequently, adding or removing a field with a default value is a fully compatible change.
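
The effect of default values can be illustrated with a toy model. The sketch below is not Avro's schema resolution; it assumes a schema is simply a mapping from field name to default value, and `read_record`, `OLD_SCHEMA`, and `NEW_SCHEMA` are hypothetical names introduced for illustration.

```python
# Toy model (assumption): a schema is a dict of field name -> default value.
OLD_SCHEMA = {"id": None}
NEW_SCHEMA = {"id": None, "region": "unknown"}  # adds a field with a default


def read_record(written, reader_schema):
    """Project a written record onto the reader's schema.

    Fields missing from the written record take the reader's default;
    fields the reader does not know about are dropped.
    """
    return {
        name: written.get(name, default)
        for name, default in reader_schema.items()
    }


# A new reader can read old data: the missing field takes its default.
old_data_new_reader = read_record({"id": 1}, NEW_SCHEMA)

# An old reader can read new data: the extra field is ignored.
new_data_old_reader = read_record({"id": 2, "region": "eu"}, OLD_SCHEMA)
```

Because both directions succeed in this model, the change behaves as fully compatible.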

## Schema verification

When a producer or a consumer tries to connect to a topic, a broker performs some checks to verify a schema.

### Producer

When a producer tries to connect to a topic (ignoring schema auto creation for now), the broker performs the following checks:

* Check whether the schema carried by the producer exists in the schema registry.

* If the schema is already registered, the producer connects to the broker and produces messages with that schema.

* If the schema is not registered, Pulsar verifies whether the schema is allowed to be registered based on the configured compatibility check strategy.
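
The producer checks above can be sketched as a small decision function. This is an in-memory model, not the Pulsar broker API: `Registry`, `verify_producer`, and the `is_compatible` callback are hypothetical names, and registering the schema once it is allowed is an assumption about what follows a successful check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Registry:
    """Hypothetical stand-in for the schema registry of one topic."""
    schemas: List[str] = field(default_factory=list)


def verify_producer(
    registry: Registry,
    schema: str,
    is_compatible: Callable[[str, List[str]], bool],
) -> str:
    """Return 'connected', 'registered', or 'rejected'."""
    # Step 1: is the carried schema already in the registry?
    if schema in registry.schemas:
        return "connected"
    # Step 2: otherwise the configured strategy decides whether it may be registered.
    if is_compatible(schema, registry.schemas):
        registry.schemas.append(schema)  # assumption: allowed schemas get registered
        return "registered"
    return "rejected"
```

For example, `verify_producer(Registry(["v1"]), "v1", check)` returns `"connected"` without ever calling `check`.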

### Consumer
When a consumer tries to connect to a topic, the broker checks whether the schema carried by the consumer is compatible with the registered schema, based on the configured schema compatibility check strategy.

<table style="table">

<tr>

<th>

Compatibility check strategy

</th>

<th>

Check logic

</th>

</tr>

<tr>

<td>

`ALWAYS_COMPATIBLE`

</td>

<td>

All pass

</td>

</tr>

<tr>

<td>

`ALWAYS_INCOMPATIBLE`

</td>

<td>

No pass

</td>

</tr>

<tr>

<td>

`BACKWARD`

</td>

<td>

Can read the last schema

</td>

</tr>

<tr>

<td>

`BACKWARD_TRANSITIVE`

</td>

<td>

Can read all schemas

</td>

</tr>

<tr>

<td>

`FORWARD`

</td>

<td>

Can read the last schema

</td>

</tr>

<tr>

<td>

`FORWARD_TRANSITIVE`

</td>

<td>

Can read the last schema

</td>

</tr>

<tr>

<td>

`FULL`

</td>

<td>

Can read the last schema

</td>

</tr>

<tr>

<td>

`FULL_TRANSITIVE`

</td>

<td>

Can read all schemas

</td>

</tr>

</table>
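
The table can be read as a lookup from strategy to the set of registered schemas the consumer's schema is checked against. The sketch below mirrors the table exactly as written; `CHECK_SCOPE`, `consumer_check`, and the `can_read` callback are hypothetical names, not Pulsar identifiers.

```python
# Strategy -> check scope, copied from the table above.
CHECK_SCOPE = {
    "ALWAYS_COMPATIBLE": "all_pass",
    "ALWAYS_INCOMPATIBLE": "no_pass",
    "BACKWARD": "last",
    "BACKWARD_TRANSITIVE": "all",
    "FORWARD": "last",
    "FORWARD_TRANSITIVE": "last",
    "FULL": "last",
    "FULL_TRANSITIVE": "all",
}


def consumer_check(strategy, registered, can_read):
    """Decide whether a consumer schema passes.

    registered: registered schemas, oldest first.
    can_read:   callback returning True if the consumer schema can read
                the given registered schema (assumption: supplied by caller).
    """
    scope = CHECK_SCOPE[strategy]
    if scope == "all_pass":
        return True
    if scope == "no_pass":
        return False
    # "last" checks only the most recent schema; "all" checks every one.
    targets = registered[-1:] if scope == "last" else registered
    return all(can_read(schema) for schema in targets)
```

With three registered schemas where only the oldest is unreadable, `BACKWARD` passes while `BACKWARD_TRANSITIVE` fails.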

## Order of upgrading clients

The order of upgrading client applications is determined by the compatibility check strategy.
@@ -781,3 +946,7 @@ Consequently, you can upgrade the producers and consumers in **any order**.
</tr>

</table>




53 changes: 38 additions & 15 deletions site2/docs/schema-manage.md
@@ -36,46 +36,69 @@ For a producer, the `AutoUpdate` happens in the following cases:

* If a **producer carries a schema**:

A broker performs the compatibility check based on the configured compatibility check strategy of the namespace to which the topic belongs.
A broker performs the compatibility check based on the configured compatibility check strategy of the namespace to which the topic belongs.

* If it is a new schema and it passes the compatibility check, the broker registers a new schema automatically for the topic.

* If the schema does not pass the compatibility check, the broker does not register a schema.
* If the schema is registered, a producer is connected to a broker.

* If the schema is not registered:

* If `isAllowAutoUpdateSchema` is set to **false**, the producer is rejected and cannot connect to the broker.

* If `isAllowAutoUpdateSchema` is set to **true**:

* If the schema passes the compatibility check, the broker registers the new schema automatically for the topic and the producer is connected.

* If the schema does not pass the compatibility check, the broker does not register the schema and the producer is rejected.
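
The producer cases with `isAllowAutoUpdateSchema` can be sketched as follows. This models only the decision order described above; `producer_autoupdate` and its parameters are hypothetical names, and the compatibility result is passed in as a boolean rather than computed.

```python
def producer_autoupdate(registered, schema, allow_auto_update, passes_check):
    """Return 'connected' or 'rejected' for a producer that carries a schema.

    registered:        set of schemas already registered for the topic
    allow_auto_update: models the namespace's isAllowAutoUpdateSchema flag
    passes_check:      outcome of the compatibility check (assumed given)
    """
    if schema in registered:
        return "connected"          # known schema, nothing to register
    if not allow_auto_update:
        return "rejected"           # new schema, but auto update is off
    if passes_check:
        registered.add(schema)      # broker registers the new schema
        return "connected"
    return "rejected"               # new schema failed the check
```

Note that the flag is consulted before the compatibility check, so a compatible schema is still rejected when auto update is disabled.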

![AutoUpdate Producer](assets/schema-autoupdate-producer.png)
![AutoUpdate Producer](assets/schema-producer.png)

### AutoUpdate for consumer

For a consumer, the `AutoUpdate` happens in the following cases:

* If a **consumer connects to a topic without a schema** (which means the consumer receiving raw bytes), the consumer can connect to the topic successfully without doing any compatibility check.

* If a **consumer connects to a topic with a schema**:

* If the **topic is idle** (no producers, no entries, no other consumers and no registered schemas), the broker registers a schema for the topic automatically.
* If a **consumer connects to a topic with a schema**:

* If the **topic is not idle**, the broker verifies if the schema provided by the consumer is compatible with the registered schema of the topic.
* If the topic has none of the following (a schema, data, a local consumer, or a local producer):

* If the **schema passes the compatibility check**, the consumer can connect to the topic and receive messages.
* If `isAllowAutoUpdateSchema` is set to **true**, the consumer registers its schema and is connected to the broker.

* If `isAllowAutoUpdateSchema` is set to **false**, the consumer is rejected and cannot connect to the broker.

* If the topic has at least one of them (a schema, data, a local consumer, or a local producer), the schema compatibility check is performed.

* If the **schema does not pass the compatibility check**, the consumer is rejected and disconnected.

![AutoUpdate Producer](assets/schema-autoupdate-consumer.png)
* If the schema passes the compatibility check, the consumer is connected to the broker.

* If the schema does not pass the compatibility check, the consumer is rejected and cannot connect to the broker.
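
The consumer cases can be sketched the same way. The model below assumes the four conditions (a schema, data, a local consumer, a local producer) are independent flags on a topic; `TopicState` and `consumer_autoupdate` are hypothetical names, not Pulsar classes.

```python
from dataclasses import dataclass


@dataclass
class TopicState:
    """Hypothetical snapshot of what a topic currently has."""
    has_schema: bool = False
    has_data: bool = False
    has_local_consumer: bool = False
    has_local_producer: bool = False

    def has_none(self) -> bool:
        return not (self.has_schema or self.has_data
                    or self.has_local_consumer or self.has_local_producer)


def consumer_autoupdate(topic, carries_schema, allow_auto_update, passes_check):
    """Return 'connected' or 'rejected' for a connecting consumer."""
    if not carries_schema:
        return "connected"            # raw bytes: no compatibility check
    if topic.has_none():
        if allow_auto_update:
            topic.has_schema = True   # the consumer's schema is registered
            return "connected"
        return "rejected"
    # The topic has a schema, data, or a local client: run the check.
    return "connected" if passes_check else "rejected"
```

In this model, `isAllowAutoUpdateSchema` only matters when the topic has none of the four items; otherwise the compatibility check alone decides.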

![AutoUpdate Consumer](assets/schema-consumer.png)


### Manage AutoUpdate strategy

You can use the `pulsar-admin` command to manage the `AutoUpdate` strategy as below:

* [Enable AutoUpdate](#enable-autoupdate)

* [Disable AutoUpdate](#disable-autoupdate)

* [Adjust compatibility](#adjust-compatibility)

#### Enable AutoUpdate

To enable `AutoUpdate` on a namespace, you can use the `pulsar-admin` command.

```bash
bin/pulsar-admin namespaces set-is-allow-auto-update-schema --enable tenant/namespace
```

#### Disable AutoUpdate

To disable `AutoUpdate` on a namespace, you can use the `pulsar-admin` command.

```bash
bin/pulsar-admin namespaces set-schema-autoupdate-strategy --disabled tenant/namespace
bin/pulsar-admin namespaces set-is-allow-auto-update-schema --disable tenant/namespace
```

Once the `AutoUpdate` is disabled, you can only register a new schema using the `pulsar-admin` command.
@@ -85,7 +108,7 @@ Once the `AutoUpdate` is disabled, you can only register a new schema using the
To adjust the schema compatibility level on a namespace, you can use the `pulsar-admin` command.

```bash
bin/pulsar-admin namespaces set-schema-autoupdate-strategy --compatibility <compatibility-level> tenant/namespace
bin/pulsar-admin namespaces set-schema-compatibility-strategy --compatibility <compatibility-level> tenant/namespace
```

### Schema validation
44 changes: 30 additions & 14 deletions site2/docs/schema-understand.md
@@ -539,14 +539,26 @@ This diagram illustrates how does schema work on the Producer side.
3. The broker looks up the schema in the schema storage to check if it is already a registered schema.

4. If yes, the broker skips the schema validation since it is a known schema, and returns the schema version to the producer.

5. If no, the broker verifies whether a schema can be automatically created in this namespace:

* If `isAllowAutoUpdateSchema` is set to **true**, a schema can be created, and the broker validates the schema based on the schema compatibility check strategy defined for the topic.

5. If no, the broker validates the schema based on the schema compatibility check strategy defined for the topic.
* If `isAllowAutoUpdateSchema` is set to **false**, a schema cannot be created, and the producer is rejected and cannot connect to the broker.

6. If the schema is compatible, the broker stores it and returns the schema version to the producer.
**Tip**:

`isAllowAutoUpdateSchema` can be set via the **Pulsar admin API** or the **REST API**.

For how to set `isAllowAutoUpdateSchema` via the Pulsar admin API, see [Manage AutoUpdate Strategy](schema-manage.md#manage-autoupdate-strategy).

6. If the schema is allowed to be updated, the compatibility strategy check is performed.

* If the schema is compatible, the broker stores it and returns the schema version to the producer.

All the messages produced by this producer are tagged with the schema version.

7. If the schema is incompatible, the broker rejects it.
* If the schema is incompatible, the broker rejects it.

### Consumer side

@@ -559,17 +571,21 @@ This diagram illustrates how does Schema work on the consumer side.
The schema instance defines the schema that the consumer uses for decoding messages received from a broker.

2. The consumer connects to the broker with the `SchemaInfo` extracted from the passed-in schema instance.

3. The broker looks up the schema in the schema storage to check if it is already a registered schema.

4. If yes, the broker skips the schema validation since it is a known schema, and returns the schema version to the consumer.

5. If no, the broker validates the schema based on the schema compatibility check strategy defined for the topic.

6. If the schema is compatible, the broker stores it and returns the schema version to the consumer.

7. If the schema is incompatible, the consumer will be disconnected.
3. The broker determines whether the topic has any of the following: a schema, data, a local consumer, or a local producer.

4. If the topic has none of them (a schema, data, a local consumer, or a local producer):

* If `isAllowAutoUpdateSchema` is set to **true**, the consumer registers its schema and is connected to the broker.

* If `isAllowAutoUpdateSchema` is set to **false**, the consumer is rejected and cannot connect to the broker.

5. If the topic has at least one of them (a schema, data, a local consumer, or a local producer), the schema compatibility check is performed.

* If the schema passes the compatibility check, the consumer is connected to the broker.

* If the schema does not pass the compatibility check, the consumer is rejected and cannot connect to the broker.

8. The consumer receives the messages from the broker.
6. The consumer receives messages from the broker.

If the schema used by the consumer supports schema versioning (for example, AVRO schema), the consumer fetches the `SchemaInfo` of the version tagged in messages, and use the passed-in schema and the schema tagged in messages to decode the messages.
If the schema used by the consumer supports schema versioning (for example, AVRO schema), the consumer fetches the `SchemaInfo` of the version tagged in messages and uses the passed-in schema and the schema tagged in messages to decode the messages.
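
Decoding with a version tag can be sketched as a lookup followed by a projection. This is a toy model rather than the Pulsar client: `SCHEMA_STORE` stands in for fetching `SchemaInfo` from the broker, a schema is reduced to an ordered list of field names, and `decode` is a hypothetical helper.

```python
# Hypothetical schema storage: version -> ordered field names written by producers.
SCHEMA_STORE = {
    0: ["id"],
    1: ["id", "name"],
}


def decode(message, reader_fields):
    """Decode one message using both the writer's and the reader's schema.

    message:       {"schema_version": int, "values": list}
    reader_fields: field names of the consumer's passed-in schema
    """
    # Look up the schema of the version tagged in the message.
    writer_fields = SCHEMA_STORE[message["schema_version"]]
    # Interpret the payload with the writer's schema...
    record = dict(zip(writer_fields, message["values"]))
    # ...then keep only the fields the consumer's schema asks for.
    return {name: record[name] for name in reader_fields if name in record}
```

The same consumer can therefore read messages written under different schema versions, as long as each tagged version can be looked up.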
