# Exercises for the Confluent Schema Registry Workshop
If you're reading this, you probably know where to find the repo with the instructions, since this is it! Now that you're here, follow these instructions to get ready for the workshop:
- Install Docker (Mac, Windows) on your system.
- Mac/Windows only: in Docker’s advanced settings, increase the memory dedicated to Docker to at least 4GB.
- Confirm that Docker has at least 4GB of memory available to it (on a Windows system, omit the `| grep Memory` and simply look for "Total Memory" in the output):

  ```
  docker system info | grep Memory
  ```

  This should return a value greater than 4GB; if not, the Kafka stack will probably not work.
- Smoke-test your Docker environment by running:

  ```
  docker run -p 8080:8080 hello-world
  ```

  You should see:

  ```
  $ docker run -p 8080:8080 hello-world
  Unable to find image 'hello-world:latest' locally
  latest: Pulling from library/hello-world
  d1725b59e92d: Pull complete
  Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
  Status: Downloaded newer image for hello-world:latest

  Hello from Docker!
  This message shows that your installation appears to be working correctly.
  [...]
  ```
- Clone this repo by typing

  ```
  git clone https://github.com/confluentinc/schema-registry-workshop
  ```

  from a terminal.
- From the `schema-registry-workshop` directory (which you just cloned), run `docker-compose pull`. This will kick off no small amount of downloading. It is vitally important that you do this before coming to the workshop.
- After `docker-compose pull` is done, run the following:

  ```
  cd data/ratings
  ./gradlew build
  ```

  This will download dependencies for the data generator.
Tools you'll need:

- A text editor
- A means of interacting with a REST API. `curl` works well for this from applicable terminal environments. Various browser plugins can provide a GUI for the same purpose. (Advanced REST Client for Chrome seems to do the trick.)
- Tools like `jq` for manipulating JSON can be very handy, but you can use a text editor if all else fails.
- A Java IDE may come in handy for the Java portions.
In this exercise we'll design an Avro schema, register it in the Confluent Schema Registry, produce and consume events using this schema, and then modify the schema in compatible and incompatible ways.
- Start up the Docker environment by running `docker-compose up -d` from the project root directory.
- Check out the `data/movies-json.js` file. Its schema looks something like this:
  - `movie_id` : int
  - `title` : string
  - `release_year` : int
  - `genres` : array of strings
  - `actors` : array of strings
  - `directors` : array of strings
  - `composers` : array of strings
  - `screenwriters` : array of strings
  - `production_companies` : array of strings
  - `cinematographer` : string
- Create a minimal `.avsc` definition including only `movie_id`, `title`, and `release_year`.
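  If you want a starting point, a minimal schema along these lines should work (this is the unescaped form of the example shown later in these instructions; the `long` types there are one reasonable choice):

  ```json
  {
    "type": "record",
    "name": "movie",
    "fields": [
      {"name": "movie_id", "type": "long"},
      {"name": "title", "type": "string"},
      {"name": "release_year", "type": "long"}
    ]
  }
  ```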
- Register the schema in the Confluent Schema Registry. Instructions can be found in the Schema Registry documentation. We are registering a schema for values, not keys. We will be producing the records to a topic called `movies-raw`, so we'll register the schema under the subject `movies-raw-value`. It is important to note the details of the Schema Registry API for registering a schema. It says:

  > Request JSON Object: schema – The Avro schema string

  This means we need to pass the API a JSON record with a single key, "schema", whose value is a string containing our schema. We can't pass the bare schema itself when registering it.

  If you have the `jq` tool, you can use it to wrap our Avro schema as follows:

  ```
  jq -n --slurpfile schema movies-raw.avsc '$schema | {schema: tostring}'
  ```

  And then pass the output of `jq` to `curl` with a pipe:

  ```
  jq -n --slurpfile schema movies-raw.avsc '$schema | {schema: tostring}' | curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data @- http://localhost:8081/subjects/movies-raw-value/versions
  ```

  The output should be an ID. Remember the ID you get, so you can use it when producing and consuming events.

  If you can't run `jq`, you can reformat the AVSC file into something that looks like this:

  ```
  {"schema": "{\"type\": \"record\", \"name\": \"movie\", \"fields\" : [{\"name\": \"movie_id\", \"type\": \"long\"},{\"name\": \"title\", \"type\": \"string\"},{\"name\": \"release_year\", \"type\": \"long\"}] }"}
  ```
- Now it is time to produce an event with our schema. We'll use the REST Proxy for that. You can see a few examples of using the REST Proxy in its API documentation: https://docs.confluent.io/current/kafka-rest/docs/api.html#post--topics-(string-topic_name). Note that you don't have to include the entire schema in every single message; since the schema is registered, you can just include its ID. For example, to produce to the movies topic, we can run:

  ```
  curl -X POST -H "Content-Type: application/vnd.kafka.avro.v2+json" -H "Accept: application/vnd.kafka.v2+json" --data '{"value_schema_id": 1, "records": [{"value": {"movie":{"movie_id": 1, "title": "Ready Player One", "release_year":2018}}}]}' http://localhost:8082/topics/movies-raw
  ```

  (If the ID you got when registering the schema wasn't `1`, use that value for `value_schema_id` instead.)
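  If the produce request succeeds, the REST Proxy reports the partition and offset of each record, something like:

  ```json
  {"key_schema_id": null, "value_schema_id": 1, "offsets": [{"partition": 0, "offset": 0, "error_code": null, "error": null}]}
  ```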
- Let's try to consume some messages. If you don't have `curl`, a browser-based REST client will work:

  ```
  curl -H "Accept: application/vnd.kafka.avro.v1+json" "http://localhost:8082/topics/movies-raw/partitions/0/messages?offset=0&count=10"
  ```
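  The response should be a JSON array of the records on the partition, roughly of this shape (the exact form of `value` depends on how the schema was registered):

  ```json
  [{"key": null, "value": {"movie_id": 1, "title": "Ready Player One", "release_year": 2018}, "partition": 0, "offset": 0}]
  ```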
- Schema changes: make some changes to the schema. Try each of these: adding fields, removing fields, and modifying fields. Is the result compatible? Let's check with the Schema Registry:

  ```
  jq -n --slurpfile schema movies-raw-new.avsc '$schema | {schema: tostring}' | curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data @- http://localhost:8081/compatibility/subjects/movies-raw-value/versions/latest
  ```
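  The compatibility endpoint answers with a small JSON document telling you whether the candidate schema is compatible with the registered one:

  ```json
  {"is_compatible": true}
  ```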
- Use the REST Proxy to produce and consume messages with modified schemas, both compatible and incompatible. Document what happens in each case (adding, removing, modifying).
- Using a GET to `/config/movies-raw-value`, report on what the schema compatibility is for the subject. (HINT: it should be `BACKWARD`.) Change it to `FORWARD` and repeat the previous two steps.
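  Concretely, with `curl` the check and the change can look like this (note: if no subject-level compatibility has been set yet, the subject GET may return an error, and the global default is visible at `/config` instead):

  ```
  curl http://localhost:8081/config/movies-raw-value
  curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"compatibility": "FORWARD"}' http://localhost:8081/config/movies-raw-value
  ```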
In this exercise, we will repeat what we just did with REST, but in Java.
- Drop the previous environment by running `docker-compose down`, then `docker-compose up -d` again.
- Open up the Java project in `data/ratings`. It has a Gradle build that you can run from the command line. If you have a Java IDE, you can generate project files like this:
  - For Eclipse: `gradlew eclipse`
  - For IntelliJ IDEA: `gradlew idea`

  You should now be able to open the project in the IDE of your choice.
- The file `src/main/avro/movie.avsc` exists, but is not terribly interesting. Populate it with a minimal schema including `movie_id`, `title`, and `release_year`. NOTE: every time you change the AVSC files, you must run `gradlew generateAvroJava` from the command line to regenerate the Java classes. You can see the generated code in `build/generated-main-avro-java`.
- `src/main/java/io/confluent/demo/AvroMovieLoader.java` is a partially functioning Kafka producer that loads and parses data from a text file and produces Avro Movie objects to a Kafka topic. Change this class in the following ways (a sketch of the parsing step follows this list):
  - Produce to a topic called `raw-movies`
  - Load the fields (`movie_id`, `title`, and `release_year`) in the `parseMovie()` method
  - Produce only a single line from the text file, not the whole file (for now)
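  Here is a rough sketch of `parseMovie()`, assuming the raw file is tab-separated with `movie_id`, `title`, and `release_year` in the first three columns (check the actual file layout), and that the `Movie` class generated from `movie.avsc` exposes builder setters named after the schema fields:

  ```java
  // Sketch only: the column order and long types are assumptions based on
  // the minimal schema above; adjust to the real data file and generated API.
  private static Movie parseMovie(String line) {
    String[] fields = line.split("\\t");
    return Movie.newBuilder()
        .setMovieId(Long.parseLong(fields[0]))
        .setTitle(fields[1])
        .setReleaseYear(Long.parseLong(fields[2]))
        .build();
  }
  ```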
- `src/main/java/io/confluent/demo/DemoConsumer.java` is a partially functioning Kafka consumer. Modify it to consume Movie objects and print out their contents. Run this consumer to see the movies you have produced.
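  The core of the change is a poll loop over `Movie` records. A minimal sketch, assuming Kafka 2.x clients; the class name, configs (hosts, group id), and the `Long` key type are illustrative assumptions, not the workshop's actual `DemoConsumer` code:

  ```java
  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  // Assumes this class lives alongside the generated Movie class
  // (e.g. in io.confluent.demo).
  public class MoviePrinter {
    public static void main(String[] args) {
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("group.id", "movie-printer");
      props.put("key.deserializer", "org.apache.kafka.common.serialization.LongDeserializer");
      props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
      props.put("schema.registry.url", "http://localhost:8081");
      props.put("specific.avro.reader", "true"); // deserialize into the generated Movie class
      props.put("auto.offset.reset", "earliest");

      try (KafkaConsumer<Long, Movie> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singletonList("raw-movies"));
        while (true) {
          for (ConsumerRecord<Long, Movie> record : consumer.poll(Duration.ofMillis(500))) {
            System.out.println(record.value()); // generated Avro classes print their fields
          }
        }
      }
    }
  }
  ```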
- Add the `directors` field to the `movie.avsc` file, rebuild the schema, and modify the `DemoProducer` and `DemoConsumer` to accommodate the new field. Can you still produce and consume? Try adding an `int` field to `movie.avsc`.
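  For reference, under `BACKWARD` compatibility the usual way to add a field compatibly is to give it a default value, so consumers on the new schema can still read old records. One plausible shape for the new field (assuming an array of director names, as in `movies-json.js`):

  ```json
  {"name": "directors", "type": {"type": "array", "items": "string"}, "default": []}
  ```

  Adding the same field without a `default` should fail the compatibility check.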
- Change the compatibility mode of the value subject for `raw-movies` (i.e. `raw-movies-value`) to `FORWARD` and remove the fields added in the previous two steps, one at a time. Note the results.