unnamed_type error with primary type schema #95

Open
adjivas opened this issue Jun 7, 2022 · 11 comments · May be fixed by #115
Labels
help wanted Extra attention is needed internal API Anything related to the internal API and implementations

Comments

@adjivas

adjivas commented Jun 7, 2022

Hello,

Is it possible to register a primary type's schema with Avrora?

A primary type schema is, for example:

{
    "type": "string",
    "name": "myvalue"
}

When I try to save it with the register_schema_by_name function:

Avrora.Utils.Registrar.register_schema_by_name("myschema-value")

This error occurs:

13:02:04.414 [debug] reading schema `myschema-value` from the file /app/priv/schemas/myproject/myschema-value.avsc
{:error, :unnamed_type}
@Strech Strech added the question? Questions or ideas about the library label Jun 11, 2022
@Strech
Owner

Strech commented Jun 11, 2022

Hi @adjivas 👋🏼

Initially, Avrora was designed to work with Record schemas, which allow you to nest schemas of complex types. So far there has been no reason to register a primary type (I also can't find that definition in the Avro specs).

That type can probably be registered as part of another Record schema.

@Strech Strech closed this as completed Jun 16, 2022
@adjivas
Author

adjivas commented Aug 12, 2022

Hi @Strech, sorry for the late reply.
You can find an official list and a documented example of Primitive Types in the Avro specs:

{"type": "string"}

Primitive Types exist alongside Complex Types.

@Strech
Owner

Strech commented Sep 6, 2022

Sorry for the long pause. I will take a look, @adjivas, and drop a message about what we can do.

@Strech Strech reopened this Sep 6, 2022
@Strech Strech added the help wanted Extra attention is needed label Sep 6, 2022
@Strech
Owner

Strech commented Sep 13, 2022

I've re-read the documentation (and also checked the erlavro source code). The specification says (https://avro.apache.org/docs/1.11.1/specification/#primitive-types):

Primitive types have no specified attributes.
Primitive type names are also defined type names. Thus, for example, the schema “string” is equivalent to: {"type": "string"}

Then I checked erlavro and its parsing mechanism:

iex(1)> :avro_json_decoder.decode_schema(~s({"type":"string","name":"MyString"}), allow_bad_references: true)   
{:avro_primitive_type, "string", []}

The result is exactly as stated in the specification: primitive types can't have any attributes, so none appear in the output.

Here is another answer on that topic: https://stackoverflow.com/questions/66210730/aliases-for-primitive-types-in-avro

TL;DR: you can't reference a primitive type by an alias (or a new name per se).

If you have further questions, feel free to drop them here.

@Strech Strech closed this as completed Sep 13, 2022
@rewritten
Contributor

It is true that a primitive type cannot have a name, but unnamed types can be used in registries; the clearest example is a union (see https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html#multiple-event-types-in-the-same-topic).

If a registry has a schema that, instead of being a record, is a union (that is, in JSON, an array of other types, typically records), and Avrora is asked to decode a message marked with that id, then it will fail to transform the downloaded JSON into a valid schema to decode, instead returning {:error, :unnamed_type}.

@Strech do you agree that this should be reopened?

I might give it a try in that case.

@Strech
Owner

Strech commented Oct 4, 2023

@rewritten could you please provide an example of the schema you mentioned? Maybe it is indeed an issue. It could also be a case so rare/broken that it makes no sense to put effort into it.

@Strech Strech reopened this Oct 4, 2023
@rewritten
Contributor

rewritten commented Dec 4, 2023

Sorry I did not notice the response (not my main work account).

Another very common case is decoding message keys. Usually they are strings or numbers, but still Avro-encoded in the message payload. In that case the schema will be just "string".

In case of unions, the schema is an array:

[
  {"type": "record", "name": "foo", "fields": [{"name": "foo_field", "type": "string"}]},
  {"type": "record", "name": "bar", "fields": [{"name": "bar_field", "type": "string"}]}
]

Currently, only schemas that are JSON objects with a "name" field at their root are supported (i.e., records, enums, and, surprisingly, fixed types); unions and basic types fail.
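
To make the distinction concrete, here is a minimal sketch in Elixir of how a decoded root schema could be sorted into named and unnamed. The module `RootSchemaKind` and its `classify/1` function are hypothetical names made up for illustration, not part of Avrora or erlavro, and the input is assumed to be JSON that has already been decoded into maps, lists and strings.

```elixir
defmodule RootSchemaKind do
  # Hypothetical helper for illustration only; not an Avrora or erlavro API.

  # Avro's named types: only these carry a meaningful "name" at the root.
  @named_types ~w(record enum fixed)

  # A record, enum or fixed with a string "name" is a named schema.
  def classify(%{"type" => type, "name" => name})
      when type in @named_types and is_binary(name),
      do: {:named, name}

  # A JSON array at the root is a union, which has no name of its own.
  def classify(schema) when is_list(schema), do: :unnamed

  # A bare type name such as "string" is unnamed as well.
  def classify(schema) when is_binary(schema), do: :unnamed

  # Any other object (primitive, array, map). A "name" attribute on a
  # primitive, as in {"type": "string", "name": "myvalue"}, is ignored.
  def classify(%{"type" => _type}), do: :unnamed
end
```

With this, the first record from the union above classifies as `{:named, "foo"}`, while `"string"`, the union itself, and `{"type": "string", "name": "myvalue"}` all come out as `:unnamed`.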

@Strech
Owner

Strech commented Dec 4, 2023

Thanks for the additional details. I still need to wrap my head around it, but it feels like an issue now. It would be cool to have a basic failing example, like: this is the schema, here it is in the file, here is how I encode/decode, here is the error.

Then I can iterate over it until we have a working solution. I will come to this issue right after the issue about decoding logical types.

@Strech Strech added internal API Anything related to the internal API and implementations and removed question? Questions or ideas about the library labels Dec 4, 2023
@rewritten
Contributor

I have wrapped up a stand-alone script that shows the issue. It starts by showing three types of schema and how they all work with simple decoders from :erlavro. Then it starts Avrora and populates the memory registry with the schema for a record, showing that it has the same behavior. Finally, it tries to do the same with a union and with a basic type, without succeeding.

https://gist.github.com/rewritten/2533573332d1de1e4b568def9c757c42

Let me know if I can help more.

@rewritten
Contributor

rewritten commented Dec 5, 2023

I cannot provide an example that includes an actual Confluent registry (for obvious reasons); you will have to trust me on that.

@Strech Strech added this to the Pre-major release 0.99 milestone Apr 9, 2024
@Strech
Owner

Strech commented Apr 18, 2024

Docs:

  1. https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/
  2. https://docs.confluent.io/platform/current/schema-registry/develop/api.html#post--subjects-(string-%20subject)-versions

Examples:

["int", "string"]
{
  "schema": "[\"int\",\"string\"]",
  "schemaType": "AVRO",
  "references": []
}

and with a custom type reference

["int", "io.Payment"]
{
  "schema": "[\"int\",\"io.Payment\"]",
  "schemaType": "AVRO",
  "references": [
    {
      "name": "io.Payment",
      "subject": "io.Payment",
      "version": 1
    }
  ]
}

The reference name (anchor) above is 1:1 with the schema name (also the subject), but it could be different.

These examples require Avrora to:

  1. Register an untyped schema under the name it would have in the schema registry, or generate a random name (check how erlavro handles untyped schemas)
  2. Be able to resolve references differently while parsing/registering schemas (see the new references field in Schema Registry)
  3. Part of the reading resolution already exists, but requires better testing
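
For the second point, here is a rough sketch of how the members of a union might be checked against the references list from the registry payload. `UnionReferences.partition/2` is a hypothetical helper, not an Avrora or Schema Registry API. It assumes that both the union and the references have already been decoded from JSON, and that a reference's "name" is the exact string used inside the union, as in the `io.Payment` example above.

```elixir
defmodule UnionReferences do
  # Hypothetical helper for illustration only; not an Avrora API.

  # Primitive type names from the Avro specification.
  @primitives ~w(null boolean int long float double bytes string)

  # Groups the members of a decoded union into primitives, inline
  # definitions, members covered by an entry in `references`, and
  # members that nothing resolves.
  def partition(union, references) when is_list(union) and is_list(references) do
    known = MapSet.new(references, & &1["name"])

    Enum.group_by(union, fn
      member when member in @primitives -> :primitive
      # An inline definition such as a full record schema.
      %{} -> :inline
      member -> if MapSet.member?(known, member), do: :referenced, else: :unresolved
    end)
  end
end
```

For the union `["int", "io.Payment"]` and the single `io.Payment` reference, this yields `%{primitive: ["int"], referenced: ["io.Payment"]}`. With an empty references list, `"io.Payment"` lands under `:unresolved` instead, which is the case where registration would have to fail or fall back to some other lookup.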

As a bonus, Avrora could address this through controlled registration of the referenced schemas before the top-level schema:

{
 "type": "record",
 "namespace": "io.confluent.examples.avro",
 "name": "AllTypes",
 "fields": [
   {
     "name": "oneof_type",
     "type": [
       "io.confluent.examples.avro.Customer",
       "io.confluent.examples.avro.Product",
       "io.confluent.examples.avro.Order"
     ]
   }
 ]
}

This extra level of indirection allows automatic registration of the top-level Avro schema to work properly. However, unlike Protobuf, with Avro, the referenced schemas still need to be registered manually beforehand, as the Avro object does not have the necessary information to allow referenced schemas to be automatically registered.

@Strech Strech linked a pull request Apr 24, 2024 that will close this issue