Skip to content

Commit

Permalink
[SPARK-47904][SQL] Preserve case in Avro schema when using enableStab…
Browse files Browse the repository at this point in the history
…leIdentifiersForUnionType

### What changes were proposed in this pull request?

When `enableStableIdentifiersForUnionType` is enabled, all of the types are lowercased which creates a problem when field types are case-sensitive:

Union type with fields:
```
Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new Schema.Field("F", Schema.create(Type.FLOAT))).asJava)
```

would become

```
struct<member_myenum: string, member_myrecord2: struct<f: float>>
```

but instead should be
```
struct<member_myENUM: string, member_myRecord2: struct<F: float>>
```

### Why are the changes needed?

Fixes a bug of lowercasing the field name (the type portion).

### Does this PR introduce _any_ user-facing change?

Yes, if a user enables `enableStableIdentifiersForUnionType` and has Union types, all fields will preserve the case. Previously, the field names would be all in lowercase.

### How was this patch tested?

I added a test case to verify the new field names.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46126 from sadikovi/SPARK-47904.

Authored-by: Ivan Sadikov <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
sadikovi authored and dongjoon-hyun committed Apr 22, 2024
1 parent f2d0cf2 commit fc0c855
Show file tree
Hide file tree
Showing 2 changed files with 32 additions and 7 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -208,14 +208,12 @@ object SchemaConverters {
// could be "a" and "A" and we need to distinguish them. In this case, we throw
// an exception.
// Stable id prefix can be empty so the name of the field can be just the type.
val tempFieldName =
s"${stableIdPrefixForUnionType}${s.getName.toLowerCase(Locale.ROOT)}"
if (fieldNameSet.contains(tempFieldName)) {
val tempFieldName = s"${stableIdPrefixForUnionType}${s.getName}"
if (!fieldNameSet.add(tempFieldName.toLowerCase(Locale.ROOT))) {
throw new IncompatibleSchemaException(
"Cannot generate stable indentifier for Avro union type due to name " +
"Cannot generate stable identifier for Avro union type due to name " +
s"conflict of type name ${s.getName}")
}
fieldNameSet.add(tempFieldName)
tempFieldName
} else {
s"member$i"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -377,7 +377,7 @@ abstract class AvroSuite
"",
Seq())
}
assert(e.getMessage.contains("Cannot generate stable indentifier"))
assert(e.getMessage.contains("Cannot generate stable identifier"))
}
{
val e = intercept[Exception] {
Expand All @@ -388,7 +388,7 @@ abstract class AvroSuite
"",
Seq())
}
assert(e.getMessage.contains("Cannot generate stable indentifier"))
assert(e.getMessage.contains("Cannot generate stable identifier"))
}
// Two array types or two map types are not allowed in union.
{
Expand Down Expand Up @@ -441,6 +441,33 @@ abstract class AvroSuite
}
}

test("SPARK-47904: Test that field name case is preserved") {
checkUnionStableId(
List(
Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
Schema.createRecord("myRecord", "", null, false,
List[Schema.Field](new Schema.Field("f", Schema.createFixed("myField", "", null, 6)))
.asJava),
Schema.createRecord("myRecord2", "", null, false,
List[Schema.Field](new Schema.Field("F", Schema.create(Type.FLOAT)))
.asJava)),
"struct<member_myENUM: string, member_myRecord: struct<f: binary>, " +
"member_myRecord2: struct<F: float>>",
Seq())

{
val e = intercept[Exception] {
checkUnionStableId(
List(
Schema.createRecord("myRecord", "", null, false, List[Schema.Field]().asJava),
Schema.createRecord("myrecord", "", null, false, List[Schema.Field]().asJava)),
"",
Seq())
}
assert(e.getMessage.contains("Cannot generate stable identifier"))
}
}

test("SPARK-46930: Use custom prefix for stable ids when converting Union type") {
// Test default "member_" prefix.
checkUnionStableId(
Expand Down

0 comments on commit fc0c855

Please sign in to comment.