Support flattening and unflattening structured types #79
Conversation
@@ -134,7 +144,14 @@ public RelNode copy(RelTraitSet traitSet, List<RelNode> inputs) {

  @Override
  public void implement(Implementor implementor) throws SQLException {
    implementor.setSink(database, table.getQualifiedName(), table.getRowType(), Collections.emptyMap());
    RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
    RelDataType flattened = DataTypeUtils.flatten(table.getRowType(), typeFactory);
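For illustration, here is a minimal sketch (not code from this PR) of what the new flatten call is expected to produce for a hypothetical nested row type. It assumes DataTypeUtils is on the classpath; its package isn't shown in this diff, so that import is omitted.

// Minimal sketch, not part of the PR: builds a hypothetical nested row type
// and flattens it the way the sink path above does.
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class FlattenSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);

    // Hypothetical schema: FOO ROW(BAR INTEGER, QUX BIGINT), KEY ROW(ID BIGINT)
    RelDataType foo = typeFactory.builder()
        .add("BAR", SqlTypeName.INTEGER)
        .add("QUX", SqlTypeName.BIGINT)
        .build();
    RelDataType key = typeFactory.builder()
        .add("ID", SqlTypeName.BIGINT)
        .build();
    RelDataType rowType = typeFactory.builder()
        .add("FOO", foo)
        .add("KEY", key)
        .build();

    // Per the PR description, nested fields flatten to the $ naming convention,
    // so the expectation here is FOO$BAR, FOO$QUX, KEY$ID as top-level columns.
    RelDataType flattened = DataTypeUtils.flatten(rowType, typeFactory);
    System.out.println(flattened.getFieldNames());
  }
}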
@jogrogan I wonder if this would make KEY ROW(...) possible? If you were to take a Venice key schema and shove it into a nested record named KEY, it would get flattened as KEY$FOO, KEY$BAR automatically. And then in the sink table it would appear as KEY_FOO, KEY_BAR (see comment below).
I wasn't thinking of keys when I wrote this, but it seems like it will help. We could have select key$foo from venice.t1, and the result would be key_foo in the physical Venice sink.
Yea that makes sense, I can play around with it once merged
 * unchanged.
 *
 */
public static RelDataType flatten(RelDataType dataType, RelDataTypeFactory typeFactory) {
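For context, flattening like this is typically a recursion over struct fields that joins nested field names with a separator. The following is an illustrative sketch of that idea only; it is not the DataTypeUtils implementation from this PR, and it omits the array handling (ANY ARRAY) described in the PR description below.

// Illustrative only: one way to flatten a struct type by recursing over fields
// and joining nested names with '$'. Not the actual DataTypeUtils code.
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeField;

public final class FlattenIdea {
  public static RelDataType flatten(RelDataType dataType, RelDataTypeFactory typeFactory) {
    RelDataTypeFactory.Builder builder = typeFactory.builder();
    addFields("", dataType, builder);
    return builder.build();
  }

  private static void addFields(String prefix, RelDataType dataType,
      RelDataTypeFactory.Builder builder) {
    for (RelDataTypeField field : dataType.getFieldList()) {
      String name = prefix.isEmpty() ? field.getName() : prefix + "$" + field.getName();
      if (field.getType().isStruct()) {
        addFields(name, field.getType(), builder);  // recurse into nested records
      } else {
        builder.add(name, field.getType());         // leaf: keep the primitive type
      }
    }
  }
}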
Will we need anything complicated here once we support map types?
LGTM
Summary

- DataTypeUtils with flatten and unflatten.
- hoptimator-utils and hoptimator-avro artifacts.

Details
SQL via JDBC has poor support for structured/nested types. Nested records show up with opaque STRUCTURED types, which have limited utility in the SQL CLI or JDBC driver.

In order to offer improved support for complex records in data sources such as Avro-encoded Kafka topics, this PR introduces support for flattened types. If a JDBC driver so chooses, it may flatten complex records using a FOO$BAR$QUX naming convention. These will automatically get re-structured as FOO Row(BAR Row(QUX ...)), FOO.BAR.QUX, or FOO_BAR_QUX (depending on context) in the output SQL job.

Arrays with nested structs are elided as ANY ARRAY, which is the closest we can get in JDBC. Primitive arrays are supported, e.g. INTEGER ARRAY. This means SQL authors need only deal in primitive types, or arrays of primitive types.
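To make the naming convention concrete, here is a small sketch of the intended round trip. It assumes unflatten mirrors flatten's (RelDataType, RelDataTypeFactory) signature, since only flatten's signature appears in the diff, and it uses a hypothetical nested type; the DataTypeUtils import is again omitted because its package isn't shown here.

// Sketch of the round trip described above; not code from the PR.
// Assumes unflatten(RelDataType, RelDataTypeFactory) mirrors flatten's signature.
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class RoundTripSketch {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);

    // Hypothetical nested type: FOO ROW(BAR ROW(QUX INTEGER))
    RelDataType bar = typeFactory.builder()
        .add("QUX", SqlTypeName.INTEGER)
        .build();
    RelDataType nested = typeFactory.builder()
        .add("FOO", typeFactory.builder().add("BAR", bar).build())
        .build();

    // Flattened view exposed over JDBC: one column, FOO$BAR$QUX INTEGER.
    RelDataType flattened = DataTypeUtils.flatten(nested, typeFactory);

    // Re-structured view for the output SQL job: FOO Row(BAR Row(QUX INTEGER)).
    RelDataType restored = DataTypeUtils.unflatten(flattened, typeFactory);
    System.out.println(restored.getFullTypeString());
  }
}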
N.B. we still support non-flattened data sources, but this requires writing a custom Calcite adapter, for now.
Testing
In addition to new unit tests, the new logic was tested against production Kafka topics with complex Avro schemas. The Kafka JDBC driver was modified to invoke DataTypeUtils.flatten(). (Details elided.)
As expected: