[CALCITE-6437]For druid sql JSON_OBJECT() function results in RUNTIME_FAILURE when querying INFORMATION_SCHEMA.COLUMNS #3821

Open
wants to merge 3 commits into
base: main

AlbericByte

@AlbericByte AlbericByte commented Jun 12, 2024

Fix the druid json_object issue.

Description

jira: CALCITE-6437
druid issue: apache/druid#16356

  1. Currently, Druid constructs the SqlFunction instance as follows:
private static final SqlFunction SQL_FUNCTION = OperatorConversions
        .operatorBuilder(FUNCTION_NAME)
        .operandTypeChecker(OperandTypes.variadic(SqlOperandCountRanges.from(1)))
        .operandTypeInference((callBinding, returnType, operandTypes) -> {
          RelDataTypeFactory typeFactory = callBinding.getTypeFactory();
          for (int i = 0; i < operandTypes.length; i++) {
            if (i % 2 == 0) {
              operandTypes[i] = typeFactory.createSqlType(SqlTypeName.VARCHAR);
              continue;
            }
            operandTypes[i] = typeFactory.createTypeWithNullability(
                typeFactory.createSqlType(SqlTypeName.ANY),
                true
            );
          }
        })
        .returnTypeInference(NESTED_RETURN_TYPE_INFERENCE)
        .functionCategory(SqlFunctionCategory.SYSTEM)
        .build();
  2. We then try to look up SqlJsonObjectFunction as follows:
  public @Nullable RexCallImplementor get(final SqlOperator operator) {
    if (operator instanceof SqlUserDefinedFunction) {
      org.apache.calcite.schema.Function udf =
          ((SqlUserDefinedFunction) operator).getFunction();
      if (!(udf instanceof ImplementableFunction)) {
        throw new IllegalStateException("User defined function " + operator
            + " must implement ImplementableFunction");
      }
      CallImplementor implementor =
          ((ImplementableFunction) udf).getImplementor();
      return wrapAsRexCallImplementor(implementor);
    } else if (operator instanceof SqlTypeConstructorFunction) {
      return map.get(SqlStdOperatorTable.ROW);
    }
    return map.get(operator);
  }
  3. Here is the issue: the operator is a plain SqlFunction, but the key in the map is a SqlJsonObjectFunction instance. The types are not equal, so the lookup in the map can never succeed.
  4. So in this PR I try to fix it in SqlToRelConverter,
    where the operator can be overwritten to be SqlJsonObjectFunction.
  5. There is probably another way to fix it:
    add one more constructor to SqlJsonObjectFunction, as follows
public SqlJsonObjectFunction(SqlFunction baseFunction)

and update JsonObjectOperatorConversion in Druid as follows:

SQL_FUNCTION = new SqlJsonObjectFunction(OperatorConversions
        .operatorBuilder(FUNCTION_NAME)
        .operandTypeChecker(OperandTypes.variadic(SqlOperandCountRanges.from(1)))
        .operandTypeInference((callBinding, returnType, operandTypes) -> {
          RelDataTypeFactory typeFactory = callBinding.getTypeFactory();
          for (int i = 0; i < operandTypes.length; i++) {
            if (i % 2 == 0) {
              operandTypes[i] = typeFactory.createSqlType(SqlTypeName.VARCHAR);
              continue;
            }
            operandTypes[i] = typeFactory.createTypeWithNullability(
                typeFactory.createSqlType(SqlTypeName.ANY),
                true
            );
          }
        })
        .returnTypeInference(NESTED_RETURN_TYPE_INFERENCE)
        .functionCategory(SqlFunctionCategory.SYSTEM)
        .build());
  6. I am not sure which approach is better. Could you take a look, or suggest a better solution?
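
A minimal sketch of the lookup mismatch described above. The classes here are hypothetical stand-ins, not Calcite's real SqlFunction, SqlJsonObjectFunction, or RexImpTable, and the sketch assumes that operator equality takes the concrete class into account, as the description suggests:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an implementor table keyed by operator.
class OperatorLookupSketch {
  // Stand-in for a generic SQL function operator.
  static class Operator {
    final String name;

    Operator(String name) {
      this.name = name;
    }

    // Assumption: two operators are equal only if class and name both match.
    @Override public boolean equals(Object o) {
      return o != null
          && o.getClass() == getClass()
          && ((Operator) o).name.equals(name);
    }

    @Override public int hashCode() {
      return name.hashCode();
    }
  }

  // Stand-in for the specialised JSON_OBJECT operator class.
  static class JsonObjectOperator extends Operator {
    JsonObjectOperator() {
      super("JSON_OBJECT");
    }
  }

  // The table only knows the specialised operator.
  static final Map<Operator, String> IMPLEMENTORS = new HashMap<>();

  static {
    IMPLEMENTORS.put(new JsonObjectOperator(), "jsonObjectImplementor");
  }

  static String lookup(Operator operator) {
    return IMPLEMENTORS.get(operator);
  }

  public static void main(String[] args) {
    // Same class and name as the registered key: found.
    System.out.println(lookup(new JsonObjectOperator()));
    // Same name but a different class (the engine's own function): null.
    System.out.println(lookup(new Operator("JSON_OBJECT")));
  }
}
```

In this model the second lookup misses even though the function name is identical, which is the behaviour both proposed fixes try to work around.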

@NobiGo
Contributor

NobiGo commented Jun 13, 2024

@AlbericByte Please create a new issue in JIRA.

@AlbericByte AlbericByte changed the title Fix the druid json_object issue [CALCITE-6437]Fix the druid json_object issue Jun 13, 2024
@AlbericByte
Author

@AlbericByte Please create a new issue in JIRA.

Added the JIRA to the description: CALCITE-6437
Thanks for the notification.

@mihaibudiu
Contributor

In order to make it easier to keep track of the correspondence of issues in JIRA and github the issue title should match exactly the JIRA title.
The commit that fixes the issue should also have the same message.

@mihaibudiu
Contributor

Please add a unit test that fails before the fix.
The file SqlOperatorTest may be the right place for it.

@AlbericByte AlbericByte changed the title [CALCITE-6437]Fix the druid json_object issue [CALCITE-6437]For druid sql JSON_OBJECT() function results in RUNTIME_FAILURE when querying INFORMATION_SCHEMA.COLUMNS Jun 16, 2024
@AlbericByte
Author

In order to make it easier to keep track of the correspondence of issues in JIRA and github the issue title should match exactly the JIRA title. The commit that fixes the issue should also have the same message.

Got it, thanks for the tips

@AlbericByte
Author

AlbericByte commented Jun 23, 2024

Please add a unit test that fails before the fix. The file SqlOperatorTest may be the right place for it.
@mihaibudiu and @NobiGo
I added more test cases. Thanks for the help.

Contributor

@mihaibudiu mihaibudiu left a comment

Besides the comment I left, this looks fine

@@ -3247,7 +3247,7 @@ select json_object('deptno': deptno, 'employees': json_arrayagg(json_object('ena
+-------------------------------------------------------------------------------------------+
(6 rows)

!ok
!ok2
Contributor

this change looks strange

@@ -3777,5 +3777,22 @@ select distinct sum(deptno + '1') as deptsum from dept order by 1;
+---------+
(1 row)

!ok

# Test cases for [CALCITE-6437] For druid sql JSON_OBJECT() function results in RUNTIME_FAILURE when querying
Contributor

This SQL runs successfully without this PR. Given this change, I doubt this is a good way to fix the issue in Calcite.

@kgyrtkirk
Member

The issue arises from the fact that Druid tries to implement JSON_OBJECT with its own SqlFunction.

Comparing against SqlStdFunctions.X forces users of Calcite to use those functions. Instead of resorting to string comparison, I think it would be more straightforward to compare operators based on their SqlKind.

So I think another way to solve this is to introduce SqlKind.JSON_OBJECT (and possibly do that for all SqlStdOperators?).
How does that sound?
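
A rough sketch of what a kind-based lookup could look like. Everything here is hypothetical: the `Kind` enum and `Operator` class only stand in for SqlKind and SqlOperator, and the table is not Calcite's RexImpTable:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model: implementors are resolved by operator kind,
// not by operator instance or class.
class KindLookupSketch {
  // Stand-in for SqlKind with a dedicated JSON_OBJECT entry.
  enum Kind { JSON_OBJECT, OTHER_FUNCTION }

  // Stand-in for an operator; any engine may create its own instances.
  static class Operator {
    final String name;
    final Kind kind;

    Operator(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }
  }

  static final Map<Kind, String> IMPLEMENTORS_BY_KIND = new EnumMap<>(Kind.class);

  static {
    IMPLEMENTORS_BY_KIND.put(Kind.JSON_OBJECT, "jsonObjectImplementor");
  }

  // Any operator that declares kind JSON_OBJECT resolves to the same
  // implementor, regardless of which class or instance it is.
  static String lookup(Operator operator) {
    return IMPLEMENTORS_BY_KIND.get(operator.kind);
  }

  public static void main(String[] args) {
    Operator engineDefined = new Operator("JSON_OBJECT", Kind.JSON_OBJECT);
    Operator unrelated = new Operator("MY_FUNC", Kind.OTHER_FUNCTION);
    System.out.println(lookup(engineDefined));
    System.out.println(lookup(unrelated));
  }
}
```

The trade-off in this model is that every operator needing special handling requires its own kind, since a catch-all kind such as OTHER_FUNCTION cannot distinguish between functions.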

@AlbericByte
Author

AlbericByte commented Jun 26, 2024

@kgyrtkirk

  • I tried this solution: I added SqlKind.JSON_OBJECT and included it in the FUNCTION set as follows:

public static final Set<SqlKind> FUNCTION = EnumSet.of(OTHER_FUNCTION, ROW, TRIM, LTRIM, RTRIM, CAST, REVERSE, JDBC_FN, JSON_OBJECT, JSON_ARRAY, POSITION, CONVERT);

  • But after this modification, the following test case fails:

expr("json_arrayagg(json_array(\"column\") format json)") .ok("JSON_ARRAYAGG(JSON_ARRAY(column ABSENT ON NULL) FORMAT JSON ABSENT ON NULL)");
The expected result will be
"JSON_ARRAYAGG((JSON_ARRAY(column ABSENT ON NULL)) FORMAT JSON ABSENT ON NULL)"

  • I need time to figure out why there is an extra pair of parentheses. I would still like @NobiGo's opinion: is this a good solution or not?

@AlbericByte
Author

AlbericByte commented Jul 3, 2024

@NobiGo @kgyrtkirk @mihaibudiu In Druid, the return type of the json_object function is ExpressionType.NESTED_DATA, but in Calcite the return type of json_object is VARCHAR(2000).

With the current Calcite logic, if the SqlFunction cannot be found in RexImpTable, Calcite throws an exception. In Druid, json_object is already implemented in JsonObjectExprMacro. Is there any API for registering a new SqlJsonObjectFunction with NESTED_DATA in Druid, so that I can register a new SqlJsonObjectFunction with NESTED_DATA output?

I tried to reuse SqlStdOperatorTable.JSON_OBJECT in Druid, and also tried wrapping json_object in a cast or json_query, but it fails because SqlStdOperatorTable.JSON_OBJECT returns VARCHAR, or because cast does not support NESTED_DATA.

I cannot find a better solution outside Calcite. Could you please give me some guidance? Thanks.

@mihaibudiu
Contributor

What is the status of this PR? Is it ready, or is more work needed?

@mihaibudiu
Contributor

There are many functions in Calcite that are "overloaded" and behave differently depending on the SQL dialect. You can define a new function with the same name but a different implementation for Druid.
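
As an illustration only, the idea of per-dialect overloads can be modelled like this. The `Library` enum, the registry, and the string "implementations" are all hypothetical and are not Calcite's actual mechanism for dialect-specific operators:

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: the same function name maps to a different
// implementation depending on the active dialect (library).
class DialectOverloadSketch {
  enum Library { STANDARD, DRUID }

  static final Map<Library, Map<String, String>> TABLE = new EnumMap<>(Library.class);

  static {
    // Same name, different behaviour per dialect.
    register(Library.STANDARD, "JSON_OBJECT", "returns VARCHAR");
    register(Library.DRUID, "JSON_OBJECT", "returns NESTED_DATA");
    register(Library.STANDARD, "UPPER", "returns VARCHAR");
  }

  static void register(Library library, String name, String implementation) {
    TABLE.computeIfAbsent(library, k -> new HashMap<>()).put(name, implementation);
  }

  // Prefer the dialect-specific overload, fall back to the standard one.
  static String resolve(Library library, String name) {
    Map<String, String> dialectFunctions = TABLE.get(library);
    if (dialectFunctions != null && dialectFunctions.containsKey(name)) {
      return dialectFunctions.get(name);
    }
    Map<String, String> standardFunctions = TABLE.get(Library.STANDARD);
    return standardFunctions == null ? null : standardFunctions.get(name);
  }

  public static void main(String[] args) {
    System.out.println(resolve(Library.DRUID, "JSON_OBJECT"));
    System.out.println(resolve(Library.STANDARD, "JSON_OBJECT"));
    System.out.println(resolve(Library.DRUID, "UPPER"));
  }
}
```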


github-actions bot commented Dec 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2024