Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-3662] Generate wrong SQL when plan contains Project(Sort(Aggregate)) and aggregate field has no alias #1715

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

helloppx
Copy link
Contributor

@helloppx helloppx commented Jan 2, 2020

…regate)) and aggregate field has no alias (Lei Jiang)
+ " ORDER BY SAL) AS t5\n"
+ " GROUP BY 'all') AS t8) AS $cor5,\n"
+ " GROUP BY 'all'\n"
+ " ORDER BY SAL) AS t7) AS t8) AS $cor5,\n"
+ " LATERAL UNNEST (SELECT $cor5.X AS $f0\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khaitranq @ihuzenko @danny0405 @vlsi
Could you help me review this test case ?

Copy link
Contributor

@jinxing64 jinxing64 Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently select list is given as '*' when wrapSelect [1], however it's seems not right for this case.
Shall we set the correct alias for such case ?
I think manipulate the internals of builder is not a good idea.

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L437

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background info:
Because SELECT is less than FETCH[1], when RelToSqlConverter visits Project, it will wrap Sort(Aggregate) as a subselect. When aggregate field has no alias, the select list of Project will be wrong.

Hi Jin Xing, I referenced method visitAggregate[2], it meets the same problem.

[1]
//Clauses in a SQL query. Ordered by evaluation order.
//SELECT is set only when there is a NON-TRIVIAL SELECT clause
public enum Clause {
FROM, WHERE, GROUP_BY, HAVING, SELECT, SET_OP, ORDER_BY, FETCH, OFFSET
}
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation @helloppx
Yes, thee reference is indeed a fix for this case. But I still think a thorough fix is to set the correct alias. I take below case as an example:
If I have sql and corresponding plan as below:

select "product_class_id" + 1, *
 from
(
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) A
   union all
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) B
) C
LogicalProject(EXPR$0=[+($0, 1)], product_class_id=[$0], EXPR$1=[$1])
  LogicalUnion(all=[true])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])

RelToSqlConverter will convert the plan to below sql (with the fix bring by this PR):

SELECT "product_class_id" + 1, "product_class_id", SUM("product_id")
FROM (SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY
UNION ALL
SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY) AS "t5"

Obviously it's not correct (the outmost SUM("product_id") is wrong).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rectify the alias for subquery, above case I mentioned should work well.

@jinxing64
Copy link
Contributor

How about add a test in RelToSqlConverterTest ?

@helloppx
Copy link
Contributor Author

helloppx commented Jan 3, 2020

How about add a test in RelToSqlConverterTest ?

I did some research. If we want to reproduce this JIRA case, we must use SqlToRelConverter to convert SQL to REL and then use RelToSqlConverter to generate SQL. If we move this case to RelToSqlConverterTest, It is not easy to use SqlToRelTestBase#createTester()#convertSqlToRel(sql) method for this case.

…regate)) and aggregate field has no alias (Lei Jiang)
node = as(node, name);
} else {
ordinalMap.put(lowerName, node);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jinxing64 , I added a test case you mentioned: testSelectImplicitAliasByStar
Above change(5 lines) can fix that issue, but I don't know whether it is a good idea, since we have to update select-list of 12 test cases.

Failed tests
Classes
RelToSqlConverterTest. testDb2DialectSelectQueryComplex()
RelToSqlConverterTest. testExceptOperatorForBigQuery()
RelToSqlConverterTest. testExistsWithExpand()
RelToSqlConverterTest. testIntersectOperatorForBigQuery()
RelToSqlConverterTest. testNotExistsWithExpand()
RelToSqlConverterTest. testSelectQueryComplex()
RelToSqlConverterTest. testSelectQueryWithGroupByHaving3()
RelToSqlConverterTest. testUncollectImplicitAlias()
RelToSqlConverterTest. testUnionAllOperatorForBigQuery()
RelToSqlConverterTest. testUnionAllWithNoOperands()
RelToSqlConverterTest. testUnionAllWithNoOperandsUsingOracleDialect()
RelToSqlConverterTest. testUnionOperatorForBigQuery()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What isordinalMap used for ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @helloppx ~
Thanks a lot for digging into this ~
Are you sure all the failed tests still generate the correct sql by semantics ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that 12 cases are semantically correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ordinalMap is a "EXPR$x" -> "EXPRESSION" store: EXPR$1 -> product_id + 1
Subquery save the kv to ordinalMap and then outer query call AliasContext#field() to fill select-list by ordinalMap.

node = as(node, name);
} else {
ordinalMap.put(lowerName, node);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What isordinalMap used for ?

@@ -919,7 +924,11 @@ public Result visit(TableFunctionScan e) {
final String lowerName = name.toLowerCase(Locale.ROOT);
if (lowerName.startsWith("expr$")) {
// Put it in ordinalMap
ordinalMap.put(lowerName, node);
if (stack.size() != 1) {
node = as(node, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this as that we don't need to add an alias for the outmost node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic is correct, but, can we have a more simple solution for the SQL, the RelToSqlConverterTest has many failing test cases(some of which is correct to fix but most are unnecessary change)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Danny, +1

node = as(node, name);
} else {
ordinalMap.put(lowerName, node);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @helloppx ~
Thanks a lot for digging into this ~
Are you sure all the failed tests still generate the correct sql by semantics ?

"SELECT \"product_id\", COALESCE(SUM(\"store_sales\"), 0) AS \"EXPR$1\"\n" +
"FROM \"foodmart\".\"sales_fact_1997\"\n" +
"GROUP BY \"product_id\") AS \"t1\"");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another simpler unit test as below

select "shelf_width", * from
(select "shelf_width", "product_id" + 1 from "product"
union all
select "shelf_width", "product_id" from "product") A

But it converted as

SELECT "shelf_width", "shelf_width" AS "shelf_width0", "product_id" + 1
FROM (SELECT "shelf_width", "product_id" + 1
FROM "foodmart"."product"
UNION ALL
SELECT "shelf_width", "product_id"
FROM "foodmart"."product") AS "t1"

As we can see, above conversion is not correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested that case and it seems right:

SELECT "shelf_width", "shelf_width" AS "shelf_width0", "EXPR$1"
FROM (SELECT "shelf_width", "product_id" + 1 AS "EXPR$1"
FROM "foodmart"."product"
UNION ALL 
SELECT "shelf_width", "product_id"
FROM "foodmart"."product") AS "t1"

if (e.getInput() instanceof Sort) {
builder = x.builder(e);
builder.clauses.add(Clause.SELECT);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder.clauses.add(Clause.SELECT) is useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, i believe that code snippet in visit(Sort) can be removed:

if (stack.size() != 1 && builder.select.getSelectList() == null) {
      // Generates explicit column names instead of start(*) for
      // non-root order by to avoid ambiguity.
      final List<SqlNode> selectList = Expressions.list();
      for (RelDataTypeField field : e.getRowType().getFieldList()) {
        addSelect(selectList, builder.context.field(field.getIndex()), e.getRowType());
      }
      builder.select.setSelectList(new SqlNodeList(selectList, POS));
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think below fix snippet here is not correct.

    if (e.getInput() instanceof Sort) {
      builder = x.builder(e);
      builder.clauses.add(Clause.SELECT);
    } else {
      builder = x.builder(e, Clause.SELECT);
    }

Above change apply the projects on the select list from Sort. If Sort uses ordinal in ORDER BY clause, the semantics can be wrong.
Check below case

select "p" * (-1) from
 (select "product_id" "p", sum("net_weight") "product_id"
 from "product"
 group by "product_id" order by 1 limit 10) A

It will be converted as

SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY 1
FETCH NEXT 10 ROWS ONLY

The order of the result is obviously inconsistent with the original Sql.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fix the alias, I think we don't need the change here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to replace the order by ordinal with order by alias to solve this problem completely. Not all the DBs supports order by ordinal, they all support order by alias.

Copy link
Contributor Author

@helloppx helloppx Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @jinxing64 , @danny0405
I did some test, it will cause another issue[1] to replace order by ordinal with order by alias,
so I deleted the code snippet[2]

[1]

//SqlImplementor#Result
//replace `order by ordinal` with `order by alias`
@Override public SqlNode orderField(int ordinal) {
  final SqlNode node = field(ordinal);
  if (node instanceof SqlIdentifier && ((SqlIdentifier) node).isSimple()) {
     Result result = Result.this;
     final String alias = SqlValidatorUtil.getAlias(selectList.get(ordinal), -1);
     return new SqlIdentifier(ImmutableList.of(alias), POS);
   }   
   return node;
}   

//The above method and [2] will produce wrong SQL
SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY \"p\"
FETCH NEXT 10 ROWS ONLY


[2]

if (e.getInput() instanceof Sort) {
  builder = x.builder(e);
} else {
  builder = x.builder(e, Clause.SELECT);
}

@@ -919,7 +924,11 @@ public Result visit(TableFunctionScan e) {
final String lowerName = name.toLowerCase(Locale.ROOT);
if (lowerName.startsWith("expr$")) {
// Put it in ordinalMap
ordinalMap.put(lowerName, node);
if (stack.size() != 1) {
node = as(node, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic is correct, but, can we have a more simple solution for the SQL, the RelToSqlConverterTest has many failing test cases(some of which is correct to fix but most are unnecessary change)

.query("select sum(sal)\n"
+ "from scott.emp \n"
+ "group by ename limit 10")
.planHasSql("SELECT SUM(\"SAL\")\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add the tests in RelToSqlConverterTest.

@yanlin-Lynn
Copy link
Contributor

I add a test case in RelToSqlConverterTest, it seems to produce a correct sql

@Test public void testGroupByWithLimitPlan() {
    String query = "SELECT sum(\"salary\") FROM \"employee\" group by \"gender\" limit 10";
    sql(query)
        .ok("SELECT SUM(\"salary\")\n"
            + "FROM \"foodmart\".\"employee\"\n"
            + "GROUP BY \"gender\"\n"
            + "FETCH NEXT 10 ROWS ONLY");
  }

…regate)) and aggregate field has no alias (Lei Jiang)
@helloppx
Copy link
Contributor Author

helloppx commented Jan 8, 2020

I add a test case in RelToSqlConverterTest, it seems to produce a correct sql

@Test public void testGroupByWithLimitPlan() {
    String query = "SELECT sum(\"salary\") FROM \"employee\" group by \"gender\" limit 10";
    sql(query)
        .ok("SELECT SUM(\"salary\")\n"
            + "FROM \"foodmart\".\"employee\"\n"
            + "GROUP BY \"gender\"\n"
            + "FETCH NEXT 10 ROWS ONLY");
  }

Hi @yanlin-Lynn @danny0405
If we want to reproduce this JIRA case we need to make sure Sort is input of Project [1].
If we move the case to RelToSqlConverterTest, it will produce a plan [2] so we cannot reproduce the JIRA case.

[1]

JdbcToEnumerableConverter
  JdbcProject(EXPR$0=[$1])
    JdbcSort(fetch=[10])
      JdbcAggregate(group=[{1}], EXPR$0=[SUM($5)])
        JdbcTableScan(table=[[SCOTT, EMP]])

[2]

LogicalSort(fetch=[10])
  LogicalProject(EXPR$0=[$1])
    LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)])
      LogicalProject(brand_name=[$2], net_weight=[$7])
        JdbcTableScan(table=[[foodmart, product]])

@yanlin-Lynn
Copy link
Contributor

Currently, I'm using RelToSqlConverter in my project, so I spent some time on this.
I tried to fix this in buildAggregate, and it's affected 5 test cases in RelToSqlConverterTest.
Here is the prototype of my fix, https://github.com/yanlin-Lynn/calcite/tree/CALCITE-3662, hope it's of some use to you.

clone.pop();
final Frame parent = clone.peek();
if (parent != null && parent.r instanceof Union) {
// For case: Project(Union(expression without alias))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only Union, i believe all the SetOp and BiRel can have such a simplification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danny0405 , I have updated the solution and fix the code conflict.

…regate)) and aggregate field has no alias (Lei Jiang)
@danny0405
Copy link
Contributor

Can you fix the code conflict ?

…regate)) and aggregate field has no alias (Lei Jiang)
@yanlin-Lynn
Copy link
Contributor

We can reproduce the case in RelToSqlConverterTest, you can apply SortProjectTransposeRule.INSTANCE rule to make sure Sort transpose Project

…regate)) and aggregate field has no alias (Lei Jiang)
…regate)) and aggregate field has no alias (Lei Jiang)
@helloppx
Copy link
Contributor Author

helloppx commented Jan 9, 2020

Currently, I'm using RelToSqlConverter in my project, so I spent some time on this.
I tried to fix this in buildAggregate, and it's affected 5 test cases in RelToSqlConverterTest.
Here is the prototype of my fix, https://github.com/yanlin-Lynn/calcite/tree/CALCITE-3662, hope it's of some use to you.

Thank you @yanlin-Lynn , I have moved the test case. :D

…regate)) and aggregate field has no alias (Lei Jiang)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants