Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-3662] Generate wrong SQL when plan contains Project(Sort(Aggregate)) and aggregate field has no alias #1715

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -227,8 +227,13 @@ public Result visit(Project e) {
if (isStar(e.getChildExps(), e.getInput().getRowType(), e.getRowType())) {
return x;
}
final Builder builder =
x.builder(e, Clause.SELECT);
final Builder builder;
if (e.getInput() instanceof Sort) {
builder = x.builder(e);
builder.clauses.add(Clause.SELECT);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder.clauses.add(Clause.SELECT) is useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, i believe that code snippet in visit(Sort) can be removed:

if (stack.size() != 1 && builder.select.getSelectList() == null) {
      // Generates explicit column names instead of start(*) for
      // non-root order by to avoid ambiguity.
      final List<SqlNode> selectList = Expressions.list();
      for (RelDataTypeField field : e.getRowType().getFieldList()) {
        addSelect(selectList, builder.context.field(field.getIndex()), e.getRowType());
      }
      builder.select.setSelectList(new SqlNodeList(selectList, POS));
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think below fix snippet here is not correct.

    if (e.getInput() instanceof Sort) {
      builder = x.builder(e);
      builder.clauses.add(Clause.SELECT);
    } else {
      builder = x.builder(e, Clause.SELECT);
    }

Above change apply the projects on the select list from Sort. If Sort uses ordinal in ORDER BY clause, the semantics can be wrong.
Check below case

select "p" * (-1) from
 (select "product_id" "p", sum("net_weight") "product_id"
 from "product"
 group by "product_id" order by 1 limit 10) A

It will be converted as

SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY 1
FETCH NEXT 10 ROWS ONLY

The order of the result is obviously inconsistent with the original Sql.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fix the alias, I think we don't need the change here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to replace the order by ordinal with order by alias to solve this problem completely. Not all the DBs supports order by ordinal, they all support order by alias.

Copy link
Contributor Author

@helloppx helloppx Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @jinxing64 , @danny0405
I did some test, it will cause another issue[1] to replace order by ordinal with order by alias,
so I deleted the code snippet[2]

[1]

//SqlImplementor#Result
//replace `order by ordinal` with `order by alias`
@Override public SqlNode orderField(int ordinal) {
  final SqlNode node = field(ordinal);
  if (node instanceof SqlIdentifier && ((SqlIdentifier) node).isSimple()) {
     Result result = Result.this;
     final String alias = SqlValidatorUtil.getAlias(selectList.get(ordinal), -1);
     return new SqlIdentifier(ImmutableList.of(alias), POS);
   }   
   return node;
}   

//The above method and [2] will produce wrong SQL
SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY \"p\"
FETCH NEXT 10 ROWS ONLY


[2]

if (e.getInput() instanceof Sort) {
  builder = x.builder(e);
} else {
  builder = x.builder(e, Clause.SELECT);
}

builder = x.builder(e, Clause.SELECT);
}
final List<SqlNode> selectList = new ArrayList<>();
for (RexNode ref : e.getChildExps()) {
SqlNode sqlExpr = builder.context.toSql(null, ref);
Expand Down
11 changes: 11 additions & 0 deletions core/src/test/java/org/apache/calcite/test/JdbcAdapterTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,17 @@ public class JdbcAdapterTest {
+ "ON \"t\".\"DEPTNO\" = \"t0\".\"DEPTNO\"");
}

@Test public void testGroupByWithLimitPlan() {
CalciteAssert.model(JdbcTest.SCOTT_MODEL)
.query("select sum(sal)\n"
+ "from scott.emp \n"
+ "group by ename limit 10")
.planHasSql("SELECT SUM(\"SAL\")\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add the tests in RelToSqlConverterTest.

+ "FROM \"SCOTT\".\"EMP\"\n"
+ "GROUP BY \"ENAME\"\n"
+ "LIMIT 10");
}

@Test public void testPushDownSort() {
CalciteAssert.model(JdbcTest.SCOTT_MODEL)
.query("select ename \n"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -468,14 +468,14 @@ private Fluent pig(String script) {
+ "HIREDATE, SAL, COMM, DEPTNO)) AS A\n"
+ " FROM scott.EMP\n"
+ " GROUP BY DEPTNO) AS $cor4,\n"
+ " LATERAL (SELECT COLLECT(ROW(ENAME, JOB, DEPTNO, SAL)) AS X\n"
+ " FROM (SELECT ENAME, JOB, DEPTNO, SAL\n"
+ " LATERAL (SELECT X\n"
+ " FROM (SELECT 'all' AS $f0, COLLECT(ROW(ENAME, JOB, DEPTNO, SAL)) AS X\n"
+ " FROM UNNEST (SELECT $cor4.A AS $f0\n"
+ " FROM (VALUES (0)) AS t (ZERO)) "
+ "AS t2 (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)\n"
+ " WHERE JOB <> 'CLERK'\n"
+ " ORDER BY SAL) AS t5\n"
+ " GROUP BY 'all') AS t8) AS $cor5,\n"
+ " GROUP BY 'all'\n"
+ " ORDER BY SAL) AS t7) AS t8) AS $cor5,\n"
+ " LATERAL UNNEST (SELECT $cor5.X AS $f0\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khaitranq @ihuzenko @danny0405 @vlsi
Could you help me review this test case ?

Copy link
Contributor

@jinxing64 jinxing64 Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently select list is given as '*' when wrapSelect [1], however it's seems not right for this case.
Shall we set the correct alias for such case ?
I think manipulate the internals of builder is not a good idea.

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L437

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background info:
Because SELECT is less than FETCH[1], when RelToSqlConverter visits Project, it will wrap Sort(Aggregate) as a subselect. When aggregate field has no alias, the select list of Project will be wrong.

Hi Jin Xing, I referenced method visitAggregate[2], it meets the same problem.

[1]
//Clauses in a SQL query. Ordered by evaluation order.
//SELECT is set only when there is a NON-TRIVIAL SELECT clause
public enum Clause {
FROM, WHERE, GROUP_BY, HAVING, SELECT, SET_OP, ORDER_BY, FETCH, OFFSET
}
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation @helloppx
Yes, thee reference is indeed a fix for this case. But I still think a thorough fix is to set the correct alias. I take below case as an example:
If I have sql and corresponding plan as below:

select "product_class_id" + 1, *
 from
(
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) A
   union all
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) B
) C
LogicalProject(EXPR$0=[+($0, 1)], product_class_id=[$0], EXPR$1=[$1])
  LogicalUnion(all=[true])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])

RelToSqlConverter will convert the plan to below sql (with the fix bring by this PR):

SELECT "product_class_id" + 1, "product_class_id", SUM("product_id")
FROM (SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY
UNION ALL
SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY) AS "t5"

Obviously it's not correct (the outmost SUM("product_id") is wrong).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rectify the alias for subquery, above case I mentioned should work well.

+ " FROM (VALUES (0)) AS t (ZERO)) "
+ "AS t11 (ENAME, JOB, DEPTNO, SAL) AS t110\n"
Expand Down