Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-3662] Generate wrong SQL when plan contains Project(Sort(Aggregate)) and aggregate field has no alias #1715

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -227,8 +227,12 @@ public Result visit(Project e) {
if (isStar(e.getChildExps(), e.getInput().getRowType(), e.getRowType())) {
return x;
}
final Builder builder =
x.builder(e, Clause.SELECT);
final Builder builder;
if (e.getInput() instanceof Sort) {
builder = x.builder(e);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder.clauses.add(Clause.SELECT) is useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, i believe that code snippet in visit(Sort) can be removed:

if (stack.size() != 1 && builder.select.getSelectList() == null) {
      // Generates explicit column names instead of start(*) for
      // non-root order by to avoid ambiguity.
      final List<SqlNode> selectList = Expressions.list();
      for (RelDataTypeField field : e.getRowType().getFieldList()) {
        addSelect(selectList, builder.context.field(field.getIndex()), e.getRowType());
      }
      builder.select.setSelectList(new SqlNodeList(selectList, POS));
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think below fix snippet here is not correct.

    if (e.getInput() instanceof Sort) {
      builder = x.builder(e);
      builder.clauses.add(Clause.SELECT);
    } else {
      builder = x.builder(e, Clause.SELECT);
    }

Above change apply the projects on the select list from Sort. If Sort uses ordinal in ORDER BY clause, the semantics can be wrong.
Check below case

select "p" * (-1) from
 (select "product_id" "p", sum("net_weight") "product_id"
 from "product"
 group by "product_id" order by 1 limit 10) A

It will be converted as

SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY 1
FETCH NEXT 10 ROWS ONLY

The order of the result is obviously inconsistent with the original Sql.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fix the alias, I think we don't need the change here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to replace the order by ordinal with order by alias to solve this problem completely. Not all the DBs supports order by ordinal, they all support order by alias.

Copy link
Contributor Author

@helloppx helloppx Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @jinxing64 , @danny0405
I did some test, it will cause another issue[1] to replace order by ordinal with order by alias,
so I deleted the code snippet[2]

[1]

//SqlImplementor#Result
//replace `order by ordinal` with `order by alias`
@Override public SqlNode orderField(int ordinal) {
  final SqlNode node = field(ordinal);
  if (node instanceof SqlIdentifier && ((SqlIdentifier) node).isSimple()) {
     Result result = Result.this;
     final String alias = SqlValidatorUtil.getAlias(selectList.get(ordinal), -1);
     return new SqlIdentifier(ImmutableList.of(alias), POS);
   }   
   return node;
}   

//The above method and [2] will produce wrong SQL
SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY \"p\"
FETCH NEXT 10 ROWS ONLY


[2]

if (e.getInput() instanceof Sort) {
  builder = x.builder(e);
} else {
  builder = x.builder(e, Clause.SELECT);
}

builder = x.builder(e, Clause.SELECT);
}
final List<SqlNode> selectList = new ArrayList<>();
for (RexNode ref : e.getChildExps()) {
SqlNode sqlExpr = builder.context.toSql(null, ref);
Expand Down Expand Up @@ -634,15 +638,6 @@ public Result visit(Sort e) {
}
Result x = visitChild(0, e.getInput());
Builder builder = x.builder(e, Clause.ORDER_BY);
if (stack.size() != 1 && builder.select.getSelectList() == null) {
// Generates explicit column names instead of start(*) for
// non-root order by to avoid ambiguity.
final List<SqlNode> selectList = Expressions.list();
for (RelDataTypeField field : e.getRowType().getFieldList()) {
addSelect(selectList, builder.context.field(field.getIndex()), e.getRowType());
}
builder.select.setSelectList(new SqlNodeList(selectList, POS));
}
List<SqlNode> orderByList = Expressions.list();
for (RelFieldCollation field : e.getCollation().getFieldCollations()) {
builder.addOrderItem(orderByList, field);
Expand Down Expand Up @@ -918,8 +913,16 @@ public List<SqlNode> createAsFullOperands(RelDataType rowType, SqlNode leftOpera
String alias = SqlValidatorUtil.getAlias(node, -1);
final String lowerName = name.toLowerCase(Locale.ROOT);
if (lowerName.startsWith("expr$")) {
// Put it in ordinalMap
ordinalMap.put(lowerName, node);
final ArrayDeque<Frame> clone = new ArrayDeque<>(stack);
clone.pop();
final Frame parent = clone.peek();
if (parent != null && parent.r instanceof Union) {
// For case: Project(Union(expression without alias))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only Union, i believe all the SetOp and BiRel can have such a simplification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danny0405 , I have updated the solution and fix the code conflict.

// Project should use the alias to fill its select-list.
node = as(node, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this as that we don't need to add an alias for the outmost node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic is correct, but, can we have a more simple solution for the SQL, the RelToSqlConverterTest has many failing test cases(some of which is correct to fix but most are unnecessary change)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Danny, +1

} else {
ordinalMap.put(lowerName, node);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jinxing64 , I added a test case you mentioned: testSelectImplicitAliasByStar
Above change(5 lines) can fix that issue, but I don't know whether it is a good idea, since we have to update select-list of 12 test cases.

Failed tests
Classes
RelToSqlConverterTest. testDb2DialectSelectQueryComplex()
RelToSqlConverterTest. testExceptOperatorForBigQuery()
RelToSqlConverterTest. testExistsWithExpand()
RelToSqlConverterTest. testIntersectOperatorForBigQuery()
RelToSqlConverterTest. testNotExistsWithExpand()
RelToSqlConverterTest. testSelectQueryComplex()
RelToSqlConverterTest. testSelectQueryWithGroupByHaving3()
RelToSqlConverterTest. testUncollectImplicitAlias()
RelToSqlConverterTest. testUnionAllOperatorForBigQuery()
RelToSqlConverterTest. testUnionAllWithNoOperands()
RelToSqlConverterTest. testUnionAllWithNoOperandsUsingOracleDialect()
RelToSqlConverterTest. testUnionOperatorForBigQuery()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What isordinalMap used for ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @helloppx ~
Thanks a lot for digging into this ~
Are you sure all the failed tests still generate the correct sql by semantics ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that 12 cases are semantically correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ordinalMap is a "EXPR$x" -> "EXPRESSION" store: EXPR$1 -> product_id + 1
Subquery save the kv to ordinalMap and then outer query call AliasContext#field() to fill select-list by ordinalMap.

} else if (alias == null || !alias.equals(name)) {
node = as(node, name);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1145,21 +1145,21 @@ private static String toSql(RelNode root, SqlDialect dialect) {
@Test public void testUnionOperatorForBigQuery() {
final String query = "select mod(11,3) from \"product\"\n"
+ "UNION select 1 from \"product\"";
final String expected = "SELECT MOD(11, 3)\n"
final String expected = "SELECT MOD(11, 3) AS `EXPR$0`\n"
+ "FROM foodmart.product\n"
+ "UNION DISTINCT\n"
+ "SELECT 1\n"
+ "SELECT 1 AS `EXPR$0`\n"
+ "FROM foodmart.product";
sql(query).withBigQuery().ok(expected);
}

@Test public void testUnionAllOperatorForBigQuery() {
final String query = "select mod(11,3) from \"product\"\n"
+ "UNION ALL select 1 from \"product\"";
final String expected = "SELECT MOD(11, 3)\n"
final String expected = "SELECT MOD(11, 3) AS `EXPR$0`\n"
+ "FROM foodmart.product\n"
+ "UNION ALL\n"
+ "SELECT 1\n"
+ "SELECT 1 AS `EXPR$0`\n"
+ "FROM foodmart.product";
sql(query).withBigQuery().ok(expected);
}
Expand Down Expand Up @@ -4344,6 +4344,21 @@ private void checkLiteral2(String expression, String expected) {
sql(query).ok(expected);
}

@Test public void testSelectImplicitAliasFromUnionByStar() {
final String query = "select \"shelf_width\", * from (\n"
+ "select \"shelf_width\", \"product_id\" + 1 from \"product\"\n"
+ "union all\n"
+ "select \"shelf_width\", \"product_id\" from \"product\""
+ ") A";
final String expected = "SELECT \"shelf_width\", \"shelf_width\" AS \"shelf_width0\", \"EXPR$1\"\n"
+ "FROM (SELECT \"shelf_width\", \"product_id\" + 1 AS \"EXPR$1\"\n"
+ "FROM \"foodmart\".\"product\"\n"
+ "UNION ALL\n"
+ "SELECT \"shelf_width\", \"product_id\"\n"
+ "FROM \"foodmart\".\"product\") AS \"t1\"";
sql(query).ok(expected);
}

/** Fluid interface to run tests. */
static class Sql {
private final SchemaPlus schema;
Expand Down
11 changes: 11 additions & 0 deletions core/src/test/java/org/apache/calcite/test/JdbcAdapterTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,17 @@ public class JdbcAdapterTest {
+ "ON \"t\".\"DEPTNO\" = \"t0\".\"DEPTNO\"");
}

@Test public void testGroupByWithLimitPlan() {
CalciteAssert.model(JdbcTest.SCOTT_MODEL)
.query("select sum(sal)\n"
+ "from scott.emp \n"
+ "group by ename limit 10")
.planHasSql("SELECT SUM(\"SAL\")\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add the tests in RelToSqlConverterTest.

+ "FROM \"SCOTT\".\"EMP\"\n"
+ "GROUP BY \"ENAME\"\n"
+ "LIMIT 10");
}

@Test public void testPushDownSort() {
CalciteAssert.model(JdbcTest.SCOTT_MODEL)
.query("select ename \n"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -468,14 +468,14 @@ private Fluent pig(String script) {
+ "HIREDATE, SAL, COMM, DEPTNO)) AS A\n"
+ " FROM scott.EMP\n"
+ " GROUP BY DEPTNO) AS $cor4,\n"
+ " LATERAL (SELECT COLLECT(ROW(ENAME, JOB, DEPTNO, SAL)) AS X\n"
+ " FROM (SELECT ENAME, JOB, DEPTNO, SAL\n"
+ " LATERAL (SELECT X\n"
+ " FROM (SELECT 'all' AS $f0, COLLECT(ROW(ENAME, JOB, DEPTNO, SAL)) AS X\n"
+ " FROM UNNEST (SELECT $cor4.A AS $f0\n"
+ " FROM (VALUES (0)) AS t (ZERO)) "
+ "AS t2 (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)\n"
+ " WHERE JOB <> 'CLERK'\n"
+ " ORDER BY SAL) AS t5\n"
+ " GROUP BY 'all') AS t8) AS $cor5,\n"
+ " GROUP BY 'all'\n"
+ " ORDER BY SAL) AS t7) AS t8) AS $cor5,\n"
+ " LATERAL UNNEST (SELECT $cor5.X AS $f0\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khaitranq @ihuzenko @danny0405 @vlsi
Could you help me review this test case ?

Copy link
Contributor

@jinxing64 jinxing64 Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently select list is given as '*' when wrapSelect [1], however it's seems not right for this case.
Shall we set the correct alias for such case ?
I think manipulate the internals of builder is not a good idea.

[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L437

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background info:
Because SELECT is less than FETCH[1], when RelToSqlConverter visits Project, it will wrap Sort(Aggregate) as a subselect. When aggregate field has no alias, the select list of Project will be wrong.

Hi Jin Xing, I referenced method visitAggregate[2], it meets the same problem.

[1]
//Clauses in a SQL query. Ordered by evaluation order.
//SELECT is set only when there is a NON-TRIVIAL SELECT clause
public enum Clause {
FROM, WHERE, GROUP_BY, HAVING, SELECT, SET_OP, ORDER_BY, FETCH, OFFSET
}
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation @helloppx
Yes, thee reference is indeed a fix for this case. But I still think a thorough fix is to set the correct alias. I take below case as an example:
If I have sql and corresponding plan as below:

select "product_class_id" + 1, *
 from
(
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) A
   union all
   select * from
   (select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) B
) C
LogicalProject(EXPR$0=[+($0, 1)], product_class_id=[$0], EXPR$1=[$1])
  LogicalUnion(all=[true])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])
    LogicalProject(product_class_id=[$0], EXPR$1=[$1])
      LogicalSort(fetch=[10])
        LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
          LogicalProject(product_class_id=[$0], product_id=[$1])
            JdbcTableScan(table=[[foodmart, product]])

RelToSqlConverter will convert the plan to below sql (with the fix bring by this PR):

SELECT "product_class_id" + 1, "product_class_id", SUM("product_id")
FROM (SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY
UNION ALL
SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY) AS "t5"

Obviously it's not correct (the outmost SUM("product_id") is wrong).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rectify the alias for subquery, above case I mentioned should work well.

+ " FROM (VALUES (0)) AS t (ZERO)) "
+ "AS t11 (ENAME, JOB, DEPTNO, SAL) AS t110\n"
Expand Down