-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3662] Generate wrong SQL when plan contains Project(Sort(Aggregate)) and aggregate field has no alias #1715
base: main
Are you sure you want to change the base?
Conversation
…regate)) and aggregate field has no alias (Lei Jiang)
+ " ORDER BY SAL) AS t5\n" | ||
+ " GROUP BY 'all') AS t8) AS $cor5,\n" | ||
+ " GROUP BY 'all'\n" | ||
+ " ORDER BY SAL) AS t7) AS t8) AS $cor5,\n" | ||
+ " LATERAL UNNEST (SELECT $cor5.X AS $f0\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khaitranq @ihuzenko @danny0405 @vlsi
Could you help me review this test case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently select list is given as '*' when wrapSelect
[1], however it's seems not right for this case.
Shall we set the correct alias for such case ?
I think manipulate the internals of builder
is not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Background info:
Because SELECT is less than FETCH[1], when RelToSqlConverter visits Project, it will wrap Sort(Aggregate) as a subselect. When aggregate field has no alias, the select list of Project will be wrong.
Hi Jin Xing, I referenced method visitAggregate
[2], it meets the same problem.
[1]
//Clauses in a SQL query. Ordered by evaluation order.
//SELECT is set only when there is a NON-TRIVIAL SELECT clause
public enum Clause {
FROM, WHERE, GROUP_BY, HAVING, SELECT, SET_OP, ORDER_BY, FETCH, OFFSET
}
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L289
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanation @helloppx
Yes, thee reference is indeed a fix for this case. But I still think a thorough fix is to set the correct alias. I take below case as an example:
If I have sql and corresponding plan as below:
select "product_class_id" + 1, *
from
(
select * from
(select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) A
union all
select * from
(select "product_class_id", sum("product_id") from "product" group by "product_class_id" limit 10) B
) C
LogicalProject(EXPR$0=[+($0, 1)], product_class_id=[$0], EXPR$1=[$1])
LogicalUnion(all=[true])
LogicalProject(product_class_id=[$0], EXPR$1=[$1])
LogicalSort(fetch=[10])
LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
LogicalProject(product_class_id=[$0], product_id=[$1])
JdbcTableScan(table=[[foodmart, product]])
LogicalProject(product_class_id=[$0], EXPR$1=[$1])
LogicalSort(fetch=[10])
LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
LogicalProject(product_class_id=[$0], product_id=[$1])
JdbcTableScan(table=[[foodmart, product]])
RelToSqlConverter will convert the plan to below sql (with the fix bring by this PR):
SELECT "product_class_id" + 1, "product_class_id", SUM("product_id")
FROM (SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY
UNION ALL
SELECT "product_class_id", SUM("product_id")
FROM "foodmart"."product"
GROUP BY "product_class_id"
FETCH NEXT 10 ROWS ONLY) AS "t5"
Obviously it's not correct (the outmost SUM("product_id")
is wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rectify the alias for subquery, above case I mentioned should work well.
How about add a test in |
I did some research. If we want to reproduce this JIRA case, we must use |
…regate)) and aggregate field has no alias (Lei Jiang)
node = as(node, name); | ||
} else { | ||
ordinalMap.put(lowerName, node); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jinxing64 , I added a test case you mentioned: testSelectImplicitAliasByStar
Above change(5 lines) can fix that issue, but I don't know whether it is a good idea, since we have to update select-list of 12 test cases.
Failed tests
Classes
RelToSqlConverterTest. testDb2DialectSelectQueryComplex()
RelToSqlConverterTest. testExceptOperatorForBigQuery()
RelToSqlConverterTest. testExistsWithExpand()
RelToSqlConverterTest. testIntersectOperatorForBigQuery()
RelToSqlConverterTest. testNotExistsWithExpand()
RelToSqlConverterTest. testSelectQueryComplex()
RelToSqlConverterTest. testSelectQueryWithGroupByHaving3()
RelToSqlConverterTest. testUncollectImplicitAlias()
RelToSqlConverterTest. testUnionAllOperatorForBigQuery()
RelToSqlConverterTest. testUnionAllWithNoOperands()
RelToSqlConverterTest. testUnionAllWithNoOperandsUsingOracleDialect()
RelToSqlConverterTest. testUnionOperatorForBigQuery()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What isordinalMap
used for ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @helloppx ~
Thanks a lot for digging into this ~
Are you sure all the failed tests still generate the correct sql by semantics ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that 12 cases are semantically correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ordinalMap
is a "EXPR$x" -> "EXPRESSION"
store: EXPR$1
-> product_id + 1
Subquery save the kv to ordinalMap
and then outer query call AliasContext#field()
to fill select-list by ordinalMap
.
node = as(node, name); | ||
} else { | ||
ordinalMap.put(lowerName, node); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What isordinalMap
used for ?
@@ -919,7 +924,11 @@ public Result visit(TableFunctionScan e) { | |||
final String lowerName = name.toLowerCase(Locale.ROOT); | |||
if (lowerName.startsWith("expr$")) { | |||
// Put it in ordinalMap | |||
ordinalMap.put(lowerName, node); | |||
if (stack.size() != 1) { | |||
node = as(node, name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this as that we don't need to add an alias for the outmost node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic is correct, but, can we have a more simple solution for the SQL, the RelToSqlConverterTest
has many failing test cases(some of which is correct to fix but most are unnecessary change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Danny, +1
node = as(node, name); | ||
} else { | ||
ordinalMap.put(lowerName, node); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @helloppx ~
Thanks a lot for digging into this ~
Are you sure all the failed tests still generate the correct sql by semantics ?
"SELECT \"product_id\", COALESCE(SUM(\"store_sales\"), 0) AS \"EXPR$1\"\n" + | ||
"FROM \"foodmart\".\"sales_fact_1997\"\n" + | ||
"GROUP BY \"product_id\") AS \"t1\""); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another simpler unit test as below
select "shelf_width", * from
(select "shelf_width", "product_id" + 1 from "product"
union all
select "shelf_width", "product_id" from "product") A
But it converted as
SELECT "shelf_width", "shelf_width" AS "shelf_width0", "product_id" + 1
FROM (SELECT "shelf_width", "product_id" + 1
FROM "foodmart"."product"
UNION ALL
SELECT "shelf_width", "product_id"
FROM "foodmart"."product") AS "t1"
As we can see, above conversion is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested that case and it seems right:
SELECT "shelf_width", "shelf_width" AS "shelf_width0", "EXPR$1"
FROM (SELECT "shelf_width", "product_id" + 1 AS "EXPR$1"
FROM "foodmart"."product"
UNION ALL
SELECT "shelf_width", "product_id"
FROM "foodmart"."product") AS "t1"
if (e.getInput() instanceof Sort) { | ||
builder = x.builder(e); | ||
builder.clauses.add(Clause.SELECT); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
builder.clauses.add(Clause.SELECT)
is useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, i believe that code snippet in visit(Sort)
can be removed:
if (stack.size() != 1 && builder.select.getSelectList() == null) {
// Generates explicit column names instead of start(*) for
// non-root order by to avoid ambiguity.
final List<SqlNode> selectList = Expressions.list();
for (RelDataTypeField field : e.getRowType().getFieldList()) {
addSelect(selectList, builder.context.field(field.getIndex()), e.getRowType());
}
builder.select.setSelectList(new SqlNodeList(selectList, POS));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think below fix snippet here is not correct.
if (e.getInput() instanceof Sort) {
builder = x.builder(e);
builder.clauses.add(Clause.SELECT);
} else {
builder = x.builder(e, Clause.SELECT);
}
Above change apply the projects on the select list from Sort
. If Sort
uses ordinal in ORDER BY
clause, the semantics can be wrong.
Check below case
select "p" * (-1) from
(select "product_id" "p", sum("net_weight") "product_id"
from "product"
group by "product_id" order by 1 limit 10) A
It will be converted as
SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY 1
FETCH NEXT 10 ROWS ONLY
The order of the result is obviously inconsistent with the original Sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fix the alias, I think we don't need the change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to replace the order by ordinal
with order by alias
to solve this problem completely. Not all the DBs supports order by ordinal, they all support order by alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @jinxing64 , @danny0405
I did some test, it will cause another issue[1] to replace order by ordinal
with order by alias
,
so I deleted the code snippet[2]
[1]
//SqlImplementor#Result
//replace `order by ordinal` with `order by alias`
@Override public SqlNode orderField(int ordinal) {
final SqlNode node = field(ordinal);
if (node instanceof SqlIdentifier && ((SqlIdentifier) node).isSimple()) {
Result result = Result.this;
final String alias = SqlValidatorUtil.getAlias(selectList.get(ordinal), -1);
return new SqlIdentifier(ImmutableList.of(alias), POS);
}
return node;
}
//The above method and [2] will produce wrong SQL
SELECT \"product_id\" * -1
FROM \"foodmart\".\"product\"
GROUP BY \"product_id\"
ORDER BY \"p\"
FETCH NEXT 10 ROWS ONLY
[2]
if (e.getInput() instanceof Sort) {
builder = x.builder(e);
} else {
builder = x.builder(e, Clause.SELECT);
}
@@ -919,7 +924,11 @@ public Result visit(TableFunctionScan e) { | |||
final String lowerName = name.toLowerCase(Locale.ROOT); | |||
if (lowerName.startsWith("expr$")) { | |||
// Put it in ordinalMap | |||
ordinalMap.put(lowerName, node); | |||
if (stack.size() != 1) { | |||
node = as(node, name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic is correct, but, can we have a more simple solution for the SQL, the RelToSqlConverterTest
has many failing test cases(some of which is correct to fix but most are unnecessary change)
.query("select sum(sal)\n" | ||
+ "from scott.emp \n" | ||
+ "group by ename limit 10") | ||
.planHasSql("SELECT SUM(\"SAL\")\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add the tests in RelToSqlConverterTest
.
I add a test case in RelToSqlConverterTest, it seems to produce a correct sql
|
…regate)) and aggregate field has no alias (Lei Jiang)
Hi @yanlin-Lynn @danny0405 [1]
[2]
|
Currently, I'm using |
clone.pop(); | ||
final Frame parent = clone.peek(); | ||
if (parent != null && parent.r instanceof Union) { | ||
// For case: Project(Union(expression without alias)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only Union
, i believe all the SetOp
and BiRel
can have such a simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @danny0405 , I have updated the solution and fix the code conflict.
…regate)) and aggregate field has no alias (Lei Jiang)
Can you fix the code conflict ? |
…regate)) and aggregate field has no alias (Lei Jiang)
We can reproduce the case in |
…regate)) and aggregate field has no alias (Lei Jiang)
…regate)) and aggregate field has no alias (Lei Jiang)
Thank you @yanlin-Lynn , I have moved the test case. :D |
…regate)) and aggregate field has no alias (Lei Jiang)
8a5cf83
to
cf7f71b
Compare
https://issues.apache.org/jira/browse/CALCITE-3662