-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-4686] SubQueryRemoveRule.matchJoin should correctly rewrite all sub-queries #2385
base: main
Are you sure you want to change the base?
Conversation
b24f579
to
e8f2271
Compare
Moving Shifter to from RelBuilder to RelOptUtil. Treating all variables from the left side as correlated variables then apply them subqueries to thes right side.
de96a1d
to
932975f
Compare
Hey @jamesstarr , thanks for working on this. Going quickly over the changeset I get the impression that the PR tries to achieve multiple things. Can you highlight/outline the goals of this PR either here or under CALCITE-1045? |
@zabetak, This PR is focused on handling left joins with correlated queries in ON clauses. Joins with correlated queries are rewritten too as join with the right side contain filter node and correlate node to handle the join's filter in SubQueryRemovelRule. The filter is checked in a loop for correlation ID to handle multiple subqueries. Right and full outer joins are not rewritten because they would logically be incorrect. Shifter, a util class in RelBuilder, was moved to RelOptUtil so the existing logic could be used in SubQueryRemoveRule. RelBuilder also now only considers a join correlated if an ID is provided and actually referenced is found on the right side to the ID. |
@zabetak, do you have any further feed back? |
Turn back on RelOptRulesTest.testExpandJoinIn and testExpandJoinInComposite.
Using RelBuilder over rexbuilder in SubQueryRemoveRule. Making CorrelationId final. Not state builder twice.
@zabetak I pulled in master, create separate ticket and cleaned up the code, and rewrote the pull request comment. Let me know if any further changes are needed. |
Hey @jamesstarr , thanks for pushing this forward. I think it is in good shape right now. I will try to review and merge soon but not sure if I will make it during the next week. |
switch (join.getJoinType()) { | ||
case INNER: | ||
case LEFT: | ||
case ANTI: | ||
case SEMI: | ||
break; | ||
default: | ||
//Incorrect queries will be produced for full and right joins. | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we cannot treat RIGHT
and FULL
outer joins I think it makes sense to skip calling onMatch
method altogether. Is it possible to move the check as rule predicate or apply the check as part of RelOptRule#matches
?
/** Shifts a predicate's inputs to the left, replacing early | ||
* ones with references to a {@link RexCorrelVariable}. */ | ||
public static RexNode correlateLeftShiftRight(RexBuilder rexBuilder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear what the method does from the javadoc or method name. I know that you moved existing code but given that now the method is public it would be nice if we could improve the doc. You can possibly include sub-plan examples in the javadoc to aid in the understanding of the method.
RexNode condition = RelOptUtil.correlateLeftShiftRight(builder.getRexBuilder(), | ||
join.getLeft(), id, join.getRight(), join.getCondition()); | ||
boolean found = false; | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen that matchFilter
shares some similarities with this newly added code. Is it correct to assume that we need to do this also for matchProject
? Should we log a follow-up JIRA?
match join and as the predicate subquery join rule. Improving javadoc for RelOptUtil.
@@ -695,7 +715,7 @@ private static void matchJoin(SubQueryRemoveRule rule, RelOptRuleCall call) { | |||
Config JOIN = EMPTY | |||
.withOperandSupplier(b -> | |||
b.operand(Join.class) | |||
.predicate(RexUtil.SubQueryFinder::containsSubQuery) | |||
.predicate(SubQueryRemoveRule::hasSubQueryAndJoinTypeIsRewritable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two other alternatives for the predicate; not sure if they are better than the current proposal I just like when methods do only one thing:
(join) -> isRewritable(join) && RexUtil.SubQueryFinder.containsSubQuery(join)
(join) -> !join.getJoinType().generatesNullsOnLeft && RexUtil.SubQueryFinder.containsSubQuery(join)
the second is not 100% equivalent but I am not sure if we intent to cover CROSS
, LEFT_SEMI_JOIN
, COMMA
, if it ever reaches this rule.
Not necessarily requesting a change, leaving the final decision to @jamesstarr !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call is reused in assert in matchJoin. So moving to the first one would mean, that it is repeated twice the && containsSubQuery
twice while still having a helper function. Concerning !join.getJoinType().generatesNullsOnLeft()
, the whole expression would have to be repeated twice, and while unlikely to change, it could allow for drift if some one wants to change the restrictiveness. Furthermore, the phrasing is not particular intuitive.
I will rename hasSubQueryAndJoinTypeIsRewritable to containsSubQueryAndJoinTypeIsRewritable to match the existing verbs and change it use generatesNullsOnLeft() so it will likely work if any further join types are added.
for (RexSubQuery subQuery = RexUtil.SubQueryFinder.find(condition); subQuery != null; | ||
subQuery = RexUtil.SubQueryFinder.find(condition)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change from while (true) to for :)
final Set<CorrelationId> variablesSet = | ||
RelOptUtil.getVariablesUsed(subQuery.rel); | ||
final RelOptUtil.Logic logic = | ||
LogicVisitor.find(RelOptUtil.Logic.TRUE, ImmutableList.of(condition), subQuery); | ||
int offset = builder.peek().getRowType().getFieldCount(); | ||
final RexNode target = rule.apply(subQuery, variablesSet, logic, | ||
builder, 1, offset); | ||
final RexShuttle shuttle = new ReplaceSubQueryShuttle(subQuery, target); | ||
condition = shuttle.apply(condition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: The final
at this level make the code more verbose with small benefit given that the scope is very small. In some cases removing it could make things fit in one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am normally not a big fan of final in this localized of context when all variables are treated as immutable, but when some variables are mutated, having final on all the immutable variables signals that the mutable variables are in fact mutable.
I will try to shift things to a single where possible.
/** Rewrites an {@link RexNode} on top of a correlated join to the rights context by | ||
* shifting {@link RexInputRef} if they are from the right side, and rewriting | ||
* {@link RexInputRef} to field accesses of {@link RexCorrelVariable}. | ||
* @param left Rel fields to be correlated | ||
* @param id id to correlate | ||
* @param right Rel field to be shifted to the right | ||
* @param rexNode expression to be rewritten | ||
* @return An expression | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion below:
Transforms a join condition over two inputs to a correlated condition over the right input.
Replaces references on the left relation with correlating variables and shifts references on the right relation as needed to be able to put the condition over the right relation.
TODO: Add example similar to the one below.
<pre>{@code
Join[cond=..]
TableScan[EMP]
TableScan[DEPT]
}</pre>
<pre>{@code
Operator
TableScan[EMP]
Filter[cond=..]
TableScan[DEPT]
}</pre>
Rename method to either:
transformJoinConditionToCorrelate(RexNode condition, RelNode left, RelNode right, CorrelationId id, RexBuilder builder)
transformJoinToCorrelateCondition(...)
transformConditionFromJoinToCorrelate(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zabetak, I have followed up on your requests: adding an example and renaming the method. Please let me know if you want any further changes.
transformJoinConditionToCorrelate. Also, updating comment and adding an example. Reflowing SubQueryRemoveRule.matchJoin to use single lines where possible. Making local variable offset final for consistency. Renaming SubQueryRemoveRule.hasSubQueryAndJoinTypeIsRewritable to containsSubQueryAndJoinTypeIsRewritable. Changing checking join type to use generatesNullsOnLeft.
@zabetak, Could you please merge this. |
@zabetak, I really do not want this going to the graveyard of calcite pull requests. |
Hi @jamesstarr , I am very busy lately but as soon as I get the chance I will make a final pass and merge it. |
8a5cf83
to
cf7f71b
Compare
What is the status on this PR? Was it ready to be merged? cc @zabetak |
Never mind, I had removed the conflicting test some time ago. |
Reworking SubQueryRemoveRule.matchJoin to:
Testing and refactoring: