Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-4686] SubQueryRemoveRule.matchJoin should correctly rewrite all sub-queries #2385

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jamesstarr
Copy link
Contributor

@jamesstarr jamesstarr commented Mar 31, 2021

Reworking SubQueryRemoveRule.matchJoin to:

  • Validate join type before rewriting
  • Correctly shift filter and then rewrite join condition on right side of the join
  • Rewrite all subqueries in ON conditions

Testing and refactoring:

  • Moved Shifter from RelBuilder to RelOptUtil, so the logic can be reused by SubQueryRemoveRule.matchJoin.
  • Enabled RelOptRulesTest.testExpandJoinIn and RelOptRulesTest.testExpandJoinInComposite.
  • Adding RelOptRulesTest. testJoinOnMultipleCorrelatedQueries

@jamesstarr jamesstarr marked this pull request as draft March 31, 2021 22:44
@jamesstarr jamesstarr force-pushed the CALCITE-1045 branch 2 times, most recently from b24f579 to e8f2271 Compare April 15, 2021 16:28
@jamesstarr jamesstarr changed the title [CALCITE-1045] WIP Supporting SubQueryRemoveRule for Joins [CALCITE-1045] Supporting SubQueryRemoveRule for Joins Apr 15, 2021
@jamesstarr jamesstarr marked this pull request as ready for review April 15, 2021 18:29
jamesstarr and others added 3 commits April 30, 2021 10:49
Moving Shifter to from RelBuilder to RelOptUtil.  Treating all variables
from the left side as correlated variables then apply them subqueries to
thes right side.
@jamesstarr jamesstarr force-pushed the CALCITE-1045 branch 2 times, most recently from de96a1d to 932975f Compare April 30, 2021 18:01
@zabetak
Copy link
Member

zabetak commented May 25, 2021

Hey @jamesstarr , thanks for working on this. Going quickly over the changeset I get the impression that the PR tries to achieve multiple things. Can you highlight/outline the goals of this PR either here or under CALCITE-1045?

@zabetak zabetak self-assigned this May 25, 2021
@jamesstarr
Copy link
Contributor Author

@zabetak, This PR is focused on handling left joins with correlated queries in ON clauses.

Joins with correlated queries are rewritten too as join with the right side contain filter node and correlate node to handle the join's filter in SubQueryRemovelRule. The filter is checked in a loop for correlation ID to handle multiple subqueries. Right and full outer joins are not rewritten because they would logically be incorrect. Shifter, a util class in RelBuilder, was moved to RelOptUtil so the existing logic could be used in SubQueryRemoveRule. RelBuilder also now only considers a join correlated if an ID is provided and actually referenced is found on the right side to the ID.

@jamesstarr
Copy link
Contributor Author

@zabetak, do you have any further feed back?

Turn back on RelOptRulesTest.testExpandJoinIn and testExpandJoinInComposite.
Using RelBuilder over rexbuilder in SubQueryRemoveRule.
Making CorrelationId final.
Not state builder twice.
@jamesstarr jamesstarr changed the title [CALCITE-1045] Supporting SubQueryRemoveRule for Joins [CALCITE-4686] SubQueryRemoveRule.matchJoin should correctly rewrite all sub-queries Jul 7, 2021
@jamesstarr
Copy link
Contributor Author

@zabetak I pulled in master, create separate ticket and cleaned up the code, and rewrote the pull request comment. Let me know if any further changes are needed.

@zabetak
Copy link
Member

zabetak commented Jul 10, 2021

Hey @jamesstarr , thanks for pushing this forward. I think it is in good shape right now. I will try to review and merge soon but not sure if I will make it during the next week.

Comment on lines 642 to 651
switch (join.getJoinType()) {
case INNER:
case LEFT:
case ANTI:
case SEMI:
break;
default:
//Incorrect queries will be produced for full and right joins.
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we cannot treat RIGHT and FULL outer joins I think it makes sense to skip calling onMatch method altogether. Is it possible to move the check as rule predicate or apply the check as part of RelOptRule#matches?

Comment on lines 308 to 310
/** Shifts a predicate's inputs to the left, replacing early
* ones with references to a {@link RexCorrelVariable}. */
public static RexNode correlateLeftShiftRight(RexBuilder rexBuilder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what the method does from the javadoc or method name. I know that you moved existing code but given that now the method is public it would be nice if we could improve the doc. You can possibly include sub-plan examples in the javadoc to aid in the understanding of the method.

RexNode condition = RelOptUtil.correlateLeftShiftRight(builder.getRexBuilder(),
join.getLeft(), id, join.getRight(), join.getCondition());
boolean found = false;
while (true) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen that matchFilter shares some similarities with this newly added code. Is it correct to assume that we need to do this also for matchProject? Should we log a follow-up JIRA?

match join and as the predicate subquery join rule.

Improving javadoc for RelOptUtil.
@@ -695,7 +715,7 @@ private static void matchJoin(SubQueryRemoveRule rule, RelOptRuleCall call) {
Config JOIN = EMPTY
.withOperandSupplier(b ->
b.operand(Join.class)
.predicate(RexUtil.SubQueryFinder::containsSubQuery)
.predicate(SubQueryRemoveRule::hasSubQueryAndJoinTypeIsRewritable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two other alternatives for the predicate; not sure if they are better than the current proposal I just like when methods do only one thing:

(join) -> isRewritable(join) && RexUtil.SubQueryFinder.containsSubQuery(join)
(join) -> !join.getJoinType().generatesNullsOnLeft && RexUtil.SubQueryFinder.containsSubQuery(join)

the second is not 100% equivalent but I am not sure if we intent to cover CROSS, LEFT_SEMI_JOIN, COMMA, if it ever reaches this rule.

Not necessarily requesting a change, leaving the final decision to @jamesstarr !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call is reused in assert in matchJoin. So moving to the first one would mean, that it is repeated twice the && containsSubQuery twice while still having a helper function. Concerning !join.getJoinType().generatesNullsOnLeft(), the whole expression would have to be repeated twice, and while unlikely to change, it could allow for drift if some one wants to change the restrictiveness. Furthermore, the phrasing is not particular intuitive.

I will rename hasSubQueryAndJoinTypeIsRewritable to containsSubQueryAndJoinTypeIsRewritable to match the existing verbs and change it use generatesNullsOnLeft() so it will likely work if any further join types are added.

Comment on lines +649 to +650
for (RexSubQuery subQuery = RexUtil.SubQueryFinder.find(condition); subQuery != null;
subQuery = RexUtil.SubQueryFinder.find(condition)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change from while (true) to for :)

Comment on lines 651 to 659
final Set<CorrelationId> variablesSet =
RelOptUtil.getVariablesUsed(subQuery.rel);
final RelOptUtil.Logic logic =
LogicVisitor.find(RelOptUtil.Logic.TRUE, ImmutableList.of(condition), subQuery);
int offset = builder.peek().getRowType().getFieldCount();
final RexNode target = rule.apply(subQuery, variablesSet, logic,
builder, 1, offset);
final RexShuttle shuttle = new ReplaceSubQueryShuttle(subQuery, target);
condition = shuttle.apply(condition);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: The final at this level make the code more verbose with small benefit given that the scope is very small. In some cases removing it could make things fit in one line.

Copy link
Contributor Author

@jamesstarr jamesstarr Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am normally not a big fan of final in this localized of context when all variables are treated as immutable, but when some variables are mutated, having final on all the immutable variables signals that the mutable variables are in fact mutable.

I will try to shift things to a single where possible.

Comment on lines 308 to 316
/** Rewrites an {@link RexNode} on top of a correlated join to the rights context by
* shifting {@link RexInputRef} if they are from the right side, and rewriting
* {@link RexInputRef} to field accesses of {@link RexCorrelVariable}.
* @param left Rel fields to be correlated
* @param id id to correlate
* @param right Rel field to be shifted to the right
* @param rexNode expression to be rewritten
* @return An expression
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion below:

Transforms a join condition over two inputs to a correlated condition over the right input.

Replaces references on the left relation with correlating variables and shifts references on the right relation as needed to be able to put the condition over the right relation.

TODO: Add example similar to the one below.
<pre>{@code
Join[cond=..]
  TableScan[EMP]
  TableScan[DEPT]
}</pre>

<pre>{@code
Operator
  TableScan[EMP]
  Filter[cond=..]
    TableScan[DEPT]
}</pre>

Rename method to either:

transformJoinConditionToCorrelate(RexNode condition, RelNode left, RelNode right, CorrelationId id, RexBuilder builder)
transformJoinToCorrelateCondition(...)
transformConditionFromJoinToCorrelate(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zabetak, I have followed up on your requests: adding an example and renaming the method. Please let me know if you want any further changes.

transformJoinConditionToCorrelate.  Also, updating comment and adding an
example.

Reflowing SubQueryRemoveRule.matchJoin to use single lines where
possible.  Making local variable offset final for consistency.

Renaming SubQueryRemoveRule.hasSubQueryAndJoinTypeIsRewritable to
containsSubQueryAndJoinTypeIsRewritable.  Changing checking join type to
use generatesNullsOnLeft.
@jamesstarr
Copy link
Contributor Author

@zabetak, Could you please merge this.

@jamesstarr
Copy link
Contributor Author

@zabetak, I really do not want this going to the graveyard of calcite pull requests.

@zabetak
Copy link
Member

zabetak commented Aug 25, 2021

Hi @jamesstarr , I am very busy lately but as soon as I get the chance I will make a final pass and merge it.

@zabetak zabetak added the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label Sep 21, 2021
@julianhyde julianhyde force-pushed the main branch 2 times, most recently from 8a5cf83 to cf7f71b Compare June 8, 2023 21:21
@olivrlee
Copy link
Contributor

What is the status on this PR? Was it ready to be merged? cc @zabetak
I've been trying to follow the trail of relevant JIRA tickets

@jamesstarr
Copy link
Contributor Author

jamesstarr commented Jul 11, 2023

What is the status on this PR? Was it ready to be merged? cc @zabetak I've been trying to follow the trail of relevant JIRA tickets

Never mind, I had removed the conflicting test some time ago.

There is bug in one of the test due to nested queries not being correlated correctly. I have a fix for the nested queries here: https://github.com//pull/3042

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants