-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3455] Redundant rule firing for both logical and physical nodes #1543
base: main
Are you sure you want to change the base?
Conversation
operand(Project.class, | ||
operand(Project.class, any())), | ||
operand(LogicalProject.class, | ||
operand(LogicalProject.class, any())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes sounds reasonable to me, but i'm a little worried about the compatibility. It would give us more confidence if some engine like Druid can give feedback that this change does not break something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know who is the right contact for Druid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @zabetak can make some help ~
Quotes Julian's comment in the mailing list:
I'm also expecting if we can make more effort to do a bigger refactoring to make the RelOptRule customization more easier. |
@@ -44,7 +45,7 @@ | |||
|
|||
private SortRemoveConstantKeysRule() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot ~
Besides all rules in existing change (FilterProjectTransposeRule, ProjectMergeRule, SortRemoveConstantKeyRule), shall we include more rules, which only need to be applied for logical nodes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confident that the rules I update here will not cause the change of search space, aka the transformation on physical nodes would result in an equivalent, so it's safe. But we need to be careful when updating other rules. And it'd better be done by the rule author. This change would solve most, if not all, the problems, and we can always come back to update other rules when feel necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explain ~
If I understand correctly, what Julian's proposing is to refactor the RelOptRule interface so it's easier to specify which class of RelNode to match. In general I agree with this idea, but it's orthogonal to what I am fixing here. If we had such interface in place, we would still need to pass in logical class rather than the base class to avoid duplicating rule firing, which is what I am doing here. |
@@ -30,7 +31,7 @@ | |||
class EnumerableLimitRule extends RelOptRule { | |||
EnumerableLimitRule() { | |||
super( | |||
operand(Sort.class, any()), | |||
operand(LogicalSort.class, any()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the javadoc accordingly (org.apache.calcite.rel.logical.LogicalSort
instead of org.apache.calcite.rel.core.Sort
). The same for the rest of the modified rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed. Thx.
@danny0405 had a good feedback - since ProjectMergeRule can also be used by HepPlanner, and in some cases user would use HepPlanner to merge physical project nodes at the end of optimization phase, so I update this change to only disable physical project merge under volcano planer. |
+ " EnumerableAggregate(group=[{0}])\n" | ||
+ " EnumerableTableScan(table=[[s, depts]])\n" | ||
+ " EnumerableCalc(expr#0..4=[{inputs}], expr#5=[100], expr#6=[>($t0, $t5)], expr#7=[$cor5], expr#8=[$t7.deptno], expr#9=[=($t1, $t8)], expr#10=[AND($t6, $t9)], proj#0..2=[{exprs}], $condition=[$t10])\n" | ||
+ " EnumerableCalc(expr#0..4=[{inputs}], expr#5=[$cor4], expr#6=[$t5.deptno], expr#7=[=($t1, $t6)], expr#8=[100], expr#9=[>($t0, $t8)], expr#10=[AND($t7, $t9)], proj#0..2=[{exprs}], $condition=[$t10])\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xndai , can you explain why this plan was changed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new plan is equivalent. The rule firing sequence is changed now, so we get an alternative plan.
@@ -60,7 +62,7 @@ | |||
* {@link org.apache.calcite.rel.core.Correlate} from being de-correlated. | |||
*/ | |||
public static final FilterProjectTransposeRule INSTANCE = | |||
new FilterProjectTransposeRule(Filter.class, Project.class, true, true, | |||
new FilterProjectTransposeRule(LogicalFilter.class, LogicalProject.class, true, true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a backwards incompatible change. Please add an entry in the history.md (see breaking changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense (to keep backwards compatibility) to leave the rule's "default" INSTANCE
as it is (matching Filter+Project), and create a separate e.g. LOGICAL_INSTANCE
matching LogicalFilter+LogicalProject?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. I will try that. Thanks @rubenada
// We only merge logical project nodes with VolcanoPlanner. Since LogicalProject | ||
// would be converted into physical nodes later. There's no need to fire this | ||
// rule on physical project nodes again. | ||
if (call.getPlanner() instanceof VolcanoPlanner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in favor of having planner specific behavior inside the rules. I think creating two instances of the rule and passing them to different planners/rulesets is a cleaner solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @zabetak.
@@ -44,7 +45,7 @@ | |||
|
|||
private SortRemoveConstantKeysRule() { | |||
super( | |||
operand(Sort.class, any()), | |||
operand(LogicalSort.class, any()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this is a backwards incompatible change. Other than that I am more in favor of creating different instances of the rule rather than limiting the rule to match only LogicalSort. We have seen projects were people create their own set of logical operators. Changing the rule like this makes it impossible for such projects to use the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. The rule did work for the projects that inherit from Sort, however, it would stop working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I am working on a revision.
Have you seen https://issues.apache.org/jira/browse/CALCITE-2223 ? |
I don't think this is a duplication. This patch tries to improve the optimization performance by avoid firing redundant rules. It's not particularly related to cycles between RelSubset. |
80f411d
to
ca27fe9
Compare
For EnumerableLimitRule and SortRemoveConstantKeysRule, once they are done with the transformation on logical nodes, there's no need to fire these rules on physical nodes again. And for ProjectMergeRule, ProjectFilterTransposeRule and FilterProjectTransposeRule, we add logical instances which only match logical nodes. We keep this existing instances unchanged for backward compatability purpose.
…ing to logical nodes only
@vlsi @zabetak @rubenada @hsyuan I've updated this patch based on your feedback. Now I create separate LOGICAL_INSTANCE for rules that potentially would only match for logical nodes. Now Calcite users can customize the rule set and use the LOGICAL_INSTANCE if they want. Also updated some unit tests to use the LOGICAL_INSTANCE instead. Please take a look. |
I have a question related to CALCITE-3455 but not necessarily related to this PR: |
I agree in general this is the right direction. Not only the rule should allow specific convention for matching, but also able to create new rels with given conventions. For example, today when we fire ProjectMerge, it will generate another LogicalProject even if two EmmuerableProjects are merged. This doesn't make sense. But in order to do that we should provide a way to register RelFactory into the framework for custom convention. This mechanism is also needed by CALCITE-2970. Having said that, for short term / mid term I feel we can alleviate the problem by restricting certain rules to only match logic nodes, which is what this PR intends to do. |
There was a discussion in the dev list about rules matching different conventions but we didn't reach consensus at that time. |
Thanks @zabetak for the pointer. After reviewing the whole thread, I now understand why @vlsi believes this is dup of 2223. I think "single convention" and "convention preserving" rules are the concepts that capture exactly what we need. For "single convention", although user can specify operand class, some rules today don't have extension points, e.g. ProjectMergeRule. Anyone who would want to update those rule to only match a given convention would have to copy and rewrite the whole rule. Or at least sub-class the existing rules. And then there's no way to support "convention preserving" as the actual convention needs to be defined when defining the rule. Personally I agree with @vlsi that this is one outstanding missing piece of Calcite which affects planning performance quite a bit. Most of the Calcite users would just use the default rule set, and expect them to work out of the box. It would be too much to ask them to rewrite or even sub-class the rules. My patch here provides a temporary solution to address the most pressing need - those rules I updated do get fired a lot in practice. But if we can reach a conclusion on 2223 (looks like we are very close), this patch won't be needed. |
49cb002
to
8768a23
Compare
8a5cf83
to
cf7f71b
Compare
For ProjectMergeRule, EnumerableLimitRule and SortRemoveConstantKeysRule, once they are done with the transformation on logical nodes, there's no need to fire these rules on physical nodes again. So limit these rule matching to be with logical nodes only. This can noticeably reduce optimization latency.