forked from mysql/mysql-server
range_optimizer.cc
/* Copyright (c) 2000, 2024, Oracle and/or its affiliates.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2.0,
as published by the Free Software Foundation.
This program is designed to work with certain software (including
but not limited to OpenSSL) that is licensed under separate terms,
as designated in a particular file or component or in included license
documentation. The authors of MySQL hereby grant you an additional
permission to link the program and your derivative works with the
separately licensed software that they have either included with
the program or referenced in the documentation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License, version 2.0, for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
/*
TODO:
Fix that MAYBE_KEY are stored in the tree so that we can detect use
of full hash keys for queries like:
select s.id, kws.keyword_id from sites as s,kws where s.id=kws.site_id and
kws.keyword_id in (204,205);
*/
/*
This file contains:
Range/index_merge/groupby-minmax optimizer module
A module that accepts a table, condition, and returns
- an AccessPath that can give a RowIterator, that can be used to retrieve
rows that match the specified condition, or
- a "no records will match the condition" statement.
The module entry point is
test_quick_select()
KeyTupleFormat
~~~~~~~~~~~~~~
The code in this file (and elsewhere) makes operations on key value tuples.
Those tuples are stored in the following format:
The tuple is a sequence of key part values. The length of a key part value
depends only on its type (and does not depend on what value is stored)
KeyTuple: keypart1-data, keypart2-data, ...
The value of each keypart is stored in the following format:
keypart_data: [isnull_byte] keypart-value-bytes
If a keypart may have a NULL value (key_part->field->is_nullable() can
be used to check this), then the first byte is a NULL indicator with the
following valid values:
1 - keypart has NULL value.
0 - keypart has non-NULL value.
<questionable-statement> If isnull_byte==1 (NULL value), then the following
keypart->length bytes must be 0.
</questionable-statement>
keypart-value-bytes holds the value. Its format depends on the field type.
The length of keypart-value-bytes may or may not depend on the value being
stored. The default is that length is static and equal to
KEY_PART_INFO::length.
Key parts with (key_part_flag & HA_BLOB_PART) have a length that depends on
the value:
keypart-value-bytes: value_length value_bytes
The value_length part itself occupies HA_KEY_BLOB_LENGTH=2 bytes.
See key_copy() and key_restore() for code to move data between index tuple
and table record
CAUTION: the above description is only sergefp's understanding of the
subject and may omit some details.
*/
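/*
  Illustration only: a minimal sketch of the KeyTupleFormat layout described
  above. The helper names append_fixed_keypart() and append_varlen_keypart()
  are hypothetical and do not exist in MySQL; the real conversion is done by
  key_copy() and key_restore(). The sketch assumes that NULL values are
  zero-filled (the comment above marks this as questionable) and that the
  2-byte length prefix is stored low byte first.
*/

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not part of MySQL: appends one fixed-length keypart.
// `value == nullptr` stands for SQL NULL.
inline void append_fixed_keypart(std::vector<uint8_t> *tuple, bool nullable,
                                 const std::string *value, size_t length) {
  assert(nullable || value != nullptr);  // only nullable keyparts may be NULL
  if (nullable) tuple->push_back(value == nullptr ? 1 : 0);  // isnull_byte
  const size_t start = tuple->size();
  tuple->resize(start + length, 0);  // value bytes, zero-filled by default
  if (value != nullptr) {
    std::memcpy(tuple->data() + start, value->data(),
                std::min(length, value->size()));
  }
}

// Hypothetical helper: appends a length-prefixed keypart (HA_BLOB_PART style).
inline void append_varlen_keypart(std::vector<uint8_t> *tuple,
                                  const std::string &value) {
  assert(value.size() <= 0xFFFF);  // must fit the 2-byte length prefix
  const uint16_t len = static_cast<uint16_t>(value.size());
  tuple->push_back(static_cast<uint8_t>(len & 0xFF));  // low byte (assumed order)
  tuple->push_back(static_cast<uint8_t>(len >> 8));    // high byte
  tuple->insert(tuple->end(), value.begin(), value.end());
}
```

/*
  In this sketch a nullable 4-byte keypart always occupies 5 bytes, whether
  or not the value is NULL, while a length-prefixed keypart occupies 2 bytes
  plus the bytes of the value.
*/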
#include "sql/range_optimizer/range_optimizer.h"
#include <float.h>
#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <queue>
#include <set>
#include "field_types.h" // enum_field_types
#include "m_string.h"
#include "my_alloc.h"
#include "my_bitmap.h"
#include "my_compiler.h"
#include "my_dbug.h"
#include "my_sqlcommand.h"
#include "mysql/strings/m_ctype.h"
#include "mysql/udf_registration_types.h"
#include "mysql_com.h"
#include "scope_guard.h"
#include "sql/check_stack.h"
#include "sql/current_thd.h"
#include "sql/field_common_properties.h"
#include "sql/item.h"
#include "sql/item_func.h"
#include "sql/join_optimizer/access_path.h"
#include "sql/key.h" // is_key_used
#include "sql/mem_root_array.h"
#include "sql/mysqld.h"
#include "sql/opt_costmodel.h"
#include "sql/opt_hints.h" // hint_key_state
#include "sql/opt_trace.h" // Opt_trace_array
#include "sql/opt_trace_context.h"
#include "sql/psi_memory_key.h"
#include "sql/range_optimizer/group_index_skip_scan_plan.h"
#include "sql/range_optimizer/index_merge_plan.h"
#include "sql/range_optimizer/index_range_scan_plan.h"
#include "sql/range_optimizer/index_skip_scan_plan.h"
#include "sql/range_optimizer/internal.h"
#include "sql/range_optimizer/path_helpers.h"
#include "sql/range_optimizer/range_analysis.h"
#include "sql/range_optimizer/range_opt_param.h"
#include "sql/range_optimizer/range_optimizer.h"
#include "sql/range_optimizer/rowid_ordered_retrieval_plan.h"
#include "sql/range_optimizer/tree.h"
#include "sql/sql_class.h" // THD
#include "sql/sql_lex.h"
#include "sql/sql_list.h"
#include "sql/sql_optimizer.h" // JOIN
#include "sql/sql_select.h"
#include "sql/system_variables.h"
#include "sql/table.h"
#include "sql/thr_malloc.h"
#include "sql/uniques.h" // Unique
using std::min;
static AccessPath *get_best_disjunct_quick(
THD *thd, RANGE_OPT_PARAM *param, TABLE *table,
bool index_merge_union_allowed, bool index_merge_sort_union_allowed,
bool index_merge_intersect_allowed, bool skip_records_in_range,
SEL_IMERGE *imerge, const double cost_est, Key_map *needed_reg);
#ifndef NDEBUG
static void print_quick(AccessPath *path, const Key_map *needed_reg);
#endif
namespace opt_range {
SEL_ARG *null_element = nullptr;
}
using namespace opt_range;
void range_optimizer_init() {
null_element = new SEL_ARG;
null_element->color =
SEL_ARG::BLACK; // Don't trip up the test in test_rb_tree.
}
void range_optimizer_free() { delete null_element; }
/*
Add SEL_TREE to this index_merge without any checks.
NOTES
This function implements the following:
(x_1||...||x_N) || t = (x_1||...||x_N||t), where x_i, t are SEL_TREEs
RETURN
true on OOM.
*/
bool SEL_IMERGE::or_sel_tree(SEL_TREE *tree) { return trees.push_back(tree); }
/*
Perform OR operation on this SEL_IMERGE and supplied SEL_TREE new_tree,
combining new_tree with one of the trees in this SEL_IMERGE if they both
have SEL_ARGs for the same key.
SYNOPSIS
or_sel_tree_with_checks()
param RANGE_OPT_PARAM from test_quick_select
remove_jump_scans See get_mm_tree()
new_tree SEL_TREE with type KEY or KEY_SMALLER.
NOTES
This does the following:
(t_1||...||t_k)||new_tree =
either
= (t_1||...||t_k||new_tree)
or
= (t_1||....||(t_j|| new_tree)||...||t_k),
where t_i, new_tree are SEL_TREEs.
new_tree is combined with the first t_j it has a SEL_ARG on common
key with. As a consequence of this, choice of keys to do index_merge
read may depend on the order of conditions in WHERE part of the query.
RETURN
0 OK
1 One of the trees was combined with new_tree to SEL_TREE::ALWAYS,
and (*this) should be discarded.
-1 An error occurred.
*/
int SEL_IMERGE::or_sel_tree_with_checks(RANGE_OPT_PARAM *param,
bool remove_jump_scans,
SEL_TREE *new_tree) {
DBUG_TRACE;
for (SEL_TREE *&tree : trees) {
if (sel_trees_can_be_ored(tree, new_tree, param)) {
tree = tree_or(param, remove_jump_scans, tree, new_tree);
if (tree == nullptr) return 1;
if (tree->type == SEL_TREE::ALWAYS) return 1;
/* SEL_TREE::IMPOSSIBLE is impossible here */
return 0;
}
}
/* New tree cannot be combined with any of existing trees. */
if (or_sel_tree(new_tree)) {
return -1;
} else {
return 0;
}
}
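/*
  Illustration only: a toy model of the "first matching tree wins" rule
  documented for or_sel_tree_with_checks(). ToyTree, shares_key() and
  or_into_merge() are hypothetical names; a tree is reduced to the set of key
  numbers it has ranges for, and the actual range merging done by tree_or()
  is deliberately left out.
*/

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a SEL_TREE: only the key numbers it has ranges for.
using ToyTree = std::set<int>;

// True if both toy trees have ranges on at least one common key.
inline bool shares_key(const ToyTree &a, const ToyTree &b) {
  for (int key : a) {
    if (b.count(key) != 0) return true;
  }
  return false;
}

// Returns the index of the tree that new_tree would be ORed into. If no
// existing tree shares a key with new_tree, new_tree is appended instead and
// the index of the new element is returned.
inline size_t or_into_merge(std::vector<ToyTree> *merge,
                            const ToyTree &new_tree) {
  assert(merge != nullptr);
  for (size_t i = 0; i < merge->size(); ++i) {
    if (shares_key((*merge)[i], new_tree)) return i;  // first match wins
  }
  merge->push_back(new_tree);
  return merge->size() - 1;
}
```

/*
  With the trees {key 1} and {key 2} in that order, a new tree over keys 1
  and 2 joins the tree on key 1. With the order reversed it joins the tree on
  key 2, which is the dependence on the order of WHERE conditions noted above.
*/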
/*
Perform OR operation on this index_merge and supplied index_merge list.
RETURN
0 - OK
1 - One of conditions in result is always true and this SEL_IMERGE
should be discarded.
-1 - An error occurred
*/
int SEL_IMERGE::or_sel_imerge_with_checks(RANGE_OPT_PARAM *param,
bool remove_jump_scans,
SEL_IMERGE *imerge) {
for (SEL_TREE *tree : imerge->trees) {
int ret = or_sel_tree_with_checks(param, remove_jump_scans, tree);
if (ret != 0) {
return ret;
}
}
return 0;
}
SEL_IMERGE::SEL_IMERGE(SEL_IMERGE *arg, RANGE_OPT_PARAM *param)
: trees(param->temp_mem_root, arg->trees) {}
void trace_quick_description(const AccessPath *path, Opt_trace_context *trace) {
Opt_trace_object range_trace(trace, "range_details");
String range_info;
range_info.set_charset(system_charset_info);
add_info_string(path, &range_info);
range_trace.add_utf8("used_index", range_info.ptr(), range_info.length());
}
QUICK_RANGE::QUICK_RANGE()
: min_key(nullptr),
max_key(nullptr),
min_length(0),
max_length(0),
flag(NO_MIN_RANGE | NO_MAX_RANGE),
rkey_func_flag(HA_READ_INVALID),
min_keypart_map(0),
max_keypart_map(0) {}
QUICK_RANGE::QUICK_RANGE(MEM_ROOT *mem_root, const uchar *min_key_arg,
uint min_length_arg, key_part_map min_keypart_map_arg,
const uchar *max_key_arg, uint max_length_arg,
key_part_map max_keypart_map_arg, uint flag_arg,
enum ha_rkey_function rkey_func_flag_arg)
: min_key(nullptr),
max_key(nullptr),
min_length((uint16)min_length_arg),
max_length((uint16)max_length_arg),
flag((uint16)flag_arg),
rkey_func_flag(rkey_func_flag_arg),
min_keypart_map(min_keypart_map_arg),
max_keypart_map(max_keypart_map_arg) {
min_key = mem_root->ArrayAlloc<uchar>(min_length_arg + 1);
max_key = mem_root->ArrayAlloc<uchar>(max_length_arg + 1);
if (min_key != nullptr) {
memcpy(min_key, min_key_arg, min_length_arg + 1);
}
if (max_key != nullptr) {
memcpy(max_key, max_key_arg, max_length_arg + 1);
}
}
bool setup_range_optimizer_param(THD *thd, MEM_ROOT *return_mem_root,
MEM_ROOT *temp_mem_root, Key_map keys_to_use,
TABLE *table, Query_block *query_block,
RANGE_OPT_PARAM *param) {
param->table = table;
param->query_block = query_block;
param->keys = 0;
param->return_mem_root = return_mem_root;
param->temp_mem_root = temp_mem_root;
param->using_real_indexes = true;
param->use_index_statistics = false;
temp_mem_root->set_max_capacity(thd->variables.range_optimizer_max_mem_size);
temp_mem_root->set_error_for_capacity_exceeded(true);
// These are being stored in AccessPaths, so they need to be on
// return_mem_root.
param->real_keynr = return_mem_root->ArrayAlloc<uint>(table->s->keys);
param->key = return_mem_root->ArrayAlloc<KEY_PART *>(table->s->keys);
param->key_parts = return_mem_root->ArrayAlloc<KEY_PART>(table->s->key_parts);
if (param->real_keynr == nullptr || param->key == nullptr ||
param->key_parts == nullptr) {
return true; // Can't use range
}
KEY_PART *key_parts = param->key_parts;
Opt_trace_context *const trace = &thd->opt_trace;
{
Opt_trace_array trace_idx(trace, "potential_range_indexes",
Opt_trace_context::RANGE_OPTIMIZER);
/*
Make an array with description of all key parts of all table keys.
This is used in get_mm_parts function.
*/
KEY *key_info = table->key_info;
for (uint idx = 0; idx < table->s->keys; idx++, key_info++) {
Opt_trace_object trace_idx_details(trace);
trace_idx_details.add_utf8("index", key_info->name);
KEY_PART_INFO *key_part_info;
if (!keys_to_use.is_set(idx)) {
trace_idx_details.add("usable", false)
.add_alnum("cause", "not_applicable");
continue;
}
if (hint_key_state(thd, table->pos_in_table_list, idx, NO_RANGE_HINT_ENUM,
0)) {
trace_idx_details.add("usable", false)
.add_alnum("cause", "no_range_optimization hint");
continue;
}
if (key_info->flags & HA_FULLTEXT) {
trace_idx_details.add("usable", false).add_alnum("cause", "fulltext");
continue; // ToDo: ft-keys in non-ft ranges, if possible SerG
}
trace_idx_details.add("usable", true);
param->key[param->keys] = key_parts;
key_part_info = key_info->key_part;
Opt_trace_array trace_keypart(trace, "key_parts");
for (uint part = 0; part < actual_key_parts(key_info);
part++, key_parts++, key_part_info++) {
key_parts->key = param->keys;
key_parts->part = part;
key_parts->length = key_part_info->length;
key_parts->store_length = key_part_info->store_length;
key_parts->field = key_part_info->field;
key_parts->null_bit = key_part_info->null_bit;
key_parts->image_type = (part < key_info->user_defined_key_parts &&
key_info->flags & HA_SPATIAL)
? Field::itMBR
: Field::itRAW;
/* Only HA_PART_KEY_SEG is used */
key_parts->flag = key_part_info->key_part_flag;
trace_keypart.add_utf8(
get_field_name_or_expression(thd, key_part_info->field));
}
param->real_keynr[param->keys++] = idx;
}
}
param->key_parts_end = key_parts;
return false;
}
/*
Test if a key can be used in different ranges, and create the QUICK
access method (range, index merge etc) that is estimated to be
cheapest unless table/index scan is even cheaper (exception: @see
parameter force_quick_range).
SYNOPSIS
test_quick_select()
thd Current thread
return_mem_root MEM_ROOT to allocate AccessPaths, RowIterators and
dependent information on (i.e., permanent artifacts
that must live on after the range optimizer
has finished executing).
temp_mem_root MEM_ROOT to use for temporary data. Should usually
be empty on entry, as we will set memory limits
on it. The primary reason why it's declared in the
caller is that DynamicRangeIterator can clear it
and reuse its memory between calls.
keys_to_use Keys to use for range retrieval
prev_tables Tables assumed to be already read when the scan is
performed (but not read at the moment of this call),
including const tables. Otherwise 0.
read_tables If invoked during execution: tables already read
for this join (so values can be assumed to be present).
Otherwise 0.
limit Query limit
force_quick_range Prefer to use range (instead of full table scan) even
if it is more expensive.
interesting_order The sort order the range access method must be able
to provide. Three-value logic: asc/desc/don't care
table The table to optimize over.
skip_records_in_range
Same as QEP_TAB::m_skip_records_in_range.
cond The condition to optimize for, if any.
needed_reg this info is used in make_join_query_block() even if
there is no quick.
ignore_table_scan Disregard table scan while looking for range.
query_block The block the given table is part of.
path [out] Calculated AccessPath, or nullptr.
NOTES
Updates the following:
needed_reg - Bits for keys which may be used if all prev regs are read
In the table struct the following information is updated:
quick_keys - Which keys can be used
quick_rows - How many rows the key matches
quick_condition_rows - E(# rows that will satisfy the table condition)
IMPLEMENTATION
quick_condition_rows value is obtained as follows:
It is a minimum of E(#output rows) for all considered table access
methods (range and index_merge accesses over various indexes).
The obtained value is not a true E(#rows that satisfy table condition)
but rather a pessimistic estimate. To obtain a true E(#...) one would
need to combine estimates of various access methods, taking into account
correlations between sets of rows they will return.
For example, if values of tbl.key1 and tbl.key2 are independent (a right
assumption if we have no information about their correlation) then the
correct estimate will be:
E(#rows("tbl.key1 < c1 AND tbl.key2 < c2")) =
= E(#rows(tbl.key1 < c1)) / total_rows(tbl) * E(#rows(tbl.key2 < c2))
which is smaller than
MIN(E(#rows(tbl.key1 < c1)), E(#rows(tbl.key2 < c2)))
which is currently produced.
TODO
* Change the value returned in quick_condition_rows from a pessimistic
estimate to true E(#rows that satisfy table condition).
(we can re-use some of E(#rows) calculation code from
index_merge/intersection for this)
* Check if this function really needs to modify keys_to_use, and change the
code to pass it by reference if it doesn't.
* In addition to force_quick_range other means can be (and usually are) used
to make this function prefer range over full table scan. Figure out if
force_quick_range is really needed.
RETURN
-1 if impossible select (i.e. certainly no rows will be selected)
0 if can't use quick_select
1 if found usable ranges and quick select has been successfully created.
@note After this call, caller may decide to really use the returned QUICK,
by calling QEP_TAB::set_range_scan() and updating tab->type() if appropriate.
*/
int test_quick_select(THD *thd, MEM_ROOT *return_mem_root,
MEM_ROOT *temp_mem_root, Key_map keys_to_use,
table_map prev_tables, table_map read_tables,
ha_rows limit, bool force_quick_range,
const enum_order interesting_order, TABLE *table,
bool skip_records_in_range, Item *cond,
Key_map *needed_reg, bool ignore_table_scan,
Query_block *query_block, AccessPath **path) {
DBUG_TRACE;
*path = nullptr;
needed_reg->clear_all();
if (keys_to_use.is_clear_all()) return 0;
DBUG_PRINT("enter", ("keys_to_use: %lu prev_tables: %lu ",
(ulong)keys_to_use.to_ulonglong(), (ulong)prev_tables));
const Cost_model_server *const cost_model = thd->cost_model();
ha_rows records = table->file->stats.records;
if (!records) records++; /* purecov: inspected */
double scan_time =
cost_model->row_evaluate_cost(static_cast<double>(records)) + 1;
Cost_estimate cost_est = table->file->table_scan_cost();
cost_est.add_io(1.1);
cost_est.add_cpu(scan_time);
if (ignore_table_scan) {
scan_time = DBL_MAX;
cost_est.set_max_cost();
}
if (limit < records) {
cost_est.reset();
// Force to use index
cost_est.add_io(
table->cost_model()->page_read_cost(static_cast<double>(records)) + 1);
cost_est.add_cpu(scan_time);
} else if (cost_est.total_cost() <= 2.0 && !force_quick_range)
return 0; /* No need for quick select */
Opt_trace_context *const trace = &thd->opt_trace;
Opt_trace_object trace_range(trace, "range_analysis");
Opt_trace_object(trace, "table_scan")
.add("rows", table->file->stats.records)
.add("cost", cost_est);
keys_to_use.intersect(table->keys_in_use_for_query);
if (keys_to_use.is_clear_all()) return 0;
/*
Use the 3 multiplier as range optimizer allocates big RANGE_OPT_PARAM
structure and may evaluate a subquery expression
TODO During the optimization phase we should evaluate only inexpensive
single-lookup subqueries.
*/
uchar buff[STACK_BUFF_ALLOC];
if (check_stack_overrun(thd, 3 * STACK_MIN_SIZE + sizeof(RANGE_OPT_PARAM),
buff)) {
return 0; // Fatal error flag is set
}
/* set up parameter that is passed to all functions */
RANGE_OPT_PARAM param;
if (setup_range_optimizer_param(thd, return_mem_root, temp_mem_root,
keys_to_use, table, query_block, &param)) {
return 0;
}
thd->push_internal_handler(&param.error_handler);
auto cleanup = create_scope_guard([thd] { thd->pop_internal_handler(); });
/*
Set index_merge_allowed from OPTIMIZER_SWITCH_INDEX_MERGE.
Notice also that OPTIMIZER_SWITCH_INDEX_MERGE disables all
index merge sub strategies.
*/
const bool index_merge_allowed =
thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE);
const bool index_merge_union_allowed =
index_merge_allowed &&
thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_UNION);
const bool index_merge_sort_union_allowed =
index_merge_allowed &&
thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION);
const bool index_merge_intersect_allowed =
index_merge_allowed &&
thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT);
/* Calculate cost of full index read for the shortest covering index */
if (!table->covering_keys.is_clear_all()) {
int key_for_use = find_shortest_key(table, &table->covering_keys);
// find_shortest_key() should return a valid key:
assert(key_for_use != MAX_KEY);
Cost_estimate key_read_time = param.table->file->index_scan_cost(
key_for_use, 1, static_cast<double>(records));
key_read_time.add_cpu(
cost_model->row_evaluate_cost(static_cast<double>(records)));
bool chosen = false;
if (key_read_time < cost_est) {
cost_est = key_read_time;
chosen = true;
}
Opt_trace_object trace_cov(trace, "best_covering_index_scan",
Opt_trace_context::RANGE_OPTIMIZER);
trace_cov.add_utf8("index", table->key_info[key_for_use].name)
.add("cost", key_read_time)
.add("chosen", chosen);
if (!chosen) trace_cov.add_alnum("cause", "cost");
}
AccessPath *best_path = nullptr;
double best_cost = cost_est.total_cost();
SEL_TREE *tree = nullptr;
if (cond) {
{
Opt_trace_array trace_setup_cond(trace, "setup_range_conditions");
tree = get_mm_tree(thd, &param, prev_tables | INNER_TABLE_BIT,
read_tables | INNER_TABLE_BIT,
table->pos_in_table_list->map(),
/*remove_jump_scans=*/true, cond);
}
if (tree) {
if (tree->type == SEL_TREE::IMPOSSIBLE) {
trace_range.add("impossible_range", true);
cost_est.reset();
cost_est.add_io(static_cast<double>(HA_POS_ERROR));
return -1;
}
/*
If the tree can't be used for range scans, proceed anyway, as we
can construct a group-min-max quick select
*/
if (tree->type != SEL_TREE::KEY) {
trace_range.add("range_scan_possible", false);
if (tree->type == SEL_TREE::ALWAYS)
trace_range.add_alnum("cause", "condition_always_true");
tree = nullptr;
}
}
}
/*
Try to construct a GroupIndexSkipScanIterator.
Notice that it can be constructed no matter if there is a range tree.
*/
AccessPath *group_path = get_best_group_skip_scan(
thd, &param, tree, interesting_order, skip_records_in_range, best_cost);
if (group_path) {
DBUG_EXECUTE_IF("force_lis_for_group_by", group_path->set_cost(0.0););
param.table->quick_condition_rows =
min<double>(group_path->num_output_rows(), table->file->stats.records);
Opt_trace_object grp_summary(trace, "best_group_range_summary",
Opt_trace_context::RANGE_OPTIMIZER);
if (unlikely(trace->is_started()))
trace_basic_info(thd, group_path, &param, &grp_summary);
if (group_path->cost() < best_cost) {
grp_summary.add("chosen", true);
best_path = group_path;
best_cost = best_path->cost();
} else
grp_summary.add("chosen", false).add_alnum("cause", "cost");
}
bool force_skip_scan = hint_table_state(thd, param.table->pos_in_table_list,
SKIP_SCAN_HINT_ENUM, 0);
if (thd->optimizer_switch_flag(OPTIMIZER_SKIP_SCAN) || force_skip_scan) {
AccessPath *skip_scan_path =
get_best_skip_scan(thd, &param, tree, interesting_order,
skip_records_in_range, force_skip_scan);
if (skip_scan_path) {
param.table->quick_condition_rows = min<double>(
skip_scan_path->num_output_rows(), table->file->stats.records);
Opt_trace_object summary(trace, "best_skip_scan_summary",
Opt_trace_context::RANGE_OPTIMIZER);
if (unlikely(trace->is_started()))
trace_basic_info(thd, skip_scan_path, &param, &summary);
if (skip_scan_path->cost() < best_cost || force_skip_scan) {
summary.add("chosen", true);
best_path = skip_scan_path;
best_cost = best_path->cost();
} else
summary.add("chosen", false).add_alnum("cause", "cost");
}
}
if (tree && (best_path == nullptr || !get_forced_by_hint(best_path))) {
/*
It is possible to use a range-based quick select (but it might be
slower than 'all' table scan).
*/
dbug_print_tree("final_tree", tree, &param);
{
/*
Calculate cost of single index range scan and possible
intersections of these
*/
Opt_trace_object trace_range_alt(trace, "analyzing_range_alternatives",
Opt_trace_context::RANGE_OPTIMIZER);
AccessPath *range_path = get_key_scans_params(
thd, &param, tree, false, true, interesting_order,
skip_records_in_range, best_cost, /*ror_only=*/false, needed_reg);
/* Get best 'range' plan and prepare data for making other plans */
if (range_path) {
best_path = range_path;
best_cost = best_path->cost();
}
/*
Simultaneous key scans and row deletes on several handler
objects are not allowed so don't use ROR-intersection for
table deletes. Also, ROR-intersection cannot return rows in
descending order
*/
if ((thd->lex->sql_command != SQLCOM_DELETE) &&
(index_merge_allowed ||
hint_table_state(thd, param.table->pos_in_table_list,
INDEX_MERGE_HINT_ENUM, 0)) &&
interesting_order != ORDER_DESC) {
/*
Get best non-covering ROR-intersection plan and prepare data for
building covering ROR-intersection.
*/
AccessPath *rori_path = get_best_ror_intersect(
thd, &param, table, index_merge_intersect_allowed, tree, best_cost,
/*force_index_merge_result=*/true, /*reuse_handler=*/true);
if (rori_path) {
best_path = rori_path;
best_cost = best_path->cost();
}
}
}
// Here we calculate cost of union index merge
if (!tree->merges.is_empty()) {
// Cannot return rows in descending order.
if ((index_merge_allowed ||
hint_table_state(thd, param.table->pos_in_table_list,
INDEX_MERGE_HINT_ENUM, 0)) &&
interesting_order != ORDER_DESC && param.table->file->stats.records) {
/* Try creating index_merge/ROR-union scan. */
AccessPath *best_conj_path = nullptr, *new_conj_path = nullptr;
Opt_trace_array trace_idx_merge(trace, "analyzing_index_merge_union",
Opt_trace_context::RANGE_OPTIMIZER);
// Buffer for index_merge cost estimates.
for (SEL_IMERGE &imerge : tree->merges) {
new_conj_path = get_best_disjunct_quick(
thd, &param, table, index_merge_union_allowed,
index_merge_sort_union_allowed, index_merge_intersect_allowed,
skip_records_in_range, &imerge, best_cost, needed_reg);
if (new_conj_path)
param.table->quick_condition_rows =
min<double>(param.table->quick_condition_rows,
new_conj_path->num_output_rows());
if (best_conj_path == nullptr ||
(new_conj_path != nullptr &&
new_conj_path->cost() < best_conj_path->cost())) {
best_conj_path = new_conj_path;
}
}
if (best_conj_path) best_path = best_conj_path;
}
}
}
/*
If we got a read plan, return it, but only if the storage engine supports
using indexes for access.
*/
if (best_path && (table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0) {
records = best_path->num_output_rows();
*path = best_path;
}
if (unlikely(trace->is_started() && best_path)) {
Opt_trace_object trace_range_summary(trace, "chosen_range_access_summary");
{
Opt_trace_object trace_range_plan(trace, "range_access_plan");
trace_basic_info(thd, best_path, &param, &trace_range_plan);
}
trace_range_summary.add("rows_for_plan", best_path->num_output_rows())
.add("cost_for_plan", best_path->cost())
.add("chosen", true);
}
DBUG_EXECUTE("info", print_quick(*path, needed_reg););
if (records == 0) {
return -1;
} else {
return *path != nullptr;
}
}
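/*
  Illustration only: the header comment of test_quick_select() contrasts the
  estimate for quick_condition_rows that assumes independent key columns with
  the MIN() value that is currently produced. The two formulas can be
  sketched as below; independent_estimate() and pessimistic_estimate() are
  hypothetical names, not functions of the range optimizer.
*/

```cpp
#include <algorithm>
#include <cassert>

// E(#rows) for "key1 < c1 AND key2 < c2" if the two columns are independent:
// E(#rows(key1 < c1)) / total_rows * E(#rows(key2 < c2)).
inline double independent_estimate(double rows_key1, double rows_key2,
                                   double total_rows) {
  assert(total_rows > 0.0);
  return rows_key1 / total_rows * rows_key2;
}

// The pessimistic value described in the comment: the minimum over the
// per-index estimates.
inline double pessimistic_estimate(double rows_key1, double rows_key2) {
  return std::min(rows_key1, rows_key2);
}
```

/*
  For a table of 10000 rows where the key1 condition matches about 1000 rows
  and the key2 condition about 500, the independence formula gives roughly 50
  rows, while the MIN() form gives 500.
*/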
/**
Helper function for get_best_disjunct_quick(), dealing with the case of
creating a ROR union. Returns nullptr if either an error occurred, or if the
ROR union was found to be more expensive than read_cost (which is presumably
the cost for the index merge plan).
*/
static AccessPath *get_ror_union_path(
THD *thd, RANGE_OPT_PARAM *param, TABLE *table,
bool index_merge_intersect_allowed, SEL_IMERGE *imerge,
const double read_cost, bool force_index_merge,
Bounds_checked_array<AccessPath *> roru_read_plans,
AccessPath **range_scans, Opt_trace_object *trace_best_disjunct) {
double roru_index_cost = 0.0;
ha_rows roru_total_records = 0;
/* Find 'best' ROR scan for each of trees in disjunction */
double roru_intersect_part = 1.0;
{
Opt_trace_context *const trace = &thd->opt_trace;
Opt_trace_array trace_analyze_ror(trace, "analyzing_roworder_scans");
AccessPath **cur_child = range_scans;
AccessPath **cur_roru_plan = &roru_read_plans[0];
for (auto tree_it = imerge->trees.begin(); tree_it != imerge->trees.end();
tree_it++, cur_child++, cur_roru_plan++) {
Opt_trace_object path(trace);
if (unlikely(trace->is_started()))
trace_basic_info(thd, *cur_child, param, &path);
const auto &child_param = (*cur_child)->index_range_scan();
/*
Assume the best ROR scan is the one that has cheapest
full-row-retrieval scan cost.
Also accumulate index_only scan costs as we'll need them to
calculate overall index_intersection cost.
*/
double scan_cost = 0.0;
if (child_param.can_be_used_for_ror) {
/* Ok, we have index_only cost, now get full rows scan cost */
scan_cost = table->file
->read_cost(child_param.index, 1,
(*cur_child)->num_output_rows())
.total_cost();
scan_cost += table->cost_model()->row_evaluate_cost(
(*cur_child)->num_output_rows());
} else
scan_cost = read_cost;
AccessPath *prev_plan = *cur_child;
if (!(*cur_roru_plan = get_best_ror_intersect(
thd, param, table, index_merge_intersect_allowed, *tree_it,
scan_cost,
/*force_index_merge_result=*/false, /*reuse_handler=*/false))) {
if (child_param.can_be_used_for_ror)
*cur_roru_plan = prev_plan;
else
return nullptr;
}
roru_index_cost += (*cur_roru_plan)->cost();
roru_total_records += (*cur_roru_plan)->num_output_rows();
roru_intersect_part *=
(*cur_roru_plan)->num_output_rows() / table->file->stats.records;
}
}
/*
rows to retrieve=
SUM(rows_in_scan_i) - table_rows * PROD(rows_in_scan_i / table_rows).
This is valid because index_merge construction guarantees that conditions
in disjunction do not share key parts.
*/
roru_total_records -=
static_cast<ha_rows>(roru_intersect_part * table->file->stats.records);
/* ok, got a ROR read plan for each of the disjuncts
Calculate cost:
cost(index_union_scan(scan_1, ... scan_n)) =
SUM_i(cost_of_index_only_scan(scan_i)) +
queue_use_cost(rowid_len, n) +
cost_of_row_retrieval
See get_merge_buffers_cost function for queue_use_cost formula derivation.
*/
double roru_total_cost;
{
JOIN *join = param->query_block->join;
const bool is_interrupted = join && join->tables != 1;
Cost_estimate sweep_cost;
get_sweep_read_cost(table, roru_total_records, is_interrupted, &sweep_cost);
roru_total_cost = sweep_cost.total_cost();
roru_total_cost += roru_index_cost;
roru_total_cost += table->cost_model()->key_compare_cost(
rows2double(roru_total_records) * std::log2(roru_read_plans.size()));
}
trace_best_disjunct->add("index_roworder_union_cost", roru_total_cost)
.add("members", roru_read_plans.size());
if (roru_total_cost < read_cost || force_index_merge) {
trace_best_disjunct->add("chosen", true);
auto *children = new (param->return_mem_root)
Mem_root_array<AccessPath *>(param->return_mem_root);
children->reserve(roru_read_plans.size());
for (AccessPath *child : roru_read_plans) {
// NOTE: This overwrites parameters in paths that may be used
// for something else, but since we've already decided that
// we are to choose a ROR union, it doesn't matter. If we are
// to keep multiple candidates around, we need to clone the
// AccessPaths here.
switch (child->type) {
case AccessPath::INDEX_RANGE_SCAN:
child->index_range_scan().need_rows_in_rowid_order = true;
break;
case AccessPath::ROWID_INTERSECTION:
child->rowid_intersection().need_rows_in_rowid_order = true;
child->rowid_intersection().retrieve_full_rows = false;
break;
default:
assert(false);
}
children->push_back(child);
}
AccessPath *path = new (param->return_mem_root) AccessPath;
path->type = AccessPath::ROWID_UNION;
path->set_cost(roru_total_cost);
path->set_num_output_rows(roru_total_records);
path->rowid_union().table = table;
path->rowid_union().children = children;
path->rowid_union().forced_by_hint = force_index_merge;
return path;
}
return nullptr;
}
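/*
  Illustration only, not part of the optimizer: a minimal standalone sketch of
  the ROR-union row estimate used above,
    SUM(rows_in_scan_i) - table_rows * PROD(rows_in_scan_i / table_rows).
  The helper name estimate_ror_union_rows, its parameter types and the guards
  for an empty table and a negative result are assumptions made for this
  sketch; the real code accumulates the same quantities in roru_total_records
  and roru_intersect_part.

  ```cpp
  #include <cmath>
  #include <cstdint>
  #include <vector>

  // Hypothetical helper (not a MySQL function): estimated number of rows a
  // ROR-union has to fetch, given the row estimate of each child scan.
  std::uint64_t estimate_ror_union_rows(const std::vector<double> &scan_rows,
                                        std::uint64_t table_rows) {
    if (table_rows == 0) return 0;  // guard added for the sketch only
    const double records = static_cast<double>(table_rows);
    double total = 0.0;             // SUM(rows_in_scan_i)
    double intersect_part = 1.0;    // PROD(rows_in_scan_i / table_rows)
    for (double rows : scan_rows) {
      total += rows;
      intersect_part *= rows / records;
    }
    // Rows expected to match every disjunct are subtracted from the sum.
    const double result = total - std::floor(intersect_part * records);
    return result > 0.0 ? static_cast<std::uint64_t>(result) : 0;
  }
  ```

  With two scans of 250 and 500 rows on a 1000-row table the estimate is
  750 - 1000 * (0.25 * 0.5) = 625 rows.
*/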
/*
Get best plan for a SEL_IMERGE disjunctive expression.
SYNOPSIS
get_best_disjunct_quick()
param Parameter from check_quick_select function
index_merge_union_allowed
index_merge_sort_union_allowed
index_merge_intersect_allowed
interesting_order The sort order the range access method must be able
to provide. Three-value logic: asc/desc/don't care
skip_records_in_range Same value as JOIN_TAB::skip_records_in_range().
imerge Expression to use
imerge_cost_buff Buffer for index_merge cost estimates
cost_est Don't create scans with cost > cost_est
needed_reg [out] Bits for keys which may be used if all prev regs are read
NOTES
index_merge cost is calculated as follows:
index_merge_cost =
cost(index_reads) + (see #1)
cost(rowid_to_row_scan) + (see #2)
cost(unique_use) (see #3)
1. cost(index_reads) = SUM_i(cost(index_read_i))
For non-CPK scans,
cost(index_read_i) = {cost of ordinary 'index only' scan}
For CPK scan,
cost(index_read_i) = {cost of non-'index only' scan}
2. cost(rowid_to_row_scan)
If table PK is clustered then
cost(rowid_to_row_scan) =
{cost of ordinary clustered PK scan with n_ranges=n_rows}
Otherwise, we use the following model to calculate costs:
We need to retrieve n_rows rows from file that occupies n_blocks blocks.
We assume that offsets of rows we need are independent variates with
uniform distribution in [0..max_file_offset] range.
We'll call a block "busy" if it contains row(s) we need to retrieve
and "empty" if it doesn't contain rows we need.
Probability that a block is empty is (1 - 1/n_blocks)^n_rows (this
applies to any block in file). Let x_i be a variate taking value 1 if
block #i is empty and 0 otherwise.
Then E(x_i) = (1 - 1/n_blocks)^n_rows;
E(n_empty_blocks) = E(sum(x_i)) = sum(E(x_i))
= n_blocks * ((1 - 1/n_blocks)^n_rows)
~= n_blocks * exp(-n_rows/n_blocks).
E(n_busy_blocks) = n_blocks * (1 - (1 - 1/n_blocks)^n_rows)
~= n_blocks * (1 - exp(-n_rows/n_blocks)).
Average size of a "hole" between neighboring non-empty blocks is
E(hole_size) = n_blocks/E(n_busy_blocks).
The total cost of reading all needed blocks in one "sweep" is:
E(n_busy_blocks) * disk_seek_cost(n_blocks/E(n_busy_blocks))
This cost estimate is calculated in get_sweep_read_cost().
3. Cost of Unique use is calculated in Unique::get_use_cost function.
ROR-union cost is calculated in the same way as index_merge, but instead of
Unique a priority queue is used.
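Illustration only: a minimal standalone sketch of the block-occupancy model
from item 2 above, using the exponential approximation. The helper names
expected_busy_blocks and expected_hole_size are hypothetical and exist only
in this sketch; the actual estimate is computed by get_sweep_read_cost().

```cpp
#include <cmath>

// Hypothetical helper: E(n_busy_blocks) ~= n_blocks * (1 - exp(-n_rows/n_blocks)).
double expected_busy_blocks(double n_blocks, double n_rows) {
  if (n_blocks <= 0.0) return 0.0;  // guard added for the sketch only
  return n_blocks * (1.0 - std::exp(-n_rows / n_blocks));
}

// Hypothetical helper: E(hole_size) = n_blocks / E(n_busy_blocks).
double expected_hole_size(double n_blocks, double n_rows) {
  const double busy = expected_busy_blocks(n_blocks, n_rows);
  return busy > 0.0 ? n_blocks / busy : 0.0;  // no busy blocks, no holes
}
```

For n_blocks = 1000 and n_rows = 1000 this gives about 632 busy blocks and
an average hole size of about 1.58 blocks.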
RETURN
Created read plan
NULL - Out of memory or no read scan could be built.
*/
static AccessPath *get_best_disjunct_quick(
THD *thd, RANGE_OPT_PARAM *param, TABLE *table,
bool index_merge_union_allowed, bool index_merge_sort_union_allowed,
bool index_merge_intersect_allowed, bool skip_records_in_range,
SEL_IMERGE *imerge, const double cost_est, Key_map *needed_reg) {
double imerge_cost = 0.0;
ha_rows cpk_scan_records = 0;
ha_rows non_cpk_scan_records = 0;
bool all_scans_ror_able = true;
const Cost_model_table *const cost_model = table->cost_model();