Fix LIMIT and OFFSET when they are used in a subquery

This fixes LIMIT and OFFSET when they are used in a subquery where the grouping of the inner query is different than the grouping of the outer query. When organizing tag sets, the grouping of the outer query is used so the final result is in the correct order. But, unfortunately, the optimization incorrectly limited the number of points based on the grouping in the outer query rather than the grouping in the inner query. The ideal solution would be to use the outer grouping to further organize it by the grouping for the inner subquery, but that's more difficult to do at the moment. As an easier fix, the query engine now limits the output of each series. This may result in these types of queries being slower in some situations like this one: SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1) This will be slower in a situation where the `cpu` measurement has a high cardinality and many different tags. This also fixes `last()` and `first()` when they are used in a subquery because those functions use `LIMIT 1` as an internal optimization.
clebs · Feb 6, 2017 · 95831b3 · 95831b3
1 parent 4d97dd4
commit 95831b3
Show file tree

Hide file tree

Showing 2 changed files with 24 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,7 @@
 - [#7906](https://github.com/influxdata/influxdb/issues/7906): Anchors not working as expected with case-insensitive regex
 - [#7895](https://github.com/influxdata/influxdb/issues/7895): Fix incorrect math when aggregates that emit different times are used.
 - [#7946](https://github.com/influxdata/influxdb/issues/7946): Fix authentication when subqueries are present.
+- [#7885](https://github.com/influxdata/influxdb/issues/7885): Fix LIMIT and OFFSET when they are used in a subquery.
 
 ## v1.2.0 [2017-01-24]
 

diff --git a/tsdb/engine/tsm1/engine.go b/tsdb/engine/tsm1/engine.go
@@ -1349,14 +1349,36 @@ func (e *Engine) createVarRefIterator(measurement string, opt influxql.IteratorO
 				continue
 			}
 
+			// If we have a LIMIT or OFFSET and the grouping of the outer query
+			// is different than the current grouping, we need to perform the
+			// limit on each of the individual series keys instead to improve
+			// performance.
+			if (opt.Limit > 0 || opt.Offset > 0) && len(opt.Dimensions) != len(opt.GroupBy) {
+				for i, input := range inputs {
+					inputs[i] = newLimitIterator(input, opt)
+				}
+			}
+
 			itr, err := influxql.Iterators(inputs).Merge(opt)
 			if err != nil {
 				influxql.Iterators(inputs).Close()
 				return err
 			}
 
+			// Apply a limit on the merged iterator.
 			if opt.Limit > 0 || opt.Offset > 0 {
-				itr = newLimitIterator(itr, opt)
+				if len(opt.Dimensions) == len(opt.GroupBy) {
+					// When the final dimensions and the current grouping are
+					// the same, we will only produce one series so we can use
+					// the faster limit iterator.
+					itr = newLimitIterator(itr, opt)
+				} else {
+					// When the dimensions are different than the current
+					// grouping, we need to account for the possibility there
+					// will be multiple series. The limit iterator in the
+					// influxql package handles that scenario.
+					itr = influxql.NewLimitIterator(itr, opt)
+				}
 			}
 			itrs = append(itrs, itr)
 		}