Fix LIMIT and OFFSET when they are used in a subquery
This fixes LIMIT and OFFSET when they are used in a subquery whose
grouping differs from the grouping of the outer query. When organizing
tag sets, the grouping of the outer query is used so that the final
result is in the correct order. Unfortunately, the optimization then
limited the number of points based on the grouping of the outer query
rather than the grouping of the inner query.

The ideal solution would be to use the outer grouping and then further
organize the tag sets by the grouping of the inner subquery, but that is
more difficult to do at the moment. As an easier fix, the query engine
now limits the output of each series. This may make these types of
queries slower in some situations, such as this one:

    SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1)

This will be slower when the `cpu` measurement has high cardinality
and many different tags.

This also fixes `last()` and `first()` when they are used in a subquery
because those functions use `LIMIT 1` as an internal optimization.
jsternberg committed Feb 6, 2017
1 parent 4d97dd4 commit 95831b3
Showing 2 changed files with 24 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@
 - [#7906](https://github.com/influxdata/influxdb/issues/7906): Anchors not working as expected with case-insensitive regex
 - [#7895](https://github.com/influxdata/influxdb/issues/7895): Fix incorrect math when aggregates that emit different times are used.
 - [#7946](https://github.com/influxdata/influxdb/issues/7946): Fix authentication when subqueries are present.
+- [#7885](https://github.com/influxdata/influxdb/issues/7885): Fix LIMIT and OFFSET when they are used in a subquery.

 ## v1.2.0 [2017-01-24]

24 changes: 23 additions & 1 deletion tsdb/engine/tsm1/engine.go
@@ -1349,14 +1349,36 @@ func (e *Engine) createVarRefIterator(measurement string, opt influxql.IteratorO
 		continue
 	}

+	// If we have a LIMIT or OFFSET and the grouping of the outer query
+	// is different than the current grouping, we need to perform the
+	// limit on each of the individual series keys instead to improve
+	// performance.
+	if (opt.Limit > 0 || opt.Offset > 0) && len(opt.Dimensions) != len(opt.GroupBy) {
+		for i, input := range inputs {
+			inputs[i] = newLimitIterator(input, opt)
+		}
+	}
+
 	itr, err := influxql.Iterators(inputs).Merge(opt)
 	if err != nil {
 		influxql.Iterators(inputs).Close()
 		return err
 	}

+	// Apply a limit on the merged iterator.
 	if opt.Limit > 0 || opt.Offset > 0 {
-		itr = newLimitIterator(itr, opt)
+		if len(opt.Dimensions) == len(opt.GroupBy) {
+			// When the final dimensions and the current grouping are
+			// the same, we will only produce one series so we can use
+			// the faster limit iterator.
+			itr = newLimitIterator(itr, opt)
+		} else {
+			// When the dimensions are different than the current
+			// grouping, we need to account for the possibility there
+			// will be multiple series. The limit iterator in the
+			// influxql package handles that scenario.
+			itr = influxql.NewLimitIterator(itr, opt)
+		}
 	}
 	itrs = append(itrs, itr)
 }