Skip to content

Commit

Permalink
[SPARK-16291][SQL] CheckAnalysis should capture nested aggregate func…
Browse files Browse the repository at this point in the history
…tions that reference no input attributes

## What changes were proposed in this pull request?

`MAX(COUNT(*))` is invalid since aggregate expression can't be nested within another aggregate expression. This case should be captured at analysis phase, but somehow sneaks off to runtime.

The reason is that when checking aggregate expressions in `CheckAnalysis`, a checking branch treats all expressions that reference no input attributes as valid ones. However, `MAX(COUNT(*))` is translated into `MAX(COUNT(1))` at analysis phase and also references no input attribute.

This PR fixes this issue by removing the aforementioned branch.

## How was this patch tested?

New test case added in `AnalysisErrorSuite`.

Author: Cheng Lian <[email protected]>

Closes apache#13968 from liancheng/spark-16291-nested-agg-functions.
  • Loading branch information
liancheng authored and cloud-fan committed Jun 29, 2016
1 parent 757dc2c commit d1e8108
Show file tree
Hide file tree
Showing 3 changed files with 12 additions and 5 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -206,7 +206,6 @@ trait CheckAnalysis extends PredicateHelper {
"Add to group by or wrap in first() (or first_value) if you don't care " +
"which value you get.")
case e if groupingExprs.exists(_.semanticEquals(e)) => // OK
case e if e.references.isEmpty => // OK
case e => e.children.foreach(checkValidAggregateExpression)
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count}
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count, Max}
import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData}
Expand Down Expand Up @@ -162,6 +162,16 @@ class AnalysisErrorSuite extends AnalysisTest {
UnspecifiedFrame)).as('window)),
"Distinct window functions are not supported" :: Nil)

errorTest(
"nested aggregate functions",
testRelation.groupBy('a)(
AggregateExpression(
Max(AggregateExpression(Count(Literal(1)), Complete, isDistinct = false)),
Complete,
isDistinct = false)),
"not allowed to use an aggregate function in the argument of another aggregate function." :: Nil
)

errorTest(
"offset window function",
testRelation2.select(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,8 @@ import java.math.MathContext
import java.sql.Timestamp

import org.apache.spark.AccumulatorSuite
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
import org.apache.spark.sql.catalyst.catalog.{CatalogTestUtils, ExternalCatalog, SessionCatalog}
import org.apache.spark.sql.catalyst.expressions.{ExpressionDescription, SortOrder}
import org.apache.spark.sql.catalyst.expressions.SortOrder
import org.apache.spark.sql.catalyst.plans.logical.Aggregate
import org.apache.spark.sql.catalyst.util.StringUtils
import org.apache.spark.sql.execution.aggregate
Expand Down

0 comments on commit d1e8108

Please sign in to comment.