[SPARK-16425][R] describe() should not fail with non-numeric columns
## What changes were proposed in this pull request?

This PR prevents errors when `summary(df)` is called on a `SparkDataFrame` with non-numeric columns. The failure occurs only in `SparkR`.

**Before**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType;
```

**After**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
SparkDataFrame[summary:string, eruptions:string, waiting:string]
```
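The `After` schema lists only `summary`, `eruptions`, and `waiting`, so the logical `boolean` column appears to be left out of the result rather than aggregated. The following is a minimal base-R sketch of that selection rule on a plain `data.frame`. It does not use SparkR, and `describe_numeric` is a hypothetical helper introduced only for illustration, not part of any library.

```r
# Hypothetical helper (an assumption, not SparkR code): computes
# describe()-style statistics for the numeric columns of a base-R data.frame.
describe_numeric <- function(df) {
  # Keep only numeric columns; logical and character columns are skipped
  # instead of being passed to mean() or sd().
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

  stats <- list(
    count  = function(v) sum(!is.na(v)),
    mean   = mean,
    stddev = sd,
    min    = min,
    max    = max
  )

  # One row per statistic, one string column per numeric input column.
  out <- data.frame(summary = names(stats), stringsAsFactors = FALSE)
  for (col in numeric_cols) {
    out[[col]] <- unname(vapply(stats,
                                function(f) as.character(f(df[[col]])),
                                character(1)))
  }
  out
}

# Same shape of input as the example above, but as a local data.frame.
local_df <- faithful
local_df$boolean <- local_df$waiting == 79

result <- describe_numeric(local_df)
print(names(result))
```

The result keeps the statistics as strings, mirroring the all-`string` schema shown in the `After` output, and the `boolean` column is absent from it.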

## How was this patch tested?

Pass the Jenkins tests with an updated test case.

Author: Dongjoon Hyun <[email protected]>

Closes apache#14096 from dongjoon-hyun/SPARK-16425.
dongjoon-hyun authored and shivaram committed Jul 8, 2016
1 parent f4767bc commit 6aa7d09
Showing 2 changed files with 7 additions and 4 deletions.
3 changes: 1 addition & 2 deletions R/pkg/R/DataFrame.R
@@ -2622,8 +2622,7 @@ setMethod("describe",
 setMethod("describe",
           signature(x = "SparkDataFrame"),
           function(x) {
-            colList <- as.list(c(columns(x)))
-            sdf <- callJMethod(x@sdf, "describe", colList)
+            sdf <- callJMethod(x@sdf, "describe", list())
             dataFrame(sdf)
           })

8 changes: 6 additions & 2 deletions R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -1824,13 +1824,17 @@ test_that("describe() and summarize() on a DataFrame", {
   expect_equal(collect(stats)[2, "age"], "24.5")
   expect_equal(collect(stats)[3, "age"], "7.7781745930520225")
   stats <- describe(df)
-  expect_equal(collect(stats)[4, "name"], "Andy")
+  expect_equal(collect(stats)[4, "name"], NULL)
   expect_equal(collect(stats)[5, "age"], "30")
 
   stats2 <- summary(df)
-  expect_equal(collect(stats2)[4, "name"], "Andy")
+  expect_equal(collect(stats2)[4, "name"], NULL)
   expect_equal(collect(stats2)[5, "age"], "30")
 
+  # SPARK-16425: SparkR summary() fails on column of type logical
+  df <- withColumn(df, "boolean", df$age == 30)
+  summary(df)
+
   # Test base::summary is working
   expect_equal(length(summary(attenu, digits = 4)), 35)
 })
