Fix typos detected by
## What changes were proposed in this pull request?

Typos are hard to catch by visual review alone. Recently, I discovered a very useful tool for finding them, [misspell](

This pull request fixes minor typos detected by [misspell]( except for the false positives. If you would like me to work on other files as well, let me know.

## How was this patch tested?

### before

$ misspell . | grep -v '.js'
R/pkg/R/SQLContext.R:354:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:424:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:445:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:495:43: "definiton" is a misspelling of "definition"
NOTICE-binary:454:16: "containd" is a misspelling of "contained"
R/pkg/R/context.R:46:43: "definiton" is a misspelling of "definition"
R/pkg/R/context.R:74:43: "definiton" is a misspelling of "definition"
R/pkg/R/DataFrame.R:591:48: "persistance" is a misspelling of "persistence"
R/pkg/R/streaming.R:166:44: "occured" is a misspelling of "occurred"
R/pkg/inst/worker/worker.R:65:22: "ouput" is a misspelling of "output"
R/pkg/tests/fulltests/test_utils.R:106:25: "environemnt" is a misspelling of "environment"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/ "existant" is a misspelling of "existent"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/ "existant" is a misspelling of "existent"
common/network-common/src/main/java/org/apache/spark/network/crypto/ "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/ "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/ "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/ "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/ "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/util/ "transfered" is a misspelling of "transferred"
common/unsafe/src/test/scala/org/apache/spark/unsafe/types/UTF8StringPropertyCheckSuite.scala:195:15: "orgin" is a misspelling of "origin"
core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:621:39: "gauranteed" is a misspelling of "guaranteed"
core/src/main/scala/org/apache/spark/status/storeTypes.scala:113:29: "ect" is a misspelling of "etc"
core/src/main/scala/org/apache/spark/storage/DiskStore.scala:282:18: "transfered" is a misspelling of "transferred"
core/src/main/scala/org/apache/spark/util/ListenerBus.scala:64:17: "overriden" is a misspelling of "overridden"
core/src/test/scala/org/apache/spark/ShuffleSuite.scala:211:7: "substracted" is a misspelling of "subtracted"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:1922:49: "agriculteur" is a misspelling of "agriculture"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:2468:84: "truely" is a misspelling of "truly"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:25:18: "persistance" is a misspelling of "persistence"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:26:69: "persistance" is a misspelling of "persistence"
data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"
dev/run-pip-tests:55:28: "enviroments" is a misspelling of "environments"
dev/run-pip-tests:91:37: "virutal" is a misspelling of "virtual"
dev/ "accross" is a misspelling of "across"
dev/ "accross" is a misspelling of "across"
dev/run-pip-tests:126:25: "enviroments" is a misspelling of "environments"
docs/ "overriden" is a misspelling of "overridden"
docs/ "processs" is a misspelling of "processes"
docs/ "BETWEN" is a misspelling of "BETWEEN"
docs/ "behaivor" is a misspelling of "behavior"
examples/src/main/python/sql/ "substract" is a misspelling of "subtract"
examples/src/main/python/sql/ "substract" is a misspelling of "subtract"
licenses/LICENSE-heapq.txt:5:63: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:6:2: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:262:29: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:262:39: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:269:49: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:269:59: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:274:2: "STICHTING" is a misspelling of "STITCHING"
licenses/LICENSE-heapq.txt:274:12: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses/LICENSE-heapq.txt:276:29: "STICHTING" is a misspelling of "STITCHING"
licenses/LICENSE-heapq.txt:276:39: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses-binary/LICENSE-heapq.txt:5:63: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:6:2: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:262:29: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:262:39: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:269:49: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:269:59: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:274:2: "STICHTING" is a misspelling of "STITCHING"
licenses-binary/LICENSE-heapq.txt:274:12: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses-binary/LICENSE-heapq.txt:276:29: "STICHTING" is a misspelling of "STITCHING"
licenses-binary/LICENSE-heapq.txt:276:39: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/hungarian.txt:170:0: "teh" is a misspelling of "the"
mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/portuguese.txt:53:0: "eles" is a misspelling of "eels"
mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala:99:20: "Euclidian" is a misspelling of "Euclidean"
mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala:539:11: "Euclidian" is a misspelling of "Euclidean"
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala:77:36: "Teh" is a misspelling of "The"
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala:230:24: "inital" is a misspelling of "initial"
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala:276:9: "Euclidian" is a misspelling of "Euclidean"
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala:237:26: "descripiton" is a misspelling of "descriptions"
python/pyspark/ "enviroment" is a misspelling of "environment"
python/pyspark/ "supress" is a misspelling of "suppress"
python/pyspark/ "supress" is a misspelling of "suppress"
python/pyspark/ "supress" is a misspelling of "suppress"
python/pyspark/ "supress" is a misspelling of "suppress"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "STICHTING" is a misspelling of "STITCHING"
python/pyspark/ "MATHEMATISCH" is a misspelling of "MATHEMATICS"
python/pyspark/ "STICHTING" is a misspelling of "STITCHING"
python/pyspark/ "MATHEMATISCH" is a misspelling of "MATHEMATICS"
python/pyspark/ "probabilty" is a misspelling of "probability"
python/pyspark/ml/ "Currenlty" is a misspelling of "Currently"
python/pyspark/ml/ "Euclidian" is a misspelling of "Euclidean"
python/pyspark/ml/ "paramter" is a misspelling of "parameter"
python/pyspark/mllib/stat/ "probabilty" is a misspelling of "probability"
python/pyspark/ "paramter" is a misspelling of "parameter"
python/pyspark/streaming/ "retuns" is a misspelling of "returns"
python/pyspark/sql/ "initalization" is a misspelling of "initialization"
python/pyspark/sql/ "initalize" is a misspelling of "initialize"
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala:120:39: "arbitary" is a misspelling of "arbitrary"
resource-managers/mesos/src/test/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcherArgumentsSuite.scala:26:45: "sucessfully" is a misspelling of "successfully"
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala:358:27: "constaints" is a misspelling of "constraints"
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala:111:24: "senstive" is a misspelling of "sensitive"
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:1063:5: "overwirte" is a misspelling of "overwrite"
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:1348:17: "compatability" is a misspelling of "compatibility"
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:77:36: "paramter" is a misspelling of "parameter"
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:1374:22: "precendence" is a misspelling of "precedence"
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala:238:27: "unnecassary" is a misspelling of "unnecessary"
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala:212:17: "whn" is a misspelling of "when"
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala:147:60: "timestmap" is a misspelling of "timestamp"
sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala:150:45: "precentage" is a misspelling of "percentage"
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala:135:29: "infered" is a misspelling of "inferred"
sql/hive/src/test/resources/golden/udf_instr-1-2e76f819563dbaba4beb51e3a130b922:1:52: "occurance" is a misspelling of "occurrence"
sql/hive/src/test/resources/golden/udf_instr-2-32da357fc754badd6e3898dcc8989182:1:52: "occurance" is a misspelling of "occurrence"
sql/hive/src/test/resources/golden/udf_locate-1-6e41693c9c6dceea4d7fab4c02884e4e:1:63: "occurance" is a misspelling of "occurrence"
sql/hive/src/test/resources/golden/udf_locate-2-d9b5934457931447874d6bb7c13de478:1:63: "occurance" is a misspelling of "occurrence"
sql/hive/src/test/resources/golden/udf_translate-2-f7aa38a33ca0df73b7a1e6b6da4b7fe8:9:79: "occurence" is a misspelling of "occurrence"
sql/hive/src/test/resources/golden/udf_translate-2-f7aa38a33ca0df73b7a1e6b6da4b7fe8:13:110: "occurence" is a misspelling of "occurrence"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/annotate_stats_join.q:46:105: "distint" is a misspelling of "distinct"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/auto_sortmerge_join_11.q:29:3: "Currenly" is a misspelling of "Currently"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/avro_partitioned.q:72:15: "existant" is a misspelling of "existent"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/decimal_udf.q:25:3: "substraction" is a misspelling of "subtraction"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct.q:16:51: "funtion" is a misspelling of "function"
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/groupby_sort_8.q:15:30: "issueing" is a misspelling of "issuing"
sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala:669:52: "wiht" is a misspelling of "with"
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/ "Refering" is a misspelling of "Referring"

### after

$ misspell . | grep -v '.js'
common/network-common/src/main/java/org/apache/spark/network/util/ "transfered" is a misspelling of "transferred"
core/src/main/scala/org/apache/spark/status/storeTypes.scala:113:29: "ect" is a misspelling of "etc"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:1922:49: "agriculteur" is a misspelling of "agriculture"
data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"
licenses/LICENSE-heapq.txt:5:63: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:6:2: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:262:29: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:262:39: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:269:49: "Stichting" is a misspelling of "Stitching"
licenses/LICENSE-heapq.txt:269:59: "Mathematisch" is a misspelling of "Mathematics"
licenses/LICENSE-heapq.txt:274:2: "STICHTING" is a misspelling of "STITCHING"
licenses/LICENSE-heapq.txt:274:12: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses/LICENSE-heapq.txt:276:29: "STICHTING" is a misspelling of "STITCHING"
licenses/LICENSE-heapq.txt:276:39: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses-binary/LICENSE-heapq.txt:5:63: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:6:2: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:262:29: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:262:39: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:269:49: "Stichting" is a misspelling of "Stitching"
licenses-binary/LICENSE-heapq.txt:269:59: "Mathematisch" is a misspelling of "Mathematics"
licenses-binary/LICENSE-heapq.txt:274:2: "STICHTING" is a misspelling of "STITCHING"
licenses-binary/LICENSE-heapq.txt:274:12: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
licenses-binary/LICENSE-heapq.txt:276:29: "STICHTING" is a misspelling of "STITCHING"
licenses-binary/LICENSE-heapq.txt:276:39: "MATHEMATISCH" is a misspelling of "MATHEMATICS"
mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/hungarian.txt:170:0: "teh" is a misspelling of "the"
mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/portuguese.txt:53:0: "eles" is a misspelling of "eels"
mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala:99:20: "Euclidian" is a misspelling of "Euclidean"
mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala:539:11: "Euclidian" is a misspelling of "Euclidean"
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala:77:36: "Teh" is a misspelling of "The"
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala:276:9: "Euclidian" is a misspelling of "Euclidean"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "Stichting" is a misspelling of "Stitching"
python/pyspark/ "Mathematisch" is a misspelling of "Mathematics"
python/pyspark/ "STICHTING" is a misspelling of "STITCHING"
python/pyspark/ "MATHEMATISCH" is a misspelling of "MATHEMATICS"
python/pyspark/ "STICHTING" is a misspelling of "STITCHING"
python/pyspark/ "MATHEMATISCH" is a misspelling of "MATHEMATICS"
python/pyspark/ml/ "Euclidian" is a misspelling of "Euclidean"
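
The before/after comparison above can also be done mechanically. The following is a minimal sketch, assuming each finding has the `path:line:col: "wrong" is a misspelling of "right"` shape shown in the output; the names `Finding`, `parse_finding` and `fixed_between` are hypothetical helpers for illustration and are not part of misspell or of this patch.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Shape of one finding, as it appears in the output above (an assumption
# about the format, taken from the lines in this message).
FINDING = re.compile(
    r'^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): '
    r'"(?P<wrong>[^"]+)" is a misspelling of "(?P<right>[^"]+)"$'
)


class Finding(NamedTuple):
    path: str
    line: int
    col: int
    wrong: str
    right: str


def parse_finding(text: str) -> Optional[Finding]:
    """Parse one output line; return None if it does not match the shape."""
    m = FINDING.match(text.strip())
    if m is None:
        return None
    return Finding(m["path"], int(m["line"]), int(m["col"]), m["wrong"], m["right"])


def fixed_between(before: Iterable[str], after: Iterable[str]) -> List[Finding]:
    """Findings reported in `before` that no longer appear in `after`."""
    remaining = {f for f in map(parse_finding, after) if f is not None}
    return [
        f
        for f in map(parse_finding, before)
        if f is not None and f not in remaining
    ]


before = [
    'R/pkg/R/streaming.R:166:44: "occured" is a misspelling of "occurred"',
    'data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"',
]
after = [
    'data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"',
]
fixed = fixed_between(before, after)
```

Findings that remain in `after` are the candidates to treat as false positives (for example entries in word lists or license texts). Because the comparison is exact on path, line and column, a finding whose position shifts between runs would be reported as fixed, so this is only a rough aid.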

Closes apache#22070 from seratch/fix-typo.

Authored-by: Kazuhiro Sera <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
seratch authored and srowen committed Aug 12, 2018
1 parent 4855d5c commit 8ec25cd
Showing 61 changed files with 81 additions and 81 deletions.
4 changes: 2 additions & 2 deletions NOTICE-binary
@@ -476,7 +476,7 @@ which has the following notices:
PureJavaCrc32C from apache-hadoop-common
(Apache 2.0 license)

This library containd statically linked libstdc++. This inclusion is allowed by
This library contains statically linked libstdc++. This inclusion is allowed by
"GCC RUntime Library Exception"

@@ -1192,4 +1192,4 @@ Apache Solr (
Copyright 2014 The Apache Software Foundation

Apache Mahout (
Copyright 2014 The Apache Software Foundation
Copyright 2014 The Apache Software Foundation
2 changes: 1 addition & 1 deletion R/pkg/R/DataFrame.R
@@ -588,7 +588,7 @@ setMethod("cache",
#' \url{}.
#' @param x the SparkDataFrame to persist.
#' @param newLevel storage level chosen for the persistance. See available options in
#' @param newLevel storage level chosen for the persistence. See available options in
#' the description.
#' @family SparkDataFrame functions
8 changes: 4 additions & 4 deletions R/pkg/R/SQLContext.R
@@ -351,7 +351,7 @@ setMethod("toDF", signature(x = "RDD"),
read.json.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the text file path
# Allow the user to have a more flexible definition of the text file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
@@ -421,7 +421,7 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
read.orc <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the ORC file path
# Allow the user to have a more flexible definition of the ORC file path
path <- suppressWarnings(normalizePath(path))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
@@ -442,7 +442,7 @@ read.orc <- function(path, ...) {
read.parquet.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the Parquet file path
# Allow the user to have a more flexible definition of the Parquet file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
@@ -492,7 +492,7 @@ parquetFile <- function(x, ...) {
read.text.default <- function(path, ...) {
sparkSession <- getSparkSession()
options <- varargsToStrEnv(...)
# Allow the user to have a more flexible definiton of the text file path
# Allow the user to have a more flexible definition of the text file path
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
4 changes: 2 additions & 2 deletions R/pkg/R/context.R
@@ -43,7 +43,7 @@ getMinPartitions <- function(sc, minPartitions) {
#' lines <- textFile(sc, "myfile.txt")
textFile <- function(sc, path, minPartitions = NULL) {
# Allow the user to have a more flexible definiton of the text file path
# Allow the user to have a more flexible definition of the text file path
path <- suppressWarnings(normalizePath(path))
# Convert a string vector of paths to a string containing comma separated paths
path <- paste(path, collapse = ",")
@@ -71,7 +71,7 @@ textFile <- function(sc, path, minPartitions = NULL) {
#' rdd <- objectFile(sc, "myfile")
objectFile <- function(sc, path, minPartitions = NULL) {
# Allow the user to have a more flexible definiton of the text file path
# Allow the user to have a more flexible definition of the text file path
path <- suppressWarnings(normalizePath(path))
# Convert a string vector of paths to a string containing comma separated paths
path <- paste(path, collapse = ",")
2 changes: 1 addition & 1 deletion R/pkg/R/streaming.R
@@ -163,7 +163,7 @@ setMethod("isActive",
#' @param x a StreamingQuery.
#' @param timeout time to wait in milliseconds, if omitted, wait indefinitely until \code{stopQuery}
#' is called or an error has occured.
#' is called or an error has occurred.
#' @return TRUE if query has terminated within the timeout period; nothing if timeout is not
#' specified.
#' @rdname awaitTermination
2 changes: 1 addition & 1 deletion R/pkg/inst/worker/worker.R
@@ -62,7 +62,7 @@ compute <- function(mode, partition, serializer, deserializer, key,
# Transform the result data.frame back to a list of rows
output <- split(output, seq(nrow(output)))
} else {
# Serialize the ouput to a byte array
# Serialize the output to a byte array
stopifnot(serializer == "byte")
} else {
2 changes: 1 addition & 1 deletion R/pkg/tests/fulltests/test_utils.R
@@ -103,7 +103,7 @@ test_that("cleanClosure on R functions", {
expect_true("l" %in% ls(env))
expect_true("f" %in% ls(env))
expect_equal(get("l", envir = env, inherits = FALSE), l)
# "y" should be in the environemnt of g.
# "y" should be in the environment of g.
newG <- get("g", envir = env, inherits = FALSE)
env <- environment(newG)
expect_equal(length(ls(env)), 1)

@@ -35,7 +35,7 @@ public void testObjectWriteReadDelete() throws Exception {

try {, t.key);
fail("Expected exception for non-existant object.");
fail("Expected exception for non-existent object.");
} catch (NoSuchElementException nsee) {
// Expected.

@@ -80,7 +80,7 @@ public void testObjectWriteReadDelete() throws Exception {

try {, t.key);
fail("Expected exception for non-existant object.");
fail("Expected exception for non-existent object.");
} catch (NoSuchElementException nsee) {
// Expected.

@@ -240,7 +240,7 @@ public boolean release(int decrement) {

public long transferTo(WritableByteChannel target, long position) throws IOException {
Preconditions.checkArgument(position == transfered(), "Invalid position.");
Preconditions.checkArgument(position == transferred(), "Invalid position.");

do {
if (currentEncrypted == null) {

@@ -231,17 +231,17 @@ public boolean release(int decrement) {
* data into memory at once, and can avoid ballooning memory usage when transferring large
* messages such as shuffle blocks.
* The {@link #transfered()} counter also behaves a little funny, in that it won't go forward
* The {@link #transferred()} counter also behaves a little funny, in that it won't go forward
* until a whole chunk has been written. This is done because the code can't use the actual
* number of bytes written to the channel as the transferred count (see {@link #count()}).
* Instead, once an encrypted chunk is written to the output (including its header), the
* size of the original block will be added to the {@link #transfered()} amount.
* size of the original block will be added to the {@link #transferred()} amount.
public long transferTo(final WritableByteChannel target, final long position)
throws IOException {

Preconditions.checkArgument(position == transfered(), "Invalid position.");
Preconditions.checkArgument(position == transferred(), "Invalid position.");

long reportedWritten = 0L;
long actuallyWritten = 0L;
@@ -273,7 +273,7 @@ public long transferTo(final WritableByteChannel target, final long position)
currentChunkSize = 0;
currentReportedBytes = 0;
} while (currentChunk == null && transfered() + reportedWritten < count());
} while (currentChunk == null && transferred() + reportedWritten < count());

// Returning 0 triggers a backoff mechanism in netty which may harm performance. Instead,
// we return 1 until we can (i.e. until the reported count would actually match the size

@@ -192,8 +192,8 @@ class UTF8StringPropertyCheckSuite extends FunSuite with GeneratorDrivenProperty
val nullalbeSeq = Gen.listOf(Gen.oneOf[String](null: String, randomString))

test("concat") {
def concat(orgin: Seq[String]): String =
if (orgin.contains(null)) null else orgin.mkString
def concat(origin: Seq[String]): String =
if (origin.contains(null)) null else origin.mkString

forAll { (inputs: Seq[String]) =>
assert(UTF8String.concat( _*) === toUTF8(inputs.mkString))

@@ -622,7 +622,7 @@ private[spark] class PythonAccumulatorV2(
override def merge(other: AccumulatorV2[Array[Byte], JList[Array[Byte]]]): Unit = synchronized {
val otherPythonAccumulator = other.asInstanceOf[PythonAccumulatorV2]
// This conditional isn't strictly speaking needed - merging only currently happens on the
// driver program - but that isn't gauranteed so incase this changes.
// driver program - but that isn't guaranteed so incase this changes.
if (serverHost == null) {
// We are on the worker

@@ -279,7 +279,7 @@ private class ReadableChannelFileRegion(source: ReadableByteChannel, blockSize:
override def transferred(): Long = _transferred

override def transferTo(target: WritableByteChannel, pos: Long): Long = {
assert(pos == transfered(), "Invalid position.")
assert(pos == transferred(), "Invalid position.")

var written = 0L
var lastWrite = -1L

@@ -61,7 +61,7 @@ private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging {

* This can be overriden by subclasses if there is any extra cleanup to do when removing a
* This can be overridden by subclasses if there is any extra cleanup to do when removing a
* listener. In particular AsyncEventQueues can clean up queues in the LiveListenerBus.
def removeListenerOnError(listener: L): Unit = {
2 changes: 1 addition & 1 deletion core/src/test/scala/org/apache/spark/ShuffleSuite.scala
@@ -208,7 +208,7 @@ abstract class ShuffleSuite extends SparkFunSuite with Matchers with LocalSparkC
val pairs2: RDD[MutablePair[Int, String]] = sc.parallelize(data2, 2)
val results = new SubtractedRDD(pairs1, pairs2, new HashPartitioner(2)).collect()
results should have length (1)
// substracted rdd return results as Tuple2
// subtracted rdd return results as Tuple2
results(0) should be ((3, 33))


@@ -2465,7 +2465,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
taskSets(1).tasks(1), Success, makeMapStatus("hostA", 2)))

// Both tasks in rddB should be resubmitted, because none of them has succeeded truely.
// Both tasks in rddB should be resubmitted, because none of them has succeeded truly.
// Complete the task(stageId=1, stageAttemptId=1, partitionId=0) successfully.
// Task(stageId=1, stageAttemptId=1, partitionId=1) of this new active stage attempt
// is still running.

@@ -22,8 +22,8 @@ import org.apache.spark._
class FlatmapIteratorSuite extends SparkFunSuite with LocalSparkContext {
/* Tests the ability of Spark to deal with user provided iterators from flatMap
* calls, that may generate more data then available memory. In any
* memory based persistance Spark will unroll the iterator into an ArrayBuffer
* for caching, however in the case that the use defines DISK_ONLY persistance,
* memory based persistence Spark will unroll the iterator into an ArrayBuffer
* for caching, however in the case that the use defines DISK_ONLY persistence,
* the iterator will be fed directly to the serializer and written to disk.
* This also tests the ObjectOutputStream reset rate. When serializing using the
4 changes: 2 additions & 2 deletions dev/
@@ -374,8 +374,8 @@ def standardize_jira_ref(text):
>>> standardize_jira_ref("[SPARK-979] a LRU scheduler for load balancing in TaskSchedulerImpl")
'[SPARK-979] a LRU scheduler for load balancing in TaskSchedulerImpl'
>>> standardize_jira_ref(
... "SPARK-1094 Support MiMa for reporting binary compatibility accross versions.")
'[SPARK-1094] Support MiMa for reporting binary compatibility accross versions.'
... "SPARK-1094 Support MiMa for reporting binary compatibility across versions.")
'[SPARK-1094] Support MiMa for reporting binary compatibility across versions.'
>>> standardize_jira_ref("[WIP] [SPARK-1146] Vagrant support for Spark")
'[SPARK-1146][WIP] Vagrant support for Spark'
>>> standardize_jira_ref(
6 changes: 3 additions & 3 deletions dev/run-pip-tests
@@ -52,7 +52,7 @@ if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then
elif hash conda 2>/dev/null; then
echo "Using conda virtual enviroments"
echo "Using conda virtual environments"
@@ -88,7 +88,7 @@ for python in "${PYTHON_EXECS[@]}"; do
virtualenv --python=$python "$VIRTUALENV_PATH"
source "$VIRTUALENV_PATH"/bin/activate
# Upgrade pip & friends if using virutal env
# Upgrade pip & friends if using virtual env
if [ ! -n "$USE_CONDA" ]; then
pip install --upgrade pip pypandoc wheel numpy
@@ -123,7 +123,7 @@ for python in "${PYTHON_EXECS[@]}"; do

cd "$FWDIR"

# conda / virtualenv enviroments need to be deactivated differently
# conda / virtualenv environments need to be deactivated differently
if [ -n "$USE_CONDA" ]; then
source deactivate
2 changes: 1 addition & 1 deletion docs/
@@ -1827,7 +1827,7 @@ Apart from these, the following properties are also available, and may be useful
executors w.r.t. full parallelism.
Defaults to 1.0 to give maximum parallelism.
0.5 will divide the target number of executors by 2
The target number of executors computed by the dynamicAllocation can still be overriden
The target number of executors computed by the dynamicAllocation can still be overridden
by the <code>spark.dynamicAllocation.minExecutors</code> and
<code>spark.dynamicAllocation.maxExecutors</code> settings
2 changes: 1 addition & 1 deletion docs/
@@ -1888,7 +1888,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
- Since Spark 2.4, renaming a managed table to existing location is not allowed. An exception is thrown when attempting to rename a managed table to existing location.
- Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
- Since Spark 2.4, Spark has enabled non-cascading SQL cache invalidation in addition to the traditional cache invalidation mechanism. The non-cascading cache invalidation mechanism allows users to remove a cache without impacting its dependent caches. This new cache invalidation mechanism is used in scenarios where the data of the cache to be removed is still valid, e.g., calling unpersist() on a Dataset, or dropping a temporary view. This allows users to free up memory and keep the desired caches valid at the same time.
- In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behaivor to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
- In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behavior to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
- In version 2.3 and earlier, Spark converts Parquet Hive tables by default but ignores table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. This happens for ORC Hive table properties like `TBLPROPERTIES (orc.compress 'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, too. Since Spark 2.4, Spark respects Parquet/ORC specific table properties while converting Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would generate Snappy parquet files during insertion in Spark 2.3, and in Spark 2.4, the result would be uncompressed parquet files.
- Since Spark 2.0, Spark converts Parquet Hive tables by default for better performance. Since Spark 2.4, Spark converts ORC Hive tables by default, too. It means Spark uses its own ORC support by default instead of Hive SerDe. As an example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's ORC data source table and ORC vectorization would be applied. To set `false` to `spark.sql.hive.convertMetastoreOrc` restores the previous behavior.
- In version 2.3 and earlier, CSV rows are considered as malformed if at least one column value in the row is malformed. CSV parser dropped such rows in the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 2.4, CSV row is considered as malformed only when it contains malformed column values requested from CSV datasource, other values can be ignored. As an example, CSV file contains the "id,name" header and one row "1234". In Spark 2.4, selection of the id column consists of a row with one column value 1234 but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`.
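The `from_utc_timestamp` example in the migration note above can be traced with plain Python `datetime` arithmetic. This is a rough model of the described behavior, not Spark code: `from_utc_timestamp_sketch`, `reject_timezone`, and `local_offset_hours` are hypothetical names, and fixed hour offsets stand in for real time zones.

```python
from datetime import datetime, timedelta, timezone


def from_utc_timestamp_sketch(ts, offset_hours, reject_timezone=True,
                              local_offset_hours=8):
    """Hypothetical model of the behavior described in the note.

    reject_timezone=True mimics the 2.4 description (null when the string
    carries a zone); False mimics the 2.3 description.
    """
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is not None:
        if reject_timezone:
            return None  # 2.4: input strings with a timezone yield null
        # 2.3: the zone in the string is honored, so the value is first
        # converted to the local wall-clock time (GMT+8 assumed here).
        local = timezone(timedelta(hours=local_offset_hours))
        parsed = parsed.astimezone(local).replace(tzinfo=None)
    # Treat the wall-clock value as UTC and shift it to the target zone.
    return parsed + timedelta(hours=offset_hours)


if __name__ == "__main__":
    print(from_utc_timestamp_sketch("2000-10-10 00:00:00", 1))
    print(from_utc_timestamp_sketch("2000-10-10 00:00:00+00:00", 1))
    print(from_utc_timestamp_sketch("2000-10-10 00:00:00+00:00", 1,
                                    reject_timezone=False))
```

The third call reproduces the `09:00:00` result from the note: the `+00:00` string becomes `08:00:00` local time, which is then shifted by one more hour.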

0 comments on commit 8ec25cd