[SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins

## What changes were proposed in this pull request?

This PR proposes running the Spark unidoc build as part of the tests to exercise the Javadoc 8 build, since the Javadoc 8 build is easily broken again.

There are a couple of problems with it:

- It adds a little extra time to the test run. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?" below.

- > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.

  (see joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627); one possible way to exclude test sources is sketched after this list)
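
As a rough, hypothetical illustration of that suggestion (not part of this PR), test sources could be filtered out of the Unidoc inputs in the sbt build. The sketch below assumes sbt-unidoc 0.3.x's `unidocAllSources` task key and the `ScalaUnidoc`/`JavaUnidoc` configurations, as imported in `project/SparkBuild.scala`; the `/test/` path check is a made-up heuristic:

```scala
// Hypothetical settings fragment (sbt 0.13-style build definition) that drops any
// source file living under a "/test/" directory before Unidoc processes it.
import sbtunidoc.Plugin._
import UnidocKeys._

lazy val excludeTestSourcesFromUnidoc = Seq(
  unidocAllSources in (ScalaUnidoc, unidoc) := {
    (unidocAllSources in (ScalaUnidoc, unidoc)).value
      .map(_.filterNot(_.getCanonicalPath.contains("/test/")))
  },
  unidocAllSources in (JavaUnidoc, unidoc) := {
    (unidocAllSources in (JavaUnidoc, unidoc)).value
      .map(_.filterNot(_.getCanonicalPath.contains("/test/")))
  }
)
```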

To make this automated build pass, this PR also fixes the existing Javadoc breaks, including the ones introduced by test code as described above.

These fixes are similar to instances fixed previously. Please refer to apache#15999 and apache#16013.

Note that this only fixes **errors**, not **warnings**. Please see my observation in apache#17389 (comment) about spurious errors caused by warnings.
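
For illustration, the shape of these doc fixes is shown in the hypothetical Scaladoc comment below (the package and trait are made up; the doc text mirrors lines changed in this diff). Scaladoc wiki links such as `[[SparkException]]` become plain backtick-quoted names when the target is not resolvable from the generated Java sources, and raw characters such as `->` and `<...>` are wrapped in `{@code ...}` or `{@literal ...}` so they pass Javadoc 8's stricter HTML validation:

```scala
// Hypothetical example (not from this PR) of a doc comment written in the
// Javadoc-8-safe style that these fixes apply.
package org.example.docs

/**
 * Processes messages from `RpcEndpointRef.send`; on an unmatched message a
 * `SparkException` is thrown and sent to `onError`.
 *
 * The life-cycle of an endpoint is:
 * {@code constructor -> onStart -> receive* -> onStop}
 *
 * For example, given {@literal <h1, [o1, o2]>}, this returns {@literal [o1, o2]}.
 */
trait JavadocSafeExample {
  def receive: PartialFunction[Any, Unit]
}
```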

## How was this patch tested?

Manually via `jekyll build` to verify the documentation build, and also by running `./dev/run-tests`.

The elapsed time was measured by manually adding `time.time()` as below:

```diff
     profiles_and_goals = build_profiles + sbt_goals

     print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
           " ".join(profiles_and_goals))

+    import time
+    st = time.time()
     exec_sbt(profiles_and_goals)
+    print("Elapsed :[%s]" % str(time.time() - st))
```

This produces:

```
...
========================================================================
Building Unidoc API Documentation
========================================================================
...
[info] Main Java API documentation successful.
...
Elapsed :[94.8746569157]
...
```

Author: hyukjinkwon <[email protected]>

Closes apache#17477 from HyukjinKwon/SPARK-18692.
HyukjinKwon authored and srowen committed Apr 12, 2017
1 parent 2e1fd46 commit ceaf77a
Showing 49 changed files with 140 additions and 106 deletions.
10 changes: 5 additions & 5 deletions core/src/main/scala/org/apache/spark/rpc/RpcEndpoint.scala
@@ -35,7 +35,7 @@ private[spark] trait RpcEnvFactory {
*
* The life-cycle of an endpoint is:
*
* constructor -> onStart -> receive* -> onStop
* {@code constructor -> onStart -> receive* -> onStop}
*
* Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
* [[ThreadSafeRpcEndpoint]]
@@ -63,16 +63,16 @@ private[spark] trait RpcEndpoint {
}

/**
* Process messages from [[RpcEndpointRef.send]] or [[RpcCallContext.reply)]]. If receiving a
* unmatched message, [[SparkException]] will be thrown and sent to `onError`.
* Process messages from `RpcEndpointRef.send` or `RpcCallContext.reply`. If receiving a
* unmatched message, `SparkException` will be thrown and sent to `onError`.
*/
def receive: PartialFunction[Any, Unit] = {
case _ => throw new SparkException(self + " does not implement 'receive'")
}

/**
* Process messages from [[RpcEndpointRef.ask]]. If receiving a unmatched message,
* [[SparkException]] will be thrown and sent to `onError`.
* Process messages from `RpcEndpointRef.ask`. If receiving a unmatched message,
* `SparkException` will be thrown and sent to `onError`.
*/
def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
case _ => context.sendFailure(new SparkException(self + " won't reply anything"))
2 changes: 1 addition & 1 deletion core/src/main/scala/org/apache/spark/rpc/RpcTimeout.scala
@@ -26,7 +26,7 @@ import org.apache.spark.SparkConf
import org.apache.spark.util.{ThreadUtils, Utils}

/**
* An exception thrown if RpcTimeout modifies a [[TimeoutException]].
* An exception thrown if RpcTimeout modifies a `TimeoutException`.
*/
private[rpc] class RpcTimeoutException(message: String, cause: TimeoutException)
extends TimeoutException(message) { initCause(cause) }
@@ -607,7 +607,7 @@ class DAGScheduler(
* @param resultHandler callback to pass each result to
* @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
*
* @throws Exception when the job fails
* @note Throws `Exception` when the job fails
*/
def runJob[T, U](
rdd: RDD[T],
@@ -644,7 +644,7 @@ class DAGScheduler(
*
* @param rdd target RDD to run tasks on
* @param func a function to run on each partition of the RDD
* @param evaluator [[ApproximateEvaluator]] to receive the partial results
* @param evaluator `ApproximateEvaluator` to receive the partial results
* @param callSite where in the user program this job was called
* @param timeout maximum time to wait for the job, in milliseconds
* @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
@@ -42,7 +42,7 @@ private[spark] trait ExternalClusterManager {

/**
* Create a scheduler backend for the given SparkContext and scheduler. This is
* called after task scheduler is created using [[ExternalClusterManager.createTaskScheduler()]].
* called after task scheduler is created using `ExternalClusterManager.createTaskScheduler()`.
* @param sc SparkContext
* @param masterURL the master URL
* @param scheduler TaskScheduler that will be used with the scheduler backend.
@@ -38,7 +38,7 @@ import org.apache.spark.util.{AccumulatorV2, ThreadUtils, Utils}

/**
* Schedules tasks for multiple types of clusters by acting through a SchedulerBackend.
* It can also work with a local setup by using a [[LocalSchedulerBackend]] and setting
* It can also work with a local setup by using a `LocalSchedulerBackend` and setting
* isLocal to true. It handles common logic, like determining a scheduling order across jobs, waking
* up to launch speculative tasks, etc.
*
@@ -704,12 +704,12 @@ private[spark] object TaskSchedulerImpl {
* Used to balance containers across hosts.
*
* Accepts a map of hosts to resource offers for that host, and returns a prioritized list of
* resource offers representing the order in which the offers should be used. The resource
* resource offers representing the order in which the offers should be used. The resource
* offers are ordered such that we'll allocate one container on each host before allocating a
* second container on any host, and so on, in order to reduce the damage if a host fails.
*
* For example, given <h1, [o1, o2, o3]>, <h2, [o4]>, <h1, [o5, o6]>, returns
* [o1, o5, o4, 02, o6, o3]
* For example, given {@literal <h1, [o1, o2, o3]>}, {@literal <h2, [o4]>} and
* {@literal <h3, [o5, o6]>}, returns {@literal [o1, o5, o4, o2, o6, o3]}.
*/
def prioritizeContainers[K, T] (map: HashMap[K, ArrayBuffer[T]]): List[T] = {
val _keyList = new ArrayBuffer[K](map.size)
@@ -66,7 +66,7 @@ private[spark] trait BlockData {
/**
* Returns a Netty-friendly wrapper for the block's data.
*
* @see [[ManagedBuffer#convertToNetty()]]
* Please see `ManagedBuffer.convertToNetty()` for more details.
*/
def toNetty(): Object

4 changes: 2 additions & 2 deletions core/src/test/scala/org/apache/spark/AccumulatorSuite.scala
@@ -243,7 +243,7 @@ private[spark] object AccumulatorSuite {
import InternalAccumulator._

/**
* Create a long accumulator and register it to [[AccumulatorContext]].
* Create a long accumulator and register it to `AccumulatorContext`.
*/
def createLongAccum(
name: String,
@@ -258,7 +258,7 @@ private[spark] object AccumulatorSuite {
}

/**
* Make an [[AccumulableInfo]] out of an [[Accumulable]] with the intent to use the
* Make an `AccumulableInfo` out of an [[Accumulable]] with the intent to use the
* info as an accumulator update.
*/
def makeInfo(a: AccumulatorV2[_, _]): AccumulableInfo = a.toInfo(Some(a.value), None)
@@ -27,7 +27,7 @@ import org.apache.spark.network.shuffle.{ExternalShuffleBlockHandler, ExternalSh
/**
* This suite creates an external shuffle server and routes all shuffle fetches through it.
* Note that failures in this suite may arise due to changes in Spark that invalidate expectations
* set up in [[ExternalShuffleBlockHandler]], such as changing the format of shuffle files or how
* set up in `ExternalShuffleBlockHandler`, such as changing the format of shuffle files or how
* we hash files into folders.
*/
class ExternalShuffleServiceSuite extends ShuffleSuite with BeforeAndAfterAll {
@@ -22,7 +22,7 @@ import org.scalatest.BeforeAndAfterAll
import org.scalatest.BeforeAndAfterEach
import org.scalatest.Suite

/** Manages a local `sc` {@link SparkContext} variable, correctly stopping it after each test. */
/** Manages a local `sc` `SparkContext` variable, correctly stopping it after each test. */
trait LocalSparkContext extends BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>

@transient var sc: SparkContext = _
@@ -95,12 +95,12 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
}

/**
* A map from partition -> results for all tasks of a job when you call this test framework's
* A map from partition to results for all tasks of a job when you call this test framework's
* [[submit]] method. Two important considerations:
*
* 1. If there is a job failure, results may or may not be empty. If any tasks succeed before
* the job has failed, they will get included in `results`. Instead, check for job failure by
* checking [[failure]]. (Also see [[assertDataStructuresEmpty()]])
* checking [[failure]]. (Also see `assertDataStructuresEmpty()`)
*
* 2. This only gets cleared between tests. So you'll need to do special handling if you submit
* more than one job in one test.
@@ -29,7 +29,7 @@ import org.apache.spark.serializer.KryoTest.RegistratorWithoutAutoReset
/**
* Tests to ensure that [[Serializer]] implementations obey the API contracts for methods that
* describe properties of the serialized stream, such as
* [[Serializer.supportsRelocationOfSerializedObjects]].
* `Serializer.supportsRelocationOfSerializedObjects`.
*/
class SerializerPropertiesSuite extends SparkFunSuite {

15 changes: 15 additions & 0 deletions dev/run-tests.py
@@ -344,6 +344,19 @@ def build_spark_sbt(hadoop_version):
exec_sbt(profiles_and_goals)


def build_spark_unidoc_sbt(hadoop_version):
set_title_and_block("Building Unidoc API Documentation", "BLOCK_DOCUMENTATION")
# Enable all of the profiles for the build:
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
sbt_goals = ["unidoc"]
profiles_and_goals = build_profiles + sbt_goals

print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
" ".join(profiles_and_goals))

exec_sbt(profiles_and_goals)


def build_spark_assembly_sbt(hadoop_version):
# Enable all of the profiles for the build:
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
@@ -352,6 +365,8 @@ def build_spark_assembly_sbt(hadoop_version):
print("[info] Building Spark assembly (w/Hive 1.2.1) using SBT with these arguments: ",
" ".join(profiles_and_goals))
exec_sbt(profiles_and_goals)
# Make sure that Java and Scala API documentation can be generated
build_spark_unidoc_sbt(hadoop_version)


def build_apache_spark(build_tool, hadoop_version):
@@ -21,7 +21,7 @@ import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
* Provides a method to run tests against a {@link SparkContext} variable that is correctly stopped
* Provides a method to run tests against a `SparkContext` variable that is correctly stopped
* after each test.
*/
trait LocalSparkContext {
@@ -74,7 +74,7 @@ abstract class Classifier[
* and features (`Vector`).
* @param numClasses Number of classes label can take. Labels must be integers in the range
* [0, numClasses).
* @throws SparkException if any label is not an integer >= 0
* @note Throws `SparkException` if any label is a non-integer or is negative
*/
protected def extractLabeledPoints(dataset: Dataset[_], numClasses: Int): RDD[LabeledPoint] = {
require(numClasses > 0, s"Classifier (in extractLabeledPoints) found numClasses =" +
8 changes: 6 additions & 2 deletions mllib/src/test/scala/org/apache/spark/ml/PipelineSuite.scala
@@ -230,7 +230,9 @@ class PipelineSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
}


/** Used to test [[Pipeline]] with [[MLWritable]] stages */
/**
* Used to test [[Pipeline]] with `MLWritable` stages
*/
class WritableStage(override val uid: String) extends Transformer with MLWritable {

final val intParam: IntParam = new IntParam(this, "intParam", "doc")
@@ -257,7 +259,9 @@ object WritableStage extends MLReadable[WritableStage] {
override def load(path: String): WritableStage = super.load(path)
}

/** Used to test [[Pipeline]] with non-[[MLWritable]] stages */
/**
* Used to test [[Pipeline]] with non-`MLWritable` stages
*/
class UnWritableStage(override val uid: String) extends Transformer {

final val intParam: IntParam = new IntParam(this, "intParam", "doc")
12 changes: 8 additions & 4 deletions mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala
@@ -29,17 +29,21 @@ private[ml] object LSHTest {
* the following property is satisfied.
*
* There exist dist1, dist2, p1, p2, so that for any two elements e1 and e2,
* If dist(e1, e2) <= dist1, then Pr{h(x) == h(y)} >= p1
* If dist(e1, e2) >= dist2, then Pr{h(x) == h(y)} <= p2
* If dist(e1, e2) is less than or equal to dist1, then Pr{h(x) == h(y)} is greater than
* or equal to p1
* If dist(e1, e2) is greater than or equal to dist2, then Pr{h(x) == h(y)} is less than
* or equal to p2
*
* This is called locality sensitive property. This method checks the property on an
* existing dataset and calculate the probabilities.
* (https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Definition)
*
* This method hashes each elements to hash buckets using LSH, and calculate the false positive
* and false negative:
* False positive: Of all (e1, e2) sharing any bucket, the probability of dist(e1, e2) > distFP
* False negative: Of all (e1, e2) not sharing buckets, the probability of dist(e1, e2) < distFN
* False positive: Of all (e1, e2) sharing any bucket, the probability of dist(e1, e2) is greater
* than distFP
* False negative: Of all (e1, e2) not sharing buckets, the probability of dist(e1, e2) is less
* than distFN
*
* @param dataset The dataset to verify the locality sensitive hashing property.
* @param lsh The lsh instance to perform the hashing
@@ -377,7 +377,7 @@ class ParamsSuite extends SparkFunSuite {
object ParamsSuite extends SparkFunSuite {

/**
* Checks common requirements for [[Params.params]]:
* Checks common requirements for `Params.params`:
* - params are ordered by names
* - param parent has the same UID as the object's UID
* - param name is the same as the param method name
@@ -34,7 +34,7 @@ private[ml] object TreeTests extends SparkFunSuite {
* Convert the given data to a DataFrame, and set the features and label metadata.
* @param data Dataset. Categorical features and labels must already have 0-based indices.
* This must be non-empty.
* @param categoricalFeatures Map: categorical feature index -> number of distinct values
* @param categoricalFeatures Map: categorical feature index to number of distinct values
* @param numClasses Number of classes label can take. If 0, mark as continuous.
* @return DataFrame with metadata
*/
@@ -69,7 +69,9 @@ private[ml] object TreeTests extends SparkFunSuite {
df("label").as("label", labelMetadata))
}

/** Java-friendly version of [[setMetadata()]] */
/**
* Java-friendly version of `setMetadata()`
*/
def setMetadata(
data: JavaRDD[LabeledPoint],
categoricalFeatures: java.util.Map[java.lang.Integer, java.lang.Integer],
@@ -81,20 +81,20 @@ trait DefaultReadWriteTest extends TempDirectory { self: Suite =>
/**
* Default test for Estimator, Model pairs:
* - Explicitly set Params, and train model
* - Test save/load using [[testDefaultReadWrite()]] on Estimator and Model
* - Test save/load using `testDefaultReadWrite` on Estimator and Model
* - Check Params on Estimator and Model
* - Compare model data
*
* This requires that [[Model]]'s [[Param]]s should be a subset of [[Estimator]]'s [[Param]]s.
* This requires that `Model`'s `Param`s should be a subset of `Estimator`'s `Param`s.
*
* @param estimator Estimator to test
* @param dataset Dataset to pass to [[Estimator.fit()]]
* @param testEstimatorParams Set of [[Param]] values to set in estimator
* @param testModelParams Set of [[Param]] values to set in model
* @param checkModelData Method which takes the original and loaded [[Model]] and compares their
* data. This method does not need to check [[Param]] values.
* @tparam E Type of [[Estimator]]
* @tparam M Type of [[Model]] produced by estimator
* @param dataset Dataset to pass to `Estimator.fit()`
* @param testEstimatorParams Set of `Param` values to set in estimator
* @param testModelParams Set of `Param` values to set in model
* @param checkModelData Method which takes the original and loaded `Model` and compares their
* data. This method does not need to check `Param` values.
* @tparam E Type of `Estimator`
* @tparam M Type of `Model` produced by estimator
*/
def testEstimatorAndModelReadWrite[
E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
@@ -105,8 +105,8 @@ class StopwatchSuite extends SparkFunSuite with MLlibTestSparkContext {
private object StopwatchSuite extends SparkFunSuite {

/**
* Checks the input stopwatch on a task that takes a random time (<10ms) to finish. Validates and
* returns the duration reported by the stopwatch.
* Checks the input stopwatch on a task that takes a random time (less than 10ms) to finish.
* Validates and returns the duration reported by the stopwatch.
*/
def checkStopwatch(sw: Stopwatch): Long = {
val ubStart = now
@@ -30,7 +30,9 @@ trait TempDirectory extends BeforeAndAfterAll { self: Suite =>

private var _tempDir: File = _

/** Returns the temporary directory as a [[File]] instance. */
/**
* Returns the temporary directory as a `File` instance.
*/
protected def tempDir: File = _tempDir

override def beforeAll(): Unit = {
@@ -21,7 +21,7 @@ import org.apache.spark.SparkFunSuite
import org.apache.spark.mllib.tree.impurity.{EntropyAggregator, GiniAggregator}

/**
* Test suites for [[GiniAggregator]] and [[EntropyAggregator]].
* Test suites for `GiniAggregator` and `EntropyAggregator`.
*/
class ImpuritySuite extends SparkFunSuite {
test("Gini impurity does not support negative labels") {
@@ -60,7 +60,7 @@ trait MLlibTestSparkContext extends TempDirectory { self: Suite =>
* A helper object for importing SQL implicits.
*
* Note that the alternative of importing `spark.implicits._` is not possible here.
* This is because we create the [[SQLContext]] immediately before the first test is run,
* This is because we create the `SQLContext` immediately before the first test is run,
* but the implicits import is needed in the constructor.
*/
protected object testImplicits extends SQLImplicits {
@@ -239,7 +239,7 @@ trait MesosSchedulerUtils extends Logging {
}

/**
* Converts the attributes from the resource offer into a Map of name -> Attribute Value
* Converts the attributes from the resource offer into a Map of name to Attribute Value
* The attribute values are the mesos attribute types and they are
*
* @param offerAttributes the attributes offered
@@ -296,7 +296,7 @@ trait MesosSchedulerUtils extends Logging {

/**
* Parses the attributes constraints provided to spark and build a matching data struct:
* Map[<attribute-name>, Set[values-to-match]]
* {@literal Map[<attribute-name>, Set[values-to-match]}
* The constraints are specified as ';' separated key-value pairs where keys and values
* are separated by ':'. The ':' implies equality (for singular values) and "is one of" for
* multiple values (comma separated). For example:
@@ -354,7 +354,7 @@ trait MesosSchedulerUtils extends Logging {
* container overheads.
*
* @param sc SparkContext to use to get `spark.mesos.executor.memoryOverhead` value
* @return memory requirement as (0.1 * <memoryOverhead>) or MEMORY_OVERHEAD_MINIMUM
* @return memory requirement as (0.1 * memoryOverhead) or MEMORY_OVERHEAD_MINIMUM
* (whichever is larger)
*/
def executorMemory(sc: SparkContext): Int = {
@@ -117,11 +117,11 @@ object RandomDataGenerator {
}

/**
* Returns a function which generates random values for the given [[DataType]], or `None` if no
* Returns a function which generates random values for the given `DataType`, or `None` if no
* random data generator is defined for that data type. The generated values will use an external
* representation of the data type; for example, the random generator for [[DateType]] will return
* instances of [[java.sql.Date]] and the generator for [[StructType]] will return a [[Row]].
* For a [[UserDefinedType]] for a class X, an instance of class X is returned.
* representation of the data type; for example, the random generator for `DateType` will return
* instances of [[java.sql.Date]] and the generator for `StructType` will return a [[Row]].
* For a `UserDefinedType` for a class X, an instance of class X is returned.
*
* @param dataType the type to generate values for
* @param nullable whether null values should be generated