
[SPARK-2191][SQL] Make sure InsertIntoHiveTable doesn't execute more than once.

Author: Michael Armbrust <[email protected]>

Closes apache#1129 from marmbrus/doubleCreateAs and squashes the following commits:

9c6d9e4 [Michael Armbrust] Fix typo.
5128fe2 [Michael Armbrust] Make sure InsertIntoHiveTable doesn't execute each time you ask for its result.
marmbrus authored and rxin committed Jun 19, 2014
1 parent bce0897 commit 777c595
Showing 2 changed files with 11 additions and 1 deletion.
@@ -344,12 +344,16 @@ case class InsertIntoHiveTable(
     writer.commitJob()
   }
 
+  override def execute() = result
+
   /**
    * Inserts all the rows in the table into Hive. Row objects are properly serialized with the
    * `org.apache.hadoop.hive.serde2.SerDe` and the
    * `org.apache.hadoop.mapred.OutputFormat` provided by the table definition.
+   *
+   * Note: this is run once and then kept to avoid double insertions.
    */
-  def execute() = {
+  private lazy val result: RDD[Row] = {
     val childRdd = child.execute()
     assert(childRdd != null)
 
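The core of the change moves the insert out of `def execute()` and into a `private lazy val result` that `execute()` returns, so the insert runs once and later calls reuse the cached value. A minimal sketch of that pattern outside Spark follows; `FakeSink`, `InsertViaDef` and `InsertViaLazyVal` are hypothetical stand-ins, with a `Seq[Int]` in place of `RDD[Row]` and an in-memory buffer in place of the Hive table.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the Hive table: every insert appends rows here.
class FakeSink {
  val rows = ArrayBuffer.empty[Int]
}

// Before the fix (illustrative): a plain `def` repeats the insert on every call.
class InsertViaDef(sink: FakeSink, child: Seq[Int]) {
  def execute(): Seq[Int] = {
    sink.rows ++= child // side effect happens on each call
    Seq.empty
  }
}

// After the fix (illustrative): the work lives in a `private lazy val`, so the
// body runs on first access only and later calls return the cached value.
class InsertViaLazyVal(sink: FakeSink, child: Seq[Int]) {
  def execute(): Seq[Int] = result

  private lazy val result: Seq[Int] = {
    sink.rows ++= child
    Seq.empty
  }
}

val before = new FakeSink
val op1 = new InsertViaDef(before, Seq(1))
op1.execute(); op1.execute()
println(before.rows.size) // 2: the row was inserted twice

val after = new FakeSink
val op2 = new InsertViaLazyVal(after, Seq(1))
op2.execute(); op2.execute()
println(after.rows.size) // 1: the insert ran once
```

A Scala `lazy val` is evaluated on first access and its initialization is thread-safe, so concurrent first calls still see a single evaluation; if the body throws, the next access runs it again.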
@@ -28,6 +28,12 @@ import org.apache.spark.sql.{SchemaRDD, execution, Row}
  */
 class HiveQuerySuite extends HiveComparisonTest {
 
+  test("CREATE TABLE AS runs once") {
+    hql("CREATE TABLE foo AS SELECT 1 FROM src LIMIT 1").collect()
+    assert(hql("SELECT COUNT(*) FROM foo").collect().head.getLong(0) === 1,
+      "Incorrect number of rows in created table")
+  }
+
   createQueryTest("between",
     "SELECT * FROM src WHERE key Between 1 and 2")
 
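The regression test issues a CREATE TABLE AS statement, then checks that `foo` holds exactly one row. A self-contained model of the same check follows. It assumes, for illustration only, that one statement's `execute()` can be reached twice (for example, once when the statement is issued and again by `collect()`); `FakeMetastore`, `CreateTableAs` and `runAndCollect` are hypothetical names, not Spark APIs.

```scala
import scala.collection.mutable

// Hypothetical in-memory metastore: table name -> stored rows.
object FakeMetastore {
  val tables = mutable.Map.empty[String, Vector[Int]]
}

// Hypothetical CTAS-like operator. The insert is memoized in a lazy val,
// mirroring the fix, so repeated execute() calls do not append rows again.
class CreateTableAs(name: String, rows: Vector[Int]) {
  private lazy val result: Vector[Int] = {
    val existing = FakeMetastore.tables.getOrElse(name, Vector.empty)
    FakeMetastore.tables(name) = existing ++ rows
    Vector.empty
  }
  def execute(): Vector[Int] = result
}

// Assumed driver behaviour for illustration: execute() is invoked twice.
def runAndCollect(op: CreateTableAs): Vector[Int] = {
  op.execute() // e.g. when the statement is issued
  op.execute() // e.g. again when results are collected
}

runAndCollect(new CreateTableAs("foo", Vector(1)))
// Counterpart of `SELECT COUNT(*) FROM foo` in the test.
assert(FakeMetastore.tables("foo").size == 1, "Incorrect number of rows in created table")
```

With a plain `def execute()` instead of the lazy val, the same driver would leave two rows in `foo`, which is exactly the failure the assertion message reports.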
