[SPARK-4742][SQL] The name of Parquet File generated by AppendingParquetOutputFormat should be zero padded

When I write a Parquet file as an output file using ParquetOutputFormat#getDefaultWorkFile, the file name is not zero padded, whereas RDD#saveAsTextFile does zero pad its output file names.

Author: Sasaki Toru <[email protected]>

Closes apache#3602 from sasakitoa/parquet-zeroPadding and squashes the following commits:

6b0e58f [Sasaki Toru] Merge branch 'master' of git://github.com/apache/spark into parquet-zeroPadding
20dc79d [Sasaki Toru] Fixed the name of Parquet File generated by AppendingParquetOutputFormat
sasakitoa authored and marmbrus committed Dec 12, 2014
1 parent 0abbff2 commit 8091dd6
Showing 1 changed file with 6 additions and 1 deletion.
@@ -20,6 +20,7 @@ package org.apache.spark.sql.parquet
 import java.io.IOException
 import java.lang.{Long => JLong}
 import java.text.SimpleDateFormat
+import java.text.NumberFormat
 import java.util.concurrent.{Callable, TimeUnit}
 import java.util.{ArrayList, Collections, Date, List => JList}

@@ -338,9 +339,13 @@ private[parquet] class AppendingParquetOutputFormat(offset: Int)
 
   // override to choose output filename so not overwrite existing ones
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
+    val numfmt = NumberFormat.getInstance()
+    numfmt.setMinimumIntegerDigits(5)
+    numfmt.setGroupingUsed(false)
+
     val taskId: TaskID = getTaskAttemptID(context).getTaskID
     val partition: Int = taskId.getId
-    val filename = s"part-r-${partition + offset}.parquet"
+    val filename = "part-r-" + numfmt.format(partition + offset) + ".parquet"
     val committer: FileOutputCommitter =
       getOutputCommitter(context).asInstanceOf[FileOutputCommitter]
     new Path(committer.getWorkPath, filename)
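
For readers who want to try the formatting outside Spark, the following is a minimal standalone sketch of the same zero-padding approach. It is not part of the commit: the object name `ZeroPaddedPartName` and the helper `partFileName` are hypothetical, and the sketch passes `Locale.ROOT` to `NumberFormat.getInstance` (the commit calls it with no argument, so it uses the JVM default locale) only to keep the digits predictable in this example.

```scala
import java.text.NumberFormat
import java.util.Locale

object ZeroPaddedPartName {
  // Hypothetical helper, not part of the commit: builds a part-file name
  // with the same NumberFormat settings the patched method applies.
  def partFileName(partition: Int, offset: Int): String = {
    // Assumption: Locale.ROOT is used here for locale-independent digits;
    // the commit itself uses NumberFormat.getInstance() with the default locale.
    val numfmt = NumberFormat.getInstance(Locale.ROOT)
    numfmt.setMinimumIntegerDigits(5) // pad with leading zeros up to 5 digits
    numfmt.setGroupingUsed(false)     // suppress grouping separators such as "12,345"
    "part-r-" + numfmt.format(partition + offset) + ".parquet"
  }

  def main(args: Array[String]): Unit = {
    println(partFileName(3, 0))      // part-r-00003.parquet
    println(partFileName(7, 120))    // part-r-00127.parquet
    println(partFileName(123456, 0)) // part-r-123456.parquet (longer numbers are not truncated)
  }
}
```

Setting only a minimum digit count means values with more than five digits are written in full rather than cut off, and disabling grouping keeps separators out of the file name.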