Merge pull request apache#62 from Igosuki/avro
Add basic AVRO files (translated copies of the parquet testing files to avro)
alamb authored Sep 9, 2021
2 parents 2c29a73 + a150499 commit 1ec12d1
Showing 18 changed files with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions data/avro/README.md
@@ -0,0 +1,37 @@
This directory contains Avro files corresponding to the Parquet testing files at https://github.com/apache/parquet-testing/blob/master/data/

These files were created with Spark, using the commands from https://gist.github.com/Igosuki/324b011f40185269d3fc552350d21744

Roughly:
```scala
import com.github.mrpowers.spark.daria.sql.DariaWriters
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.commons.io.FilenameUtils

// Input glob and output directory are passed in through the Spark configuration.
val fileGlobs = sc.getConf.get("spark.driver.globs")
val dest = sc.getConf.get("spark.driver.out")

val fs = FileSystem.get(new Configuration(true))
val status = fs.globStatus(new Path(fileGlobs))
for (fileStatus <- status) {
  val path = fileStatus.getPath().toString()
  try {
    val dfin = spark.read.format("parquet").load(path)
    val fileName = fileStatus.getPath().getName()
    // Drop only the last extension, e.g. "x.snappy.parquet" becomes "x.snappy".
    val fileNameWithOutExt = FilenameUtils.removeExtension(fileName)
    val destination = s"${dest}/${fileNameWithOutExt}.avro"
    println(s"Converting $path to avro at $destination")
    // Write the DataFrame as a single Avro file rather than a directory of part files.
    DariaWriters.writeSingleFile(
      df = dfin,
      format = "avro",
      sc = spark.sparkContext,
      tmpFolder = s"/tmp/dw/${fileName}",
      filename = destination
    )
  } catch {
    // Log the failure and continue with the next file.
    case e: Throwable => println(s"failed to convert $path : ${e.getMessage}")
  }
}
```
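
The naming step in the loop above strips only the final extension, which is why a source such as `alltypes_plain.snappy.parquet` becomes `alltypes_plain.snappy.avro`. A minimal Spark-free sketch of that mapping follows. The `AvroNaming` object and its `avroDestination` helper are hypothetical names, not part of the gist. `removeLastExtension` is a plain-string stand-in for `FilenameUtils.removeExtension` that assumes bare file names with no directory components.

```scala
// Hypothetical helper, not part of the original gist: mirrors how the
// conversion loop derives the output name from the input file name.
object AvroNaming {
  // Stand-in for FilenameUtils.removeExtension, assuming a bare file name
  // (no directory components): drops only the text after the last dot.
  def removeLastExtension(fileName: String): String = {
    val dot = fileName.lastIndexOf('.')
    if (dot < 0) fileName else fileName.substring(0, dot)
  }

  // Builds the destination path the same way the loop does:
  // <dest>/<name without last extension>.avro
  def avroDestination(dest: String, fileName: String): String =
    s"$dest/${removeLastExtension(fileName)}.avro"

  def main(args: Array[String]): Unit = {
    println(avroDestination("data/avro", "alltypes_plain.snappy.parquet"))
    println(avroDestination("data/avro", "binary.parquet"))
  }
}
```

Keeping the inner `.snappy` or `.impala` segment means each Avro file can be matched to its Parquet source by name alone.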
Binary file added data/avro/alltypes_dictionary.avro
Binary file not shown.
Binary file added data/avro/alltypes_plain.avro
Binary file not shown.
Binary file added data/avro/alltypes_plain.snappy.avro
Binary file not shown.
Binary file added data/avro/binary.avro
Binary file not shown.
Binary file added data/avro/datapage_v2.snappy.avro
Binary file not shown.
Binary file added data/avro/dict-page-offset-zero.avro
Binary file not shown.
Binary file added data/avro/fixed_length_decimal.avro
Binary file not shown.
Binary file added data/avro/fixed_length_decimal_legacy.avro
Binary file not shown.
Binary file added data/avro/int32_decimal.avro
Binary file not shown.
Binary file added data/avro/int64_decimal.avro
Binary file not shown.
Binary file added data/avro/list_columns.avro
Binary file not shown.
Binary file added data/avro/nested_lists.snappy.avro
Binary file not shown.
Binary file added data/avro/nonnullable.impala.avro
Binary file not shown.
Binary file added data/avro/nullable.impala.avro
Binary file not shown.
Binary file added data/avro/nulls.snappy.avro
Binary file not shown.
Binary file added data/avro/repeated_no_annotation.avro
Binary file not shown.
Binary file added data/avro/single_nan.avro
Binary file not shown.
