You can build this project using Maven:
mvn clean install -DskipTests
The build produces a shaded JAR that can be run using the hadoop
command:
hadoop jar parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main
For a shorter command-line invocation, add an alias to your shell like this:
alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
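An alias works well in an interactive shell, but Bash does not expand aliases in non-interactive shells by default, so scripts may prefer a shell function. The following is a minimal sketch of that alternative and is not part of the project. `PARQUET_CLI_JAR` is a hypothetical variable name introduced here, and the jar path is the same placeholder used in the alias.

```shell
# PARQUET_CLI_JAR is a hypothetical variable (not defined by parquet-cli);
# point it at the shaded runtime jar produced by the build.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

# Same invocation as the alias, with all arguments passed through unchanged.
parquet() {
  hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

With this function defined, `parquet help` behaves like the aliased command.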
To run from the target directory instead of using the hadoop
command, first copy the dependencies to a folder:
mvn dependency:copy-dependencies
Then run the command-line tool with target/dependency/*
added to the classpath:
java -cp 'target/parquet-cli-1.12.3.jar:target/dependency/*' org.apache.parquet.cli.Main
Note that you shouldn't include the runtime JAR used above in the classpath in this case.
In that JAR, the org.apache.avro package
is relocated to avoid conflicts with Hadoop's copy.
The relocation changes method signatures, so it can cause a NoSuchMethodError
depending on the class loading order.
See PARQUET-2142 for details.
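One way to avoid picking up the runtime JAR by accident is to build the classpath explicitly and skip it. The helper below is a rough sketch under stated assumptions: `build_classpath` is a hypothetical name, and it assumes the only `parquet-cli-*.jar` files in the target directory are the plain JAR and the `-runtime` JAR.

```shell
# Hypothetical helper: join the CLI jar and the copied dependencies into a
# classpath string, skipping any *-runtime.jar (the shaded jar).
build_classpath() {
  target_dir="$1"
  cp=""
  for jar in "$target_dir"/parquet-cli-*.jar "$target_dir"/dependency/*.jar; do
    [ -e "$jar" ] || continue        # an unmatched glob stays literal; skip it
    case "$jar" in
      *-runtime.jar) continue ;;     # shaded jar with the relocated Avro package
    esac
    cp="${cp:+$cp:}$jar"             # add ':' only between entries
  done
  printf '%s\n' "$cp"
}

# Usage, after the build and copy-dependencies steps have been run:
# java -cp "$(build_classpath target)" org.apache.parquet.cli.Main help
```

Listing each JAR instead of using the `target/dependency/*` wildcard makes the resulting classpath easy to inspect when a NoSuchMethodError does show up.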
The parquet
tool includes built-in help for its commands:
parquet help
Usage: parquet [options] [command] [command options]
Options:
-v, --verbose, --debug
Print extra debugging information
Commands:
help
Retrieves details on the functions of other commands
meta
Print a Parquet file's metadata
pages
Print page summaries for a Parquet file
dictionary
Print dictionaries for a Parquet column
check-stats
Check Parquet files for corrupt page and column stats (PARQUET-251)
schema
Print the Avro schema for a file
csv-schema
Build a schema from a CSV data sample
convert-csv
Create a file from CSV data
convert
Create a Parquet file from a data file
to-avro
Create an Avro file from a data file
cat
Print the first N records from a file
head
Print the first N records from a file
Examples:
# print information for meta
parquet help meta
See 'parquet help <command>' for more information on a specific command.
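As a usage sketch, the loop below prints the schema and then the metadata of each file it is given. It assumes a `parquet` command is available in the shell and that `schema` and `meta` each take a file path as their argument. `inspect_parquet` is a hypothetical helper name, not a command of the tool.

```shell
# Hypothetical helper: run two of the listed commands on every file given.
inspect_parquet() {
  for file in "$@"; do
    printf '== %s ==\n' "$file"
    parquet schema "$file"   # Avro schema for the file
    parquet meta "$file"     # Parquet file metadata
  done
}

# Example: inspect_parquet /path/to/data.parquet
```

Run `parquet help schema` or `parquet help meta` to see the options each command actually accepts.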