parquet-cli

Building

You can build this project using Maven:

mvn clean install -DskipTests

Running

The build produces a shaded runtime jar in the target directory that can be run with the hadoop command:

hadoop jar parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main

For a shorter command-line invocation, add an alias to your shell like this:

alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
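If you prefer a shell function to an alias, for example so that a wrong jar path is reported clearly, the following is a minimal sketch that runs the same hadoop invocation. `PARQUET_CLI_JAR` is a hypothetical variable; point it at wherever you keep the runtime jar.

```shell
# PARQUET_CLI_JAR is a hypothetical variable: set it to the location of the
# runtime jar produced by the build.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

parquet() {
  # Report a missing jar clearly instead of letting hadoop fail later.
  if [ ! -f "$PARQUET_CLI_JAR" ]; then
    echo "parquet: runtime jar not found: $PARQUET_CLI_JAR" >&2
    return 1
  fi
  # Same invocation as the alias, forwarding all arguments.
  hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

Put the function in your shell startup file (for example ~/.bashrc) to make it available in every session.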

Running without Hadoop

To run from the target directory without the hadoop command, first copy the project's dependencies into target/dependency:

mvn dependency:copy-dependencies

Then run the CLI with the plain (non-runtime) jar and target/dependency/* on the classpath:

java -cp 'target/parquet-cli-1.12.3.jar:target/dependency/*' org.apache.parquet.cli.Main

Note that you should not add the runtime jar used above to the classpath in this case. That jar relocates the org.apache.avro package to avoid conflicts with Hadoop's copy of Avro. The relocation changes method signatures, so depending on class-loading order it can cause a NoSuchMethodError. See PARQUET-2142 for details.
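To avoid retyping the classpath, the java invocation above can be wrapped in a shell function. This is a minimal sketch, assuming you run it from the parquet-cli module directory after both Maven commands have completed, since the paths are relative. `PARQUET_VERSION` and `parquet_classpath` are hypothetical names, and `--dollar-zero parquet` is carried over from the alias above.

```shell
# PARQUET_VERSION is a hypothetical variable; match it to the jar you built.
PARQUET_VERSION="${PARQUET_VERSION:-1.12.3}"

parquet_classpath() {
  # Plain (non-runtime) jar plus every copied dependency.
  # The quoted '*' is expanded by the JVM, not by the shell.
  printf '%s\n' "target/parquet-cli-${PARQUET_VERSION}.jar:target/dependency/*"
}

parquet() {
  # Relative paths: only works from the parquet-cli module directory.
  java -cp "$(parquet_classpath)" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

Because the classpath lists only the plain jar and target/dependency/*, the relocated runtime jar stays off the classpath, as the note above requires.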

Help

The parquet tool includes help for each of its commands:

parquet help
Usage: parquet [options] [command] [command options]

  Options:

    -v, --verbose, --debug
        Print extra debugging information

  Commands:

    help
        Retrieves details on the functions of other commands
    meta
        Print a Parquet file's metadata
    pages
        Print page summaries for a Parquet file
    dictionary
        Print dictionaries for a Parquet column
    check-stats
        Check Parquet files for corrupt page and column stats (PARQUET-251)
    schema
        Print the Avro schema for a file
    csv-schema
        Build a schema from a CSV data sample
    convert-csv
        Create a file from CSV data
    convert
        Create a Parquet file from a data file
    to-avro
        Create an Avro file from a data file
    cat
        Print the first N records from a file
    head
        Print the first N records from a file

  Examples:

    # print information for meta
    parquet help meta

  See 'parquet help <command>' for more information on a specific command.
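As a usage example, several of the commands listed above can be combined to take a quick look at a file. This is a minimal sketch: `inspect_parquet` is a hypothetical helper, it assumes a `parquet` alias or function is already defined in your interactive shell, and it passes only a file path to meta, schema, and head, without any command-specific options.

```shell
# Hypothetical helper: print metadata, schema, and the first records of a file.
# Assumes `parquet` (alias or function) is already defined in this shell.
inspect_parquet() {
  if [ ! -f "$1" ]; then
    echo "inspect_parquet: no such file: $1" >&2
    return 1
  fi
  # Stop at the first command that fails.
  parquet meta "$1" && parquet schema "$1" && parquet head "$1"
}
```

For example, `inspect_parquet example.parquet` runs the three commands in order on a hypothetical file named example.parquet.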