simo7/protoc-gen-parquet

protoc-gen-parquet

A protoc plugin that generates Parquet schemas from protobuf files. See the examples directory.

Installation

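go install github.com/simo7/protoc-gen-parquet@latest

On Go versions before 1.16, use go get github.com/simo7/protoc-gen-parquet instead. The binary is installed into $GOBIN (by default $GOPATH/bin, i.e. $HOME/go/bin when GOPATH is unset), which must be on your PATH so that protoc can find the plugin.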

Alternatively, clone the repository and build the plugin:

go build -o bin/protoc-gen-parquet .

export PATH=$PWD/bin:$PATH

Usage

Generate the Parquet schema (plus the Go protobuf code) for examples/person.proto:

protoc \
    --parquet_out=no_unsigned=true,go_file=true:./ \
    --parquet_opt=paths=source_relative \
    --go_opt=paths=source_relative \
    --go_out=./ \
    examples/person.proto

Regenerate the Go stubs for parquet_options:

protoc \
    --go_opt=paths=source_relative \
    --go_out=./ \
    parquet_options/parquet_options.proto

Flags

no_unsigned (bool): Avoid unsigned integer types and use the corresponding signed integer type instead.

timestamp_int96 (bool): Define fields marked as timestamps (see timestamp_type in the parquet options) as INT96 instead of INT64, to ensure compatibility with all Hive and Presto versions.

go_file (bool): An additional .go file containing the schema as a string constant will be generated. It makes it easier to import a versioned schema into a Go application.

Parquet Annotations

The following annotations are not implemented.

  • (DATE)
  • (UUID)
  • (MAP), (MAP_KEY_VALUE)
  • (STRING) (string fields are annotated as UTF8 by default)

Well-known Protobuf types

Reference: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf.

The following types are supported:

  • google.protobuf.Timestamp

Compatibility

It is tested against either the new protobuf API, google.golang.org/protobuf, or version 1.4.0 of the legacy API, github.com/golang/protobuf.
