Skip to content

Commit

Permalink
feat: disable Parquet by default (breaking change) (minio#9920)
Browse files Browse the repository at this point in the history
I have built a fuzz test and it crashes heavily in seconds and will OOM shortly after.
It seems like supporting Parquet is basically a completely open way to crash the 
server if you can upload a file and run s3 select on it.

Until Parquet is more hardened it is DISABLED by default since hostile 
crafted input can easily crash the server.

If you are in a controlled environment where it is safe to assume no hostile
content can be uploaded to your cluster you can safely enable Parquet.

To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`
while starting the MinIO server.

Furthermore, we guard parquet by recover functions.
  • Loading branch information
klauspost authored Aug 18, 2020
1 parent d2a3f92 commit adca288
Show file tree
Hide file tree
Showing 4 changed files with 33 additions and 2 deletions.
15 changes: 14 additions & 1 deletion docs/select/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,26 @@ Traditional retrieval of objects is always as whole entities, i.e GetObject for

You can use the Select API to query objects with following features:

- CSV, JSON and Parquet - Objects must be in CSV, JSON, or Parquet format.
- Objects must be in CSV, JSON, or Parquet(*) format.
- UTF-8 is the only encoding type the Select API supports.
- GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. The Select API supports columnar compression for Parquet using GZIP, Snappy, LZ4. Whole object compression is not supported for Parquet objects.
- Server-side encryption - The Select API supports querying objects that are protected with server-side encryption.

Type inference and automatic conversion of values is performed based on the context when the value is un-typed (such as when reading CSV data). If present, the CAST function overrides automatic conversion.

The [mc sql](https://docs.min.io/docs/minio-client-complete-guide.html#sql) command can be used for executing queries using the command line.

(*) Parquet is disabled on the MinIO server by default. See below how to enable it.

## Enabling Parquet Format

Parquet is DISABLED by default since hostile crafted input can easily crash the server.

If you are in a controlled environment where it is safe to assume no hostile content can be uploaded to your cluster you can safely enable Parquet.
To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`.

# Example using Python API

## 1. Prerequisites
- Install MinIO Server from [here](http://docs.min.io/docs/minio-quickstart-guide).
- Familiarity with AWS S3 API.
Expand Down
14 changes: 13 additions & 1 deletion pkg/s3select/parquet/reader.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
package parquet

import (
"fmt"
"io"

"github.com/bcicen/jstream"
Expand All @@ -34,6 +35,12 @@ type Reader struct {

// Read - reads single record.
func (r *Reader) Read(dst sql.Record) (rec sql.Record, rerr error) {
defer func() {
if rec := recover(); rec != nil {
rerr = fmt.Errorf("panic reading parquet record: %v", rec)
}
}()

parquetRecord, err := r.reader.Read()
if err != nil {
if err != io.EOF {
Expand Down Expand Up @@ -92,7 +99,12 @@ func (r *Reader) Close() error {
}

// NewReader - creates new Parquet reader using readerFunc callback.
func NewReader(getReaderFunc func(offset, length int64) (io.ReadCloser, error), args *ReaderArgs) (*Reader, error) {
func NewReader(getReaderFunc func(offset, length int64) (io.ReadCloser, error), args *ReaderArgs) (r *Reader, err error) {
defer func() {
if rec := recover(); rec != nil {
err = fmt.Errorf("panic reading parquet header: %v", rec)
}
}()
reader, err := parquetgo.NewReader(getReaderFunc, nil)
if err != nil {
if err != io.EOF {
Expand Down
4 changes: 4 additions & 0 deletions pkg/s3select/select.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ import (
"io"
"io/ioutil"
"net/http"
"os"
"strings"
"sync"

Expand Down Expand Up @@ -334,6 +335,9 @@ func (s3Select *S3Select) Open(getReader func(offset, length int64) (io.ReadClos
}
return nil
case parquetFormat:
if !strings.EqualFold(os.Getenv("MINIO_API_SELECT_PARQUET"), "on") {
return errors.New("parquet format parsing not enabled on server")
}
var err error
s3Select.recordReader, err = parquet.NewReader(getReader, &s3Select.Input.ParquetArgs)
return err
Expand Down
2 changes: 2 additions & 0 deletions pkg/s3select/select_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -925,6 +925,8 @@ func TestJSONInput(t *testing.T) {
}

func TestParquetInput(t *testing.T) {
os.Setenv("MINIO_API_SELECT_PARQUET", "on")
defer os.Setenv("MINIO_API_SELECT_PARQUET", "off")

var testTable = []struct {
requestXML []byte
Expand Down

0 comments on commit adca288

Please sign in to comment.