Open partitioned parquet files #16
Comments
As a complement, I checked with the DuckDB CLI, and the following query works there. I'm using Metabase v1.46.2 and DuckDB v0.8.1.
select * from read_parquet('/app/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
I'm using DuckDB views to do this, and I also set the data type of each field explicitly instead of using star notation.
With this you can add or remove data from the path and Metabase will pick up the change automatically. I'm not sure how well the caching works when you do change the data.
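A minimal sketch of what such a view could look like, for readers following along. The view name and the column types are assumptions made for illustration; the path and the column names `REF_BACEN` and `NU_ORDEM` are borrowed from other comments in this thread, not from the commenter's actual setup.

```sql
-- Hypothetical view: the name and the target types are assumptions.
-- Every column is listed and cast explicitly instead of using SELECT *.
CREATE OR REPLACE VIEW sicor_operacao_basica AS
SELECT
    CAST(REF_BACEN AS BIGINT)  AS REF_BACEN,
    CAST(NU_ORDEM  AS INTEGER) AS NU_ORDEM
FROM read_parquet('/app/dados/dados_parquet/sicor_operacao_basica.parquet/*/*');
```

Because a view stores only the query, the glob is expanded each time the view is read, so files added to or removed from the path should be reflected without recreating the view.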
I was having this issue when using an in-memory DuckDB database. But following what you did, I'm now creating a DuckDB database on disk, inserting a bunch of views that use
I misread your comment. After some testing I realized that, when opening a partitioned Parquet dataset inside Metabase, one must explicitly list the fields contained in the files. For instance, this works:
select REF_BACEN, NU_ORDEM from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
but this does not:
select * from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
I'm reopening the issue. In case it helps, I'm attaching the Metabase log that I get when I execute the query: metabase_partitioned_parquet_errorerr.log
Not sure it's related, but I'm also seeing this warning when I re-scan the DuckDB fields in Metabase:
Sorry for the message in Portuguese. It's basically saying that the driver is using Honey SQL 1, which was deprecated in version 0.46.0 and will be removed in the future.
Hi! Did you try this directly in DuckDB? Does it work there?
Yes, it works using the DuckDB CLI. I pasted the output a couple of comments above.
Sorry for the long delay in answering. I'm installing metabase and duckdb driver using the following
|
I just tested with Metabase installed as shown above and a partitioned Parquet dataset. Everything worked fine! The following syntaxes all worked:
select * from read_parquet('/data/iris_part/**/*.parquet')
select * from read_parquet('/data/iris_part/**/*')
select * from read_parquet('/data/iris_part/*/*.parquet')
select * from read_parquet('/data/iris_part/*/*')
select * from read_parquet('/data/iris_part/**')
I'm trying out Metabase with DuckDB for viewing a large Parquet file. I'm running Metabase from a Docker container, following the instructions in this repo's README.
I was able to query several single parquet files using
However, I can't seem to figure out how to do the same for a Parquet file that was created with partitions. My Parquet file is partitioned by year, like so:
This is what I tried:
and
Neither worked.
Is it possible to open partitioned parquet files in duckdb?
Edit: According to the DuckDB documentation (https://duckdb.org/docs/archive/0.8.1/data/partitioning/hive_partitioning), one should use parquet_scan for this. But I'm getting this error: Cannot invoke "Object.getClass()" because "target" is null
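For context, a rough sketch of the kind of query the linked hive partitioning page describes. The dataset path and the `year` partition column are hypothetical stand-ins for the year-partitioned layout mentioned above, and this only illustrates the DuckDB side; it does not by itself address the Metabase error.

```sql
-- Hypothetical layout (assumption):
--   /data/my_dataset/year=2021/<files>.parquet
--   /data/my_dataset/year=2022/<files>.parquet
-- With hive_partitioning enabled, DuckDB exposes the directory key
-- (here: year) as a regular column that can be used for filtering.
SELECT *
FROM parquet_scan('/data/my_dataset/*/*.parquet', hive_partitioning = true)
WHERE year = 2021;
```

`parquet_scan` is an alias for `read_parquet`, so the same query can be written with either function name.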