forked from oap-project/arrow
ARROW-10247: [C++][Dataset] Support writing datasets partitioned on dictionary columns

Enables usage of dictionary columns as partition columns on write.

Additionally resolves some partition-related follow ups from apache#8894 (@pitrou):
- raise an error status [instead of aborting](apache#8894) for overflowing maximum group count
- handle dictionary index types [other than int32](apache#8894)
- don't build an unused null bitmap [in CountsToOffsets](apache#8894)
- improve docstrings for [MakeGroupings, ApplyGroupings](apache#8894)

At some point, we'll probably want to support null grouping criteria. (For now, this PR adds a test asserting that nulls in any grouping column raise an error.) This will require adding an option/overload/... of dictionary_encode which places nulls in the dictionary instead of the indices, and ensuring Partitionings can format nulls appropriately. This would allow users to write a partitioned dataset which preserves nulls sensibly:

```
data/
    col=a/
        part-0.parquet # col is "a" throughout
    col=b/
        part-1.parquet # col is "b" throughout
    part-2.parquet # col is null throughout
```

Closes apache#9130 from bkietz/10247-Cannot-write-dataset-with

Lead-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
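The grouping behavior described in the commit message above can be illustrated with a small, standard-library-only Python sketch. This is not Arrow's implementation or API: the function names below merely echo the names mentioned in the message (dictionary_encode, CountsToOffsets, MakeGroupings), and `partition_rows` and the `col=value` directory naming are hypothetical helpers chosen for illustration. The sketch assumes a single partition column and mirrors the stated current behavior that a null in a grouping column raises an error.

```python
def dictionary_encode(values):
    """Hypothetical stand-in: return (dictionary, indices) for a column."""
    dictionary = {}  # value -> index, in first-seen order
    indices = []
    for value in values:
        if value is None:
            # Mirrors the commit's current behavior: nulls are rejected.
            raise ValueError("nulls in a grouping column are not supported")
        indices.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), indices


def counts_to_offsets(counts):
    """Turn per-group row counts into cumulative offsets (prefix sum)."""
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)
    return offsets


def make_groupings(indices, num_groups):
    """Return, for each group id, the list of row numbers in that group."""
    counts = [0] * num_groups
    for group_id in indices:
        counts[group_id] += 1
    offsets = counts_to_offsets(counts)
    # A stable sort keeps rows within each group in their original order.
    order = sorted(range(len(indices)), key=indices.__getitem__)
    return [order[offsets[g]:offsets[g + 1]] for g in range(num_groups)]


def partition_rows(column_name, values):
    """Map an assumed 'name=value' directory to the rows it would hold."""
    dictionary, indices = dictionary_encode(values)
    groups = make_groupings(indices, len(dictionary))
    return {f"{column_name}={key}": rows for key, rows in zip(dictionary, groups)}
```

Calling `partition_rows("col", ["a", "b", "a", "b"])` groups rows 0 and 2 under `col=a` and rows 1 and 3 under `col=b`. Supporting null grouping criteria, as the message anticipates, would mean giving nulls their own dictionary slot instead of raising.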
1 parent 864c2b0, commit eaa7b7a
Showing 12 changed files with 356 additions and 91 deletions.