forked from databricks/spark-csv
Commit
This pull request lets spark-csv write null values to a file and read them back as null. Two changes enable this.

First, the `com.databricks.spark.csv` package previously hardcoded the null string to "`null`" when saving to a CSV file. It now reads the null token from the value for "`nullToken`" in the passed-in parameters map, so nulls can be written as empty strings by setting this option. The default remains "`null`" to preserve the library's previous behavior.

Second, the `castTo` method in `com.databricks.spark.csv.util.TypeCast` had an unreachable case statement when `castType` was an instance of `StringType`. As a result, string values could not be read from a file as null. This pull request adds a `treatEmptyValuesAsNulls` setting that allows empty string values in fields marked as nullable to be read as null. Again, the previous behavior is the default, so behavior changes only when `treatEmptyValuesAsNulls` is explicitly set to true. `CsvParser` and `CsvRelation` were updated to include this new setting.

Additionally, a unit test was added to `CsvSuite` that round-trips null values (both string and non-string) by writing nulls and reading them back as nulls.

Author: Andres Perez <[email protected]>

Closes databricks#147 from andy327/feat-set-null-tokens.
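
The write and read behavior described in this commit message can be sketched in plain Scala as follows. This is a simplified, dependency-free model and not the code from the diff: the object `NullRoundTrip` and the helpers `writeField` and `readStringField` are hypothetical names introduced for illustration, and only the option names `nullToken` and `treatEmptyValuesAsNulls` come from the message itself.

```scala
// Simplified model of the null round trip described above.
// NullRoundTrip, writeField and readStringField are hypothetical names;
// they are assumptions for illustration, not the spark-csv API.
object NullRoundTrip {

  // Writing: look up the null token in the options map, defaulting to
  // "null" so that the previous behavior is preserved.
  def writeField(value: Option[String], parameters: Map[String, String]): String = {
    val nullToken = parameters.getOrElse("nullToken", "null")
    value.getOrElse(nullToken)
  }

  // Reading a string column: an empty datum is read as null (None here)
  // only when the field is nullable and treatEmptyValuesAsNulls is set.
  // Otherwise the datum is returned unchanged.
  def readStringField(
      datum: String,
      nullable: Boolean,
      treatEmptyValuesAsNulls: Boolean = false): Option[String] =
    if (datum.isEmpty && nullable && treatEmptyValuesAsNulls) None
    else Some(datum)
}
```

Under these assumptions, a null written with `Map("nullToken" -> "")` becomes an empty string, and passing that string to `readStringField` with `nullable = true` and `treatEmptyValuesAsNulls = true` yields `None` again. With the defaults, the empty string is read back as an empty string, matching the previous behavior.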
Showing 7 changed files with 57 additions and 5 deletions.