Skip to content

Latest commit

 

History

History
81 lines (56 loc) · 2.94 KB

sql-data-sources-json.md

File metadata and controls

81 lines (56 loc) · 2.94 KB
layout title displayTitle
global
JSON Files
JSON Files
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`. This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`, or a JSON file.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.

For a regular multi-line JSON file, set the multiLine option to true.

{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}

Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset`. This conversion can be done using `SparkSession.read().json()` on either a `Dataset`, or a JSON file.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.

For a regular multi-line JSON file, set the multiLine option to true.

{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}

Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using `SparkSession.read.json` on a JSON file.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.

For a regular multi-line JSON file, set the multiLine parameter to True.

{% include_example json_dataset python/sql/datasource.py %}

Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. using the `read.json()` function, which loads data from a directory of JSON files where each line of the files is a JSON object.

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON.

For a regular multi-line JSON file, set a named parameter multiLine to TRUE.

{% include_example json_dataset r/RSparkSQLExample.R %}

{% highlight sql %}

CREATE TEMPORARY VIEW jsonTable USING org.apache.spark.sql.json OPTIONS ( path "examples/src/main/resources/people.json" )

SELECT * FROM jsonTable

{% endhighlight %}