Skip to content

Latest commit

 

History

History
4146 lines (3541 loc) · 125 KB

table_api.md

File metadata and controls

4146 lines (3541 loc) · 125 KB
title is_beta nav-parent_id nav-pos
Table and SQL
true
apis
3

Table API and SQL are experimental features

The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink's DataSet and DataStream APIs (Java and Scala). The Table API and SQL interface operate on a relational Table abstraction, which can be created from external data sources, or existing DataSets and DataStreams. With the Table API, you can apply relational operators such as selection, aggregation, and joins on Tables.

Tables can also be queried with regular SQL, as long as they are registered (see Registering Tables). The Table API and SQL offer equivalent functionality and can be mixed in the same program. When a Table is converted back into a DataSet or DataStream, the logical plan, which was defined by relational operators and SQL queries, is optimized using Apache Calcite and transformed into a DataSet or DataStream program.

  • This will be replaced by the TOC {:toc}

Using the Table API and SQL

The Table API and SQL are part of the flink-table Maven project. The following dependency must be added to your project in order to use the Table API and SQL:

{% highlight xml %} org.apache.flink flink-table{{ site.scala_version_suffix }} {{site.version }} {% endhighlight %}

Note: The Table API is currently not part of the binary distribution. See linking with it for cluster execution [here]({{ site.baseurl }}/dev/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).

Registering Tables

TableEnvironments have an internal table catalog to which tables can be registered with a unique name. After registration, a table can be accessed from the TableEnvironment by its name.

Note: DataSets or DataStreams can be directly converted into Tables without registering them in the TableEnvironment.

Register a DataSet

A DataSet is registered as a Table in a BatchTableEnvironment as follows:

{% highlight java %} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// register the DataSet cust as table "Customers" with fields derived from the dataset tableEnv.registerDataSet("Customers", cust)

// register the DataSet ord as table "Orders" with fields user, product, and amount tableEnv.registerDataSet("Orders", ord, "user, product, amount"); {% endhighlight %}

{% highlight scala %} val env = ExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

// register the DataSet cust as table "Customers" with fields derived from the dataset tableEnv.registerDataSet("Customers", cust)

// register the DataSet ord as table "Orders" with fields user, product, and amount tableEnv.registerDataSet("Orders", ord, 'user, 'product, 'amount) {% endhighlight %}

Note: The name of a DataSet Table must not match the ^_DataSetTable_[0-9]+ pattern which is reserved for internal use only.

Register a DataStream

A DataStream is registered as a Table in a StreamTableEnvironment as follows:

{% highlight java %} StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// register the DataStream cust as table "Customers" with fields derived from the datastream tableEnv.registerDataStream("Customers", cust)

// register the DataStream ord as table "Orders" with fields user, product, and amount tableEnv.registerDataStream("Orders", ord, "user, product, amount"); {% endhighlight %}

{% highlight scala %} val env = StreamExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

// register the DataStream cust as table "Customers" with fields derived from the datastream tableEnv.registerDataStream("Customers", cust)

// register the DataStream ord as table "Orders" with fields user, product, and amount tableEnv.registerDataStream("Orders", ord, 'user, 'product, 'amount) {% endhighlight %}

Note: The name of a DataStream Table must not match the ^_DataStreamTable_[0-9]+ pattern which is reserved for internal use only.

Register a Table

A Table that originates from a Table API operation or a SQL query is registered in a TableEnvironment as follows:

{% highlight java %} // works for StreamExecutionEnvironment identically ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// convert a DataSet into a Table Table custT = tableEnv .toTable(custDs, "name, zipcode") .where("zipcode = '12345'") .select("name")

// register the Table custT as table "custNames" tableEnv.registerTable("custNames", custT) {% endhighlight %}

{% highlight scala %} // works for StreamExecutionEnvironment identically val env = ExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

// convert a DataSet into a Table val custT = custDs .toTable(tableEnv, 'name, 'zipcode) .where('zipcode === "12345") .select('name)

// register the Table custT as table "custNames" tableEnv.registerTable("custNames", custT) {% endhighlight %}

A registered Table that originates from a Table API operation or SQL query is treated similarly as a view as known from relational DBMS, i.e., it can be inlined when optimizing the query.

Register an external Table using a TableSource

An external table is registered in a TableEnvironment using a TableSource as follows:

{% highlight java %} // works for StreamExecutionEnvironment identically ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

TableSource custTS = new CsvTableSource("/path/to/file", ...)

// register a TableSource as external table "Customers" tableEnv.registerTableSource("Customers", custTS) {% endhighlight %}

{% highlight scala %} // works for StreamExecutionEnvironment identically val env = ExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

val custTS: TableSource = new CsvTableSource("/path/to/file", ...)

// register a TableSource as external table "Customers" tableEnv.registerTableSource("Customers", custTS)

{% endhighlight %}

A TableSource can provide access to data stored in various storage systems such as databases (MySQL, HBase, ...), file formats (CSV, Apache Parquet, Avro, ORC, ...), or messaging systems (Apache Kafka, RabbitMQ, ...).

Currently, Flink provides the CsvTableSource to read CSV files and the Kafka08JsonTableSource/Kafka09JsonTableSource to read JSON objects from Kafka. A custom TableSource can be defined by implementing the BatchTableSource or StreamTableSource interface.

Available Table Sources

| Class name | Maven dependency | Batch? | Streaming? | Description | CsvTableSouce | flink-table | Y | Y | A simple source for CSV files. | Kafka08JsonTableSource | flink-connector-kafka-0.8 | N | Y | A Kafka 0.8 source for JSON data. | Kafka09JsonTableSource | flink-connector-kafka-0.9 | N | Y | A Kafka 0.9 source for JSON data.

All sources that come with the flink-table dependency can be directly used by your Table programs. For all other table sources, you have to add the respective dependency in addition to the flink-table dependency.

KafkaJsonTableSource

To use the Kafka JSON source, you have to add the Kafka connector dependency to your project:

  • flink-connector-kafka-0.8 for Kafka 0.8, and
  • flink-connector-kafka-0.9 for Kafka 0.9, respectively.

You can then create the source as follows (example for Kafka 0.8):

// The JSON field names and types
String[] fieldNames =  new String[] { "id", "name", "score"};
Class<?>[] fieldTypes = new Class<?>[] { Integer.class, String.class, Double.class };

KafkaJsonTableSource kafkaTableSource = new Kafka08JsonTableSource(
    kafkaTopic,
    kafkaProperties,
    fieldNames,
    fieldTypes);

By default, a missing JSON field does not fail the source. You can configure this via:

// Fail on missing JSON field
tableSource.setFailOnMissingField(true);

You can work with the Table as explained in the rest of the Table API guide:

tableEnvironment.registerTableSource("kafka-source", kafkaTableSource);
Table result = tableEnvironment.ingest("kafka-source");

CsvTableSource

The CsvTableSource is already included in flink-table without additional dependecies.

It can be configured with the following properties:

  • path The path to the CSV file, required.
  • fieldNames The names of the table fields, required.
  • fieldTypes The types of the table fields, required.
  • fieldDelim The field delimiter, "," by default.
  • rowDelim The row delimiter, "\n" by default.
  • quoteCharacter An optional quote character for String values, null by default.
  • ignoreFirstLine Flag to ignore the first line, false by default.
  • ignoreComments An optional prefix to indicate comments, null by default.
  • lenient Flag to skip records with parse error instead to fail, false by default.

You can create the source as follows:

{% highlight java %} CsvTableSource csvTableSource = new CsvTableSource( "/path/to/your/file.csv", new String[] { "name", "id", "score", "comments" }, new TypeInformation[] { Types.STRING(), Types.INT(), Types.DOUBLE(), Types.STRING() }, "#", // fieldDelim "$", // rowDelim null, // quoteCharacter true, // ignoreFirstLine "%", // ignoreComments false); // lenient {% endhighlight %}
{% highlight scala %} val csvTableSource = new CsvTableSource( "/path/to/your/file.csv", Array("name", "id", "score", "comments"), Array( Types.STRING, Types.INT, Types.DOUBLE, Types.STRING ), fieldDelim = "#", rowDelim = "$", ignoreFirstLine = true, ignoreComments = "%") {% endhighlight %}

You can work with the Table as explained in the rest of the Table API guide in both stream and batch TableEnvironments:

{% highlight java %} tableEnvironment.registerTableSource("mycsv", csvTableSource);

Table streamTable = streamTableEnvironment.ingest("mycsv");

Table batchTable = batchTableEnvironment.scan("mycsv"); {% endhighlight %}

{% highlight scala %} tableEnvironment.registerTableSource("mycsv", csvTableSource)

val streamTable = streamTableEnvironment.ingest("mycsv")

val batchTable = batchTableEnvironment.scan("mycsv") {% endhighlight %}

Table API

The Table API provides methods to apply relational operations on DataSets and Datastreams both in Scala and Java.

The central concept of the Table API is a Table which represents a table with relational schema (or relation). Tables can be created from a DataSet or DataStream, converted into a DataSet or DataStream, or registered in a table catalog using a TableEnvironment. A Table is always bound to a specific TableEnvironment. It is not possible to combine Tables of different TableEnvironments.

Note: The only operations currently supported on streaming Tables are selection, projection, and union.

When using Flink's Java DataSet API, DataSets are converted to Tables and Tables to DataSets using a `TableEnvironment`. The following example shows:
  • how a DataSet is converted to a Table,
  • how relational queries are specified, and
  • how a Table is converted back to a DataSet.

{% highlight java %} public class WC {

public WC(String word, int count) { this.word = word; this.count = count; }

public WC() {} // empty constructor to satisfy POJO requirements

public String word; public int count; }

...

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

DataSet input = env.fromElements( new WC("Hello", 1), new WC("Ciao", 1), new WC("Hello", 1));

Table table = tEnv.fromDataSet(input);

Table wordCounts = table .groupBy("word") .select("word, count.sum as count");

DataSet result = tableEnv.toDataSet(wordCounts, WC.class); {% endhighlight %}

With Java, expressions must be specified by Strings. The embedded expression DSL is not supported.

{% highlight java %} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// register the DataSet cust as table "Customers" with fields derived from the dataset tableEnv.registerDataSet("Customers", cust)

// register the DataSet ord as table "Orders" with fields user, product, and amount tableEnv.registerDataSet("Orders", ord, "user, product, amount"); {% endhighlight %}

Please refer to the Javadoc for a full list of supported operations and a description of the expression syntax.

The Table API is enabled by importing `org.apache.flink.api.scala.table._`. This enables implicit conversions to convert a `DataSet` or `DataStream` to a Table. The following example shows:
  • how a DataSet is converted to a Table,
  • how relational queries are specified, and
  • how a Table is converted back to a DataSet.

{% highlight scala %} import org.apache.flink.api.scala._ import org.apache.flink.api.scala.table._

case class WC(word: String, count: Int)

val env = ExecutionEnvironment.getExecutionEnvironment val tEnv = TableEnvironment.getTableEnvironment(env)

val input = env.fromElements(WC("hello", 1), WC("hello", 1), WC("ciao", 1)) val expr = input.toTable(tEnv) val result = expr .groupBy('word) .select('word, 'count.sum as 'count) .toDataSet[WC] {% endhighlight %}

The expression DSL uses Scala symbols to refer to field names and code generation to transform expressions to efficient runtime code. Please note that the conversion to and from Tables only works when using Scala case classes or Java POJOs. Please refer to the [Type Extraction and Serialization]({{ site.baseurl }}/internals/types_serialization.html) section to learn the characteristics of a valid POJO.

Another example shows how to join two Tables:

{% highlight scala %} case class MyResult(a: String, d: Int)

val input1 = env.fromElements(...).toTable(tEnv).as('a, 'b) val input2 = env.fromElements(...).toTable(tEnv, 'c, 'd)

val joined = input1.join(input2) .where("a = c && d > 42") .select("a, d") .toDataSet[MyResult] {% endhighlight %}

Notice, how the field names of a Table can be changed with as() or specified with toTable() when converting a DataSet to a Table. In addition, the example shows how to use Strings to specify relational expressions.

Creating a Table from a DataStream works in a similar way. The following example shows how to convert a DataStream to a Table and filter it with the Table API.

{% highlight scala %} import org.apache.flink.api.scala._ import org.apache.flink.api.scala.table._

val env = StreamExecutionEnvironment.getExecutionEnvironment val tEnv = TableEnvironment.getTableEnvironment(env)

val inputStream = env.addSource(...) val result = inputStream .toTable(tEnv, 'a, 'b, 'c) .filter('a === 3) val resultStream = result.toDataStream[Row] {% endhighlight %}

Please refer to the Scaladoc for a full list of supported operations and a description of the expression syntax.

{% top %}

Access a registered Table

A registered table can be accessed from a TableEnvironment as follows:

  • tEnv.scan("tName") scans a Table that was registered as "tName" in a BatchTableEnvironment.
  • tEnv.ingest("tName") ingests a Table that was registered as "tName" in a StreamTableEnvironment.

{% top %}

Table API Operators

The Table API features a domain-specific language to execute language-integrated queries on structured data in Scala and Java. This section gives a brief overview of the available operators. You can find more details of operators in the Javadoc.

<tr>
  <td><strong>As</strong></td>
  <td>
    <p>Renames fields.</p>

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.as("d, e, f"); {% endhighlight %}

<tr>
  <td><strong>Where / Filter</strong></td>
  <td>
    <p>Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate.</p>

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.where("b = 'red'"); {% endhighlight %} or {% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.filter("a % 2 = 0"); {% endhighlight %}

<tr>
  <td><strong>Join</strong></td>
  <td>
    <p>Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined through join operator or using a where or filter operator.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.join(right).where("a = d").select("a, b, e"); {% endhighlight %}

<tr>
  <td><strong>LeftOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL LEFT OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.leftOuterJoin(right, "a = d").select("a, b, e"); {% endhighlight %}

<tr>
  <td><strong>RightOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL RIGHT OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.rightOuterJoin(right, "a = d").select("a, b, e"); {% endhighlight %}

<tr>
  <td><strong>FullOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL FULL OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.fullOuterJoin(right, "a = d").select("a, b, e"); {% endhighlight %}

<tr>
  <td><strong>Union</strong></td>
  <td>
    <p>Similar to a SQL UNION clause. Unions two tables with duplicate records removed. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.union(right); {% endhighlight %}

<tr>
  <td><strong>UnionAll</strong></td>
  <td>
    <p>Similar to a SQL UNION ALL clause. Unions two tables. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.unionAll(right); {% endhighlight %}

<tr>
  <td><strong>Intersect</strong></td>
  <td>
    <p>Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.intersect(right); {% endhighlight %}

<tr>
  <td><strong>IntersectAll</strong></td>
  <td>
    <p>Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.intersectAll(right); {% endhighlight %}

<tr>
  <td><strong>Minus</strong></td>
  <td>
    <p>Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.minus(right); {% endhighlight %}

<tr>
  <td><strong>MinusAll</strong></td>
  <td>
    <p>Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types.</p>

{% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.minusAll(right); {% endhighlight %}

<tr>
  <td><strong>Distinct</strong></td>
  <td>
    <p>Similar to a SQL DISTINCT clause. Returns records with distinct value combinations.</p>

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.distinct(); {% endhighlight %}

<tr>
  <td><strong>Order By</strong></td>
  <td>
    <p>Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions.</p>

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.orderBy("a.asc"); {% endhighlight %}

<tr>
  <td><strong>Limit</strong></td>
  <td>
    <p>Similar to a SQL LIMIT clause. Limits a sorted result to a specified number of records from an offset position. Limit is technically part of the Order By operator and thus must be preceded by it.</p>

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.orderBy("a.asc").limit(3); // returns unlimited number of records beginning with the 4th record {% endhighlight %} or {% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.orderBy("a.asc").limit(3, 5); // returns 5 records beginning with the 4th record {% endhighlight %}

Operators Description
Select

Similar to a SQL SELECT statement. Performs a select operation.

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.select("a, c as d"); {% endhighlight %}

You can use star (*) to act as a wild card, selecting all of the columns in the table.

{% highlight java %} Table result = in.select("*"); {% endhighlight %}
GroupBy

Similar to a SQL GROUPBY clause. Groups the rows on the grouping keys, with a following aggregation operator to aggregate rows group-wise.

{% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.groupBy("a").select("a, b.sum as d"); {% endhighlight %}
<tr>
  <td><strong>As</strong></td>
  <td>
    <p>Renames fields.</p>

{% highlight scala %} val in = ds.toTable(tableEnv).as('a, 'b, 'c); {% endhighlight %}

<tr>
  <td><strong>Where / Filter</strong></td>
  <td>
    <p>Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate.</p>

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.filter('a % 2 === 0) {% endhighlight %} or {% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.where('b === "red"); {% endhighlight %}

<tr>
  <td><strong>GroupBy</strong></td>
  <td>
    <p>Similar to a SQL GROUPBY clause. Groups rows on the grouping keys, with a following aggregation
    operator to aggregate rows group-wise.</p>

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.groupBy('a).select('a, 'b.sum as 'd); {% endhighlight %}

<tr>
  <td><strong>Join</strong></td>
  <td>
    <p>Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and an equality join predicate must be defined using a where or filter operator.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'd, 'e, 'f); val result = left.join(right).where('a === 'd).select('a, 'b, 'e); {% endhighlight %}

<tr>
  <td><strong>LeftOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL LEFT OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight scala %} val left = tableEnv.fromDataSet(ds1, 'a, 'b, 'c) val right = tableEnv.fromDataSet(ds2, 'd, 'e, 'f) val result = left.leftOuterJoin(right, 'a === 'd).select('a, 'b, 'e) {% endhighlight %}

<tr>
  <td><strong>RightOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL RIGHT OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight scala %} val left = tableEnv.fromDataSet(ds1, 'a, 'b, 'c) val right = tableEnv.fromDataSet(ds2, 'd, 'e, 'f) val result = left.rightOuterJoin(right, 'a === 'd).select('a, 'b, 'e) {% endhighlight %}

<tr>
  <td><strong>FullOuterJoin</strong></td>
  <td>
    <p>Similar to a SQL FULL OUTER JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.</p>

{% highlight scala %} val left = tableEnv.fromDataSet(ds1, 'a, 'b, 'c) val right = tableEnv.fromDataSet(ds2, 'd, 'e, 'f) val result = left.fullOuterJoin(right, 'a === 'd).select('a, 'b, 'e) {% endhighlight %}

<tr>
  <td><strong>Union</strong></td>
  <td>
    <p>Similar to a SQL UNION clause. Unions two tables with duplicate records removed, both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'a, 'b, 'c); val result = left.union(right); {% endhighlight %}

<tr>
  <td><strong>UnionAll</strong></td>
  <td>
    <p>Similar to a SQL UNION ALL clause. Unions two tables, both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'a, 'b, 'c); val result = left.unionAll(right); {% endhighlight %}

<tr>
  <td><strong>Intersect</strong></td>
  <td>
    <p>Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present in one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'e, 'f, 'g); val result = left.intersect(right); {% endhighlight %}

<tr>
  <td><strong>IntersectAll</strong></td>
  <td>
    <p>Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'e, 'f, 'g); val result = left.intersectAll(right); {% endhighlight %}

<tr>
  <td><strong>Minus</strong></td>
  <td>
    <p>Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'a, 'b, 'c); val result = left.minus(right); {% endhighlight %}

<tr>
  <td><strong>MinusAll</strong></td>
  <td>
    <p>Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types.</p>

{% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c); val right = ds2.toTable(tableEnv, 'a, 'b, 'c); val result = left.minusAll(right); {% endhighlight %}

<tr>
  <td><strong>Distinct</strong></td>
  <td>
    <p>Similar to a SQL DISTINCT clause. Returns records with distinct value combinations.</p>

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.distinct(); {% endhighlight %}

<tr>
  <td><strong>Order By</strong></td>
  <td>
    <p>Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions.</p>

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.orderBy('a.asc); {% endhighlight %}

<tr>
  <td><strong>Limit</strong></td>
  <td>
    <p>Similar to a SQL LIMIT clause. Limits a sorted result to a specified number of records from an offset position. Limit is technically part of the Order By operator and thus must be preceded by it.</p>

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.orderBy('a.asc).limit(3); // returns unlimited number of records beginning with the 4th record {% endhighlight %} or {% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.orderBy('a.asc).limit(3, 5); // returns 5 records beginning with the 4th record {% endhighlight %}

Operators Description
Select

Similar to a SQL SELECT statement. Performs a select operation.

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.select('a, 'c as 'd); {% endhighlight %}

You can use star (*) to act as a wild card, selecting all of the columns in the table.

{% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c); val result = in.select('*); {% endhighlight %}

{% top %}

Expression Syntax

Some of the operators in previous sections expect one or more expressions. Expressions can be specified using an embedded Scala DSL or as Strings. Please refer to the examples above to learn how expressions can be specified.

This is the EBNF grammar for expressions:

{% highlight ebnf %}

expressionList = expression , { "," , expression } ;

expression = alias ;

alias = logic | ( logic , "AS" , fieldReference ) ;

logic = comparison , [ ( "&&" | "||" ) , comparison ] ;

comparison = term , [ ( "=" | "==" | "===" | "!=" | "!==" | ">" | ">=" | "<" | "<=" ) , term ] ;

term = product , [ ( "+" | "-" ) , product ] ;

product = unary , [ ( "*" | "/" | "%") , unary ] ;

unary = [ "!" | "-" ] , composite ;

composite = suffixed | atom ;

suffixed = interval | cast | as | aggregation | if | functionCall ;

timeInterval = composite , "." , ("year" | "years" | "month" | "months" | "day" | "days" | "hour" | "hours" | "minute" | "minutes" | "second" | "seconds" | "milli" | "millis") ;

rowInterval = composite , "." , "rows" ;

cast = composite , ".cast(" , dataType , ")" ;

dataType = "BYTE" | "SHORT" | "INT" | "LONG" | "FLOAT" | "DOUBLE" | "BOOLEAN" | "STRING" | "DECIMAL" | "DATE" | "TIME" | "TIMESTAMP" | "INTERVAL_MONTHS" | "INTERVAL_MILLIS" ;

as = composite , ".as(" , fieldReference , ")" ;

aggregation = composite , ( ".sum" | ".min" | ".max" | ".count" | ".avg" | ".start" | ".end" ) , [ "()" ] ;

if = composite , ".?(" , expression , "," , expression , ")" ;

functionCall = composite , "." , functionIdentifier , [ "(" , [ expression , { "," , expression } ] , ")" ] ;

atom = ( "(" , expression , ")" ) | literal | nullLiteral | fieldReference ;

fieldReference = "*" | identifier ;

nullLiteral = "Null(" , dataType , ")" ;

timeIntervalUnit = "YEAR" | "YEAR_TO_MONTH" | "MONTH" | "DAY" | "DAY_TO_HOUR" | "DAY_TO_MINUTE" | "DAY_TO_SECOND" | "HOUR" | "HOUR_TO_MINUTE" | "HOUR_TO_SECOND" | "MINUTE" | "MINUTE_TO_SECOND" | "SECOND" ;

timePointUnit = "YEAR" | "MONTH" | "DAY" | "HOUR" | "MINUTE" | "SECOND" | "QUARTER" | "WEEK" | "MILLISECOND" | "MICROSECOND" ;

{% endhighlight %}

Here, literal is a valid Java literal, fieldReference specifies a column in the data (or all columns if * is used), and functionIdentifier specifies a supported scalar function. The column names and function names follow Java identifier syntax. The column name rowtime is a reserved logical attribute in streaming environments. Expressions specified as Strings can also use prefix notation instead of suffix notation to call operators and functions.

If working with exact numeric values or large decimals is required, the Table API also supports Java's BigDecimal type. In the Scala Table API decimals can be defined by BigDecimal("123456") and in Java by appending a "p" for precise e.g. 123456p.

In order to work with temporal values the Table API supports Java SQL's Date, Time, and Timestamp types. In the Scala Table API literals can be defined by using java.sql.Date.valueOf("2016-06-27"), java.sql.Time.valueOf("10:10:42"), or java.sql.Timestamp.valueOf("2016-06-27 10:10:42.123"). The Java and Scala Table API also support calling "2016-06-27".toDate(), "10:10:42".toTime(), and "2016-06-27 10:10:42.123".toTimestamp() for converting Strings into temporal types. Note: Since Java's temporal SQL types are time zone dependent, please make sure that the Flink Client and all TaskManagers use the same time zone.

Temporal intervals can be represented as number of months (Types.INTERVAL_MONTHS) or number of milliseconds (Types.INTERVAL_MILLIS). Intervals of same type can be added or subtracted (e.g. 1.hour + 10.minutes). Intervals of milliseconds can be added to time points (e.g. "2016-08-10".toDate + 5.days).

{% top %}

Windows

The Table API is a declarative API to define queries on batch and streaming tables. Projection, selection, and union operations can be applied both on streaming and batch tables without additional semantics. Aggregations on (possibly) infinite streaming tables, however, can only be computed on finite groups of records. Group-window aggregates group rows into finite groups based on time or row-count intervals and evaluate aggregation functions once per group. For batch tables, group-windows are a convenient shortcut to group records by time intervals.

Group-windows are defined using the window(w: GroupWindow) clause. The following example shows how to define a group-window aggregation on a table.

{% highlight java %} Table table = input .window(GroupWindow w) // define window .select("b.sum") // aggregate {% endhighlight %}
{% highlight scala %} val table = input .window(w: GroupWindow) // define window .select('b.sum) // aggregate {% endhighlight %}

In streaming environments, group-window aggregates can only be computed in parallel, if they are keyed, i.e., there is an additional groupBy attribute. Group-window aggregates without additional groupBy, such as in the example above, can only be evaluated in a single, non-parallel task. The following example shows how to define a keyed group-window aggregation on a table.

{% highlight java %} Table table = input .groupBy("a") .window(GroupWindow w) // define window .select("a, b.sum") // aggregate {% endhighlight %}
{% highlight scala %} val table = input .groupBy('a) .window(w: GroupWindow) // define window .select('a, 'b.sum) // aggregate {% endhighlight %}

The GroupWindow parameter defines how rows are mapped to windows. GroupWindow is not an interface that users can implement. Instead, the Table API provides a set of predefined GroupWindow classes with specific semantics, which are translated into underlying DataStream or DataSet operations. The supported window definitions are listed below. By assigning the group-window an alias using as, properties such as the start and end timestamp of a time window can be accessed in the select statement.

{% highlight java %} Table table = input .groupBy("a") .window(XXX.as("myWin")) // define window alias .select("a, myWin.start, myWin.end, b.count") // aggregate {% endhighlight %}
{% highlight scala %} val table = input .groupBy('a) .window(XXX as 'myWin) // define window alias .select('a, 'myWin.start, 'myWin.end, 'b.count) // aggregate {% endhighlight %}

Tumble (Tumbling Windows)

A tumbling window assigns rows to non-overlapping, continuous windows of fixed length. For example, a tumbling window of 5 minutes groups rows in 5 minutes intervals. Tumbling windows can be defined on event-time, processing-time, or on a row-count.

Tumbling windows are defined by using the Tumble class as follows:

Method Required? Description
over Required. Defines the length the window, either as time or row-count interval.
on Required for streaming event-time windows and windows on batch tables. Defines the time mode for streaming tables (rowtime is a logical system attribute); for batch tables, the time attribute on which records are grouped.
as Optional. Assigns an alias to the window that can be used in the following select() clause to access window properties such as window start or end time.
{% highlight java %} // Tumbling Event-time Window .window(Tumble.over("10.minutes").on("rowtime").as("w"))

// Tumbling Processing-time Window .window(Tumble.over("10.minutes").as("w"))

// Tumbling Row-count Window .window(Tumble.over("10.rows").as("w")) {% endhighlight %}

{% highlight scala %} // Tumbling Event-time Window .window(Tumble over 10.minutes on 'rowtime as 'w)

// Tumbling Processing-time Window .window(Tumble over 10.minutes as 'w)

// Tumbling Row-count Window .window(Tumble over 10.rows as 'w) {% endhighlight %}

Slide (Sliding Windows)

A sliding window has a fixed size and slides by a specified slide interval. If the slide interval is smaller than the window size, sliding windows are overlapping. Thus, rows can be assigned to multiple windows. For example, a sliding window of 15 minutes size and 5 minute slide interval assigns each row to 3 different windows of 15 minute size, which are evaluated in an interval of 5 minutes. Sliding windows can be defined on event-time, processing-time, or on a row-count.

Sliding windows are defined by using the Slide class as follows:

Method Required? Description
over Required. Defines the length of the window, either as time or row-count interval.
every Required. Defines the slide interval, either as time or row-count interval. The slide interval must be of the same type as the size interval.
on Required for event-time windows and windows on batch tables. Defines the time mode for streaming tables (rowtime is a logical system attribute); for batch tables, the time attribute on which records are grouped
as Optional. Assigns an alias to the window that can be used in the following select() clause to access window properties such as window start or end time.
{% highlight java %} // Sliding Event-time Window .window(Slide.over("10.minutes").every("5.minutes").on("rowtime").as("w"))

// Sliding Processing-time window .window(Slide.over("10.minutes").every("5.minutes").as("w"))

// Sliding Row-count window .window(Slide.over("10.rows").every("5.rows").as("w")) {% endhighlight %}

{% highlight scala %} // Sliding Event-time Window .window(Slide over 10.minutes every 5.minutes on 'rowtime as 'w)

// Sliding Processing-time window .window(Slide over 10.minutes every 5.minutes as 'w)

// Sliding Row-count window .window(Slide over 10.rows every 5.rows as 'w) {% endhighlight %}

Session (Session Windows)

Session windows do not have a fixed size but their bounds are defined by an interval of inactivity, i.e., a session window is closes if no event appears for a defined gap period. For example a session window with a 30 minute gap starts when a row is observed after 30 minutes inactivity (otherwise the row would be added to an existing window) and is closed if no row is added within 30 minutes. Session windows can work on event-time or processing-time.

A session window is defined by using the Session class as follows:

Method Required? Description
withGap Required. Defines the gap between two windows as time interval.
on Required for event-time windows and windows on batch tables. Defines the time mode for streaming tables (rowtime is a logical system attribute); for batch tables, the time attribute on which records are grouped
as Optional. Assigns an alias to the window that can be used in the following select() clause to access window properties such as window start or end time.
{% highlight java %} // Session Event-time Window .window(Session.withGap("10.minutes").on("rowtime").as("w"))

// Session Processing-time Window .window(Session.withGap("10.minutes").as("w")) {% endhighlight %}

{% highlight scala %} // Session Event-time Window .window(Session withGap 10.minutes on 'rowtime as 'w)

// Session Processing-time Window .window(Session withGap 10.minutes as 'w) {% endhighlight %}

Limitations

Currently the following features are not supported yet:

  • Row-count windows on event-time
  • Windows on batch tables

SQL

SQL queries are specified using the sql() method of the TableEnvironment. The method returns the result of the SQL query as a Table which can be converted into a DataSet or DataStream, used in subsequent Table API queries, or written to a TableSink (see Writing Tables to External Sinks). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single DataStream or DataSet program.

A Table, DataSet, DataStream, or external TableSource must be registered in the TableEnvironment in order to be accessible by a SQL query (see Registering Tables).

Note: Flink's SQL support is not feature complete, yet. Queries that include unsupported SQL features will cause a TableException. The limitations of SQL on batch and streaming tables are listed in the following sections.

SQL on Batch Tables

{% highlight java %} ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// read a DataSet from an external source DataSet<Tuple3<Long, String, Integer>> ds = env.readCsvFile(...); // register the DataSet as table "Orders" tableEnv.registerDataSet("Orders", ds, "user, product, amount"); // run a SQL query on the Table and retrieve the result as a new Table Table result = tableEnv.sql( "SELECT SUM(amount) FROM Orders WHERE product LIKE '%Rubber%'"); {% endhighlight %}

{% highlight scala %} val env = ExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

// read a DataSet from an external source val ds: DataSet[(Long, String, Integer)] = env.readCsvFile(...) // register the DataSet under the name "Orders" tableEnv.registerDataSet("Orders", ds, 'user, 'product, 'amount) // run a SQL query on the Table and retrieve the result as a new Table val result = tableEnv.sql( "SELECT SUM(amount) FROM Orders WHERE product LIKE '%Rubber%'") {% endhighlight %}

Limitations

The current version supports selection (filter), projection, inner equi-joins, grouping, non-distinct aggregates, and sorting on batch tables.

Among others, the following SQL features are not supported, yet:

  • Timestamps and intervals are limited to milliseconds precision
  • Interval arithmetic is currenly limited
  • Distinct aggregates (e.g., COUNT(DISTINCT name))
  • Non-equi joins and Cartesian products
  • Grouping sets

Note: Tables are joined in the order in which they are specified in the FROM clause. In some cases the table order must be manually tweaked to resolve Cartesian products.

SQL on Streaming Tables

SQL queries can be executed on streaming Tables (Tables backed by DataStream or StreamTableSource) like standard SQL.

{% highlight java %} StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// ingest a DataStream from an external source DataStream<Tuple3<Long, String, Integer>> ds = env.addSource(...); // register the DataStream as table "Orders" tableEnv.registerDataStream("Orders", ds, "user, product, amount"); // run a SQL query on the Table and retrieve the result as a new Table Table result = tableEnv.sql( "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'"); {% endhighlight %}

{% highlight scala %} val env = StreamExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(env)

// read a DataStream from an external source val ds: DataStream[(Long, String, Integer)] = env.addSource(...) // register the DataStream under the name "Orders" tableEnv.registerDataStream("Orders", ds, 'user, 'product, 'amount) // run a SQL query on the Table and retrieve the result as a new Table val result = tableEnv.sql( "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'") {% endhighlight %}

Limitations

The current version of streaming SQL only supports SELECT, FROM, WHERE, and UNION clauses. Aggregations or joins are not supported yet.

{% top %}

SQL Syntax

Flink uses Apache Calcite for SQL parsing. Currently, Flink SQL only supports query-related SQL syntax and only a subset of the comprehensive SQL standard. The following BNF-grammar describes the supported SQL features:


query:
  values
  | {
      select
      | selectWithoutFrom
      | query UNION [ ALL ] query
      | query EXCEPT query
      | query INTERSECT query
    }
    [ ORDER BY orderItem [, orderItem ]* ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start { ROW | ROWS } ]
    [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY]

orderItem:
  expression [ ASC | DESC ]

select:
  SELECT [ ALL | DISTINCT ]
  { * | projectItem [, projectItem ]* }
  FROM tableExpression
  [ WHERE booleanExpression ]
  [ GROUP BY { groupItem [, groupItem ]* } ]
  [ HAVING booleanExpression ]

selectWithoutFrom:
  SELECT [ ALL | DISTINCT ]
  { * | projectItem [, projectItem ]* }

projectItem:
  expression [ [ AS ] columnAlias ]
  | tableAlias . *

tableExpression:
  tableReference [, tableReference ]*
  | tableExpression [ NATURAL ] [ LEFT | RIGHT | FULL ] JOIN tableExpression [ joinCondition ]

joinCondition:
  ON booleanExpression
  | USING '(' column [, column ]* ')'

tableReference:
  tablePrimary
  [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary:
  [ TABLE ] [ [ catalogName . ] schemaName . ] tableName

values:
  VALUES expression [, expression ]*

groupItem:
  expression
  | '(' ')'
  | '(' expression [, expression ]* ')'

For a better definition of SQL queries within a Java String, Flink SQL uses a lexical policy similar to Java:

  • The case of identifiers is preserved whether or not they are quoted.
  • After which, identifiers are matched case-sensitively.
  • Unlike Java, back-ticks allow identifiers to contain non-alphanumeric characters (e.g. "SELECT a AS my field FROM t").

{% top %}

Reserved Keywords

Although not every SQL feature is implemented yet, some string combinations are already reserved as keywords for future use. If you want to use one of the following strings as a field name, make sure to surround them with backticks (e.g. `value`, `count`).

{% highlight sql %}

A, ABS, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE, ALLOW, ALTER, ALWAYS, AND, ANY, ARE, ARRAY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES, AUTHORIZATION, AVG, BEFORE, BEGIN, BERNOULLI, BETWEEN, BIGINT, BINARY, BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY, CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING, CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICTS, CHARACTERS, CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME, CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASS_ORIGIN, CLOB, CLOSE, COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG, COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME, COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION, CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT, CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, CONSTRUCTOR, CONTAINS, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT, COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT, CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP, CURRENT_PATH, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER, CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE, DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DAY, DEALLOCATE, DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF, DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC, DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOUBLE, DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE, EACH, ELEMENT, ELSE, END, END-EXEC, EPOCH, EQUALS, ESCAPE, EVERY, EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP, EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL, FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORTRAN, FOUND, FRAC_SECOND, FREE, FROM, FULL, FUNCTION, FUSION, G, GENERAL, GENERATED, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING, HAVING, HIERARCHY, HOLD, HOUR, IDENTITY, IMMEDIATE, IMPLEMENTATION, IMPORT, IN, INCLUDING, INCREMENT, INDICATOR, INITIALLY, INNER, INOUT, INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER, INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISOLATION, JAVA, JOIN, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LANGUAGE, LARGE, LAST, LAST_VALUE, LATERAL, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE, LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP, MATCH, MATCHED, MAX, MAXVALUE, MEMBER, MERGE, MESSAGE_LENGTH, MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM, MIN, MINUTE, MINVALUE, MOD, MODIFIES, MODULE, MONTH, MORE, MULTISET, MUMPS, NAME, NAMES, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW, NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NULL, NULLABLE, NULLIF, NULLS, NUMBER, NUMERIC, OBJECT, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD, ON, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINALITY, OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD, PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION, PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME, PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSTHROUGH, PATH, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PLACING, PLAN, PLI, POSITION, POWER, PRECEDING, PRECISION, PREPARE, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUARTER, RANGE, RANK, READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, RESET, RESTART, RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH, RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME, ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, SAVEPOINT, SCALE, SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA, SCROLL, SEARCH, SECOND, SECTION, SECURITY, SELECT, SELF, SENSITIVE, SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER, SET, SETS, SIMILAR, SIMPLE, SIZE, SMALLINT, SOME, SOURCE, SPACE, SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE, SQLWARNING, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR, SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER, SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQRT, START, STATE, STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRUCTURE, STYLE, SUBCLASS_ORIGIN, SUBMULTISET, SUBSTITUTE, SUBSTRING, SUM, SYMMETRIC, SYSTEM, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN, TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMEZONE_HOUR, TIMEZONE_MINUTE, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION, TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK, TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATION, TREAT, TRIGGER, TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRUE, TYPE, UESCAPE, UNBOUNDED, UNCOMMITTED, UNDER, UNION, UNIQUE, UNKNOWN, UNNAMED, UNNEST, UPDATE, UPPER, UPSERT, USAGE, USER, USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE, USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, VALUE, VALUES, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VIEW, WEEK, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN, WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, ZONE

{% endhighlight %}

{% top %}

Data Types

The Table API is built on top of Flink's DataSet and DataStream API. Internally, it also uses Flink's TypeInformation to distinguish between types. The Table API does not support all Flink types so far. All supported simple types are listed in org.apache.flink.api.table.Types. The following table summarizes the relation between Table API types, SQL types, and the resulting Java class.

Table API SQL Java type
Types.STRING VARCHAR java.lang.String
Types.BOOLEAN BOOLEAN java.lang.Boolean
Types.BYTE TINYINT java.lang.Byte
Types.SHORT SMALLINT java.lang.Short
Types.INT INTEGER, INT java.lang.Integer
Types.LONG BIGINT java.lang.Long
Types.FLOAT REAL, FLOAT java.lang.Float
Types.DOUBLE DOUBLE java.lang.Double
Types.DECIMAL DECIMAL java.math.BigDecimal
Types.DATE DATE java.sql.Date
Types.TIME TIME java.sql.Time
Types.TIMESTAMP TIMESTAMP(3) java.sql.Timestamp
Types.INTERVAL_MONTHS INTERVAL YEAR TO MONTH java.lang.Integer
Types.INTERVAL_MILLIS INTERVAL DAY TO SECOND(3) java.lang.Long

Advanced types such as generic types, composite types (e.g. POJOs or Tuples), and array types (object or primitive arrays) can be fields of a row.

Generic types are treated as a black box within Table API and SQL yet.

Composite types, however, are fully supported types where fields of a composite type can be accessed using the .get() operator in Table API and dot operator (e.g. MyTable.pojoColumn.myField) in SQL. Composite types can also be flattened using .flatten() in Table API or MyTable.pojoColumn.* in SQL.

Array types can be accessed using the myArray.at(1) operator in Table API and myArray[1] operator in SQL. Array literals can be created using array(1, 2, 3) in Table API and ARRAY[1, 2, 3] in SQL.

{% top %}

Built-in Functions

Both the Table API and SQL come with a set of built-in functions for data transformations. This section gives a brief overview of the available functions so far.

<tr>
  <td>
    {% highlight java %}

ANY.isNull {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

ANY.isNotNull {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

BOOLEAN.isTrue {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

BOOLEAN.isFalse {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

BOOLEAN.isNotTrue {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

BOOLEAN.isNotFalse {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.exp() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.log10() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.ln() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.power(NUMERIC) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.sqrt() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.abs() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.floor() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.ceil() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.substring(INT, INT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.substring(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.trim(LEADING, STRING) STRING.trim(TRAILING, STRING) STRING.trim(BOTH, STRING) STRING.trim(BOTH) STRING.trim() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.charLength() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.upperCase() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.lowerCase() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.initCap() {% endhighlight %}

  <td>
    <p>Converts the initial letter of each word in a string to uppercase. Assumes a string containing only [A-Za-z0-9], everything else is treated as whitespace.</p>
  </td>
</tr>

<tr>
  <td>
    {% highlight java %}

STRING.like(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.similar(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.position(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.overlay(STRING, INT) STRING.overlay(STRING, INT, INT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.toDate() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.toTime() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

STRING.toTimestamp() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.year {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.month {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.day {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.hour {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.minute {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.second {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.milli {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

TEMPORAL.extract(TIMEINTERVALUNIT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

DATE.quarter() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

TIMEPOINT.floor(TIMEINTERVALUNIT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

TIMEPOINT.ceil(TIMEINTERVALUNIT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

currentDate() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

currentTime() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

currentTimestamp() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

localTime() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

localTimestamp() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

temporalOverlaps(TIMEPOINT, TEMPORAL, TIMEPOINT, TEMPORAL) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

NUMERIC.rows {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

ANY.flatten() {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

COMPOSITE.get(STRING) COMPOSITE.get(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

ARRAY.at(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

array(ANY [, ANY ]*) {% endhighlight %}

<tr>
  <td>
    {% highlight java %}

ARRAY.cardinality() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ARRAY.element() {% endhighlight %}

Function Description
{% highlight java %} ANY.as(name [, name ]* ) {% endhighlight %}

Specifies a name for an expression i.e. a field. Additional names can be specified if the expression expands to multiple fields.

Returns true if the given expression is null.

Returns true if the given expression is not null.

Returns true if the given boolean expression is true. False otherwise (for null and false).

Returns true if given boolean expression is false. False otherwise (for null and true).

Returns true if the given boolean expression is not true (for null and false). False otherwise.

Returns true if given boolean expression is not false (for null and true). False otherwise.

Calculates the Euler's number raised to the given power.

Calculates the base 10 logarithm of given value.

Calculates the natural logarithm of given value.

Calculates the given number raised to the power of the other value.

Calculates the square root of a given value.

Calculates the absolute value of given value.

Calculates the largest integer less than or equal to a given number.

Calculates the smallest integer greater than or equal to a given number.

Creates a substring of the given string at the given index for the given length. The index starts at 1 and is inclusive, i.e., the character at the index is included in the substring. The substring has the specified length or less.

Creates a substring of the given string beginning at the given index to the end. The start index starts at 1 and is inclusive.

Removes leading and/or trailing characters from the given string. By default, whitespaces at both sides are removed.

Returns the length of a String.

Returns all of the characters in a string in upper case using the rules of the default locale.

Returns all of the characters in a string in lower case using the rules of the default locale.

Returns true, if a string matches the specified LIKE pattern. E.g. "Jo_n%" matches all strings that start with "Jo(arbitrary letter)n".

Returns true, if a string matches the specified SQL regex pattern. E.g. "A+" matches all strings that consist of at least one "A".

Returns the position of string in an other string starting at 1. Returns 0 if string could not be found. E.g. 'a'.position('bbbbba') leads to 6.

Replaces a substring of string with a string starting at a position (starting at 1). An optional length specifies how many characters should be removed. E.g. 'xxxxxtest'.overlay('xxxx', 6) leads to "xxxxxxxxx", 'xxxxxtest'.overlay('xxxx', 6, 2) leads to "xxxxxxxxxst".

Parses a date string in the form "yy-mm-dd" to a SQL date.

Parses a time string in the form "hh:mm:ss" to a SQL time.

Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.

Creates an interval of months for a given number of years.

Creates an interval of months for a given number of months.

Creates an interval of milliseconds for a given number of days.

Creates an interval of milliseconds for a given number of hours.

Creates an interval of milliseconds for a given number of minutes.

Creates an interval of milliseconds for a given number of seconds.

Creates an interval of milliseconds.

Extracts parts of a time point or time interval. Returns the part as a long value. E.g. '2006-06-05'.toDate.extract(DAY) leads to 5.

Returns the quarter of a year from a SQL date. E.g. '1994-09-27'.toDate.quarter() leads to 3.

Rounds a time point down to the given unit. E.g. '12:44:31'.toDate.floor(MINUTE) leads to 12:44:00.

Rounds a time point up to the given unit. E.g. '12:44:31'.toTime.floor(MINUTE) leads to 12:45:00.

Returns the current SQL date in UTC time zone.

Returns the current SQL time in UTC time zone.

Returns the current SQL timestamp in UTC time zone.

Returns the current SQL time in local time zone.

Returns the current SQL timestamp in local time zone.

Determines whether two anchored time intervals overlap. Time point and temporal are transformed into a range defined by two time points (start, end). The function evaluates leftEnd >= rightStart && rightEnd >= leftStart. E.g. temporalOverlaps("2:55:00".toTime, 1.hour, "3:30:00".toTime, 2.hour) leads to true.

Creates an interval of rows.

Converts a Flink composite type (such as Tuple, POJO, etc.) and all of its direct subtypes into a flat representation where every subtype is a separate field. In most cases the fields of the flat representation are named similarly to the original fields but with a dollar separator (e.g. mypojo$mytuple$f0).

Accesses the field of a Flink composite type (such as Tuple, POJO, etc.) by index or name and returns it's value. E.g. pojo.get('myField') or tuple.get(0).

Returns the element at a particular position in an array. The index starts at 1.

Creates an array from a list of values. The array will be an array of objects (not primitives).

Returns the number of elements of an array.

Returns the sole element of an array with a single element. Returns null if the array is empty. Throws an exception if the array has more than one element.

<tr>
  <td>
    {% highlight scala %}

ANY.isNull {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ANY.isNotNull {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

BOOLEAN.isTrue {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

BOOLEAN.isFalse {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

BOOLEAN.isNotTrue {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

BOOLEAN.isNotFalse {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.exp() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.log10() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.ln() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.power(NUMERIC) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.sqrt() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.abs() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.floor() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.ceil() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.substring(INT, INT) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.substring(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.trim( leading = true, trailing = true, character = " ") {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.charLength() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.upperCase() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.lowerCase() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.initCap() {% endhighlight %}

  <td>
    <p>Converts the initial letter of each word in a string to uppercase. Assumes a string containing only [A-Za-z0-9], everything else is treated as whitespace.</p>
  </td>
</tr>

<tr>
  <td>
    {% highlight scala %}

STRING.like(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.similar(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.position(STRING) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.overlay(STRING, INT) STRING.overlay(STRING, INT, INT) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.toDate {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.toTime {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

STRING.toTimestamp {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.year {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.month {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.day {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.hour {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.minute {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.second {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.milli {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

TEMPORAL.extract(TimeIntervalUnit) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

DATE.quarter() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

TIMEPOINT.floor(TimeIntervalUnit) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

TIMEPOINT.ceil(TimeIntervalUnit) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

currentDate() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

currentTime() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

currentTimestamp() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

localTime() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

localTimestamp() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

temporalOverlaps(TIMEPOINT, TEMPORAL, TIMEPOINT, TEMPORAL) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

NUMERIC.rows {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ANY.flatten() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

COMPOSITE.get(STRING) COMPOSITE.get(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ARRAY.at(INT) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

array(ANY [, ANY ]*) {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ARRAY.cardinality() {% endhighlight %}

<tr>
  <td>
    {% highlight scala %}

ARRAY.element() {% endhighlight %}

Function Description
{% highlight scala %} ANY.as(name [, name ]* ) {% endhighlight %}

Specifies a name for an expression i.e. a field. Additional names can be specified if the expression expands to multiple fields.

Returns true if the given expression is null.

Returns true if the given expression is not null.

Returns true if the given boolean expression is true. False otherwise (for null and false).

Returns true if given boolean expression is false. False otherwise (for null and true).

Returns true if the given boolean expression is not true (for null and false). False otherwise.

Returns true if given boolean expression is not false (for null and true). False otherwise.

Calculates the Euler's number raised to the given power.

Calculates the base 10 logarithm of given value.

Calculates the natural logarithm of given value.

Calculates the given number raised to the power of the other value.

Calculates the square root of a given value.

Calculates the absolute value of given value.

Calculates the largest integer less than or equal to a given number.

Calculates the smallest integer greater than or equal to a given number.

Creates a substring of the given string at the given index for the given length. The index starts at 1 and is inclusive, i.e., the character at the index is included in the substring. The substring has the specified length or less.

Creates a substring of the given string beginning at the given index to the end. The start index starts at 1 and is inclusive.

Removes leading and/or trailing characters from the given string.

Returns the length of a String.

Returns all of the characters in a string in upper case using the rules of the default locale.

Returns all of the characters in a string in lower case using the rules of the default locale.

Returns true, if a string matches the specified LIKE pattern. E.g. "Jo_n%" matches all strings that start with "Jo(arbitrary letter)n".

Returns true, if a string matches the specified SQL regex pattern. E.g. "A+" matches all strings that consist of at least one "A".

Returns the position of string in an other string starting at 1. Returns 0 if string could not be found. E.g. "a".position("bbbbba") leads to 6.

Replaces a substring of string with a string starting at a position (starting at 1). An optional length specifies how many characters should be removed. E.g. "xxxxxtest".overlay("xxxx", 6) leads to "xxxxxxxxx", "xxxxxtest".overlay('xxxx', 6, 2) leads to "xxxxxxxxxst".

Parses a date string in the form "yy-mm-dd" to a SQL date.

Parses a time string in the form "hh:mm:ss" to a SQL time.

Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.

Creates an interval of months for a given number of years.

Creates an interval of months for a given number of months.

Creates an interval of milliseconds for a given number of days.

Creates an interval of milliseconds for a given number of hours.

Creates an interval of milliseconds for a given number of minutes.

Creates an interval of milliseconds for a given number of seconds.

Creates an interval of milliseconds.

Extracts parts of a time point or time interval. Returns the part as a long value. E.g. "2006-06-05".toDate.extract(TimeIntervalUnit.DAY) leads to 5.

Returns the quarter of a year from a SQL date. E.g. "1994-09-27".toDate.quarter() leads to 3.

Rounds a time point down to the given unit. E.g. "12:44:31".toTime.floor(TimeIntervalUnit.MINUTE) leads to 12:44:00.

Rounds a time point up to the given unit. E.g. "12:44:31".toTime.floor(TimeIntervalUnit.MINUTE) leads to 12:45:00.

Returns the current SQL date in UTC time zone.

Returns the current SQL time in UTC time zone.

Returns the current SQL timestamp in UTC time zone.

Returns the current SQL time in local time zone.

Returns the current SQL timestamp in local time zone.

Determines whether two anchored time intervals overlap. Time point and temporal are transformed into a range defined by two time points (start, end). The function evaluates leftEnd >= rightStart && rightEnd >= leftStart. E.g. temporalOverlaps('2:55:00'.toTime, 1.hour, '3:30:00'.toTime, 2.hours) leads to true.

Creates an interval of rows.

Converts a Flink composite type (such as Tuple, POJO, etc.) and all of its direct subtypes into a flat representation where every subtype is a separate field. In most cases the fields of the flat representation are named similarly to the original fields but with a dollar separator (e.g. mypojo$mytuple$f0).

Accesses the field of a Flink composite type (such as Tuple, POJO, etc.) by index or name and returns it's value. E.g. 'pojo.get("myField") or 'tuple.get(0).

Returns the element at a particular position in an array. The index starts at 1.

Creates an array from a list of values. The array will be an array of objects (not primitives).

Returns the number of elements of an array.

Returns the sole element of an array with a single element. Returns null if the array is empty. Throws an exception if the array has more than one element.

The Flink SQL functions (including their syntax) are a subset of Apache Calcite's built-in functions. Most of the documentation has been adopted from the Calcite SQL reference.

<tr>
  <td>
    {% highlight text %}

value1 <> value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 > value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 >= value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 < value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 <= value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value IS NULL {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value IS NOT NULL {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 IS DISTINCT FROM value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 IS NOT DISTINCT FROM value2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 BETWEEN [ASYMMETRIC | SYMMETRIC] value2 AND value3 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value1 NOT BETWEEN value2 AND value3 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

string1 LIKE string2 [ ESCAPE string3 ] {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

string1 NOT LIKE string2 [ ESCAPE string3 ] {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

string1 SIMILAR TO string2 [ ESCAPE string3 ] {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

string1 NOT SIMILAR TO string2 [ ESCAPE string3 ] {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value IN (value [, value]* ) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

value NOT IN (value [, value]* ) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

EXISTS (sub-query) {% endhighlight %}

Comparison functions Description
{% highlight text %} value1 = value2 {% endhighlight %}

Equals.

Not equal.

Greater than.

Greater than or equal.

Less than.

Less than or equal.

Whether value is null.

Whether value is not null.

Whether two values are not equal, treating null values as the same.

Whether two values are equal, treating null values as the same.

Whether value1 is greater than or equal to value2 and less than or equal to value3.

Whether value1 is less than value2 or greater than value3.

Whether string1 matches pattern string2. An escape character can be defined if necessary.

Whether string1 does not match pattern string2. An escape character can be defined if necessary.

Whether string1 matches regular expression string2. An escape character can be defined if necessary.

Whether string1 does not match regular expression string2. An escape character can be defined if necessary.

Whether value is equal to a value in a list.

Whether value is not equal to every value in a list.

Whether sub-query returns at least one row. Only supported if the operation can be rewritten in a join and group operation.

<tr>
  <td>
    {% highlight text %}

boolean1 AND boolean2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

NOT boolean {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS FALSE {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS NOT FALSE {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS TRUE {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS NOT TRUE {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS UNKNOWN {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

boolean IS NOT UNKNOWN {% endhighlight %}

Logical functions Description
{% highlight text %} boolean1 OR boolean2 {% endhighlight %}

Whether boolean1 is TRUE or boolean2 is TRUE.

Whether boolean1 and boolean2 are both TRUE.

Whether boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN.

Whether boolean is FALSE; returns FALSE if boolean is UNKNOWN.

Whether boolean is not FALSE; returns TRUE if boolean is UNKNOWN.

Whether boolean is TRUE; returns FALSE if boolean is UNKNOWN.

Whether boolean is not TRUE; returns TRUE if boolean is UNKNOWN.

Whether boolean is UNKNOWN.

Whether boolean is not UNKNOWN.

<tr>
  <td>
    {% highlight text %}
  • numeric {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

numeric1 - numeric2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

numeric1 * numeric2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

numeric1 / numeric2 {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

POWER(numeric1, numeric2) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

ABS(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

MOD(numeric1, numeric2) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

SQRT(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

LN(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

LOG10(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

EXP(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

CEIL(numeric) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

FLOOR(numeric) {% endhighlight %}

Arithmetic functions Description
{% highlight text %} + numeric {% endhighlight %}

Returns numeric.

Returns negative numeric.

{% highlight text %}

numeric1 + numeric2 {% endhighlight %}

Returns numeric1 plus numeric2.

Returns numeric1 minus numeric2.

Returns numeric1 multiplied by numeric2.

Returns numeric1 divided by numeric2.

Returns numeric1 raised to the power of numeric2.

Returns the absolute value of numeric.

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative.

Returns the square root of numeric.

Returns the natural logarithm (base e) of numeric.

Returns the base 10 logarithm of numeric.

Returns e raised to the power of numeric.

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric.

Rounds numeric down, and returns the largest number that is less than or equal to numeric.

<tr>
  <td>
    {% highlight text %}

CHAR_LENGTH(string) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

CHARACTER_LENGTH(string) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

UPPER(string) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

LOWER(string) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

POSITION(string1 IN string2) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

TRIM( { BOTH | LEADING | TRAILING } string1 FROM string2) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ]) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

SUBSTRING(string FROM integer) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

SUBSTRING(string FROM integer FOR integer) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

INITCAP(string) {% endhighlight %}

String functions Description
{% highlight text %} string || string {% endhighlight %}

Concatenates two character strings.

Returns the number of characters in a character string.

As CHAR_LENGTH(string).

Returns a character string converted to upper case.

Returns a character string converted to lower case.

Returns the position of the first occurrence of string1 in string2.

Removes leading and/or trailing characters from string2. By default, whitespaces at both sides are removed.

Replaces a substring of string1 with string2.

Returns a substring of a character string starting at a given point.

Returns a substring of a character string starting at a given point with a given length.

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

<tr>
  <td>
    {% highlight text %}

CASE WHEN condition1 THEN result1 [ WHEN conditionN THEN resultN ]* [ ELSE resultZ ] END {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

NULLIF(value, value) {% endhighlight %}

<tr>
  <td>
    {% highlight text %}

COALESCE(value, value [, value ]* ) {% endhighlight %}

Conditional functions Description
{% highlight text %} CASE value WHEN value1 [, value11 ]* THEN result1 [ WHEN valueN [, valueN1 ]* THEN resultN ]* [ ELSE resultZ ] END {% endhighlight %}

Simple case.

Searched case.

Returns NULL if the values are the same. For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5.

Provides a value if the first value is null. For example, COALESCE(NULL, 5) returns 5.

Type conversion functions Description
{% highlight text %} CAST(value AS type) {% endhighlight %}

Converts a value to a given type.

Value constructor functions Description