[FLINK-36705][table-common] Add initial ProcessTableFunction class and annotations
twalthr authored Nov 29, 2024
1 parent d7bfa77 commit 2897ab7
Showing 12 changed files with 531 additions and 30 deletions.
@@ -31,8 +31,9 @@
* <p>An {@code ArgumentHint} can be used to provide hints about the name, optionality, and data
* type of argument.
*
- * <p>{@code @ArgumentHint(name = "in1", type = @DataTypeHint("STRING"), isOptional = false)} is a
- * scalar argument with the data type STRING, named "in1", and cannot be omitted when calling.
+ * <p>For example, {@code @ArgumentHint(name = "in1", type = @DataTypeHint("STRING"), isOptional =
+ * false)} is a scalar argument with the data type STRING, named "in1", and cannot be omitted when
+ * calling.
*
* @see FunctionHint
*/
@@ -49,7 +50,7 @@
ArgumentTrait[] value() default {ArgumentTrait.SCALAR};

/**
- * The name of the argument.
+ * The name of the argument. It must be unique among other arguments.
*
* <p>This can be used to provide a descriptive name for the argument.
*/
@@ -19,6 +19,7 @@
package org.apache.flink.table.annotation;

import org.apache.flink.annotation.PublicEvolving;
+ import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.table.types.inference.StaticArgumentTrait;

import java.util.Arrays;
@@ -43,31 +44,38 @@ public enum ArgumentTrait {

/**
* An argument that accepts a table "as row" (i.e. with row semantics). This trait only applies
- * to {@code ProcessTableFunction} (PTF).
+ * to {@link ProcessTableFunction} (PTF).
*
- * <p>For scalability, input tables are distributed into virtual processors. Each virtual
- * processor executes a PTF instance and has access only to a share of the entire table. The
- * argument declaration decides about the size of the share and co-location of data.
+ * <p>For scalability, input tables are distributed across so-called "virtual processors". A
+ * virtual processor, as defined by the SQL standard, executes a PTF instance and has access
+ * only to a portion of the entire table. The argument declaration decides about the size of the
+ * portion and co-location of data. Conceptually, tables can be processed either "as row" (i.e.
+ * with row semantics) or "as set" (i.e. with set semantics).
*
* <p>A table with row semantics assumes that there is no correlation between rows and each row
- * can be processed independently. The framework is free in how to distribute rows among virtual
- * processors and each virtual processor has access only to the currently processed row.
+ * can be processed independently. The framework is free in how to distribute rows across
+ * virtual processors and each virtual processor has access only to the currently processed row.
*/
TABLE_AS_ROW(StaticArgumentTrait.TABLE_AS_ROW),

/**
* An argument that accepts a table "as set" (i.e. with set semantics). This trait only applies
- * to {@code ProcessTableFunction} (PTF).
+ * to {@link ProcessTableFunction} (PTF).
*
- * <p>For scalability, input tables are distributed into virtual processors. Each virtual
- * processor executes a PTF instance and has access only to a share of the entire table. The
- * argument declaration decides about the size of the share and co-location of data.
+ * <p>For scalability, input tables are distributed across so-called "virtual processors". A
+ * virtual processor, as defined by the SQL standard, executes a PTF instance and has access
+ * only to a portion of the entire table. The argument declaration decides about the size of the
+ * portion and co-location of data. Conceptually, tables can be processed either "as row" (i.e.
+ * with row semantics) or "as set" (i.e. with set semantics).
*
* <p>A table with set semantics assumes that there is a correlation between rows. When calling
* the function, the PARTITION BY clause defines the columns for correlation. The framework
* ensures that all rows belonging to same set are co-located. A PTF instance is able to access
- * all rows belonging to the same set. In other words: The virtual processor is scoped under a
- * key context.
+ * all rows belonging to the same set. In other words: The virtual processor is scoped by a key
+ * context.
+ *
+ * <p>It is also possible not to provide a key ({@link #OPTIONAL_PARTITION_BY}), in which case
+ * only one virtual processor handles the entire table, thereby losing scalability benefits.
*/
TABLE_AS_SET(StaticArgumentTrait.TABLE_AS_SET),
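
The difference between row semantics and set semantics described in the Javadoc above can be sketched in plain Java. This is only a conceptual simulation under stated assumptions: `VirtualProcessorSketch`, `Row`, `asRow`, and `asSet` are hypothetical names that do not exist in Flink, and the real framework decides distribution itself. The sketch shows that with row semantics each unit of work sees a single row, while with set semantics all rows sharing a partition key end up in the same share.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are part of Flink.
final class VirtualProcessorSketch {

    /** A minimal row: a partition key plus a payload value. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Row semantics: every row is an independent share of the table. */
    static List<List<Row>> asRow(List<Row> table) {
        List<List<Row>> shares = new ArrayList<>();
        for (Row row : table) {
            shares.add(Collections.singletonList(row));
        }
        return shares;
    }

    /** Set semantics: rows with the same key are co-located in one share. */
    static Map<String, List<Row>> asSet(List<Row> table) {
        Map<String, List<Row>> shares = new LinkedHashMap<>();
        for (Row row : table) {
            shares.computeIfAbsent(row.key, k -> new ArrayList<>()).add(row);
        }
        return shares;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a", 1));
        table.add(new Row("b", 2));
        table.add(new Row("a", 3));

        System.out.println("row shares: " + asRow(table).size());
        System.out.println("set shares: " + asSet(table).keySet());
    }
}
```

In this simulation, omitting the key would correspond to putting every row into a single share, which matches the note above that a missing PARTITION BY gives up the scalability benefits.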

@@ -81,8 +81,7 @@

// Note to implementers:
// Because "null" is not supported as an annotation value. Every annotation parameter *must*
- // have
- // some representation for unknown values in order to merge multi-level annotations.
+ // have some representation for unknown values in order to merge multi-level annotations.

// --------------------------------------------------------------------------------------------
// Explicit data type specification
@@ -19,6 +19,9 @@
package org.apache.flink.table.annotation;

import org.apache.flink.annotation.PublicEvolving;
+ import org.apache.flink.table.functions.AggregateFunction;
+ import org.apache.flink.table.functions.ProcessTableFunction;
+ import org.apache.flink.table.functions.TableAggregateFunction;
import org.apache.flink.table.functions.UserDefinedFunction;
import org.apache.flink.table.types.inference.TypeInference;

@@ -175,13 +178,40 @@
ArgumentHint[] arguments() default {};

/**
- * Explicitly defines the intermediate result type that a function uses as accumulator.
+ * Explicitly defines the intermediate result type (i.e. state entry) that an aggregating
+ * function uses as its accumulator. The entry is managed by the framework (usually via Flink's
+ * managed state).
*
* <p>By default, an explicit accumulator type is undefined and the reflection-based extraction
* is used.
+ *
+ * <p>This parameter is primarily intended for aggregating functions (i.e. {@link
+ * AggregateFunction} and {@link TableAggregateFunction}). It is recommended to use {@link
+ * #state()} for {@link ProcessTableFunction}.
*/
DataTypeHint accumulator() default @DataTypeHint();

+ /**
+ * Explicitly lists the intermediate results (i.e. state entries) of a function that is managed
+ * by the framework (i.e. Flink managed state). Including their names and data types.
+ *
+ * <p>State hints are primarily intended for {@link ProcessTableFunction}. A PTF supports
+ * multiple state entries at the beginning of an eval()/onTimer() method (after an optional
+ * context parameter).
+ *
+ * <p>Aggregating functions (i.e. {@link AggregateFunction} and {@link TableAggregateFunction})
+ * support a single state entry at the beginning of an accumulate()/retract() method (i.e. the
+ * accumulator).
+ *
+ * <p>By default, explicit state is undefined and the reflection-based extraction is used where
+ * {@link StateHint} is present.
+ *
+ * <p>Using both {@link #accumulator()} and this parameter is not allowed. Specifying the list
+ * of state entries manually disables the entire reflection-based extraction around {@link
+ * StateHint} and accumulators for aggregating functions.
+ */
+ StateHint[] state() default {};
+
/**
* Explicitly defines the result type that a function uses as output.
*
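
The rule that `accumulator()` and `state()` must not be combined can be illustrated with a small check. This is a minimal sketch and not Flink's actual extraction or validation logic: `StateHintValidationSketch`, `StateEntry`, and `validate` are hypothetical names, an empty string stands in for an undefined accumulator type (mirroring the convention that annotation values cannot be `null`), and the uniqueness check on names follows the `StateHint#name()` Javadoc added in this commit.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; simplified and NOT Flink's real validation code.
final class StateHintValidationSketch {

    /** Simplified stand-in for one declared state entry. */
    static final class StateEntry {
        final String name;
        final String type;

        StateEntry(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /**
     * Rejects declarations that set both an accumulator type and explicit state entries,
     * and declarations that reuse a state entry name.
     *
     * @param accumulatorType empty string means "undefined"
     * @param state explicit state entries, possibly empty
     */
    static void validate(String accumulatorType, StateEntry[] state) {
        if (!accumulatorType.isEmpty() && state.length > 0) {
            throw new IllegalArgumentException(
                    "Accumulator and explicit state entries must not be used together.");
        }
        Set<String> seen = new HashSet<>();
        for (StateEntry entry : state) {
            if (!seen.add(entry.name)) {
                throw new IllegalArgumentException(
                        "Duplicate state entry name: " + entry.name);
            }
        }
    }

    public static void main(String[] args) {
        // Accepted: explicit state only.
        validate("", new StateEntry[] {new StateEntry("count", "BIGINT")});

        // Rejected: accumulator and state at the same time.
        try {
            validate("BIGINT", new StateEntry[] {new StateEntry("count", "BIGINT")});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```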
@@ -0,0 +1,73 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.table.annotation;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.table.functions.AggregateFunction;
import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.table.functions.TableAggregateFunction;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
* A hint that declares an intermediate result (i.e. state entry) that is managed by the framework
* (i.e. Flink managed state).
*
* <p>State hints are primarily intended for {@link ProcessTableFunction}. A PTF supports multiple
* state entries at the beginning of an eval()/onTimer() method (after an optional context
* parameter).
*
* <p>Aggregating functions (i.e. {@link AggregateFunction} and {@link TableAggregateFunction})
* support a single state entry at the beginning of an accumulate()/retract() method (i.e. the
* accumulator).
*
* <p>For example, {@code @StateHint(name = "count", type = @DataTypeHint("BIGINT"))} is a state
* entry with the data type BIGINT named "count".
*
* <p>Note: Usually, a state entry is partitioned by a key and can not be accessed globally. The
* partitioning (or whether it is only a single partition) is defined by the corresponding function
* call.
*
* @see FunctionHint
*/
@PublicEvolving
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.PARAMETER})
public @interface StateHint {

/**
* The name of the state entry. It must be unique among other state entries.
*
* <p>This can be used to provide a descriptive name for the state entry. The name can be used
* for referencing the entry during clean up.
*/
String name() default "";

/**
* The data type hint for the state entry.
*
* <p>This can be used to provide additional information about the expected data type of the
* argument. The {@link DataTypeHint} annotation can be used to specify the data type explicitly
* or provide hints for the reflection-based extraction of the data type.
*/
DataTypeHint type() default @DataTypeHint();
}
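
Because `StateHint` is declared with `RetentionPolicy.RUNTIME` and may target parameters, a hint placed on a method parameter can be read back via reflection. The following sketch shows that mechanism under stated assumptions: the nested `DataTypeHint` and `StateHint` are simplified local stand-ins that only mirror the shape shown in this file, `CountingFunction` is a hypothetical class rather than a real `ProcessTableFunction`, and `describeState` is an illustrative helper, not Flink's extraction code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

final class StateHintReflectionSketch {

    // Simplified local stand-in; the real DataTypeHint has many more parameters.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.PARAMETER})
    @interface DataTypeHint {
        String value() default "";
    }

    // Local stand-in mirroring the two parameters of the StateHint in this diff.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.PARAMETER})
    @interface StateHint {
        String name() default "";

        DataTypeHint type() default @DataTypeHint();
    }

    /** Hypothetical function class; how the framework passes state is out of scope here. */
    static final class CountingFunction {
        public void eval(
                @StateHint(name = "count", type = @DataTypeHint("BIGINT")) Object state,
                String input) {
            // Intentionally empty: only the annotation metadata matters for this sketch.
        }
    }

    /** Collects "name: type" for every parameter of the named method that carries a hint. */
    static List<String> describeState(Class<?> clazz, String methodName) {
        List<String> result = new ArrayList<>();
        for (Method method : clazz.getDeclaredMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            for (Parameter parameter : method.getParameters()) {
                StateHint hint = parameter.getAnnotation(StateHint.class);
                if (hint != null) {
                    result.add(hint.name() + ": " + hint.type().value());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeState(CountingFunction.class, "eval"));
    }
}
```

The unannotated `input` parameter is skipped, so only the state entry named "count" is reported.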
@@ -35,5 +35,7 @@ public enum FunctionKind {

TABLE_AGGREGATE,

+ PROCESS_TABLE,
+
OTHER
}