This section describes the cl_khr_fp16 extension. This extension adds support for half scalar and vector types as built-in types that can be used for arithmetic operations, conversions, etc.
The lists of built-in scalar and vector data types defined in tables 6.1 and 6.2 are extended to include the following:
Type | Description |
---|---|
half2 | A 2-component half-precision floating-point vector. |
half3 | A 3-component half-precision floating-point vector. |
half4 | A 4-component half-precision floating-point vector. |
half8 | An 8-component half-precision floating-point vector. |
half16 | A 16-component half-precision floating-point vector. |
The built-in vector data types for halfn are also declared as appropriate types in the OpenCL API (and header files) that can be used by an application.
The following table describes the built-in vector data types for halfn as defined in the OpenCL C programming language and the corresponding data type available to the application:
Type in OpenCL Language | API type for application |
---|---|
half2 | cl_half2 |
half3 | cl_half3 |
half4 | cl_half4 |
half8 | cl_half8 |
half16 | cl_half16 |
The relational, equality, logical and logical unary operators described in section 6.3 can be used with half scalar and halfn vector types and shall produce a scalar int and vector shortn result respectively.
The OpenCL compiler accepts an h or H suffix on floating-point literals, indicating that the literal is typed as a half.
The implicit conversion rules specified in section 6.2.1 now include the half scalar and halfn vector data types.
The explicit casts described in section 6.2.2 are extended to take a half scalar data type and a halfn vector data type.
The explicit conversion functions described in section 6.2.3 are extended to take a half scalar data type and a halfn vector data type.
The as_typen() function for re-interpreting types as described in section 6.2.4.2 is extended to allow conversion-free casts between shortn, ushortn, and halfn scalar and vector data types.
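As a rough host-side illustration of what a conversion-free cast means, the following C sketch reinterprets a 16-bit pattern as a signed or unsigned integer without changing any bits. The helper names `bits_as_short` and `bits_as_ushort` are hypothetical stand-ins for as_short() and as_ushort() applied to a half value, and the half is assumed to be stored as an IEEE 754 binary16 bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for as_short(): view the 16 bits of a half
   (or ushort) as a signed value. No numeric conversion takes place. */
static int16_t bits_as_short(uint16_t bits)
{
    int16_t out;
    assert(sizeof out == sizeof bits);  /* reinterpretation needs equal sizes */
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Hypothetical stand-in for as_ushort(): the inverse view. */
static uint16_t bits_as_ushort(int16_t value)
{
    uint16_t out;
    memcpy(&out, &value, sizeof out);
    return out;
}
```

For example, the half value -2.0 has the bit pattern 0xC000, which reads as -16384 when viewed as a short and as 49152 when viewed as a ushort.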
The built-in math functions defined in table 6.8 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4|8|16} as arguments and return values.
gentype now also includes half, half2, half3, half4, half8, and half16.
For any specific use of a function, the actual type has to be the same for all arguments and the return type.
Function | Description |
---|---|
gentype acos (gentype x) | Arc cosine function. |
gentype acosh (gentype x) | Inverse hyperbolic cosine. |
gentype acospi (gentype x) | Compute acos (x) / {pi}. |
gentype asin (gentype x) | Arc sine function. |
gentype asinh (gentype x) | Inverse hyperbolic sine. |
gentype asinpi (gentype x) | Compute asin (x) / {pi}. |
gentype atan (gentype y_over_x) | Arc tangent function. |
gentype atan2 (gentype y, gentype x) | Arc tangent of y / x. |
gentype atanh (gentype x) | Hyperbolic arc tangent. |
gentype atanpi (gentype x) | Compute atan (x) / {pi}. |
gentype atan2pi (gentype y, gentype x) | Compute atan2 (y, x) / {pi}. |
gentype cbrt (gentype x) | Compute cube-root. |
gentype ceil (gentype x) | Round to integral value using the round to positive infinity rounding mode. |
gentype copysign (gentype x, gentype y) | Returns x with its sign changed to match the sign of y. |
gentype cos (gentype x) | Compute cosine. |
gentype cosh (gentype x) | Compute hyperbolic cosine. |
gentype cospi (gentype x) | Compute cos ({pi} x). |
gentype erfc (gentype x) | Complementary error function. |
gentype erf (gentype x) | Error function encountered in integrating the normal distribution. |
gentype exp (gentype x) | Compute the base-e exponential of x. |
gentype exp2 (gentype x) | Exponential base 2 function. |
gentype exp10 (gentype x) | Exponential base 10 function. |
gentype expm1 (gentype x) | Compute e^x - 1.0. |
gentype fabs (gentype x) | Compute absolute value of a floating-point number. |
gentype fdim (gentype x, gentype y) | x - y if x > y, +0 if x is less than or equal to y. |
gentype floor (gentype x) | Round to integral value using the round to negative infinity rounding mode. |
gentype fma (gentype a, gentype b, gentype c) | Returns the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur. Edge case behavior is per the IEEE 754-2008 standard. |
gentype fmax (gentype x, gentype y) | Returns y if x < y, otherwise it returns x. If one argument is a NaN, fmax() returns the other argument. If both arguments are NaNs, fmax() returns a NaN. |
gentype fmin (gentype x, gentype y) | Returns y if y < x, otherwise it returns x. If one argument is a NaN, fmin() returns the other argument. If both arguments are NaNs, fmin() returns a NaN. |
gentype fmod (gentype x, gentype y) | Modulus. Returns x - y * trunc (x/y). |
gentype fract (gentype x, {global} gentype *iptr) For OpenCL C 2.0 or with the gentype fract (gentype x, gentype *iptr) | Returns fmin( x - floor (x), 0x1.ffcp-1f ). floor(x) is returned in iptr. |
halfn frexp (halfn x, {global} intn *exp) halfn frexp (halfn x, {local} intn *exp) halfn frexp (halfn x, {private} intn *exp) For OpenCL C 2.0 or with the halfn frexp (halfn x, intn *exp) | Extract mantissa and exponent from x. For each component the mantissa returned is a float with magnitude in the interval [1/2, 1) or 0. Each component of x equals mantissa returned * 2^exp. |
gentype hypot (gentype x, gentype y) | Compute the value of the square root of x^2 + y^2 without undue overflow or underflow. |
intn ilogb (halfn x) | Return the exponent as an integer value. |
halfn ldexp (halfn x, intn k) | Multiply x by 2 to the power k. |
gentype lgamma (gentype x) halfn lgamma_r (halfn x, {global} intn *signp) halfn lgamma_r (halfn x, {local} intn *signp) halfn lgamma_r (halfn x, {private} intn *signp) For OpenCL C 2.0 or with the halfn lgamma_r (halfn x, intn *signp) | Log gamma function. Returns the natural logarithm of the absolute value of the gamma function. The sign of the gamma function is returned in the signp argument of lgamma_r. |
gentype log (gentype x) | Compute natural logarithm. |
gentype log2 (gentype x) | Compute a base 2 logarithm. |
gentype log10 (gentype x) | Compute a base 10 logarithm. |
gentype log1p (gentype x) | Compute log_e(1.0 + x). |
gentype logb (gentype x) | Compute the exponent of x, which is the integral part of log_r |x|. |
gentype mad (gentype a, gentype b, gentype c) | mad computes a * b + c. The function may compute a * b + c with reduced accuracy in the embedded profile. See the OpenCL SPIR-V Environment Specification for details. On some hardware the mad instruction may provide better performance than expanded computation of a * b + c. Note: For some usages, e.g. mad(a, b, -a*b), the half precision definition of mad() is loose enough that almost any result is allowed from mad() for some values of a and b. |
gentype maxmag (gentype x, gentype y) | Returns x if |x| > |y|, y if |y| > |x|, otherwise fmax(x, y). |
gentype minmag (gentype x, gentype y) | Returns x if |x| < |y|, y if |y| < |x|, otherwise fmin(x, y). |
gentype modf (gentype x, {global} gentype *iptr) For OpenCL C 2.0 or with the gentype modf (gentype x, gentype *iptr) | Decompose a floating-point number. The modf function breaks the argument x into integral and fractional parts, each of which has the same sign as the argument. It stores the integral part in the object pointed to by iptr. |
halfn nan (ushortn nancode) | Returns a quiet NaN. The nancode may be placed in the significand of the resulting NaN. |
gentype nextafter (gentype x, gentype y) | Computes the next representable half-precision floating-point value following x in the direction of y. Thus, if y is less than x, nextafter() returns the largest representable floating-point number less than x. |
gentype pow (gentype x, gentype y) | Compute x to the power y. |
halfn pown (halfn x, intn y) | Compute x to the power y, where y is an integer. |
gentype powr (gentype x, gentype y) | Compute x to the power y, where x is >= 0. |
gentype remainder (gentype x, gentype y) | Compute the value r such that r = x - n*y, where n is the integer nearest the exact value of x/y. If there are two integers closest to x/y, n shall be the even one. If r is zero, it is given the same sign as x. |
halfn remquo (halfn x, halfn y, {global} intn *quo) halfn remquo (halfn x, halfn y, {local} intn *quo) halfn remquo (halfn x, halfn y, {private} intn *quo) For OpenCL C 2.0 or with the halfn remquo (halfn x, halfn y, intn *quo) | The remquo function computes the value r such that r = x - k*y, where k is the integer nearest the exact value of x/y. If there are two integers closest to x/y, k shall be the even one. If r is zero, it is given the same sign as x. This is the same value that is returned by the remainder function. remquo also calculates the lower seven bits of the integral quotient x/y, and gives that value the same sign as x/y. It stores this signed value in the object pointed to by quo. |
gentype rint (gentype x) | Round to integral value (using round to nearest even rounding mode) in floating-point format. Refer to section 7.1 for description of rounding modes. |
halfn rootn (halfn x, intn y) | Compute x to the power 1/y. |
gentype round (gentype x) | Return the integral value nearest to x rounding halfway cases away from zero, regardless of the current rounding direction. |
gentype rsqrt (gentype x) | Compute inverse square root. |
gentype sin (gentype x) | Compute sine. |
gentype sincos (gentype x, {global} gentype *cosval) For OpenCL C 2.0 or with the gentype sincos (gentype x, gentype *cosval) | Compute sine and cosine of x. The computed sine is the return value and computed cosine is returned in cosval. |
gentype sinh (gentype x) | Compute hyperbolic sine. |
gentype sinpi (gentype x) | Compute sin ({pi} x). |
gentype sqrt (gentype x) | Compute square root. |
gentype tan (gentype x) | Compute tangent. |
gentype tanh (gentype x) | Compute hyperbolic tangent. |
gentype tanpi (gentype x) | Compute tan ({pi} x). |
gentype tgamma (gentype x) | Compute the gamma function. |
gentype trunc (gentype x) | Round to integral value using the round to zero rounding mode. |
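The NaN handling of fmax, fmin and maxmag in the table above is easy to misread, so the following C sketch spells it out. It is only an illustration: float stands in for half, and the `*_sketch` names are hypothetical helpers, not part of OpenCL C.

```c
#include <assert.h>
#include <math.h>   /* NAN macro only; no libm functions are called */

static int is_nan(float v) { return v != v; }

/* fmax as worded in the table: a single NaN argument is ignored. */
static float fmax_sketch(float x, float y)
{
    if (is_nan(x)) return y;   /* also yields NaN when both are NaN */
    if (is_nan(y)) return x;
    return (x < y) ? y : x;
}

/* fmin as worded in the table. */
static float fmin_sketch(float x, float y)
{
    if (is_nan(x)) return y;
    if (is_nan(y)) return x;
    return (y < x) ? y : x;
}

static float abs_sketch(float v) { return (v < 0.0f) ? -v : v; }

/* maxmag: the larger magnitude wins; ties fall back to fmax. */
static float maxmag_sketch(float x, float y)
{
    float ax = abs_sketch(x), ay = abs_sketch(y);
    if (ax > ay) return x;
    if (ay > ax) return y;
    return fmax_sketch(x, y);
}

/* Usage: checks that mirror the wording of the table. */
void check_nan_rules(void)
{
    assert(fmax_sketch(NAN, 2.0f) == 2.0f);
    assert(is_nan(fmin_sketch(NAN, NAN)));
    assert(maxmag_sketch(-2.0f, 2.0f) == 2.0f);
}
```

Note that this differs from the common functions max and min, whose results are undefined for NaN arguments.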
The FP_FAST_FMA_HALF macro indicates whether the fma() family of functions is fast compared with direct code for half precision floating-point. If defined, the FP_FAST_FMA_HALF macro shall indicate that the fma() function generally executes about as fast as, or faster than, a multiply and an add of half operands.
The macro names given in the following list must use the values specified. These constant expressions are suitable for use in #if preprocessing directives.
#define HALF_DIG 3
#define HALF_MANT_DIG 11
#define HALF_MAX_10_EXP +4
#define HALF_MAX_EXP +16
#define HALF_MIN_10_EXP -4
#define HALF_MIN_EXP -13
#define HALF_RADIX 2
#define HALF_MAX 0x1.ffcp15h
#define HALF_MIN 0x1.0p-14h
#define HALF_EPSILON 0x1.0p-10h
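The limits above can be cross-checked on the host by decoding the corresponding bit patterns. The sketch below assumes that half uses the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 stored mantissa bits); `half_bits_to_float` and `scale_pow2` are hypothetical helper names and not part of any OpenCL header.

```c
#include <assert.h>
#include <math.h>    /* INFINITY and NAN macros only */
#include <stdint.h>

/* Multiply x by 2^e using exact float multiplications. */
static float scale_pow2(float x, int e)
{
    assert(e >= -24 && e <= 5);  /* the only range binary16 decoding needs */
    while (e > 0) { x *= 2.0f; --e; }
    while (e < 0) { x *= 0.5f; ++e; }
    return x;
}

/* Decode a binary16 bit pattern into a float (always exact). */
static float half_bits_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* zero or subnormal: mantissa * 2^-24 */
        value = scale_pow2((float)mantissa, -24);
    } else if (exponent == 31) {
        value = mantissa ? NAN : INFINITY;
    } else {
        /* normal: (1024 + mantissa) * 2^(exponent - 15 - 10) */
        value = scale_pow2((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}
```

Under that assumption, the patterns 0x7BFF, 0x0400 and 0x1400 decode to 65504, 2^-14 and 2^-10, which match HALF_MAX, HALF_MIN and HALF_EPSILON. HALF_MANT_DIG is 11 because the 10 stored mantissa bits are joined by one implicit leading bit.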
The following table describes the built-in macro names given above in the OpenCL C programming language and the corresponding macro names available to the application.
Macro in OpenCL Language | Macro for application |
---|---|
HALF_DIG | {CL_HALF_DIG} |
HALF_MANT_DIG | {CL_HALF_MANT_DIG} |
HALF_MAX_10_EXP | {CL_HALF_MAX_10_EXP} |
HALF_MAX_EXP | {CL_HALF_MAX_EXP} |
HALF_MIN_10_EXP | {CL_HALF_MIN_10_EXP} |
HALF_MIN_EXP | {CL_HALF_MIN_EXP} |
HALF_RADIX | {CL_HALF_RADIX} |
HALF_MAX | {CL_HALF_MAX} |
HALF_MIN | {CL_HALF_MIN} |
HALF_EPSILON | {CL_HALF_EPSILON} |
The following constants are also available.
They are of type half and are accurate within the precision of the half type.
Constant | Description |
---|---|
|
Value of e |
|
Value of log_2(e) |
|
Value of log_10(e) |
|
Value of log_e(2) |
|
Value of log_e(10) |
|
Value of {pi} |
|
Value of {pi} / 2 |
|
Value of {pi} / 4 |
|
Value of 1 / {pi} |
|
Value of 2 / {pi} |
|
Value of 2 / {sqrt}{pi} |
|
Value of {sqrt}2 |
|
Value of 1 / {sqrt}2 |
The built-in common functions defined in table 6.12 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4|8|16} as arguments and return values.
gentype now also includes half, half2, half3, half4, half8 and half16.
These are described below.
Function | Description |
---|---|
gentype clamp ( gentype clamp ( | Returns fmin(fmax(x, minval), maxval). Results are undefined if minval > maxval. |
gentype degrees (gentype radians) | Converts radians to degrees, i.e. (180 / {pi}) * radians. |
gentype max (gentype x, gentype y) | Returns y if x < y, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined. |
gentype min (gentype x, gentype y) | Returns y if y < x, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined. |
gentype mix (gentype x, gentype y, gentype a) | Returns the linear blend of x and y implemented as: x + (y - x) * a. a must be a value in the range 0.0 … 1.0. If a is not in the range 0.0 … 1.0, the return values are undefined. Note: The half precision mix function can be implemented using contractions such as mad or fma. |
gentype radians (gentype degrees) | Converts degrees to radians, i.e. ({pi} / 180) * degrees. |
gentype step (gentype edge, gentype x) | Returns 0.0 if x < edge, otherwise it returns 1.0. |
gentype smoothstep ( gentype smoothstep ( | Returns 0.0 if x <= edge0 and 1.0 if x >= edge1 and performs smooth Hermite interpolation between 0 and 1 when edge0 < x < edge1. This is useful in cases where you would want a threshold function with a smooth transition. This is equivalent to: gentype t; t = clamp ((x - edge0) / (edge1 - edge0), 0, 1); return t * t * (3 - 2 * t); Results are undefined if edge0 >= edge1. Note: The half precision smoothstep function can be implemented using contractions such as mad or fma. |
gentype sign (gentype x) | Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN. |
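To make the definitions of clamp, mix, step and smoothstep concrete, here is a minimal scalar sketch in host C. float stands in for half, the `*_sketch` names are hypothetical, and the Hermite polynomial t * t * (3 - 2 * t) is the conventional smoothstep form, taken here as an assumption. The asserts encode the conditions under which the table says results are defined.

```c
#include <assert.h>

/* clamp: fmin(fmax(x, minval), maxval), ignoring NaN handling. */
static float clamp_sketch(float x, float minval, float maxval)
{
    assert(minval <= maxval);               /* undefined otherwise */
    float lo = (x < minval) ? minval : x;   /* fmax(x, minval) */
    return (maxval < lo) ? maxval : lo;     /* fmin(lo, maxval) */
}

/* mix: linear blend x + (y - x) * a, with a in [0, 1]. */
static float mix_sketch(float x, float y, float a)
{
    assert(a >= 0.0f && a <= 1.0f);         /* undefined otherwise */
    return x + (y - x) * a;
}

/* step: 0.0 below the edge, 1.0 at or above it. */
static float step_sketch(float edge, float x)
{
    return (x < edge) ? 0.0f : 1.0f;
}

/* smoothstep: clamp the normalized position, then apply the
   Hermite polynomial (assumed conventional form). */
static float smoothstep_sketch(float edge0, float edge1, float x)
{
    assert(edge0 < edge1);                  /* undefined otherwise */
    float t = clamp_sketch((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}
```

A real half implementation may contract the multiply-add steps with mad or fma, so results can differ in the last bit from this float version.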
The built-in geometric functions defined in table 6.13 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4} as arguments and return values.
gentype now also includes half, half2, half3 and half4.
These are described below.
Note: The half precision geometric functions can be implemented using contractions such as mad or fma.
Function | Description |
---|---|
half4 cross (half4 p0, half4 p1) | Returns the cross product of p0.xyz and p1.xyz. The w component of the result will be 0.0. |
half dot (gentype p0, gentype p1) | Compute the dot product of p0 and p1. |
half distance (gentype p0, gentype p1) | Returns the distance between p0 and p1. This is calculated as length(p0 - p1). |
half length (gentype p) | Return the length of vector p, i.e., sqrt(p.x^2 + p.y^2 + …). |
gentype normalize (gentype p) | Returns a vector in the same direction as p but with a length of 1. |
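The cross and dot rows above can be sketched on the host as follows. This is an illustration only: `vec4_sketch` is a hypothetical struct standing in for half4 (with float components), and the function names are not OpenCL built-ins.

```c
#include <assert.h>

/* Hypothetical stand-in for half4, using float components. */
typedef struct { float x, y, z, w; } vec4_sketch;

/* Dot product over all four components. */
static float dot4_sketch(vec4_sketch p0, vec4_sketch p1)
{
    return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z + p0.w * p1.w;
}

/* Cross product of p0.xyz and p1.xyz; the w inputs are ignored
   and the w component of the result is 0.0. */
static vec4_sketch cross4_sketch(vec4_sketch p0, vec4_sketch p1)
{
    vec4_sketch r;
    r.x = p0.y * p1.z - p0.z * p1.y;
    r.y = p0.z * p1.x - p0.x * p1.z;
    r.z = p0.x * p1.y - p0.y * p1.x;
    r.w = 0.0f;
    return r;
}

/* Usage: the x and y unit vectors cross to the z unit vector. */
void check_cross_basis(void)
{
    vec4_sketch ex = {1.0f, 0.0f, 0.0f, 9.0f};
    vec4_sketch ey = {0.0f, 1.0f, 0.0f, 9.0f};
    vec4_sketch ez = cross4_sketch(ex, ey);
    assert(ez.x == 0.0f && ez.y == 0.0f && ez.z == 1.0f && ez.w == 0.0f);
}
```

As with the built-ins, a half precision implementation may use mad or fma contractions, so the rounding of individual products can differ from this sketch.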
The scalar and vector relational functions described in table 6.14 are extended to include versions that take half, half2, half3, half4, half8 and half16 as arguments.
The relational and equality operators (<, <=, >, >=, !=, ==) can be used with halfn vector types and shall produce a vector shortn result as described in section 6.3.
The functions isequal, isnotequal, isgreater, isgreaterequal, isless, islessequal, islessgreater, isfinite, isinf, isnan, isnormal, isordered, isunordered and signbit shall return a 0 if the specified relation is false and a 1 if the specified relation is true for scalar argument types. These functions shall return a 0 if the specified relation is false and a -1 (i.e. all bits set) if the specified relation is true for vector argument types.
The relational functions isequal, isgreater, isgreaterequal, isless, islessequal, and islessgreater always return 0 if either argument is not a number (NaN). isnotequal returns 1 if one or both arguments are not a number (NaN) and the argument type is a scalar and returns -1 if one or both arguments are not a number (NaN) and the argument type is a vector.
The functions described in table 6.14 are extended to include the halfn vector types.
Function | Description |
---|---|
int isequal (half x, half y) | Returns the component-wise compare of x == y. |
int isnotequal (half x, half y) | Returns the component-wise compare of x != y. |
int isgreater (half x, half y) | Returns the component-wise compare of x > y. |
int isgreaterequal (half x, half y) | Returns the component-wise compare of x >= y. |
int isless (half x, half y) | Returns the component-wise compare of x < y. |
int islessequal (half x, half y) | Returns the component-wise compare of x <= y. |
int islessgreater (half x, half y) | Returns the component-wise compare of (x < y) || (x > y). |
int isfinite (half) | Test for finite value. |
int isinf (half) | Test for infinity value (positive or negative). |
int isnan (half) | Test for a NaN. |
int isnormal (half) | Test for a normal value. |
int isordered (half x, half y) | Test if arguments are ordered. isordered() takes arguments x and y, and returns the result isequal(x, x) && isequal(y, y). |
int isunordered (half x, half y) | Test if arguments are unordered. isunordered() takes arguments x and y, returning non-zero if x or y is a NaN, and zero otherwise. |
int signbit (half) | Test for sign bit. The scalar version of the function returns a 1 if the sign bit in the half is set else returns 0. The vector version of the function returns the following for each component in halfn: -1 (i.e. all bits set) if the sign bit in the half is set else returns 0. |
halfn bitselect (halfn a, halfn b, halfn c) | Each bit of the result is the corresponding bit of a if the corresponding bit of c is 0. Otherwise it is the corresponding bit of b. |
halfn select (halfn a, halfn b, shortn c) | For each component, result[i] = if MSB of c[i] is set ? b[i] : a[i]. |
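The difference between the scalar result (0 or 1) and the per-component vector result (0 or -1) described above can be sketched in host C. float stands in for half, int16_t stands in for one component of shortn, and all function names are hypothetical. bitselect is shown on raw 16-bit patterns, following the wording of its table row.

```c
#include <assert.h>
#include <math.h>    /* NAN macro only */
#include <stdint.h>

/* Scalar form: 1 when the relation holds, 0 otherwise. */
static int isequal_scalar(float x, float y)    { return x == y; }
static int isnotequal_scalar(float x, float y) { return x != y; }

/* One component of the vector form: -1 (all bits set) or 0. */
static int16_t isequal_component(float x, float y)
{
    return (int16_t)-(x == y);
}

static int16_t isnotequal_component(float x, float y)
{
    return (int16_t)-(x != y);
}

/* bitselect on bit patterns: bit of a where c has 0, bit of b where c has 1. */
static uint16_t bitselect_bits(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)((a & (uint16_t)~c) | (b & c));
}

/* Usage: NaN makes isequal false and isnotequal true. */
void check_nan_relations(void)
{
    assert(isequal_scalar(NAN, NAN) == 0);
    assert(isnotequal_component(NAN, 1.0f) == -1);
}
```

The all-bits-set convention is what lets a vector comparison result be used directly as a bit mask.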
The vector data load (vloadn) and store (vstoren) functions described in table 6.13 (also listed below) are extended to include versions that read or write half vector values.
The generic type gentype is extended to include half.
The generic type gentypen is extended to include half2, half3, half4, half8, and half16.
Note: vload3 reads x, y, z components from address (p + (offset * 3)) into a 3-component vector and vstore3 writes x, y, z components from a 3-component vector to address (p + (offset * 3)).
Function | Description |
---|---|
gentypen vloadn(size_t offset, const {global} gentype *p) For OpenCL C 2.0 or with the gentypen vloadn(size_t offset, const gentype *p) | Return sizeof (gentypen) bytes of data read from address (p + (offset * n)). If gentype is half, the read address computed as (p + (offset * n)) must be 16-bit aligned. |
void vstoren(gentypen data, size_t offset, {global} gentype *p) For OpenCL C 2.0 or with the void vstoren(gentypen data, size_t offset, gentype *p) | Write sizeof (gentypen) bytes given by data to address (p + (offset * n)). If gentype is half, the write address computed as (p + (offset * n)) must be 16-bit aligned. |
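The address arithmetic (p + (offset * n)) is counted in elements, not bytes, which matters most for the 3-component case. The following host C sketch illustrates this on an array of 16-bit values that stand in for half data; `vloadn_sketch` and `vstoren_sketch` are hypothetical names that take n as a runtime parameter purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vector widths available for halfn. */
static int is_valid_width(size_t n)
{
    return n == 2 || n == 3 || n == 4 || n == 8 || n == 16;
}

/* Read n consecutive 16-bit elements starting at p + offset * n. */
static void vloadn_sketch(size_t n, size_t offset,
                          const uint16_t *p, uint16_t *out)
{
    assert(is_valid_width(n));
    for (size_t i = 0; i < n; ++i)
        out[i] = p[offset * n + i];
}

/* Write n consecutive 16-bit elements starting at p + offset * n. */
static void vstoren_sketch(size_t n, const uint16_t *data,
                           size_t offset, uint16_t *p)
{
    assert(is_valid_width(n));
    for (size_t i = 0; i < n; ++i)
        p[offset * n + i] = data[i];
}
```

With n = 3 and offset = 1, the load covers elements 3, 4 and 5 of the array, so consecutive 3-component vectors are packed with no padding element between them.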
The OpenCL C programming language implements the following functions that provide asynchronous copies between global and local memory and a prefetch from global memory.
The generic type gentype is extended to include half, half2, half3, half4, half8, and half16.
Function | Description |
---|---|
event_t async_work_group_copy ( event_t async_work_group_copy ( | Perform an async copy of num_gentypes gentype elements from src to dst. The async copy is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero. If event argument is not zero, the event object supplied in event argument will be returned. This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy. |
event_t async_work_group_strided_copy ( event_t async_work_group_strided_copy ( | Perform an async gather of num_gentypes gentype elements from src to dst. The src_stride is the stride in elements for each gentype element read from src. The async gather is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero. If event argument is not zero, the event object supplied in event argument will be returned. This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy. The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy. |
void wait_group_events ( | Wait for events that identify the async_work_group_copy operations to complete. The event objects specified in event_list will be released after the wait is performed. This function must be encountered by all work-items in a work-group executing the kernel with the same num_events and event objects specified in event_list; otherwise the results are undefined. |
void prefetch ( | Prefetch num_gentypes * sizeof(gentype) bytes into the global cache. The prefetch instruction is applied to a work-item in a work-group and does not affect the functional behavior of the kernel. |
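The element-wise effect of the gather form of the strided copy can be pictured with a sequential host loop. This sketch is only a model of the data movement: the real built-in is executed cooperatively by the work-group, completes asynchronously and returns an event, none of which is modeled here. The name `strided_gather_sketch` and the use of 16-bit values in place of half are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential model of the gather: dst[i] = src[i * src_stride],
   where the stride is counted in elements, not bytes. */
static void strided_gather_sketch(uint16_t *dst, const uint16_t *src,
                                  size_t num_elements, size_t src_stride)
{
    assert(src_stride != 0);  /* a stride of 0 is undefined per the table */
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[i * src_stride];
}
```

For example, gathering 3 elements with a stride of 3 from a source holding 0 through 9 produces 0, 3 and 6.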
The image read and write functions defined in tables 6.23, 6.24 and 6.25 are extended to support image color values that are a half type.
Function | Description |
---|---|
half4 read_imageh ( half4 read_imageh ( | Use the coordinate (coord.x, coord.y) to do an element lookup in the 2D image object specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. The read_imageh calls that take integer coordinates must use a sampler with filter mode set to Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( half4 read_imageh ( | Use the coordinate (coord.x, coord.y, coord.z) to do an element lookup in the 3D image object specified by image. coord.w is ignored. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. The read_imageh calls that take integer coordinates must use a sampler with filter mode set to Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( half4 read_imageh ( | Use coord.xy to do an element lookup in the 2D image identified by coord.z in the 2D image array specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. The read_imageh calls that take integer coordinates must use a sampler with filter mode set to Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( half4 read_imageh ( | Use coord to do an element lookup in the 1D image object specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. The read_imageh calls that take integer coordinates must use a sampler with filter mode set to Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( half4 read_imageh ( | Use coord.x to do an element lookup in the 1D image identified by coord.y in the 1D image array specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. The read_imageh calls that take integer coordinates must use a sampler with filter mode set to Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
aQual in Table 6.24 refers to one of the access qualifiers. For sampler-less read functions this may be read_only or read_write.
Function | Description |
---|---|
half4 read_imageh ( | Use the coordinate (coord.x, coord.y) to do an element lookup in the 2D image object specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( | Use the coordinate (coord.x, coord.y, coord.z) to do an element lookup in the 3D image object specified by image. coord.w is ignored. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( | Use coord.xy to do an element lookup in the 2D image identified by coord.z in the 2D image array specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( half4 read_imageh ( | Use coord to do an element lookup in the 1D image or 1D image buffer object specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
half4 read_imageh ( | Use coord.x to do an element lookup in the 1D image identified by coord.y in the 1D image array specified by image. read_imageh returns half precision floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}. read_imageh returns half precision floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8} or {CL_SNORM_INT16}. read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}. Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined. |
aQual in Table 6.25 refers to one of the access qualifiers. For write functions this may be write_only or read_write.
Function | Description |
---|---|
void write_imageh ( |
Write color value to location specified by coord.xy in the 2D image specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. x & y are considered to be unnormalized coordinates and must be in the range 0 … width - 1, and 0 … height - 1. write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y) coordinate values that are not in the range (0 … width - 1, 0 … height - 1) respectively, is undefined. |
void write_imageh ( |
Write color value to location specified by coord.xy in the 2D image identified by coord.z in the 2D image array specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord.x, coord.y and coord.z are considered to be unnormalized coordinates and must be in the range 0 … image width - 1, 0 … image height - 1 and 0 … image number of layers - 1. write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y, z) coordinate values that are not in the range (0 … image width - 1, 0 … image height - 1, 0 … image number of layers - 1), respectively, is undefined. |
void write_imageh ( void write_imageh ( |
Write color value to location specified by coord in the 1D image or 1D image buffer object specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord is considered to be an unnormalized coordinate and must be in the range 0 … image width - 1. write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. Appropriate data format conversion will be done to convert channel data from a floating-point value to the actual data format in which the channels are stored. The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with a coordinate value that is not in the range (0 … image width - 1), is undefined. |
void write_imageh ( |
Write color value to location specified by coord.x in the 1D image identified by coord.y in the 1D image array specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord.x and coord.y are considered to be unnormalized coordinates and must be in the range 0 … image width - 1 and 0 … image number of layers - 1. write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. Appropriate data format conversion will be done to convert channel data from a floating-point value to actual data format in which the channels are stored. The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y) coordinate values that are not in the range (0 … image width - 1, 0 … image number of layers - 1), respectively, is undefined. |
void write_imageh ( |
Write color value to location specified by coord.xyz in the 3D image object specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord.x, coord.y and coord.z are considered to be unnormalized coordinates and must be in the range 0 … image width - 1, 0 … image height - 1 and 0 … image depth - 1. write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y, z) coordinate values that are not in the range (0 … image width - 1, 0 … image height - 1, 0 … image depth - 1), respectively, is undefined. Note: This built-in function is only available if the cl_khr_3d_image_writes extension is also supported by the device. |
The following table entry describes the additions to table 4.3, which allows applications to query the configuration information using {clGetDeviceInfo} for an OpenCL device that supports half precision floating-point.
Op-code | Return Type | Description |
---|---|---|
{CL_DEVICE_HALF_FP_CONFIG} | {cl_device_fp_config} | Describes the half precision floating-point capability of the OpenCL device. This is a bit-field that describes one or more of the following values: {CL_FP_DENORM}: denorms are supported; {CL_FP_INF_NAN}: INF and NaNs are supported; {CL_FP_ROUND_TO_NEAREST}: round to nearest even rounding mode supported; {CL_FP_ROUND_TO_ZERO}: round to zero rounding mode supported; {CL_FP_ROUND_TO_INF}: round to positive and negative infinity rounding modes supported; {CL_FP_FMA}: IEEE754-2008 fused multiply-add is supported; {CL_FP_SOFT_FLOAT}: basic floating-point operations (such as addition, subtraction, multiplication) are implemented in software. The required minimum half precision floating-point capability as implemented by this extension is: {CL_FP_ROUND_TO_ZERO}, or {CL_FP_ROUND_TO_NEAREST} \| {CL_FP_INF_NAN}. |
If {CL_FP_ROUND_TO_NEAREST} is supported, the default rounding mode for half-precision floating-point operations will be round to nearest even; otherwise the default rounding mode will be round to zero.
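The capability rules above can be sketched as a small host-side check. This is an illustrative sketch only: the `HYP_`-prefixed names are hypothetical stand-ins for the `CL_FP_*` flags and `cl_device_fp_config` from `CL/cl.h`, redefined locally so the example compiles without the OpenCL headers. In an application the bit-field would be obtained with `clGetDeviceInfo` using `CL_DEVICE_HALF_FP_CONFIG`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for declarations in CL/cl.h. The bit positions
   follow the CL_FP_* definitions. */
typedef uint64_t hyp_device_fp_config;
#define HYP_FP_DENORM            (1u << 0)
#define HYP_FP_INF_NAN           (1u << 1)
#define HYP_FP_ROUND_TO_NEAREST  (1u << 2)
#define HYP_FP_ROUND_TO_ZERO     (1u << 3)
#define HYP_FP_ROUND_TO_INF      (1u << 4)
#define HYP_FP_FMA               (1u << 5)
#define HYP_FP_SOFT_FLOAT        (1u << 6)

enum hyp_rounding { HYP_RTE, HYP_RTZ };

/* True if the bit-field satisfies the minimum stated above:
   ROUND_TO_ZERO alone, or ROUND_TO_NEAREST together with INF_NAN. */
bool half_config_meets_minimum(hyp_device_fp_config cfg)
{
    const hyp_device_fp_config rtn_inf_nan =
        HYP_FP_ROUND_TO_NEAREST | HYP_FP_INF_NAN;
    return (cfg & HYP_FP_ROUND_TO_ZERO) != 0 ||
           (cfg & rtn_inf_nan) == rtn_inf_nan;
}

/* Default rounding mode for half operations: round to nearest even when
   it is supported, otherwise round to zero. */
enum hyp_rounding half_default_rounding(hyp_device_fp_config cfg)
{
    return (cfg & HYP_FP_ROUND_TO_NEAREST) ? HYP_RTE : HYP_RTZ;
}
```

A device that reports both rounding modes would be treated as defaulting to round to nearest even under this reading.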
Conversions to half floating point format must be correctly rounded using the indicated convert operator rounding mode or the default rounding mode for half-precision floating-point operations if no rounding mode is specified by the operator, or a C-style cast is used.
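As an illustration of what "correctly rounded" means for a conversion to half, the following sketch converts a single precision value to half precision bits using round to nearest even only. It assumes IEEE 754 binary32 for `float`; the function name `float_to_half_rte` is hypothetical and is not part of the OpenCL API. Other rounding modes are not covered.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to IEEE 754 binary16 bits, rounding to nearest even. */
uint16_t float_to_half_rte(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t mag  = x & 0x7FFFFFFFu;

    if (mag > 0x7F800000u)                    /* NaN stays a quiet NaN */
        return (uint16_t)(sign | 0x7E00u);
    if (mag >= 0x47800000u)                   /* |f| >= 65536 or infinity */
        return (uint16_t)(sign | 0x7C00u);

    int32_t  e    = (int32_t)(mag >> 23) - 127;   /* unbiased exponent */
    uint32_t mant = mag & 0x007FFFFFu;

    if (e >= -14) {                           /* result is a normal half */
        uint32_t h   = ((uint32_t)(e + 15) << 10) | (mant >> 13);
        uint32_t rem = mant & 0x1FFFu;        /* the 13 discarded bits */
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
            h++;                              /* a carry may reach infinity */
        return (uint16_t)(sign | h);
    }

    int32_t s = -e - 1;                       /* right shift to units of 2^-24 */
    if (s > 25)                               /* below half of the smallest subnormal */
        return sign;

    uint32_t sig     = mant | 0x00800000u;    /* restore the implicit leading 1 */
    uint32_t m       = sig >> s;
    uint32_t rem     = sig & ((1u << s) - 1u);
    uint32_t halfway = 1u << (s - 1);
    if (rem > halfway || (rem == halfway && (m & 1u)))
        m++;                                  /* 0x400 encodes the smallest normal */
    return (uint16_t)(sign | m);
}
```

Ties are resolved toward the value whose last significand bit is zero, so 65520 rounds up to infinity while 1 + 2^-11 rounds down to 1.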
Conversions from half to integer format shall correctly round using the indicated convert operator rounding mode, or towards zero if no rounding mode is specified by the operator or a C-style cast is used.
All conversions from half to floating point formats are exact.
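Every half value is representable in single precision, which is why the widening conversion needs no rounding. A minimal sketch of that conversion follows, assuming IEEE 754 binary32 for `float`; the name `half_to_float` is hypothetical and not an OpenCL API function.

```c
#include <stdint.h>
#include <string.h>

/* Expand IEEE 754 binary16 bits to a float. No rounding occurs because
   every half value has an exact float representation. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                       /* infinity or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                   /* signed zero */
        bits = sign;
    } else {                                  /* subnormal: normalize it */
        uint32_t e = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```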
In this section we discuss the maximum relative error defined as ulp (units in the last place).
Addition, subtraction, multiplication, and fused multiply-add operations on half types are required to be correctly rounded using the default rounding mode for half-precision floating-point operations.
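To make the ulp measure concrete, the sketch below estimates the error of a half precision result against a higher precision reference. It uses a simplified model in which the ulp is the spacing of half values in the binade containing the reference; the formal OpenCL definition treats values at binade boundaries slightly differently. The names `half_ulp` and `half_ulp_error` are hypothetical.

```c
/* Spacing between adjacent half values near `reference` (simplified). */
double half_ulp(double reference)
{
    double a     = reference < 0 ? -reference : reference;
    double ulp   = 0x1p-24;   /* spacing for subnormals and [2^-14, 2^-13) */
    double upper = 0x1p-13;   /* upper end of the range with that spacing */

    /* Each higher binade doubles the spacing; stop at the top binade. */
    while (a >= upper && upper < 0x1p16) {
        ulp   *= 2;
        upper *= 2;
    }
    return ulp;
}

/* Error of `result` relative to `reference`, in half precision ulps. */
double half_ulp_error(double result, double reference)
{
    double d = result - reference;
    if (d < 0)
        d = -d;
    return d / half_ulp(reference);
}
```

Under this model a result of 1 + 2^-10 for a reference of 1.0 is off by exactly 1 ulp.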
The following table describes the minimum accuracy of half precision floating-point arithmetic operations given as ULP values. 0 ULP is used for math functions that do not require rounding. The reference value used to compute the ULP value of an arithmetic operation is the infinitely precise result.
Function | Min Accuracy - Full Profile | Min Accuracy - Embedded Profile |
---|---|---|
x + y | Correctly rounded | Correctly rounded |
x - y | Correctly rounded | Correctly rounded |
x * y | Correctly rounded | Correctly rounded |
1.0 / x | Correctly rounded | <= 1 ulp |
x / y | Correctly rounded | <= 1 ulp |
acos | <= 2 ulp | <= 3 ulp |
acosh | <= 2 ulp | <= 3 ulp |
acospi | <= 2 ulp | <= 3 ulp |
asin | <= 2 ulp | <= 3 ulp |
asinh | <= 2 ulp | <= 3 ulp |
asinpi | <= 2 ulp | <= 3 ulp |
atan | <= 2 ulp | <= 3 ulp |
atanh | <= 2 ulp | <= 3 ulp |
atanpi | <= 2 ulp | <= 3 ulp |
atan2 | <= 2 ulp | <= 3 ulp |
atan2pi | <= 2 ulp | <= 3 ulp |
cbrt | <= 2 ulp | <= 2 ulp |
ceil | Correctly rounded | Correctly rounded |
clamp | 0 ulp | 0 ulp |
copysign | 0 ulp | 0 ulp |
cos | <= 2 ulp | <= 2 ulp |
cosh | <= 2 ulp | <= 3 ulp |
cospi | <= 2 ulp | <= 2 ulp |
cross | absolute error tolerance of 'max * max * (3 * HLF_EPSILON)' per vector component, where max is the maximum input operand magnitude | Implementation-defined |
degrees | <= 2 ulp | <= 2 ulp |
distance | <= 2n ulp, for gentype with vector width n | Implementation-defined |
dot | absolute error tolerance of 'max * max * (2n - 1) * HLF_EPSILON', for vector width n and maximum input operand magnitude max across all vector components | Implementation-defined |
erfc | <= 4 ulp | <= 4 ulp |
erf | <= 4 ulp | <= 4 ulp |
exp | <= 2 ulp | <= 3 ulp |
exp2 | <= 2 ulp | <= 3 ulp |
exp10 | <= 2 ulp | <= 3 ulp |
expm1 | <= 2 ulp | <= 3 ulp |
fabs | 0 ulp | 0 ulp |
fdim | Correctly rounded | Correctly rounded |
floor | Correctly rounded | Correctly rounded |
fma | Correctly rounded | Correctly rounded |
fmax | 0 ulp | 0 ulp |
fmin | 0 ulp | 0 ulp |
fmod | 0 ulp | 0 ulp |
fract | Correctly rounded | Correctly rounded |
frexp | 0 ulp | 0 ulp |
hypot | <= 2 ulp | <= 3 ulp |
ilogb | 0 ulp | 0 ulp |
ldexp | Correctly rounded | Correctly rounded |
length | <= 0.25 + 0.5n ulp, for gentype with vector width n | Implementation-defined |
log | <= 2 ulp | <= 3 ulp |
log2 | <= 2 ulp | <= 3 ulp |
log10 | <= 2 ulp | <= 3 ulp |
log1p | <= 2 ulp | <= 3 ulp |
logb | 0 ulp | 0 ulp |
mad | Implementation-defined | Implementation-defined |
max | 0 ulp | 0 ulp |
maxmag | 0 ulp | 0 ulp |
min | 0 ulp | 0 ulp |
minmag | 0 ulp | 0 ulp |
mix | Implementation-defined | Implementation-defined |
modf | 0 ulp | 0 ulp |
nan | 0 ulp | 0 ulp |
nextafter | 0 ulp | 0 ulp |
normalize | <= 1 + n ulp, for gentype with vector width n | Implementation-defined |
pow(x, y) | <= 4 ulp | <= 5 ulp |
pown(x, y) | <= 4 ulp | <= 5 ulp |
powr(x, y) | <= 4 ulp | <= 5 ulp |
radians | <= 2 ulp | <= 2 ulp |
remainder | 0 ulp | 0 ulp |
remquo | 0 ulp for the remainder, at least the lower 7 bits of the integral quotient | 0 ulp for the remainder, at least the lower 7 bits of the integral quotient |
rint | Correctly rounded | Correctly rounded |
rootn | <= 4 ulp | <= 5 ulp |
round | Correctly rounded | Correctly rounded |
rsqrt | <= 1 ulp | <= 1 ulp |
sign | 0 ulp | 0 ulp |
sin | <= 2 ulp | <= 2 ulp |
sincos | <= 2 ulp for sine and cosine values | <= 2 ulp for sine and cosine values |
sinh | <= 2 ulp | <= 3 ulp |
sinpi | <= 2 ulp | <= 2 ulp |
smoothstep | Implementation-defined | Implementation-defined |
sqrt | Correctly rounded | <= 1 ulp |
step | 0 ulp | 0 ulp |
tan | <= 2 ulp | <= 3 ulp |
tanh | <= 2 ulp | <= 3 ulp |
tanpi | <= 2 ulp | <= 3 ulp |
tgamma | <= 4 ulp | <= 4 ulp |
trunc | Correctly rounded | Correctly rounded |
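The geometric entries in the table are stated as formulas rather than fixed ulp counts. The sketch below evaluates those Full Profile bounds for a given vector width n and maximum operand magnitude. All function names are hypothetical, and `HYP_HLF_EPSILON` is a local stand-in for HLF_EPSILON (2^-10) expressed as a double.

```c
/* Local stand-in for HLF_EPSILON, which is 2^-10 for half precision. */
#define HYP_HLF_EPSILON 0x1p-10

/* dot: absolute tolerance max * max * (2n - 1) * HLF_EPSILON. */
double dot_abs_tolerance(int n, double max_mag)
{
    return max_mag * max_mag * (2.0 * n - 1.0) * HYP_HLF_EPSILON;
}

/* cross: absolute tolerance max * max * (3 * HLF_EPSILON) per component. */
double cross_abs_tolerance(double max_mag)
{
    return max_mag * max_mag * (3.0 * HYP_HLF_EPSILON);
}

/* length: at most 0.25 + 0.5n ulp. */
double length_ulp_bound(int n)    { return 0.25 + 0.5 * n; }

/* distance: at most 2n ulp. */
double distance_ulp_bound(int n)  { return 2.0 * n; }

/* normalize: at most 1 + n ulp. */
double normalize_ulp_bound(int n) { return 1.0 + n; }
```

For example, a dot product of two half4 vectors whose components have magnitude at most 2.0 would have an absolute tolerance of 28 / 1024 under this formula.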
Note: Implementations may perform floating-point operations on half scalar or vector data types by converting the half values to single precision floating-point values and performing the operation in single precision floating-point. In this case, the implementation will use the half scalar or vector data type as a storage only format.
Add new sub-sections to section 8.3.1. Conversion rules for normalized integer channel data types:
For images created with image channel data type of {CL_UNORM_INT8} and {CL_UNORM_INT16}, read_imageh will convert the channel values from an 8-bit or 16-bit unsigned integer to normalized half precision floating-point values in the range [0.0h, 1.0h].
For images created with image channel data type of {CL_SNORM_INT8} and {CL_SNORM_INT16}, read_imageh will convert the channel values from an 8-bit or 16-bit signed integer to normalized half precision floating-point values in the range [-1.0h, 1.0h].
These conversions are performed as follows:
{CL_UNORM_INT8} (8-bit unsigned integer) → half
- normalized half value = round_to_half(c / 255)
{CL_UNORM_INT_101010} (10-bit unsigned integer) → half
- normalized half value = round_to_half(c / 1023)
{CL_UNORM_INT16} (16-bit unsigned integer) → half
- normalized half value = round_to_half(c / 65535)
{CL_SNORM_INT8} (8-bit signed integer) → half
- normalized half value = max(-1.0h, round_to_half(c / 127))
{CL_SNORM_INT16} (16-bit signed integer) → half
- normalized half value = max(-1.0h, round_to_half(c / 32767))
The accuracy of the above conversions must be <= 1.5 ulp except for the following cases.
For {CL_UNORM_INT8}
- 0 must convert to 0.0h and
- 255 must convert to 1.0h
For {CL_UNORM_INT_101010}
- 0 must convert to 0.0h and
- 1023 must convert to 1.0h
For {CL_UNORM_INT16}
- 0 must convert to 0.0h and
- 65535 must convert to 1.0h
For {CL_SNORM_INT8}
- -128 and -127 must convert to -1.0h,
- 0 must convert to 0.0h and
- 127 must convert to 1.0h
For {CL_SNORM_INT16}
- -32768 and -32767 must convert to -1.0h,
- 0 must convert to 0.0h and
- 32767 must convert to 1.0h
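The normalization formulas and their exact endpoint cases can be modeled as follows. This sketch computes the normalized value in single precision and deliberately omits the final round_to_half step, so it illustrates the scaling and the clamp at -1.0 rather than the half precision result itself. Both function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned normalized code to [0, 1]: c / max_code, with max_code 255,
   1023 or 65535 depending on the channel data type. */
float unorm_to_normalized(uint32_t c, uint32_t max_code)
{
    return (float)c / (float)max_code;
}

/* Signed normalized code to [-1, 1]: max(-1, c / max_code), with max_code
   127 or 32767. The clamp maps the most negative code to exactly -1. */
float snorm_to_normalized(int32_t c, int32_t max_code)
{
    float v = (float)c / (float)max_code;
    return v < -1.0f ? -1.0f : v;
}
```

In this model the endpoint requirements hold by construction: the codes 0 and max_code divide exactly, and both -128 and -127 (or -32768 and -32767) yield -1.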
For images created with image channel data type of {CL_UNORM_INT8} and {CL_UNORM_INT16}, write_imageh will convert the floating-point color value to an 8-bit or 16-bit unsigned integer.
For images created with image channel data type of {CL_SNORM_INT8} and {CL_SNORM_INT16}, write_imageh will convert the floating-point color value to an 8-bit or 16-bit signed integer.
The preferred conversion uses the round to nearest even (_rte) rounding mode, but OpenCL implementations may choose to approximate the rounding mode used in the conversions described below. When approximate rounding is used instead of the preferred rounding, the result of the conversion must satisfy the bound given below.
half → {CL_UNORM_INT8} (8-bit unsigned integer)
- Let fexact = max(0, min(f * 255, 255))
- Let fpreferred = convert_uchar_sat_rte(f * 255.0f)
- Let fapprox = convert_uchar_sat_<impl-rounding-mode>(f * 255.0f)
- fabs(fexact - fapprox) must be <= 0.6
half → {CL_UNORM_INT_101010} (10-bit unsigned integer)
- Let fexact = max(0, min(f * 1023, 1023))
- Let fpreferred = min(convert_ushort_sat_rte(f * 1023.0f), 1023)
- Let fapprox = convert_ushort_sat_<impl-rounding-mode>(f * 1023.0f)
- fabs(fexact - fapprox) must be <= 0.6
half → {CL_UNORM_INT16} (16-bit unsigned integer)
- Let fexact = max(0, min(f * 65535, 65535))
- Let fpreferred = convert_ushort_sat_rte(f * 65535.0f)
- Let fapprox = convert_ushort_sat_<impl-rounding-mode>(f * 65535.0f)
- fabs(fexact - fapprox) must be <= 0.6
half → {CL_SNORM_INT8} (8-bit signed integer)
- Let fexact = max(-128, min(f * 127, 127))
- Let fpreferred = convert_char_sat_rte(f * 127.0f)
- Let fapprox = convert_char_sat_<impl-rounding-mode>(f * 127.0f)
- fabs(fexact - fapprox) must be <= 0.6
half → {CL_SNORM_INT16} (16-bit signed integer)
- Let fexact = max(-32768, min(f * 32767, 32767))
- Let fpreferred = convert_short_sat_rte(f * 32767.0f)
- Let fapprox = convert_short_sat_<impl-rounding-mode>(f * 32767.0f)
- fabs(fexact - fapprox) must be <= 0.6
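The write-side rules above share one pattern: scale, saturate, round to nearest even, and, for an approximate rounding, check the 0.6 bound. The following sketch models that pattern in plain C with the color value held in a `float`. The function names are hypothetical, and mapping NaN to 0 is an assumption based on the behavior of saturating conversions.

```c
#include <stdint.h>

/* Round to nearest integer, ties to even. Assumes |x| fits in int32_t. */
int32_t round_half_even(float x)
{
    int32_t t    = (int32_t)x;            /* truncates toward zero */
    float   frac = x - (float)t;          /* same sign as x, |frac| < 1 */
    float   mag  = frac < 0 ? -frac : frac;
    int32_t step = x < 0 ? -1 : 1;
    if (mag > 0.5f || (mag == 0.5f && (t & 1)))
        t += step;
    return t;
}

/* Models the preferred conversion, for example scale 255 with range
   [0, 255] for CL_UNORM_INT8 or scale 127 with range [-128, 127] for
   CL_SNORM_INT8. */
int32_t norm_sat_rte(float f, int32_t scale, int32_t lo, int32_t hi)
{
    if (f != f)                           /* NaN (assumed to map to 0) */
        return 0;
    float x = f * (float)scale;
    if (x <= (float)lo) return lo;        /* saturate */
    if (x >= (float)hi) return hi;
    return round_half_even(x);
}

/* Checks fabs(fexact - fapprox) <= 0.6 for a candidate result. */
int within_write_bound(float f, int32_t scale, int32_t lo, int32_t hi,
                       int32_t approx)
{
    float fexact = f * (float)scale;
    if (fexact < (float)lo) fexact = (float)lo;
    if (fexact > (float)hi) fexact = (float)hi;
    float d = fexact - (float)approx;
    if (d < 0) d = -d;
    return d <= 0.6f;
}
```

For f = 0.5 written to a CL_UNORM_INT8 channel, the scaled value is 127.5, so the preferred result is 128 while 127 also satisfies the bound. A result produced by truncating 254.745 to 254 does not.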