Half Precision Floating-Point

This section describes the cl_khr_fp16 extension. This extension adds support for half scalar and vector types as built-in types that can be used for arithmetic operations, conversions, and so on.

General information

Version history

Date        Version  Description
2020-04-21  1.0.0    First assigned version.

Additions to Chapter 6 of the OpenCL 2.0 C Specification

The lists of built-in scalar and vector data types defined in tables 6.1 and 6.2 are extended to include the following:

Type    Description
half2   A 2-component half-precision floating-point vector.
half3   A 3-component half-precision floating-point vector.
half4   A 4-component half-precision floating-point vector.
half8   An 8-component half-precision floating-point vector.
half16  A 16-component half-precision floating-point vector.

The built-in vector data types for halfn are also declared as appropriate types in the OpenCL API (and header files) that can be used by an application. The following table describes the built-in vector data types for halfn as defined in the OpenCL C programming language and the corresponding data type available to the application:

Type in OpenCL Language  API type for application
half2                    cl_half2
half3                    cl_half3
half4                    cl_half4
half8                    cl_half8
half16                   cl_half16

The relational, equality, logical and logical unary operators described in section 6.3 can be used with half scalar and halfn vector types and shall produce a scalar int and vector shortn result respectively.

The OpenCL compiler accepts an h or H suffix on floating-point literals, indicating that the literal is typed as a half.

Conversions

The implicit conversion rules specified in section 6.2.1 now include the half scalar and halfn vector data types.

The explicit casts described in section 6.2.2 are extended to take a half scalar data type and a halfn vector data type.

The explicit conversion functions described in section 6.2.3 are extended to take a half scalar data type and a halfn vector data type.

The as_typen() function for re-interpreting types as described in section 6.2.4.2 is extended to allow conversion-free casts between shortn, ushortn, and halfn scalar and vector data types.
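Because as_typen() reinterprets the 16 bits of a value without converting them, it can help to see what a given bit pattern encodes. The following is a host-side C sketch (not OpenCL C) that decodes a 16-bit pattern as an IEEE 754-2008 binary16 value. The names scale_pow2 and half_bits_to_float are hypothetical and exist only for this illustration.

```c
#include <math.h>    /* NAN, INFINITY */
#include <stdint.h>

/* Multiply v by 2^e using exact doublings or halvings. */
static float scale_pow2(float v, int e)
{
    while (e > 0) { v *= 2.0f; --e; }
    while (e < 0) { v *= 0.5f; ++e; }
    return v;
}

/* Decode a binary16 bit pattern: 1 sign bit, 5 exponent bits, 10 fraction bits. */
static float half_bits_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exp  = (h >> 10) & 0x1F;
    int frac = h & 0x3FF;
    float v;

    if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24. */
        v = scale_pow2((float)frac, -24);
    } else if (exp == 31) {
        /* All-ones exponent: infinity if frac is 0, otherwise NaN. */
        v = frac ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + frac) * 2^(exp - 15 - 10). */
        v = scale_pow2((float)(frac | 0x400), exp - 25);
    }
    return sign ? -v : v;
}
```

For instance, the pattern 0x3C00 decodes to 1.0 and 0x7BFF decodes to 65504, the largest finite half value.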

Math Functions

The built-in math functions defined in table 6.8 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4|8|16} as arguments and return values. gentype now also includes half, half2, half3, half4, half8, and half16.

For any specific use of a function, the actual type has to be the same for all arguments and the return type.

Table 1. Half Precision Built-in Math Functions
Function Description

gentype acos (gentype x)

Arc cosine function.

gentype acosh (gentype x)

Inverse hyperbolic cosine.

gentype acospi (gentype x)

Compute acos (x) / {pi}.

gentype asin (gentype x)

Arc sine function.

gentype asinh (gentype x)

Inverse hyperbolic sine.

gentype asinpi (gentype x)

Compute asin (x) / {pi}.

gentype atan (gentype y_over_x)

Arc tangent function.

gentype atan2 (gentype y, gentype x)

Arc tangent of y / x.

gentype atanh (gentype x)

Hyperbolic arc tangent.

gentype atanpi (gentype x)

Compute atan (x) / {pi}.

gentype atan2pi (gentype y, gentype x)

Compute atan2 (y, x) / {pi}.

gentype cbrt (gentype x)

Compute cube-root.

gentype ceil (gentype x)

Round to integral value using the round to positive infinity rounding mode.

gentype copysign (gentype x, gentype y)

Returns x with its sign changed to match the sign of y.

gentype cos (gentype x)

Compute cosine.

gentype cosh (gentype x)

Compute hyperbolic cosine.

gentype cospi (gentype x)

Compute cos ({pi} x).

gentype erfc (gentype x)

Complementary error function.

gentype erf (gentype x)

Error function encountered in integrating the normal distribution.

gentype exp (gentype x)

Compute the base-e exponential of x.

gentype exp2 (gentype x)

Exponential base 2 function.

gentype exp10 (gentype x)

Exponential base 10 function.

gentype expm1 (gentype x)

Compute e^x - 1.0.

gentype fabs (gentype x)

Compute absolute value of a floating-point number.

gentype fdim (gentype x, gentype y)

x - y if x > y, +0 if x is less than or equal to y.

gentype floor (gentype x)

Round to integral value using the round to negative infinity rounding mode.

gentype fma (gentype a, gentype b, gentype c)

Returns the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur. Edge case behavior is per the IEEE 754-2008 standard.

gentype fmax (gentype x, gentype y)
gentype fmax (gentype x, half y)

Returns y if x < y, otherwise it returns x. If one argument is a NaN, fmax() returns the other argument. If both arguments are NaNs, fmax() returns a NaN.

gentype fmin (gentype x, gentype y)
gentype fmin (gentype x, half y)

Returns y if y < x, otherwise it returns x. If one argument is a NaN, fmin() returns the other argument. If both arguments are NaNs, fmin() returns a NaN.

gentype fmod (gentype x, gentype y)

Modulus. Returns x - y * trunc (x/y) .

gentype fract (gentype x, {global} gentype *iptr)
gentype fract (gentype x, {local} gentype *iptr)
gentype fract (gentype x, {private} gentype *iptr)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

gentype fract (gentype x, gentype *iptr)

Returns fmin( x - floor (x), 0x1.ffcp-1f ).

floor(x) is returned in iptr.

halfn frexp (halfn x, {global} intn *exp)
half frexp (half x, {global} int *exp)

halfn frexp (halfn x, {local} intn *exp)
half frexp (half x, {local} int *exp)

halfn frexp (halfn x, {private} intn *exp)
half frexp (half x, {private} int *exp)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

halfn frexp (halfn x, intn *exp)
half frexp (half x, int *exp)

Extract mantissa and exponent from x. For each component the mantissa returned is a half with magnitude in the interval [1/2, 1) or 0. Each component of x equals mantissa returned * 2^exp.

gentype hypot (gentype x, gentype y)

Compute the value of the square root of x^2 + y^2 without undue overflow or underflow.

intn ilogb (halfn x)
int ilogb (half x)

Return the exponent as an integer value.

halfn ldexp (halfn x, intn k)
halfn ldexp (halfn x, int k)
half ldexp (half x, int k)

Multiply x by 2 to the power k.

gentype lgamma (gentype x)

halfn lgamma_r (halfn x, {global} intn *signp)
half lgamma_r (half x, {global} int *signp)

halfn lgamma_r (halfn x, {local} intn *signp)
half lgamma_r (half x, {local} int *signp)

halfn lgamma_r (halfn x, {private} intn *signp)
half lgamma_r (half x, {private} int *signp)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

halfn lgamma_r (halfn x, intn *signp)
half lgamma_r (half x, int *signp)

Log gamma function. Returns the natural logarithm of the absolute value of the gamma function. The sign of the gamma function is returned in the signp argument of lgamma_r.

gentype log (gentype x)

Compute natural logarithm.

gentype log2 (gentype x)

Compute a base 2 logarithm.

gentype log10 (gentype x)

Compute a base 10 logarithm.

gentype log1p (gentype x)

Compute log_e(1.0 + x).

gentype logb (gentype x)

Compute the exponent of x, which is the integral part of log_r |x|.

gentype mad (gentype a, gentype b, gentype c)

mad computes a * b + c. The function may compute a * b + c with reduced accuracy in the embedded profile. See the OpenCL SPIR-V Environment Specification for details. On some hardware the mad instruction may provide better performance than expanded computation of a * b + c.

Note: For some usages, e.g. mad(a, b, -a*b), the half precision definition of mad() is loose enough that almost any result is allowed from mad() for some values of a and b.

gentype maxmag (gentype x, gentype y)

Returns x if |x| > |y|, y if |y| > |x|, otherwise fmax(x, y).

gentype minmag (gentype x, gentype y)

Returns x if |x| < |y|, y if |y| < |x|, otherwise fmin(x, y).

gentype modf (gentype x, {global} gentype *iptr)
gentype modf (gentype x, {local} gentype *iptr)
gentype modf (gentype x, {private} gentype *iptr)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

gentype modf (gentype x, gentype *iptr)

Decompose a floating-point number. The modf function breaks the argument x into integral and fractional parts, each of which has the same sign as the argument. It stores the integral part in the object pointed to by iptr.

halfn nan (ushortn nancode)
half nan (ushort nancode)

Returns a quiet NaN. The nancode may be placed in the significand of the resulting NaN.

gentype nextafter (gentype x, gentype y)

Computes the next representable half-precision floating-point value following x in the direction of y. Thus, if y is less than x, nextafter() returns the largest representable floating-point number less than x.

gentype pow (gentype x, gentype y)

Compute x to the power y.

halfn pown (halfn x, intn y)
half pown (half x, int y)

Compute x to the power y, where y is an integer.

gentype powr (gentype x, gentype y)

Compute x to the power y, where x is >= 0.

gentype remainder (gentype x, gentype y)

Compute the value r such that r = x - n*y, where n is the integer nearest the exact value of x/y. If there are two integers closest to x/y, n shall be the even one. If r is zero, it is given the same sign as x.

halfn remquo (halfn x, halfn y, {global} intn *quo)
half remquo (half x, half y, {global} int *quo)

halfn remquo (halfn x, halfn y, {local} intn *quo)
half remquo (half x, half y, {local} int *quo)

halfn remquo (halfn x, halfn y, {private} intn *quo)
half remquo (half x, half y, {private} int *quo)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

halfn remquo (halfn x, halfn y, intn *quo)
half remquo (half x, half y, int *quo)

The remquo function computes the value r such that r = x - k*y, where k is the integer nearest the exact value of x/y. If there are two integers closest to x/y, k shall be the even one. If r is zero, it is given the same sign as x. This is the same value that is returned by the remainder function. remquo also calculates the lower seven bits of the integral quotient x/y, and gives that value the same sign as x/y. It stores this signed value in the object pointed to by quo.

gentype rint (gentype x)

Round to integral value (using round to nearest even rounding mode) in floating-point format. Refer to section 7.1 for description of rounding modes.

halfn rootn (halfn x, intn y)
half rootn (half x, int y)

Compute x to the power 1/y.

gentype round (gentype x)

Return the integral value nearest to x rounding halfway cases away from zero, regardless of the current rounding direction.

gentype rsqrt (gentype x)

Compute inverse square root.

gentype sin (gentype x)

Compute sine.

gentype sincos (gentype x, {global} gentype *cosval)
gentype sincos (gentype x, {local} gentype *cosval)
gentype sincos (gentype x, {private} gentype *cosval)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

gentype sincos (gentype x, gentype *cosval)

Compute sine and cosine of x. The computed sine is the return value and computed cosine is returned in cosval.

gentype sinh (gentype x)

Compute hyperbolic sine.

gentype sinpi (gentype x)

Compute sin ({pi} x).

gentype sqrt (gentype x)

Compute square root.

gentype tan (gentype x)

Compute tangent.

gentype tanh (gentype x)

Compute hyperbolic tangent.

gentype tanpi (gentype x)

Compute tan ({pi} x).

gentype tgamma (gentype x)

Compute the gamma function.

gentype trunc (gentype x)

Round to integral value using the round to zero rounding mode.
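The frexp and ldexp entries in the table above are inverses of each other: frexp splits x into a mantissa m and exponent e with x = m * 2^e, and ldexp recombines them. The following host-side C sketch models that contract in float arithmetic. The names frexp_model and ldexp_model are hypothetical, and the handling of zero, NaN and infinity (returned unchanged with a stored exponent of 0) is an assumption of this model, not a statement of the specification.

```c
#include <assert.h>
#include <stddef.h>

/* Returns m with |m| in [1/2, 1), or 0, and stores e so that x == m * 2^e. */
static float frexp_model(float x, int *exp)
{
    assert(exp != NULL);
    *exp = 0;
    /* Zero, NaN (x != x) and infinity (x - x is NaN) pass through unchanged. */
    if (x == 0.0f || x != x || x - x != 0.0f)
        return x;
    while (x >= 1.0f || x <= -1.0f) { x *= 0.5f; ++*exp; }
    while (x < 0.5f && x > -0.5f)   { x *= 2.0f; --*exp; }
    return x;
}

/* Multiplies x by 2 to the power k. */
static float ldexp_model(float x, int k)
{
    while (k > 0) { x *= 2.0f; --k; }
    while (k < 0) { x *= 0.5f; ++k; }
    return x;
}
```

With this model, 8.0 splits into a mantissa of 0.5 and an exponent of 4, and ldexp_model(0.5f, 4) gives back 8.0.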

The FP_FAST_FMA_HALF macro indicates whether the fma() family of functions is fast compared with direct code for half precision floating-point. If defined, the FP_FAST_FMA_HALF macro shall indicate that the fma() function generally executes about as fast as, or faster than, a multiply and an add of half operands.

The macro names given in the following list must use the values specified. These constant expressions are suitable for use in #if preprocessing directives.

#define HALF_DIG            3
#define HALF_MANT_DIG       11
#define HALF_MAX_10_EXP     +4
#define HALF_MAX_EXP        +16
#define HALF_MIN_10_EXP     -4
#define HALF_MIN_EXP        -13
#define HALF_RADIX          2
#define HALF_MAX            0x1.ffcp15h
#define HALF_MIN            0x1.0p-14h
#define HALF_EPSILON        0x1.0p-10h
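
The floating-point limits above follow from the integer parameters of the format. As a cross-check, the following host-side C sketch derives HALF_MAX, HALF_MIN and HALF_EPSILON from HALF_MANT_DIG, HALF_MAX_EXP and HALF_MIN_EXP. The enumerator and function names are hypothetical and are not part of any OpenCL header.

```c
/* Integer parameters copied from the macro list above. */
enum { MANT_DIG = 11, MAX_EXP = 16, MIN_EXP = -13 };

/* 2^e computed by exact repeated doubling or halving. */
static double pow2(int e)
{
    double v = 1.0;
    while (e > 0) { v *= 2.0; --e; }
    while (e < 0) { v *= 0.5; ++e; }
    return v;
}

/* Largest finite value: (2 - 2^(1 - MANT_DIG)) * 2^(MAX_EXP - 1). */
static double half_max_value(void)
{
    return (2.0 - pow2(1 - MANT_DIG)) * pow2(MAX_EXP - 1);
}

/* Smallest positive normalized value: 2^(MIN_EXP - 1). */
static double half_min_value(void)
{
    return pow2(MIN_EXP - 1);
}

/* Gap between 1.0 and the next representable value: 2^(1 - MANT_DIG). */
static double half_epsilon_value(void)
{
    return pow2(1 - MANT_DIG);
}
```

These evaluate to 65504, 2^-14 and 2^-10 respectively, matching the hexadecimal literals 0x1.ffcp15, 0x1.0p-14 and 0x1.0p-10.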

The following table describes the built-in macro names given above in the OpenCL C programming language and the corresponding macro names available to the application.

Macro in OpenCL Language  Macro for application
HALF_DIG                  {CL_HALF_DIG}
HALF_MANT_DIG             {CL_HALF_MANT_DIG}
HALF_MAX_10_EXP           {CL_HALF_MAX_10_EXP}
HALF_MAX_EXP              {CL_HALF_MAX_EXP}
HALF_MIN_10_EXP           {CL_HALF_MIN_10_EXP}
HALF_MIN_EXP              {CL_HALF_MIN_EXP}
HALF_RADIX                {CL_HALF_RADIX}
HALF_MAX                  {CL_HALF_MAX}
HALF_MIN                  {CL_HALF_MIN}
HALF_EPSILON              {CL_HALF_EPSILON}

The following constants are also available. They are of type half and are accurate within the precision of the half type.

Constant      Description
M_E_H         Value of e
M_LOG2E_H     Value of log_2(e)
M_LOG10E_H    Value of log_10(e)
M_LN2_H       Value of log_e(2)
M_LN10_H      Value of log_e(10)
M_PI_H        Value of {pi}
M_PI_2_H      Value of {pi} / 2
M_PI_4_H      Value of {pi} / 4
M_1_PI_H      Value of 1 / {pi}
M_2_PI_H      Value of 2 / {pi}
M_2_SQRTPI_H  Value of 2 / {sqrt}{pi}
M_SQRT2_H     Value of {sqrt}2
M_SQRT1_2_H   Value of 1 / {sqrt}2

Common Functions

The built-in common functions defined in table 6.12 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4|8|16} as arguments and return values. gentype now also includes half, half2, half3, half4, half8 and half16. These are described below.

Table 2. Half Precision Built-in Common Functions
Function Description

gentype clamp (
gentype x, gentype minval, gentype maxval)

gentype clamp (
gentype x, half minval, half maxval)

Returns fmin(fmax(x, minval), maxval).

Results are undefined if minval > maxval.

gentype degrees (gentype radians)

Converts radians to degrees,
i.e. (180 / {pi}) * radians.

gentype max (gentype x, gentype y)
gentype max (gentype x, half y)

Returns y if x < y, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined.

gentype min (gentype x, gentype y)
gentype min (gentype x, half y)

Returns y if y < x, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined.

gentype mix (gentype x, gentype y, gentype a)
gentype mix (gentype x, gentype y, half a)

Returns the linear blend of x and y implemented as:

x + (y - x) * a

a must be a value in the range 0.0 …​ 1.0. If a is not in the range 0.0 …​ 1.0, the return values are undefined.

Note: The half precision mix function can be implemented using contractions such as mad or fma.

gentype radians (gentype degrees)

Converts degrees to radians, i.e. ({pi} / 180) * degrees.

gentype step (gentype edge, gentype x)
gentype step (half edge, gentype x)

Returns 0.0 if x < edge, otherwise it returns 1.0.

gentype smoothstep (
gentype edge0, gentype edge1, gentype x)

gentype smoothstep (
half edge0, half edge1, gentype x)

Returns 0.0 if x <= edge0 and 1.0 if x >= edge1 and performs smooth Hermite interpolation between 0 and 1 when edge0 < x < edge1. This is useful in cases where you would want a threshold function with a smooth transition.

This is equivalent to:

gentype t;
t = clamp ((x - edge0) / (edge1 - edge0), 0, 1);
return t * t * (3 - 2 * t);

Results are undefined if edge0 >= edge1.

Note: The half precision smoothstep function can be implemented using contractions such as mad or fma.

gentype sign (gentype x)

Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN.
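
The mix and smoothstep formulas in the table above can be checked with a few concrete values. The following host-side C sketch evaluates them in float arithmetic. The function names are hypothetical, and the clamp helper uses plain comparisons, which agrees with fmin(fmax(x, minval), maxval) only for non-NaN inputs with minval <= maxval.

```c
/* Clamp x to [lo, hi]; assumes lo <= hi and no NaN inputs. */
static float clamp_model(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Linear blend of x and y: x + (y - x) * a, with a expected in [0, 1]. */
static float mix_model(float x, float y, float a)
{
    return x + (y - x) * a;
}

/* Smooth Hermite interpolation between 0 and 1; requires edge0 < edge1. */
static float smoothstep_model(float edge0, float edge1, float x)
{
    float t = clamp_model((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}
```

With edges 0 and 1, smoothstep_model returns 0.15625 at x = 0.25 and 0.5 at x = 0.5, and it saturates at 0 and 1 outside the edges. An implementation that contracts the arithmetic with mad or fma may round differently from this uncontracted form.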

Geometric Functions

The built-in geometric functions defined in table 6.13 (also listed below) are extended to include appropriate versions of functions that take half and half{2|3|4} as arguments and return values. gentype now also includes half, half2, half3 and half4. These are described below.

Note: The half precision geometric functions can be implemented using contractions such as mad or fma.

Table 3. Half Precision Built-in Geometric Functions
Function Description

half4 cross (half4 p0, half4 p1)
half3 cross (half3 p0, half3 p1)

Returns the cross product of p0.xyz and p1.xyz. The w component of the result will be 0.0.

half dot (gentype p0, gentype p1)

Compute the dot product of p0 and p1.

half distance (gentype p0, gentype p1)

Returns the distance between p0 and p1. This is calculated as length(p0 - p1).

half length (gentype p)

Returns the length of vector p, i.e.,
sqrt( p.x^2 + p.y^2 + … )

gentype normalize (gentype p)

Returns a vector in the same direction as p but with a length of 1.
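
As an illustration of the cross and dot entries in the table above, the following host-side C sketch models the 4-component forms in float arithmetic. The struct vec4 and the function names are hypothetical. Note how the w components of the inputs are ignored by cross and the w component of the result is set to 0.0.

```c
typedef struct { float x, y, z, w; } vec4;

/* Cross product of p0.xyz and p1.xyz; the w component of the result is 0.0. */
static vec4 cross_model(vec4 p0, vec4 p1)
{
    vec4 r;
    r.x = p0.y * p1.z - p0.z * p1.y;
    r.y = p0.z * p1.x - p0.x * p1.z;
    r.z = p0.x * p1.y - p0.y * p1.x;
    r.w = 0.0f;
    return r;
}

/* Dot product over all four components. */
static float dot_model(vec4 p0, vec4 p1)
{
    return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z + p0.w * p1.w;
}
```

For example, the cross product of the x and y unit vectors is the z unit vector, regardless of what the inputs carry in w.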

Relational Functions

The scalar and vector relational functions described in table 6.14 are extended to include versions that take half, half2, half3, half4, half8 and half16 as arguments.

The relational and equality operators (<, <=, >, >=, !=, ==) can be used with halfn vector types and shall produce a vector shortn result as described in section 6.3.

The functions isequal, isnotequal, isgreater, isgreaterequal, isless, islessequal, islessgreater, isfinite, isinf, isnan, isnormal, isordered, isunordered and signbit shall return a 0 if the specified relation is false and a 1 if the specified relation is true for scalar argument types. These functions shall return a 0 if the specified relation is false and a -1 (i.e. all bits set) if the specified relation is true for vector argument types.

The relational functions isequal, isgreater, isgreaterequal, isless, islessequal, and islessgreater always return 0 if either argument is not a number (NaN). isnotequal returns 1 if one or both arguments are not a number (NaN) and the argument type is a scalar and returns -1 if one or both arguments are not a number (NaN) and the argument type is a vector.
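
The difference between the scalar and vector return conventions, and the treatment of NaN, can be modelled as follows. This is a host-side C sketch in float arithmetic with hypothetical function names; each "component" function stands for one lane of a shortn result.

```c
#include <stdint.h>

/* Scalar form: 1 if the relation holds, 0 otherwise. */
static int isless_scalar(float x, float y)
{
    return x < y ? 1 : 0;
}

/* One lane of the vector form: -1 (all bits set) if the relation holds, 0 otherwise. */
static int16_t isless_component(float x, float y)
{
    return (int16_t)(x < y ? -1 : 0);
}

/* isnotequal is the one comparison that is true when an argument is a NaN. */
static int isnotequal_scalar(float x, float y)
{
    return x != y ? 1 : 0;
}

static int16_t isnotequal_component(float x, float y)
{
    return (int16_t)(x != y ? -1 : 0);
}
```

A true vector lane, viewed as an unsigned 16-bit value, is 0xFFFF, which is what makes the result directly usable as a mask for select.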

The functions described in table 6.14 are extended to include the halfn vector types.

Table 4. Half Precision Relational Functions
Function Description

int isequal (half x, half y)
shortn isequal (halfn x, halfn y)

Returns the component-wise compare of x == y.

int isnotequal (half x, half y)
shortn isnotequal (halfn x, halfn y)

Returns the component-wise compare of x != y.

int isgreater (half x, half y)
shortn isgreater (halfn x, halfn y)

Returns the component-wise compare of x > y.

int isgreaterequal (half x, half y)
shortn isgreaterequal (halfn x, halfn y)

Returns the component-wise compare of x >= y.

int isless (half x, half y)
shortn isless (halfn x, halfn y)

Returns the component-wise compare of x < y.

int islessequal (half x, half y)
shortn islessequal (halfn x, halfn y)

Returns the component-wise compare of x <= y.

int islessgreater (half x, half y)
shortn islessgreater (halfn x, halfn y)

Returns the component-wise compare of (x < y) || (x > y) .

int isfinite (half)
shortn isfinite (halfn)

Test for finite value.

int isinf (half)
shortn isinf (halfn)

Test for infinity value (positive or negative) .

int isnan (half)
shortn isnan (halfn)

Test for a NaN.

int isnormal (half)
shortn isnormal (halfn)

Test for a normal value.

int isordered (half x, half y)
shortn isordered (halfn x, halfn y)

Test if arguments are ordered. isordered() takes arguments x and y, and returns the result isequal(x, x) && isequal(y, y).

int isunordered (half x, half y)
shortn isunordered (halfn x, halfn y)

Test if arguments are unordered. isunordered() takes arguments x and y, returning non-zero if x or y is a NaN, and zero otherwise.

int signbit (half)
shortn signbit (halfn)

Test for sign bit. The scalar version of the function returns 1 if the sign bit in the half is set; otherwise it returns 0. The vector version of the function returns the following for each component in halfn: -1 (i.e. all bits set) if the sign bit in the half is set; otherwise 0.

halfn bitselect (halfn a, halfn b, halfn c)

Each bit of the result is the corresponding bit of a if the corresponding bit of c is 0. Otherwise it is the corresponding bit of b.

halfn select (halfn a, halfn b, shortn c)
halfn select (halfn a, halfn b, ushortn c)

For each component,
result[i] = (MSB of c[i] is set) ? b[i] : a[i].
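
Both bitselect and select operate on the bits of each component, so they can be modelled on 16-bit patterns. The following host-side C sketch shows one lane of each; the function names are hypothetical.

```c
#include <stdint.h>

/* Each result bit comes from a where the bit of c is 0, and from b where it is 1. */
static uint16_t bitselect_bits(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)((a & ~c) | (b & c));
}

/* One lane of the vector select: b if the most significant bit of c is set, else a. */
static uint16_t select_bits(uint16_t a, uint16_t b, uint16_t c)
{
    return (c & 0x8000u) ? b : a;
}
```

As a usage example, bitselect_bits(0x3C00, 0xC000, 0x8000) takes the magnitude bits of 1.0 (0x3C00) and the sign bit of -2.0 (0xC000), giving 0xBC00, the binary16 pattern for -1.0.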

Vector Data Load and Store Functions

The vector data load (vloadn) and store (vstoren) functions described in table 6.13 (also listed below) are extended to include versions that read or write half vector values. The generic type gentype is extended to include half. The generic type gentypen is extended to include half2, half3, half4, half8, and half16.

Note: vload3 reads x, y, z components from address (p + (offset * 3)) into a 3-component vector and vstore3 writes x, y, z components from a 3-component vector to address (p + (offset * 3)).

Table 5. Half Precision Vector Data Load and Store Functions
Function Description

gentypen vloadn(size_t offset, const {global} gentype *p)
gentypen vloadn(size_t offset, const {local} gentype *p)
gentypen vloadn(size_t offset, const {constant} gentype *p)
gentypen vloadn(size_t offset, const {private} gentype *p)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

gentypen vloadn(size_t offset, const gentype *p)

Return sizeof (gentypen) bytes of data read from address (p + (offset * n)). If gentype is half, the read address computed as (p + (offset * n)) must be 16-bit aligned.

void vstoren(gentypen data, size_t offset, {global} gentype *p)
void vstoren(gentypen data, size_t offset, {local} gentype *p)
void vstoren(gentypen data, size_t offset, {private} gentype *p)

For OpenCL C 2.0 or with the __opencl_c_generic_address_space feature macro:

void vstoren(gentypen data, size_t offset, gentype *p)

Write sizeof (gentypen) bytes given by data to address (p + (offset * n)). If gentype is half, the write address computed as (p + (offset * n)) must be 16-bit aligned.
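
The address arithmetic for the 3-component case can be sketched on the host as follows. This C model treats each half as an opaque 16-bit pattern; the struct half3_bits and the function names are hypothetical. The point to note is that consecutive offsets are 3 elements apart, not 4, so 3-component vectors are tightly packed in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t x, y, z; } half3_bits;

/* Read x, y, z from address (p + (offset * 3)). */
static half3_bits vload3_bits(size_t offset, const uint16_t *p)
{
    assert(p != NULL);
    const uint16_t *q = p + offset * 3;
    half3_bits v = { q[0], q[1], q[2] };
    return v;
}

/* Write x, y, z to address (p + (offset * 3)). */
static void vstore3_bits(half3_bits data, size_t offset, uint16_t *p)
{
    assert(p != NULL);
    uint16_t *q = p + offset * 3;
    q[0] = data.x;
    q[1] = data.y;
    q[2] = data.z;
}
```

Loading at offset 1 from a buffer holding the patterns 1 through 6 therefore yields the components 4, 5, 6.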

Async Copies from Global to Local Memory, Local to Global Memory, and Prefetch

The OpenCL C programming language implements the following functions that provide asynchronous copies between global and local memory and a prefetch from global memory.

The generic type gentype is extended to include half, half2, half3, half4, half8, and half16.

Table 6. Half Precision Built-in Async Copy and Prefetch Functions
Function Description

event_t async_work_group_copy (
{local} gentype *dst,
const {global} gentype *src,
size_t num_gentypes, event_t event)

event_t async_work_group_copy (
{global} gentype *dst,
const {local} gentype *src,
size_t num_gentypes, event_t event)

Perform an async copy of num_gentypes gentype elements from src to dst. The async copy is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined.

Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

If the event argument is not zero, the event object supplied in the event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

event_t async_work_group_strided_copy (
{local} gentype *dst,
const {global} gentype *src,
size_t num_gentypes,
size_t src_stride, event_t event)

event_t async_work_group_strided_copy (
{global} gentype *dst,
const {local} gentype *src,
size_t num_gentypes,
size_t dst_stride, event_t event)

Perform an async gather of num_gentypes gentype elements from src to dst. The src_stride is the stride in elements for each gentype element read from src. The dst_stride is the stride in elements for each gentype element written to dst. The async gather is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined.

Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

If the event argument is not zero, the event object supplied in the event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy.

void wait_group_events (
int num_events, event_t *event_list)

Wait for events that identify the async_work_group_copy operations to complete. The event objects specified in event_list will be released after the wait is performed.

This function must be encountered by all work-items in a work-group executing the kernel with the same num_events and event objects specified in event_list; otherwise the results are undefined.

void prefetch (
const {global} gentype *p, size_t num_gentypes)

Prefetch num_gentypes * sizeof(gentype) bytes into the global cache. The prefetch instruction is applied to a work-item in a work-group and does not affect the functional behavior of the kernel.
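
The element indexing of the two async_work_group_strided_copy forms in the table above can be modelled sequentially. The following host-side C sketch ignores events, address spaces and asynchrony and shows only which elements are read and written. The function names are hypothetical, and the elements are treated as opaque 16-bit patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Global-to-local form: element i of dst is read from src at i * src_stride. */
static void strided_gather_model(uint16_t *dst, const uint16_t *src,
                                 size_t num_elements, size_t src_stride)
{
    assert(src_stride != 0); /* a stride of 0 is undefined behavior in the real function */
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[i * src_stride];
}

/* Local-to-global form: element i of src is written to dst at i * dst_stride. */
static void strided_scatter_model(uint16_t *dst, const uint16_t *src,
                                  size_t num_elements, size_t dst_stride)
{
    assert(dst_stride != 0);
    for (size_t i = 0; i < num_elements; ++i)
        dst[i * dst_stride] = src[i];
}
```

Gathering 3 elements with a stride of 2 from a source holding 0 through 5 produces 0, 2, 4. The strided buffer must hold at least (num_elements - 1) * stride + 1 elements.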

Image Read and Write Functions

The image read and write functions defined in tables 6.23, 6.24 and 6.25 are extended to support image color values that are a half type.

Built-in Image Read Functions

Table 7. Half Precision Built-in Image Read Functions
Function Description

half4 read_imageh (
read_only image2d_t image,
sampler_t sampler,
int2 coord)

half4 read_imageh (
read_only image2d_t image,
sampler_t sampler,
float2 coord)

Use the coordinate (coord.x, coord.y) to do an element lookup in the 2D image object specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

The read_imageh calls that take integer coordinates must use a sampler with filter mode set to CLK_FILTER_NEAREST, normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
read_only image3d_t image,
sampler_t sampler,
int4 coord )

half4 read_imageh (
read_only image3d_t image,
sampler_t sampler,
float4 coord)

Use the coordinate (coord.x, coord.y, coord.z) to do an element lookup in the 3D image object specified by image. coord.w is ignored.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

The read_imageh calls that take integer coordinates must use a sampler with filter mode set to CLK_FILTER_NEAREST, normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
read_only image2d_array_t image,
sampler_t sampler,
int4 coord)

half4 read_imageh (
read_only image2d_array_t image,
sampler_t sampler,
float4 coord)

Use coord.xy to do an element lookup in the 2D image identified by coord.z in the 2D image array specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

The read_imageh calls that take integer coordinates must use a sampler with filter mode set to CLK_FILTER_NEAREST, normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
read_only image1d_t image,
sampler_t sampler,
int coord)

half4 read_imageh (
read_only image1d_t image,
sampler_t sampler,
float coord)

Use coord to do an element lookup in the 1D image object specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

The read_imageh calls that take integer coordinates must use a sampler with filter mode set to CLK_FILTER_NEAREST, normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
read_only image1d_array_t image,
sampler_t sampler,
int2 coord)

half4 read_imageh (
read_only image1d_array_t image,
sampler_t sampler,
float2 coord)

Use coord.x to do an element lookup in the 1D image identified by coord.y in the 1D image array specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats, {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

The read_imageh calls that take integer coordinates must use a sampler with filter mode set to CLK_FILTER_NEAREST, normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

Built-in Image Sampler-less Read Functions

aQual in Table 6.24 refers to one of the access qualifiers. For sampler-less read functions this may be read_only or read_write.

Table 8. Half Precision Built-in Image Sampler-less Read Functions
Function Description

half4 read_imageh (
aQual image2d_t image,
int2 coord)

Use the coordinate (coord.x, coord.y) to do an element lookup in the 2D image object specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats or {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
aQual image3d_t image,
int4 coord)

Use the coordinate (coord.x, coord.y, coord.z) to do an element lookup in the 3D image object specified by image. coord.w is ignored.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats or {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
aQual image2d_array_t image,
int4 coord)

Use coord.xy to do an element lookup in the 2D image identified by coord.z in the 2D image array specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats or {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
aQual image1d_t image,
int coord)

half4 read_imageh (
aQual image1d_buffer_t image,
int coord)

Use coord to do an element lookup in the 1D image or 1D image buffer object specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats or {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

half4 read_imageh (
aQual image1d_array_t image,
int2 coord)

Use coord.x to do an element lookup in the 1D image identified by coord.y in the 1D image array specified by image.

read_imageh returns half precision floating-point values in the range [0.0 …​ 1.0] for image objects created with image_channel_data_type set to one of the pre-defined packed formats or {CL_UNORM_INT8}, or {CL_UNORM_INT16}.

read_imageh returns half precision floating-point values in the range [-1.0 …​ 1.0] for image objects created with image_channel_data_type set to {CL_SNORM_INT8}, or {CL_SNORM_INT16}.

read_imageh returns half precision floating-point values for image objects created with image_channel_data_type set to {CL_HALF_FLOAT}.

Values returned by read_imageh for image objects with image_channel_data_type values not specified in the description above are undefined.

Built-in Image Write Functions

aQual in Table 6.25 refers to one of the access qualifiers. For write functions this may be write_only or read_write.

Table 9. Half Precision Built-in Image Write Functions
Function Description

void write_imageh (
aQual image2d_t image,
int2 coord,
half4 color)

Write color value to location specified by coord.xy in the 2D image specified by image.

Appropriate data format conversion to the specified image format is done before writing the color value. coord.x and coord.y are considered to be unnormalized coordinates and must be in the range 0 … width - 1 and 0 … height - 1, respectively.

write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}.

The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y) coordinate values that are not in the range (0 …​ width - 1, 0 …​ height - 1) respectively, is undefined.

void write_imageh (
aQual image2d_array_t image,
int4 coord,
half4 color)

Write color value to location specified by coord.xy in the 2D image identified by coord.z in the 2D image array specified by image.

Appropriate data format conversion to the specified image format is done before writing the color value. coord.x, coord.y and coord.z are considered to be unnormalized coordinates and must be in the range 0 …​ image width - 1, 0 …​ image height - 1 and 0 …​ image number of layers - 1.

write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}.

The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y, z) coordinate values that are not in the range (0 …​ image width - 1, 0 …​ image height - 1, 0 …​ image number of layers - 1), respectively, is undefined.

void write_imageh (
aQual image1d_t image,
int coord,
half4 color)

void write_imageh (
aQual image1d_buffer_t image,
int coord,
half4 color)

Write color value to location specified by coord in the 1D image or 1D image buffer object specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord is considered to be an unnormalized coordinate and must be in the range 0 … image width - 1.

write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. Appropriate data format conversion will be done to convert channel data from a floating-point value to the actual data format in which the channels are stored.

The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with coordinate values that are not in the range (0 … image width - 1), is undefined.

void write_imageh (
aQual image1d_array_t image,
int2 coord,
half4 color)

Write color value to location specified by coord.x in the 1D image identified by coord.y in the 1D image array specified by image. Appropriate data format conversion to the specified image format is done before writing the color value. coord.x and coord.y are considered to be unnormalized coordinates and must be in the range 0 …​ image width - 1 and 0 …​ image number of layers - 1.

write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}. Appropriate data format conversion will be done to convert channel data from a floating-point value to the actual data format in which the channels are stored.

The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y) coordinate values that are not in the range (0 …​ image width - 1, 0 …​ image number of layers - 1), respectively, is undefined.

void write_imageh (
aQual image3d_t image,
int4 coord,
half4 color)

Write color value to location specified by coord.xyz in the 3D image object specified by image.

Appropriate data format conversion to the specified image format is done before writing the color value. coord.x, coord.y and coord.z are considered to be unnormalized coordinates and must be in the range 0 …​ image width - 1, 0 …​ image height - 1 and 0 …​ image depth - 1.

write_imageh can only be used with image objects created with image_channel_data_type set to one of the pre-defined packed formats or set to {CL_SNORM_INT8}, {CL_UNORM_INT8}, {CL_SNORM_INT16}, {CL_UNORM_INT16} or {CL_HALF_FLOAT}.

The behavior of write_imageh for image objects created with image_channel_data_type values not specified in the description above or with (x, y, z) coordinate values that are not in the range (0 …​ image width - 1, 0 …​ image height - 1, 0 …​ image depth - 1), respectively, is undefined.

Note: This built-in function is only available if the cl_khr_3d_image_writes extension is also supported by the device.
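Because write_imageh is undefined for out-of-range coordinates, callers typically guard each write. The following is a minimal C sketch of the range conditions stated in the table above. The image_extent struct and the write_coord_in_range function are hypothetical names used for illustration and are not part of OpenCL.

```c
#include <stdbool.h>

/* Hypothetical description of an image's extent; not an OpenCL type. */
typedef struct {
    int width;   /* image width */
    int height;  /* image height (1 for 1D images) */
    int depth;   /* image depth or number of layers (1 if unused) */
} image_extent;

/* True if 0 <= x < width, 0 <= y < height and 0 <= z < depth, that is,
   if the coordinate lies in the range for which a write is defined. */
bool write_coord_in_range(image_extent e, int x, int y, int z)
{
    return x >= 0 && x < e.width &&
           y >= 0 && y < e.height &&
           z >= 0 && z < e.depth;
}
```

Inside a kernel the same comparison would be made against the values returned by the image query built-ins before calling write_imageh.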

IEEE754 Compliance

The following table entry describes the additions to table 4.3, which allows applications to query the configuration information using {clGetDeviceInfo} for an OpenCL device that supports half precision floating-point.

Op-code Return Type Description

{CL_DEVICE_HALF_FP_CONFIG}

cl_device_fp_config

Describes half precision floating-point capability of the OpenCL device. This is a bit-field that describes one or more of the following values:

{CL_FP_DENORM} — denorms are supported

{CL_FP_INF_NAN} — INF and NaNs are supported

{CL_FP_ROUND_TO_NEAREST} — round to nearest even rounding mode supported

{CL_FP_ROUND_TO_ZERO} — round to zero rounding mode supported

{CL_FP_ROUND_TO_INF} — round to positive and negative infinity rounding modes supported

{CL_FP_FMA} — IEEE754-2008 fused multiply-add is supported

{CL_FP_SOFT_FLOAT} — Basic floating-point operations (such as addition, subtraction, multiplication) are implemented in software.

The required minimum half precision floating-point capability as implemented by this extension is:

{CL_FP_ROUND_TO_ZERO}, or {CL_FP_ROUND_TO_NEAREST} | {CL_FP_INF_NAN}.
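As an illustration of how an application might test the returned bit-field against this minimum, here is a small C sketch. It assumes the bit values of the CL_FP_* flags as they appear in the OpenCL cl.h header and redefines them under hypothetical SK_ names so that the block builds without an OpenCL SDK. The type fp_config and the function half_config_meets_minimum are likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the OpenCL cl.h header so the sketch builds on its
   own. Real host code would include <CL/cl.h> and use the CL_FP_* names. */
typedef uint64_t fp_config;              /* stands in for cl_device_fp_config */
#define SK_FP_DENORM           (1u << 0) /* CL_FP_DENORM */
#define SK_FP_INF_NAN          (1u << 1) /* CL_FP_INF_NAN */
#define SK_FP_ROUND_TO_NEAREST (1u << 2) /* CL_FP_ROUND_TO_NEAREST */
#define SK_FP_ROUND_TO_ZERO    (1u << 3) /* CL_FP_ROUND_TO_ZERO */
#define SK_FP_ROUND_TO_INF     (1u << 4) /* CL_FP_ROUND_TO_INF */
#define SK_FP_FMA              (1u << 5) /* CL_FP_FMA */
#define SK_FP_SOFT_FLOAT       (1u << 6) /* CL_FP_SOFT_FLOAT */

/* True if cfg contains round-to-zero, or both round-to-nearest and INF/NaN
   support, which is the minimum capability stated above. */
bool half_config_meets_minimum(fp_config cfg)
{
    const fp_config rte_and_inf_nan = SK_FP_ROUND_TO_NEAREST | SK_FP_INF_NAN;
    return (cfg & SK_FP_ROUND_TO_ZERO) != 0 ||
           (cfg & rte_and_inf_nan) == rte_and_inf_nan;
}
```

In host code, cfg would be the value written by {clGetDeviceInfo} when queried with {CL_DEVICE_HALF_FP_CONFIG}.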

Rounding Modes

If {CL_FP_ROUND_TO_NEAREST} is supported, the default rounding mode for half-precision floating-point operations will be round to nearest even; otherwise the default rounding mode will be round to zero.

Conversions to half floating-point format must be correctly rounded using the rounding mode indicated by the convert operator, or using the default rounding mode for half-precision floating-point operations if the operator specifies no rounding mode or a C-style cast is used.

Conversions from half to integer format shall correctly round using the indicated convert operator rounding mode, or towards zero if no rounding mode is specified by the operator or a C-style cast is used. All conversions from half to floating point formats are exact.
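To make the difference between the two possible default rounding modes concrete, the following C sketch rounds a real value to the nearest half-representable value under round to nearest even or round toward zero. It is an emulation for illustration only, not how an implementation is required to work. The names half_rounding, HALF_RTE, HALF_RTZ and round_to_half are hypothetical, and the sketch assumes a finite input whose magnitude does not exceed the largest half value (65504).

```c
/* Rounding modes covered by this sketch (hypothetical names). */
typedef enum { HALF_RTE, HALF_RTZ } half_rounding;

/* Round x to a value representable in half precision (11 significant bits,
   minimum normal exponent -14). Overflow and NaN are not handled. */
double round_to_half(double x, half_rounding mode)
{
    double a = x < 0.0 ? -x : x;
    double p = 1.0;  /* power of two with p <= a < 2p, floored at 2^-14 */
    int e = 0;
    while (a < p && e > -14) { p *= 0.5; e--; }
    while (a >= 2.0 * p && e < 15) { p *= 2.0; e++; }

    double spacing = p / 1024.0;   /* gap between adjacent half values */
    double q = a / spacing;        /* exact: division by a power of two */
    long n = (long)q;              /* truncation, i.e. round toward zero */
    double frac = q - (double)n;
    if (mode == HALF_RTE && (frac > 0.5 || (frac == 0.5 && (n & 1))))
        n++;                       /* round up; ties go to the even neighbour */
    double r = (double)n * spacing;
    return x < 0.0 ? -r : r;
}
```

For example, 1 + 3/4096 lies three quarters of the way between the half values 1.0 and 1 + 1/1024, so HALF_RTE yields 1 + 1/1024 while HALF_RTZ yields 1.0. For the half to integer direction, a plain C cast such as (int)h already truncates toward zero, which matches the default described above.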

Relative Error as ULPs

In this section we discuss the maximum relative error defined as ulp (units in the last place).

Addition, subtraction, multiplication, fused multiply-add operations on half types are required to be correctly rounded using the default rounding mode for half-precision floating-point operations.

The following table describes the minimum accuracy of half precision floating-point arithmetic operations given as ULP values. 0 ULP is used for math functions that do not require rounding. The reference value used to compute the ULP value of an arithmetic operation is the infinitely precise result.
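The sketch below shows one way to measure an error in half-precision ULPs against a reference value held in double precision. It is a simplified model: it takes the ulp to be the spacing of half values at the magnitude of the reference, clamps that spacing at the subnormal range, and ignores overflow and the special handling of exact powers of two. The function names half_ulp and half_ulp_error are hypothetical.

```c
/* Spacing of adjacent half-precision values at the magnitude of x.
   Half precision has 11 significant bits and a minimum normal exponent
   of -14, so the smallest spacing is 2^-24. */
double half_ulp(double x)
{
    double a = x < 0.0 ? -x : x;
    double p = 1.0;  /* power of two with p <= a < 2p, within half's range */
    int e = 0;
    while (a < p && e > -14) { p *= 0.5; e--; }
    while (a >= 2.0 * p && e < 15) { p *= 2.0; e++; }
    return p / 1024.0;
}

/* Error of result relative to reference, expressed in half ulps. */
double half_ulp_error(double result, double reference)
{
    double diff = result - reference;
    if (diff < 0.0) diff = -diff;
    return diff / half_ulp(reference);
}
```

A result passes a "<= 2 ulp" requirement in this model when half_ulp_error returns a value no greater than 2.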

Table 10. ULP Values for Half Precision Floating-Point Arithmetic Operations
Function Min Accuracy - Full Profile Min Accuracy - Embedded Profile

x + y

Correctly rounded

Correctly rounded

x - y

Correctly rounded

Correctly rounded

x * y

Correctly rounded

Correctly rounded

1.0 / x

Correctly rounded

<= 1 ulp

x / y

Correctly rounded

<= 1 ulp

acos

<= 2 ulp

<= 3 ulp

acosh

<= 2 ulp

<= 3 ulp

acospi

<= 2 ulp

<= 3 ulp

asin

<= 2 ulp

<= 3 ulp

asinh

<= 2 ulp

<= 3 ulp

asinpi

<= 2 ulp

<= 3 ulp

atan

<= 2 ulp

<= 3 ulp

atanh

<= 2 ulp

<= 3 ulp

atanpi

<= 2 ulp

<= 3 ulp

atan2

<= 2 ulp

<= 3 ulp

atan2pi

<= 2 ulp

<= 3 ulp

cbrt

<= 2 ulp

<= 2 ulp

ceil

Correctly rounded

Correctly rounded

clamp

0 ulp

0 ulp

copysign

0 ulp

0 ulp

cos

<= 2 ulp

<= 2 ulp

cosh

<= 2 ulp

<= 3 ulp

cospi

<= 2 ulp

<= 2 ulp

cross

absolute error tolerance of 'max * max * (3 * HALF_EPSILON)' per vector component, where max is the maximum input operand magnitude

Implementation-defined

degrees

<= 2 ulp

<= 2 ulp

distance

<= 2n ulp, for gentype with vector width n

Implementation-defined

dot

absolute error tolerance of 'max * max * (2n - 1) * HALF_EPSILON', for vector width n and maximum input operand magnitude max across all vector components

Implementation-defined

erfc

<= 4 ulp

<= 4 ulp

erf

<= 4 ulp

<= 4 ulp

exp

<= 2 ulp

<= 3 ulp

exp2

<= 2 ulp

<= 3 ulp

exp10

<= 2 ulp

<= 3 ulp

expm1

<= 2 ulp

<= 3 ulp

fabs

0 ulp

0 ulp

fdim

Correctly rounded

Correctly rounded

floor

Correctly rounded

Correctly rounded

fma

Correctly rounded

Correctly rounded

fmax

0 ulp

0 ulp

fmin

0 ulp

0 ulp

fmod

0 ulp

0 ulp

fract

Correctly rounded

Correctly rounded

frexp

0 ulp

0 ulp

hypot

<= 2 ulp

<= 3 ulp

ilogb

0 ulp

0 ulp

ldexp

Correctly rounded

Correctly rounded

length

<= 0.25 + 0.5n ulp, for gentype with vector width n

Implementation-defined

log

<= 2 ulp

<= 3 ulp

log2

<= 2 ulp

<= 3 ulp

log10

<= 2 ulp

<= 3 ulp

log1p

<= 2 ulp

<= 3 ulp

logb

0 ulp

0 ulp

mad

Implementation-defined

Implementation-defined

max

0 ulp

0 ulp

maxmag

0 ulp

0 ulp

min

0 ulp

0 ulp

minmag

0 ulp

0 ulp

mix

Implementation-defined

Implementation-defined

modf

0 ulp

0 ulp

nan

0 ulp

0 ulp

nextafter

0 ulp

0 ulp

normalize

<= 1 + n ulp, for gentype with vector width n

Implementation-defined

pow(x, y)

<= 4 ulp

<= 5 ulp

pown(x, y)

<= 4 ulp

<= 5 ulp

powr(x, y)

<= 4 ulp

<= 5 ulp

radians

<= 2 ulp

<= 2 ulp

remainder

0 ulp

0 ulp

remquo

0 ulp for the remainder, at least the lower 7 bits of the integral quotient

0 ulp for the remainder, at least the lower 7 bits of the integral quotient

rint

Correctly rounded

Correctly rounded

rootn

<= 4 ulp

<= 5 ulp

round

Correctly rounded

Correctly rounded

rsqrt

<= 1 ulp

<= 1 ulp

sign

0 ulp

0 ulp

sin

<= 2 ulp

<= 2 ulp

sincos

<= 2 ulp for sine and cosine values

<= 2 ulp for sine and cosine values

sinh

<= 2 ulp

<= 3 ulp

sinpi

<= 2 ulp

<= 2 ulp

smoothstep

Implementation-defined

Implementation-defined

sqrt

Correctly rounded

<= 1 ulp

step

0 ulp

0 ulp

tan

<= 2 ulp

<= 3 ulp

tanh

<= 2 ulp

<= 3 ulp

tanpi

<= 2 ulp

<= 3 ulp

tgamma

<= 4 ulp

<= 4 ulp

trunc

Correctly rounded

Correctly rounded

Note: Implementations may perform floating-point operations on half scalar or vector data types by converting the half values to single precision floating-point values and performing the operation in single precision floating-point. In this case, the implementation will use the half scalar or vector data type as a storage only format.
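The absolute error tolerances listed for cross and dot in the table above depend on the inputs rather than on a fixed ULP count. The following C sketch evaluates both formulas, assuming the half-precision machine epsilon is 2^-10. The macro SK_HALF_EPSILON and the function names are hypothetical, and the inputs are passed as double for convenience.

```c
#define SK_HALF_EPSILON 0.0009765625  /* 2^-10, assumed half machine epsilon */

/* Largest magnitude among the n elements of v, starting from m. */
static double max_abs(const double *v, int n, double m)
{
    for (int i = 0; i < n; i++) {
        double a = v[i] < 0.0 ? -v[i] : v[i];
        if (a > m) m = a;
    }
    return m;
}

/* Tolerance for dot: max * max * (2n - 1) * epsilon, for vector width n. */
double half_dot_tolerance(const double *a, const double *b, int n)
{
    double m = max_abs(b, n, max_abs(a, n, 0.0));
    return m * m * (2 * n - 1) * SK_HALF_EPSILON;
}

/* Tolerance per component for cross: max * max * (3 * epsilon). */
double half_cross_tolerance(const double *a, const double *b, int n)
{
    double m = max_abs(b, n, max_abs(a, n, 0.0));
    return m * m * (3.0 * SK_HALF_EPSILON);
}
```

For two 4-component vectors whose largest magnitude is 4, the dot tolerance is 16 * 7 / 1024 = 0.109375.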

Additions to Chapter 8 of the OpenCL 2.0 C Specification

Add new sub-sections to section 8.3.1. Conversion rules for normalized integer channel data types:

Converting normalized integer channel data types to half precision floating-point values

For images created with image channel data type of {CL_UNORM_INT8} and {CL_UNORM_INT16}, read_imageh will convert the channel values from an 8-bit or 16-bit unsigned integer to normalized half precision floating-point values in the range [0.0h, 1.0h].

For images created with image channel data type of {CL_SNORM_INT8} and {CL_SNORM_INT16}, read_imageh will convert the channel values from an 8-bit or 16-bit signed integer to normalized half precision floating-point values in the range [-1.0h, 1.0h].

These conversions are performed as follows:

{CL_UNORM_INT8} (8-bit unsigned integer) → half

  • normalized half value = round_to_half(c / 255)

{CL_UNORM_INT_101010} (10-bit unsigned integer) → half

  • normalized half value = round_to_half(c / 1023)

{CL_UNORM_INT16} (16-bit unsigned integer) → half

  • normalized half value = round_to_half(c / 65535)

{CL_SNORM_INT8} (8-bit signed integer) → half

  • normalized half value = max(-1.0h, round_to_half(c / 127))

{CL_SNORM_INT16} (16-bit signed integer) → half

  • normalized half value = max(-1.0h, round_to_half(c / 32767))

The accuracy of the above conversions must be <= 1.5 ulp except for the following cases.

For {CL_UNORM_INT8}

  • 0 must convert to 0.0h and

  • 255 must convert to 1.0h

For {CL_UNORM_INT_101010}

  • 0 must convert to 0.0h and

  • 1023 must convert to 1.0h

For {CL_UNORM_INT16}

  • 0 must convert to 0.0h and

  • 65535 must convert to 1.0h

For {CL_SNORM_INT8}

  • -128 and -127 must convert to -1.0h,

  • 0 must convert to 0.0h and

  • 127 must convert to 1.0h

For {CL_SNORM_INT16}

  • -32768 and -32767 must convert to -1.0h,

  • 0 must convert to 0.0h and

  • 32767 must convert to 1.0h
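The conversion formulas and the exact cases listed above can be checked with a small C emulation. This is a sketch under stated assumptions: the division is carried out in double rather than with infinite precision, round_to_half is modelled as round to nearest even onto the half-precision grid, and all function names are hypothetical. Results are returned as double values that are exactly representable in half precision.

```c
/* Round x to the nearest half-representable value, ties to even.
   Assumes a finite input of small magnitude; overflow is not handled. */
static double round_to_half_rte(double x)
{
    double a = x < 0.0 ? -x : x;
    double p = 1.0;  /* power of two with p <= a < 2p, floored at 2^-14 */
    int e = 0;
    while (a < p && e > -14) { p *= 0.5; e--; }
    while (a >= 2.0 * p && e < 15) { p *= 2.0; e++; }
    double spacing = p / 1024.0;
    double q = a / spacing;
    long n = (long)q;
    double frac = q - (double)n;
    if (frac > 0.5 || (frac == 0.5 && (n & 1))) n++;
    double r = (double)n * spacing;
    return x < 0.0 ? -r : r;
}

/* Unsigned normalized: round_to_half(c / max_code), with max_code equal to
   255, 1023 or 65535. */
double unorm_to_half(unsigned c, unsigned max_code)
{
    return round_to_half_rte((double)c / (double)max_code);
}

/* Signed normalized: max(-1.0, round_to_half(c / max_code)), with max_code
   equal to 127 or 32767. */
double snorm_to_half(int c, int max_code)
{
    double h = round_to_half_rte((double)c / (double)max_code);
    return h < -1.0 ? -1.0 : h;
}
```

The clamp in snorm_to_half is what maps both -128 and -127 (or -32768 and -32767) to -1.0h.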

Converting half precision floating-point values to normalized integer channel data types

For images created with image channel data type of {CL_UNORM_INT8} and {CL_UNORM_INT16}, write_imageh will convert the half precision floating-point color value to an 8-bit or 16-bit unsigned integer.

For images created with image channel data type of {CL_SNORM_INT8} and {CL_SNORM_INT16}, write_imageh will convert the half precision floating-point color value to an 8-bit or 16-bit signed integer.

The preferred conversion uses the round to nearest even (_rte) rounding mode, but OpenCL implementations may choose to approximate the rounding mode used in the conversions described below. When approximate rounding is used instead of the preferred rounding, the result of the conversion must satisfy the bound given below.

half → {CL_UNORM_INT8} (8-bit unsigned integer)

  • Let fexact = max(0, min(f * 255, 255))

  • Let fpreferred = convert_uchar_sat_rte(f * 255.0f)

  • Let fapprox = convert_uchar_sat_<impl-rounding-mode>(f * 255.0f)

  • fabs(fexact - fapprox) must be <= 0.6

half → {CL_UNORM_INT_101010} (10-bit unsigned integer)

  • Let fexact = max(0, min(f * 1023, 1023))

  • Let fpreferred = min(convert_ushort_sat_rte(f * 1023.0f), 1023)

  • Let fapprox = convert_ushort_sat_<impl-rounding-mode>(f * 1023.0f)

  • fabs(fexact - fapprox) must be <= 0.6

half → {CL_UNORM_INT16} (16-bit unsigned integer)

  • Let fexact = max(0, min(f * 65535, 65535))

  • Let fpreferred = convert_ushort_sat_rte(f * 65535.0f)

  • Let fapprox = convert_ushort_sat_<impl-rounding-mode>(f * 65535.0f)

  • fabs(fexact - fapprox) must be <= 0.6

half → {CL_SNORM_INT8} (8-bit signed integer)

  • Let fexact = max(-128, min(f * 127, 127))

  • Let fpreferred = convert_char_sat_rte(f * 127.0f)

  • Let fapprox = convert_char_sat_<impl-rounding-mode>(f * 127.0f)

  • fabs(fexact - fapprox) must be <= 0.6

half → {CL_SNORM_INT16} (16-bit signed integer)

  • Let fexact = max(-32768, min(f * 32767, 32767))

  • Let fpreferred = convert_short_sat_rte(f * 32767.0f)

  • Let fapprox = convert_short_sat_<impl-rounding-mode>(f * 32767.0f)

  • fabs(fexact - fapprox) must be <= 0.6
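As a rough illustration of the preferred conversion and the 0.6 error bound for the 8-bit formats, the following C sketch emulates a saturating round to nearest even conversion and checks a candidate result against fexact. The function names are hypothetical, the half input is passed as a double, and NaN inputs are not handled.

```c
#include <stdbool.h>

/* Saturate v to [lo, hi], then round to the nearest integer with ties to
   even. Because lo and hi are integers, this matches rounding first and
   saturating afterwards. */
static long sat_rte(double v, long lo, long hi)
{
    if (v <= (double)lo) return lo;
    if (v >= (double)hi) return hi;
    double a = v < 0.0 ? -v : v;
    long n = (long)a;
    double frac = a - (double)n;
    if (frac > 0.5 || (frac == 0.5 && (n & 1))) n++;
    return v < 0.0 ? -n : n;
}

/* Emulates fpreferred for the unsigned and signed 8-bit formats. */
long half_to_unorm8(double f) { return sat_rte(f * 255.0, 0, 255); }
long half_to_snorm8(double f) { return sat_rte(f * 127.0, -128, 127); }

/* True if fabs(fexact - fapprox) <= 0.6 for the unsigned 8-bit format,
   where fexact = max(0, min(f * 255, 255)). */
bool unorm8_within_bound(double f, long fapprox)
{
    double fexact = f * 255.0;
    if (fexact < 0.0) fexact = 0.0;
    if (fexact > 255.0) fexact = 255.0;
    double err = fexact - (double)fapprox;
    if (err < 0.0) err = -err;
    return err <= 0.6;
}
```

For f = 0.25, fexact is 63.75: the preferred result 64 is within the bound, whereas a truncated result of 63 has an error of 0.75 and would not be.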