ARROW-4578: [JS] Ensure Float16 is zero-copy, add more native BigInt support

This started as a continuation of apache#3634, but grew enough to deserve its own PR. I've made a PR to my own fork that highlights just the changes here: trxcllnt#8. I'll rebase this PR after apache#3634 is merged so only these changes are included.

This PR reverts the behavior of `Float16Vector#toArray()` back to returning a zero-copy slice of the underlying `Uint16Array` data, and exposes the copying behavior via new `toFloat32Array()` and `toFloat64Array()` methods. `Float16Array.from()` will also convert any incoming 32 or 64-bit floats to Uint16s if necessary.
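
To make the zero-copy distinction concrete, here is a minimal sketch using plain typed arrays rather than the Arrow API. The helper names `toArrayZeroCopy` and `toFloat32ArrayCopy` are hypothetical, and the decode callback simply inverts the linear mapping that `toFloat16Array` uses in this diff; treat it as an illustration under those assumptions, not as the vector implementation:

```ts
// Hypothetical stand-in for a Float16 vector's backing store: raw 16-bit words.
const raw = new Uint16Array([0, 16383, 32767, 65534]);

// Zero-copy: subarray() returns a view over the same ArrayBuffer.
function toArrayZeroCopy(data: Uint16Array, begin: number, end: number): Uint16Array {
    return data.subarray(begin, end);
}

// Copying: allocate a new Float32Array and decode every element into it.
// The decode step is a parameter because the encoding is an assumption here.
function toFloat32ArrayCopy(data: Uint16Array, decode: (u16: number) => number): Float32Array {
    const out = new Float32Array(data.length);
    for (let i = 0; i < data.length; i++) {
        out[i] = decode(data[i]);
    }
    return out;
}

const view = toArrayZeroCopy(raw, 1, 3);
const copy = toFloat32ArrayCopy(raw, (u16) => (u16 - 32767) / 32767);

// Mutate the source after both results were created.
raw[1] = 1;
```

Because `view` shares its `ArrayBuffer` with `raw`, the write is visible through `view` but not through `copy`, which is why the copying behavior is better exposed under an explicit method name.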

It also adds tighter integration with the new `BigInt`, `BigInt64Array`, and `BigUint64Array` primitives (if available):
1. Use the native `BigInt` to convert/stringify i64s/u64s
2. Support the `BigInt` type in element comparator and `indexOf()`
3. Add zero-copy `toBigInt64Array()` and `toBigUint64Array()` methods to `Int64Vector` and `Uint64Vector`, respectively
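
As a rough sketch of what the zero-copy `toBigInt64Array()` and `toBigUint64Array()` methods in the list above amount to, the snippet below reinterprets pairs of 32-bit words as 64-bit integers without copying. The helper names are hypothetical, and it assumes a little-endian host, a `byteOffset` that is a multiple of 8, and a runtime where `BigInt64Array` exists:

```ts
// Two 32-bit words per 64-bit integer, low word first (little-endian layout).
const words = new Int32Array([-1, 2147483647, -1, -1]);

// Hypothetical helpers: view the same bytes as signed or unsigned 64-bit integers.
function toBigInt64View(w: Int32Array): BigInt64Array {
    return new BigInt64Array(w.buffer, w.byteOffset, w.length >> 1);
}

function toBigUint64View(w: Int32Array): BigUint64Array {
    return new BigUint64Array(w.buffer, w.byteOffset, w.length >> 1);
}

const signed = toBigInt64View(words);
const unsigned = toBigUint64View(words);

// Reference values built without bigint literals, so older compile targets still work.
const maxInt64 = (BigInt(1) << BigInt(63)) - BigInt(1);
const maxUint64 = (BigInt(1) << BigInt(64)) - BigInt(1);
```

On a little-endian machine the first element reads as the largest signed 64-bit value in both views, while the second element (all bits set) reads as `-1` through the signed view and as the largest unsigned value through the unsigned one.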

0.4.0 added support for basic conversion to the native `BigInt` when available, but it would only create positive `BigInt`s and was slower than necessary. This PR uses the native typed arrays to create the `BigInt`s, so we should see some speedups there. For example:

```ts
const vec = Int64Vector.from(new Int32Array([-1, 2147483647]))
const big = vec.get(0)
assert(big[0] === -1) // true
assert(big[1] === 2147483647) // true
const num = 0n + big // or BigInt(big)
assert(num === (2n ** 63n - 1n)) // true
```

JIRAs associated with this PR are:
* [ARROW-4578](https://issues.apache.org/jira/browse/ARROW-4578) - Float16Vector toArray should be zero-copy
* [ARROW-4579](https://issues.apache.org/jira/browse/ARROW-4579) - Add more interop with BigInt/BigInt64Array/BigUint64Array
* [ARROW-4580](https://issues.apache.org/jira/browse/ARROW-4580) - Accept Iterables in IntVector/FloatVector from() signatures

Author: ptaylor

Closes apache#3653 from trxcllnt/js/int-and-float-fixes and squashes the following commits:

69ee6f7 <ptaylor> cleanup after rebase
f44e97b <ptaylor> ensure truncated bitmap size isn't larger than it should be
7ac081a <ptaylor> fix lint
6046e66 <ptaylor> remove more getters in favor of readonly direct property accesses
94d5633 <ptaylor> support BigInt in comparator/indexOf
760a219 <ptaylor> update BN to use BigIntArrays for signed/unsigned 64bit integers if possible
77fcd40 <ptaylor> add initial BigInt64Array and BigUint64Array support
d561204 <ptaylor> ensure Float16Vector.toArray() is zero-copy again, add toFloat32Array() and toFloat64Array() methods instead
854ae66 <ptaylor> ensure Int/FloatVector.from return signatures are as specific as possible, and accept Iterable<number>
4656ea5 <ptaylor> cleanup/rename Table + Schema + RecordBatch from -> new, cleanup argument extraction util fns
69abf40 <ptaylor> add initial RecordBatch.new and select tests
9c7ed3d <ptaylor> guard against out-of-bounds selections
a4222f8 <ptaylor> clean up: eliminate more getters in favor of read-only properties
8eabb1c <ptaylor> clean up/speed up: move common argument flattening methods into a utility file
b3b4f1f <ptaylor> add Table and Schema assign() impls
79f9db1 <ptaylor> add selectAt() method to Table, Schema, and RecordBatch for selecting columns by index
trxcllnt authored and TheNeuralBit committed Feb 23, 2019
1 parent 48f7b36 commit a26cf7a
Showing 14 changed files with 749 additions and 323 deletions.
1 change: 0 additions & 1 deletion js/.vscode/launch.json
@@ -52,7 +52,6 @@
// "test/unit/vector/vector-tests.ts",
// "test/unit/vector/bool-vector-tests.ts",
// "test/unit/vector/date-vector-tests.ts",
// "test/unit/vector/float16-vector-tests.ts",
// "test/unit/vector/numeric-vector-tests.ts",

// "test/unit/visitor-tests.ts",
41 changes: 37 additions & 4 deletions js/src/interfaces.ts
@@ -21,12 +21,19 @@ import * as type from './type';
import { DataType } from './type';
import * as vecs from './vector/index';

/** @ignore */ type FloatArray = Float32Array | Float64Array;
/** @ignore */ type IntArray = Int8Array | Int16Array | Int32Array;
/** @ignore */ type UintArray = Uint8Array | Uint16Array | Uint32Array | Uint8ClampedArray;
/** @ignore */
export interface ArrayBufferViewConstructor<T extends ArrayBufferView> {
export type TypedArray = FloatArray | IntArray | UintArray;
export type BigIntArray = BigInt64Array | BigUint64Array;

/** @ignore */
export interface TypedArrayConstructor<T extends TypedArray> {
readonly prototype: T;
new(length: number): T;
new(arrayOrArrayBuffer: ArrayLike<number> | ArrayBufferLike): T;
new(buffer: ArrayBufferLike, byteOffset: number, length?: number): T;
new(length?: number): T;
new(array: Iterable<number>): T;
new(buffer: ArrayBufferLike, byteOffset?: number, length?: number): T;
/**
* The size in bytes of each element in the array.
*/
@@ -43,6 +50,32 @@ export interface ArrayBufferViewConstructor<T extends ArrayBufferView> {
* @param thisArg Value of 'this' used to invoke the mapfn.
*/
from(arrayLike: ArrayLike<number>, mapfn?: (v: number, k: number) => number, thisArg?: any): T;
from<U>(arrayLike: ArrayLike<U>, mapfn: (v: U, k: number) => number, thisArg?: any): T;
}

/** @ignore */
export interface BigIntArrayConstructor<T extends BigIntArray> {
readonly prototype: T;
new(length?: number): T;
new(array: Iterable<bigint>): T;
new(buffer: ArrayBufferLike, byteOffset?: number, length?: number): T;
/**
* The size in bytes of each element in the array.
*/
readonly BYTES_PER_ELEMENT: number;
/**
* Returns a new array from a set of elements.
* @param items A set of elements to include in the new array object.
*/
of(...items: bigint[]): T;
/**
* Creates an array from an array-like or iterable object.
* @param arrayLike An array-like or iterable object to convert to an array.
* @param mapfn A mapping function to call on every element of the array.
* @param thisArg Value of 'this' used to invoke the mapfn.
*/
from(arrayLike: ArrayLike<bigint>, mapfn?: (v: bigint, k: number) => bigint, thisArg?: any): T;
from<U>(arrayLike: ArrayLike<U>, mapfn: (v: U, k: number) => bigint, thisArg?: any): T;
}

/** @ignore */
221 changes: 118 additions & 103 deletions js/src/type.ts

Large diffs are not rendered by default.

17 changes: 7 additions & 10 deletions js/src/util/bit.ts
@@ -37,21 +37,19 @@ export function truncateBitmap(offset: number, length: number, bitmap: Uint8Arra
const alignedSize = (bitmap.byteLength + 7) & ~7;
if (offset > 0 || bitmap.byteLength < alignedSize) {
const bytes = new Uint8Array(alignedSize);
bytes.set((offset % 8 === 0)
// If the offset is a multiple of 8 bits, it's safe to slice the bitmap
? bitmap.subarray(offset >> 3)
// If the offset is a multiple of 8 bits, it's safe to slice the bitmap
bytes.set(offset % 8 === 0 ? bitmap.subarray(offset >> 3) :
// Otherwise iterate each bit from the offset and return a new one
: packBools(iterateBits(bitmap, offset, length, null, getBool)));
packBools(iterateBits(bitmap, offset, length, null, getBool)).subarray(0, alignedSize));
return bytes;
}
return bitmap;
}

/** @ignore */
export function packBools(values: Iterable<any>) {
let n = 0, i = 0;
let xs: number[] = [];
let bit = 0, byte = 0;
let i = 0, bit = 0, byte = 0;
for (const value of values) {
value && (byte |= 1 << bit);
if (++bit === 8) {
@@ -60,10 +58,9 @@ export function packBools(values: Iterable<any>) {
}
}
if (i === 0 || bit > 0) { xs[i++] = byte; }
if (i % 8 && (n = i + 8 - i % 8)) {
do { xs[i] = 0; } while (++i < n);
}
return new Uint8Array(xs);
let b = new Uint8Array((xs.length + 7) & ~7);
b.set(xs);
return b;
}

/** @ignore */
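
The `packBools` change above replaces the manual zero-fill loop with a pre-sized `Uint8Array`. A minimal standalone sketch of the same idea follows; the name `packBits` is hypothetical and it takes a plain array rather than any `Iterable`, so it is an approximation of the technique rather than the function in this diff:

```ts
// Pack truthy/falsy values into bytes, least-significant bit first, and
// pad the result with zero bytes up to the next multiple of 8 bytes.
function packBits(values: ReadonlyArray<unknown>): Uint8Array {
    const bytes: number[] = [];
    let bit = 0;
    let byte = 0;
    for (const value of values) {
        if (value) { byte |= 1 << bit; }
        if (++bit === 8) {
            bytes.push(byte);
            bit = 0;
            byte = 0;
        }
    }
    // Flush a partial byte, or emit one zero byte for empty input.
    if (bytes.length === 0 || bit > 0) { bytes.push(byte); }
    // A new Uint8Array is zero-filled, so rounding the length up does the padding.
    const padded = new Uint8Array((bytes.length + 7) & ~7);
    padded.set(bytes);
    return padded;
}

const three = packBits([true, false, true]);
const nine = packBits([1, 1, 1, 1, 1, 1, 1, 1, 1]);
const empty = packBits([]);
```

Rounding with `(n + 7) & ~7` relies on typed arrays being zero-initialized, which is what lets the explicit padding loop go away.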
49 changes: 31 additions & 18 deletions js/src/util/bn.ts
@@ -15,7 +15,9 @@
// specific language governing permissions and limitations
// under the License.

import { BigIntArray, BigIntArrayConstructor } from '../interfaces';
import { toArrayBufferView, ArrayBufferViewInput } from './buffer';
import { BigIntAvailable, BigInt64Array, BigUint64Array } from './compat';

/** @ignore */
type BigNumArray = IntArray | UintArray;
@@ -26,21 +28,23 @@ type UintArray = Uint8Array | Uint16Array | Uint32Array | Uint8ClampedArray;

/** @ignore */
const BigNumNMixin = {
toJSON(this: BN<BigNumArray>, ) { return `"${bignumToString(this)}"`; },
valueOf(this: BN<BigNumArray>, ) { return bignumToNumber(this); },
toString(this: BN<BigNumArray>, ) { return bignumToString(this); },
toJSON(this: BN<BigNumArray>) { return `"${bignumToString(this)}"`; },
valueOf(this: BN<BigNumArray>) { return bignumToNumber(this); },
toString(this: BN<BigNumArray>) { return bignumToString(this); },
[Symbol.toPrimitive]<T extends BN<BigNumArray>>(this: T, hint: 'string' | 'number' | 'default') {
if (hint === 'number') { return bignumToNumber(this); }
/** @suppress {missingRequire} */
return hint === 'string' || typeof BigInt !== 'function' ?
bignumToString(this) : BigInt(bignumToString(this));
switch (hint) {
case 'number': return bignumToNumber(this);
case 'string': return bignumToString(this);
case 'default': return bignumToBigInt(this);
}
return bignumToString(this);
}
};

/** @ignore */
const SignedBigNumNMixin: any = Object.assign({}, BigNumNMixin, { signed: true });
const SignedBigNumNMixin: any = Object.assign({}, BigNumNMixin, { signed: true, BigIntArray: BigInt64Array });
/** @ignore */
const UnsignedBigNumNMixin: any = Object.assign({}, BigNumNMixin, { signed: false });
const UnsignedBigNumNMixin: any = Object.assign({}, BigNumNMixin, { signed: false, BigIntArray: BigUint64Array });

/** @ignore */
export class BN<T extends BigNumArray> {
@@ -74,6 +78,7 @@ export interface BN<T extends BigNumArray> extends TypedArrayLike<T> {
new<T extends ArrayBufferViewInput>(buffer: T, signed?: boolean): T;

readonly signed: boolean;
readonly BigIntArray: BigIntArrayConstructor<BigIntArray>;

[Symbol.toStringTag]:
'Int8Array' |
@@ -108,31 +113,39 @@ function bignumToNumber<T extends BN<BigNumArray>>({ buffer, byteOffset, length
let words = new Uint32Array(buffer, byteOffset, length);
for (let i = 0, n = words.length; i < n;) {
int64 += words[i++] + (words[i++] * (i ** 32));
// int64 += (words[i++] >>> 0) + (words[i++] * (i ** 32));
}
return int64;
}

/** @ignore */
function bignumToString<T extends BN<BigNumArray>>({ buffer, byteOffset, length }: T) {
let bignumToString: { <T extends BN<BigNumArray>>(a: T): string; };
/** @ignore */
let bignumToBigInt: { <T extends BN<BigNumArray>>(a: T): bigint; };

if (!BigIntAvailable) {
bignumToString = decimalToString;
bignumToBigInt = <any> bignumToString;
} else {
bignumToBigInt = (<T extends BN<BigNumArray>>(a: T) => a.length === 2 ? new a.BigIntArray(a.buffer, a.byteOffset, 1)[0] : <any>decimalToString(a));
bignumToString = (<T extends BN<BigNumArray>>(a: T) => a.length === 2 ? `${new a.BigIntArray(a.buffer, a.byteOffset, 1)[0]}` : decimalToString(a));
}

let string = '', i = -1;
function decimalToString<T extends BN<BigNumArray>>(a: T) {
let digits = '';
let base64 = new Uint32Array(2);
let base32 = new Uint16Array(buffer, byteOffset, length * 2);
let base32 = new Uint16Array(a.buffer, a.byteOffset, a.length * 2);
let checks = new Uint32Array((base32 = new Uint16Array(base32).reverse()).buffer);
let n = base32.length - 1;

let i = -1, n = base32.length - 1;
do {
for (base64[0] = base32[i = 0]; i < n;) {
base32[i++] = base64[1] = base64[0] / 10;
base64[0] = ((base64[0] - base64[1] * 10) << 16) + base32[i];
}
base32[i] = base64[1] = base64[0] / 10;
base64[0] = base64[0] - base64[1] * 10;
string = `${base64[0]}${string}`;
digits = `${base64[0]}${digits}`;
} while (checks[0] || checks[1] || checks[2] || checks[3]);

return string ? string : `0`;
return digits ? digits : `0`;
}

/** @ignore */
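
The `Symbol.toPrimitive` switch in the `bn.ts` changes above is what lets `0n + big` yield a native `BigInt`. The sketch below shows the hint dispatch in isolation; `Int64Like` is a hypothetical class, not the `BN` type from this file, and it assumes a little-endian host with native `BigInt64Array` support:

```ts
// Hypothetical wrapper around one 64-bit integer stored as two 32-bit words.
class Int64Like {
    constructor(private readonly words: Int32Array) {}

    [Symbol.toPrimitive](hint: string): number | string | bigint {
        // Zero-copy read of the 8 bytes as a single signed 64-bit integer.
        const big = new BigInt64Array(this.words.buffer, this.words.byteOffset, 1)[0];
        switch (hint) {
            case 'number': return Number(big);   // unary plus, arithmetic with numbers
            case 'string': return big.toString(); // template literals, String(x)
            default: return big;                  // binary +, == (the 'default' hint)
        }
    }
}

const x = new Int64Like(new Int32Array([5, 0]));

const asNumber = +x;                        // hint: 'number'
const asString = `${x}`;                    // hint: 'string'
const asBigInt = BigInt(0) + (x as any);    // hint: 'default'; cast needed for the type checker
```

The `as any` cast mirrors how the comparator in this commit adds a `BigInt` zero to an untyped value: the coercion happens at runtime, and TypeScript has no way to model it statically.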
36 changes: 31 additions & 5 deletions js/src/util/buffer.ts
@@ -18,8 +18,9 @@
import { flatbuffers } from 'flatbuffers';
import { encodeUtf8 } from '../util/utf8';
import ByteBuffer = flatbuffers.ByteBuffer;
import { ArrayBufferViewConstructor } from '../interfaces';
import { isPromise, isIterable, isAsyncIterable, isIteratorResult } from './compat';
import { TypedArray, TypedArrayConstructor } from '../interfaces';
import { BigIntArray, BigIntArrayConstructor } from '../interfaces';
import { isPromise, isIterable, isAsyncIterable, isIteratorResult, BigInt64Array, BigUint64Array } from './compat';

/** @ignore */
const SharedArrayBuf = (typeof SharedArrayBuffer !== 'undefined' ? SharedArrayBuffer : ArrayBuffer);
@@ -92,7 +93,9 @@ export type ArrayBufferViewInput = ArrayBufferView | ArrayBufferLike | ArrayBuff
ReadableStreamReadResult<ArrayBufferView | ArrayBufferLike | ArrayBufferView | Iterable<number> | ArrayLike<number> | ByteBuffer | string | null | undefined> ;

/** @ignore */
export function toArrayBufferView<T extends ArrayBufferView>(ArrayBufferViewCtor: ArrayBufferViewConstructor<T>, input: ArrayBufferViewInput): T {
export function toArrayBufferView<T extends TypedArray>(ArrayBufferViewCtor: TypedArrayConstructor<T>, input: ArrayBufferViewInput): T;
export function toArrayBufferView<T extends BigIntArray>(ArrayBufferViewCtor: BigIntArrayConstructor<T>, input: ArrayBufferViewInput): T;
export function toArrayBufferView(ArrayBufferViewCtor: any, input: ArrayBufferViewInput) {

let value: any = isIteratorResult(input) ? input.value : input;

@@ -114,21 +117,44 @@ export function toArrayBufferView<T extends ArrayBufferView>(ArrayBufferViewCtor
/** @ignore */ export const toInt8Array = (input: ArrayBufferViewInput) => toArrayBufferView(Int8Array, input);
/** @ignore */ export const toInt16Array = (input: ArrayBufferViewInput) => toArrayBufferView(Int16Array, input);
/** @ignore */ export const toInt32Array = (input: ArrayBufferViewInput) => toArrayBufferView(Int32Array, input);
/** @ignore */ export const toBigInt64Array = (input: ArrayBufferViewInput) => toArrayBufferView(BigInt64Array, input);
/** @ignore */ export const toUint8Array = (input: ArrayBufferViewInput) => toArrayBufferView(Uint8Array, input);
/** @ignore */ export const toUint16Array = (input: ArrayBufferViewInput) => toArrayBufferView(Uint16Array, input);
/** @ignore */ export const toUint32Array = (input: ArrayBufferViewInput) => toArrayBufferView(Uint32Array, input);
/** @ignore */ export const toBigUint64Array = (input: ArrayBufferViewInput) => toArrayBufferView(BigUint64Array, input);
/** @ignore */ export const toFloat32Array = (input: ArrayBufferViewInput) => toArrayBufferView(Float32Array, input);
/** @ignore */ export const toFloat64Array = (input: ArrayBufferViewInput) => toArrayBufferView(Float64Array, input);
/** @ignore */ export const toUint8ClampedArray = (input: ArrayBufferViewInput) => toArrayBufferView(Uint8ClampedArray, input);

/** @ignore */
export const toFloat16Array = (input: ArrayBufferViewInput) => {
let floats: Float32Array | Float64Array | null = null;
if (ArrayBuffer.isView(input)) {
switch (input.constructor) {
case Float32Array: floats = input as Float32Array; break;
case Float64Array: floats = input as Float64Array; break;
}
} else if (isIterable(input)) {
floats = toFloat64Array(input);
}
if (floats) {
const u16s = new Uint16Array(floats.length);
for (let i = -1, n = u16s.length; ++i < n;) {
u16s[i] = (floats[i] * 32767) + 32767;
}
return u16s;
}
return toUint16Array(input);
};

/** @ignore */
type ArrayBufferViewIteratorInput = Iterable<ArrayBufferViewInput> | ArrayBufferViewInput;

/** @ignore */
const pump = <T extends Iterator<any> | AsyncIterator<any>>(iterator: T) => { iterator.next(); return iterator; };

/** @ignore */
export function* toArrayBufferViewIterator<T extends ArrayBufferView>(ArrayCtor: ArrayBufferViewConstructor<T>, source: ArrayBufferViewIteratorInput) {
export function* toArrayBufferViewIterator<T extends TypedArray>(ArrayCtor: TypedArrayConstructor<T>, source: ArrayBufferViewIteratorInput) {

const wrap = function*<T>(x: T) { yield x; };
const buffers: Iterable<ArrayBufferViewInput> =
@@ -160,7 +186,7 @@ export function* toArrayBufferViewIterator<T extends ArrayBufferView>(ArrayCtor:
type ArrayBufferViewAsyncIteratorInput = AsyncIterable<ArrayBufferViewInput> | Iterable<ArrayBufferViewInput> | PromiseLike<ArrayBufferViewInput> | ArrayBufferViewInput;

/** @ignore */
export async function* toArrayBufferViewAsyncIterator<T extends ArrayBufferView>(ArrayCtor: ArrayBufferViewConstructor<T>, source: ArrayBufferViewAsyncIteratorInput): AsyncIterableIterator<T> {
export async function* toArrayBufferViewAsyncIterator<T extends TypedArray>(ArrayCtor: TypedArrayConstructor<T>, source: ArrayBufferViewAsyncIteratorInput): AsyncIterableIterator<T> {

// if a Promise, unwrap the Promise and iterate the resolved value
if (isPromise<ArrayBufferViewInput>(source)) {
37 changes: 37 additions & 0 deletions js/src/util/compat.ts
@@ -40,6 +40,43 @@ export interface Observable<T> {
subscribe: (observer: Observer<T>) => Subscription;
}

/** @ignore */
const [BigIntCtor, BigIntAvailable] = (() => {
const BigIntUnavailableError = () => { throw new Error('BigInt is not available in this environment'); };
function BigIntUnavailable() { throw BigIntUnavailableError(); }
BigIntUnavailable.asIntN = () => { throw BigIntUnavailableError(); };
BigIntUnavailable.asUintN = () => { throw BigIntUnavailableError(); };
return typeof BigInt !== 'undefined' ? [BigInt, true] : [<any> BigIntUnavailable, false];
})() as [BigIntConstructor, boolean];

/** @ignore */
const [BigInt64ArrayCtor, BigInt64ArrayAvailable] = (() => {
const BigInt64ArrayUnavailableError = () => { throw new Error('BigInt64Array is not available in this environment'); };
class BigInt64ArrayUnavailable {
static get BYTES_PER_ELEMENT() { return 8; }
static of() { throw BigInt64ArrayUnavailableError(); }
static from() { throw BigInt64ArrayUnavailableError(); }
constructor() { throw BigInt64ArrayUnavailableError(); }
}
return typeof BigInt64Array !== 'undefined' ? [BigInt64Array, true] : [<any> BigInt64ArrayUnavailable, false];
})() as [BigInt64ArrayConstructor, boolean];

/** @ignore */
const [BigUint64ArrayCtor, BigUint64ArrayAvailable] = (() => {
const BigUint64ArrayUnavailableError = () => { throw new Error('BigUint64Array is not available in this environment'); };
class BigUint64ArrayUnavailable {
static get BYTES_PER_ELEMENT() { return 8; }
static of() { throw BigUint64ArrayUnavailableError(); }
static from() { throw BigUint64ArrayUnavailableError(); }
constructor() { throw BigUint64ArrayUnavailableError(); }
}
return typeof BigUint64Array !== 'undefined' ? [BigUint64Array, true] : [<any> BigUint64ArrayUnavailable, false];
})() as [BigUint64ArrayConstructor, boolean];

export { BigIntCtor as BigInt, BigIntAvailable };
export { BigInt64ArrayCtor as BigInt64Array, BigInt64ArrayAvailable };
export { BigUint64ArrayCtor as BigUint64Array, BigUint64ArrayAvailable };

/** @ignore */ const isNumber = (x: any) => typeof x === 'number';
/** @ignore */ const isBoolean = (x: any) => typeof x === 'boolean';
/** @ignore */ const isFunction = (x: any) => typeof x === 'function';
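
The `compat.ts` additions above pair each native constructor with an availability flag and a stub that throws when the feature is missing. A reduced sketch of that pattern is shown here; the names `Int64ArrayUnavailable`, `Int64ArrayCtor`, and `firstInt64AsString` are hypothetical, the numeric fallback is my own addition for illustration, and it assumes a TypeScript `lib` setting that declares `BigInt64Array`:

```ts
// Stub used when the native constructor is missing; any use fails loudly.
class Int64ArrayUnavailable {
    static readonly BYTES_PER_ELEMENT = 8;
    constructor() {
        throw new Error('BigInt64Array is not available in this environment');
    }
}

// Detect once at load time. `typeof` on a missing global never throws.
const Int64ArrayAvailable = typeof BigInt64Array !== 'undefined';
const Int64ArrayCtor: any = Int64ArrayAvailable ? BigInt64Array : Int64ArrayUnavailable;

// Callers branch on the flag instead of repeating the typeof check.
function firstInt64AsString(words: Int32Array): string {
    if (Int64ArrayAvailable) {
        return String(new Int64ArrayCtor(words.buffer, words.byteOffset, 1)[0]);
    }
    // Assumption: lossy fallback, only exact for magnitudes below 2^53.
    return String(words[1] * 4294967296 + (words[0] >>> 0));
}

const seven = firstInt64AsString(new Int32Array([7, 0]));
const minusOne = firstInt64AsString(new Int32Array([-1, -1]));
```

Exporting a throwing stub keeps import sites unconditional, and the boolean flag gives hot paths a cheap branch to take when the native type is absent.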
10 changes: 8 additions & 2 deletions js/src/util/vector.ts
@@ -18,6 +18,7 @@
import { Vector } from '../vector';
import { Row, kLength } from '../vector/row';
import { compareArrayLike } from '../util/buffer';
import { BigInt, BigIntAvailable } from './compat';

/** @ignore */
type RangeLike = { length: number; stride?: number };
@@ -59,11 +60,16 @@ export function clampRange<T extends RangeLike, N extends ClampRangeThen<T> = Cl
return then ? then(source, lhs, rhs) : [lhs, rhs];
}

const big0 = BigIntAvailable ? BigInt(0) : 0;

/** @ignore */
export function createElementComparator(search: any) {
let typeofSearch = typeof search;
// Compare primitives
if (search == null || typeof search !== 'object') {
return (value: any) => value === search;
if (typeofSearch !== 'object' || search === null) {
return typeofSearch !== 'bigint'
? (value: any) => value === search
: (value: any) => (big0 + value) === search;
}
// Compare Dates
if (search instanceof Date) {