ARROW-4708: [C++] add multithreaded json reader
- add Converter for conversion of arrays from the parser
- add ChunkedArrayBuilder for multithreaded conversion of
  arrays produced by BlockParser
- extract BlockParser's builder management code into a separate
  class
- add serial and threaded implementations of TableReader for
  parsing from an InputStream

@pitrou the sequel, unless you'd like me to break this one up as well

Author: Benjamin Kietzman <[email protected]>
Author: Wes McKinney <[email protected]>

Closes apache#4165 from bkietz/4708-Add-multithreaded-JSON-reader.2 and squashes the following commits:

673ecdd <Benjamin Kietzman> run clang-format
e2ad676 <Benjamin Kietzman> Merge branch '4708-Add-multithreaded-JSON-reader.2' of https://github.com/bkietz/arrow into 4708-Add-multithreaded-JSON-reader.2
dba28b8 <Benjamin Kietzman> clearing up conversion errors
90a017d <Wes McKinney> Use real time instead of CPU time
7e43c8e <Benjamin Kietzman> resolve CI failures
74cfd06 <Benjamin Kietzman> Adding further tests and benchmarks for the JSON reader
41d7375 <Benjamin Kietzman> CompareBinary must account for offset when checking emptiness
5b62398 <Benjamin Kietzman> add explicit string conversion for MSVC
bc5ec1a <Benjamin Kietzman> re-add partial/completion processing
0b3dbe1 <Benjamin Kietzman> rewrite converter tests
8793c30 <Benjamin Kietzman> comment: conversion errors caught at parse time
4703cd0 <Benjamin Kietzman> refactor to remove chunk_lengths from ...Builder.Finish
1bba412 <Benjamin Kietzman> ensure dictionary arrays in scalar conversion, init null bitmaps
93856af <Benjamin Kietzman> simplify TableReader impl
e0b35c9 <Benjamin Kietzman> add new factories to Table
fee9dc4 <Benjamin Kietzman> chunked-array-builder: fix converter_ race, store unconverted_fields
6c78b2d <Benjamin Kietzman> fix merge error
b6697b3 <Benjamin Kietzman> refactor RawArrayBuilder management into a separate class
f31e079 <Benjamin Kietzman> adding reader, converter, and chunked-builder back
ebad2ff <Benjamin Kietzman> #include sse-utils in rapidjson-def for sse macros
4984604 <Benjamin Kietzman> use arrow sse macros
e12ee75 <Benjamin Kietzman> correct SSE detection
47d37f7 <Benjamin Kietzman> address review comments
b3b7f5d <Benjamin Kietzman> fix build error
ef624d0 <Benjamin Kietzman> refactoring JSON parser to prepare for multithreaded impl
bkietz authored and pitrou committed May 3, 2019
1 parent a9ae4a9 commit b7054c2
Showing 20 changed files with 2,248 additions and 425 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -60,3 +60,4 @@ pkgs
.Rproj.user
arrow.Rcheck/
docker_cache
+.gdb_history
2 changes: 2 additions & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -106,7 +106,9 @@ set(ARROW_SRCS
csv/parser.cc
csv/reader.cc
json/options.cc
+json/chunked-builder.cc
json/chunker.cc
+json/converter.cc
json/parser.cc
json/reader.cc
io/buffered.cc
14 changes: 14 additions & 0 deletions cpp/src/arrow/array-binary-test.cc
@@ -670,6 +670,20 @@ void CheckSliceEquality() {

ASSERT_TRUE(slice->Equals(slice2));
ASSERT_TRUE(array->RangeEquals(5, 25, 0, slice));
+
+ASSERT_OK(builder.Append("a"));
+for (int j = 0; j < reps; ++j) {
+ASSERT_OK(builder.Append(""));
+}
+FinishAndCheckPadding(&builder, &array);
+slice = array->Slice(1);
+
+for (int j = 0; j < reps; ++j) {
+ASSERT_OK(builder.Append(""));
+}
+FinishAndCheckPadding(&builder, &array);
+
+AssertArraysEqual(*slice, *array);
}

TEST_F(TestBinaryArray, TestSliceEquality) { CheckSliceEquality<BinaryType>(); }
2 changes: 1 addition & 1 deletion cpp/src/arrow/compare.cc
@@ -527,7 +527,7 @@ class ArrayEqualsVisitor : public RangeEqualsVisitor {
if (!left.value_data() && !(right.value_data())) {
return true;
}
-if (left.value_offset(left.length()) == 0) {
+if (left.value_offset(left.length()) == left.value_offset(0)) {
return true;
}
return true;
}
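
The one-line change in compare.cc (commit 41d7375, "CompareBinary must account for offset when checking emptiness") may be easier to follow with a small standalone model. The sketch below does not use the Arrow API: BinaryView, AllEmptyOld, AllEmptyNew and CheckSlicedEmpties are hypothetical names invented for illustration, and the struct only mimics a sliced binary array as an offsets buffer plus a slice offset and length. It shows why comparing the last offset against 0 misses a slice that contains only empty values but starts after some non-empty data, which is the case the new test in array-binary-test.cc builds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a sliced binary array (NOT the Arrow API):
// an offsets buffer plus the slice's starting index and its length.
struct BinaryView {
  std::vector<int32_t> offsets;  // one entry per value, plus a final end offset
  int64_t offset;                // index of the first value in this view
  int64_t length;                // number of values in this view

  // Byte offset of value i, relative to the start of the data buffer.
  int32_t value_offset(int64_t i) const { return offsets[offset + i]; }
};

// Old check: only correct when the view starts at byte 0 of the data buffer.
bool AllEmptyOld(const BinaryView& a) { return a.value_offset(a.length) == 0; }

// New check: the view spans zero bytes when its first and last offsets match.
bool AllEmptyNew(const BinaryView& a) {
  return a.value_offset(a.length) == a.value_offset(0);
}

// Mirrors the shape of the added test: "a" followed by empties, then Slice(1).
void CheckSlicedEmpties() {
  // Values "a", "", "", "" give offsets 0, 1, 1, 1, 1.
  BinaryView whole{{0, 1, 1, 1, 1}, 0, 4};
  BinaryView slice{whole.offsets, 1, 3};  // drops "a", leaving three empties

  assert(!AllEmptyOld(slice));  // last offset is 1, so the old check misses it
  assert(AllEmptyNew(slice));   // first and last offsets are both 1
  assert(!AllEmptyNew(whole));  // the unsliced view still holds the byte of "a"
}
```

Under these assumptions the old predicate reports the slice as non-empty while the new one recognises that it spans zero bytes, which is the shortcut the patched branch in compare.cc relies on.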

6 changes: 6 additions & 0 deletions cpp/src/arrow/json/CMakeLists.txt
@@ -21,4 +21,10 @@ add_arrow_test(chunker-test PREFIX "arrow-json")

add_arrow_benchmark(parser-benchmark PREFIX "arrow-json")

+add_arrow_test(converter-test PREFIX "arrow-json")
+
+add_arrow_test(chunked-builder-test PREFIX "arrow-json")
+
+add_arrow_test(reader-test PREFIX "arrow-json")
+
arrow_install_all_headers("arrow/json")