An implementation of Arrow targeting .NET Standard.
See the feature matrix for the currently available features.
- Arrow 0.11 (specification)
- C# 8
- .NET Standard 1.3
- Asynchronous I/O
- Uses modern .NET runtime features such as Span<T>, Memory<T>, MemoryManager<T>, and System.Buffers primitives for memory allocation, memory storage, and fast serialization.
- Uses the Acyclic Visitor pattern for array types and arrays to facilitate serialization, record batch traversal, and format growth.
- Cannot read Arrow files containing tensors.
- Cannot easily modify the allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8 bytes.
- The default memory allocation strategy over-allocates and uses pointer fixing, which results in significant memory overhead for small buffers. A buffer that requires a single byte of storage may be backed by an allocation of up to 64 bytes to satisfy alignment requirements.
- There are currently few builder APIs available for specific array types. Arrays must be built manually with an arrow buffer builder abstraction.
- FlatBuffer code generation is not included in the build process.
- Serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
- Throws exceptions with vague, inconsistent, or non-localized messages in many situations.
- In some circumstances, throws exceptions that are not specific to the Arrow implementation where an Arrow-specific exception would be more appropriate (e.g., does not throw ArrowException).
- Lack of code documentation
- Lack of usage examples
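
The alignment and padding figures in the list above can be illustrated with a little arithmetic. The following is a minimal sketch, not the library's allocator code: the class name `AlignmentSketch`, the helper `RoundUp`, and the raw address `1000` are all made up for illustration. It shows how a size is padded to a multiple of 8 bytes and how many bytes are skipped to reach the next 64-byte boundary inside an over-allocated block.

```csharp
using System;

public static class AlignmentSketch
{
    // Rounds value up to the nearest multiple of boundary.
    // Assumes boundary is a power of two (8 and 64 both are).
    public static long RoundUp(long value, long boundary)
    {
        return (value + boundary - 1) & ~(boundary - 1);
    }

    public static void Main()
    {
        // A 1-byte request padded to a multiple of 8 bytes.
        Console.WriteLine(RoundUp(1, 8));      // prints 8

        // Hypothetical raw address returned by an underlying allocator.
        long rawAddress = 1000;

        // First 64-byte-aligned address at or after the raw address.
        long alignedAddress = RoundUp(rawAddress, 64);

        // Bytes skipped (and therefore unused) to reach the boundary.
        Console.WriteLine(alignedAddress - rawAddress);   // prints 24
    }
}
```

The skipped bytes are pure overhead, which is why the cost is most noticeable for very small buffers.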

    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;
    using Apache.Arrow;
    using Apache.Arrow.Ipc;

    // Reads the first record batch from an Arrow file.
    public static async Task<RecordBatch> ReadArrowAsync(string filename)
    {
        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream))
        {
            var recordBatch = await reader.ReadNextRecordBatchAsync();
            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
            return recordBatch;
        }
    }

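
As a counterpart to the reading example, the sketch below writes a small Arrow file that such a reader could then consume. It assumes the installed Apache.Arrow version provides `Int32Array.Builder`, `Schema.Builder`, and `ArrowFileWriter`; the class name `ArrowWriteSketch`, the method name `WriteArrowAsync`, and the column name `id` are hypothetical and chosen only for illustration.

```csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

public static class ArrowWriteSketch
{
    public static async Task WriteArrowAsync(string filename)
    {
        // Build a three-element Int32 column; the second value is null.
        Int32Array ids = new Int32Array.Builder()
            .Append(1)
            .AppendNull()
            .Append(3)
            .Build();

        // Describe the single nullable column.
        Schema schema = new Schema.Builder()
            .Field(f => f.Name("id").DataType(Int32Type.Default).Nullable(true))
            .Build();

        // A record batch pairs the schema with equally sized column arrays.
        var batch = new RecordBatch(schema, new IArrowArray[] { ids }, ids.Length);

        using (var stream = File.Create(filename))
        using (var writer = new ArrowFileWriter(stream, schema))
        {
            await writer.WriteRecordBatchAsync(batch);
            // Writes the file footer; without it the file is incomplete.
            await writer.WriteEndAsync();
        }
    }
}
```

A file written this way holds one record batch with one column and three rows.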
- Allocations are 64-byte aligned and padded to 8 bytes.
- Allocations are automatically garbage collected.
- Int8, Int16, Int32, Int64
- UInt8, UInt16, UInt32, UInt64
- Float, Double, Half-float (.NET 6+)
- Binary (variable-length)
- String (utf-8)
- Null
- Timestamp
- Date32
- Date64
- Decimal
- Time32
- Time64
- Binary (fixed-length)
- List
- Struct
- Data Types
- Fields
- Schema
- File
- Stream
- Buffer compression is not supported when writing IPC files or streams.
- Buffer decompression is supported, but requires installing the Apache.Arrow.Compression package and passing an Apache.Arrow.Compression.CompressionCodecFactory instance to the ArrowFileReader or ArrowStreamReader constructor. Alternatively, a custom implementation of ICompressionCodecFactory can be used.
- Serialization
    - Exhaustive validation
    - Dictionary Batch
        - Cannot serialize files or streams containing dictionary batches
    - Dictionary Encoding
- Types
    - Tensor
- Arrays
    - Union
        - Dense
        - Sparse
    - Union
- Array Operations
    - Equality / Comparison
    - Casting
- Compute
    - There is currently no API available for a compute / kernel abstraction.
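
For the decompression support described in the compression items earlier, a reader might be set up roughly as follows. This is a sketch under stated assumptions: the Apache.Arrow.Compression package is installed, `CompressionCodecFactory` can be created with a parameterless constructor, and `ArrowFileReader` accepts the factory as its second constructor argument. The class name `CompressedReadSketch` and the method name `ReadCompressedArrowAsync` are hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Compression;
using Apache.Arrow.Ipc;

public static class CompressedReadSketch
{
    public static async Task<RecordBatch> ReadCompressedArrowAsync(string filename)
    {
        // Codec factory from the separately installed
        // Apache.Arrow.Compression package (assumed parameterless constructor).
        var codecFactory = new CompressionCodecFactory();

        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream, codecFactory))
        {
            // Buffers are decompressed as the batch is read.
            return await reader.ReadNextRecordBatchAsync();
        }
    }
}
```

A custom `ICompressionCodecFactory` implementation could be passed in place of `codecFactory` if different codecs are needed.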

Install the latest .NET Core SDK from https://dotnet.microsoft.com/download.

    dotnet build

To build the NuGet package, run the following command. It builds a debug-flavor preview package into the artifacts folder.

    dotnet pack

When building the officially released version, run the following command (see the note below about the git repository):

    dotnet pack -c Release

This builds the final/stable package.

NOTE: When building the officially released version, ensure that your git repository has the origin remote set to https://github.com/apache/arrow.git, which ensures that Source Link is set correctly. See https://github.com/dotnet/sourcelink/blob/main/docs/README.md for more information.

There are two output artifacts:
- Apache.Arrow.<version>.nupkg - this contains the executable assemblies
- Apache.Arrow.<version>.snupkg - this contains the debug symbol files

Both of these artifacts can then be uploaded to https://www.nuget.org/packages/manage/upload.

Build from the Apache Arrow project root.

    docker build -f csharp/build/docker/Dockerfile .

Run the tests with:

    dotnet test

All build artifacts are placed in the artifacts folder in the project root.

This project follows the coding style specified in Coding Style.

See https://google.github.io/flatbuffers/flatbuffers_guide_use_java_c-sharp.html for how to get the flatc executable.

Run flatc --csharp on each .fbs file in the format folder, and replace the checked-in .cs files under FlatBuf with the generated files.

Update the non-generated FlatBuffers .cs files with the files from the google/flatbuffers repo.