
Apache Arrow Ruby

Here are the official Ruby bindings for Apache Arrow.

Red Arrow provides the base Apache Arrow bindings.

Red Arrow CUDA provides bindings for the CUDA part of Apache Arrow.

Red Arrow Dataset provides the Apache Arrow Dataset bindings.

Red Gandiva provides the Gandiva bindings.

Red Parquet provides the Parquet bindings.

Cookbook

Getting Started

gem install red-arrow
gem install red-parquet # for parquet support
gem install red-arrow-dataset # for reading from S3 and from folders

Create table

From file

require 'arrow'
require 'parquet'

table = Arrow::Table.load('data.arrow')
table = Arrow::Table.load('data.csv', format: :csv)
table = Arrow::Table.load('data.parquet', format: :parquet)

From Ruby hash

Column types are detected automatically from the values.

table = Arrow::Table.new('name' => ['Tom', 'Max'], 'age' => [22, 23])

From String

Suppose your data is available over HTTP. This example queries the ClickHouse demo database and asks for the result in Arrow format. See https://play.clickhouse.com/ for details.

require 'net/http'

params = {
  query: "SELECT WatchID as watch FROM hits LIMIT 10 FORMAT Arrow",
  user: "play",
  password: "",
  database: "default"
}
uri = URI('https://play.clickhouse.com:443/')
uri.query = URI.encode_www_form(params)
resp = Net::HTTP.get(uri)
table = Arrow::Table.load(Arrow::Buffer.new(resp))
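Net::HTTP.get returns the response body even when the server answers with an error status, so an error page could end up being handed to Arrow. The sketch below separates building the request URI from fetching it and checks the status first. Both helper names (clickhouse_uri, fetch_arrow_bytes) are hypothetical and not part of Red Arrow or ClickHouse; only the URI is built when the script runs, so no network access is needed to try it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the request URI for a query against the
# ClickHouse demo server, using the same parameters as above.
def clickhouse_uri(sql, base: 'https://play.clickhouse.com:443/')
  uri = URI(base)
  uri.query = URI.encode_www_form(
    query: sql,
    user: 'play',
    password: '',
    database: 'default'
  )
  uri
end

# Hypothetical helper: returns the raw response body and raises on any
# non-2xx status instead of returning the error page.
def fetch_arrow_bytes(uri)
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request failed with status #{response.code}"
  end
  response.body
end

uri = clickhouse_uri('SELECT WatchID as watch FROM hits LIMIT 10 FORMAT Arrow')
puts uri

# With the red-arrow gem loaded, the bytes can then be parsed as before:
#   table = Arrow::Table.load(Arrow::Buffer.new(fetch_arrow_bytes(uri)))
```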

From S3

require 'arrow-dataset'

s3_uri = URI('s3://bucket/public.csv')
Arrow::Table.load(s3_uri)

For private access, pass access_key and secret_key in the user-info part of the URI, escaped with CGI.escape:

require 'cgi/util'

s3_uri = URI("s3://#{CGI.escape(access_key)}:#{CGI.escape(secret_key)}@bucket/private.parquet")
Arrow::Table.load(s3_uri)
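The escaping matters because secret keys can contain characters such as "/" and "+", and an unescaped "/" would end the authority part of the URI early. The following sketch uses only the Ruby standard library to show that escaped credentials survive a round trip through URI. The helper name and the credentials are hypothetical placeholders.

```ruby
require 'cgi/util'
require 'uri'

# Hypothetical helper (not part of Red Arrow): builds an S3 URI with
# percent-encoded credentials in the user-info part.
def s3_uri_with_credentials(access_key, secret_key, bucket, key)
  URI("s3://#{CGI.escape(access_key)}:#{CGI.escape(secret_key)}@#{bucket}/#{key}")
end

# Placeholder credentials; the secret deliberately contains "/" and "+".
access_key = 'AKIAEXAMPLE'
secret_key = 'abc/def+ghi'

s3_uri = s3_uri_with_credentials(access_key, secret_key, 'bucket', 'private.parquet')

puts s3_uri.host
puts s3_uri.path
# The original secret is recovered by unescaping the password part.
puts CGI.unescape(s3_uri.password) == secret_key
```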

From multiple files in a folder

require 'arrow-dataset'

Arrow::Table.load(URI("file:///your/folder/"), format: :parquet)

Filtering

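Filtering uses Red Arrow's slicers: the block passed to Arrow::Table#slice receives a slicer, and conditions on its columns select the rows to keep.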

table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'age' => [22, 23, 19]
)
table.slice { |slicer| slicer['age'] > 19 }
# => #<Arrow::Table:0x7fa38838c448 ptr=0x7fa3ad269f40>
#   name	age
# 0	Tom 	 22
# 1	Max 	 23

table.slice { |slicer| slicer['age'].in?(19..22) }
# => #<Arrow::Table:0x7fa3881cf998 ptr=0x7fa3a4bb5f30>
#   name	age
# 0	Tom 	 22
# 1	Kate	 19

Multiple slice conditions can be combined with the logical operators and (&), or (|), and xor (^):

table.slice { |slicer| (slicer['age'] > 19) & (slicer['age'] < 23) }
# => #<Arrow::Table:0x7fa3882cc300 ptr=0x7fa3ad260b00>
#   name	age
# 0	Tom 	 22

Operations

Arrow compute functions can be accessed through Arrow::Function. The example below reuses the table from the Filtering section:

add = Arrow::Function.find('add')
add.execute([table['age'].data, table['age'].data]).value
# => #<Arrow::ChunkedArray:0x7fa389b87250 ptr=0x7fa3a4bb5c40 [
#   [
#     44,
#     46,
#     38
#   ]
# ]>

Grouping

table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate', 'Tom'],
  'amount' => [10, 2, 3, 5]
)
table.group('name').sum('amount')
# => #<Arrow::Table:0x7fa389894ae8 ptr=0x7fa364141a50>
#   name	amount
# 0	Kate	     3
# 1	Max 	     2
# 2	Tom 	    15

Joining

amounts = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'amount' => [10, 2, 3]
)
levels = Arrow::Table.new(
  'name' => ['Max', 'Kate', 'Tom'],
  'level' => [1, 9, 5]
)
amounts.join(levels, [:name])
# => #<Arrow::Table:0x55d512ceb1b0 ptr=0x55d51262aa70>
# 	name	amount	name	level
# 0	Tom 	    10	Tom 	    5
# 1	Max 	     2	Max 	    1
# 2	Kate	     3	Kate	    9
