- Explore a multi-file dataset with `open_dataset()` and then use `dplyr` methods to `select()`, `filter()`, etc., and work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. `dplyr` methods are conditionally loaded if you have `dplyr` available; it is not a hard dependency.
- Tables and RecordBatches also have `dplyr` methods.
- For exploration without `dplyr`, `[` methods for Tables, RecordBatches, Arrays, and ChunkedArrays now support natural row extraction operations. These use the C++ `Filter`, `Slice`, and `Take` methods for efficient access, depending on the type of selection vector.
- An experimental, lazily evaluated `array_expression` class has also been added, enabling among other things the ability to filter a Table with some function of Arrays, such as `arrow_table[arrow_table$var1 > 5, ]` without having to pull everything into R first.
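The dataset and row-extraction features above can be sketched as follows; the directory name and column names are hypothetical stand-ins for your own multi-file dataset, and `dplyr` is assumed to be installed:

```r
library(arrow)
library(dplyr)

# Open a directory of files as a single dataset (hypothetical path)
ds <- open_dataset("nyc-taxi/")

# select()/filter() run in Arrow where possible; collect() pulls into R
result <- ds %>%
  select(passenger_count, fare_amount) %>%
  filter(passenger_count > 1) %>%
  collect()

# Row extraction without dplyr, using the `[` methods and an
# array_expression built from a comparison on a column:
tab <- Table$create(data.frame(var1 = 1:10))
tab[tab$var1 > 5, ]
```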
- `write_parquet()` now supports compression
- `codec_is_available()` returns `TRUE` or `FALSE` depending on whether the Arrow C++ library was built with support for a given compression library (e.g. gzip, lz4, snappy)
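A minimal sketch of checking codec availability before writing compressed Parquet; the `compression` argument name is an assumption about how `write_parquet()` exposes the option:

```r
library(arrow)

# Returns TRUE or FALSE depending on how the C++ library was built
codec_is_available("gzip")
codec_is_available("snappy")

# Only attempt compressed output if the codec is actually available
if (codec_is_available("snappy")) {
  write_parquet(data.frame(x = 1:5), "example.parquet", compression = "snappy")
}
```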
- This patch release includes bugfixes in the C++ library around dictionary types and Parquet reading.
- The R6 classes that wrap the C++ classes are now documented and exported and have been renamed to be more R-friendly. Users of the high-level R interface in this package are not affected. Those who want to interact with the Arrow C++ API more directly should work with these objects and methods. As part of this change, many functions that instantiated these R6 objects have been removed in favor of `Class$create()` methods. Notably, `arrow::array()` and `arrow::table()` have been removed in favor of `Array$create()` and `Table$create()`, eliminating the package startup message about masking `base` functions. For more information, see the new `vignette("arrow")`.
- Due to a subtle change in the Arrow message format, data written by the 0.15 version libraries may not be readable by older versions. If you need to send data to a process that uses an older version of Arrow (for example, an Apache Spark server that hasn't yet updated to Arrow 0.15), you can set the environment variable `ARROW_PRE_0_15_IPC_FORMAT=1`.
- The `as_tibble` argument in the `read_*()` functions has been renamed to `as_data_frame` (ARROW-6337, @jameslamb)
- The `arrow::Column` class has been removed, as it was removed from the C++ library
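The `Class$create()` convention and the IPC compatibility switch described above can be sketched as:

```r
library(arrow)

# Old constructor functions are gone; use the R6 classes' create() methods
a <- Array$create(c(1, 2, 3))
t <- Table$create(data.frame(x = 1:3, y = letters[1:3]))

# Opt into the pre-0.15 IPC format before writing, for consumers on
# older Arrow versions (e.g. a Spark server not yet on Arrow 0.15)
Sys.setenv(ARROW_PRE_0_15_IPC_FORMAT = 1)
```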
- `Table` and `RecordBatch` objects have S3 methods that enable you to work with them more like `data.frame`s. Extract columns, subset, and so on. See `?Table` and `?RecordBatch` for examples.
- Initial implementation of bindings for the C++ File System API. (ARROW-6348)
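A brief sketch of the `data.frame`-like S3 methods on a `Table`; the exact set of methods is documented in `?Table`:

```r
library(arrow)

tab <- Table$create(data.frame(x = 1:5, y = letters[1:5]))

dim(tab)            # number of rows and columns
names(tab)          # column names
tab$x               # extract a column (a ChunkedArray)
tab[1:2, "y"]       # subset rows and columns
as.data.frame(tab)  # convert back to a plain data.frame
```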
- Compressed streams are now supported on Windows (ARROW-6360), and you can also specify a compression level (ARROW-6533)
- Parquet file reading is much, much faster, thanks to improvements in the Arrow C++ library.
- `read_csv_arrow()` supports more parsing options, including `col_names`, `na`, `quoted_na`, and `skip`
- `read_parquet()` and `read_feather()` can ingest data from a `raw` vector (ARROW-6278)
- File readers now properly handle paths that need expanding, such as `~/file.parquet` (ARROW-6323)
- Improved support for creating types in a schema: the types' printed names (e.g. "double") are guaranteed to be valid to use in instantiating a schema (e.g. `double()`), and time types can be created with human-friendly resolution strings ("ms", "s", etc.). (ARROW-6338, ARROW-6364)
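A sketch of the parsing options and schema-creation helpers described above; the file name, column names, and `na` strings are hypothetical:

```r
library(arrow)

# The option names mirror readr's conventions
df <- read_csv_arrow(
  "data.csv",
  col_names = c("id", "value"),  # supply names instead of reading a header
  skip = 1,                      # skip the original header row
  na = c("", "NA", "-999"),      # strings to read as missing
  quoted_na = TRUE               # treat quoted missing strings as missing
)

# Type names match their printed forms, and time types accept
# human-friendly resolution strings such as "ms"
sch <- schema(id = int32(), value = double(), ts = timestamp("ms"))
```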
Initial CRAN release of the `arrow` package. Key features include:
- Read and write support for various file formats, including Parquet, Feather/Arrow, CSV, and JSON.
- API bindings to the C++ library for Arrow data types and objects, as well as mapping between Arrow types and R data types.
- Tools for helping with C++ library configuration and installation.