Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ARROW-5505: [R] Normalize file and class names, stop masking base R f…
…unctions, add vignette, improve documentation The main thrust of the changes are summarized in the new vignette: > C++ is an object-oriented language, so the core logic of the Arrow library is encapsulated in classes and methods. In the R package, these classes are implemented as `R6` reference classes, most of which are exported from the namespace. > > In order to match the C++ naming conventions, the `R6` classes are in TitleCase, e.g. `RecordBatch`. This makes it easy to look up the relevant C++ implementations in the [code](https://github.com/apache/arrow/tree/master/cpp) or [documentation](https://arrow.apache.org/docs/cpp/). To simplify things in R, the C++ library namespaces are generally dropped or flattened; that is, where the C++ library has `arrow::io::FileOutputStream`, it is just `FileOutputStream` in the R package. One exception is for the file readers, where the namespace is necessary to disambiguate. So `arrow::csv::TableReader` becomes `CsvTableReader`, and `arrow::json::TableReader` becomes `JsonTableReader`. > > Some of these classes are not meant to be instantiated directly; they may be base classes or other kinds of helpers. For those that you should be able to create, use the `$create()` method to instantiate an object. For example, `rb <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))` will create a `RecordBatch`. Many of these factory methods that an R user might most often encounter also have a `snake_case` alias, in order to be more familiar for contemporary R users. So `record_batch(int = 1:10, dbl = as.numeric(1:10))` would do the same as `RecordBatch$create()` above. > > The typical user of the `arrow` R package may never deal directly with the `R6` objects. We provide more R-friendly wrapper functions as a higher-level interface to the C++ library. An R user can call `read_parquet()` without knowing or caring that they're instantiating a `ParquetFileReader` object and calling the `$ReadFile()` method on it. The classes are there and available to the advanced programmer who wants fine-grained control over how the C++ library is used. There are a few other fixes and cleanups rolled in here, named in the individual commit messages below. I stopped short of more documentation consolidation because (1) this patch is already huge and (2) `R6` classes are really tedious to document because it's all manual. I did some searching around and found open issues from 2014 and 2015 about supporting R6 better in roxygen2. Closes apache#5279 from nealrichardson/cleaner-class-names and squashes the following commits: 3c6f85b <Neal Richardson> 🐀 22c9d04 <Neal Richardson> More doc cleaning 01084ce <Neal Richardson> Factor out assert_is() caf3265 <Neal Richardson> PR feedback from romain adf1cf9 <Neal Richardson> File renaming (not case-sensitive) 35f00f5 <Neal Richardson> Rename Table.R to table.R 8bd52d7 <Neal Richardson> Rename Struct.R to struct.R 358290b <Neal Richardson> Rename Schema.R to schema.R 924edd1 <Neal Richardson> Rename List.R to list.R 0150d99 <Neal Richardson> Rename Field.R to field.R 8683f10 <Neal Richardson> Add content to vignette from blog post e6b75f4 <Neal Richardson> Consolidate and document reader/writer classes; also fix ARROW-6449 495abf6 <Neal Richardson> Fill in documentation and standardize file naming 5fd49ef <Neal Richardson> Fix check failures 96873e1 <Neal Richardson> Factor out make_readable_file 3e4cfe7 <Neal Richardson> Clean up parquet classes and document the R6 85a8d36 <Neal Richardson> Start vignette draft explaining the class and naming conventions 71cac57 <Neal Richardson> Clean up Rd file names, experiment with documenting constructors, and start updating pkgdown 2d1b738 <Neal Richardson> Replace table() with Table() b694511 <Neal Richardson> Remove defunct Column class 730313e <Neal Richardson> One more find/replace, esp. RecordBatch* 702a0b1 <Neal Richardson> Message 365fedc <Neal Richardson> feather 0e7877b <Neal Richardson> Drop ::ipc:: 55607a6 <Neal Richardson> json 9bd708f <Neal Richardson> csv fbebf27 <Neal Richardson> io 1711d3e <Neal Richardson> CastOptions 12031ad <Neal Richardson> Backfill some methods 4075897 <Neal Richardson> compression 3b4b492 <Neal Richardson> ChunkedArray bbf0799 <Neal Richardson> Buffer 3f1cd71 <Neal Richardson> Object 9fbecda <Neal Richardson> A few more backticks 8edf085 <Neal Richardson> Remove more backticks 1f6d154 <Neal Richardson> Replace array() with Array() 9f52490 <Neal Richardson> Progress commit renaming Array Authored-by: Neal Richardson <[email protected]> Signed-off-by: Neal Richardson <[email protected]>
- Loading branch information