From b77e8ae6ff7e987d878f90877421afdcd52dec88 Mon Sep 17 00:00:00 2001
From: Neal Richardson
Date: Thu, 10 Sep 2020 14:59:13 -0700
Subject: [PATCH] ARROW-9854: [R] Support reading/writing data to/from S3

- [x] read_parquet/feather/etc. from S3 (use FileSystem->OpenInputFile(path))
- [x] write_$FORMAT via FileSystem->OpenOutputStream(path)
- [x] write_dataset (done? at least via URI)
- [x] ~~for linux, an argument to install_arrow to help, assuming you've installed aws-sdk-cpp already (turn on ARROW_S3, AWSSDK_SOURCE=SYSTEM)~~ Turns out there are no official deb/rpm packages for aws-sdk-cpp, so there's no value in making this part easier; it would be more confusing than helpful.
- [x] set up a real test bucket and user for e2e testing (credentials available on request)
- [x] add a few tests that use S3, if credentials are set (which I'll set locally)
- [x] add vignette showing how to use S3 (via URI)
- [x] update docs, news

Out of the current scope:

- [ ] testing with minio on CI
- [ ] download dataset, i.e. copy files/directory recursively (needs ARROW-9867, ARROW-9868)
- [ ] friendlier methods for interacting with/viewing a filesystem (ls, mkdir, etc.) (ARROW-9870)
- [ ] direct construction of S3FileSystem object with S3Options (i.e. not only URI) (ARROW-9869)

Closes #8058 from nealrichardson/r-s3

Authored-by: Neal Richardson
Signed-off-by: Neal Richardson
---
 r/NEWS.md                   |  9 +++---
 r/R/csv.R                   |  2 +-
 r/R/dataset-factory.R       | 15 ++--------
 r/R/dataset.R               | 13 ++------
 r/R/feather.R               |  6 ++--
 r/R/filesystem.R            | 22 +++++++++++++-
 r/R/io.R                    | 23 +++++++++++++--
 r/R/ipc_stream.R            |  8 ++---
 r/R/parquet.R               |  5 ++--
 r/man/make_readable_file.Rd |  6 +++-
 r/man/read_delim_arrow.Rd   |  2 +-
 r/man/read_feather.Rd       |  4 +--
 r/man/read_ipc_stream.Rd    |  4 +--
 r/man/read_json_arrow.Rd    |  2 +-
 r/man/read_parquet.Rd       |  4 +--
 r/man/write_feather.Rd      |  2 +-
 r/man/write_ipc_stream.Rd   |  2 +-
 r/man/write_parquet.Rd      |  3 +-
 r/tests/testthat/test-s3.R  | 52 ++++++++++++++++++++++++++++++++
 r/vignettes/fs.Rmd          | 59 +++++++++++++++++++++++++++++++++++++
 20 files changed, 191 insertions(+), 52 deletions(-)
 create mode 100644 r/tests/testthat/test-s3.R
 create mode 100644 r/vignettes/fs.Rmd

diff --git a/r/NEWS.md b/r/NEWS.md
index b3c423201fbde..5b2eac82539c5 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -25,6 +25,11 @@
 * Datasets now have `head()`, `tail()`, and take (`[`) methods. `head()` is optimized but the others may not be performant.
 * `collect()` gains an `as_data_frame` argument, default `TRUE` but when `FALSE` allows you to evaluate the accumulated `select` and `filter` query but keep the result in Arrow, not an R `data.frame`
 
+## AWS S3 support
+
+* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R >= 4.0) packages. To enable it on Linux, you will need to build and install `aws-sdk-cpp` from source, then set the environment variable `EXTRA_CMAKE_FLAGS="-DARROW_S3=ON -DAWSSDK_SOURCE=SYSTEM"` prior to building the R package (with bundled C++ build, not with Arrow system libraries) from source.
+* File readers and writers (`read_parquet()`, `write_feather()`, et al.) now accept an `s3://` URI as the source or destination file, as do `open_dataset()` and `write_dataset()`. See `vignette("fs", package = "arrow")` for details.
+
 ## Computation
 
 * Comparison (`==`, `>`, etc.) and boolean (`&`, `|`, `!`) operations, along with `is.na`, `%in%` and `match` (called `match_arrow()`), on Arrow Arrays and ChunkedArrays are now implemented in the C++ library.
@@ -32,10 +37,6 @@
 * `dplyr` filter expressions on Arrow Tables and RecordBatches are now evaluated in the C++ library, rather than by pulling data into R and evaluating. This yields significant performance improvements.
 * `dim()` (`nrow`) for dplyr queries on Table/RecordBatch is now supported
 
-## Packaging
-
-* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R >= 4.0) packages
-
 ## Other improvements
 
 * `arrow` now depends on [`cpp11`](https://cpp11.r-lib.org/), which brings more robust UTF-8 handling and faster compilation

diff --git a/r/R/csv.R b/r/R/csv.R
index e145a907e28c1..62dfad7d52906 100644
--- a/r/R/csv.R
+++ b/r/R/csv.R
@@ -32,7 +32,7 @@
 #' `parse_options`, `convert_options`, or `read_options` arguments, or you can
 #' use [CsvTableReader] directly for lower-level access.
 #'
-#' @param file A character file name, `raw` vector, or an Arrow input stream.
+#' @param file A character file name or URI, `raw` vector, or an Arrow input stream.
 #' If a file name, a memory-mapped Arrow [InputStream] will be opened and
 #' closed when finished; compression will be detected from the file extension
 #' and handled automatically. If an input stream is provided, it will be left
diff --git a/r/R/dataset-factory.R b/r/R/dataset-factory.R
index 767f0b7c02a42..00039faed0fd2 100644
--- a/r/R/dataset-factory.R
+++ b/r/R/dataset-factory.R
@@ -48,17 +48,8 @@ DatasetFactory$create <- function(x,
     stop("'x' must be a string or a list of DatasetFactory", call. = FALSE)
   }
 
-  if (!inherits(filesystem, "FileSystem")) {
-    if (grepl("://", x)) {
-      fs_from_uri <- FileSystem$from_uri(x)
-      filesystem <- fs_from_uri$fs
-      x <- fs_from_uri$path
-    } else {
-      filesystem <- LocalFileSystem$create()
-      x <- clean_path_abs(x)
-    }
-  }
-  selector <- FileSelector$create(x, allow_not_found = FALSE, recursive = TRUE)
+  path_and_fs <- get_path_and_filesystem(x, filesystem)
+  selector <- FileSelector$create(path_and_fs$path, allow_not_found = FALSE, recursive = TRUE)
 
   if (is.character(format)) {
     format <- FileFormat$create(match.arg(format), ...)
@@ -74,7 +65,7 @@ DatasetFactory$create <- function(x,
       partitioning <- DirectoryPartitioningFactory$create(partitioning)
     }
   }
-  FileSystemDatasetFactory$create(filesystem, selector, format, partitioning)
+  FileSystemDatasetFactory$create(path_and_fs$fs, selector, format, partitioning)
 }
 
 #' Create a DatasetFactory
diff --git a/r/R/dataset.R b/r/R/dataset.R
index ec86dc56f083a..7661c33292e8c 100644
--- a/r/R/dataset.R
+++ b/r/R/dataset.R
@@ -164,17 +164,8 @@ Dataset <- R6Class("Dataset", inherit = ArrowObject,
     NewScan = function() unique_ptr(ScannerBuilder, dataset___Dataset__NewScan(self)),
     ToString = function() self$schema$ToString(),
     write = function(path, filesystem = NULL, schema = self$schema, format, partitioning, ...) {
-      if (!inherits(filesystem, "FileSystem")) {
-        if (grepl("://", path)) {
-          fs_from_uri <- FileSystem$from_uri(path)
-          filesystem <- fs_from_uri$fs
-          path <- fs_from_uri$path
-        } else {
-          filesystem <- LocalFileSystem$create()
-          path <- clean_path_abs(path)
-        }
-      }
-      dataset___Dataset__Write(self, schema, format, filesystem, path, partitioning)
+      path_and_fs <- get_path_and_filesystem(path, filesystem)
+      dataset___Dataset__Write(self, schema, format, path_and_fs$fs, path_and_fs$path, partitioning)
       invisible(self)
     }
   ),
diff --git a/r/R/feather.R b/r/R/feather.R
index 9b8dc8c512100..7026de4dbabfc 100644
--- a/r/R/feather.R
+++ b/r/R/feather.R
@@ -24,7 +24,7 @@
 #' and the version 2 specification, which is the Apache Arrow IPC file format.
 #'
 #' @param x `data.frame`, [RecordBatch], or [Table]
-#' @param sink A string file path or [OutputStream]
+#' @param sink A string file path, URI, or [OutputStream]
 #' @param version integer Feather file version. Version 2 is the current.
 #' Version 1 is the more limited legacy format.
 #' @param chunk_size For V2 files, the number of rows that each chunk of data
@@ -106,7 +106,7 @@ write_feather <- function(x,
   assert_is(x, "Table")
 
   if (is.string(sink)) {
-    sink <- FileOutputStream$create(sink)
+    sink <- make_output_stream(sink)
     on.exit(sink$close())
   }
   assert_is(sink, "OutputStream")
@@ -142,7 +142,7 @@ write_feather <- function(x,
 #' df <- read_feather(tf, col_select = starts_with("d"))
 #' }
 read_feather <- function(file, col_select = NULL, as_data_frame = TRUE, ...) {
-  if (!inherits(file, "InputStream")) {
+  if (!inherits(file, "RandomAccessFile")) {
     file <- make_readable_file(file)
     on.exit(file$close())
   }
diff --git a/r/R/filesystem.R b/r/R/filesystem.R
index f0e123ac4cd0d..4cde03eb6b42a 100644
--- a/r/R/filesystem.R
+++ b/r/R/filesystem.R
@@ -228,7 +228,7 @@ FileSystem <- R6Class("FileSystem", inherit = ArrowObject,
       shared_ptr(InputStream, fs___FileSystem__OpenInputStream(self, clean_path_rel(path)))
     },
     OpenInputFile = function(path) {
-      shared_ptr(InputStream, fs___FileSystem__OpenInputFile(self, clean_path_rel(path)))
+      shared_ptr(RandomAccessFile, fs___FileSystem__OpenInputFile(self, clean_path_rel(path)))
     },
     OpenOutputStream = function(path) {
       shared_ptr(OutputStream, fs___FileSystem__OpenOutputStream(self, clean_path_rel(path)))
@@ -242,11 +242,31 @@ FileSystem <- R6Class("FileSystem", inherit = ArrowObject,
   )
 )
 FileSystem$from_uri <- function(uri) {
+  assert_that(is.string(uri))
   out <- fs___FileSystemFromUri(uri)
   out$fs <- shared_ptr(FileSystem, out$fs)$..dispatch()
   out
 }
 
+get_path_and_filesystem <- function(x, filesystem = NULL) {
+  # Wrapper around FileSystem$from_uri that handles local paths
+  # and an optional explicit filesystem
+  assert_that(is.string(x))
+  if (is_url(x)) {
+    if (!is.null(filesystem)) {
+      # Stop? Can't have URL (which yields a fs) and another fs
+    }
+    FileSystem$from_uri(x)
+  } else {
+    list(
+      fs = filesystem %||% LocalFileSystem$create(),
+      path = clean_path_abs(x)
+    )
+  }
+}
+
+is_url <- function(x) grepl("://", x)
+
 #' @usage NULL
 #' @format NULL
 #' @rdname FileSystem
diff --git a/r/R/io.R b/r/R/io.R
index c14c5ce1abcd6..3b607a4e2b74f 100644
--- a/r/R/io.R
+++ b/r/R/io.R
@@ -224,15 +224,25 @@ mmap_open <- function(path, mode = c("read", "write", "readwrite")) {
 #' with this compression codec, either a [Codec] or the string name of one.
 #' If `NULL` (default) and `file` is a string file name, the function will try
 #' to infer compression from the file extension.
+#' @param filesystem If not `NULL`, `file` will be opened via the
+#' `filesystem$OpenInputFile()` filesystem method, rather than the `io` module's
+#' `MemoryMappedFile` or `ReadableFile` constructors.
 #' @return An `InputStream` or a subclass of one.
 #' @keywords internal
-make_readable_file <- function(file, mmap = TRUE, compression = NULL) {
+make_readable_file <- function(file, mmap = TRUE, compression = NULL, filesystem = NULL) {
   if (is.string(file)) {
+    if (is_url(file)) {
+      fs_and_path <- FileSystem$from_uri(file)
+      filesystem <- fs_and_path$fs
+      file <- fs_and_path$path
+    }
     if (is.null(compression)) {
       # Infer compression from the file path
       compression <- detect_compression(file)
     }
-    if (isTRUE(mmap)) {
+    if (!is.null(filesystem)) {
+      file <- filesystem$OpenInputFile(file)
+    } else if (isTRUE(mmap)) {
       file <- mmap_open(file)
     } else {
       file <- ReadableFile$create(file)
@@ -247,6 +257,15 @@ make_readable_file <- function(file, mmap = TRUE, compression = NULL) {
   file
 }
 
+make_output_stream <- function(x) {
+  if (is_url(x)) {
+    fs_and_path <- FileSystem$from_uri(x)
+    fs_and_path$fs$OpenOutputStream(fs_and_path$path)
+  } else {
+    FileOutputStream$create(x)
+  }
+}
+
 detect_compression <- function(path) {
   assert_that(is.string(path))
   switch(tools::file_ext(path),
diff --git a/r/R/ipc_stream.R b/r/R/ipc_stream.R
index 0c728b26b5341..618ace52f49e2 100644
--- a/r/R/ipc_stream.R
+++ b/r/R/ipc_stream.R
@@ -41,7 +41,7 @@ write_ipc_stream <- function(x, sink, ...) {
     x <- Table$create(x)
   }
   if (is.string(sink)) {
-    sink <- FileOutputStream$create(sink)
+    sink <- make_output_stream(sink)
     on.exit(sink$close())
   }
   assert_is(sink, "OutputStream")
@@ -82,10 +82,10 @@ write_to_raw <- function(x, format = c("stream", "file")) {
 #' `read_arrow()`, a wrapper around `read_ipc_stream()` and `read_feather()`,
 #' is deprecated. You should explicitly choose
 #' the function that will read the desired IPC format (stream or file) since
-#' a file or `InputStream` may contain either. 
+#' a file or `InputStream` may contain either.
 #'
-#' @param file A character file name, `raw` vector, or an Arrow input stream.
-#' If a file name, a memory-mapped Arrow [InputStream] will be opened and
+#' @param file A character file name or URI, `raw` vector, or an Arrow input stream.
+#' If a file name or URI, an Arrow [InputStream] will be opened and
 #' closed when finished. If an input stream is provided, it will be left
 #' open.
 #' @param as_data_frame Should the function return a `data.frame` (default) or
diff --git a/r/R/parquet.R b/r/R/parquet.R
index caf93f5284b92..0b6357316eed7 100644
--- a/r/R/parquet.R
+++ b/r/R/parquet.R
@@ -59,7 +59,8 @@ read_parquet <- function(file,
 #' This function enables you to write Parquet files from R.
 #'
 #' @param x An [arrow::Table][Table], or an object convertible to it.
-#' @param sink an [arrow::io::OutputStream][OutputStream] or a string which is interpreted as a file path
+#' @param sink an [arrow::io::OutputStream][OutputStream] or a string
+#' interpreted as a file path or URI
 #' @param chunk_size chunk size in number of rows. If NULL, the total number of rows is used.
 #' @param version parquet version, "1.0" or "2.0". Default "1.0". Numeric values
 #' are coerced to character.
@@ -129,7 +130,7 @@ write_parquet <- function(x,
   }
 
   if (is.string(sink)) {
-    sink <- FileOutputStream$create(sink)
+    sink <- make_output_stream(sink)
     on.exit(sink$close())
   } else if (!inherits(sink, "OutputStream")) {
     abort("sink must be a file path or an OutputStream")
diff --git a/r/man/make_readable_file.Rd b/r/man/make_readable_file.Rd
index 11d302c0b04d1..fe2e29826120d 100644
--- a/r/man/make_readable_file.Rd
+++ b/r/man/make_readable_file.Rd
@@ -4,7 +4,7 @@
 \alias{make_readable_file}
 \title{Handle a range of possible input sources}
 \usage{
-make_readable_file(file, mmap = TRUE, compression = NULL)
+make_readable_file(file, mmap = TRUE, compression = NULL, filesystem = NULL)
 }
 \arguments{
 \item{file}{A character file name, \code{raw} vector, or an Arrow input stream}
@@ -15,6 +15,10 @@ make_readable_file(file, mmap = TRUE, compression = NULL)
 with this compression codec, either a \link{Codec} or the string name of one.
 If \code{NULL} (default) and \code{file} is a string file name, the function will try
 to infer compression from the file extension.}
+
+\item{filesystem}{If not \code{NULL}, \code{file} will be opened via the
+\code{filesystem$OpenInputFile()} filesystem method, rather than the \code{io} module's
+\code{MemoryMappedFile} or \code{ReadableFile} constructors.}
 }
 \value{
 An \code{InputStream} or a subclass of one.
diff --git a/r/man/read_delim_arrow.Rd b/r/man/read_delim_arrow.Rd
index 124abdcb91281..abc2d4b058199 100644
--- a/r/man/read_delim_arrow.Rd
+++ b/r/man/read_delim_arrow.Rd
@@ -59,7 +59,7 @@ read_tsv_arrow(
 )
 }
 \arguments{
-\item{file}{A character file name, \code{raw} vector, or an Arrow input stream.
+\item{file}{A character file name or URI, \code{raw} vector, or an Arrow input stream.
 If a file name, a memory-mapped Arrow \link{InputStream} will be opened and
 closed when finished; compression will be detected from the file extension
 and handled automatically. If an input stream is provided, it will be left
diff --git a/r/man/read_feather.Rd b/r/man/read_feather.Rd
index f507edb456ed9..b84d07f61768b 100644
--- a/r/man/read_feather.Rd
+++ b/r/man/read_feather.Rd
@@ -7,8 +7,8 @@
 read_feather(file, col_select = NULL, as_data_frame = TRUE, ...)
 }
 \arguments{
-\item{file}{A character file name, \code{raw} vector, or an Arrow input stream.
-If a file name, a memory-mapped Arrow \link{InputStream} will be opened and
+\item{file}{A character file name or URI, \code{raw} vector, or an Arrow input stream.
+If a file name or URI, an Arrow \link{InputStream} will be opened and
 closed when finished. If an input stream is provided, it will be left
 open.}
 
diff --git a/r/man/read_ipc_stream.Rd b/r/man/read_ipc_stream.Rd
index 1cc969b922e80..01b64350a8c71 100644
--- a/r/man/read_ipc_stream.Rd
+++ b/r/man/read_ipc_stream.Rd
@@ -10,8 +10,8 @@ read_arrow(file, ...)
 read_ipc_stream(file, as_data_frame = TRUE, ...)
 }
 \arguments{
-\item{file}{A character file name, \code{raw} vector, or an Arrow input stream.
-If a file name, a memory-mapped Arrow \link{InputStream} will be opened and
+\item{file}{A character file name or URI, \code{raw} vector, or an Arrow input stream.
+If a file name or URI, an Arrow \link{InputStream} will be opened and
 closed when finished. If an input stream is provided, it will be left
 open.}
 
diff --git a/r/man/read_json_arrow.Rd b/r/man/read_json_arrow.Rd
index 37fff64daf097..8501b19c392d1 100644
--- a/r/man/read_json_arrow.Rd
+++ b/r/man/read_json_arrow.Rd
@@ -7,7 +7,7 @@
 read_json_arrow(file, col_select = NULL, as_data_frame = TRUE, ...)
 }
 \arguments{
-\item{file}{A character file name, \code{raw} vector, or an Arrow input stream.
+\item{file}{A character file name or URI, \code{raw} vector, or an Arrow input stream.
 If a file name, a memory-mapped Arrow \link{InputStream} will be opened and
 closed when finished; compression will be detected from the file extension
 and handled automatically. If an input stream is provided, it will be left
diff --git a/r/man/read_parquet.Rd b/r/man/read_parquet.Rd
index 6bd7335c40d2e..f4a3897643c2d 100644
--- a/r/man/read_parquet.Rd
+++ b/r/man/read_parquet.Rd
@@ -13,8 +13,8 @@ read_parquet(
 )
 }
 \arguments{
-\item{file}{A character file name, \code{raw} vector, or an Arrow input stream.
-If a file name, a memory-mapped Arrow \link{InputStream} will be opened and
+\item{file}{A character file name or URI, \code{raw} vector, or an Arrow input stream.
+If a file name or URI, an Arrow \link{InputStream} will be opened and
 closed when finished. If an input stream is provided, it will be left
 open.}
 
diff --git a/r/man/write_feather.Rd b/r/man/write_feather.Rd
index e9639480a5d02..e079aeb893434 100644
--- a/r/man/write_feather.Rd
+++ b/r/man/write_feather.Rd
@@ -16,7 +16,7 @@ write_feather(
 \arguments{
 \item{x}{\code{data.frame}, \link{RecordBatch}, or \link{Table}}
 
-\item{sink}{A string file path or \link{OutputStream}}
+\item{sink}{A string file path, URI, or \link{OutputStream}}
 
 \item{version}{integer Feather file version. Version 2 is the current.
 Version 1 is the more limited legacy format.}
diff --git a/r/man/write_ipc_stream.Rd b/r/man/write_ipc_stream.Rd
index 2bf4fdd2430a5..8274eddb3b1e1 100644
--- a/r/man/write_ipc_stream.Rd
+++ b/r/man/write_ipc_stream.Rd
@@ -12,7 +12,7 @@ write_ipc_stream(x, sink, ...)
 \arguments{
 \item{x}{\code{data.frame}, \link{RecordBatch}, or \link{Table}}
 
-\item{sink}{A string file path or \link{OutputStream}}
+\item{sink}{A string file path, URI, or \link{OutputStream}}
 
 \item{...}{extra parameters passed to \code{write_feather()}.}
 }
diff --git a/r/man/write_parquet.Rd b/r/man/write_parquet.Rd
index 0253a2fb5a36e..f532ce06c4c5d 100644
--- a/r/man/write_parquet.Rd
+++ b/r/man/write_parquet.Rd
@@ -22,7 +22,8 @@ write_parquet(
 \arguments{
 \item{x}{An \link[=Table]{arrow::Table}, or an object convertible to it.}
 
-\item{sink}{an \link[=OutputStream]{arrow::io::OutputStream} or a string which is interpreted as a file path}
+\item{sink}{an \link[=OutputStream]{arrow::io::OutputStream} or a string
+interpreted as a file path or URI}
 
 \item{chunk_size}{chunk size in number of rows. If NULL, the total number of rows is used.}
 
diff --git a/r/tests/testthat/test-s3.R b/r/tests/testthat/test-s3.R
new file mode 100644
index 0000000000000..9dfadfdfb5878
--- /dev/null
+++ b/r/tests/testthat/test-s3.R
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+context("S3 integration tests")
+
+run_these <- tryCatch({
+  if (arrow_with_s3() &&
+      identical(tolower(Sys.getenv("ARROW_R_DEV")), "true") &&
+      !identical(Sys.getenv("AWS_ACCESS_KEY_ID"), "") &&
+      !identical(Sys.getenv("AWS_SECRET_ACCESS_KEY"), "")) {
+    # See if we have access to the test bucket
+    bucket <- FileSystem$from_uri("s3://ursa-labs-r-test?region=us-west-2")
+    bucket$fs$GetFileInfo(bucket$path)
+    TRUE
+  } else {
+    FALSE
+  }
+}, error = function(e) FALSE)
+
+bucket_uri <- function(..., bucket = "s3://ursa-labs-r-test/%s?region=us-west-2") {
+  segments <- paste(..., sep = "/")
+  sprintf(bucket, segments)
+}
+
+if (run_these) {
+  now <- as.numeric(Sys.time())
+  on.exit(bucket$fs$DeleteDir(paste0("ursa-labs-r-test/", now)))
+
+  test_that("read/write Feather on S3", {
+    write_feather(example_data, bucket_uri(now, "test.feather"))
+    expect_identical(read_feather(bucket_uri(now, "test.feather")), example_data)
+  })
+
+  test_that("read/write Parquet on S3", {
+    write_parquet(example_data, bucket_uri(now, "test.parquet"))
+    expect_identical(read_parquet(bucket_uri(now, "test.parquet")), example_data)
+  })
+}
diff --git a/r/vignettes/fs.Rmd b/r/vignettes/fs.Rmd
new file mode 100644
index 0000000000000..03730bc1269a8
--- /dev/null
+++ b/r/vignettes/fs.Rmd
@@ -0,0 +1,59 @@
+---
+title: "Working with Cloud Storage (S3)"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Working with Cloud Storage (S3)}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+The Arrow C++ library includes a generic filesystem interface and specific
+implementations for some cloud storage systems. This setup allows various
+parts of the project to read and write data with different storage
+backends. In the `arrow` R package, support has been enabled for AWS S3 on
+macOS and Windows. This vignette provides an overview of working with S3 data
+using Arrow.
+
+> Note that S3 support is not enabled by default on Linux due to packaging complications. To enable it, you will need to build and install [aws-sdk-cpp](https://aws.amazon.com/sdk-for-cpp/) from source, then set the environment variable `EXTRA_CMAKE_FLAGS="-DARROW_S3=ON -DAWSSDK_SOURCE=SYSTEM"` prior to building the R package (with bundled C++ build, not with Arrow system libraries) from source.
+
+## URIs
+
+File readers and writers (`read_parquet()`, `write_feather()`, et al.)
+now accept an S3 URI as the source or destination file,
+as do `open_dataset()` and `write_dataset()`.
+An S3 URI looks like:
+
+```
+s3://[id:secret@]bucket/path[?region=]
+```
+
+For example, one of the NYC taxi data files used in `vignette("dataset", package = "arrow")` is found at
+
+```
+s3://ursa-labs-taxi-data/2019/06/data.parquet?region=us-east-2
+```
+
+`region` defaults to `us-east-1` and can be omitted if the bucket is in that region.
+
+Given this URI, we can pass it to `read_parquet()` just as if it were a local file path:
+
+```r
+df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet?region=us-east-2")
+```
+
+Note that this will be slower to read than if the file were local,
+though if you're running on a machine in the same AWS region as the file in S3,
+the cost of reading the data over the network should be much lower.
+
+## Authentication
+
+To access private S3 buckets, you need two secret parameters:
+an `AWS_ACCESS_KEY_ID`, which is like a user ID,
+and an `AWS_SECRET_ACCESS_KEY`, which is like a token.
+There are a few options for passing these credentials:
+
+1. Include them in the URI, like `s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/".
+
+2. Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
+
+3. Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
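+
+As a minimal sketch of option 2 (the code below is illustrative and is not evaluated
+when building this vignette): you can set the environment variables for the current
+R session with `Sys.setenv()` before reading or writing. The key values and the
+`my-private-bucket` URI are placeholders, not real credentials or a real bucket.
+
+```r
+# Placeholder credentials: substitute your own values before running
+Sys.setenv(
+  AWS_ACCESS_KEY_ID = "my-access-key-id",
+  AWS_SECRET_ACCESS_KEY = "my-secret-access-key"
+)
+
+# With the environment variables set, an s3:// URI needs no inline secrets
+df <- read_parquet("s3://my-private-bucket/path/to/data.parquet?region=us-east-2")
+```
+
+Compared to option 1, this keeps secrets out of URIs that you might print, log, or share.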