forked from apache/arrow
ARROW-9235: [R] Support for `connection` class when reading and writing files

This is a PR to support arbitrary R "connection" objects as Input and Output streams. In particular, this adds support for sockets (ARROW-4512), URLs, and some other IO operations that are implemented as R connections (e.g., in the [archive](https://github.com/r-lib/archive#archive) package). The gist of it is that you should be able to do this:

``` r
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)

addr <- "https://github.com/apache/arrow/raw/master/r/inst/v0.7.1.parquet"

stream <- arrow:::make_readable_file(addr)
rawToChar(as.raw(stream$Read(4)))
#> [1] "PAR1"
stream$close()

stream <- arrow:::make_readable_file(url(addr, open = "rb"))
rawToChar(as.raw(stream$Read(4)))
#> [1] "PAR1"
stream$close()
```

There are two serious issues that prevent this PR from being useful yet.

First, it uses functions that R considers "non-API" functions from the C API.

> checking compiled code ... NOTE
> File ‘arrow/libs/arrow.so’:
> Found non-API calls to R: ‘R_GetConnection’, ‘R_ReadConnection’, ‘R_WriteConnection’
>
> Compiled code should not call non-API entry points in R.

We can get around this by calling back into R (in the same way this PR implements `Tell()` and `Close()`). We could also go all out and implement the other half (exposing `InputStream`/`OutputStream`s as R connections) and ask for an exemption (at least one R package, curl, does this). The archive package seems to expose connections without a NOTE on the CRAN check page, so perhaps there is also a workaround.

Second, we get a crash when passing the input stream to most functions. I think this is because the `Read()` method is getting called from another thread but it also could be an error in my implementation. If the issue is threading, we would have to arrange a way to queue jobs for the R main thread (e.g., how the [later](https://github.com/r-lib/later#background-tasks) package does it) and a way to ping it occasionally to fetch the results. This is complicated but might be useful for other reasons (supporting evaluation of R functions in more places). It also might be more work than it's worth.

``` r
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)

addr <- "https://github.com/apache/arrow/raw/master/r/inst/v0.7.1.parquet"
read_parquet(addr)
```

```
 *** caught segfault ***
address 0x28, cause 'invalid permissions'

Traceback:
 1: parquet___arrow___FileReader__OpenFile(file, props)
```

Closes apache#12323 from paleolimbot/r-connections

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
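For illustration only, the idea of serving `Read()`, `Tell()`, and `Close()` by calling back into R can be sketched in base R alone, without the non-API C entry points. The helper name `make_connection_reader` and its `read`/`tell`/`close` fields are hypothetical and are not part of arrow or of this commit; the sketch tracks the position by hand on the assumption that not every connection supports `seek()`.

``` r
# Hypothetical helper (not part of arrow): wraps an R connection in a list of
# closures that mirror the Read(), Tell(), and Close() stream methods.
make_connection_reader <- function(con) {
  # Connections such as url(addr) may arrive unopened; open them in binary mode.
  if (!isOpen(con)) {
    open(con, "rb")
  }

  # Count the bytes handed out so far instead of relying on seek().
  position <- 0

  list(
    read = function(nbytes) {
      bytes <- readBin(con, what = "raw", n = nbytes)
      position <<- position + length(bytes)
      bytes
    },
    tell = function() position,
    close = function() close(con)
  )
}

# An in-memory stand-in for a Parquet file, which starts with the magic "PAR1".
payload <- c(charToRaw("PAR1"), as.raw(0:9))
reader <- make_connection_reader(rawConnection(payload))

magic <- rawToChar(reader$read(4))
offset <- reader$tell()
reader$close()
```

Because every operation here goes through ordinary R functions (`readBin()`, `close()`), a wrapper of this kind only works when it is invoked on the R main thread, which is the constraint the threading discussion above is concerned with.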
1 parent c16bbe1 commit 6cf344b
Showing 15 changed files with 574 additions and 45 deletions.