Skip to content

Commit

Permalink
ARROW-7146: [R][CI] Various fixes and speedups for the R docker-compo…
Browse files Browse the repository at this point in the history
…se setup

There's a lot of little things in here that have the effect of cutting 9 minutes off the build time and fixing a few bugs.

* Move R package dependency installation into the Dockerfile for better caching
* Sets MAKEFLAGS to the actual number of cores, not hard-coded to 8 (if this works out well I'll do the same to fix https://issues.apache.org/jira/browse/ARROW-6957)
* Removes hard-coded references to `bionic` so that R installation will work appropriately on other Ubuntu versions; also makes note of a dependency between OS version and availability of R versions for those who might try to run an older R version.
* Installs a lighter version of Romain's `decor` package, which cuts out dozens of unnecessary dependencies. I may remove this altogether, or at least flag it to be off on CI, because we don't need it there, it's just for developers.
* Skips vignette checks, which thereby avoid compiling and installing the R package an extra time (the check is pointless for us because there's no R code in the vignette)
* Removes an unnecessary `R CMD INSTALL`, which compiled the R bindings again (i.e., previously we compiled the R bindings 3 times, now we're down to 1)
* Turns off a check for valid URLs, saving about 30s
* Turns on more verbose error printing when there's a failure, rather than the default truncation at 13 lines.
* Uses the `rcmdcheck` package so we can fail the build on check Warnings (which cause CRAN failures). This was default on Travis.
* Adds some missing UTF-8 configuration to the dockerfile to fix some check Warnings
* Fixes some bad documentation crossreferences from apache#5859 that were causing other Warnings
* Correctly passes `-Werror` into the docker environment, which apache#5940 tried to add
* Restore the `TEST_R_WITH_ARROW` env var (lost in the move from Travis-CI) to make sure we're not just skipping all the tests

Closes apache#5967 from nealrichardson/tune-r-docker and squashes the following commits:

2d691aa <Neal Richardson> Feedback from @kou; also add env vars to appveyor job
64994c9 <Neal Richardson> Make sure we're testing with libarrow
9c10ca9 <Neal Richardson> Fix missing doc crossreferences from apache#5859
d0b07d7 <Neal Richardson> Move -Werror setting so that it works (and suppress a NOTE about it)
4e9bbe8 <Neal Richardson> Don't hard-code bionic
4809190 <Neal Richardson> Fix UTF-8 issues in R docker container
798522d <Neal Richardson> Oops.
d068c4f <Neal Richardson> Restore flight off flag
2691976 <Neal Richardson> Move R dependency installation to the dockerfile. Use rcmdcheck package so we can fail the build on warnings

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
  • Loading branch information
nealrichardson authored and kou committed Dec 6, 2019
1 parent 27a292f commit e76e1f6
Show file tree
Hide file tree
Showing 16 changed files with 79 additions and 70 deletions.
1 change: 1 addition & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,4 @@
!python/requirements*.txt
!python/manylinux1/**
!python/manylinux2010/**
!r/DESCRIPTION
1 change: 0 additions & 1 deletion .github/workflows/r.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@ jobs:
env:
R: ${{ matrix.r }}
UBUNTU: ${{ matrix.ubuntu }}
ARROW_R_CXXFLAGS: '-Werror'
steps:
- name: Checkout Arrow
uses: actions/checkout@v1
Expand Down
2 changes: 2 additions & 0 deletions appveyor.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,8 @@ environment:
TEST_R_WITH_ARROW: "TRUE"
RWINLIB_LOCAL: '%APPVEYOR_BUILD_FOLDER%\libarrow.zip'
ARROW_R_CXXFLAGS: '-Werror'
_R_CHECK_TESTS_NLINES_: 0
WARNINGS_ARE_ERRORS: 1

MSVC_DEFAULT_OPTIONS: ON
APPVEYOR_SAVE_CACHE_ON_ERROR: true
Expand Down
1 change: 1 addition & 0 deletions ci/conda_env_r.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ pandoc
r-covr
r-hms
r-lubridate
r-rcmdcheck
r-rmarkdown
r-testthat
r-tibble
24 changes: 17 additions & 7 deletions ci/docker/linux-apt-r.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,10 @@ RUN apt-get update -y && \
apt-key adv \
--keyserver keyserver.ubuntu.com \
--recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' && \
# NOTE: as of 2019-12, R 3.5 and 3.6 are available in the repos with -cran35 suffix
# R 3.2, 3.3, 3.4 are available without the suffix but only for trusty and xenial
# TODO: make sure OS version and R version are valid together and conditionally set repo suffix
add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu '$(lsb_release -cs)'-cran35/' && \
apt-get install -y \
r-base=${r}* \
# system libs needed by core R packages
Expand All @@ -42,22 +45,29 @@ RUN apt-get update -y && \
clang-format \
clang-tidy \
# R CMD CHECK --as-cran needs pdflatex to build the package manual
texlive-latex-base && \
texlive-latex-base \
# Need locales so we can set UTF-8
locales && \
locale-gen en_US.UTF-8 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Ensure parallel R package installation, set CRAN repo mirror,
# and use pre-built binaries where possible
RUN printf "\
options(Ncpus = parallel::detectCores(), \
repos = 'https://demo.rstudiopm.com/all/__linux__/bionic/latest', \
repos = 'https://demo.rstudiopm.com/all/__linux__/"$(lsb_release -cs)"/latest', \
HTTPUserAgent = sprintf(\
'R/%%s R (%%s)', getRversion(), \
paste(getRversion(), R.version\$platform, R.version\$arch, R.version\$os)))\n" \
>> /etc/R/Rprofile.site

# Also ensure parallel compilation of each individual package
RUN echo "MAKEFLAGS=-j8" >> /usr/lib/R/etc/Makeconf
# Also ensure parallel compilation of C/C++ code
RUN echo "MAKEFLAGS=-j$(R --slave -e 'cat(parallel::detectCores())')" >> /usr/lib/R/etc/Makeconf

COPY ci/scripts/r_deps.sh /arrow/ci/scripts/
COPY r/DESCRIPTION /arrow/r/
RUN /arrow/ci/scripts/r_deps.sh /arrow

ENV \
ARROW_BUILD_STATIC=OFF \
Expand All @@ -66,9 +76,9 @@ ENV \
ARROW_DEPENDENCY_SOURCE=SYSTEM \
ARROW_FLIGHT=OFF \
ARROW_GANDIVA=OFF \
ARROW_NO_DEPRECATED_API=ON \
ARROW_ORC=OFF \
ARROW_PARQUET=ON \
ARROW_PLASMA=OFF \
ARROW_USE_GLOG=OFF \
ARROW_NO_DEPRECATED_API=ON \
ARROW_R_DEV=TRUE
LC_ALL=en_US.UTF-8
40 changes: 0 additions & 40 deletions ci/scripts/r_build.sh

This file was deleted.

4 changes: 2 additions & 2 deletions ci/scripts/r_deps.sh
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ source_dir=${1}/r
pushd ${source_dir}

# Install R package dependencies
${R_BIN} -e "install.packages(c('remotes', 'dplyr', 'glue'))"
${R_BIN} -e "install.packages('remotes'); remotes::install_cran(c('glue', 'rcmdcheck'))"
${R_BIN} -e "remotes::install_deps(dependencies = TRUE)"
${R_BIN} -e "remotes::install_github('romainfrancois/decor')"
${R_BIN} -e "remotes::install_github('nealrichardson/decor')"

popd
9 changes: 8 additions & 1 deletion ci/scripts/r_test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,14 @@ pushd ${source_dir}

export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
export R_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
export ARROW_R_CXXFLAGS='-Werror'
export TEST_R_WITH_ARROW=TRUE
export _R_CHECK_TESTS_NLINES_=0
export _R_CHECK_CRAN_INCOMING_REMOTE_=FALSE
export _R_CHECK_LIMIT_CORES_=FALSE
export _R_CHECK_COMPILATION_FLAGS_=FALSE
export VERSION=$(grep ^Version DESCRIPTION | sed s/Version:\ //)

${R_BIN} CMD check --no-manual --as-cran arrow_*.tar.gz
${R_BIN} -e "rcmdcheck::rcmdcheck(build_args = '--no-build-vignettes', args = c('--no-manual', '--as-cran', '--ignore-vignettes'), error_on = 'warning', check_dir = 'check')"

popd
3 changes: 0 additions & 3 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -576,7 +576,6 @@ services:
volumes: *conda-volumes
command:
["/arrow/ci/scripts/cpp_build.sh /arrow /build/cpp &&
/arrow/ci/scripts/r_build.sh /arrow &&
/arrow/ci/scripts/r_test.sh /arrow"]

ubuntu-r:
Expand All @@ -598,8 +597,6 @@ services:
command: >
/bin/bash -c "
/arrow/ci/scripts/cpp_build.sh /arrow /build/cpp &&
/arrow/ci/scripts/r_deps.sh /arrow &&
/arrow/ci/scripts/r_build.sh /arrow &&
/arrow/ci/scripts/r_test.sh /arrow"
ubuntu-r-sanitizer:
Expand Down
1 change: 1 addition & 0 deletions r/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,7 @@ export(boolean)
export(buffer)
export(cast_options)
export(chunked_array)
export(codec_is_available)
export(contains)
export(date32)
export(date64)
Expand Down
22 changes: 16 additions & 6 deletions r/R/compression.R
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,13 @@
#' @section Factory:
#' The `Codec$create()` factory method takes the following arguments:
#' * `type`: string name of the compression method. Possible values are
#' "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2".
#' `type` may be upper- or lower-cased. Not all methods may be available; support
#' depends on build-time flags for the C++ library. See [codec_is_available()].
#' Most builds support at least "snappy" and "gzip". All support "uncompressed".
#' * `compression_level`: compression level, the default value (`NA`) uses the default
#' compression level for the selected compression `type`.
#' "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo", or
#' "bz2". `type` may be upper- or lower-cased. Not all methods may be
#' available; support depends on build-time flags for the C++ library.
#' See [codec_is_available()]. Most builds support at least "snappy" and
#' "gzip". All support "uncompressed".
#' * `compression_level`: compression level, the default value (`NA`) uses the
#' default compression level for the selected compression `type`.
#' @rdname Codec
#' @name Codec
#' @export
Expand All @@ -53,6 +54,15 @@ Codec$create <- function(type = "gzip", compression_level = NA) {
type
}

#' Check whether a compression codec is available
#'
#' Support for compression libraries depends on the build-time settings of
#' the Arrow C++ library. This function lets you know which are available for
#' use.
#' @param type A string, one of "uncompressed", "snappy", "gzip", "brotli",
#' "zstd", "lz4", "lzo", or "bz2", case insensitive.
#' @return Logical: is `type` available?
#' @export
codec_is_available <- function(type) {
util___Codec__IsAvailable(compression_from_name(type))
}
Expand Down
4 changes: 2 additions & 2 deletions r/R/parquet.R
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ read_parquet <- function(file,
#' The `compression` argument can be any of the following (case insensitive):
#' "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2".
#' Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip"
#' are almost always included.
#' are almost always included. See [codec_is_available()].
#' The default "snappy" is used if available, otherwise "uncompressed". To
#' disable compression, set `compression = "uncompressed"`.
#' Note that "uncompressed" columns may still have dictionary encoding.
Expand Down Expand Up @@ -379,7 +379,7 @@ ParquetWriterProperties$create <- function(table, version = NULL, compression =
#' - `schema` A [Schema]
#' - `sink` An [arrow::io::OutputStream][OutputStream] or a string which is interpreted as a file path
#' - `properties` An instance of [ParquetWriterProperties]
#' - `arrow_properties` An instance of [ParquetArrowWriterProperties]
#' - `arrow_properties` An instance of `ParquetArrowWriterProperties`
#' @export
#' @include arrow-package.R
ParquetFileWriter <- R6Class("ParquetFileWriter", inherit = Object,
Expand Down
13 changes: 7 additions & 6 deletions r/man/Codec.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion r/man/ParquetFileWriter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions r/man/codec_is_available.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion r/man/write_parquet.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit e76e1f6

Please sign in to comment.