Commit

Select bottom n values
hadley committed Nov 4, 2015
1 parent b3fb360 commit 9699289
Showing 3 changed files with 49 additions and 24 deletions.
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
 # dplyr 0.4.3.9000
 
+* `top_n()` can now also select bottom `n` values by passing a negative value
+  to `n` (#1008, #1352).
+
 * `ungroup()` generic gains `...` (#922).
 
 * Weighted `tally()` now ignores NAs (#1145).
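To make the new semantics concrete, here is a minimal base-R sketch of the behaviour this entry describes. It assumes that `rank(ties.method = "min")` behaves like dplyr's `min_rank()` for numeric input. `top_n_base()` is a hypothetical helper for illustration only. It is not dplyr's implementation or API. A positive `n` keeps the rows with the `n` largest values, a negative `n` keeps the rows with the `abs(n)` smallest values, and tied rows are always kept together.

```r
# Hypothetical base-R sketch, not dplyr code. min_rank() is emulated
# with base::rank(ties.method = "min"), which gives tied values the
# lowest rank of their tie group.
top_n_base <- function(df, n, wt) {
  stopifnot(is.numeric(n), length(n) == 1)
  x <- df[[wt]]
  # Rank 1 is the most wanted value: the largest when n > 0,
  # the smallest otherwise.
  r <- if (n > 0) rank(-x, ties.method = "min") else rank(x, ties.method = "min")
  # Keep every row whose rank is within |n|, so all tied rows are included.
  df[r <= abs(n), , drop = FALSE]
}

df <- data.frame(x = c(10, 4, 1, 6, 3, 1, 1))
top_n_base(df, 2, "x")   # rows with x = 10 and 6
top_n_base(df, -2, "x")  # three rows, all with x = 1
```

With the data from the new roxygen example, `-2` returns three rows because the value 1 appears three times. All tied rows share rank 1, so they pass the `<= 2` filter together.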
41 changes: 28 additions & 13 deletions R/top-n.R
@@ -1,23 +1,30 @@
-#' Select top n rows (by value).
+#' Select top (or bottom) n rows (by value).
 #'
 #' This is a convenient wrapper that uses \code{\link{filter}} and
-#' \code{\link{min_rank}} to select the top n entries in each group, ordered
-#' by \code{wt}.
+#' \code{\link{min_rank}} to select the top or bottom entries in each group,
+#' ordered by \code{wt}.
 #'
 #' @param x a \code{\link{tbl}} to filter
-#' @param n number of rows to return. If \code{x} is grouped, this is
-#' the number of rows per group. May include more than \code{n} if there
-#' are ties.
+#' @param n number of rows to return. If \code{x} is grouped, this is the
+#' number of rows per group. Will include more than \code{n} rows if
+#' there are ties.
+#'
+#' If \code{n} is positive, selects the top \code{n} rows. If negative,
+#' selects the bottom \code{n} rows.
 #' @param wt (Optional). The variable to use for ordering. If not specified,
 #' defaults to the last variable in the tbl.
 #' @export
 #' @examples
-#' # Find 10 players with most games
-#' if (require("Lahman")) {
-#' players <- group_by(tbl_df(Batting), playerID)
-#' games <- tally(players, G)
-#' top_n(games, 10, n)
+#' df <- data.frame(x = c(10, 4, 1, 6, 3, 1, 1))
+#' df %>% top_n(2)
+#'
+#' # Negative values select bottom from group. Note that we get more
+#' # than 2 values here because there's a tie: top_n() either takes
+#' # all rows with a value, or none.
+#' df %>% top_n(-2)
 #'
+#' if (require("Lahman")) {
+#' # Find 10 players with most games
 #' # A little nicer with %>%
 #' tbl_df(Batting) %>%
 #' group_by(playerID) %>%
@@ -32,10 +39,18 @@ top_n <- function(x, n, wt) {
     vars <- tbl_vars(x)
     message("Selecting by ", vars[length(vars)])
     wt <- as.name(vars[length(vars)])
+  } else {
+    wt <- substitute(wt)
   }
 
-  call <- substitute(filter(x, min_rank(desc(wt)) <= n),
-    list(n = n, wt = substitute(wt)))
+  stopifnot(is.numeric(n), length(n) == 1)
+  if (n > 0) {
+    call <- substitute(filter(x, min_rank(desc(wt)) <= n),
+      list(n = n, wt = wt))
+  } else {
+    call <- substitute(filter(x, min_rank(wt) <= n),
+      list(n = abs(n), wt = wt))
+  }
 
   eval(call)
 }
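The updated `@param n` text says that `n` counts rows per group when `x` is grouped. The new function body picks `min_rank(desc(wt))` or `min_rank(wt)` depending on the sign of `n`. The sketch below applies that same rule within groups using only base R. `top_n_grouped()` and its `group` argument are hypothetical. `ave()` plus `rank(ties.method = "min")` stand in for dplyr's grouped `filter()` and `min_rank()`. This is an assumption-laden illustration, not how dplyr evaluates the generated call.

```r
# Hypothetical base-R sketch, not dplyr code: per-group top/bottom selection.
top_n_grouped <- function(df, n, wt, group) {
  stopifnot(is.numeric(n), length(n) == 1)
  # Negating the values for n > 0 plays the role of desc(wt);
  # otherwise the raw values are ranked, as in the min_rank(wt) branch.
  key <- if (n > 0) -df[[wt]] else df[[wt]]
  # ave() applies the ranking separately within each group, so `n`
  # counts rows per group; tied values share the lowest rank.
  r <- ave(key, df[[group]], FUN = function(v) rank(v, ties.method = "min"))
  df[r <= abs(n), , drop = FALSE]
}

scores <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  v = c(5, 9, 1, 7, 7, 2)
)
top_n_grouped(scores, 1, "v", "g")   # a/9 plus both tied b/7 rows
top_n_grouped(scores, -1, "v", "g")  # a/1 and b/2
```

As in the commit, any `n` that is not positive goes down the "bottom" branch. For that reason, `n = 0` keeps no rows, since no rank is `<= 0`.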
29 changes: 18 additions & 11 deletions man/top_n.Rd

Some generated files are not rendered by default.
