deprication warnings and more tests to prep for release
spilliton committed Mar 22, 2014
1 parent 0940808 commit 374785a
Showing 5 changed files with 65 additions and 39 deletions.
63 changes: 27 additions & 36 deletions README.md
@@ -20,43 +20,33 @@ bundle install

## Usage

``` ruby
Artist.random # a random Artist if there are any, otherwise nil
Artist.random(3) # an array of three Artists picked at random
Artist.random(1) # an array containing one random Artist
```
The most common usage is a simple scope you can chain along like any other:

### Scopes
``` ruby
# randumb works like the active record "all, first, and last" methods
Artist.has_views.includes(:albums).where(["created_at > ?", 2.days.ago]).random(10)
Artist.order_by_rand.first # a random Artist if there are any, otherwise nil
Artist.order_by_rand.limit(3).all # an array of three Artists picked at random
Artist.order_by_rand.limit(1).all # an array containing one random Artist
```

If only 5 records matched the conditions specified above, randumb will return an array with those 5 records in random order (as opposed to 10 records with duplicates).

### How It Works

randumb simply tacks an additional ```ORDER BY RANDOM()``` (or ```RAND()``` for mysql) to your query.

It will have the *least* amount of sort precedence if you include other orders in your scope.
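
To make the precedence rule concrete, here is a minimal sketch in plain Ruby. The helper `order_clause_with_random` and the `RANDOM_FUNCTIONS` table are hypothetical names made up for this illustration; they are not part of randumb's API, and randumb builds its clause through ActiveRecord rather than by string concatenation.

```ruby
# Hypothetical helper, not part of randumb: appends the random function
# after any existing orders so it has the least sort precedence.
RANDOM_FUNCTIONS = { "mysql" => "RAND()" }.freeze # other adapters assumed to use RANDOM()

def order_clause_with_random(existing_orders, adapter)
  random_fn = RANDOM_FUNCTIONS.fetch(adapter, "RANDOM()")
  "ORDER BY " + (existing_orders + [random_fn]).join(", ")
end

puts order_clause_with_random(["view_count DESC"], "postgresql")
puts order_clause_with_random([], "mysql")
```

Because the random term comes last, it only decides the order among rows that tie on every earlier sort key.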

## Advanced Usage

### Stacking the Deck

You can use the ```random_weighted``` method to favor certain records more than others.
You can use the ```order_by_rand_weighted``` method to favor certain records more than others.

For example, if you want to favor higher-rated Movies, and your
Movie model has a numeric ```score``` column, you can do any of the following:

``` ruby
Movie.random_weighted(:score)
Movie.random_weighted_by_score
Movie.order_by_rand_weighted(:score).first
# returns 1 random movie by:
# select * from movies ORDER BY (score * RANDOM()) DESC
# select * from movies ORDER BY (score * RANDOM()) DESC LIMIT 1

Movie.random_weighted(:score, 10)
Movie.random_weighted_by_score(10)
Movie.order_by_rand_weighted(:score).limit(10).all
# returns an array of up to 10 movies and executes:
# select * from movies ORDER BY (score * RANDOM()) DESC LIMIT 10
```
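
The effect of multiplying a column by a random number can be simulated in memory. This is only a rough sketch: the `Movie` struct and `order_by_rand_weighted_sim` are stand-ins invented for the example, and randumb itself performs the ordering in SQL rather than in Ruby.

```ruby
# In-memory illustration only; randumb does this ordering in the database.
Movie = Struct.new(:title, :score)

# Sort by score * random number, highest product first,
# mirroring an ORDER BY on (score * RANDOM()) in descending order.
def order_by_rand_weighted_sim(records, rng)
  records.sort_by { |record| -(record.score * rng.rand) }
end

movies = [Movie.new("Low rated", 1), Movie.new("High rated", 100)]
rng = Random.new(42)

wins = 1000.times.count do
  order_by_rand_weighted_sim(movies, rng).first.title == "High rated"
end
puts "High rated came first in #{wins} of 1000 draws"
```

A higher score does not guarantee first place; it only makes a large product more likely, which is why low-scored records still show up occasionally.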
@@ -68,11 +58,24 @@ If you wish to seed the randomness so that you can have predictable outcomes, pr
``` ruby
# Assuming no records have been added between calls
# These will return the same 2 artists in the same order both times
Artist.random(2, seed: 123)
Artist.random(2, seed: 123)
Artist.order_by_rand(seed: 123).limit(2)
Artist.order_by_rand(seed: 123).limit(2)
```

One use case is when you are paginating through random records.
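
The pagination idea can be sketched in plain Ruby, assuming the seeded order is stable between requests. `ARTIST_IDS` and `seeded_page` are hypothetical names for this example; with randumb the seed would be passed to the database, not to Ruby's `Random`.

```ruby
# In-memory stand-in for a seeded ORDER BY; the ids and page size are made up.
ARTIST_IDS = (1..10).to_a.freeze

# Returns one page of ids from a shuffle that depends only on the seed.
def seeded_page(ids, seed:, page:, per_page:)
  shuffled = ids.shuffle(random: Random.new(seed))
  shuffled.each_slice(per_page).to_a.fetch(page - 1, [])
end

page_one = seeded_page(ARTIST_IDS, seed: 123, page: 1, per_page: 4)
page_two = seeded_page(ARTIST_IDS, seed: 123, page: 2, per_page: 4)
puts page_one.inspect
puts page_two.inspect
```

Reusing the same seed on every page request keeps the pages disjoint, so a visitor never sees the same record twice while paging.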

### Deprecated Syntax

A few methods will be going away in randumb 1.0 because they do not follow current Active Record conventions:

``` ruby
# working like the active record "all, first, and last" methods and passing limit as param
Artist.has_views.includes(:albums).where(["created_at > ?", 2.days.ago]).random(10)
# dynamic finders for weighted methods
Artist.random_weighted_by_views
```

### Pick Your Poison
### Random By Id Shuffle

The adventurous may wish to try randumb's earlier algorithm for random record selection: ```random_by_id_shuffle```.

@@ -88,36 +91,24 @@ artists = Artist.limit(100).order("view_count DESC").random_by_id_shuffle(5)
# select * from artists WHERE id in (artist_ids)
```

Compare this to the default ```random()``` which will use the lesser of the limits you provide and apply ```ORDER BY RANDOM()``` sorting after any other orders you provide.

``` ruby
# (belligerently) Gimme the top 5 artists and I'll pointlessly provide a limit of 100!
# Plus I want artists with the same view count to be sorted randomly!
# This is clearly a silly thing to do...
artists = Artist.limit(100).order("view_count DESC").random(5)

# Executes:
# select * from artists ORDER BY view_count DESC, RANDOM() LIMIT 5
```

## A Note on Performance

As stated above, by default, randumb uses a simple approach of applying an order by random() statement to your query. In many sets, this performs well enough to not really be a big deal. However, as many blog posts and articles will note, the database must generate a random number for each row matching the scope and this can result in rather slow queries for large result sets. The last time I tested randumb on a test data set with 1 million rows (with no scopes) it took over 2 seconds.

In earlier versions of randumb I tried to alleviate this by doing two db queries: one to select the possible IDs into an array, and a second with a randomly selected set of those ids. This was sometimes faster on very large data sets; however, for most sizes I tested, it did not perform significantly better than ORDER BY RAND(), and it had the possibility of running out of memory due to selecting all the ids into a ruby array.
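
The two-step approach described above can be sketched roughly as follows. The `ARTISTS` hash stands in for a database table and `random_by_id_shuffle_sim` is a made-up name; this is an in-memory illustration under those assumptions, not randumb's actual implementation.

```ruby
# Hypothetical in-memory "table" keyed by id; randumb runs real SQL instead.
ARTISTS = { 1 => "Ann", 2 => "Bo", 3 => "Cy", 4 => "Di", 5 => "Ed" }.freeze

def random_by_id_shuffle_sim(table, max_items, rng: Random.new)
  ids = table.keys                            # step 1: load every matching id
  chosen = ids.sample(max_items, random: rng) # pick a random subset in Ruby
  chosen.map { |id| table.fetch(id) }         # step 2: load only the chosen rows
end

puts random_by_id_shuffle_sim(ARTISTS, 2).inspect
```

The memory concern is visible in step 1: the whole id list is held in a Ruby array before anything is sampled.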

If you are noticing slow speeds on your random queries and you have a very large database table, my advice is to scope down your query to a subset of the table via an indexed scope. Ex: ```Artist.where('views > 10').random``` This will result in fewer calls to RAND() and a faster query. You might also experiment with the old method by using ```random_by_id_shuffle``` and gauge the resulting speeds.
If you are noticing slow speeds on your random queries and you have a very large database table, my advice is to scope down your query to a subset of the table via an indexed scope. Ex: ```Artist.where('views > 10').order_by_rand.first``` This will result in fewer calls to RAND() and a faster query. You might also experiment with the old method by using ```random_by_id_shuffle``` and gauge the resulting speeds.

## ActiveRecord Caching

By default, ActiveRecord keeps a cache of the queries executed during the current request. If you call `random` multiple times on the same model or scope, you will end up with the same SQL query again, which causes the cache to return the result of the last query. You will see the following in your log if this happens:
By default, ActiveRecord keeps a cache of the queries executed during the current request. If you call `order_by_rand` multiple times on the same model or scope, you will end up with the same SQL query again, which causes the cache to return the result of the last query. You will see the following in your log if this happens:

```
Artist Load (0.3ms) SELECT "artists".* FROM "artists" ORDER BY RANDOM() LIMIT 1
CACHE (0.0ms) SELECT "artists".* FROM "artists" ORDER BY RANDOM() LIMIT 1
```

Fortunately, there is an easy workaround: Just wrap your query in a call to ```uncached```, e.g. ```Artist.uncached { Artist.random }```.
Fortunately, there is an easy workaround: Just wrap your query in a call to ```uncached```, e.g. ```Artist.uncached { Artist.order_by_rand.first }```.
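
To see why identical SQL produces identical results, consider this toy cache keyed by the SQL string. `ToyQueryCache` is a simplified stand-in written for this example and is not how ActiveRecord implements its query cache; only the idea of `uncached` bypassing the lookup is taken from the text above.

```ruby
# Toy stand-in for a per-request query cache; not ActiveRecord's implementation.
class ToyQueryCache
  def initialize
    @results = {}
    @enabled = true
  end

  # Runs the block for a given SQL string, or returns the cached result.
  def fetch(sql)
    return yield unless @enabled
    @results.fetch(sql) { @results[sql] = yield }
  end

  # Temporarily bypasses the cache, like wrapping a query in uncached { ... }.
  def uncached
    previous, @enabled = @enabled, false
    yield
  ensure
    @enabled = previous
  end
end

cache = ToyQueryCache.new
sql = "SELECT * FROM artists ORDER BY RANDOM() LIMIT 1"
counter = 0
first = cache.fetch(sql) { counter += 1 }
second = cache.fetch(sql) { counter += 1 }
third = cache.uncached { cache.fetch(sql) { counter += 1 } }
puts [first, second, third].inspect # prints [1, 1, 2]
```

The second lookup never runs its block because the SQL text matches the first, which is the same reason a repeated random query returns the same record within one request.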

## Why

5 changes: 5 additions & 0 deletions lib/randumb/relation.rb
@@ -10,6 +10,7 @@ module Relation
# If the max_items argument is omitted, one random entity will be returned.
# If you provide the integer argument, you will get back an array of records.
def random(max_items = nil, opts={})
ActiveSupport::Deprecation.warn "The random() method will be deprecated in randumb 1.0 in favor of the order_by_rand scope."
relation = clone
return random_by_id_shuffle(max_items, opts) if is_randumb_postges_case?(relation)
scope = relation.order_by_rand(opts)
@@ -23,6 +24,7 @@ def random(max_items = nil, opts={})
# If ranking_column is provided, that named column will be multiplied
# by a random number to determine probability of order. The ranking column must be numeric.
def random_weighted(ranking_column, max_items = nil, opts={})
ActiveSupport::Deprecation.warn "The random_weighted() method will be deprecated in randumb 1.0 in favor of the order_by_rand_weighted scope."
relation = clone
return random_by_id_shuffle(max_items, opts) if is_randumb_postges_case?(relation, ranking_column)
raise_unless_valid_ranking_column(ranking_column)
@@ -66,6 +68,7 @@ def order_by_rand(opts={})
end

def order_by_rand_weighted(ranking_column, opts={})
raise_unless_valid_ranking_column(ranking_column)
build_order_scope(opts, ranking_column)
end

@@ -178,6 +181,7 @@ def order_by_rand_weighted(ranking_column, opts={})
module MethodMissingMagicks
def method_missing(symbol, *args)
if symbol.to_s =~ /^random_weighted_by_(\w+)$/
ActiveSupport::Deprecation.warn "Dynamic finders will be removed in randumb 1.0 http://guides.rubyonrails.org/active_record_querying.html#dynamic-finders"
random_weighted($1, *args)
else
super
@@ -186,6 +190,7 @@ def method_missing(symbol, *args)

def respond_to?(symbol, include_private=false)
if symbol.to_s =~ /^random_weighted_by_(\w+)$/
ActiveSupport::Deprecation.warn "Dynamic finders will be removed in randumb 1.0 http://guides.rubyonrails.org/active_record_querying.html#dynamic-finders"
true
else
super
4 changes: 4 additions & 0 deletions test/randumb_test.rb
@@ -210,6 +210,10 @@ def assert_equal_for_both_methods(expected, obj, *params)
10.times do
assert_equal seeded_order, Artist.random(2, seed: @seed)
end

10.times do
assert_equal seeded_order, Artist.order_by_rand(seed: @seed).limit(2)
end
end

should "always return the same order using shuffle method" do
3 changes: 3 additions & 0 deletions test/test_helper.rb
@@ -59,3 +59,6 @@ def setup

end


# Silence deprecations
ActiveSupport::Deprecation.silenced = true
29 changes: 26 additions & 3 deletions test/weighted_test.rb
@@ -4,7 +4,7 @@ class WeightedTest < Test::Unit::TestCase

should "raise exception when called with a non-existent column" do
assert_raises(ArgumentError) do
Artist.random_weighted(:blah)
Artist.order_by_rand_weighted(:blah)
end
assert_raises(ArgumentError) do
Artist.random_weighted_by_blah
@@ -13,7 +13,7 @@ class WeightedTest < Test::Unit::TestCase

should "raise exception when called with a non-numeric column" do
assert_raises(ArgumentError) do
Artist.random_weighted(:name)
Artist.order_by_rand_weighted(:name)
end
assert_raises(ArgumentError) do
Artist.random_weighted_by_name
@@ -25,7 +25,7 @@ class WeightedTest < Test::Unit::TestCase
if ENV["DB"] == "postgres"
should "raise exception if being called with uniq/postgres" do
assert_raises(Exception) do
Artist.uniq.random_weighted(:name)
Artist.uniq.order_by_rand_weighted(:name)
end
end
else
@@ -66,6 +66,9 @@ class WeightedTest < Test::Unit::TestCase
assert_hits_per_views do
Artist.random_weighted("views").views
end
assert_hits_per_views do
Artist.order_by_rand_weighted("views").first.views
end
end

should "order by ranking column with method_missing" do
@@ -80,6 +83,12 @@ class WeightedTest < Test::Unit::TestCase
assert(result.size == 5)
result.first.views
end

assert_hits_per_views do
result = Artist.order_by_rand_weighted("views").limit(5).all
assert(result.size == 5)
result.first.views
end
end

should "order by ranking column with method_missing using max_items" do
@@ -98,6 +107,14 @@ class WeightedTest < Test::Unit::TestCase
result.last.views
end
end

assert_raises(MiniTest::Assertion) do
assert_hits_per_views do
result = Artist.order_by_rand_weighted(:views).limit(3)
assert(result.size == 3)
result.last.views
end
end
end

should "order by ranking column with method_missing using 1 max_items" do
@@ -106,6 +123,12 @@ class WeightedTest < Test::Unit::TestCase
assert(result.size == 1)
result.first.views
end

assert_hits_per_views do
result = Artist.order_by_rand_weighted(:views).limit(1)
assert(result.size == 1)
result.first.views
end
end
end
