Helpers for dynamically managing postgres partitions and an ActiveRecord::Base layer for managing partitioned models.
Let's say we have a Newsletter
model which we want to partition by periods of months. We will always be creating one partition in advance for the coming month's publishes, and initially we'll create one for the current month so we can start storing data.
First, we begin by making our model partitionable. We do that by including the concern CanvasPartman::Concerns::Partitioned
.
class Newsletter < ActiveRecord::Base
include CanvasPartman::Concerns::Partitioned
# self.partitioning_field = 'created_at'
# self.partitioning_interval = :months
end
You can tune several partitioning parameters which you can find in the concern's source docs.
The next step is to set up our partitions so that we can start creating records.
partman = CanvasPartman::PartitionManager.create(Newsletter)
# a partition for this month's data:
partman.create_partition(Time.now)
# and another for the coming month:
partman.create_partition(1.month.from_now)
That's actually all you need to get started. However, you should be aware that due to way inheritance (and by extension, partitioning) works in postgres, any indices you define on the master table IN THE FUTURE are not inherited for existing partitions (though they will be inherited for future created partitions). This means our migrations can not solely deal with the master table.
We'll see how to manage the partition schema in the next section.
canvas-partman
defines a custom migration type you can use to manage the schema of all existing (and to-be-created) partition tables. The migrations work, and read, just like regular ActiveRecord migrations but are run in two different ways depending on the context. Basically:
- all partition migrations are run along with the regular ones when using
rake db:migrate
- all partition migrations for a specific master table are run when a new partition of that master table is created (this is handled implicitly by the PartitionManager)
Item #2 ensures that all newly-created partitions have a consistent schema with their predecessors which may have been migrated over time into their current schema.
A custom Rails generator partition_migration
is made available for generating skeleton partition migrations. Let's create a migration that adds an index on created_at
for sorting to all existing partitions as well as ones to be created in the future:
rails generate partition_migration AddCreatedAtIndexToNewsletters newsletters
# create db/migrate/20141115282317_add_created_at_index_to_newsletters.partitions.rb
Note the
.partitions.rb
part of the generated migration filename; more on this in the notes below, at the end of the section.
The generator requires 1 parameter which is similar to the stock migration
generator and that is the name of the migration. The second optional argument is a string denoting the master table for the partitions. If you leave this unspecified, you will have to manually specify it in the migration itself. Note that this is ONLY NECESSARY if you have previously created partitions that do not yet have this index.
Newly created partitions will inherit indexes from the master table.
Let's actually write the migration. Follow the inline comments:
# db/migrate/20141115282317_add_created_at_index_to_newsletters.partitions.rb
# We must subclass from CanvasPartman::Migration
# instead of ActiveRecord::Migration
class AddCreatedAtIndexToNewsletters < CanvasPartman::Migration
self.base_class = MyApp::Newsletter
def up
# #with_each_partition() is a helper available to the migration
# that allows you iterate over all existing partition tables.
#
# The passed block receives the name of the partition table as
# the only parameter:
with_each_partition do |partition|
add_index partition, :created_at
end
end
def down
with_each_partition do |partition|
remove_index partition, :created_at
end
end
end
Alternatively, you can define a reversible change
runner using a change_table
block which will be yielded with a table for every partition as you would expect:
class AddCreatedAtIndexToNewsletters < CanvasPartman::Migration
self.base_class = MyApp::Newsletter
def change
with_each_partition do |partition|
change_table(partition) do |t|
t.index :created_at
end
end
end
end
A few notes:
- Partition migration files must be "scoped" for
canvas-partman
to identify them and pick them up when creating new partitions. A scope is an identifier that comes after the name of the migration file and right before therb
extension, prefixed by a dot. In the example above, the scope ispartitions
. You can customize this by overridingCanvasPartman.migrations_scope = 'my_scope'
but DO NOT LEAVE IT EMPTY.
Adding, modifying, or removing columns do not need to be applied to each partition; instead we can rely on PG inheritance to take care of cascading those changes down to all partitions (and future ones.)
Because of this, we don't have to use CanvasPartman::Migration
migrations at all; just use the regular ActiveRecord ones. Let's add a new column called publisher
of type string
to our newsletters table:
rails g migration AddPublisherToNewsletters
# create db/migrate/20141115282318_add_publisher_to_newsletters.rb
And the migration:
# db/migrate/20141115282318_add_publisher_to_newsletters.rb
class AddPublisherToNewsletters < ActiveRecord::Migration
def change
change_table('newsletters') do |t|
t.string :name
end
# Done! No need to worry about handling each partition,
# pg inheritance will add this column to the master
# table as well as the existing partition tables.
end
end
If you already have a table full of data with indexes, you can gracefully switch to a partitioned scheme by writing a non-scoped migration like so:
# db/migrate/20161115282318_migrate_versions_to_partitions.rb
class MigrateVersionsToPartitions < ActiveRecord::Migration
def up
partman = CanvasPartman::PartitionManager.create(Version)
# create tables to hold the existing data
partman.create_initial_partitions
# then move the data over
partman.migrate_data_to_partitions
end
end
to get partman specs to run locally:
- create a database named canvas_partman_test in an available postgres instance
- set your DATABASE_URL environment variable to:
postgresql://user:pass@dbhostname/canvas_partman_test"
- run
bundle exec rspec spec
from the gem root directory
- We need to configure postgres's constraint_exclusion to be
partition
- see: http://www.postgresql.org/docs/9.1/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION [Update: this is not necessary as it is the default setting] - Need to come up with a way to use regular, multiple/successive migrations for partition schemas instead of a single "snapshot" of how the latest version of the schema looks like [Done as of 11/15/2014]