Persistence layer for Ruby domain objects in Elasticsearch, using the Repository and ActiveRecord patterns.
The library is compatible with Ruby 1.9.3 (or higher) and Elasticsearch 1.0 (or higher).
Install the package from Rubygems:
gem install elasticsearch-persistence
To use an unreleased version, either add it to your Gemfile for Bundler:
gem 'elasticsearch-persistence', git: 'https://github.com/elasticsearch/elasticsearch-rails.git'
or install it from a source code checkout:
git clone https://github.com/elasticsearch/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-persistence
bundle install
rake install
The Elasticsearch::Persistence::Repository module provides an implementation of the repository pattern and allows you to save, delete, find and search objects stored in Elasticsearch, as well as configure mappings and settings for the index.
Let's define a simple plain old Ruby object (PORO):
class Note
attr_reader :attributes
def initialize(attributes={})
@attributes = attributes
end
def to_hash
@attributes
end
end
Let's create a default, "dumb" repository, as a first step:
require 'elasticsearch/persistence'
repository = Elasticsearch::Persistence::Repository.new
We can save a Note instance into the repository...
note = Note.new id: 1, text: 'Test'
repository.save(note)
# PUT http://localhost:9200/repository/note/1 [status:201, request:0.210s, query:n/a]
# > {"id":1,"text":"Test"}
# < {"_index":"repository","_type":"note","_id":"1","_version":1,"created":true}
...find it...
n = repository.find(1)
# GET http://localhost:9200/repository/_all/1 [status:200, request:0.003s, query:n/a]
# < {"_index":"repository","_type":"note","_id":"1","_version":2,"found":true, "_source" : {"id":1,"text":"Test"}}
=> <Note:0x007fcbfc0c4980 @attributes={"id"=>1, "text"=>"Test"}>
...search for it...
repository.search(query: { match: { text: 'test' } }).first
# GET http://localhost:9200/repository/_search [status:200, request:0.005s, query:0.002s]
# > {"query":{"match":{"text":"test"}}}
# < {"took":2, ... "hits":{"total":1, ... "hits":[{ ... "_source" : {"id":1,"text":"Test"}}]}}
=> <Note:0x007fcbfc1c7b70 @attributes={"id"=>1, "text"=>"Test"}>
...or delete it:
repository.delete(note)
# DELETE http://localhost:9200/repository/note/1 [status:200, request:0.014s, query:n/a]
# < {"found":true,"_index":"repository","_type":"note","_id":"1","_version":3}
=> {"found"=>true, "_index"=>"repository", "_type"=>"note", "_id"=>"1", "_version"=>3}
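To see the shape of the pattern without a running cluster, here is a minimal sketch of the same save/find/delete cycle backed by a plain Hash. InMemoryRepository is a hypothetical class written for this illustration, not part of the library, and it returns nil for an unknown ID as a simplification.

```ruby
# The same PORO as above
class Note
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

# Hypothetical stand-in for a repository: a Hash plays the role of the index
class InMemoryRepository
  def initialize
    @store = {}
  end

  # Serialize the object to a Hash and store a copy under its ID
  def save(document)
    hash = document.to_hash
    @store[hash[:id].to_s] = hash.dup
    document
  end

  # Build a fresh domain object from the stored Hash, or return nil
  def find(id)
    hash = @store[id.to_s]
    hash && Note.new(hash.dup)
  end

  # Accept either the object itself or its ID
  def delete(document_or_id)
    id = document_or_id.respond_to?(:to_hash) ? document_or_id.to_hash[:id] : document_or_id
    !@store.delete(id.to_s).nil?
  end
end

repository = InMemoryRepository.new
note       = Note.new id: 1, text: 'Test'

repository.save(note)
found = repository.find(1)
puts found.attributes[:text]   # prints "Test"

deleted = repository.delete(note)
puts repository.find(1).inspect   # prints "nil"
```

The application code only talks to save, find and delete, so the storage behind them can change without touching the domain object.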
The repository module provides a number of features and facilities to configure and customize the behaviour:
- Configuring the Elasticsearch client being used
- Setting the index name, document type, and object class for deserialization
- Composing mappings and settings for the index
- Creating, deleting or refreshing the index
- Finding or searching for documents
- Providing access both to domain objects and hits for search results
- Providing access to the Elasticsearch response for search results (aggregations, total, ...)
- Defining the methods for serialization and deserialization
You can use the default repository class, or include the module in your own. Let's review it in detail.
For simple cases, you can use the default, bundled repository class, and configure/customize it:
repository = Elasticsearch::Persistence::Repository.new do
# Configure the Elasticsearch client
client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true
# Set a custom index name
index :my_notes
# Set a custom document type
type :my_note
# Specify the class to initialize when deserializing documents
klass Note
# Configure the settings and mappings for the Elasticsearch index
settings number_of_shards: 1 do
mapping do
indexes :text, analyzer: 'snowball'
end
end
# Customize the serialization logic
def serialize(document)
super.merge(my_special_key: 'my_special_stuff')
end
# Customize the de-serialization logic
def deserialize(document)
puts "# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****"
super
end
end
The custom Elasticsearch client will now be used, with custom index and type names, as well as the custom serialization and de-serialization logic.
We can create the index with the desired settings and mappings:
repository.create_index! force: true
# PUT http://localhost:9200/my_notes
# > {"settings":{"number_of_shards":1},"mappings":{ ... {"text":{"analyzer":"snowball","type":"string"}}}}}
Save the document with extra properties added by the serialize method:
repository.save(note)
# PUT http://localhost:9200/my_notes/my_note/1
# > {"id":1,"text":"Test","my_special_key":"my_special_stuff"}
{"_index"=>"my_notes", "_type"=>"my_note", "_id"=>"1", "_version"=>4, ... }
And deserialize it:
repository.find(1)
# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****
<Note:0x007f9bd782b7a0 @attributes={... "my_special_key"=>"my_special_stuff"}>
In most cases, though, you'll want to use a custom class for the repository, so let's do that:
require 'base64'
class NoteRepository
include Elasticsearch::Persistence::Repository
def initialize(options={})
index options[:index] || 'notes'
client Elasticsearch::Client.new url: options[:url], log: options[:log]
end
klass Note
settings number_of_shards: 1 do
mapping do
indexes :text, analyzer: 'snowball'
# Do not index images
indexes :image, index: 'no'
end
end
# Base64 encode the "image" field in the document
#
def serialize(document)
hash = document.to_hash.clone
hash['image'] = Base64.encode64(hash['image']) if hash['image']
hash.to_hash
end
# Base64 decode the "image" field in the document
#
def deserialize(document)
hash = document['_source']
hash['image'] = Base64.decode64(hash['image']) if hash['image']
klass.new hash
end
end
Include the Elasticsearch::Persistence::Repository module to add the repository methods into the class.
You can customize the repository in the familiar way, by calling the DSL-like methods.
You can implement a custom initializer for your repository, add complex logic in its class and instance methods -- in general, have all the freedom of a standard Ruby class.
repository = NoteRepository.new url: 'http://localhost:9200', log: true
# Configure the repository instance
repository.index = 'notes_development'
repository.client.transport.logger.formatter = proc { |s, d, p, m| "\e[2m# #{m}\n\e[0m" }
repository.create_index! force: true
note = Note.new 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...'
repository.save(note)
# PUT http://localhost:9200/notes_development/note/1
# > {"id":1,"text":"Document with image","image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}
puts repository.find(1).attributes['image']
# GET http://localhost:9200/notes_development/note/1
# < {... "_source" : { ... "image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}}
# => ... BINARY DATA ...
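The encoding step can be checked in isolation. The sketch below applies the same Base64 transformation to plain hashes, using only the standard library; encode_image and decode_image are hypothetical helper names standing in for the serialize and deserialize methods of NoteRepository.

```ruby
require 'base64'

# Return a copy of the hash with the "image" value Base64-encoded
def encode_image(hash)
  copy = hash.dup
  copy['image'] = Base64.encode64(copy['image']) if copy['image']
  copy
end

# Return a copy of the hash with the "image" value decoded back to raw bytes
def decode_image(source)
  copy = source.dup
  copy['image'] = Base64.decode64(copy['image']) if copy['image']
  copy
end

original = { 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...' }
stored   = encode_image(original)
restored = decode_image(stored)

puts stored['image'].inspect   # prints "Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"
puts restored['image']         # prints ... BINARY DATA ...
```

Note that Base64.encode64 appends a trailing newline, which is why the stored value in the log above ends with \n.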
The repository uses the standard Elasticsearch client, which is accessible with the client getter and setter methods:
repository.client = Elasticsearch::Client.new url: 'http://search.server.org'
repository.client.transport.logger = Logger.new(STDERR)
The index method specifies the Elasticsearch index to use for storage, lookup and search (when not set, the value is inferred from the repository class name):
repository.index = 'notes_development'
The type method specifies the Elasticsearch document type to use for storage, lookup and search (when not set, the value is inferred from the document class name, or _all is used):
repository.type = 'my_note'
The klass method specifies the Ruby class to use when initializing objects from documents retrieved from the repository (when not set, the value is inferred from the document _type as fetched from Elasticsearch):
repository.klass = MyNote
The settings and mappings methods, provided by the elasticsearch-model gem, allow you to configure the index properties:
repository.settings number_of_shards: 1
repository.settings.to_hash
# => {:number_of_shards=>1}
repository.mappings { indexes :title, analyzer: 'snowball' }
repository.mappings.to_hash
# => { :note => {:properties=> ... }}
The convenience methods create_index!, delete_index! and refresh_index! allow you to manage the index lifecycle.
The serialize and deserialize methods allow you to customize the serialization of the document when passing it to the storage, and the initialization procedure when loading it from the storage:
class NoteRepository
def serialize(document)
Hash[document.to_hash.map() { |k,v| v.upcase! if k == :title; [k,v] }]
end
def deserialize(document)
MyNote.new ActiveSupport::HashWithIndifferentAccess.new(document['_source']).deep_symbolize_keys
end
end
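One caveat about the serialize example above: upcase! modifies the string in place, so the title on the original object changes too, and a frozen string would raise an error. A possible non-mutating variant is sketched here on a plain Hash; serialize_with_upcased_title is a hypothetical helper name, and symbol keys are assumed.

```ruby
# Build a new Hash; only the :title value is replaced by an upcased copy
def serialize_with_upcased_title(attributes)
  attributes.map { |key, value| [key, key == :title ? value.upcase : value] }.to_h
end

attributes = { id: 1, title: 'Quick Brown Fox' }
serialized = serialize_with_upcased_title(attributes)

puts serialized[:title]   # prints QUICK BROWN FOX
puts attributes[:title]   # prints Quick Brown Fox (input left untouched)
```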
The save method allows you to store a domain object in the repository:
note = Note.new id: 1, title: 'Quick Brown Fox'
repository.save(note)
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>1, "created"=>true}
The update method allows you to perform a partial update of a document in the repository.
Use either a partial document:
repository.update id: 1, title: 'UPDATED', tags: []
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>2}
Or a script (optionally with parameters):
repository.update 1, script: 'if (!ctx._source.tags.contains(t)) { ctx._source.tags += t }', params: { t: 'foo' }
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>3}
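For readers less familiar with update scripts, the logic of the script above can be rendered in Ruby against a local Hash. This is only an illustration of what the script does to the document source; add_tag is a hypothetical helper, and nothing here talks to Elasticsearch.

```ruby
# Append the tag only when it is not already present
def add_tag(source, tag)
  source['tags'] << tag unless source['tags'].include?(tag)
  source
end

source = { 'id' => 1, 'title' => 'UPDATED', 'tags' => [] }

add_tag(source, 'foo')
add_tag(source, 'foo')   # the second call changes nothing

puts source['tags'].inspect   # prints ["foo"]
```

Because of the membership check, running the same update twice leaves the document with a single copy of the tag.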
The delete method allows you to remove objects from the repository (pass either the object itself or its ID):
repository.delete(note)
repository.delete(1)
The find method allows you to find one or many documents in the storage and returns them as deserialized Ruby objects:
repository.save Note.new(id: 2, title: 'Fast White Dog')
note = repository.find(1)
# => <MyNote ... QUICK BROWN FOX>
notes = repository.find(1, 2)
# => [<MyNote... QUICK BROWN FOX>, <MyNote ... FAST WHITE DOG>]
When a document with a given ID isn't found, nil is returned in its position instead of a deserialized object:
notes = repository.find(1, 3, 2)
# => [<MyNote ...>, nil, <MyNote ...>]
Handle the missing objects in the application code, or call compact on the result.
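A short sketch of both options, assuming the results come back in the same order as the requested IDs, as in the output above. The results array is a hand-written stand-in for the return value of repository.find(1, 3, 2).

```ruby
ids     = [1, 3, 2]
results = [{ id: 1 }, nil, { id: 2 }]   # stand-in for repository.find(1, 3, 2)

# Option 1: drop the missing entries
found = results.compact

# Option 2: work out which IDs were missing by pairing IDs with results
missing = ids.zip(results).select { |_id, document| document.nil? }.map(&:first)

puts found.size        # prints 2
puts missing.inspect   # prints [3]
```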
The search method retrieves objects from the repository by a query string or a definition in the Elasticsearch DSL:
repository.search('fox or dog').to_a
# GET http://localhost:9200/notes_development/my_note/_search?q=fox
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
repository.search(query: { match: { title: 'fox dog' } }).to_a
# GET http://localhost:9200/notes_development/my_note/_search
# > {"query":{"match":{"title":"fox dog"}}}
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
The returned object is an instance of the Elasticsearch::Persistence::Repository::Response::Results class, which provides access to the results, the full returned response and hits.
results = repository.search(query: { match: { title: 'fox dog' } })
# Iterate over the objects
#
results.each do |note|
puts "* #{note.attributes[:title]}"
end
# * QUICK BROWN FOX
# * FAST WHITE DOG
# Iterate over the objects and hits
#
results.each_with_hit do |note, hit|
puts "* #{note.attributes[:title]}, score: #{hit._score}"
end
# * QUICK BROWN FOX, score: 0.29930896
# * FAST WHITE DOG, score: 0.29930896
# Get total results
#
results.total
# => 2
# Access the raw response as a Hashie::Mash instance
results.response._shards.failed
# => 0
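The relationship between objects, hits and the raw response can be sketched with a small wrapper over a response Hash. SketchResults is a hypothetical class written for this illustration, not the library's Results implementation, and the response below is made-up sample data.

```ruby
class SketchResults
  include Enumerable

  attr_reader :response

  # The block turns a single hit into a domain object
  def initialize(response, &deserializer)
    @response     = response
    @deserializer = deserializer
  end

  def hits
    response['hits']['hits']
  end

  def total
    response['hits']['total']
  end

  # Deserialized objects, built once and cached
  def results
    @results ||= hits.map { |hit| @deserializer.call(hit) }
  end

  def each(&block)
    results.each(&block)
  end

  # Yield each object together with the hit it was built from
  def each_with_hit(&block)
    results.zip(hits).each(&block)
  end
end

response = {
  '_shards' => { 'failed' => 0 },
  'hits'    => {
    'total' => 2,
    'hits'  => [
      { '_score' => 0.5,  '_source' => { 'title' => 'QUICK BROWN FOX' } },
      { '_score' => 0.25, '_source' => { 'title' => 'FAST WHITE DOG' } }
    ]
  }
}

results = SketchResults.new(response) { |hit| hit['_source'] }

lines = []
results.each_with_hit do |note, hit|
  lines << "* #{note['title']}, score: #{hit['_score']}"
end
puts lines

titles = results.map { |note| note['title'] }
puts results.total
```

Including Enumerable on top of each is what makes map, first, to_a and similar methods available on the wrapper.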
An example Sinatra application is available in examples/notes/application.rb, and demonstrates a rich set of features:
- How to create and configure a custom repository class
- How to work with a plain Ruby class as the domain object
- How to integrate the repository with a Sinatra application
- How to write complex search definitions, including pagination, highlighting and aggregations
- How to use search results in the application view
The Elasticsearch::Persistence::Model module provides an implementation of the active record pattern, with a familiar interface for using Elasticsearch as a persistence layer in Ruby on Rails applications.
All the methods are documented with comprehensive examples in the source code, also available online at http://rubydoc.info/gems/elasticsearch-persistence/Elasticsearch/Persistence/Model.
To use the library in a Rails application, add it to your Gemfile with a require statement:
gem "elasticsearch-persistence", require: 'elasticsearch/persistence/model'
To use the library without Bundler, install it, and require the file:
gem install elasticsearch-persistence
# In your code
require 'elasticsearch/persistence/model'
The integration is implemented by including the module in a Ruby class. Attribute definitions are implemented with the Virtus gem, and naming, validations and related features with the ActiveModel gem.
class Article
include Elasticsearch::Persistence::Model
# Define a plain `title` attribute
#
attribute :title, String
# Define an `author` attribute, with multiple analyzers for this field
#
attribute :author, String, mapping: { fields: {
author: { type: 'string'},
raw: { type: 'string', analyzer: 'keyword' }
} }
# Define a `views` attribute, with default value
#
attribute :views, Integer, default: 0, mapping: { type: 'integer' }
# Validate the presence of the `title` attribute
#
validates :title, presence: true
# Execute code after saving the model.
#
after_save { puts "Successfully saved: #{self}" }
end
Attribute validations work as in any other ActiveModel-compatible implementation:
article = Article.new # => #<Article { ... }>
article.valid?
# => false
article.errors.to_a
# => ["Title can't be blank"]
We can create a new article in the database...
Article.create id: 1, title: 'Test', author: 'John'
# PUT http://localhost:9200/articles/article/1 [status:201, request:0.015s, query:n/a]
... and find it:
article = Article.find(1)
# => #<Article { ... }>
article._index
# => "articles"
article.id
# => "1"
article.title
# => "Test"
To update the model, either update the attribute and save the model:
article.title = 'Updated'
article.save
# => {"_index"=>"articles", "_type"=>"article", "_id"=>"1", "_version"=>2, "created"=>false}
... or use the update_attributes method:
article.update_attributes title: 'Test', author: 'Mary'
# => {"_index"=>"articles", "_type"=>"article", "_id"=>"1", "_version"=>3}
The implementation supports the familiar interface for updating model timestamps:
article.touch
# => { ... "_version"=>4}
... and numeric attributes:
article.views
# => 0
article.increment :views
article.views
# => 1
Any callbacks defined in the model will be triggered during the persistence operations:
article.save
# Successfully saved: #<Article {...}>
The model also supports the familiar find_in_batches and find_each methods to efficiently retrieve big collections of model instances, using Elasticsearch's Scan API:
Article.find_each(_source_include: 'title') { |a| puts "===> #{a.title.upcase}" }
# GET http://localhost:9200/articles/article/_search?scroll=5m&search_type=scan&size=20
# GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhb...
# ===> TEST
# GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhb...
# => "c2Nhb..."
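The request log above follows a simple loop: fetch a page, process it, and stop when a page comes back empty. That control flow is sketched below with an in-memory stand-in for the scroll. The each_batch and each_document helpers and the sample pages are assumptions made for this illustration, not the library's code.

```ruby
# Stand-in for a scroll: each call returns the next page, an empty page ends it
pages = [
  [{ title: 'Test' }, { title: 'Another' }],
  [{ title: 'Third' }],
  []
]
fetch_next_page = -> { pages.shift || [] }

# Yield whole pages, in the spirit of find_in_batches
def each_batch(fetch)
  loop do
    batch = fetch.call
    break if batch.empty?
    yield batch
  end
end

# Yield single documents, in the spirit of find_each
def each_document(fetch)
  each_batch(fetch) { |batch| batch.each { |document| yield document } }
end

titles = []
each_document(fetch_next_page) { |document| titles << document[:title].upcase }

puts titles.inspect   # prints ["TEST", "ANOTHER", "THIRD"]
```

Only one page is held in memory at a time, which is the point of iterating in batches over a large collection.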
The model class provides a search method to retrieve model instances with a regular search definition, including highlighting, aggregations, etc.:
results = Article.search query: { match: { title: 'test' } },
aggregations: { authors: { terms: { field: 'author.raw' } } },
highlight: { fields: { title: {} } }
puts results.first.title
# Test
puts results.first.hit.highlight['title']
# <em>Test</em>
results.response.aggregations.authors.buckets.each { |b| puts "#{b['key']} : #{b['doc_count']}" }
# John : 1
The module will set up a client, connected to localhost:9200, by default.
To use a client with different configuration:
Elasticsearch::Persistence.client = Elasticsearch::Client.new log: true
To set up a specific client for a specific model:
Article.client = Elasticsearch::Client.new host: 'api.server.org'
You might want to do this during your application bootstrap process, e.g. in a Rails initializer.
Please refer to the elasticsearch-transport library documentation for all the configuration options, and to the elasticsearch-api library documentation for information about the Ruby client API.
The integration with Elasticsearch is implemented by embedding the repository object in the model.
You can access it through the gateway method:
Article.gateway.client.info
# GET http://localhost:9200/ [status:200, request:0.011s, query:n/a]
# => {"status"=>200, "name"=>"Lightspeed", ...}
The model instances are fully compatible with Rails' conventions and helpers:
url_for article
# => "http://localhost:3000/articles/1"
div_for article
# => '<div class="article" id="article_1"></div>'
... as well as form values for dates and times:
article = Article.new "title" => "Date", "published(1i)"=>"2014", "published(2i)"=>"1", "published(3i)"=>"1"
article.published.iso8601
# => "2014-01-01"
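The "published(1i)", "published(2i)" and "published(3i)" keys are what Rails date form helpers submit for the year, month and day. A minimal sketch of how such keys can be assembled into a Date with the standard library follows; assemble_date is a hypothetical helper and says nothing about how the library itself handles them.

```ruby
require 'date'

# Collect "name(1i)", "name(2i)", "name(3i)" into a Date, or return nil
# when any of the three parts is missing
def assemble_date(params, name)
  parts = (1..3).map { |position| params["#{name}(#{position}i)"] }
  return nil if parts.any?(&:nil?)

  Date.new(*parts.map { |part| Integer(part, 10) })
end

params = {
  'title'         => 'Date',
  'published(1i)' => '2014',
  'published(2i)' => '1',
  'published(3i)' => '1'
}

published = assemble_date(params, 'published')
puts published.iso8601   # prints 2014-01-01
```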
The library provides a Rails ORM generator to facilitate building the application scaffolding:
rails generate scaffold Person name:String email:String birthday:Date --orm=elasticsearch
A fully working Ruby on Rails application can be generated with the following command:
rails new music --force --skip --skip-bundle --skip-active-record --template https://raw.githubusercontent.com/elasticsearch/elasticsearch-rails/master/elasticsearch-persistence/examples/music/template.rb
The application demonstrates:
- How to set up model attributes with custom mappings
- How to define model relationships with Elasticsearch's parent/child
- How to configure models to use a common index, and create the index with proper mappings
- How to use Elasticsearch's completion suggester to drive auto-complete functionality
- How to use Elasticsearch-persisted models in Rails' views and forms
- How to write controller tests
The source files for the application are available in the examples/music folder.
This software is licensed under the Apache 2 license, quoted below.
Copyright (c) 2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.