Commit
Added the query cache overview document
git-svn-id: svn+ssh://kelev.kaltura.com/usr/local/kalsource/backend/server/trunk/core@61996 6b8eccd3-e8c5-4e7d-8186-e12b5326b719
erank committed May 9, 2011
1 parent 64ace86 commit 7b8cecb
61 changes: 61 additions & 0 deletions doc/Query Cache Overview.htm
@@ -0,0 +1,61 @@
<html>
<head>
<title>Query Cache</title>
</head>
<body>
<h3>General</h3>
<p>
The query cache is a generic mechanism for caching the results of database queries in memcache.
It uses a single shared memcache that will be added to each datacenter, rather than the
machine-local memcaches we use today.
Over time, the query cache will replace several of the caches we have today, such as
partnerPeer::retrieveByPK and entryPeer::doCountWithCache.
</p>
<h3>How does it work?</h3>
<p>
Let's take an example: <pre>Select * from flavor_asset where entry_id=x and partner_id=y and …</pre>
The first time this query is performed, it goes to the database. After the query completes and the
Propel objects are populated, the objects are serialized and stored in the memcache. The key used
for caching is a hash of the query string (actually, we use md5(serialize($criteria)) to avoid building the
query string unnecessarily). The next time the same query is performed, the results are returned from
the memcache.
</p>
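<p>
The flow can be sketched roughly as follows (a minimal illustration, not the actual implementation;
the function name, the 'query_cache-' key prefix and the use of assetPeer::doSelect are assumptions
made for the example):
</p>
<pre>
// a rough sketch of the query cache lookup; $memcache is assumed to be a
// connected Memcache instance pointing at the shared query cache
function queryWithCache(Criteria $criteria, Memcache $memcache)
{
	// the cache key is a hash of the serialized criteria, so the SQL string
	// never has to be built just to look up the cache
	$key = 'query_cache-' . md5(serialize($criteria));

	$cached = $memcache->get($key);
	if ($cached !== false)
		return unserialize($cached);        // cache hit - skip the database

	// cache miss - run the query and hydrate the Propel objects
	$objects = assetPeer::doSelect($criteria);

	// store the serialized objects for the next identical query
	$memcache->set($key, serialize($objects));

	return $objects;
}
</pre>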
<h3>What is the performance gain?</h3>
<p>
<ul>
<li>Complex select queries with several conditions, as well as count queries, are replaced by a simple
retrieveByPK-like lookup on the memcache.
<li>The query cache stores serialized objects, saving the time of the Propel hydration process.
</ul>
</p>
<h3>How do we know when a cached query is valid?</h3>
<p>
Every cached query is associated with at least one 'invalidation key'. Each invalidation key holds
the time of the last relevant change to the database; in the example above, we use the key
'flavor_asset:entry_id=x'. Before we return a cached query from the memcache, we compare the time
the query was cached to the time saved in all of its invalidation keys. If any of the invalidation keys
is newer than the cached query, the cached query is treated as invalid and won't be used.
</p>
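<p>
The check itself is simple. The sketch below assumes the cached entry stores the time it was written
('queryTime') next to the serialized objects, and that each invalidation key holds a unix timestamp
(the function name and field name are illustrative):
</p>
<pre>
// return true only if no relevant change happened after the query was cached
function isCachedQueryValid(array $cachedEntry, array $invalidationKeys, Memcache $memcache)
{
	foreach ($invalidationKeys as $invalidationKey)
	{
		$lastChangeTime = $memcache->get($invalidationKey);
		if ($lastChangeTime !== false && $lastChangeTime > $cachedEntry['queryTime'])
			return false;       // the data changed after the query was cached
	}

	return true;                // no relevant change since the query was cached
}
</pre>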
<h3>When do we update the invalidation keys?</h3>
<p>
Whenever a flavor asset object with entry_id=x is saved, it also updates the time saved in the memcache
under 'flavor_asset:entry_id=x', thus invalidating all the cached queries that contained entry_id=x.
In single-datacenter environments the invalidation keys can be updated automatically by the 'save'
functions. In multi-datacenter environments this won't work, because it won't invalidate the queries
that are cached on the remote DC. So, instead, we'll define UDFs (User Defined Functions) on the
database that will perform the invalidation - whether the database was modified locally or by
replication.
</p>
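<p>
For the single-datacenter case, the update is just a memcache set of the current time. A rough sketch,
assuming a hook that runs after save() and that $memcache is the shared query-cache connection (the
function name is an assumption made for the example):
</p>
<pre>
// record the time of the change so that any cached query which used this
// entry_id is considered stale from now on
function invalidateFlavorAssetQueries(flavorAsset $flavorAsset, Memcache $memcache)
{
	$invalidationKey = 'flavor_asset:entry_id=' . $flavorAsset->getEntryId();
	$memcache->set($invalidationKey, time());
}
</pre>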
<h3>What's needed to add a new query to the cache?</h3>
<p>
Override two functions:
<ul>
<li>&lt;object&gt;::getCacheInvalidationKeys – returns a list of invalidation keys that should be updated when
$this object is saved.
<li>&lt;peer&gt;::getCacheInvalidationKeys – returns a list of invalidation keys that should be checked before
the supplied $criteria can be returned from the cache.
See asset.php & assetPeer.php as an example, and the sketch after this list.
</ul>
</p>
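<p>
A rough sketch of the two overrides (the class names, the key format and the use of
Criteria::getCriterion are illustrative; the real implementation is in asset.php and assetPeer.php):
</p>
<pre>
class flavorAsset extends asset
{
	// invalidation keys to update when this object is saved
	public function getCacheInvalidationKeys()
	{
		return array('flavor_asset:entry_id=' . $this->getEntryId());
	}
}

class assetPeer extends BaseassetPeer
{
	// invalidation keys to check before a cached result for the supplied
	// criteria may be returned
	public static function getCacheInvalidationKeys(Criteria $criteria)
	{
		$criterion = $criteria->getCriterion(assetPeer::ENTRY_ID);
		if ($criterion === null)
			return array();

		return array('flavor_asset:entry_id=' . $criterion->getValue());
	}
}
</pre>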
</body>
</html>
