Commit
Added the query cache overview document
git-svn-id: svn+ssh://kelev.kaltura.com/usr/local/kalsource/backend/server/trunk/core@61996 6b8eccd3-e8c5-4e7d-8186-e12b5326b719
erank committed May 9, 2011
1 parent 64ace86 commit 7b8cecb
61 changes: 61 additions & 0 deletions doc/Query Cache Overview.htm
@@ -0,0 +1,61 @@
<html>
<head>
<title>Query Cache</title>
</head>
<body>
<h3>General</h3>
<p>
The query cache is a generic mechanism for caching the results of database queries in memcache.
It uses a single shared memcache that will be added to each datacenter, rather than the
machine-local memcaches we use today.
Over time, the query cache will replace several of the caches we have today, such as
partnerPeer::retrieveByPK and entryPeer::doCountWithCache.
</p>
<h3>How does it work?</h3>
<p>
Let's take an example: <pre>Select * from flavor_asset where entry_id=x and partner_id=y and …</pre>
The first time this query is performed, it goes to the database. After the query completes and the
Propel objects are populated, the objects are serialized and stored in the memcache. The key used
for caching is a hash of the query string (actually, we use md5(serialize($criteria)) to avoid building the
query string unnecessarily). The next time the same query is performed, the results are returned from
the memcache.
</p>
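<p>
The flow can be sketched roughly as follows (a minimal illustration, not the actual implementation;
the function name, the 'query_cache-' key prefix and the use of assetPeer::doSelect are assumptions
made for the example):
</p>
<pre>
// a rough sketch of the query cache lookup; $memcache is assumed to be a
// connected Memcache instance pointing at the shared query cache
function queryWithCache(Criteria $criteria, Memcache $memcache)
{
	// the cache key is a hash of the serialized criteria, so the SQL string
	// never has to be built just to look up the cache
	$key = 'query_cache-' . md5(serialize($criteria));

	$cached = $memcache->get($key);
	if ($cached !== false)
		return unserialize($cached);        // cache hit - skip the database

	// cache miss - run the query and hydrate the Propel objects
	$objects = assetPeer::doSelect($criteria);

	// store the serialized objects for the next identical query
	$memcache->set($key, serialize($objects));

	return $objects;
}
</pre>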
<h3>What is the performance gain?</h3>
<p>
<ul>
<li>Complex select queries with several conditions, as well as count queries, are replaced by a simple
retrieveByPK-like lookup on the memcache.
<li>The query cache stores serialized objects, saving the time of the Propel hydration process.
</ul>
</p>
<h3>How do we know when a cached query is valid?</h3>
<p>
Every cached query is associated with at least one 'invalidation key'. Each invalidation key holds
the time of the last relevant change to the database; in the example above, we use the key
'flavor_asset:entry_id=x'. Before we return a cached query from the memcache, we compare the time
the query was cached to the time saved in all of its invalidation keys. If any of the invalidation keys
is newer than the cached query, the cached query is treated as invalid and won't be used.
</p>
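<p>
The check itself is simple. The sketch below assumes the cached entry stores the time it was written
('queryTime') next to the serialized objects, and that each invalidation key holds a unix timestamp
(the function name and field name are illustrative):
</p>
<pre>
// return true only if no relevant change happened after the query was cached
function isCachedQueryValid(array $cachedEntry, array $invalidationKeys, Memcache $memcache)
{
	foreach ($invalidationKeys as $invalidationKey)
	{
		$lastChangeTime = $memcache->get($invalidationKey);
		if ($lastChangeTime !== false && $lastChangeTime > $cachedEntry['queryTime'])
			return false;       // the data changed after the query was cached
	}

	return true;                // no relevant change since the query was cached
}
</pre>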
<h3>When do we update the invalidation keys?</h3>
<p>
Whenever a flavor asset object with entry_id=x is saved, it also updates the time saved in the memcache
under 'flavor_asset:entry_id=x', thus invalidating all the cached queries that contained entry_id=x.
In single-datacenter environments the invalidation keys can be updated automatically by the 'save'
functions. In multi-datacenter environments this won't work, because it won't invalidate the queries
that are cached on the remote DC. So, instead, we'll define UDFs (User Defined Functions) on the
database that will perform the invalidation - whether the database was modified locally or by
replication.
</p>
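<p>
For the single-datacenter case, the update is just a memcache set of the current time. A rough sketch,
assuming a hook that runs after save() and that $memcache is the shared query-cache connection (the
function name is an assumption made for the example):
</p>
<pre>
// record the time of the change so that any cached query which used this
// entry_id is considered stale from now on
function invalidateFlavorAssetQueries(flavorAsset $flavorAsset, Memcache $memcache)
{
	$invalidationKey = 'flavor_asset:entry_id=' . $flavorAsset->getEntryId();
	$memcache->set($invalidationKey, time());
}
</pre>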
<h3>What's needed to add a new query to the cache?</h3>
<p>
Override two functions:
<ul>
<li>&lt;object&gt;::getCacheInvalidationKeys – returns a list of invalidation keys that should be updated when
$this object is saved.
<li>&lt;peer&gt;::getCacheInvalidationKeys – returns a list of invalidation keys that should be checked before
the supplied $criteria can be returned from the cache.
See asset.php & assetPeer.php as an example, and the sketch after this list.
</ul>
</p>
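<p>
A rough sketch of the two overrides (the class names, the key format and the use of
Criteria::getCriterion are illustrative; the real implementation is in asset.php and assetPeer.php):
</p>
<pre>
class flavorAsset extends asset
{
	// invalidation keys to update when this object is saved
	public function getCacheInvalidationKeys()
	{
		return array('flavor_asset:entry_id=' . $this->getEntryId());
	}
}

class assetPeer extends BaseassetPeer
{
	// invalidation keys to check before a cached result for the supplied
	// criteria may be returned
	public static function getCacheInvalidationKeys(Criteria $criteria)
	{
		$criterion = $criteria->getCriterion(assetPeer::ENTRY_ID);
		if ($criterion === null)
			return array();

		return array('flavor_asset:entry_id=' . $criterion->getValue());
	}
}
</pre>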
</body>
</html>
