Updated DataSet and DataPoint objects to now make available all data returned in a Data Feed as object attributes.

Moved Data Feed parsing logic out of `Account.get_data()`. `DataSet` now takes care of much of the parsing directly as it creates its `DataPoint` elements. Updated docs and tests for new features. BACKWARDS INCOMPATIBLE: Removed `DataSet.dict` property as this was broken for multiple metrics/dimensions (i.e. it only worked if there was *exactly* one dimension and one metric each).
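
For callers that relied on the removed `dict` property, here is a minimal sketch of rebuilding the same mapping from the new `data.list` shape. It assumes exactly one dimension and one metric per row, which is the only case the old property handled. `rows_to_dict` is a hypothetical helper, not part of the library, and the sample rows are copied from the USAGE.md example rather than fetched from the API.

```python
# Rows in the new `data.list` shape: each row is [[dimensions], [metrics]].
# Sample values copied from the USAGE.md example in this commit.
rows = [[['Firefox'], [21]], [['Internet Explorer'], [17]], [['Safari'], [17]]]


def rows_to_dict(rows):
    """Hypothetical helper: map each row's single dimension to its single metric."""
    result = {}
    for dims, mets in rows:
        # The old `dict` property only made sense for one dimension and one metric.
        if len(dims) != 1 or len(mets) != 1:
            raise ValueError("expected exactly one dimension and one metric per row")
        result[dims[0]] = mets[0]
    return result


browser_pageviews = rows_to_dict(rows)
```

As with the old property, the resulting dictionary should not be relied on for sort order; use `list` or `tuple` when the order of the Google Analytics results matters.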
clintecker · Dec 4, 2009 · a8ee3e6
1 parent ab84c05

Showing 5 changed files with 218 additions and 82 deletions.
diff --git a/USAGE.md b/USAGE.md
@@ -59,8 +59,9 @@ Here's a really basic call:
 >>> account = connection.get_account('1234')
 >>> start_date = datetime.date(2009, 04, 10)
 >>> end_date = datetime.date(2009, 04, 10)
->>> account.get_data(start_date, end_date, metrics=['pageviews'])
-[&lt;DataPoint: ga:4567 / None&gt;]
+>>> data = account.get_data(start_date, end_date, metrics=['pageviews'])
+>>> data.list
+[[[], [4567]]]
 </pre>
 
 You can optionally retrieve metrics by various dimensions, such as a list of browsers that accessed your site in your timeframe and how many page views each of those browsers generated.
@@ -72,8 +73,9 @@ You can optionally retrieve metrics by various dimensions, such as a list of bro
 >>> account = connection.get_account('1234')
 >>> start_date = datetime.date(2009, 04, 10)
 >>> end_date = datetime.date(2009, 04, 10)
->>> account.get_data(start_date, end_date, metrics=['pageviews'], dimensions=['browser',])
-[&lt;DataPoint: ga:6367750 / ga:browser=Chrome&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Firefox&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Internet Explorer&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Mozilla Compatible Agent&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Safari&gt;]
+>>> data = account.get_data(start_date, end_date, metrics=['pageviews'], dimensions=['browser',])
+>>> data.list
+[[['Chrome'], [43293]], [['Firefox'], [6367750]], [['Internet Explorer'], [5391084]], [['Mozilla Compatible Agent'], [238179]], [['Safari'], [567432]]]
 </pre>
 
 You could get Google to sort that for you (note that Firefox is first now):
@@ -85,8 +87,9 @@ You could get Google to sort that for you (note FireFox is first now):
 >>> account = connection.get_account('1234')
 >>> start_date = datetime.date(2009, 04, 10)
 >>> end_date = datetime.date(2009, 04, 10)
->>> account.get_data(start_date, end_date, metrics=['pageviews',], dimensions=['browser',], sort=['-pageviews',])
-[&lt;DataPoint: ga:6367750 / ga:browser=Firefox&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Internet Explorer&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Safari&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Chrome&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Mozilla Compatible Agent&gt;]
+>>> data = account.get_data(start_date, end_date, metrics=['pageviews',], dimensions=['browser',], sort=['-pageviews',])
+>>> data.list
+[[['Firefox'], [6367750]], [['Internet Explorer'], [5391084]], [['Safari'], [567432]], [['Mozilla Compatible Agent'], [238179]], [['Chrome'], [43293]]]
 </pre>
 
 And you could do some fun filtering: get a list of browsers, sorted descending by page views, and filtered to contain only browser strings that match the three regexes below (starting with Fire OR Internet OR Saf):
@@ -103,15 +106,16 @@ And you could do some fun filtering, get a list of browsers, sorted descending b
 ...   ['browser', '=~', '^Internet', 'OR'],
 ...   ['browser', '=~', '^Saf'],
 ... ]
->>> account.get_data(start_date, end_date, metrics=['pageviews',], dimensions=['browser',], sort=['-pageviews',], filters=filters)
-[&lt;DataPoint: ga:6367750 / ga:browser=Firefox&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Internet Explorer&gt;, &lt;DataPoint: ga:6367750 / ga:browser=Safari&gt;]
+>>> data = account.get_data(start_date, end_date, metrics=['pageviews',], dimensions=['browser',], sort=['-pageviews',], filters=filters)
+>>> data.list
+[[['Firefox'], [6367750]], [['Internet Explorer'], [5391084]], [['Safari'], [567432]]]
 </pre>
 
 ### Data ###
 
-At this point you should be asking me how this data is returned to you.  In the above examples, the data is returned as a `googleanalytics.data.DataSet` object which is essentially a Python list with three "properties" (`list`/`tuple`/`dict`) added to it.  This list is populated with `googleanalytics.data.DataPoint` objects.  Each of these has an associated dimension and metric (i.e. "Firefox" and "30293") and a little more data.
+At this point you should be asking me how this data is returned to you.  In the above examples, the data is returned as a `googleanalytics.data.DataSet` object which is essentially a Python list with two shortcut "properties" (`list`/`tuple`) added to it.  This list is populated with `googleanalytics.data.DataPoint` objects.  Each `DataPoint` has `dimensions` and `metrics` properties, which are just arrays of `googleanalytics.data.Dimension` and `googleanalytics.data.Metric` objects respectively.
 
-So how do you get useful data?  You _could_ iterate over the `DataSet` and access each `DataPoint`'s metric and dimension properties directly, or you could output the whole dataset as a list of lists, tuple or tuples, or dictionary.  Example:
+So how do you get useful data?  The quickest path to the dimension and metric data is to output the whole dataset as a list of lists or a tuple of tuples.  Example:
 
 <pre>
 >>> from googleanalytics import Connection
@@ -122,20 +126,17 @@ So how do you get useful data?  You _could_ iterate over the `DataSet` and acces
 >>> end_date = datetime.date(2009, 04, 10)
 >>> data = account.get_data(start_date, end_date, metrics=['pageviews',], dimensions=['browser',], sort=['-pageviews',])
 >>> data.list
-[['Firefox', 21], ['Internet Explorer', 17], ['Safari', 17], ['Chrome', 6], ['Mozilla Compatible Agent', 5]]
+[[['Firefox'], [21]], [['Internet Explorer'], [17]], [['Safari'], [17]], [['Chrome'], [6]], [['Mozilla Compatible Agent'], [5]]]
 >>> data.tuple
-(('Firefox', 21), ('Internet Explorer', 17), ('Safari', 17), ('Chrome', 6), ('Mozilla Compatible Agent', 5))
->>> data.dict
-{'Chrome': 6, 'Internet Explorer': 17, 'Firefox': 21, 'Safari': 17, 'Mozilla Compatible Agent': 5}
+((['Firefox'], [21]), (['Internet Explorer'], [17]), (['Safari'], [17]), (['Chrome'], [6]), (['Mozilla Compatible Agent'], [5]))
 </pre>
 
-If you're concerned with the sort-order, you shouldn't really use the `dict` output as order isn't guaranteed.  `list` and `tuple` will retain the sorting order that Google Analytics output the data in.
+`list` and `tuple` will retain the sorting order of the Google Analytics results.  Each item in `list` or `tuple` is an ordered pair of lists.  The first list holds the dimensions (it will be an empty list if no dimensions were requested). The second list holds the metrics, in the order they were requested in the `get_data` call.
 
-If you don't add these, we can't really test any future data pulling, and some of the account stuff.  In the future perhaps we can build a list of accounts from get_all_accounts and proceed that way.
 
 #### Pulling multiple dimensions/metrics ####
 
-Patrick Collison has graciously implemented pulling multiple metrics and data in a single request.  Instead of simply passing in a list with one metric or dimension, pass in as many as you like<sup>1</sup>
+Patrick Collison has graciously implemented pulling multiple metrics and dimensions in a single request.  Instead of simply passing in a list with one metric or dimension, pass in as many as you like<sup>1</sup>.  The metrics will be returned in the order they were requested in the `get_data` call.
 
 <pre>
 >>> from googleanalytics import Connection
@@ -144,15 +145,106 @@ Patrick Collison has graciously implemented pulling multiple metrics and data in
 >>> account = connection.get_account('1234')
 >>> end_date = datetime.datetime.today()
 >>> start_date = end_date-datetime.timedelta(days=2)
->>> data = account.get_data(start_date, end_date, metrics=['pageviews','timeOnPage','entrances'], dimensions=['pageTitle', 'pagePath'], max_results=10)
->>> data
-[&lt;DataPoint: ga:7337113 / ga:pageTitle=How to find out more about Clint Ecker - Django Developer | ga:pagePath=/&gt;]
-&gt;&gt;&gt; data.tuple
-((['How to find out more about Clint Ecker - Django Developer', '/'], [5, '0.0', 5]),)
+>>> metrics = ['pageviews','timeOnPage','entrances']
+>>> dimensions = ['pageTitle', 'pagePath']
+>>> data = account.get_data(start_date, end_date, metrics=metrics, dimensions=dimensions, max_results=1)
+>>> data.list
+[[[u'How to find out more about Clint Ecker - Django Developer', u'/'], [5, '0.0', 5]]]
+>>> for row in data.list:
+...     print dict(zip(dimensions, row[0]))
+...     print dict(zip(metrics, row[1]))
+...     print '.'*50
+...
+{'pageTitle': u'How to find out more about Clint Ecker - Django Developer', 'pagePath': u'/'}
+{'entrances': 5, 'pageviews': 5, 'timeOnPage': u'0.0'}
 </pre>
 
 1: The Google Analytics API allows a maximum of 10 metrics and 7 dimensions for a given query, although not every metric/dimension combination is valid. See [the official docs](http://code.google.com/intl/en-US/apis/analytics/docs/gdata/gdataReferenceValidCombos.html) for more details.
 
+#### Working with `Dimension` and `Metric` objects ####
+
+Google Analytics returns far more data than just the metrics and dimensions. As such, a `DataSet` (and each of the `DataPoint` objects it contains) has many attributes that make this data available.  For more information on the exact data that is returned, [see the official docs](http://code.google.com/intl/en-US/apis/analytics/docs/gdata/gdataReferenceDataFeed.html#dataResponse).  In short, all of the top-level Data Feed attributes are direct attributes of the `DataSet` instance, and all of the `entry` attributes of the Data Feed are direct attributes of the `DataPoint` instances.  These attributes are named identically to the names used in the returned Data Feed, with the leading XML namespace prefix removed (e.g. 'dxp:startDate' becomes simply 'startDate').
+
+The examples that follow assume this setup code:
+
+<pre>
+>>> from googleanalytics import Connection
+>>> import datetime
+>>> connection = Connection('[email protected]', 'fakefake')
+>>> account = connection.get_account('1234')
+>>> end_date = datetime.datetime.today()
+>>> start_date = end_date-datetime.timedelta(days=2)
+>>> metrics = ['pageviews','timeOnPage','entrances']
+>>> dimensions = ['pageTitle', 'pagePath']
+>>> dataset = account.get_data(start_date, end_date, metrics=metrics, dimensions=dimensions, max_results=1)
+</pre>
+
+One example of a `DataSet`-level attribute is the `aggregates` property, which is an array of `Metric` objects. It holds the aggregate metric data for the given time span, irrespective of any dimensions:
+<pre>
+>>> dataset.aggregates
+[<googleanalytics.data.Metric object at 0x102094b10>, <googleanalytics.data.Metric object at 0x102094ed0>, <googleanalytics.data.Metric object at 0x102094f10>]
+>>> for metric in dataset.aggregates:
+...     print "%s => %s" % (metric.name, metric.value)
+...
+pageviews => 217870
+timeOnPage => 1.2157589E7
+entrances => 63873
+</pre>
+
+The aggregate metric values are also available as direct attributes of the `DataSet` object:
+
+<pre>
+>>> dataset.pageviews
+217870
+>>> dataset.timeOnPage
+1.2157589E7
+>>> dataset.entrances
+63873
+</pre>
+
+`DataPoint` objects are composed of `Metric` and `Dimension` objects (if applicable).  These metrics and dimensions are available directly through arrays:
+
+<pre>
+>>> dataset
+[<googleanalytics.data.DataPoint object at 0x102094f50>]
+>>> datapoint = dataset[0]
+>>> datapoint.metrics
+[<googleanalytics.data.Metric object at 0x102094b10>, <googleanalytics.data.Metric object at 0x102094ed0>, <googleanalytics.data.Metric object at 0x102094f10>]
+>>> datapoint.dimensions
+[<googleanalytics.data.Dimension object at 0x1020990d0>, <googleanalytics.data.Dimension object at 0x102099110>]
+</pre>
+
+As with the `aggregates` array attribute of `DataSet`, each of these metrics and dimensions is also a direct attribute of the `DataPoint` object:
+
+<pre>
+>>> datapoint.pageviews
+5
+>>> datapoint.timeOnPage
+u'0.0'
+>>> datapoint.entrances
+5
+>>> datapoint.pageTitle
+u'How to find out more about Clint Ecker - Django Developer'
+>>> datapoint.pagePath
+u'/'
+</pre> 
+
+If you really need the low level data for each metric, then you should iterate through the `metrics` array of the `DataPoint` instance:
+
+<pre>
+>>> metric = datapoint.metrics[0]
+>>> metric.name
+u'pageviews'
+>>> metric.value
+5
+>>> metric.type
+u'integer'
+>>> metric.confidenceInterval
+u'0.0'
+</pre>
+
+For now, all metric values are returned as strings unless `type` is `u'integer'`, in which case the code casts the metric value to an integer.
+
 #### Pagination in data results ####
 
 Robert Kosera has added `max_results` and `start_index` to `account.get_data` and they work just like you might expect.  There are examples in `tests.py`.
diff --git a/src/googleanalytics/account.py b/src/googleanalytics/account.py
@@ -1,12 +1,9 @@
 from googleanalytics.exception import GoogleAnalyticsClientError
-from googleanalytics.data import DataPoint, DataSet
+from googleanalytics.data import DataSet
 
 import urllib
 
 filter_operators = ['==', '!=', '>', '<', '>=', '<=', '=~', '!~', '=@', '!@']
-data_converters = {
-   'integer': int,
-}
 
 class Account:
     def __init__(self, connection=None, title=None, id=None,
@@ -149,7 +146,9 @@ def get_data(self, start_date, end_date, metrics, dimensions=[], sort=[], filter
 
         if dimensions:
             data['dimensions'] = ",".join(['ga:' + d for d in dimensions])
+
         data['metrics'] = ",".join(['ga:' + m for m in metrics])
+
         if sort:
             _sort = []
             for s in sort:
@@ -159,36 +158,15 @@ def get_data(self, start_date, end_date, metrics, dimensions=[], sort=[], filter
                     s = s[1:]
                 _sort.append(pre + s)
             data['sort'] = ",".join(_sort)
+
         if filters:
             filter_string = self.process_filters(filters)
             data['filters'] = filter_string
 
-        processed_data = DataSet()
         data = urllib.urlencode(data)
-
         response = self.connection.make_request('GET', path=path, data=data)
         raw_xml = response.read()
-        xml_tree = self.connection.parse_response(raw_xml)
-        data_rows = xml_tree.getiterator('{http://www.w3.org/2005/Atom}entry')
-        for row in data_rows:
-            values = {}
-            ms = row.findall('{http://schemas.google.com/analytics/2009}metric')
-            ds = row.findall('{http://schemas.google.com/analytics/2009}dimension')
-            title = row.find('{http://www.w3.org/2005/Atom}title').text
-            if len(ms) == 0:
-                continue
-            # detect datatype and convert if possible
-            for m in ms:
-                if m.attrib['type'] in data_converters.keys():
-                    m.attrib['value'] = data_converters[m.attrib['type']](m.attrib['value'])
-            dp = DataPoint(
-                account=self,
-                connection=self.connection,
-                title=title,
-                metrics=[m.attrib['value'] for m in ms],
-                dimensions=[d.attrib['value'] for d in ds]
-            )
-            processed_data.append(dp)
+        processed_data = DataSet(raw_xml)
         return processed_data
 
     def process_filters(self, filters):
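
The new `DataSet` implementation is not shown in this part of the diff. As a rough illustration only, the parsing removed from `get_data()` above could be packaged as a standalone function along the following lines. `parse_rows` and `SAMPLE_FEED` are hypothetical names, the sample XML is a hand-written stand-in for a real Data Feed, and `tree.iter()` is used here in place of the older `getiterator()` call so the sketch runs on current Python.

```python
from xml.etree import ElementTree

# Namespaces taken from the removed code in Account.get_data().
ATOM = '{http://www.w3.org/2005/Atom}'
DXP = '{http://schemas.google.com/analytics/2009}'

# Hand-written sample; a real Data Feed carries many more elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:browser=Firefox</title>
    <dxp:dimension name="ga:browser" value="Firefox"/>
    <dxp:metric name="ga:pageviews" type="integer" value="21"/>
    <dxp:metric name="ga:timeOnPage" type="time" value="0.0"/>
  </entry>
</feed>"""


def parse_rows(raw_xml):
    """Hypothetical helper: return rows shaped like [[dimensions], [metrics]]."""
    tree = ElementTree.fromstring(raw_xml)
    rows = []
    for entry in tree.iter(ATOM + 'entry'):
        dims = [d.attrib['value'] for d in entry.findall(DXP + 'dimension')]
        mets = []
        for m in entry.findall(DXP + 'metric'):
            value = m.attrib['value']
            # Only integer metrics are converted; everything else stays a string.
            if m.attrib.get('type') == 'integer':
                value = int(value)
            mets.append(value)
        if not mets:
            continue  # skip entries without metrics, as the removed code did
        rows.append([dims, mets])
    return rows


rows = parse_rows(SAMPLE_FEED)
```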

diff --git a/src/googleanalytics/connection.py b/src/googleanalytics/connection.py
@@ -41,7 +41,7 @@ def get_accounts(self, start_index=1, max_results=None):
         data = urllib.urlencode(data)
         response = self.make_request('GET', path, data=data)
         raw_xml = response.read()
-        xml_tree = self.parse_response(raw_xml)
+        xml_tree = ElementTree.fromstring(raw_xml)
         account_list = []
         accounts = xml_tree.getiterator('{http://www.w3.org/2005/Atom}entry')
         for account in accounts:
@@ -74,10 +74,6 @@ def get_account(self, profile_id):
             if account.profile_id == profile_id:
                 return account
 
-    def parse_response(self, xml):
-        tree = ElementTree.fromstring(xml)
-        return tree
-
     def make_request(self, method, path, headers=None, data=''):
         if headers == None:
             headers = {