Updated docs to document uint for count, plus list glyphs fully (holoviz#923)
stjordanis · May 25, 2020 · 3ebf18f
1 parent 26f03fa
Showing 2 changed files with 26 additions and 15 deletions.
diff --git a/datashader/reductions.py b/datashader/reductions.py
@@ -123,7 +123,7 @@ def _build_finalize(self, dshape):
 
 
 class OptionalFieldReduction(Reduction):
-    """Base class for things like ``count`` or ``any``"""
+    """Base class for things like ``count`` or ``any`` for which the field is optional"""
     def __init__(self, column=None):
         self.column = column
 
@@ -216,7 +216,7 @@ def finalize(bases, cuda=False, **kwargs):
         return finalize
 
 class count(OptionalFieldReduction):
-    """Count elements in each bin.
+    """Count elements in each bin, returning the result as a uint32.
 
     Parameters
     ----------

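Note on the `count` docstring change above: the sketch below illustrates what "count elements in each bin" with an unsigned integer result means. It is a minimal illustration in plain Python, not datashader's implementation. The function name `count_points`, its parameters, and the use of `array("I")` as a stand-in for a uint32 aggregate are all assumptions made for this example.

```python
from array import array

def count_points(xs, ys, x_range, y_range, width, height):
    """Count points per bin of a width x height grid, stored row-major."""
    # Typecode "I" is a C unsigned int (32 bits on common platforms); it is
    # used here only as a stand-in for the uint32 aggregate in the docstring.
    counts = array("I", [0] * (width * height))
    x0, x1 = x_range
    y0, y1 = y_range
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # this sketch simply ignores points outside the ranges
        # Map the coordinate to a column/row index, clamping the upper edge.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[row * width + col] += 1
    return counts

# Three points on a 2x2 grid covering the unit square.
counts = count_points([0.1, 0.15, 0.9], [0.1, 0.12, 0.9], (0, 1), (0, 1), 2, 2)
```

Because the counts are unsigned integers, an empty bin holds 0 rather than `NaN`, which is why zero has to serve as the "no data" marker for count aggregates.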
diff --git a/examples/getting_started/2_Pipeline.ipynb b/examples/getting_started/2_Pipeline.ipynb
@@ -145,34 +145,45 @@
     "\n",
     "<!-- This section really belongs under Scene, above-->\n",
     "\n",
-    "Once a `Canvas` object has been specified, it can then be used to guide aggregating the data into a fixed-sized grid. You'll first need to know what your data points represent, i.e., what form each datapoint should take as it maps onto the rectangular grid. The library currently supports:\n",
-    "   - **Canvas.points**: mapping each datapoint into the single closest grid cell to that datapoint's location\n",
-    "   - **Canvas.lines**: mapping each datapoint into every grid cell falling between this point's location and the next.  \n",
-    "   - **Canvas.raster**: mapping each datapoint into an axis-aligned rectangle forming a regular grid with adjacent points.\n",
+    "Once a `Canvas` object has been specified, it can then be used to guide aggregating the data into a fixed-sized grid. Data is assumed to consist of a series of items, each of which has some visible representation (its rendering as a \"glyph\") that is combined with the representation of other items to produce an aggregate representation of the whole set of items in the rectangular grid. The available glyph types for representing a data item are currently:\n",
+    "   - **Canvas.points**: each data item is a coordinate location (an x,y pair), mapping into the single closest grid cell to that datapoint's location.\n",
+    "   - **Canvas.line**: each data item is a coordinate location, mapping into every grid cell falling between this point's location and the next in a straight line segment.\n",
+    "   - **Canvas.area**: each data item is a coordinate location, rendered as a shape filling the axis-aligned area between this point, the next point, and a baseline (e.g. zero, filling the area between a line and a base).\n",
+    "   - **Canvas.trimesh**: each data item is a triple of coordinate locations specifying a triangle, filling in the region bounded by that triangle.\n",
+    "   - **Canvas.polygons**: each data item is a sequence of coordinate locations specifying a polygon, filling in the region bounded by that polygon (minus holes if specified separately).\n",
+    "   - **Canvas.raster**: the collection of data items is an array specifying regularly spaced axis-aligned rectangles forming a regular grid; each cell in this array is rendered as a filled rectangle.\n",
+    "   - **Canvas.quadmesh**: the collection of data items is an array specifying irregularly spaced quadrilaterals forming a grid that is regular in the input space but can have arbitrary rectilinear or curvilinear shapes in the aggregate grid; each cell in this array is rendered as a filled quadrilateral.\n",
     "\n",
-    "Datashader can be extended to add additional types here and in each section below; see  [Extending Datashader](../user_guide/9-Extending.ipynb) for more details.  Other plots like time series and network graphs are constructed out of these basic primitives.\n",
+    "These types are each covered in detail in the [User Guide](../user_guide/).  Datashader can be extended to add additional types here and in each section below; see  [Extending Datashader](../user_guide/9-Extending.ipynb) for more details.  Many other plots like time series and network graphs can be constructed out of these basic primitives.\n",
     "\n",
     "\n",
-    "<!-- (to here) -->\n",
-    "\n",
+    "<!-- (to here) -->"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "### Reductions\n",
     "\n",
-    "One you have determined your mapping, you'll next need to choose a reduction operator to use when aggregating multiple datapoints into a given pixel. All of the currently supported reduction operators are incremental, which means that we can efficiently process datasets in a single pass. Given an aggregate bin to update (typically corresponding to one eventual pixel) and a new datapoint, the reduction operator updates the state of the bin in some way. (Actually, datapoints are normally processed in batches for efficiency, but it's simplest to think about the operator as being applied per data point, and the mathematical result should be the same.) A large number of useful [reduction operators]((https://datashader.org/api.html#reductions) are supplied in `ds.reductions`, including:\n",
+    "Once you have determined your mapping, you'll next need to choose a reduction operator to use when aggregating multiple datapoints into a given pixel. For points, each datapoint is mapped into a single pixel, while the other glyphs have spatial extent and can thus map into multiple pixels, each of which operates the same way. All glyphs act like points if the entire glyph is contained within a single pixel. Here we will talk only about \"datapoints\" for simplicity, which for an area-based glyph should be interpreted as \"the part of that glyph that falls into this pixel\".\n",
+    "\n",
+    "All of the currently supported reduction operators are incremental, which means that we can efficiently process datasets in a single pass. Given an aggregate bin to update (typically corresponding to one eventual pixel) and a new datapoint, the reduction operator updates the state of the bin in some way. (Actually, datapoints are normally processed in batches for efficiency, but it's simplest to think about the operator as being applied per data point, and the mathematical result should be the same.) A large number of useful [reduction operators](https://datashader.org/api.html#reductions) are supplied in `ds.reductions`, including:\n",
     "\n",
     "**`count(column=None)`**:\n",
-    "  increment an integer count each time a datapoint maps to this bin.\n",
+    "  increment an integer count each time a datapoint maps to this bin. The resulting aggregate array will be an unsigned integer type, allowing counts to be distinguished from the other types that are normally floating point.\n",
     "  \n",
     "**`any(column=None)`**:\n",
     "  the bin is set to 1 if any datapoint maps to it, and 0 otherwise.\n",
     "  \n",
     "**`sum(column)`**:\n",
     "  add the value of the given column for this datapoint to a running total for this bin.\n",
     "   \n",
-    "**`count_cat(column)`**:\n",
-    "  given a bin with categorical data (i.e., [Pandas' `categorical` datatype](https://pandas-docs.github.io/pandas-docs-travis/categorical.html)), count each category separately, adding the given datapoint to an appropriate category within this bin.  These categories can later be collapsed into a single count if needed; see example below.\n",
+    "**`by(column, reduction)`**:\n",
+    "  given a bin with categorical data (i.e., [Pandas' `categorical` datatype](https://pandas-docs.github.io/pandas-docs-travis/categorical.html)), aggregate each category separately, accumulating the given datapoint in an appropriate category within this bin.  These categories can later be collapsed into a single aggregate if needed; see examples below.\n",
     "  \n",
     "**`summary(name1=op1,name2=op2,...)`**:\n",
-    "  allows multiple reduction operators to be computed in a single pass over the data; just provide a name for each resulting aggregate and the corresponding reduction operator to use when creating that aggregate.\n",
+    "  allows multiple reduction operators to be computed in a single pass over the data; just provide a name for each resulting aggregate and the corresponding reduction operator to use when creating that aggregate. If multiple aggregates are needed for the same dataset and the same Canvas, using `summary` will generally be much more efficient than making multiple separate passes over the dataset.\n",
     "  \n",
     "The API documentation contains the complete list of [reduction operators]((https://datashader.org/api.html#reductions) provided, including `mean`, `min`, `max`, `var` (variance), `std` (standard deviation).  The reductions are also imported into the ``datashader`` namespace for convenience, so that they can be accessed like ``ds.mean()`` here.\n",
     "\n",
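Note on the Reductions text above: the idea of incremental reduction operators, and of computing several of them in one pass as `summary` does, can be sketched in plain Python. This is an illustration only and not datashader's code. The classes `Count`, `Sum`, and `Mean` and the helper `single_pass_summary` are hypothetical names invented for this example, and the choice to start `Sum` at 0.0 for an empty bin is a simplification made here.

```python
import math

class Count:
    """Increment an integer each time a datapoint maps to the bin."""
    def __init__(self):
        self.state = 0
    def update(self, value):
        self.state += 1
    def result(self):
        return self.state

class Sum:
    """Add each datapoint's value to a running total for the bin."""
    def __init__(self):
        self.state = 0.0  # simplification: an empty bin reports 0.0 here
    def update(self, value):
        self.state += value
    def result(self):
        return self.state

class Mean:
    """Track a running total and count; divide only when asked for the result."""
    def __init__(self):
        self.total, self.n = 0.0, 0
    def update(self, value):
        self.total += value
        self.n += 1
    def result(self):
        return self.total / self.n if self.n else math.nan

def single_pass_summary(datapoints, n_bins, **ops):
    """datapoints: iterable of (bin_index, value); ops: name -> reduction class."""
    bins = {name: [op() for _ in range(n_bins)] for name, op in ops.items()}
    for bin_index, value in datapoints:  # the data is traversed exactly once
        for reducers in bins.values():
            reducers[bin_index].update(value)
    return {name: [r.result() for r in rs] for name, rs in bins.items()}

data = [(0, 2.0), (0, 4.0), (2, 1.0)]
aggs = single_pass_summary(data, 3, n=Count, total=Sum, avg=Mean)
```

Each operator only needs the bin's current state and the new datapoint, so all named aggregates can be updated inside the same loop instead of re-reading the data once per aggregate.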
@@ -372,7 +383,7 @@
     "\n",
     "In each of the above examples, you may have noticed that we were never required to specify any parameters about the data values; the plots just appear like magic.  That magic is implemented in `tf.shade`.  What `tf.shade` does for a 2D aggregate (non-categorical) is:\n",
     "\n",
-    "1. **Mask** out all bins with a `NaN` value (for floating-point arrays) or a zero value (for unsigned integer count arrays); these bins will not have any effect on subsequent computations.  Unfortunately, integer arrays do not support `NaN`; using zero as a pseudo-`NaN` works well for counts but not for all integer data, which is something that may need to be generalized in a future version of the library (a [to-do item](https://github.com/bokeh/datashader/issues/142)).\n",
+    "1. **Mask** out all bins with a `NaN` value (for floating-point arrays) or a zero value (for the unsigned integer arrays that are returned from `count`); these bins will not have any effect on subsequent computations.  \n",
     "\n",
     "2. **Transform** the bin values using a specified scalar function `how`.  Calculates the value of that function for the difference between each bin value and the minimum non-masked bin value.  E.g. for `how=\"linear\"`, simply returns the difference unchanged.  Other `how` functions are discussed below.\n",
     "\n",