Commit: Latest v1.6.10 (#344)
* v1.6.10-rc0

* v1.6.10-rc1

---------

Co-authored-by: rtosholdings-bot <[email protected]>
OrestZborowski-SIG and rtosholdings-bot authored Apr 13, 2023
1 parent 4944f8b commit b7715da
Showing 17 changed files with 177 additions and 166 deletions.
2 changes: 1 addition & 1 deletion conda_recipe/meta.yaml
@@ -16,7 +16,7 @@ requirements:
- setuptools_scm
run:
- python
-    - riptide_cpp >=1.12.0,<2 # run with any (compatible) version in this range
+    - riptide_cpp >=1.12.1,<2 # run with any (compatible) version in this range
- pandas >=0.24,<2.0
- ansi2html >=1.5.2
- ipykernel
12 changes: 9 additions & 3 deletions docs/source/tutorial/RiptableExercises.ipynb
@@ -327,10 +327,11 @@
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
-    "**Create a DateTimeNano of the combined TradeDateTime by simple addition. Riptable knows how to sum the types.**\n",
+    "**Create a DateTimeNano of the combined TradeTime + Date by simple addition. Riptable knows how to sum the types.**\n",
"\n",
"Be careful here, by default you'll get a GMT timezone, you can force NYC with `rt.DateTimeNano(..., from_tz='NYC')`."
]
@@ -887,7 +888,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": "riptable-sphinxdoc",
"language": "python",
"name": "python3"
},
@@ -901,7 +902,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-    "version": "3.9.13"
+    "version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "bd27bbf9d08d999c15d6ab686ecdc65a1056d9b0e13010aed0eef84441088a82"
}
}
},
"nbformat": 4,
12 changes: 9 additions & 3 deletions docs/source/tutorial/RiptableSolutions.ipynb
@@ -489,10 +489,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
-    "**Create a DateTimeNano of the combined TradeDateTime by simple addition. Riptable knows how to sum the types.**\n",
+    "**Create a DateTimeNano of the combined TradeTime + Date by simple addition. Riptable knows how to sum the types.**\n",
"\n",
"Be careful here, by default you'll get a GMT timezone, you can force NYC with `rt.DateTimeNano(..., from_tz='NYC')`."
]
@@ -1362,7 +1363,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": "riptable-sphinxdoc",
"language": "python",
"name": "python3"
},
@@ -1376,7 +1377,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-    "version": "3.9.13"
+    "version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "bd27bbf9d08d999c15d6ab686ecdc65a1056d9b0e13010aed0eef84441088a82"
}
}
},
"nbformat": 4,
38 changes: 23 additions & 15 deletions docs/source/tutorial/tutorial_categoricals.rst
@@ -56,7 +56,10 @@ to sum for each group::
TSLA 50

A Dataset is returned containing the groups from the Categorical and the result
of the operation we called on each group.

Note the prepended '*' in the Symbol column. This indicates that the column
was used as the grouping variable in an operation.
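The reducing grouped operation described in the tutorial text above can be sketched in plain Python. This is only an illustration of the split-apply-combine idea (the helper name and sample data are assumptions, not riptable's implementation):

```python
def grouped_sum(keys, values):
    """Reducing grouped operation: collapse each group to one total,
    like calling .sum() on a Categorical with an aligned values column."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return totals

symbols = ['AAPL', 'MSFT', 'AAPL', 'MSFT', 'TSLA', 'TSLA']
sizes = [10, 20, 10, 20, 25, 25]
print(grouped_sum(symbols, sizes))  # {'AAPL': 20, 'MSFT': 40, 'TSLA': 50}
```

The result has one entry per group, which is why the output Dataset is shorter than the input.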

Categoricals as Split, Apply, Combine Operations
------------------------------------------------
@@ -117,17 +120,17 @@ of the original data::
The alignment of the result to the original data is easier to see if you add
the results to the Dataset::

->>> ds.CumValue = ds.Symbol.cumsum(ds.Value)
->>> # Sort to make the cumulative sum more clear, then display only the relevant columns.
->>> ds.sort_copy('Symbol')
-# Symbol Value2 CumSum
-- ------ ------ ------
-0 AAPL 2 2
-1 AAPL 5 7
-2 MSFT 10 10
-3 MSFT 8 18
-4 TSLA 25 25
-5 TSLA 20 45
+>>> ds.CumValue2 = ds.Symbol.cumsum(ds.Value2)
+>>> # Sort to make the cumulative sum per group more clear, then display only the relevant columns.
+>>> ds.sort_copy('Symbol').col_filter(['Symbol', 'Value2', 'CumValue2'])
+# Symbol Value2 CumValue2
+- ------ ------ ---------
+0 AAPL 2 2
+1 AAPL 5 7
+2 MSFT 10 10
+3 MSFT 8 18
+4 TSLA 25 25
+5 TSLA 20 45
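The non-reducing cumulative sum shown above can be sketched in plain Python — a conceptual illustration only (the helper name is an assumption, not riptable's vectorized implementation): each row receives the running total for its own group, so the result stays aligned with the original rows.

```python
from collections import defaultdict

def grouped_cumsum(keys, values):
    """Non-reducing grouped operation: per-row running total within each
    group, aligned one-to-one with the input rows."""
    running = defaultdict(int)
    out = []
    for k, v in zip(keys, values):
        running[k] += v
        out.append(running[k])
    return out

symbols = ['AAPL', 'AAPL', 'MSFT', 'MSFT', 'TSLA', 'TSLA']
values = [2, 5, 10, 8, 25, 25]
print(grouped_cumsum(symbols, values))  # [2, 7, 10, 18, 25, 50]
```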

A commonly used non-reducing function is ``shift()``. You can use it to
compare values with shifted versions of themselves – for example,
@@ -438,6 +441,9 @@ omitted from calculations on the Categorical::
b 3
c 0

Note that the first column in the output is labeled 'key_0'. This name was
auto-generated because the Categorical was created without an explicit column name.
You can use the :meth:`.FastArray.set_name` method to assign a column name to the
Categorical before doing any grouping operations. The Count column was created by
the ``count()`` method.

Filter Values or Categories from Certain Categorical Operations
---------------------------------------------------------------
@@ -468,7 +474,7 @@ for only that operation.

To filter out an entire category::

->>> ds.Symbol.mean(ds.value, filter=ds.Symbol != 'MSFT')
+>>> ds.Symbol.mean(ds.Value, filter=ds.Symbol != 'MSFT')
*Symbol Value
------- -----
AAPL 10.00
@@ -746,8 +752,8 @@ Notice that the buckets form the groups of a Categorical::
FastArray([b'-3.011->221.182', b'221.182->445.376', b'445.376->669.569', b'669.569->893.763', b'893.763->1117.956'], dtype='|S17') Unique count: 5

To choose your own intervals, provide the endpoints. Here, we define
-bins that cover two intervals: one bin for prices from 0 to 500 (0
-excluded), and one for prices from 500 to 1,000 (500 excluded)::
+bins that cover two intervals: one bin for prices from 0 to 600 (0
+excluded), and one for prices from 600 to 1,200 (600 excluded)::

>>> buckets = [0, 600, 1200]
>>> ds2.PriceBucket2 = rt.cut(ds2.Price, buckets)
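The endpoint behavior described above — lower endpoint excluded, upper endpoint included — can be sketched with the stdlib ``bisect`` module. This is an illustrative re-implementation under those assumed (lo, hi] semantics, not riptable's ``rt.cut`` code; the function name and label format are made up for the example:

```python
import bisect

def cut_buckets(values, bins):
    """Assign each value to a half-open (lo, hi] interval defined by bins."""
    labels = [f'{lo}->{hi}' for lo, hi in zip(bins, bins[1:])]
    out = []
    for v in values:
        if v <= bins[0] or v > bins[-1]:
            out.append(None)  # outside every bucket
            continue
        # Index of the first right endpoint >= v gives the (lo, hi] bucket.
        i = bisect.bisect_left(bins, v, lo=1) - 1
        out.append(labels[i])
    return out

print(cut_buckets([5, 600, 601, 1200], [0, 600, 1200]))
# ['0->600', '0->600', '600->1200', '600->1200']
```

Note that 600 lands in the first bucket because upper endpoints are included, while 0 itself would fall in no bucket.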
@@ -1168,6 +1174,8 @@ Our second function performs two non-reducing operations::
Because the operations in this function are non-reducing operations, the
resulting Dataset is expanded.

Note that until a reported bug is fixed, column names might not persist through grouping operations.

In the next section, `Accums <tutorial_accums.rst>`__, we look at
another way to do multi-key groupings with fancier output.

138 changes: 70 additions & 68 deletions docs/source/tutorial/tutorial_datasets.rst
@@ -1,4 +1,4 @@
-Intro to Riptable Datasets, FastArrays, and Structs
+Riptable Datasets, FastArrays, and Structs
===================================================

What Is a Dataset?
@@ -58,6 +58,9 @@ Another way to think of a Dataset is as a dictionary of same-length
FastArrays, where each key is a column name that’s mapped to a FastArray
of values that all have the same dtype.

For Python dictionary details, see `Python’s
documentation <https://docs.python.org/3/tutorial/datastructures.html#dictionaries>`__.

Use the Dataset Constructor with Dictionary-Style Input
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -371,7 +374,7 @@ statistics:
=============== ==============================
Count Total number of items
Valid Total number of valid values
-Nans            Total number of NaN values
+Nans            Total number of NaN values*
Mean Mean
Std Standard deviation
Min Minimum value
@@ -384,6 +387,9 @@ Max Maximum value
MeanM Mean without top or bottom 10%
=============== ==============================

\*NaN stands for Not a Number, and is commonly used to represent missing data.
For details, see `Working with Missing Data <tutorial_missing_data.rst>`__.
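NaN's behavior is why a separate Valid count is useful: a NaN compares unequal to everything, including itself, so it has to be detected explicitly. A quick stdlib illustration:

```python
import math

x = float('nan')
print(x == x)         # False: NaN is unequal to everything, itself included
print(math.isnan(x))  # True: detection needs isnan, not ==

values = [1.0, float('nan'), 3.0]
valid = [v for v in values if not math.isnan(v)]
print(len(values), len(valid))  # total count 3, valid count 2
```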

You can also use ``describe()`` on a single column::

>>> ds2.Value.describe()
@@ -404,7 +410,8 @@ You can also use ``describe()`` on a single column::
MeanM 0.54

If your Dataset is very large, you can get column statistics with
``statx()``, which you can import from ``riptable.rt_stats``. It gives
``statx()``, which you can import from ``riptable.rt_stats``.
``statx()`` provides rapid sampling and gives
you a few more percentiles than ``describe()`` does, but it works only
on one column at a time::

@@ -436,6 +443,10 @@ See a column’s unique values::
>>> ds2.Symbol.unique()
FastArray([b'AAPL', b'AMZN', b'GME', b'SPY', b'TSLA'], dtype='|S4')

A note about strings in FastArrays: When you view a FastArray of
strings, you’ll see a ‘b’ next to each string. The b's indicate that the strings
are encoded as byte strings, which saves memory compared to storing them as
Unicode strings.
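The ``b`` prefix is standard Python notation for a byte string, which stores one byte per character — matching the fixed-width ``|S4`` dtype shown above. A small stdlib illustration of converting between the two forms:

```python
s = b'AAPL'          # byte string: one byte per character
print(s.decode())    # 'AAPL' -- convert to str when needed
print('AAPL'.encode())  # b'AAPL' -- and back again
print(len(s))        # 4 bytes
```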

Count the number of unique values in a column::

>>> ds2.Symbol.count()
@@ -598,65 +609,6 @@ More often, you’ll probably use filters to get subsets of your data. That's
covered in more detail in `Get and Operate on Subsets of Data Using
Filters <tutorial_filters.rst>`__.

Delete a Column from a Dataset
------------------------------

To delete a column from a Dataset, use ``del ds.ColumnName``.

Hold Two or More Datasets in a Struct
-------------------------------------

When you’re working with multiple Datasets, it can be helpful to keep
them together in a Riptable Struct. Structs were created as a base class
for Datasets. They also replicate Matlab structs.

You can think of a Struct as a Python dictionary, but with attribute
access allowed for keys.

Data structures stored together in a Struct don’t need to be aligned::

>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2

You can access each data structure using attribute-style access. For
example:

>>> s.ds2
# Symbol Size Value
--- ------ ---- -----
0 AAPL 300 0.77
1 AMZN 100 0.44
2 AAPL 300 0.86
3 GME 500 0.70
4 SPY 100 0.09
5 AMZN 300 0.98
6 TSLA 200 0.76
7 SPY 300 0.79
8 TSLA 300 0.13
9 TSLA 300 0.45
10 AAPL 400 0.37
11 AAPL 400 0.93
12 AAPL 400 0.64
13 GME 100 0.82
14 AMZN 100 0.44
... ... ... ...
35 GME 200 0.19
36 TSLA 400 0.13
37 SPY 200 0.48
38 AMZN 500 0.23
39 GME 400 0.67
40 AAPL 300 0.44
41 SPY 100 0.83
42 TSLA 500 0.70
43 AAPL 500 0.31
44 AAPL 100 0.83
45 AAPL 200 0.80
46 AMZN 400 0.39
47 AMZN 500 0.29
48 AMZN 300 0.68
49 AMZN 400 0.14

Perform Operations on Dataset Columns
-------------------------------------

@@ -737,6 +689,11 @@ right length for the Dataset you want to add it to::
1 1 5 6 6.10 -2.25 22
2 2 5 7 7.10 -4.00 24

Delete a Column from a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To delete a column from a Dataset, use ``del ds.ColumnName``.

Reducing Operations vs. Non-Reducing Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -992,14 +949,59 @@ separate columns, so we’ll add a fix::
3 AAPL:191018:260 0.45 -0.14 AAPL 191018 C 260.00
4 AAPL:191018:265 -0.81 0.68 AAPL 191018 P 265.00

A note about strings in FastArrays: When you view a FastArray of
strings, you’ll see a ‘b’ next to each string::
Hold Two or More Datasets in a Struct
-------------------------------------

>>> ds5.Symbol
FastArray([b'SPY', b'SPY', b'TLT', b'AAPL', b'AAPL'], dtype='|S4')
When you’re working with multiple Datasets, it can be helpful to keep
them together in a Riptable Struct. Structs were created as a base class
for Datasets. They also replicate Matlab structs.

These b's indicate that the strings are encoded to byte strings,
which saves memory compared to saving strings as ASCII.
You can think of a Struct as a Python dictionary, but with attribute
access allowed for keys.
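The dictionary-with-attribute-access idea can be sketched in a few lines. This is a minimal illustration of the concept only, not riptable's Struct implementation:

```python
class Struct:
    """A dict whose keys are also reachable as attributes."""
    def __init__(self):
        # Bypass our own __setattr__ so _data itself isn't stored as a key.
        object.__setattr__(self, '_data', {})

    def __setattr__(self, name, value):
        self._data[name] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def keys(self):
        return self._data.keys()

s = Struct()
s.ds = {'a': 1}   # stand-ins for Datasets; contents need not be aligned
s.ds2 = {'b': 2}
print(list(s.keys()))  # ['ds', 'ds2']
```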

Data structures stored together in a Struct don’t need to be aligned::

>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2

You can access each data structure using attribute-style access. For
example:

>>> s.ds2
# Symbol Size Value
--- ------ ---- -----
0 AAPL 300 0.77
1 AMZN 100 0.44
2 AAPL 300 0.86
3 GME 500 0.70
4 SPY 100 0.09
5 AMZN 300 0.98
6 TSLA 200 0.76
7 SPY 300 0.79
8 TSLA 300 0.13
9 TSLA 300 0.45
10 AAPL 400 0.37
11 AAPL 400 0.93
12 AAPL 400 0.64
13 GME 100 0.82
14 AMZN 100 0.44
... ... ... ...
35 GME 200 0.19
36 TSLA 400 0.13
37 SPY 200 0.48
38 AMZN 500 0.23
39 GME 400 0.67
40 AAPL 300 0.44
41 SPY 100 0.83
42 TSLA 500 0.70
43 AAPL 500 0.31
44 AAPL 100 0.83
45 AAPL 200 0.80
46 AMZN 400 0.39
47 AMZN 500 0.29
48 AMZN 300 0.68
49 AMZN 400 0.14

Riptable has a few other methods for operating on strings. We'll use them as
the basis for filtering data in the next section, `Get and Operate on Subsets
8 changes: 4 additions & 4 deletions docs/source/tutorial/tutorial_datetimes.rst
@@ -19,9 +19,9 @@ information for display.

A Date object stores an array of dates with no time data attached. You
can create Date arrays from strings, integer date values, or Matlab
-ordinal dates. Creating Date arrays from strings is fairly common.
+ordinal dates. For Matlab details, see `Matlab Dates and Times <https://www.mathworks.com/help/matlab/date-and-time-operations.html>`__.

-If your string dates are in YYYYMMDD format, you can simply pass the
+Creating Date arrays from strings is fairly common. If your string dates are in YYYYMMDD format, you can simply pass the
list of strings to ``rt.Date()``::

>>> rt.Date(['20210101', '20210519', '20220308'])
@@ -33,8 +33,8 @@ what to expect using Python ``strptime`` format code::
>>> rt.Date(['12/31/19', '6/30/19', '02/21/19'], format='%m/%d/%y')
Date(['2019-12-31', '2019-06-30', '2019-02-21'])

-For a list of format codes, see `Python’s
-documentation <https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior>`__.
+For a list of format codes and ``strptime`` implementation details, see `Python’s
+'strftime' cheatsheet <https://strftime.org/>`__. The formatting codes are the same for ``strftime`` and ``strptime``.
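The same %-codes can be exercised directly with the stdlib ``datetime`` module — ``strptime`` parses with them and ``strftime`` formats with them:

```python
from datetime import datetime

# Parse the tutorial's '12/31/19' example, then re-format it.
d = datetime.strptime('12/31/19', '%m/%d/%y')
print(d.date())              # 2019-12-31
print(d.strftime('%Y%m%d'))  # 20191231
```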

Note: Under the hood, dates are stored as integers – specifically, as
the number of days since the Unix epoch, 01-01-1970::
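The days-since-epoch storage described above can be checked with the stdlib — a sketch of the concept, not riptable's internals:

```python
from datetime import date

# A Date is conceptually an integer count of days since the Unix epoch.
epoch = date(1970, 1, 1)
print((date(2021, 1, 1) - epoch).days)  # 18628
```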
2 changes: 1 addition & 1 deletion docs/source/tutorial/tutorial_filters.rst
@@ -41,7 +41,7 @@ the result is an array of Booleans.

>>> a = rt.FastArray([1, 2, 3, 4, 5])
>>> b = rt.FastArray([0, 5, 2, 4, 8])
->>> a < 3
+>>> a > 3
FastArray([False, False, False, True, True])
>>> a <= b
FastArray([False, True, False, True, True])
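The elementwise comparison semantics shown above can be mimicked with plain lists — purely illustrative, since FastArray performs these comparisons as vectorized array operations:

```python
a = [1, 2, 3, 4, 5]
b = [0, 5, 2, 4, 8]

# Scalar comparison broadcasts across every element.
print([x > 3 for x in a])              # [False, False, False, True, True]

# Array-to-array comparison pairs elements positionally.
print([x <= y for x, y in zip(a, b)])  # [False, True, False, True, True]
```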