Commit: Latest v1.6.10 (#344)
* v1.6.10-rc0

* v1.6.10-rc1

---------

Co-authored-by: rtosholdings-bot <[email protected]>
OrestZborowski-SIG and rtosholdings-bot authored Apr 13, 2023
1 parent 4944f8b commit b7715da
Showing 17 changed files with 177 additions and 166 deletions.
2 changes: 1 addition & 1 deletion conda_recipe/meta.yaml
@@ -16,7 +16,7 @@ requirements:
- setuptools_scm
run:
- python
-    - riptide_cpp >=1.12.0,<2 # run with any (compatible) version in this range
+    - riptide_cpp >=1.12.1,<2 # run with any (compatible) version in this range
- pandas >=0.24,<2.0
- ansi2html >=1.5.2
- ipykernel
12 changes: 9 additions & 3 deletions docs/source/tutorial/RiptableExercises.ipynb
@@ -327,10 +327,11 @@
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
-    "**Create a DateTimeNano of the combined TradeDateTime by simple addition. Riptable knows how to sum the types.**\n",
+    "**Create a DateTimeNano of the combined TradeTime + Date by simple addition. Riptable knows how to sum the types.**\n",
"\n",
"Be careful here, by default you'll get a GMT timezone, you can force NYC with `rt.DateTimeNano(..., from_tz='NYC')`."
]
@@ -887,7 +888,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": "riptable-sphinxdoc",
"language": "python",
"name": "python3"
},
@@ -901,7 +902,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-    "version": "3.9.13"
+    "version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "bd27bbf9d08d999c15d6ab686ecdc65a1056d9b0e13010aed0eef84441088a82"
}
}
},
"nbformat": 4,
12 changes: 9 additions & 3 deletions docs/source/tutorial/RiptableSolutions.ipynb
@@ -489,10 +489,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
-    "**Create a DateTimeNano of the combined TradeDateTime by simple addition. Riptable knows how to sum the types.**\n",
+    "**Create a DateTimeNano of the combined TradeTime + Date by simple addition. Riptable knows how to sum the types.**\n",
"\n",
"Be careful here, by default you'll get a GMT timezone, you can force NYC with `rt.DateTimeNano(..., from_tz='NYC')`."
]
@@ -1362,7 +1363,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": "riptable-sphinxdoc",
"language": "python",
"name": "python3"
},
@@ -1376,7 +1377,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-    "version": "3.9.13"
+    "version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "bd27bbf9d08d999c15d6ab686ecdc65a1056d9b0e13010aed0eef84441088a82"
}
}
},
"nbformat": 4,
38 changes: 23 additions & 15 deletions docs/source/tutorial/tutorial_categoricals.rst
@@ -56,7 +56,10 @@ to sum for each group::
TSLA 50

A Dataset is returned containing the groups from the Categorical and the result
of the operation we called on each group.

Note the prepended '*' in the Symbol column. This indicates that the column
was used as the grouping variable in an operation.
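The reducing grouped operation described in the tutorial text above can be sketched in plain Python. This is only an illustration of the split-apply-combine idea (the helper name and sample data are assumptions, not riptable's implementation):

```python
def grouped_sum(keys, values):
    """Reducing grouped operation: collapse each group to one total,
    like calling .sum() on a Categorical with an aligned values column."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return totals

symbols = ['AAPL', 'MSFT', 'AAPL', 'MSFT', 'TSLA', 'TSLA']
sizes = [10, 20, 10, 20, 25, 25]
print(grouped_sum(symbols, sizes))  # {'AAPL': 20, 'MSFT': 40, 'TSLA': 50}
```

The result has one entry per group, which is why the output Dataset is shorter than the input.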

Categoricals as Split, Apply, Combine Operations
------------------------------------------------
@@ -117,17 +120,17 @@ of the original data::
The alignment of the result to the original data is easier to see if you add
the results to the Dataset::

->>> ds.CumValue = ds.Symbol.cumsum(ds.Value)
->>> # Sort to make the cumulative sum more clear, then display only the relevant columns.
->>> ds.sort_copy('Symbol')
-# Symbol Value2 CumSum
-- ------ ------ ------
-0 AAPL 2 2
-1 AAPL 5 7
-2 MSFT 10 10
-3 MSFT 8 18
-4 TSLA 25 25
-5 TSLA 20 45
+>>> ds.CumValue2 = ds.Symbol.cumsum(ds.Value2)
+>>> # Sort to make the cumulative sum per group more clear, then display only the relevant columns.
+>>> ds.sort_copy('Symbol').col_filter(['Symbol', 'Value2', 'CumValue2'])
+# Symbol Value2 CumValue2
+- ------ ------ ---------
+0 AAPL 2 2
+1 AAPL 5 7
+2 MSFT 10 10
+3 MSFT 8 18
+4 TSLA 25 25
+5 TSLA 20 45
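The non-reducing cumulative sum shown above can be sketched in plain Python — a conceptual illustration only (the helper name is an assumption, not riptable's vectorized implementation): each row receives the running total for its own group, so the result stays aligned with the original rows.

```python
from collections import defaultdict

def grouped_cumsum(keys, values):
    """Non-reducing grouped operation: per-row running total within each
    group, aligned one-to-one with the input rows."""
    running = defaultdict(int)
    out = []
    for k, v in zip(keys, values):
        running[k] += v
        out.append(running[k])
    return out

symbols = ['AAPL', 'AAPL', 'MSFT', 'MSFT', 'TSLA', 'TSLA']
values = [2, 5, 10, 8, 25, 25]
print(grouped_cumsum(symbols, values))  # [2, 7, 10, 18, 25, 50]
```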

A commonly used non-reducing function is ``shift()``. You can use it to
compare values with shifted versions of themselves – for example,
@@ -438,6 +441,9 @@ omitted from calculations on the Categorical::
b 3
c 0

Note that the first column in the output is labeled 'key_0'. This name was
auto-generated because the Categorical was created without an explicit column name.
You can use the :meth:`.FastArray.set_name` method to assign a column name to the
Categorical before doing any grouping operations. The Count column was created by
the ``count()`` method.

Filter Values or Categories from Certain Categorical Operations
---------------------------------------------------------------
@@ -468,7 +474,7 @@ for only that operation.

To filter out an entire category::

->>> ds.Symbol.mean(ds.value, filter=ds.Symbol != 'MSFT')
+>>> ds.Symbol.mean(ds.Value, filter=ds.Symbol != 'MSFT')
*Symbol Value
------- -----
AAPL 10.00
@@ -746,8 +752,8 @@ Notice that the buckets form the groups of a Categorical::
FastArray([b'-3.011->221.182', b'221.182->445.376', b'445.376->669.569', b'669.569->893.763', b'893.763->1117.956'], dtype='|S17') Unique count: 5

To choose your own intervals, provide the endpoints. Here, we define
-bins that cover two intervals: one bin for prices from 0 to 500 (0
-excluded), and one for prices from 500 to 1,000 (500 excluded)::
+bins that cover two intervals: one bin for prices from 0 to 600 (0
+excluded), and one for prices from 600 to 1,200 (600 excluded)::

>>> buckets = [0, 600, 1200]
>>> ds2.PriceBucket2 = rt.cut(ds2.Price, buckets)
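The endpoint behavior described above — lower endpoint excluded, upper endpoint included — can be sketched with the stdlib ``bisect`` module. This is an illustrative re-implementation under those assumed (lo, hi] semantics, not riptable's ``rt.cut`` code; the function name and label format are made up for the example:

```python
import bisect

def cut_buckets(values, bins):
    """Assign each value to a half-open (lo, hi] interval defined by bins."""
    labels = [f'{lo}->{hi}' for lo, hi in zip(bins, bins[1:])]
    out = []
    for v in values:
        if v <= bins[0] or v > bins[-1]:
            out.append(None)  # outside every bucket
            continue
        # Index of the first right endpoint >= v gives the (lo, hi] bucket.
        i = bisect.bisect_left(bins, v, lo=1) - 1
        out.append(labels[i])
    return out

print(cut_buckets([5, 600, 601, 1200], [0, 600, 1200]))
# ['0->600', '0->600', '600->1200', '600->1200']
```

Note that 600 lands in the first bucket because upper endpoints are included, while 0 itself would fall in no bucket.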
@@ -1168,6 +1174,8 @@ Our second function performs two non-reducing operations::
Because the operations in this function are non-reducing operations, the
resulting Dataset is expanded.

Note that until a reported bug is fixed, column names might not persist through grouping operations.

In the next section, `Accums <tutorial_accums.rst>`__, we look at
another way to do multi-key groupings with fancier output.

138 changes: 70 additions & 68 deletions docs/source/tutorial/tutorial_datasets.rst
@@ -1,4 +1,4 @@
-Intro to Riptable Datasets, FastArrays, and Structs
+Riptable Datasets, FastArrays, and Structs
===================================================

What Is a Dataset?
@@ -58,6 +58,9 @@ Another way to think of a Dataset is as a dictionary of same-length
FastArrays, where each key is a column name that’s mapped to a FastArray
of values that all have the same dtype.

For Python dictionary details, see `Python’s
documentation <https://docs.python.org/3/tutorial/datastructures.html#dictionaries>`__.

Use the Dataset Constructor with Dictionary-Style Input
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -371,7 +374,7 @@ statistics:
=============== ==============================
Count Total number of items
Valid Total number of valid values
-Nans            Total number of NaN values
+Nans            Total number of NaN values*
Mean Mean
Std Standard deviation
Min Minimum value
@@ -384,6 +387,9 @@ Max Maximum value
MeanM Mean without top or bottom 10%
=============== ==============================

\*NaN stands for Not a Number, and is commonly used to represent missing data.
For details, see `Working with Missing Data <tutorial_missing_data.rst>`__.
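NaN's behavior is why a separate Valid count is useful: a NaN compares unequal to everything, including itself, so it has to be detected explicitly. A quick stdlib illustration:

```python
import math

x = float('nan')
print(x == x)         # False: NaN is unequal to everything, itself included
print(math.isnan(x))  # True: detection needs isnan, not ==

values = [1.0, float('nan'), 3.0]
valid = [v for v in values if not math.isnan(v)]
print(len(values), len(valid))  # total count 3, valid count 2
```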

You can also use ``describe()`` on a single column::

>>> ds2.Value.describe()
@@ -404,7 +410,8 @@ You can also use ``describe()`` on a single column::
MeanM 0.54

If your Dataset is very large, you can get column statistics with
``statx()``, which you can import from ``riptable.rt_stats``. It gives
``statx()``, which you can import from ``riptable.rt_stats``.
``statx()`` provides rapid sampling and gives
you a few more percentiles than ``describe()`` does, but it works only
on one column at a time::

@@ -436,6 +443,10 @@ See a column’s unique values::
>>> ds2.Symbol.unique()
FastArray([b'AAPL', b'AMZN', b'GME', b'SPY', b'TSLA'], dtype='|S4')

A note about strings in FastArrays: When you view a FastArray of
strings, you’ll see a ‘b’ next to each string. The b's indicate that the strings
are encoded as byte strings, which saves memory compared to storing them as
Unicode strings.
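The ``b`` prefix is standard Python notation for a byte string, which stores one byte per character — matching the fixed-width ``|S4`` dtype shown above. A small stdlib illustration of converting between the two forms:

```python
s = b'AAPL'          # byte string: one byte per character
print(s.decode())    # 'AAPL' -- convert to str when needed
print('AAPL'.encode())  # b'AAPL' -- and back again
print(len(s))        # 4 bytes
```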

Count the number of unique values in a column::

>>> ds2.Symbol.count()
@@ -598,65 +609,6 @@ More often, you’ll probably use filters to get subsets of your data. That's
covered in more detail in `Get and Operate on Subsets of Data Using
Filters <tutorial_filters.rst>`__.

Delete a Column from a Dataset
------------------------------

To delete a column from a Dataset, use ``del ds.ColumnName``.

Hold Two or More Datasets in a Struct
-------------------------------------

When you’re working with multiple Datasets, it can be helpful to keep
them together in a Riptable Struct. Structs were created as a base class
for Datasets. They also replicate Matlab structs.

You can think of a Struct as a Python dictionary, but with attribute
access allowed for keys.

Data structures stored together in a Struct don’t need to be aligned::

>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2

You can access each data structure using attribute-style access. For
example:

>>> s.ds2
# Symbol Size Value
--- ------ ---- -----
0 AAPL 300 0.77
1 AMZN 100 0.44
2 AAPL 300 0.86
3 GME 500 0.70
4 SPY 100 0.09
5 AMZN 300 0.98
6 TSLA 200 0.76
7 SPY 300 0.79
8 TSLA 300 0.13
9 TSLA 300 0.45
10 AAPL 400 0.37
11 AAPL 400 0.93
12 AAPL 400 0.64
13 GME 100 0.82
14 AMZN 100 0.44
... ... ... ...
35 GME 200 0.19
36 TSLA 400 0.13
37 SPY 200 0.48
38 AMZN 500 0.23
39 GME 400 0.67
40 AAPL 300 0.44
41 SPY 100 0.83
42 TSLA 500 0.70
43 AAPL 500 0.31
44 AAPL 100 0.83
45 AAPL 200 0.80
46 AMZN 400 0.39
47 AMZN 500 0.29
48 AMZN 300 0.68
49 AMZN 400 0.14

Perform Operations on Dataset Columns
-------------------------------------

@@ -737,6 +689,11 @@ right length for the Dataset you want to add it to::
1 1 5 6 6.10 -2.25 22
2 2 5 7 7.10 -4.00 24

Delete a Column from a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To delete a column from a Dataset, use ``del ds.ColumnName``.

Reducing Operations vs. Non-Reducing Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -992,14 +949,59 @@ separate columns, so we’ll add a fix::
3 AAPL:191018:260 0.45 -0.14 AAPL 191018 C 260.00
4 AAPL:191018:265 -0.81 0.68 AAPL 191018 P 265.00

A note about strings in FastArrays: When you view a FastArray of
strings, you’ll see a ‘b’ next to each string::
Hold Two or More Datasets in a Struct
-------------------------------------

>>> ds5.Symbol
FastArray([b'SPY', b'SPY', b'TLT', b'AAPL', b'AAPL'], dtype='|S4')
When you’re working with multiple Datasets, it can be helpful to keep
them together in a Riptable Struct. Structs were created as a base class
for Datasets. They also replicate Matlab structs.

These b's indicate that the strings are encoded to byte strings,
which saves memory compared to saving strings as ASCII.
You can think of a Struct as a Python dictionary, but with attribute
access allowed for keys.
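The dictionary-with-attribute-access idea can be sketched in a few lines. This is a minimal illustration of the concept only, not riptable's Struct implementation:

```python
class Struct:
    """A dict whose keys are also reachable as attributes."""
    def __init__(self):
        # Bypass our own __setattr__ so _data itself isn't stored as a key.
        object.__setattr__(self, '_data', {})

    def __setattr__(self, name, value):
        self._data[name] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def keys(self):
        return self._data.keys()

s = Struct()
s.ds = {'a': 1}   # stand-ins for Datasets; contents need not be aligned
s.ds2 = {'b': 2}
print(list(s.keys()))  # ['ds', 'ds2']
```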

Data structures stored together in a Struct don’t need to be aligned::

>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2

You can access each data structure using attribute-style access. For
example:

>>> s.ds2
# Symbol Size Value
--- ------ ---- -----
0 AAPL 300 0.77
1 AMZN 100 0.44
2 AAPL 300 0.86
3 GME 500 0.70
4 SPY 100 0.09
5 AMZN 300 0.98
6 TSLA 200 0.76
7 SPY 300 0.79
8 TSLA 300 0.13
9 TSLA 300 0.45
10 AAPL 400 0.37
11 AAPL 400 0.93
12 AAPL 400 0.64
13 GME 100 0.82
14 AMZN 100 0.44
... ... ... ...
35 GME 200 0.19
36 TSLA 400 0.13
37 SPY 200 0.48
38 AMZN 500 0.23
39 GME 400 0.67
40 AAPL 300 0.44
41 SPY 100 0.83
42 TSLA 500 0.70
43 AAPL 500 0.31
44 AAPL 100 0.83
45 AAPL 200 0.80
46 AMZN 400 0.39
47 AMZN 500 0.29
48 AMZN 300 0.68
49 AMZN 400 0.14

Riptable has a few other methods for operating on strings. We'll use them as
the basis for filtering data in the next section, `Get and Operate on Subsets
8 changes: 4 additions & 4 deletions docs/source/tutorial/tutorial_datetimes.rst
@@ -19,9 +19,9 @@ information for display.

A Date object stores an array of dates with no time data attached. You
can create Date arrays from strings, integer date values, or Matlab
-ordinal dates. Creating Date arrays from strings is fairly common.
+ordinal dates. For Matlab details, see `Matlab Dates and Times <https://www.mathworks.com/help/matlab/date-and-time-operations.html>`__.

-If your string dates are in YYYYMMDD format, you can simply pass the
+Creating Date arrays from strings is fairly common. If your string dates are in YYYYMMDD format, you can simply pass the
list of strings to ``rt.Date()``::

>>> rt.Date(['20210101', '20210519', '20220308'])
@@ -33,8 +33,8 @@ what to expect using Python ``strptime`` format code::
>>> rt.Date(['12/31/19', '6/30/19', '02/21/19'], format='%m/%d/%y')
Date(['2019-12-31', '2019-06-30', '2019-02-21'])

-For a list of format codes, see `Python’s
-documentation <https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior>`__.
+For a list of format codes and ``strptime`` implementation details, see `Python’s
+'strftime' cheatsheet <https://strftime.org/>`__. The formatting codes are the same for ``strftime`` and ``strptime``.
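The same %-codes can be exercised directly with the stdlib ``datetime`` module — ``strptime`` parses with them and ``strftime`` formats with them:

```python
from datetime import datetime

# Parse the tutorial's '12/31/19' example, then re-format it.
d = datetime.strptime('12/31/19', '%m/%d/%y')
print(d.date())              # 2019-12-31
print(d.strftime('%Y%m%d'))  # 20191231
```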

Note: Under the hood, dates are stored as integers – specifically, as
the number of days since the Unix epoch, 01-01-1970::
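The days-since-epoch storage described above can be checked with the stdlib — a sketch of the concept, not riptable's internals:

```python
from datetime import date

# A Date is conceptually an integer count of days since the Unix epoch.
epoch = date(1970, 1, 1)
print((date(2021, 1, 1) - epoch).days)  # 18628
```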
2 changes: 1 addition & 1 deletion docs/source/tutorial/tutorial_filters.rst
@@ -41,7 +41,7 @@ the result is an array of Booleans.

>>> a = rt.FastArray([1, 2, 3, 4, 5])
>>> b = rt.FastArray([0, 5, 2, 4, 8])
->>> a < 3
+>>> a > 3
FastArray([False, False, False, True, True])
>>> a <= b
FastArray([False, True, False, True, True])
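The elementwise comparison semantics shown above can be mimicked with plain lists — purely illustrative, since FastArray performs these comparisons as vectorized array operations:

```python
a = [1, 2, 3, 4, 5]
b = [0, 5, 2, 4, 8]

# Scalar comparison broadcasts across every element.
print([x > 3 for x in a])              # [False, False, False, True, True]

# Array-to-array comparison pairs elements positionally.
print([x <= y for x, y in zip(a, b)])  # [False, True, False, True, True]
```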