atr (and others) indicators not working with resample #1243

Open
ironhak opened this issue Mar 12, 2025 · 11 comments
Labels
bug Something isn't working upstream Issue affects a dependency of ours

Comments


ironhak commented Mar 12, 2025

Hi, I'm using minute level data:

Dataframe sample
                              Open     High      Low    Close
Date                                                         
2020-01-01 22:02:00+00:00  1.32463  1.32464  1.32462  1.32463
2020-01-01 22:03:00+00:00  1.32463  1.32466  1.32462  1.32466
2020-01-01 22:04:00+00:00  1.32466  1.32466  1.32463  1.32463
2020-01-01 22:05:00+00:00  1.32465  1.32466  1.32462  1.32462
2020-01-01 22:06:00+00:00  1.32462  1.32470  1.32462  1.32463
...                            ...      ...      ...      ...
2020-01-29 23:55:00+00:00  1.30208  1.30208  1.30208  1.30208
2020-01-29 23:56:00+00:00  1.30207  1.30208  1.30207  1.30208
2020-01-29 23:57:00+00:00  1.30208  1.30208  1.30208  1.30208
2020-01-29 23:58:00+00:00  1.30208  1.30208  1.30203  1.30203
2020-01-29 23:59:00+00:00  1.30202  1.30207  1.30202  1.30207

I need the average daily range, so I thought I could just resample the atr to daily frequency. So I followed the documentation:

def init(self):
    # Average daily range
    self.adr = resample_apply('D', ta.atr, self.data.High, self.data.Low, self.data.Close)

Error output:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/my_strat/.venv/lib/python3.12/site-packages/backtesting/backtesting.py:150, in Strategy.I(self, func, name, plot, overlay, color, scatter, *args, **kwargs)
    149 try:
--> 150     value = func(*args, **kwargs)
    151 except Exception as e:

File ~/my_strat/.venv/lib/python3.12/site-packages/backtesting/lib.py:322, in resample_apply.<locals>.wrap_func(resampled, *args, **kwargs)
    321 # Resample back to data index
--> 322 if not isinstance(result.index, pd.DatetimeIndex):
    323     result.index = resampled.index

AttributeError: 'numpy.ndarray' object has no attribute 'index'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[205], line 73
     70 bt = Backtest(data, smr_01, margin = 1/100)
     71 # import time
     72 # start_time = time.time()
---> 73 stats = bt.run()
     74 # end_time = time.time()
     75 
     76 # time_taken = end_time - start_time
   (...)     82 # print(f"Number of candlesticks: {num_candlesticks}")
     83 # print(f"Candlesticks per second: {candlesticks_per_second}")
     84 bt.plot()

File ~/my_strat/.venv/lib/python3.12/site-packages/backtesting/backtesting.py:1296, in Backtest.run(self, **kwargs)
   1293 broker: _Broker = self._broker(data=data)
   1294 strategy: Strategy = self._strategy(broker, data, kwargs)
-> 1296 strategy.init()
   1297 data._update()  # Strategy.init might have changed/added to data.df
   1299 # Indicators used in Strategy.next()

Cell In[205], line 39, in smr_01.init(self)
     37 self.range_25, self.range_50, self.range_75 = self.I(range_levels, self.daily_high, self.daily_low, overlay=True)
     38 # Average daily range
---> 39 self.adr = resample_apply('D', ta.atr, self.data.High, self.data.Low, self.data.Close)
     40 # self.adr = resample_apply('D', ta.sma, self.data.Close, 14)#, self.data.Low, self.data.Close)
     42 self.adr2 = self.I(average_daily_range, self.data.df)

File ~/my_strat/.venv/lib/python3.12/site-packages/backtesting/lib.py:330, in resample_apply(rule, func, series, agg, *args, **kwargs)
    326     return result
    328 wrap_func.__name__ = func.__name__
--> 330 array = strategy_I(wrap_func, resampled, *args, **kwargs)
    331 return array

File ~/my_strat/.venv/lib/python3.12/site-packages/backtesting/backtesting.py:152, in Strategy.I(self, func, name, plot, overlay, color, scatter, *args, **kwargs)
    150     value = func(*args, **kwargs)
    151 except Exception as e:
--> 152     raise RuntimeError(f'Indicator "{name}" error. See traceback above.') from e
    154 if isinstance(value, pd.DataFrame):
    155     value = value.values.T

RuntimeError: Indicator "atr(H[D],L,C)" error. See traceback above.

However I don't get any error with:

self.sma = resample_apply('D', ta.sma, self.data.Close, 14)

So, again following the docs, I tried doing it myself:

def average_daily_range(df, period):

    df_resampled = df.resample('D', label='right').agg({'High': 'max', 'Low': 'min', 'Close': 'last'})
    print(df_resampled)
    df_resampled.dropna()
    atr = ta.atr(df_resampled['High'], df_resampled['Low'], df_resampled['Close'], period)
    atr = atr.reindex(df.index).ffill()
    return atr

class smr_01(Strategy):

    def init(self):
        # Average daily range
        # self.adr = resample_apply('D', ta.atr, self.data.High, self.data.Low, self.data.Close)
        self.sma = resample_apply('D', ta.sma, self.data.Close, 14)
        self.adr = self.I(average_daily_range, self.data.df, 14)
Resampled df:
                              High      Low    Close
Date                                                
2020-01-02 00:00:00+00:00  1.32608  1.32457  1.32497
2020-01-03 00:00:00+00:00  1.32661  1.31152  1.31467
2020-01-04 00:00:00+00:00  1.31600  1.30531  1.30787
2020-01-05 00:00:00+00:00      NaN      NaN      NaN
2020-01-06 00:00:00+00:00  1.30855  1.30633  1.30768
2020-01-07 00:00:00+00:00  1.31785  1.30638  1.31711
2020-01-08 00:00:00+00:00  1.32120  1.30948  1.31134
2020-01-09 00:00:00+00:00  1.31694  1.30799  1.31051
2020-01-10 00:00:00+00:00  1.31233  1.30126  1.30691
2020-01-11 00:00:00+00:00  1.30968  1.30422  1.30569
2020-01-12 00:00:00+00:00      NaN      NaN      NaN
2020-01-13 00:00:00+00:00  1.30441  1.30287  1.30432
2020-01-14 00:00:00+00:00  1.30450  1.29608  1.29859
2020-01-15 00:00:00+00:00  1.30329  1.29542  1.30211
2020-01-16 00:00:00+00:00  1.30582  1.29850  1.30392
2020-01-17 00:00:00+00:00  1.30828  1.30252  1.30760
2020-01-18 00:00:00+00:00  1.31184  1.30050  1.30058
2020-01-19 00:00:00+00:00      NaN      NaN      NaN
2020-01-20 00:00:00+00:00  1.30071  1.29915  1.30051
2020-01-21 00:00:00+00:00  1.30132  1.29617  1.30035
2020-01-22 00:00:00+00:00  1.30831  1.29952  1.30451
2020-01-23 00:00:00+00:00  1.31525  1.30343  1.31435
2020-01-24 00:00:00+00:00  1.31508  1.30966  1.31186
2020-01-25 00:00:00+00:00  1.31739  1.30565  1.30701
2020-01-26 00:00:00+00:00      NaN      NaN      NaN
2020-01-27 00:00:00+00:00  1.30799  1.30606  1.30606
2020-01-28 00:00:00+00:00  1.31050  1.30395  1.30588
2020-01-29 00:00:00+00:00  1.30649  1.29752  1.30231
2020-01-30 00:00:00+00:00  1.30273  1.29892  1.30207

Now I have a working ATR resampled to daily. But there is a problem: both the SMA and the ATR are resampled daily with a period of 14, yet they do not start at the same point:

Image

As you can see, the ATR starts on 20 Jan 2020 at 12:00, while the SMA starts on 17 Jan 2020 at 12:00. So am I doing something wrong, or is it the library that should be updated?
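
A side note on the `average_daily_range` helper above, offered as a possible contributing factor rather than a confirmed diagnosis: `DataFrame.dropna()` returns a new frame and leaves the original untouched unless the result is assigned (or `inplace=True` is passed), so the bare `df_resampled.dropna()` call keeps the NaN weekend rows. A minimal sketch using three rows copied from the resampled output:

```python
import numpy as np
import pandas as pd

# Three rows copied from the resampled output above; 2020-01-05 is empty.
daily = pd.DataFrame(
    {"High": [1.31600, np.nan, 1.30855], "Low": [1.30531, np.nan, 1.30633]},
    index=pd.to_datetime(["2020-01-04", "2020-01-05", "2020-01-06"]),
)

daily.dropna()                      # result is discarded; `daily` is unchanged
rows_after_bare_call = len(daily)

cleaned = daily.dropna()            # assigning the result actually drops the row
rows_after_assignment = len(cleaned)

print(rows_after_bare_call, rows_after_assignment)
```

Whether the leftover NaN rows explain the later start of the ATR depends on how the indicator handles missing values, which this sketch does not test.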

Packages version:
Package                 Version
----------------------- -----------
asttokens               3.0.0
backtesting             0.6.3
bokeh                   3.6.3
comm                    0.2.2
contourpy               1.3.1
cycler                  0.12.1
debugpy                 1.8.13
decorator               5.2.1
executing               2.2.0
fonttools               4.56.0
ipykernel               6.29.5
ipython                 9.0.0
ipython-pygments-lexers 1.1.1
jedi                    0.19.2
jinja2                  3.1.6
jupyter-client          8.6.3
jupyter-core            5.7.2
kiwisolver              1.4.8
markupsafe              3.0.2
matplotlib              3.10.1
matplotlib-inline       0.1.7
mplfinance              0.12.10b0
nest-asyncio            1.6.0
numpy                   2.2.3
packaging               24.2
pandas                  2.2.3
pandas-ta               0.3.14b0
parso                   0.8.4
pexpect                 4.9.0
pillow                  11.1.0
platformdirs            4.3.6
prompt-toolkit          3.0.50
psutil                  7.0.0
ptyprocess              0.7.0
pure-eval               0.2.3
pygments                2.19.1
pyparsing               3.2.1
python-dateutil         2.9.0.post0
pytz                    2025.1
pyyaml                  6.0.2
pyzmq                   26.2.1
setuptools              76.0.0
six                     1.17.0
stack-data              0.6.3
tornado                 6.4.2
traitlets               5.14.3
tzdata                  2025.1
wcwidth                 0.2.13
xyzservices             2025.1.0
Author

ironhak commented Mar 12, 2025

Just to add:

Image

On the left is atr applied to daily data, on the right is the data on the left converted back to minute data, i.e. the following command that I found in the docs:

atr = atr.reindex(df.index).ffill()

As you can see, the daily ATR starts on 18 Jan, while the one sampled back to 1-minute data starts on 20 Jan, though the values are the same.

Author

ironhak commented Mar 12, 2025

According to docs:

Notice label='right'. If it were set to 'left' (default), the strategy would exhibit look-ahead bias.

But doing:

data2 = data.resample('D',label='right').agg({'Open': 'first', 
                                 'High': 'max', 
                                 'Low': 'min', 
                                 'Close': 'last'})#.dropna()

print(data2)
Gives as output:

               Open     High      Low    Close
Date                                          
2020-01-02  1.32463  1.32608  1.32457  1.32497
2020-01-03  1.32497  1.32661  1.31152  1.31467
2020-01-04  1.31466  1.31600  1.30531  1.30787 --> Error, no data for this day (saturday)
2020-01-05      NaN      NaN      NaN      NaN
2020-01-06  1.30808  1.30855  1.30633  1.30768
2020-01-07  1.30767  1.31785  1.30638  1.31711
2020-01-08  1.31708  1.32120  1.30948  1.31134
2020-01-09  1.31135  1.31694  1.30799  1.31051
2020-01-10  1.31047  1.31233  1.30126  1.30691
2020-01-11  1.30691  1.30968  1.30422  1.30569 --> Error, no data for this day (saturday)
2020-01-12      NaN      NaN      NaN      NaN
2020-01-13  1.30347  1.30441  1.30287  1.30432
2020-01-14  1.30432  1.30450  1.29608  1.29859
2020-01-15  1.29858  1.30329  1.29542  1.30211
2020-01-16  1.30211  1.30582  1.29850  1.30392
2020-01-17  1.30395  1.30828  1.30252  1.30760
2020-01-18  1.30758  1.31184  1.30050  1.30058 --> Error, no data for this day (saturday)
2020-01-19      NaN      NaN      NaN      NaN
2020-01-20  1.29915  1.30071  1.29915  1.30051
2020-01-21  1.30052  1.30132  1.29617  1.30035
2020-01-22  1.30037  1.30831  1.29952  1.30451
2020-01-23  1.30451  1.31525  1.30343  1.31435
2020-01-24  1.31435  1.31508  1.30966  1.31186
2020-01-25  1.31185  1.31739  1.30565  1.30701 --> Error, no data for this day (saturday)
2020-01-26      NaN      NaN      NaN      NaN
2020-01-27  1.30799  1.30799  1.30606  1.30606
2020-01-28  1.30606  1.31050  1.30395  1.30588
2020-01-29  1.30588  1.30649  1.29752  1.30231
2020-01-30  1.30230  1.30273  1.29892  1.30207

Indeed if I check:

print(data[data.index.date == pd.to_datetime('2020-01-04').date()])
Empty DataFrame
Columns: [Open, High, Low, Close]
Index: []

Instead, if I don't specify the label:

```python
data2 = data.resample('D').agg({'Open': 'first', 
                                 'High': 'max', 
                                 'Low': 'min', 
                                 'Close': 'last'}).dropna()
```

Everything works fine, and also:

Image

Now the SMA indicator and the ATR start on the same day, which is the intended result. Hence the final resampled ATR function would be:

def average_daily_range(df, period):
    # df_resampled = df.resample('D', label='right').agg({'High': 'max', 'Low': 'min', 'Close': 'last'})
    df_resampled = df.resample('D').agg({'High': 'max', 'Low': 'min', 'Close': 'last'})
    df_resampled.dropna(inplace=True)
    atr = ta.atr(df_resampled['High'], df_resampled['Low'], df_resampled['Close'], period)
    atr = atr.reindex(df.index).ffill()  # use the df argument, not the global data

    return atr
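
For readers without pandas-ta, the same idea can be sketched with plain pandas. This is an illustration under stated assumptions, not the poster's or the library's method: `true_range` and `daily_atr_on_intraday` are hypothetical names, the ATR is a simple rolling mean of the true range (pandas-ta's default smoothing may differ), and the daily bins use `label='right'` as the documentation recommends.

```python
import numpy as np
import pandas as pd


def true_range(high, low, close):
    """Largest of high-low, |high-prev close| and |low-prev close| per bar."""
    prev_close = close.shift(1)
    candidates = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    return candidates.max(axis=1)  # NaNs are skipped, so the first bar is high-low


def daily_atr_on_intraday(df, period):
    """Hypothetical helper: daily ATR aligned back onto the intraday index."""
    daily = (
        df.resample("D", label="right")
        .agg({"High": "max", "Low": "min", "Close": "last"})
        .dropna()
    )
    # Assumption: plain rolling mean, not pandas-ta's smoothing.
    atr = true_range(daily["High"], daily["Low"], daily["Close"]).rolling(period).mean()
    # Forward-fill over the union of both indexes, then keep the intraday rows.
    return atr.reindex(df.index.union(atr.index), method="ffill").reindex(df.index)


# Synthetic data: four bars per day over four days (Monday to Thursday).
idx = pd.date_range("2020-01-06", periods=16, freq="360min")
base = np.arange(16, dtype=float)
df = pd.DataFrame({"High": base + 1, "Low": base - 1, "Close": base}, index=idx)

adr = daily_atr_on_intraday(df, period=2)
print(adr)
```

With `period=2`, the first two days of this synthetic series stay NaN because no two completed daily bars are available yet.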

So there are two problems that emerged from this issue:

  1. Why doesn't the ta.atr function work with resample_apply?
  2. Why, according to the documentation, must I use label='right' even though doing so produces a result that is not properly aligned with the original data?

Owner

kernc commented Mar 12, 2025

  1. Why doesn't the ta.atr function work with resample_apply?

AttributeError: 'numpy.ndarray' object has no attribute 'index'

Seems to break here:

result = func(resampled, *args, **kwargs)
if not isinstance(result, pd.DataFrame) and not isinstance(result, pd.Series):
    result = np.asarray(result)
    if result.ndim == 1:
        result = pd.Series(result, name=resampled.name)
    elif result.ndim == 2:
        result = pd.DataFrame(result.T)
# Resample back to data index
if not isinstance(result.index, pd.DatetimeIndex):
    result.index = resampled.index

Following the logic, I think the inner branch is missing a trailing else: raise ... clause.
What (shape) does your ta.atr(self.data.High, self.data.Low, self.data.Close) actually return?
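
The suggested `else: raise ...` can be sketched on a standalone copy of that branch. `coerce_indicator_result` is a hypothetical name, not part of backtesting.py, and passing `None` is only one illustrative input that reaches the fall-through (it becomes a 0-dimensional array). The thread does not establish what `ta.atr` actually returned inside `resample_apply`.

```python
import numpy as np
import pandas as pd


def coerce_indicator_result(result, resampled):
    """Hypothetical stand-in for the quoted branch, plus an explicit else."""
    if not isinstance(result, (pd.DataFrame, pd.Series)):
        result = np.asarray(result)
        if result.ndim == 1:
            result = pd.Series(result, name=resampled.name)
        elif result.ndim == 2:
            result = pd.DataFrame(result.T)
        else:
            # Without this clause a 0-d array falls through and later
            # fails on `result.index` with an AttributeError.
            raise ValueError(
                f"Indicator returned an array with ndim={result.ndim}; "
                "expected a 1-D or 2-D result"
            )
    if not isinstance(result.index, pd.DatetimeIndex):
        result.index = resampled.index
    return result


resampled = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2020-01-02", periods=3, freq="D"),
    name="Close",
)

ok = coerce_indicator_result([10.0, 20.0, 30.0], resampled)

try:
    coerce_indicator_result(None, resampled)  # np.asarray(None) is 0-dimensional
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The explicit error names the unexpected shape instead of surfacing as `'numpy.ndarray' object has no attribute 'index'` a few lines later.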


  2. Why, according to the documentation, must I use label='right' even though doing so produces a result that is not properly aligned with the original data?

Use of label='right' ensures the labeled bin consists only of data of the preceding period, e.g. when there is no data on a Saturday, the value for Sunday is empty/nan. This prevents look-ahead bias where value at time t inadvertently incorporates future values.
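
The look-ahead effect can be seen on synthetic data. In this sketch the prices are just a running counter (an assumption for illustration), and the daily values are mapped back to minutes with the plain `reindex(...).ffill()` used earlier in the thread.

```python
import pandas as pd

# Synthetic minute closes for Thursday 2020-01-02 and Friday 2020-01-03.
idx = pd.date_range("2020-01-02 00:00", "2020-01-03 23:59", freq="min")
close = pd.Series(range(len(idx)), index=idx, dtype=float)

left = close.resample("D").last()                  # default label='left'
right = close.resample("D", label="right").last()

# Map the daily values back onto the minute index.
left_on_minutes = left.reindex(idx).ffill()
right_on_minutes = right.reindex(idx).ffill()

first_minute = pd.Timestamp("2020-01-02 00:00")
friday_open = pd.Timestamp("2020-01-03 00:00")

# label='left': Thursday's final close is already visible at Thursday 00:00.
print(left_on_minutes.loc[first_minute], close.loc[first_minute])
# label='right': Thursday's final close first appears on Friday 00:00.
print(right_on_minutes.loc[first_minute], right_on_minutes.loc[friday_open])
```

With the default label, the very first minute of Thursday carries a value that is only known once Thursday has ended. With `label='right'` that same value shows up only after the day is complete.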

kernc added the bug label Mar 12, 2025
Author

ironhak commented Mar 12, 2025

ta.atr comes from pandas-ta library. ATR is just a single column, like ta.sma.

Use of label='right' ensures the labeled bin consists only of data of the preceding period, e.g. when there is no data on a Saturday, the value for Sunday is empty/nan. This prevents look-ahead bias where value at time t inadvertently incorporates future values.

Yes, but in my data there should be data on Sunday. My minute data (forex) runs from Sunday 22:00 to Friday 23:00. So the fact that the version resampled with label='right' leaves Sunday empty and Saturday populated looks wrong and inconsistent with the original data. Am I missing something here?
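
A small sketch of how `label='right'` stamps a forex week like the one described, using three made-up closes (one on Friday, one on Sunday evening, one on Monday). Each daily bin is stamped with the midnight that ends it, so Friday's bars appear under Saturday, the empty Saturday bin appears under Sunday, and the Sunday-evening bars appear under Monday.

```python
import pandas as pd

# One synthetic close per session: Friday, Sunday evening, Monday.
close = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(
        ["2020-01-03 12:00", "2020-01-05 22:30", "2020-01-06 12:00"]
    ),
)

right = close.resample("D", label="right").last()

for stamp, value in right.items():
    print(stamp.date(), stamp.day_name(), value)
```

Read this way, the Saturday row is Friday's completed day and the NaN on Sunday is the empty Saturday, which is consistent with the original data once the stamp is taken as the end of the bin.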

Thanks for your kind response and for this amazing library.

Owner

kernc commented Mar 12, 2025

ta.atr comes from pandas-ta library. ATR is just a single column

What are its shape and ndim? It looks like this condition holds: ta.atr(h, l, c).ndim not in (1, 2), whereas it shouldn't!

Just to add: ...

On the left is atr applied to daily data, on the right is the data on the left converted back to minute data, i.e. the following command that I found in the docs:

atr = atr.reindex(df.index).ffill()

Note, the lib actually does:

result = result.reindex(index=series.index.union(resampled.index),
                        method='ffill').reindex(series.index)

Does this help?
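
The difference between the two reindexing approaches can be sketched with a daily value stamped on a Saturday midnight that has no matching intraday bar. The timestamps and values are synthetic and chosen only to mirror the 18 Jan / 20 Jan situation described above.

```python
import pandas as pd

# Daily values stamped at midnight; 2020-01-18 is a Saturday.
daily = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2020-01-18 00:00", "2020-01-21 00:00"]),
)

# Intraday index without a Saturday-midnight bar.
intraday = pd.to_datetime(
    ["2020-01-19 22:00", "2020-01-20 00:00", "2020-01-20 12:00", "2020-01-21 00:00"]
)

# Plain reindex keeps only exact timestamp matches, so Saturday's value is lost.
plain = daily.reindex(intraday).ffill()

# Forward-filling over the union first carries Saturday's value into the new week.
via_union = daily.reindex(
    index=intraday.union(daily.index), method="ffill"
).reindex(intraday)

print(plain.tolist())
print(via_union.tolist())
```

In the plain version the Saturday value never reaches the intraday index, so the series stays NaN until the next daily stamp that coincides with an existing bar.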

Author

ironhak commented Mar 12, 2025

Hello, so:

atr = ta.atr(data.High, data.Low, data.Close)
print(type(atr))
print(atr.shape)

Produces:

<class 'pandas.core.series.Series'>
(31669,)

So why doesn't it work if I do: self.adr = resample_apply('D', ta.atr, self.data.High, self.data.Low, self.data.Close)?

Also the problem about label='right' persists.

Author

ironhak commented Mar 13, 2025

Doing more tests... Resampling candlestick data from 1min to 5min using the suggested label='right':

    data5m = data.resample('5min', label='right').agg({
        "Open": "first",
        "High": "max",
        "Low": "min",
        "Close": "last",
    })

Produces:

Image

As you can see, the end result is a candlestick that starts at 22:05 but contains data from 22:00 to 22:04. This is wrong because the index refers to the Open datetime, hence the 22:05 row in a 5min timeframe should have:

  • Open: market price at 22:05:00
  • High: highest market price from 22:05:00 to 22:09:59
  • Low: lowest market price from 22:05:00 to 22:09:59
  • Close: market price at 22:09:59.

Indeed, if I don't use label='right', i.e.:

    data5m = data.resample('5min').agg({
        "Open": "first",
        "High": "max",
        "Low": "min",
        "Close": "last",
    })

I obtain the intended outcome:

Image
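
The two labelling conventions can be compared side by side on synthetic bars (the prices below are made up). In this sketch both calls aggregate exactly the same minutes into each candle. Only the timestamp attached to the candle differs: bar open for the default, bar end for `label='right'`.

```python
import numpy as np
import pandas as pd

# Ten synthetic one-minute bars starting at 22:00.
idx = pd.date_range("2020-01-01 22:00", periods=10, freq="min")
price = 1.3 + np.arange(10) * 1e-4
data = pd.DataFrame(
    {"Open": price, "High": price + 5e-5, "Low": price - 5e-5, "Close": price},
    index=idx,
)

ohlc = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
left = data.resample("5min").agg(ohlc)                  # stamped at bar open
right = data.resample("5min", label="right").agg(ohlc)  # stamped at bar end

same_values = np.array_equal(left.to_numpy(), right.to_numpy())
stamp_shift = right.index[0] - left.index[0]

print(same_values, stamp_shift)
```

So the choice is about when a candle's values are considered available, not about which minutes go into it.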

So the resample_apply function:

  • Should avoid using label='right' (?)
  • Should be fixed in order to work with indicators like ta.atr.

Edit: I double-checked against TradingView charts and they are identical, proving even further that we don't need `label='right'`.

Resampled data chart:

Image

Tradingview:

Image

Owner

kernc commented Mar 13, 2025

As you can see, the end result is a candlestick that starts at 22:05 but contains data from 22:00 to 22:04. This is wrong

Thanks for the illustrative example!

You don't learn that the supposed 22:00:00 bar closed at 1.20032 until 22:04:00 bar closes! I don't know what labeling your data source uses, but plotting that info anytime before the complete end of bar 22:04:00 would introduce look-ahead bias.

Likewise, applying a simple passthrough function:

from backtesting import Strategy, Backtest
from backtesting.lib import resample_apply
from backtesting.test import EURUSD

class S(Strategy):
    def init(self):
        resample_apply('1d', lambda x: x, self.data.Close, color='blue')
    def next(self):
        pass

bt = Backtest(EURUSD, S)
_ = bt.run()
bt.plot()

Image

you can see it uses previous complete bar's value as the current value. Had it used the current bar's (potentially incomplete) value, this would introduce look-ahead bias and would redraw / repaint / mislead, like TradingView does.

This part of the issue is "works-as-planned" / wontfix.


  1. Should be fixed in order to work with indicators like ta.atr.
AttributeError: 'numpy.ndarray' object has no attribute 'index'

Please provide the following output:

>>> atr = ta.atr(h, l, c)
>>> atr.__class__.__mro__
>>> atr
>>> np.ndim(atr)

Author

ironhak commented Mar 13, 2025

Dear kernc, yes, I see your point of view. I have always used the TradingView and MT4/5 way of labeling data, have always been aware of repainting, and have learned to account for it in my testing.

You don't learn that the supposed 22:00:00 bar closed at 1.20032 until 22:04:00 bar closes!

Yes, the idea is that the Open price is fixed and the High, Low and Close change with every new tick until the candle closes. I see that other data providers, like Bloomberg, label the data on the close.

This part of the issue is "works-as-planned" / wontfix.

That's perfectly fine, I'll account for this. Thanks for taking the time to explain :)


Please provide the following output:

atr = ta.atr(data.High, data.Low, data.Close)
print(atr.__class__.__mro__)
print(atr)
print(np.ndim(atr))
(<class 'pandas.core.series.Series'>, <class 'pandas.core.base.IndexOpsMixin'>, <class 'pandas.core.arraylike.OpsMixin'>, <class 'pandas.core.generic.NDFrame'>, <class 'pandas.core.base.PandasObject'>, <class 'pandas.core.accessor.DirNamesMixin'>, <class 'pandas.core.indexing.IndexingMixin'>, <class 'object'>)
Date
2020-01-01 22:02:00         NaN
2020-01-01 22:03:00         NaN
2020-01-01 22:04:00         NaN
2020-01-01 22:05:00         NaN
2020-01-01 22:06:00         NaN
                         ...   
2020-04-30 23:55:00    0.000128
2020-04-30 23:56:00    0.000122
2020-04-30 23:57:00    0.000125
2020-04-30 23:58:00    0.000127
2020-04-30 23:59:00    0.000125
Name: ATRr_14, Length: 123800, dtype: float64
1

Owner

kernc commented Mar 13, 2025

Well, that's confusing. If the result object is already a Series, there's no way I see for it to crash with AttributeError: 'numpy.ndarray' object has no attribute 'index' ... 🤔
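
One way to narrow this down might be to look at what the indicator receives and returns when it is called from inside `resample_apply`, rather than when it is called directly on the DataFrame columns. The wrapper below is a hypothetical debugging aid, not part of backtesting.py or pandas-ta, and `fake_indicator` is a stand-in used only to show the recorded output. Nothing here shows that `ta.atr` behaves this way.

```python
import functools

import numpy as np


def describe_call(func):
    """Hypothetical debugging wrapper: records argument and result types."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        record = {
            "arg_types": [type(a).__name__ for a in args],
            "result_type": type(result).__name__,
            "result_ndim": np.ndim(result),
        }
        wrapper.calls.append(record)
        print(record)
        return result

    wrapper.calls = []
    return wrapper


# Stand-in indicator that returns None when handed a plain ndarray.
@describe_call
def fake_indicator(values):
    return None if isinstance(values, np.ndarray) else values


fake_indicator(np.arange(3.0))
```

In the strategy, the same wrapper could presumably be passed as `resample_apply('D', describe_call(ta.atr), self.data.High, self.data.Low, self.data.Close)` to print the types seen at the point of failure.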

Author

ironhak commented Mar 14, 2025

Well... if you find some time, you can try it for yourself:

pip install pandas-ta

import pandas_ta as ta
from backtesting import Strategy
import backtesting as bt
from backtesting.lib import resample_apply

# Import some minute-level data

class myStrat(Strategy):
  def init(self):
    self.atr = resample_apply('D', ta.atr, self.data.High, self.data.Low, self.data.Close)

...

Sorry if the snippet is not 100% accurate, I'm on my phone. Anyway I'm sure you get what I'm trying to say.

kernc added the upstream label Mar 30, 2025

2 participants