Feature Request:addplot does not take dataframe index #323

Open
cosmostronomer opened this issue Jan 25, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@cosmostronomer

cosmostronomer commented Jan 25, 2021

I would like to plot two data sets as below.

plot2 = mpf.make_addplot(data2, type='candle')
mpf.plot(data, type='candle', addplot=plot2)

However, the problem is that data2 contains 3 candles and data contains 4 candles with different indices.

data2.index
['2021-01-05', '2021-01-06', '2021-01-07']

data.index
['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07']

Although the time index of data2 starts at 2021-01-05, its candles are drawn starting from 2021-01-04.
You can see three overlapping candles on 01-04, 01-05, and 01-06.
[Screenshot: Selection_007]

Please allow addplot to have its own x values as the index of the input dataframe.

@cosmostronomer cosmostronomer added the enhancement New feature or request label Jan 25, 2021
@DanielGoldfarb
Collaborator

DanielGoldfarb commented Jan 26, 2021

@cosmostronomer
Thanks for your interest in mplfinance.

Please note that mpf.plot() and mpf.make_addplot() data all share the same x-axis. Therefore the data sets must either be the same length, or share the same pandas.DatetimeIndex. This means, if one data set is shorter than the other, or sparse, then the missing datetimes must not be missing but rather must have corresponding nan values at those datetimes.

There is an example of a way to do this posted here: #315 (comment).
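A minimal sketch of the nan-filling approach described above, using the index values from the original report. The OHLC numbers and the helper name `align_to_main` are made up for illustration; `DataFrame.reindex` is standard pandas.

```python
import pandas as pd


def align_to_main(main: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Give `other` the same index as `main`; dates missing from `other` become NaN rows."""
    return other.reindex(main.index)


# Illustrative data only: 4 daily candles in `data`, 3 in `data2`.
idx = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"])
data = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 13.0],
        "High": [11.5, 12.5, 13.5, 14.5],
        "Low": [9.5, 10.5, 11.5, 12.5],
        "Close": [11.0, 12.0, 13.0, 14.0],
    },
    index=idx,
)
data2 = data.iloc[1:] + 0.5  # first row is 2021-01-05

data2_aligned = align_to_main(data, data2)
print(data2_aligned)  # the 2021-01-04 row is all NaN
```

The aligned frame can then be passed to `mpf.make_addplot(data2_aligned, type='candle')` exactly as in the original report, so that both data sets share the same x-axis.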

At this point in time, I'm thinking that adding code to mplfinance to automatically fill sparse or short data sets with nan values will unnecessarily complicate the mplfinance code. (My thinking is that mplfinance would not only have to fill nan values, but would also have to deal with the subtleties of detecting potentially misaligned data sets; the caller has this information and can act accordingly, so the caller's nan-filling code would be simpler than mplfinance's. Furthermore, mplfinance would have to maintain this relatively complex chunk of code only in order to handle some use cases. Ideally, I'd prefer to keep the mplfinance code as simple as practically possible.)


Regarding aligning candles side-by-side, this is an interesting problem. I have some ideas that may work, but need time to test them out (and I don't expect to have that time for at least a few days). I'm thinking something perhaps along the lines of adding a slight time-shift to some of the candles, for example, for daily data, have some candles at 12:00 and others at 12:10 each day. But I am also concerned that mplfinance's code that automatically adjusts candle widths may not work properly, and may need to be modified to prevent candles from overlapping. I'll let you know if I come up with a solution. In the meantime please let me know if you also figure out a good way to do it.
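The time-shift idea could be prototyped on the pandas side roughly as follows. This is only a sketch: the 12:00 and 12:10 timestamps follow the example in the comment above, the helper name `interleave` and the sample values are hypothetical, and nothing in this thread confirms that mplfinance renders the result as cleanly separated candles.

```python
import pandas as pd


def interleave(main, other, shift=pd.Timedelta(minutes=10)):
    """Shift `other` in time, then reindex both frames onto the combined index.

    Each returned frame has NaN rows at the timestamps that belong to the other one.
    """
    shifted = other.copy()
    shifted.index = shifted.index + shift
    combined = main.index.union(shifted.index)  # sorted union of both indexes
    return main.reindex(combined), shifted.reindex(combined)


# Illustrative daily data stamped at 12:00.
idx = pd.date_range("2021-01-04 12:00", periods=4, freq="D")
data = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 13.0], "High": [11.5, 12.5, 13.5, 14.5],
     "Low": [9.5, 10.5, 11.5, 12.5], "Close": [11.0, 12.0, 13.0, 14.0]},
    index=idx,
)
data2 = data.iloc[1:] + 0.5

main_i, other_i = interleave(data, data2)
print(main_i.index)
```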

All the best. --Daniel

@cosmostronomer
Author

Dear Daniel,

Thank you for the response. I greatly appreciate your work and it has been working great so far.

I agree with your thoughts on keeping the code simple. If needed, users should put their data into the correct format themselves, according to their own intent. By analogy, it is similar to why Python does not type-check data (though maybe I say that because I don't know Python very well).


My workaround for candles side-by-side is exactly what you said: I added 1 hour to the second dataset. This has not worked well with candles, but it works fine with normal plots. I will experiment more and let you know if I achieve what I want.

Best wishes,
Cosmostronomer

@DanielGoldfarb
Collaborator

DanielGoldfarb commented Feb 22, 2021

Another possible workaround to having multiple candles on the same plot is to [also] set the alpha on the candles to maybe 0.5 (so the candles become see-through). This can be done by using mpf.make_marketcolors() to modify the candle alpha of any existing mpf style (see styles tutorial).
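A sketch of the see-through candle workaround, assuming the `alpha` and `base_mpf_style` keyword arguments of `mpf.make_marketcolors()` and the `marketcolors` argument of `mpf.make_mpf_style()`. The helper name `translucent_style` and the choice of the `'yahoo'` base style are illustrative only.

```python
def translucent_style(base_style: str = "yahoo", alpha: float = 0.5):
    """Return a copy of an existing mplfinance style with semi-transparent candles."""
    # Imported lazily so that this sketch can be read (and imported) without mplfinance installed.
    import mplfinance as mpf

    # Start from the colors of an existing style and override only the candle alpha.
    mc = mpf.make_marketcolors(base_mpf_style=base_style, alpha=alpha)
    return mpf.make_mpf_style(base_mpf_style=base_style, marketcolors=mc)
```

The result would be passed as `mpf.plot(data, type='candle', style=translucent_style(), addplot=plot2)`.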

@mac133k

mac133k commented Feb 28, 2021

I have another use case: I need to plot candles from a numpy array without a date/time column. I take OHLCV columns from the array and convert it to a pandas DataFrame for mpf.plot(), but it refuses to draw a plot due to the missing DatetimeIndex. I would argue that it is not always important for a user to have dates on the plot and that numerical indices would provide more flexibility to align the candles with other data series.

@DanielGoldfarb
Collaborator

@mac133k

  • Are you saying you want to pass in OHLC(V) data with no dates at all for any of your OHLC(V) data sets?
  • Or you have one OHLC(V) data set with dates, and other OHLC data sets with no dates that you want to align with the first data set?

Please provide/describe a specific use case with some detail: what data do you have, and what do you want the plot to look like.

@mac133k

mac133k commented Feb 28, 2021

@DanielGoldfarb
I generate batches of data from date-stamped OHLCV that become numbers-only price+features numpy arrays (later to be fed to ML models). I need to be able to visualize a batch for sanity checks. For now I am using mpf.plot() with a dataframe as an input where OHLCV are a slice of a batch and a fake DatetimeIndex generated from PeriodIndex. Visually the results are fine, but I couldn't figure out how to reset the X axis index to a sequence of numbers, so I get rubbish coordinates in pyplot's cursor locator.

I propose that there should be a parameter to switch off indexing by date.
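For readers wanting to reproduce the workaround described in this comment, a rough sketch follows. It assumes the batch is an `(n, 5)` array whose columns are in Open, High, Low, Close, Volume order (the thread does not state the actual layout); the helper name `batch_to_ohlcv_frame` and the placeholder start date are hypothetical.

```python
import numpy as np
import pandas as pd


def batch_to_ohlcv_frame(batch: np.ndarray, start: str = "2000-01-01") -> pd.DataFrame:
    """Wrap an (n, 5) OHLCV array in a DataFrame with a placeholder daily DatetimeIndex."""
    if batch.ndim != 2 or batch.shape[1] != 5:
        raise ValueError("expected an array of shape (n, 5)")
    # The dates carry no meaning; they exist only because mpf.plot() requires a DatetimeIndex.
    fake_index = pd.period_range(start=start, periods=len(batch), freq="D").to_timestamp()
    return pd.DataFrame(
        batch, columns=["Open", "High", "Low", "Close", "Volume"], index=fake_index
    )


# Made-up numbers, for illustration only.
batch = np.array(
    [[10.0, 11.5, 9.5, 11.0, 1000.0],
     [11.0, 12.5, 10.5, 12.0, 1200.0],
     [12.0, 13.5, 11.5, 13.0, 900.0]]
)
frame = batch_to_ohlcv_frame(batch)
print(frame)
```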

@DanielGoldfarb
Collaborator

DanielGoldfarb commented Feb 28, 2021

@mac133k
Thanks. Just to be clear, I want to understand this comment:

Visually the results are fine, but I couldn't figure out how to reset the X axis index to a sequence of numbers, so I get rubbish coordinates in pyplot's cursor locator.

So are you saying, with your fake DatetimeIndex, the plot looks fine, but you don't want to see date labels on the x-axis, rather you want to see just integers from 0 up to len(data) ?

If so, I am thinking we could possibly fake a datetime index internally, or something similar, so that the user doesn't have to pass a datetime index; I just want to clarify that, (aside from possibly faking it out internally) with your current work-around, you just want to see an index number, or row number, on the x-axis. Is that correct?

@mac133k

mac133k commented Feb 28, 2021

@DanielGoldfarb

So are you saying, with your fake DatetimeIndex, the plot looks fine, but you don't want to see date labels on the x-axis, rather you want to see just integers from 0 up to len(data) ?

Exactly.

I just want to clarify that, (aside from possibly faking it out internally) with your current work-around, you just want to see an index number, or row number, on the x-axis. Is that correct?

Yes, that is correct.

If so, I am thinking we could possibly fake a datetime index internally, or something similar, so that the user doesn't have to pass a datetime index;

Perhaps internally dropping the index from OHLC(V) series, so they appear as numpy arrays, would be sufficient. Pyplot's plot functions index the Y values from 0 by default, so X defaults to numpy.arange(len(Y)).
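The pyplot behavior referred to here can be checked with plain matplotlib. The snippet below only illustrates that default; it does not involve mplfinance, and the sample values are arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

fig, ax = plt.subplots()
(line,) = ax.plot(y)  # no x values supplied

x = line.get_xdata()
print(x)  # x positions that matplotlib generated for the y values
plt.close(fig)
```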

@DanielGoldfarb
Collaborator

@mac133k

One possible work-around that may work for you immediately:

Generate a fake datetime index that always begins Jan 1st, and uses every day (including weekends).

Then set the datetime_format to "Day of the year as a decimal number."

mpf.plot(data,...,datetime_format='%-j')

As long as you have fewer than 365 data points plotted, you will not see a repeat of the x-axis numbers, and it will appear as a simple sequential index.
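A sketch of how such an index might be generated with pandas. The helper name `day_of_year_index` and the default year are arbitrary choices for illustration. Two caveats: the labels start at 1 rather than 0, and `%-j` depends on the platform's `strftime` (it is commonly unavailable on Windows, where `%#j` is the usual equivalent).

```python
import pandas as pd


def day_of_year_index(n_rows: int, year: int = 2021) -> pd.DatetimeIndex:
    """Build a daily index starting on Jan 1st, weekends included."""
    if n_rows > 365:
        raise ValueError("day-of-year labels would repeat beyond one year")
    return pd.date_range(start=f"{year}-01-01", periods=n_rows, freq="D")


idx = day_of_year_index(5)
print(idx.dayofyear.tolist())  # the numbers that the day-of-year format would display
```

With an existing frame, this would be applied as `data.index = day_of_year_index(len(data))` before calling `mpf.plot(data, ..., datetime_format='%-j')`.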

@DanielGoldfarb
Collaborator

@mac133k

Perhaps internally dropping the index from OHLC(V) series, so they appear as numpy arrays, would be sufficient. Pyplot's plot functions index the Y values from 0 by default, so X defaults to numpy.arange(len(Y)).

It's not quite that simple. There is a lot of code that assumes we are plotting a time series. There are benefits to doing that, for example, the x-axis is automatically formatted detecting whether the data is in minutes, hours, days, weeks, etc. It also allows users to specify trend lines and similar annotations by specifying dates and/or times, and will do the appropriate time-interpolation for the user. Users also can specify that the x-axis should be linear with time, so that non-trading periods show as gaps in the data.

This is not to say that what you are requesting (for the x-axis to be a simple range index) cannot be done; but we would have to carefully go through the code to ensure we don't break anything that relies on the time series assumption.


3 participants