Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace plotter.XYs with slices #284

Open
btracey opened this issue May 31, 2016 · 22 comments
Open

Replace plotter.XYs with slices #284

btracey opened this issue May 31, 2016 · 22 comments

Comments

@btracey
Copy link
Member

btracey commented May 31, 2016

In my experience, most data exists as x, y []float64, and not as []struct{X,Y float64}. plotter should replace these types

See discussion https://groups.google.com/d/msg/gonum-dev/FrGpgA-pZQs/FCy8op2JCwAJ

@kortschak
Copy link
Member

Now if we had tables...

@sbinet
Copy link
Member

sbinet commented Jun 1, 2016

wrt tables, as far as gonum/plot is concerned, everything is a float64.
so, perhaps we could get away with a float64-based ndim array?
but then, is it really the best option to couple gonum/plot with that specific implementation?

should we give @Kunde21's numgo.Array64 a go?

relying on []float64 seems simpler. row-wise data isn't very cache friendly anyways...

@kortschak
Copy link
Member

kortschak commented Jun 1, 2016 via email

@ctessum
Copy link
Contributor

ctessum commented Jun 1, 2016

Why not just add the code:

type XYArray struct {
   X, Y []float64
}

func (xy XYArray) Len() int {
   return len(xy.X)
}

func (xy XYArray) XY(i int) (x,y float64) {
   return xy.X[i], xy.Y[i]
}

Then one could just do

vals := XYArray{X: x, Y: y}

and directly add vals to a plotter.

@btracey
Copy link
Member Author

btracey commented Jun 1, 2016

https://godoc.org/github.com/btracey/myplot#VecXY :)

Still, I think the slice representation is more natural for gonum/plot. I think it's more common for users of the code, and it's easier to add in more information (say, z data). Frequently the data does need to exist as a slice even internal to plot, see for example the code for NewErrorPoints

@ctessum
Copy link
Contributor

ctessum commented Jun 1, 2016

So is the question, then, whether to replace the XYs type with VecXY or XYArray, or whether to replace the XYer interface with a concrete type in function arguments? i.e.

NewScatter(xys XYer) 

would become

NewScatter(x, y []float64)

To me, the first option seems like a good idea. I see a tradeoff with the second option, where making the change would make things easier for a lot use cases (i.e., where the data to be plotted is already in x and y arrays), but it would make things more difficult for some use cases (e.g., plotting geographic locations, which typically are in the form struct{X, Y float64}).

I would lean toward keeping the current signature for the function arguments but replacing XYs with VecXY or XYVector as that seems to be most in line with the 'do less, enable more' philosophy, and allows the user to make decisions about their data format rather than having the decisions made for them by the plotting package.

@btracey
Copy link
Member Author

btracey commented Jun 1, 2016

I think your suggestion is reasonable, but just a bit of devil's advocate: Is it actually true that geographic data usually come as struct{X, Y float64})? Doesn't it usually come as struct{Lat, Long, Altitude, {other data} float64})? It doesn't affect your argument, but it does mean that some kind of type munging is always necessary for that kind of data.

@ctessum
Copy link
Contributor

ctessum commented Jun 1, 2016

Yes, although I think that supports my point rather than refutes it. Having an interface function argument means that the user can just implement the XYer interface for whatever complicated data type they have (which, as shown above, takes 6 lines of code), rather have having to mung. So, for example:

type Point {
   Lat,Lon,Altitude, other, variables float64
}

type Points []Point

func (p Points) Len() int {
   return len(p)
}

func (p Points) XY(i int) (x,y float64) {
   return p[i].Lon, p[i].Lat
}

@btracey
Copy link
Member Author

btracey commented Jun 1, 2016

Yes, I agree it supports your point.

@eaburns
Copy link
Member

eaburns commented Jun 1, 2016

Maybe I'm confused, because I don't understand the problem. Doesn't
everything take the XYer interface? You can implement that however you
like. That's why I made it an interface.

On Wed, Jun 1, 2016, 12:24 Brendan Tracey [email protected] wrote:

Yes, I agree it supports your point.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#284 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAOXB2z8efGhNToSNjq8M8VeklCO56Fqks5qHbJZgaJpZM4Iq_th
.

@kostya-sh
Copy link
Contributor

I think this ticket should be about providing an easy way to create XYer from X and Y.

This can be done by adding XYArray type as proposed by @ctessum (the name requires some thought though).

Or via function:

func ZipXY(x []float64, y []float64) XYer

Adding some convenience functions like:

NewScatterXY(x []float64, y []float64) (*Scatter, error)

might be good idea too.

@sbinet
Copy link
Member

sbinet commented Jun 7, 2016

@kortschak wrt tables and what not: I was browsing the interwebs the other day and stumbled upon @aclements' playground for providing a gg-inspired toolkit for Go:
https://godoc.org/github.com/aclements/go-gg

especially: https://godoc.org/github.com/aclements/go-gg/table#Table

It's an interesting read :)

@kortschak
Copy link
Member

That is something that looks like what we would be likely to implement as part of a data frame package, but not what I was talking about here.

@kortschak
Copy link
Member

I'm really not sure this is worth it. The code for it is just

type zip struct { x, y []float64 }

func (z zip) Len() int                { return len(z.x) }
func (z zip) XY(i int) (x, y float64) { return z.x[i], z.y[i] }

This is less than implementing sort.Interface.

@sbinet
Copy link
Member

sbinet commented Jun 8, 2016

you could flip it around: "sort" provides implementations for sort.Strings, sort.Float64s and sort.Ints and it's in the stdlib (so the bar for inclusion is higher.)

adding just one exported function (func ZipXY(x, y []float64) XYer) is IMHO hitting the sweet spot.

@Kunde21
Copy link
Member

Kunde21 commented Jun 8, 2016

I have to agree with @sbinet about the usefulness of a ZipXY(x,y []float64) XYer) function. A portion of the users will have existing slices of structs that can easily implement XYer, but there's a lot of data that starts in the form of one X vector and many Y's. That ends up requiring a zip-style function in each and every project that doesn't have a clean slice of structs which can implement XYer directly. Yes, it's a helper function, but it's not even close to an esoteric use.

Edit: By comparison sort only needs an interface implementations, whereas a set of Y slices requires a new constructor or closure before implementing the XYer interface.

@kortschak
Copy link
Member

kortschak commented Jun 8, 2016

What does the function give you? If the type is exported you get exactly the same functionality with a smaller API surface, unless there is a proposal that ZipXY panics when len(x) != len(y), which the implementation may also do (though later)* and can be made to explicitly via the Len method.

* This is another issue I'd like to address again - that all data are copied by plot.

@eaburns
Copy link
Member

eaburns commented Jun 8, 2016

The copying was a conscious decision. If we don't copy the data, then it can change out from under the plotter — safer to copy it. Is copying really causing a problem for anyone? Either the plotter does not copy and stores only a summary of the data (boxplot), or the plotter copies the data and plots all of it (lines, scatter, etc). In the first case there is no copy, so it's not a problem. In the second case, if you are passing so much data that you just can't copy it, then you will certainly will have an issue when you go to plot that many points.

@kortschak
Copy link
Member

Yeah; we've had this discussion in the past. All our (gonum) data structures allow things to change underfoot unless they are created by the type/func. Because of this I think it's something worth looking at again. My interest is not size/allocations (which is our drive elsewhere) but flexibility.

@jasonpfox
Copy link

As a new user of plot and plotter, I would definitely not want to see this change. The current XYer approach is elegant and easy to use with a myriad of data types. I am plotting a lot of things, and in all cases I have a slice or set of much more complicated data structures form which I generate an 'X' and 'Y' for plot. Data definitely does not exist as X, Y []float64. I could construct that, but it would be much less user friendly that the current XYer interfaces. As it is with XYer, I am even using cases with different interfaces pointing the same underlying data structures with different XY methods returning different 'X' and 'Y' values for different plots based on the same underlying data structures. If there is anything new, try to keep the XYer approach alongside it.

@kortschak
Copy link
Member

kortschak commented Jun 19, 2016 via email

@brunnre8
Copy link

A ZipXY(x,y []float64) XYer) would definitely be appreciated, that's actually something that I was just searching for as my Data tends to be in slices of X /y data (one slice for each)

Some convenience functions wouldn't hurt anyone, especially ones you would need to write again and again (or write an own package, which defeats the point of using a plotting package in the first place)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants