reshape task (pivot, unpivot) #175

Open
jangorecki opened this issue Dec 30, 2020 · 4 comments

@jangorecki
Contributor

Nice examples by @grantmcdermott can be found here:
https://grantmcdermott.com/reshape-benchmarks/
https://grantmcdermott.com/even-more-reshape/

@grantmcdermott

Agreed: I think a reshaping benchmark would be an important addition to the list. Happy to submit a PR with my examples as-is, if that helps. Some quick thoughts/issues:

  • I'd drop the Stata runs (I'm guessing you don't meet the license requirements), leaving the data.table (DT), dplyr, pandas, and DataFrames.jl implementations.
  • My examples are only wide-to-long, but it's easy enough to add a complement going long-to-wide (i.e. back to the original dataset).
  • My dataset is deliberately sparse (lots of missing obs). Would you want the same thing for this benchmark?
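To make the wide-to-long / long-to-wide round trip on sparse data concrete, here is a minimal plain-Python sketch (not taken from the linked posts). It assumes a toy layout where each wide row is a dict with one id column and several measure columns, with None marking a missing observation; `melt`, `cast`, and `na_rm` are hypothetical names used only for illustration, not any library's API. Dropping missing cells during the melt is what makes a sparse table shrink. One caveat: an id whose measures are all missing would not survive this round trip.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy layout: one id column plus several measure columns per row;
# None marks a missing observation.
WideRow = Dict[str, Any]
LongRow = Tuple[Any, str, Any]


def melt(wide: List[WideRow], id_col: str, na_rm: bool = True) -> List[LongRow]:
    """Wide-to-long: emit one (id, variable, value) triple per measure cell."""
    long: List[LongRow] = []
    for row in wide:
        for var, val in row.items():
            if var == id_col:
                continue
            if val is None and na_rm:
                continue  # dropping missing cells is what shrinks a sparse table
            long.append((row[id_col], var, val))
    return long


def cast(long: List[LongRow], id_col: str, measure_cols: List[str]) -> List[WideRow]:
    """Long-to-wide: rebuild one row per id, filling absent cells with None."""
    rows: Dict[Any, WideRow] = {}
    for id_val, var, val in long:
        if id_val not in rows:
            rows[id_val] = {id_col: id_val, **{c: None for c in measure_cols}}
        rows[id_val][var] = val
    return list(rows.values())


wide = [
    {"id": "a", "x1": 1.0, "x2": None, "x3": 3.0},
    {"id": "b", "x1": None, "x2": None, "x3": 6.0},
]
long = melt(wide, "id")  # 3 triples instead of 6 measure cells
assert cast(long, "id", ["x1", "x2", "x3"]) == wide
```

With `na_rm=False` every cell is kept, so the long table is as large as the wide one; which of the two a benchmark measures matters a lot for sparse inputs.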

@jangorecki
Contributor Author

@grantmcdermott Thank you for your comment.
Yes, Stata needs to be dropped; we stick to open-source software.
No need for a PR, but some assistance in reviewing the design may eventually be useful.

Ideally, the reshape task should test:

  • melt
  • dcast
  • 95%, 5%, and 0% missing values
  • different functions applied during dcast
  • multiple columns on id side
  • multiple columns on measure side
  • probably quite a few other features (need to look at common usage patterns on Stack Overflow)
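As a rough illustration of the dcast items in that list (multiple id columns, different functions applied during the cast), here is a plain-Python sketch. It groups rows by a composite key built from several id columns, spreads one variable column into new columns, and applies a caller-supplied aggregation function to each cell's values. The `dcast` name only echoes data.table/reshape2; the signature is hypothetical. Defaulting `fun` to `len` is meant to mimic, as far as I know, those packages' fallback to counting when duplicates exist and no aggregate function is given.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple


def dcast(
    long: List[Dict[str, Any]],
    id_cols: Sequence[str],
    var_col: str,
    value_col: str,
    fun: Callable[[List[Any]], Any] = len,
) -> Dict[Tuple[Any, ...], Dict[Any, Any]]:
    """Long-to-wide with aggregation: group by (id_cols..., var_col),
    apply `fun` to each group's values, and spread var_col into columns."""
    groups: Dict[Tuple[Any, ...], Dict[Any, List[Any]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for rec in long:
        key = tuple(rec[c] for c in id_cols)  # composite key from several id columns
        groups[key][rec[var_col]].append(rec[value_col])
    # Convert to plain dicts; id/variable combinations never seen stay absent.
    return {
        key: {var: fun(vals) for var, vals in cells.items()}
        for key, cells in groups.items()
    }


long = [
    {"g1": "a", "g2": 1, "var": "x", "val": 2.0},
    {"g1": "a", "g2": 1, "var": "x", "val": 4.0},  # duplicate cell -> aggregated
    {"g1": "a", "g2": 1, "var": "y", "val": 5.0},
    {"g1": "b", "g2": 2, "var": "x", "val": 1.0},
]
counts = dcast(long, ["g1", "g2"], "var", "val")            # default: count values
means = dcast(long, ["g1", "g2"], "var", "val", fun=mean)   # swap in another function
```

Unlike a real dcast, combinations absent from the input are simply missing from the output rather than filled with NA, which keeps the sketch short.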

All of that needs to be categorized into 5 "basic" and 5 "advanced" queries.
So the scope will be much bigger than your posts, yet your posts are a very useful working example to start from.

@grantmcdermott

Sounds good. Let me know if and when you'd like someone to cast an extra eye over the tests.

@skanskan

I came here to suggest that we also need to benchmark reshaping times.
Glad to see others thought the same.


3 participants