Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes ZeroFrog's "optimized" memcpy and memcmp functions. #370

Merged

Conversation

Sonicadvance1
Copy link
Contributor

These were only compiled in on Windows and x86_32.
They provided "optimized" copies and compares based on blocksizes for the AMD Athlon and Duron CPU families.
The code was taken from something that AMD provides with a as-is license.
Just get rid of this crap.

@delroth
Copy link
Member

delroth commented May 15, 2014

LGTM, libc memcpy is probably much faster nowadays.

@degasus
Copy link
Member

degasus commented May 15, 2014

As x86_32 is deprecated, the performance doesn't matter much, so LGTM

@delroth
Copy link
Member

delroth commented May 17, 2014

Please rebase.

@lioncash
Copy link
Member

Looks good to me as well (following the rebase)

These were only compiled in on Windows and x86_32.
They provided "optimized" copies and compares based on blocksizes for the AMD Athlon and Duron CPU families.
The code was taken from something that AMD provides with a as-is license.
Just get rid of this crap.
@shuffle2
Copy link
Contributor

Maybe we should replace these with optimized x86_64 (or other platform specific ones), or just a placeholder wrapper so they can be marked for improvement in the future.
It's hard to believe that any standard library routine would be faster, especially if you know e.g. the size or alignment attributes of the buffers at compile time, but the compiler can't prove they are constant for whatever reason (for example, I would guess that many GX/DMA/etc buffers used by the emulated software happen to be nicely aligned on the host, as well). Would be interesting to see some tests around this...

@Parlane
Copy link
Member

Parlane commented May 22, 2014

First, a word of advice. Assume that the people who wrote your standard
library are not stupid. If there was a faster way to implement a general
memcpy, they'd have done it.

Second, yes, there are better alternatives.

  • In C++, use the std::copy function. It does the same thing, but it is
    1. safer, and 2) potentially faster in some cases. It is a template,
      meaning that it can be specialized for specific types, making it
      potentially faster than the general C memcpy.
  • Or, you can use your superior knowledge of your specific situation.
    The implementers of memcpy had to write it so it performed well in
    every case. If you have specific information about the situation where
    you need it, you might be able to write a faster version. For example, how
    much memory do you need to copy? How is it aligned? That might allow you to
    write a more efficient memcpy for _this_specific case. But it won't be
    as good in most other cases (if it'll work at all)

Matthew Parlane

On 22 May 2014 14:57, shuffle2 [email protected] wrote:

Maybe we should replace these with optimized x86_64 (or other platform
specific ones), or just a placeholder wrapper so they can be marked for
improvement in the future.
It's hard to believe that any standard library routine would be faster,
especially if you know e.g. the size or alignment attributes of the buffers
at compile time, but the compiler can't prove they are constant for
whatever reason (for example, I would guess that many GX/DMA/etc buffers
used by the emulated software happen to be nicely aligned on the host, as
well). Would be interesting to see some tests around this...


Reply to this email directly or view it on GitHubhttps://github.com//pull/370#issuecomment-43843861
.

@Parlane
Copy link
Member

Parlane commented May 22, 2014

@shuffle2
Copy link
Contributor

That is basically what I said...I expect we have situations which fall into "Or, you can use your superior knowledge of your specific situation.", especially around graphics and other buffers which the game programmers must have made properly aligned for the device to operate correctly. Our memory base is aligned, so "gamecube-aligned" buffers happen to be implicitly aligned from dolphin's view. However, the compiler cannot automagically see this.

@Sonicadvance1
Copy link
Contributor Author

If someone feels like taking the time to write a "faster" x86_64 specific memcpy and memcmp then do it. That is outside of the scope of this PR.

@galop1n
Copy link
Contributor

galop1n commented May 22, 2014

I made a Memcpy16 with dst and size have to be aligned on 16 for vertex buffer upload but not in git yet and i have first to profile to see if it is worth the effort

@shuffle2
Copy link
Contributor

Alright, we will leave the architecture-specific optimizations to future PRs.
Comment on that: It would be nice to first collect a list of candidate areas in dolphin where such "gc-aligned" data is copied.

shuffle2 added a commit that referenced this pull request May 22, 2014
Removes ZeroFrog's "optimized" memcpy and memcmp functions.
@shuffle2 shuffle2 merged commit b58753b into dolphin-emu:master May 22, 2014
@galop1n
Copy link
Contributor

galop1n commented May 22, 2014

Mine was dedicated to help VB uploading, nVidia and AMD advertise to align everything on 16 to help the driver and have as little possible overhead as possible.

@shuffle2
Copy link
Contributor

@galop1n OK, those could be nice as well :) Data with alignment requirements on the host may have slightly different alignment/size requirements, so it might be good to have a different specialization for such things (which may take into account host GPU model, CPU model, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants