-
Notifications
You must be signed in to change notification settings - Fork 781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aarch64 Nonlinear Optimizer Issues #803
Comments
Somehow (I think it is maybe happening when the LM optimizer is initialized) the left operand of Debug output from Rot2 before optimizer's first linearization:
|
Yes I agree this could be uninitialized memory. Could you run the unit tests with clang sanitizer on? |
Despite having clang installed on my Jetson, it looks like GCC is the compiler in use both when building my entire project as well as when building only gtsam with Switched to use Is disabling that going to cause other issues in aarch64..? |
Interesting that |
(Currently building it on my Jetson with GTSAM_BUILD_WITH_MARCH_NATIVE OFF. Build seems to be proceeding, but it takes long enough to complete that I thought I would go ahead and mention that it is currently underway over here with
) |
Maybe you can try cross compiling on PC. The Jetson CPU cores are very slow. |
*also compiling on my Jetson Nano here |
Any hints out of running |
Things seem to have successfully compiled on my Jetson using clang-6 with @jiblancoc thanks for the suggestion! I am still getting re-acquainted with C++ so did not know about valgrind. Running
Even this very very basic test
|
Thanks for the report! Could you also run valgrind with this: https://www.valgrind.org/docs/manual/mc-manual.html#opt.track-origins BTW, are you using JetPack? I think JetPack is still on Ubuntu Bionic whose Boost version has some bugs that prevents |
Excellent to know about track-origins, thanks for highlighting that option! I will definitely have to try that out. After much tedious debugging, I found that the functions returning by-value instances of Rot2 were the culprit of the garbage values I was seeing. Values of cosine & sine of rotation looked how they should during construction of the instances to be returned, but when the value was actually returned, those good values disappeared and the calling code was left with junk. Adding a trivial Rot2 copy constructor as a user-defined constructor,
As for why the compiler-generated copy constructor is not sufficient for the simple return-by-value functions inside of Rot2, of that I am not yet completely sure, and I am surprised that such a trivial constructor would be problematic. Because compiling with both gcc and clang result in the same misbehavior on aarch64, it seems like the root cause of it may be something in the ARM64 implementation. Seems sort of like copy elision in return value optimization might be the culprit, and that user definition of the copy constructor makes it "non-trivial", avoiding such elision as described in https://developer.arm.com/documentation/ihi0059/latest , section GC++ABI 3.1.4 Return Values:
|
And yes, this is on JetPack, boost version is 1.65.1 |
Interesting find! I think this deserves a more formal investigation, but could you put up a quick PR to fix this first? Thanks a lot! |
@ProfFan I think a PR is pre-mature. We first need to find the root cause, as per the discussion on the user group (please read up on it :-)) |
I am not sure how much more there is to uncover here. I reverted all local changes with the exception of the addition of the copy constructor I added, I am rebuilding everything from scratch again to re-run I will report back with the assembly instructions with & without the user-defined copy constructor for |
Jetson is taking forever to build the rest of the tests as usual, but here are the resulting disassembly in amd64. It has been some time since I have worked with assembly and have never actually worked with aarch64 instruction set yet, but you can see in the
|
100% of I also just re-ran Will get that additional disassembly for completeness and then send a PR. |
wow, awesome! |
@dellaert Thanks! Below are the resulting variations of aarch64 instructions for the same function. I mentioned above that I am no expert in assembly, and certainly not in aarch64 flavor, so anyone feel free to correct me, but it does seem to at least share a similar difference pointed out above between the amd64 instruction variations that hint at copy elision: when implicit copy constructor is in use, the instructions At this point, I don't think a super deep dive into where the copy eliding version is going wrong is warranted, given that tests are passing in both gtsam and my own project.
|
(seems I am unable to create the link between issue & PR) |
@jaelrod you can link from the drop down in the right side. Closing since this has been resolved! |
@jaelrod pointed out something interesting in their suggestion to update this line, Line 110 in aec2cf0
to: Rot2 inverse() const { return fromCosSin(c_, -s_);} The suggested form calls Given this comment:
There might be some assumptions in I'm unsure how the introduction of a user-defined copy ctor made a difference here. My best guess is similar to jaelrod's: before, the implementation before the PR was performing URVO copy elision (mandatory as of C++17). Now, it no longer qualifies for copy elision per the spec (the return type of |
@ProfFan thank you for re-opening this. We are seeing similar behavior in a larger project where GTSAM is only a small part. The code runs normal on x86 and when we port to Jetson Orin NX 16G we get different behavior. Is this bug currently fixed? We run a 3D visual SLAM system, but found this issue to be close enough that we wanted to ask. Thanks a lot |
Description
I have some unit tests that chain together some BetweenFactor using an odometry mock, inject some slightly contradictory supplemental evidence in the form of other factors, such as a unary factor akin to the "GPS-like" factor from LocalizationExample.cpp, and finally use LevenbergMarquardtOptimizer to
optimize()
the factor graph and assert the result.For example, a test might start with some rather noisy mocked odometry moving forward at a consistent speed with a heading of pi/4, qualify the noisy estimates with some very not-noisy GPS-like factors that suggest movement is actually happening in the pi/2 direction, rather than the pi/4 direction, then optimize the graph and make sure that the optimized poses were along the trajectory suggested by the more reliable GPS-like factors.
Such a test and several others work fine on my desktop (x86_64, Ubuntu 18.04) but fail on my Jetson Xavier NX (aarch64, Ubuntu 18.04).
Checking out borglab gtsam source from github and running
make check
to invoke unit tests on the aarch64 device results in many failures, which I expect are related to my own, the following example of which I think is similar to one of the more basic included examples in gtsam :I am in the process of debugging gtsam source through verbose optimizer output and will provide updates as I make progress, but I am suspicious of Eigen anti-aliasing or use of
auto
keyword and type resolution on aarch64.One thing I have observed so far only in aarch64 is:
Another thing I have noticed in aarch64 is that the initial total error of the same system seems much lower than that of amd64, possibly resulting in less/no optimization iterations.
Update 1:
Seems like the calculation of error of
BetweenFactor<Pose2>
is indeed incorrect on aarch64.Debug output from amd64 of BetweenFactor::evaluateError for the very first error calculation based on estimates:
Debug output from aarch64 of BetweenFactor::evaluateError for almost the exact same poses:
Playing around inside of
BetweenFactor
andLieGroup
but haven't had much luck so far.Update 2:
I haven't 100% figured out why, but
Rot2 inverse()
in geometry/Rot2.h was returning an inverse with incorrect heading angle when inverting Pose2 as a part of factorization ofBetweenFactor<Pose2>
.Changing this line,
gtsam/gtsam/geometry/Rot2.h
Line 110 in aec2cf0
Rot2 inverse() const { return fromCosSin(c_, -s_);}
Like I said, not yet sure why this makes a difference, as all fromCosSin appears to be doing is normalizing c_ & s_, and those should have already been normalized...
Update 3:
Also, Rot2::operator*(const Rot2& R) does not behave as expected in aarch64. The following seems to relieve the problem: return fromAngle(theta() + R.theta());
Although, again, I am not sure why this works, as making use of the trig identity for cos(th1+th2) and sin(th1+th2) as is done in master branch should be identical in terms of resulting behavior...
Steps to reproduce
BetweenFactor<Pose2>
and some initial estimatesExpected behavior
Factor graph optimizes in the same way it does on amd64.
Environment
Jetson Xavier NX, Ubuntu 18.04
The text was updated successfully, but these errors were encountered: