Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aarch64 Nonlinear Optimizer Issues #803

Open
jaelrod opened this issue Jun 23, 2021 · 23 comments · Fixed by #806
Open

aarch64 Nonlinear Optimizer Issues #803

jaelrod opened this issue Jun 23, 2021 · 23 comments · Fixed by #806
Assignees

Comments

@jaelrod
Copy link
Contributor

jaelrod commented Jun 23, 2021

Description

I have some unit tests that chain together some BetweenFactor using an odometry mock, inject some slightly contradictory supplemental evidence in the form of other factors, such as a unary factor akin to the "GPS-like" factor from LocalizationExample.cpp, and finally use LevenbergMarquardtOptimizer to optimize() the factor graph and assert the result.

For example, a test might start with some rather noisy mocked odometry moving forward at a consistent speed with a heading of pi/4, qualify the noisy estimates with some very not-noisy GPS-like factors that suggest movement is actually happening in the pi/2 direction, rather than the pi/4 direction, then optimize the graph and make sure that the optimized poses were along the trajectory suggested by the more reliable GPS-like factors.

Such a test and several others work fine on my desktop (x86_64, Ubuntu 18.04) but fail on my Jetson Xavier NX (aarch64, Ubuntu 18.04).

Checking out borglab gtsam source from github and running make check to invoke unit tests on the aarch64 device results in many failures, which I expect are related to my own, the following example of which I think is similar to one of the more basic included examples in gtsam :

gtsam/tests/testNonlinearOptimizer.cpp:484: Failure: "assert_equal(expected, dl_result, tol)" 
Not equal:
expected:
Values with 3 values:
Value x1: (gtsam::Pose2)
(0, 0, 0)
Value x2: (gtsam::Pose2)
(1.5, 0, 0)
Value x3: (gtsam::Pose2)
(3, 0, 0)

actual:
Values with 3 values:
Value x1: (gtsam::Pose2)
(-3.74995e-06, 0, 0)
Value x2: (gtsam::Pose2)
(1.49999, 0, 0)
Value x3: (gtsam::Pose2)
(3, 0, 0)

I am in the process of debugging gtsam source through verbose optimizer output and will provide updates as I make progress, but I am suspicious of Eigen anti-aliasing or use of auto keyword and type resolution on aarch64.

One thing I have observed so far only in aarch64 is:

Partial Cholesky on HessianFactor failed.

Frontal keys Position 0: 0
HessianFactor:
 keys: 0(3) 1(3) 
Augmented information matrix: [
	1.00453889587e+81, -8.863578493e+80, 1.35333966459e+82, -1.22992576855e+42, -7.68703605343e+40, 0, -1.78961791309e+41;
	-8.863578493e+80, 1.00453889587e+81, -1.19233688982e+82, 1.22992576855e+42, -7.68703605343e+40, 0, 3.06711287414e+41;
	1.35333966459e+82, -1.19233688982e+82, 1.82326703797e+83, -1.65582451852e+43, -1.04723572027e+42, -400, -2.39906858576e+42;
	-1.22992576855e+42, 1.22992576855e+42, -1.65582451852e+43, 1600, 0, 0, 315.903994301;
	-7.68703605343e+40, -7.68703605343e+40, -1.04723572027e+42, 0, 100, 0, -83.0941179516;
	0, 0, -400, 0, 0, 400, 628.318530718;
	-1.78961791309e+41, 3.06711287414e+41, -2.39906858576e+42, 315.903994301, -83.0941179516, 628.318530718, 304968.261602
]

Another thing I have noticed in aarch64 is that the initial total error of the same system seems much lower than that of amd64, possibly resulting in less/no optimization iterations.

Update 1:
Seems like the calculation of error of BetweenFactor<Pose2> is indeed incorrect on aarch64.

Debug output from amd64 of BetweenFactor::evaluateError for the very first error calculation based on estimates:

p1: (11, 10.5, 0.392699081699), p2: (20, 23, 0.785398163397), err: [13.1184182938, 7.63899870558, 0.352566699052]

Debug output from aarch64 of BetweenFactor::evaluateError for almost the exact same poses:

p1: (11, 10.5, 0.392699081699), p2: (20, 23, 0.785398163397), err: [-0.261580542738, -0.309698692039, -1.57079632679]

Playing around inside of BetweenFactor and LieGroup but haven't had much luck so far.

Update 2:
I haven't 100% figured out why, but Rot2 inverse() in geometry/Rot2.h was returning an inverse with incorrect heading angle when inverting Pose2 as a part of factorization of BetweenFactor<Pose2>.

Changing this line,

Rot2 inverse() const { return Rot2(c_, -s_);}
to read the following fixes at least that issue (seems like there are more still to find):
Rot2 inverse() const { return fromCosSin(c_, -s_);}

Like I said, not yet sure why this makes a difference, as all fromCosSin appears to be doing is normalizing c_ & s_, and those should have already been normalized...

Update 3:
Also, Rot2::operator*(const Rot2& R) does not behave as expected in aarch64. The following seems to relieve the problem: return fromAngle(theta() + R.theta());

Although, again, I am not sure why this works, as making use of the trig identity for cos(th1+th2) and sin(th1+th2) as is done in master branch should be identical in terms of resulting behavior...

Steps to reproduce

  1. Using an aarch64 device
  2. Create a factor graph with some BetweenFactor<Pose2> and some initial estimates
  3. Optimize the factor graph
  4. Optimization does not work as expected

Expected behavior

Factor graph optimizes in the same way it does on amd64.

Environment

Jetson Xavier NX, Ubuntu 18.04

@ProfFan ProfFan self-assigned this Jun 23, 2021
@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 23, 2021

Somehow (I think it is maybe happening when the LM optimizer is initialized) the left operand of Rot2*Rot2 has an uninitialized c_ and s_ during an invocation before the optimizer takes its first iteration. I think what's happening is that when this uninitialized rotation gets inverted and multiplied during the optimization iterations, the cosine and/or sine of the rotation component of pose estimates becomes extremely positive or negative, the result of which is atan2() for the pose heading angle returning one of its inflection points, as documented in cppreference. Have not figured out from where this uninitialized rotation is coming, but given that reproducing it repeatedly yields wildly different figures for c_ and s_ for that rotation that shows up before the optimization, that does appear to be what is happening.

Debug output from Rot2 before optimizer's first linearization:

Rot2::operator*(), c_: 3.03428333399e-86, s_: 1.14371993161e-71

Initial values
Values with 1 values:
Value 0: (gtsam::Pose2)
(10.0808679401, 9.99346051488, 0.428400412304)

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 23, 2021

Yes I agree this could be uninitialized memory. Could you run the unit tests with clang sanitizer on?

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 23, 2021

Despite having clang installed on my Jetson, it looks like GCC is the compiler in use both when building my entire project as well as when building only gtsam with make check.

Switched to use clang and have:
clang: error: the clang compiler does not support '-march=native'

Is disabling that going to cause other issues in aarch64..?

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 23, 2021

Interesting that -march=native is unsupported, let me try on my Jetson Nano.

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 23, 2021

(Currently building it on my Jetson with GTSAM_BUILD_WITH_MARCH_NATIVE OFF. Build seems to be proceeding, but it takes long enough to complete that I thought I would go ahead and mention that it is currently underway over here with

--  C compilation flags                              :  -fsanitize=memory -O3 -DNDEBUG
--  C++ compilation flags                            :  -fsanitize=memory -O3 -DNDEBUG

)

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 23, 2021

Maybe you can try cross compiling on PC. The Jetson CPU cores are very slow.

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 23, 2021

*also compiling on my Jetson Nano here

@jlblancoc
Copy link
Member

Any hints out of running make testXXX.valgrind on a native aarch64 system?

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 25, 2021

Things seem to have successfully compiled on my Jetson using clang-6 with -fsanitize=memory on, but I don't really see anything different when running gtsam tests. Any luck on your side, @ProfFan ?

@jiblancoc thanks for the suggestion! I am still getting re-acquainted with C++ so did not know about valgrind. Running make testNonlinearOptimizer.valgrind on amd64 looks pretty clean (some of the tests fail but no memory errors reported). Running it in aarch64 reports loads of issues looking like out-of-bounds access (both writes & reads), some of which seem to be fairly pertinent (these are in the order in which they appear):

==2475== Invalid write of size 8
==2475==    at 0x4A5227C: gtsam::Pose2::equals(gtsam::Pose2 const&, double) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4BB4507: gtsam::Values::equals(gtsam::Values const&, double) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4342C7: bool gtsam::assert_equal<gtsam::Values>(gtsam::Values const&, gtsam::Values const&, double) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x42168B: NonlinearOptimizerFactorizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x444E43: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==  Address 0x1ffefffa60 is on thread 1's stack
==2475==  48 bytes below stack pointer
==2475== Invalid write of size 8
==2475==    at 0x4B64450: gtsam::VectorValues::equals(gtsam::VectorValues const&, double) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x43561B: bool gtsam::assert_equal<gtsam::VectorValues>(gtsam::VectorValues const&, gtsam::VectorValues const&, double) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x424633: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==  Address 0x1ffeffef40 is on thread 1's stack
==2475==  48 bytes below stack pointer
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x4A3A18C: gtsam::equal_with_abs_tol(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B644C7: gtsam::VectorValues::equals(gtsam::VectorValues const&, double) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x43561B: bool gtsam::assert_equal<gtsam::VectorValues>(gtsam::VectorValues const&, gtsam::VectorValues const&, double) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x424633: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x4BB0D2C: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x4BB0D2C: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x4BB0D30: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x4BB0D3C: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x511ED5C: cos (s_sin.c:559)
==2475==    by 0x4A52887: gtsam::Pose2::ChartAtOrigin::Retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x43C84F: gtsam::LieGroup<gtsam::Pose2, 3>::retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>, gtsam::OptionalJacobian<3, 3>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x43C39F: gtsam::GenericValue<gtsam::Pose2>::retract_(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4BB3E03: gtsam::Values::Values(gtsam::Values const&, gtsam::VectorValues const&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6D3E3: gtsam::DoglegOptimizerImpl::IterationResult gtsam::DoglegOptimizerImpl::Iterate<gtsam::GaussianBayesTree, gtsam::NonlinearFactorGraph, gtsam::Values>(double, gtsam::DoglegOptimizerImpl::TrustRegionAdaptationMode, gtsam::VectorValues const&, gtsam::VectorValues const&, gtsam::GaussianBayesTree const&, gtsam::NonlinearFactorGraph const&, gtsam::Values const&, double, bool) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6C193: gtsam::DoglegOptimizer::iterate() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4BB0B9B: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Conditional jump or move depends on uninitialised value(s)
==2475==    at 0x511EDA8: cos (s_sin.c:562)
==2475==    by 0x4A52887: gtsam::Pose2::ChartAtOrigin::Retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x43C84F: gtsam::LieGroup<gtsam::Pose2, 3>::retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>, gtsam::OptionalJacobian<3, 3>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x43C39F: gtsam::GenericValue<gtsam::Pose2>::retract_(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4BB3E03: gtsam::Values::Values(gtsam::Values const&, gtsam::VectorValues const&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6D3E3: gtsam::DoglegOptimizerImpl::IterationResult gtsam::DoglegOptimizerImpl::Iterate<gtsam::GaussianBayesTree, gtsam::NonlinearFactorGraph, gtsam::Values>(double, gtsam::DoglegOptimizerImpl::TrustRegionAdaptationMode, gtsam::VectorValues const&, gtsam::VectorValues const&, gtsam::GaussianBayesTree const&, gtsam::NonlinearFactorGraph const&, gtsam::Values const&, double, bool) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6C193: gtsam::DoglegOptimizer::iterate() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4BB0B9B: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475== 
==2475== Use of uninitialised value of size 8
==2475==    at 0x511EE28: do_cos (s_sin.c:165)
==2475==    by 0x511EE28: cos (s_sin.c:564)
==2475==    by 0x4A52887: gtsam::Pose2::ChartAtOrigin::Retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x43C84F: gtsam::LieGroup<gtsam::Pose2, 3>::retract(Eigen::Matrix<double, 3, 1, 0, 3, 1> const&, gtsam::OptionalJacobian<3, 3>, gtsam::OptionalJacobian<3, 3>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x43C39F: gtsam::GenericValue<gtsam::Pose2>::retract_(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4BB3E03: gtsam::Values::Values(gtsam::Values const&, gtsam::VectorValues const&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6D3E3: gtsam::DoglegOptimizerImpl::IterationResult gtsam::DoglegOptimizerImpl::Iterate<gtsam::GaussianBayesTree, gtsam::NonlinearFactorGraph, gtsam::Values>(double, gtsam::DoglegOptimizerImpl::TrustRegionAdaptationMode, gtsam::VectorValues const&, gtsam::VectorValues const&, gtsam::GaussianBayesTree const&, gtsam::NonlinearFactorGraph const&, gtsam::Values const&, double, bool) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4B6C193: gtsam::DoglegOptimizer::iterate() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x4BB0B9B: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==2475==    by 0x424937: NonlinearOptimizerMoreOptimizationTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4448FB: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)

Even this very very basic test TEST( NonlinearOptimizer, paramsEquals ) experiences an out-of-bounds error:

==2475== Invalid write of size 8
==2475==    at 0x41B85C: NonlinearOptimizerparamsEqualsTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x444E43: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==    by 0x4330BF: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==2475==  Address 0x1ffeffff40 is on thread 1's stack
==2475==  96 bytes below stack pointer

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 25, 2021

Thanks for the report! Could you also run valgrind with this: https://www.valgrind.org/docs/manual/mc-manual.html#opt.track-origins

BTW, are you using JetPack? I think JetPack is still on Ubuntu Bionic whose Boost version has some bugs that prevents ubsan and asan from working. (I tried and failed on my Jetson Nano).

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 28, 2021

Excellent to know about track-origins, thanks for highlighting that option! I will definitely have to try that out.

After much tedious debugging, I found that the functions returning by-value instances of Rot2 were the culprit of the garbage values I was seeing. Values of cosine & sine of rotation looked how they should during construction of the instances to be returned, but when the value was actually returned, those good values disappeared and the calling code was left with junk. Adding a trivial Rot2 copy constructor as a user-defined constructor, Rot2(const Rot2& rot) : c_(rot.c_), s_(rot.s_) {}, does result in it getting called when returning Rot2 by value from a function, and it does actually resolve the uninitialized values issue (without needing my cosine & sine normalization band-aid fix from above). The same test make testNonlinearOptimizer.valgrind drops from loads of uninitialized memory errors to only the following two:

==23437== Invalid read of size 8
==23437==    at 0x144C4C: gtsam::BetweenFactor<Eigen::Matrix<double, 2, 1, 0, 2, 1> >::evaluateError(Eigen::Matrix<double, 2, 1, 0, 2, 1> const&, Eigen::Matrix<double, 2, 1, 0, 2, 1> const&, boost::optional<Eigen::Matrix<double, -1, -1, 0, -1, -1>&>, boost::optional<Eigen::Matrix<double, -1, -1, 0, -1, -1>&>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x143B77: gtsam::NoiseModelFactor2<Eigen::Matrix<double, 2, 1, 0, 2, 1>, Eigen::Matrix<double, 2, 1, 0, 2, 1> >::unwhitenedError(gtsam::Values const&, boost::optional<std::vector<Eigen::Matrix<double, -1, -1, 0, -1, -1>, std::allocator<Eigen::Matrix<double, -1, -1, 0, -1, -1> > >&>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x4BB260F: gtsam::NoiseModelFactor::error(gtsam::Values const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4BB4C07: gtsam::NonlinearFactorGraph::error(gtsam::Values const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4B7B807: gtsam::GaussNewtonOptimizer::GaussNewtonOptimizer(gtsam::NonlinearFactorGraph const&, gtsam::Values const&, gtsam::GaussNewtonParams const&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x13534F: NonlinearOptimizerPoint2LinearOptimizationWithHuberTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x145FB7: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x12A5B3: main (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==  Address 0x5656ce0 is 0 bytes after a block of size 16 alloc'd
==23437==    at 0x4845BFC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==23437== 
==23437== Invalid read of size 8
==23437==    at 0x144C4C: gtsam::BetweenFactor<Eigen::Matrix<double, 2, 1, 0, 2, 1> >::evaluateError(Eigen::Matrix<double, 2, 1, 0, 2, 1> const&, Eigen::Matrix<double, 2, 1, 0, 2, 1> const&, boost::optional<Eigen::Matrix<double, -1, -1, 0, -1, -1>&>, boost::optional<Eigen::Matrix<double, -1, -1, 0, -1, -1>&>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x143BDB: gtsam::NoiseModelFactor2<Eigen::Matrix<double, 2, 1, 0, 2, 1>, Eigen::Matrix<double, 2, 1, 0, 2, 1> >::unwhitenedError(gtsam::Values const&, boost::optional<std::vector<Eigen::Matrix<double, -1, -1, 0, -1, -1>, std::allocator<Eigen::Matrix<double, -1, -1, 0, -1, -1> > >&>) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x4BB2893: gtsam::NoiseModelFactor::linearize(gtsam::Values const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4BB420F: gtsam::(anonymous namespace)::_LinearizeOneFactor::operator()(tbb::blocked_range<unsigned long> const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4BB440B: tbb::interface9::internal::start_for<tbb::blocked_range<unsigned long>, gtsam::(anonymous namespace)::_LinearizeOneFactor, tbb::auto_partitioner const>::execute() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4DA0557: ??? (in /usr/lib/aarch64-linux-gnu/libtbb.so.2)
==23437==    by 0x4D9DA9F: ??? (in /usr/lib/aarch64-linux-gnu/libtbb.so.2)
==23437==    by 0x4BB573B: gtsam::NonlinearFactorGraph::linearize(gtsam::Values const&) const (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4B7A873: gtsam::GaussNewtonOptimizer::iterate() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x4BB9C77: gtsam::NonlinearOptimizer::defaultOptimize() (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/gtsam/libgtsam.so.4.1.0)
==23437==    by 0x135357: NonlinearOptimizerPoint2LinearOptimizationWithHuberTest::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==    by 0x145FB7: TestRegistry::run(TestResult&) (in /home/adsasine/code/Azure-Compute-Robotics/src/robot/common/gtsam/build/tests/testNonlinearOptimizer)
==23437==  Address 0x5657840 is 0 bytes after a block of size 16 alloc'd
==23437==    at 0x4845BFC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==23437== 

As for why the compiler-generated copy constructor is not sufficient for the simple return-by-value functions inside of Rot2, of that I am not yet completely sure, and I am surprised that such a trivial constructor would be problematic. Because compiling with both gcc and clang result in the same misbehavior on aarch64, it seems like the root cause of it may be something in the ARM64 implementation. Seems sort of like copy elision in return value optimization might be the culprit, and that user definition of the copy constructor makes it "non-trivial", avoiding such elision as described in https://developer.arm.com/documentation/ihi0059/latest , section GC++ABI 3.1.4 Return Values:

GC++ABI 3.1.4 Return Values

When a return value has a non-trivial copy constructor or destructor, the address of the caller-allocated temporary is passed in the Indirect Result Location Register (r8) rather than as an implicit first parameter before the this parameter and user parameters.

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 28, 2021

And yes, this is on JetPack, boost version is 1.65.1

@ProfFan
Copy link
Collaborator

ProfFan commented Jun 28, 2021

Interesting find! I think this deserves a more formal investigation, but could you put up a quick PR to fix this first? Thanks a lot!

@dellaert
Copy link
Member

@ProfFan I think a PR is pre-mature. We first need to find the root cause, as per the discussion on the user group (please read up on it :-))

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 29, 2021

@dellaert, @ProfFan

I am not sure how much more there is to uncover here. I reverted all local changes with the exception of the addition of the copy constructor I added, Rot2(const Rot2& r), and all of the unit tests on my end are now passing on my Jetson (the ones described as having some BetweenFactor<Pose2> chains constrained by some custom derived unary factors along the way).

I am rebuilding everything from scratch again to re-run make check on my Jetson to see if the rest of the gtsam tests also pass. My plan is to capture the disassembly of a simple function returning a Rot2 by-value (Rot2::atan2(double y, double x) looks promising), remove the user-defined copy constructor, recompile the code, and then compare the resulting assembly instructions for the same function, in hopes to identify how exactly the implicit compiler-generated copy constructor is misbehaving for the return-by-value functions in Rot2, but I think ultimately that misbehavior is not the responsibility of gtsam.

I will report back with the assembly instructions with & without the user-defined copy constructor for Rot2, but probably it makes sense to go ahead and send a PR for this and then just keep an eye out for similar issues in aarch64 (and other platforms).

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 29, 2021

Jetson is taking forever to build the rest of the tests as usual, but here are the resulting disassembly in amd64. It has been some time since I have worked with assembly and have never actually worked with aarch64 instruction set yet, but you can see in the objdump from amd64 a hint that I am on the right track with respect to return value optimization: you can see that without the user-defined copy constructor, the function Rot2::atan2 does not push/pop the stack, yet when the user-defined constructor is there, it does push %rbx and then pop %rbx before ret, suggesting that copy elision is being performed as a part of return value optimization in the absence of a non-trivial copy constructor. My thought is that probably this copy elision is happening in both platforms is not being handled correctly in the aarch64 implementation of the standard.

AMD64 -- implicit `Rot2(const Rot2& r)`
00000000000001f0 <_ZN5gtsam4Rot25atan2Edd>:
 1f0:	48 83 ec 28          	sub    $0x28,%rsp
 1f4:	c5 f1 14 c0          	vunpcklpd %xmm0,%xmm1,%xmm0
 1f8:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
 1ff:	00 00 
 201:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
 206:	31 c0                	xor    %eax,%eax
 208:	48 89 e7             	mov    %rsp,%rdi
 20b:	c5 f8 29 04 24       	vmovaps %xmm0,(%rsp)
 210:	e8 00 00 00 00       	callq  215 <_ZN5gtsam4Rot25atan2Edd+0x25>
 215:	48 8b 10             	mov    (%rax),%rdx
 218:	48 8b 40 08          	mov    0x8(%rax),%rax
 21c:	c4 e1 f9 6e c8       	vmovq  %rax,%xmm1
 221:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
 226:	64 48 33 04 25 28 00 	xor    %fs:0x28,%rax
 22d:	00 00 
 22f:	75 0a                	jne    23b <_ZN5gtsam4Rot25atan2Edd+0x4b>
 231:	c4 e1 f9 6e c2       	vmovq  %rdx,%xmm0
 236:	48 83 c4 28          	add    $0x28,%rsp
 23a:	c3                   	retq   
 23b:	e8 00 00 00 00       	callq  240 <_ZN5gtsam4Rot26ExpmapERKN5Eigen6MatrixIdLi1ELi1ELi0ELi1ELi1EEENS_16OptionalJacobianILi1ELi1EEE>


AMD64 -- user-defined `Rot2(const Rot2& r)`
0000000000000210 <_ZN5gtsam4Rot25atan2Edd>:
 210:	53                   	push   %rbx
 211:	c5 f1 14 c0          	vunpcklpd %xmm0,%xmm1,%xmm0
 215:	48 89 fb             	mov    %rdi,%rbx
 218:	48 83 ec 20          	sub    $0x20,%rsp
 21c:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
 223:	00 00 
 225:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
 22a:	31 c0                	xor    %eax,%eax
 22c:	48 89 e7             	mov    %rsp,%rdi
 22f:	c5 f8 29 04 24       	vmovaps %xmm0,(%rsp)
 234:	e8 00 00 00 00       	callq  239 <_ZN5gtsam4Rot25atan2Edd+0x29>
 239:	48 8b 54 24 18       	mov    0x18(%rsp),%rdx
 23e:	64 48 33 14 25 28 00 	xor    %fs:0x28,%rdx
 245:	00 00 
 247:	c5 f9 10 00          	vmovupd (%rax),%xmm0
 24b:	c5 f8 11 03          	vmovups %xmm0,(%rbx)
 24f:	75 09                	jne    25a <_ZN5gtsam4Rot25atan2Edd+0x4a>
 251:	48 83 c4 20          	add    $0x20,%rsp
 255:	48 89 d8             	mov    %rbx,%rax
 258:	5b                   	pop    %rbx
 259:	c3                   	retq   
 25a:	e8 00 00 00 00       	callq  25f <_ZN5gtsam4Rot25atan2Edd+0x4f>
 25f:	90                   	nop

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 29, 2021

100% of make check passes on Jetson.

I also just re-ran make testNonlinearOptimizer.valgrind and it has nothing interesting to report. Something I was playing around with before finding the real issue must have been somehow cached in a build output somewhere, resulting in those last two dangling memory errors.

Will get that additional disassembly for completeness and then send a PR.

@dellaert
Copy link
Member

wow, awesome!

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 29, 2021

@dellaert Thanks!

Below are the resulting variations of aarch64 instructions for the same function. I mentioned above that I am no expert in assembly, and certainly not in aarch64 flavor, so anyone feel free to correct me, but it does seem to at least share a similar difference pointed out above between the amd64 instruction variations that hint at copy elision: when implicit copy constructor is in use, the instructions stp & str push x29, x30, and x19 onto the stack, but when user-defined copy constructor is there, that str is instead a stp resulting in an additional pushing of x20 onto the stack; the same can be seen of popping using a second ldp instead of ldr.

At this point, I don't think a super deep dive into where the copy eliding version is going wrong is warranted, given that tests are passing in both gtsam and my own project.

ARM64 -- implicit `Rot2(const Rot2& r)`
00000000000001b8 <_ZN5gtsam4Rot25atan2Edd>:
 1b8:	a9bc7bfd 	stp	x29, x30, [sp, #-64]!
 1bc:	910003fd 	mov	x29, sp
 1c0:	f9000bf3 	str	x19, [sp, #16]
 1c4:	90000013 	adrp	x19, 0 <__stack_chk_guard>
 1c8:	9100a3a0 	add	x0, x29, #0x28
 1cc:	f9400273 	ldr	x19, [x19]
 1d0:	6d0283a1 	stp	d1, d0, [x29, #40]
 1d4:	f9400261 	ldr	x1, [x19]
 1d8:	f9001fa1 	str	x1, [x29, #56]
 1dc:	d2800001 	mov	x1, #0x0                   	// #0
 1e0:	94000000 	bl	f0 <_ZN5gtsam4Rot29normalizeEv>
 1e4:	a9400001 	ldp	x1, x0, [x0]
 1e8:	9e670020 	fmov	d0, x1
 1ec:	9e670001 	fmov	d1, x0
 1f0:	f9401fa1 	ldr	x1, [x29, #56]
 1f4:	f9400260 	ldr	x0, [x19]
 1f8:	ca000020 	eor	x0, x1, x0
 1fc:	b5000080 	cbnz	x0, 20c <_ZN5gtsam4Rot25atan2Edd+0x54>
 200:	f9400bf3 	ldr	x19, [sp, #16]
 204:	a8c47bfd 	ldp	x29, x30, [sp], #64
 208:	d65f03c0 	ret
 20c:	94000000 	bl	0 <__stack_chk_fail>

ARM64 -- user-defined `Rot2(const Rot2& r)`
00000000000001c0 <_ZN5gtsam4Rot25atan2Edd>:
 1c0:	a9bc7bfd 	stp	x29, x30, [sp, #-64]!
 1c4:	910003fd 	mov	x29, sp
 1c8:	a90153f3 	stp	x19, x20, [sp, #16]
 1cc:	90000013 	adrp	x19, 0 <__stack_chk_guard>
 1d0:	aa0803f4 	mov	x20, x8
 1d4:	9100a3a0 	add	x0, x29, #0x28
 1d8:	f9400273 	ldr	x19, [x19]
 1dc:	6d0283a1 	stp	d1, d0, [x29, #40]
 1e0:	f9400261 	ldr	x1, [x19]
 1e4:	f9001fa1 	str	x1, [x29, #56]
 1e8:	d2800001 	mov	x1, #0x0                   	// #0
 1ec:	94000000 	bl	f0 <_ZN5gtsam4Rot29normalizeEv>
 1f0:	3dc00000 	ldr	q0, [x0]
 1f4:	f9401fa0 	ldr	x0, [x29, #56]
 1f8:	f9400261 	ldr	x1, [x19]
 1fc:	ca010001 	eor	x1, x0, x1
 200:	3d800280 	str	q0, [x20]
 204:	b50000a1 	cbnz	x1, 218 <_ZN5gtsam4Rot25atan2Edd+0x58>
 208:	aa1403e0 	mov	x0, x20
 20c:	a94153f3 	ldp	x19, x20, [sp, #16]
 210:	a8c47bfd 	ldp	x29, x30, [sp], #64
 214:	d65f03c0 	ret
 218:	94000000 	bl	0 <__stack_chk_fail>
 21c:	d503201f 	nop

@jaelrod
Copy link
Contributor Author

jaelrod commented Jun 29, 2021

#806

(seems I am unable to create the link between issue & PR)

@varunagrawal varunagrawal linked a pull request Aug 1, 2021 that will close this issue
@varunagrawal
Copy link
Collaborator

@jaelrod you can link from the drop down in the right side. Closing since this has been resolved!

@asasine
Copy link

asasine commented Nov 17, 2023

@jaelrod pointed out something interesting in their suggestion to update this line,

Rot2 inverse() const { return Rot2(c_, -s_);}

to:

Rot2 inverse() const { return fromCosSin(c_, -s_);}

The suggested form calls fromCosSin, which normalizes the value (the header indicates it shouldn't, created #1672 for that). The current form calls the Rot2(double, double) constructor, which does not perform normalization.

Given this comment:

I haven't 100% figured out why, but Rot2 inverse() in geometry/Rot2.h was returning an inverse with incorrect heading angle when inverting Pose2 as a part of factorization of BetweenFactor<Pose2>.

There might be some assumptions in BetweenFactor that rely on Rot2::inverse returning a normalized Rot2. Could that be the root cause for this?

I'm unsure how the introduction of a user-defined copy ctor made a difference here. My best guess is similar to jaelrod's: before, the implementation before the PR was performing URVO copy elision (mandatory as of C++17). Now, it no longer qualifies for copy elision per the spec (the return type of Rot2::normalize is Rot2&, which is not the same class type as the Rot2::fromCosSin return value), so the user-defined copy constructor is called.

@ProfFan ProfFan reopened this Nov 17, 2023
@lkskstlr
Copy link

@ProfFan thank you for re-opening this. We are seeing similar behavior in a larger project where GTSAM is only a small part. The code runs normal on x86 and when we port to Jetson Orin NX 16G we get different behavior. Is this bug currently fixed? We run a 3D visual SLAM system, but found this issue to be close enough that we wanted to ask.

Thanks a lot
Lukas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants