Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simple ROS-based simulator (using RVIZ for visualization) #504

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

whoenig
Copy link
Contributor

@whoenig whoenig commented Oct 15, 2021

The re-base somehow messed up the diff, so re-opening a new PR here (see #350 for earlier discussion)

This is a much improved version, with:

  • A ROS way to simulate existing scripts using roslaunch crazyswarm hover_swarm.launch sim:=True
  • More features (i.e., broadcast functions; trajectory support)
  • A much more feature-complete, yet simple example (example_distributed.{py,launch}
  • Some more code clean-up, integrating @jpreiss earlier feedback

What's missing:

  • Documentation updates: Where/How should we mention this? I think the 'old' way should still be the default, because it is much more lightweight. Perhaps we can add a new tutorial for distributed computation that explains this special mode?
  • Python2 support? Not sure how important that is, given that Python2 reached EOL

whoenig added 4 commits March 29, 2021 21:46
Usage:
Terminal 1: roslaunch crazyswarm sim.launch
Terminal 2: python3 example_distributed.py
Usage:
Terminal 1: roslaunch crazyswarm sim.launch
Terminal 2: python3 example_distributed.py
* Improved example_distributed
* The new ROS simulator can be used with

$ roslaunch crazyswarm hover_swarm.launch sim:=True
$ (python-script-without --sim-argument)

* Implements some missing features
@whoenig whoenig requested a review from jpreiss October 15, 2021 15:46
Copy link
Collaborator

@jpreiss jpreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality looks good! Design-wise, I am not sold on the use of inheritance. I think composition would be more clear. Is there some place I am missing where inheritance is making the code much simpler?

ros_ws/src/crazyswarm/scripts/example_distributed.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/launch/hover_swarm.launch Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
ros_ws/src/crazyswarm/scripts/sim.py Outdated Show resolved Hide resolved
@whoenig
Copy link
Contributor Author

whoenig commented Oct 28, 2021

Thanks - good point on inheritance vs. composition. I didn't actively think about it and I do agree that it would be worth the (small) effort to change it to make the code more readable. I'll update the PR and let you know once I am done for a second round.

This also addresses some other minor points raised by jpreiss in
#504
@whoenig whoenig requested a review from jpreiss October 30, 2021 17:08
@jpreiss
Copy link
Collaborator

jpreiss commented Oct 30, 2021

Thanks, this looks good to me.

Re. documentation, I think this is a big feature that users might want for many reasons. If it's only mentioned in a distributed systems tutorial, some users might not be aware of it even though it would help them. I wonder if we need to add a top-level reference section or "system overview" where we can put reference-like material that is not part of the Python API?

Re. Python2, are we using some Python3 feature that would be annoying to give up? I don't see any f-strings... If there is no real cost, I feel we might as well continue supporting it. If it becomes a burden I agree we can start introducing Py3-only features.

Copy link
Collaborator

@jpreiss jpreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates for composition and other nits look good. Still a few points for discussion before merging

@jpreiss
Copy link
Collaborator

jpreiss commented Oct 30, 2021

This feature opens up the opportunity to test the ROS side of pycrazyswarm in CI. However, the unit tests currently force running in --sim. I will think about how to redesign the tests so they can run in this mode.

Adding the ability to run unit tests in this mode would also test the new code introduced in this PR, so I think it is worth the effort to get it working before merging into master.

whoenig added a commit that referenced this pull request Nov 5, 2021
This can be used as foundation to add more documentation for PR #504
whoenig added a commit that referenced this pull request Nov 6, 2021
…nts (#523)

* Add overview page in the documentation outlining the software components

This can be used as foundation to add more documentation for PR #504
@whoenig
Copy link
Contributor Author

whoenig commented Nov 6, 2021

The PR is updated and based on the latest master branch. This new version now also includes documentation for this new feature.

Re Python versions: I don't really use any specific Python3 features. However, I don't have Python2 installed and it did reach its end-of-life. So I'd vote to either discontinue support, or have it somehow part of the CI so that we don't manually have to test if we break it.

Re unit tests: Agreed - this would be nice. Could be either part of this PR, or a separate effort.

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 7, 2021

I implemented testing on https://github.com/USC-ACTLab/crazyswarm/tree/dev-sim-ros-testing-rebase. Please merge into this branch if you agree with the design.

Currently a lot of tests are failing. There are many possible reasons for test failures:

  • tests that can only work in the pure-python simulator
  • tests with assumptions that are now invalid but can be fixed, e.g. assuming time starts at 0
  • tests where the crazyfliesim.py and crazyflie.py APIs are not consistent
  • errors in the ROS simulator itself
  • errors in the test fixture that spawns the ROS sim:=1 server

We also get a complete hang in Ubuntu 18 / Python 2, but fixing that is probably straightforward compared to the test failures.

I am not at a ROS PC right now but I can debug the tests in the next day or two. If any failures require significant engineering to fix, we can just use pytest marking to skip them in the ROS simulation for now, and fix them in separate commits.

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 8, 2021

something odd is happening with the ROS tests.

When I run a single test like python3 -m pytest --simros -k test_goTo_relative it passes.

But if I run its whole containing file like python3 -m pytest --simros test_highLevel.py it fails when trying to look up the CF's transform.

I try to kill the whole ROS process tree in the test fixture, but it seems like something is persisting. Or, there is something persisting in the client (pytest) process.

@whoenig
Copy link
Contributor Author

whoenig commented Nov 8, 2021

I wonder if this is some strange ROS1 bug. From the logs in ~/.ros and also when running the tests with -s, I can see that the ROS master and all child processes actually get killed properly. By adding prints (and running with -s), I can also see that the TF broadcaster works correctly. Yet, the receiving side doesn't get any data...

@whoenig
Copy link
Contributor Author

whoenig commented Nov 8, 2021

Switching from tf to tf2 fixes those issues. There are still plenty of failures left, most of them seem to be weirdly numerical mismatches. Perhaps a delay caused by tf?

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 8, 2021

yes it seems like the non-blocking nature of Crazyflie.position() might be causing problems.

@whoenig
Copy link
Contributor Author

whoenig commented Nov 8, 2021

It is blocking in the sense that it waits until a transform is available and then returns the latest one. It would be possible to also pass in the current timestamp, but that might just add latency in the position() call.

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 8, 2021

Yes, I think some of the test failures happen because the timeHelper.sleep call is exactly the duration of the high level command, so any latency between the simulator node and the test process causes position() to return a position from the middle of the trajectory instead of the very end.

We could fix this by adding a little extra time to all sleep calls, but that could still lead to flaky tests. The cleaner thing would be to use simulated time and add a blocking option to position()...

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 9, 2021

I skipped some tests (notably all collisionAvoidance) and modified some other ones to accommodate ROS. Saving simulator time for later. Now the test suite has no failures on my Ubuntu 20 machine, but it's still failing in CI...

@jpreiss
Copy link
Collaborator

jpreiss commented Nov 9, 2021

When using the debug shell on the Github runner, I was able to make the tests pass by increasing the wait timeout from 10s to 100s. This was quite strange because the wait call was only blocking for about 1s anyway. But it doesn't work in the real CI. I think everything is flaky, success/fail has something to do with the OS scheduler. We are not reliably terminating processes.

@whoenig
Copy link
Contributor Author

whoenig commented Nov 9, 2021

The wait just blocks until the process is killed, which takes on my machine less than 1s. According to the ROS logs, all the processes are terminated and relaunched (on a local machine that is). Perhaps we can run the tests in CI with the -s flag to see what's happening in which order?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants