Robot Rolling in New Gazebo Environment and TensorBoard Axis Labels Issue #157
Comments
Hi, please provide a more detailed description of your issue. AFAIK there is no such way, but support for that is better asked on the TensorBoard forums.
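TensorBoard itself does not expose a way to rename a scalar chart's axes; the x-axis is always the step, relative time, or wall time selected in the UI. A minimal sketch of the usual workaround, assuming logging goes through torch.utils.tensorboard (the tag name and log directory here are placeholders), is to encode the meaning of both axes in the scalar tag, which TensorBoard displays as the chart title:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./runs/td3")  # placeholder log directory

for step in range(1000):
    avg_q = 0.0  # placeholder value; replace with the real metric
    # The tag is the only label TensorBoard shows, so put the "axis" meaning in it.
    writer.add_scalar("evaluation/average_Q_value_vs_training_step", avg_q, global_step=step)

writer.close()
```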
Such a loss curve would be correct for other training methods, but it does not work that way here. See the discussion in #89 (comment).
See any issue tagged with the convergence label: https://github.com/reiniscimurs/DRL-robot-navigation/issues?q=is%3Aissue+label%3Aconvergence+is%3Aclosed
I’m encountering an issue where my robot pauses for about 15 to 16 seconds at the end of each episode before the environment resets. I discovered that transformation errors are causing this delay, which in turn increases the training time. I’m unsure how to reduce these errors.
Could it be that it is the other way around, and the delay causes the tf issues due to de-syncing? My only other guess is to check that your gmapping (I suggest using slam_toolbox, as it has better SLAM performance) is not affecting anything here. I would start there, as that is the variable that has changed and could have a significant impact on the tf tree and transforms.
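For reference, here is a minimal, hypothetical rospy sketch for checking how stale the transforms are; the odom and base_link frame names are assumptions and should be replaced with the frames from your actual tf tree:

```python
#!/usr/bin/env python
import rospy
import tf2_ros

rospy.init_node("tf_age_check")
buf = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(buf)

rate = rospy.Rate(1.0)
while not rospy.is_shutdown():
    try:
        # Latest available transform between the assumed frames
        t = buf.lookup_transform("odom", "base_link", rospy.Time(0))
        age = (rospy.Time.now() - t.header.stamp).to_sec()
        rospy.loginfo("odom -> base_link transform age: %.3f s", age)
    except (tf2_ros.LookupException,
            tf2_ros.ConnectivityException,
            tf2_ros.ExtrapolationException) as exc:
        rospy.logwarn("tf lookup failed: %s", exc)
    rate.sleep()
```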
Thanks for the advice. I’ll look into it.
I've found that the delays at the end of episode terminations are due to the train function taking 30 to 50 seconds to complete before the Gazebo world resets. Could you help me figure out how to reduce this time?
After every episode we run training. This is the actual deep learning part; before that we simply collect samples. Usually it would take a couple of seconds if you are using CUDA, so 30 to 50 seconds seems quite long to me. However, during this time Gazebo is paused and no desync should happen. Have you increased your state size, batch size, or number of training iterations so that 50 seconds would actually be necessary? Are you using a GPU to train, and does your computer have enough resources to maintain sync and train at the same time? If I recall correctly, you would have to call SLAM as a separate service. If you are not pausing the SLAM service while pausing everything else, this could lead to timing issues.
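As a quick sanity check, something along these lines can confirm that the GPU is being used and measure how long the post-episode training call actually takes (the commented-out network.train(...) line is a placeholder for whatever training call your script makes, not the repository's exact signature):

```python
import time
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Using device:", torch.cuda.get_device_name(0))

start = time.time()
# network.train(replay_buffer, episode_timesteps, batch_size, ...)  # placeholder training call
elapsed = time.time() - start
print("Training after this episode took %.1f s" % elapsed)
```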
I've noticed that the TD3 model tends to rotate excessively near the goal position and struggles to reach it, as if it's getting stuck in a local minimum. How can this behavior be prevented?
You should provide a full description of the problem you are facing. It is difficult to say anything without extra information.
I have trained the TD3 model with a dense reward function and SLAM on your TD3.world and tested it in a narrow corridor environment. In most cases it reaches the goal position easily, but it struggles in two situations:

[video: TD3_rotation_near_goal.1.mp4]

I have trained the TD3 model for only 14k epochs. I tested it in local-minima environments. Its performance in narrow corridors was quite good, but it faced significant difficulty in local-minima environments:

[video: TD3_local_minima-2024-09-06_10.03.36.online-video-cutter.com.1.mp4]

Could you explain why the robot rotates in place? Is it because the robot gets stuck in local minima? How can I reduce this issue? And how can I generalize this model to work effectively in unseen environments?
What do you mean by "safe distance around the walls"? We do not specify any safety distance. We do have a term in the reward function (r3) that gives a slight negative reward if the robot is too close to an object, which creates a bit of a potential field around obstacles. However, changing anything in the reward will not do anything at the testing stage, as the reward is not used for anything there anymore. Behavior is based entirely on the Q-value estimation, which is trained from the reward during training.

The reason goals are not reached near walls is that the Q-value for going forward in such states is lower than for staying still. Think of how the Q-value is learned. Let's assume the robot has collected 100 experiences where it is close to a wall at 1 meter distance, and in only one of these experiences there was a goal point between the robot and the wall. If the robot goes forward, it will hit the wall 99 times out of 100 and receive a reward of -100, and only once a reward of +100. So the Q-value for such a state will be (-100 * 99 + 100 * 1) / 100 = -98. If the robot stands still, it will get a reward of (0 * 100) / 100 = 0. Since 0 > -98, the robot picks the best Q-value for the state-action pair and chooses to stand still. I am heavily simplifying here, but it should give an overview of how the model's choice is based entirely on the learned Q-value.

So the issue is not that the model is unsuitable for unseen environments; it actually is suitable, and it is solving the problem as it has learned it, in an unseen environment. It is just that your scenario is not suitable for the current model, and you need to find a way to shape the Q-value so that it solves your environment, where goals are very close to the walls.
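For illustration, the back-of-the-envelope average above can be reproduced in a few lines (this is only the simplified bookkeeping described in the comment; real Q-values are bootstrapped, discounted estimates rather than plain averages):

```python
n_total = 100
n_goal = 1                       # times moving forward reached the goal (+100)
n_collision = n_total - n_goal   # times moving forward hit the wall (-100)

q_forward = (100 * n_goal - 100 * n_collision) / n_total  # (-100*99 + 100*1)/100 = -98.0
q_stand_still = (0 * n_total) / n_total                   # 0.0

# The policy picks the action with the higher value, so it stands still.
print(q_forward, q_stand_still)
```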
Which reported reward do you mean here? The average reward at the end of the evaluation?
Yes, I mean the average reward at the end of the evaluation.
You would have to look at the actual individual values of each reward element. If the accumulated immediate reward is around 0, or even just slightly positive, then this would make sense. Your w1-weighted reward could be around 0 on average. The w2 term will be 0 or negative. For w3, I don't know what dist_reward is, but the term should be negative, or else you would be encouraging the robot to stay away from the goal; so this term will be negative, and it even has a very high weight. No clue what the values would be for w4. So it looks to me like the immediate reward would not accumulate a lot of positive value over the episode length. Then if you have some collisions, the average reward would be below 100. I suggest simply debugging into this and checking the actual values you get.
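As a starting point, here is a minimal, hypothetical debugging sketch that assumes a custom reward of the form w1*r1 + w2*r2 + w3*dist_reward + w4*r4; all names, weights, and example values are placeholders mirroring the terms mentioned above, not the repository's reward function:

```python
def debug_reward(r1, r2, dist_reward, r4, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """Print each weighted reward term so you can see which one dominates."""
    terms = {
        "w1*r1": w1 * r1,
        "w2*r2": w2 * r2,
        "w3*dist_reward": w3 * dist_reward,
        "w4*r4": w4 * r4,
    }
    total = sum(terms.values())
    print({k: round(v, 3) for k, v in terms.items()}, "| total:", round(total, 3))
    return total

# Example call with made-up values for one step:
debug_reward(r1=0.05, r2=-0.1, dist_reward=-0.2, r4=0.0, w3=5.0)
```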
I've started training in a different Gazebo environment, but my robot slips and rolls uncontrollably and doesn't reach the goal position. What could be the issue? Also, how can I set the x-axis and y-axis labels for a TensorBoard plot?