[UPDATE:]
I tried the TD3 algorithm with the from_readme environment; here's my attempt.
Unfortunately, because the observation space is so large, it raises:
`MemoryError: Unable to allocate 119. GiB for an array with shape (100000, 160003) and data type float64`
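For what it's worth, the 119 GiB figure lines up exactly with a float64 buffer of that shape; a quick back-of-the-envelope check (the buffer shape and dtype are taken from the error message itself, not from TD3's documented defaults):

```python
# Memory needed for an array of shape (100000, 160003) in float64,
# i.e. one flattened 160003-dimensional observation per buffered step.
n_rows = 100_000
obs_dim = 160_003
bytes_per_float64 = 8

gib = n_rows * obs_dim * bytes_per_float64 / 2**30
print(f"{gib:.1f} GiB")  # ~119.2 GiB, matching the MemoryError
```

So any off-policy algorithm that pre-allocates a replay buffer over this observation space will fail the same way until the observation is shrunk.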
So I tried the ur_high_5 environment instead, and this time the error is:
(...)
File "/home/p16325mr/diy-gym/diy_gym/utils.py", line 95, in pop
return self.arr[self.i - n:self.i]
IndexError: invalid index to scalar variable.
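That particular message is what NumPy raises when a scalar, rather than an array, gets indexed or sliced, which suggests `self.arr` holds a scalar at that point. A minimal sketch of the failure mode (a hypothetical cause; the actual state of `utils.py` isn't shown here):

```python
import numpy as np

# NumPy scalars raise "IndexError: invalid index to scalar variable."
# when sliced as if they were arrays -- the same message seen in
# diy_gym/utils.py's pop().
arr = np.float64(3.0)  # a NumPy scalar, not an ndarray
try:
    arr[0:2]           # mirrors self.arr[self.i - n:self.i]
except IndexError as err:
    print(err)
```

If that is the cause, the fix would be to ensure `self.arr` is always stored as an ndarray (e.g. via `np.atleast_1d`) before `pop()` slices it.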
Hi @thomascent,
I've been trying to use your envs with a stable_baselines algorithm (here's the cleaned-up repository), but I had to make some adjustments to get them compatible:
1. normalize the action space and make it symmetric
2. flatten the observation and action spaces
3. sum the per-module rewards
4. combine the terminal signals into a single done flag
5. vectorize the environment
Also, I had to apply a quick fix to the observation space boundaries, because the reset() method would return an observation that lies outside the observation space (?).
These issues were detected using the check_env() and set_env() methods from stable_baselines.
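For anyone hitting the same incompatibilities, adjustments 1–4 can be sketched as a small wrapper. This is a minimal sketch that assumes diy_gym returns nested dicts of observations, rewards, and terminals per add-on; the class and helper names are hypothetical, and the real changes live in the linked repository:

```python
import numpy as np

class FlattenedEnv:
    """Hypothetical wrapper sketching adjustments 1-4. Vectorization
    (adjustment 5) is left to stable_baselines' DummyVecEnv."""

    def __init__(self, env, action_low, action_high):
        self.env = env
        self.action_low = np.asarray(action_low, dtype=np.float64)
        self.action_high = np.asarray(action_high, dtype=np.float64)

    @staticmethod
    def _flatten(x):
        # 2. recursively flatten nested dicts of arrays into one 1-D vector
        if isinstance(x, dict):
            parts = [FlattenedEnv._flatten(x[k]) for k in sorted(x)]
            return np.concatenate(parts) if parts else np.zeros(0)
        return np.asarray(x, dtype=np.float64).ravel()

    def _unscale_action(self, action):
        # 1. map a normalized, symmetric action in [-1, 1] back to [low, high]
        action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
        return self.action_low + (action + 1.0) * 0.5 * (self.action_high - self.action_low)

    def step(self, action):
        obs, reward, terminal, info = self.env.step(self._unscale_action(action))
        return (self._flatten(obs),
                float(np.sum(self._flatten(reward))),   # 3. sum the rewards
                bool(np.any(self._flatten(terminal))),  # 4. one done flag
                info)
```

Wrapping the result in `DummyVecEnv([lambda: FlattenedEnv(...)])` would then cover adjustment 5.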
I used the example from the readme and trained a PPO2 model for 6e5 timesteps, but unfortunately this is the result (the values printed to the terminal are the rewards).
I believe the training affects only one joint rather than all of them, so the arm only stretches.
Any ideas on how to approach this problem?
Also, you mentioned that you tested your environments with other agents. Could you please upload a working example (e.g. the TD3 agent you mentioned)?