Use w/ stable_baselines #33

Open
MartinaRuocco opened this issue Mar 10, 2021 · 1 comment

Comments


MartinaRuocco commented Mar 10, 2021

Hi @thomascent,

I've been trying to use your envs with a stable_baselines algorithm (here's the cleaned-up repository), but I had to make some adjustments to make them compatible:
1. normalize the action space and make it symmetric
2. flatten the observation space and the action space
3. sum the rewards
4. compress the terminal signal
5. vectorize the environment
I also had to apply a quick fix to the observation space bounds, because the reset() method returns an observation that lies outside the observation space (?).
I found the need for these adjustments by running the check_env() and set_env() methods from stable_baselines.
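
For reference, adjustments 1 to 4 can be sketched roughly as below. This is a minimal, library-free illustration and not the code from the linked repository. It assumes the environment returns observations, rewards and terminals as (possibly nested) dicts; the helper names flatten, rescale_action, sum_rewards and any_terminal are made up for this example.

```python
from typing import List


def flatten(value) -> List[float]:
    """Flatten a nested dict/list of numbers into one flat list.

    Dict keys are visited in sorted order so the layout is stable between steps.
    """
    if isinstance(value, dict):
        out: List[float] = []
        for key in sorted(value):
            out.extend(flatten(value[key]))
        return out
    if isinstance(value, (list, tuple)):
        out = []
        for item in value:
            out.extend(flatten(item))
        return out
    return [float(value)]


def rescale_action(action, low, high) -> List[float]:
    """Map an action from the symmetric range [-1, 1] back to [low, high], element-wise."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(action, low, high)]


def sum_rewards(rewards) -> float:
    """Collapse a (nested) dict of reward terms into one scalar reward."""
    return sum(flatten(rewards))


def any_terminal(terminals) -> bool:
    """Collapse a (nested) dict of terminal flags into a single done flag."""
    return any(flag != 0.0 for flag in flatten(terminals))
```

In a real setup these helpers would sit inside a gym wrapper around the environment, with the flattened bounds used to declare Box spaces; that part is left out here because it depends on the environment's actual space definitions.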

I used the example from the README and tried to train a PPO2 model for 6e5 training steps, but unfortunately this is the result (the values printed in the terminal are the rewards).
I believe the training affects only one joint and not the others, so the arm only stretches out.
Any idea how to approach this problem?
Also, you mentioned that you tested your environments with other agents. Could you please upload a working example (e.g. the TD3 one you mentioned)?

MartinaRuocco (author) commented

[UPDATE]
I tried the TD3 algorithm with the from_readme environment; here's my attempt.
Unfortunately, it raises this error:
MemoryError: Unable to allocate 119. GiB for an array with shape (100000, 160003) and data type float64
because the observation_space is too large.
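
The size in the error message follows directly from the array shape: 100000 rows times 160003 float64 values of 8 bytes each. A quick check of that arithmetic (the helper name buffer_gib is made up for this example):

```python
def buffer_gib(rows: int, cols: int, bytes_per_item: int) -> float:
    """Size in GiB of a dense rows x cols array with the given item size."""
    return rows * cols * bytes_per_item / 2**30


# Shape and dtype taken from the MemoryError above (float64 = 8 bytes).
print(f"{buffer_gib(100_000, 160_003, 8):.1f} GiB")  # 119.2 GiB

# For comparison: the same shape in float32, and a 10x smaller buffer in float64.
print(f"{buffer_gib(100_000, 160_003, 4):.1f} GiB")  # 59.6 GiB
print(f"{buffer_gib(10_000, 160_003, 8):.1f} GiB")   # 11.9 GiB
```

So even a much smaller replay buffer stays large with 160003-dimensional observations; shrinking the observation itself is probably the more effective change.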

So I tried the ur_high_5 environment instead, and this time the error is:

(...)
 File "/home/p16325mr/diy-gym/diy_gym/utils.py", line 95, in pop
    return self.arr[self.i - n:self.i]
IndexError: invalid index to scalar variable.
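
One possible cause (a guess, not verified against the diy_gym code) is that self.arr is a NumPy scalar rather than an array when pop is called, since slicing a NumPy scalar produces exactly this message. The snippet below reproduces the error in isolation and shows that promoting the value with np.atleast_1d makes the slice valid; the variable names are made up for this example.

```python
import numpy as np

scalar = np.float64(1.0)  # a NumPy scalar, not an array

try:
    scalar[0:1]  # slicing a scalar is not allowed
    message = ""
except IndexError as exc:
    message = str(exc)

print(message)

# Promoting the scalar to a 1-D array of shape (1,) makes slicing work.
as_array = np.atleast_1d(scalar)
window = as_array[0:1]
print(window.shape)
```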

Any help is very much appreciated :D
