Questions about the running results #1

2345as · 2023-05-04T14:27:12Z

Why are the training results from policy.py all NaN? Also, the results of DP.py do not seem to be usable in the test files simulate.py and env.py.

alabatie · 2023-05-06T12:39:45Z

Hi, thank you for using the repo. However, it's difficult to address your issue without further information.

Are you using the dynamic programming approach or the policy network approach? Note that the two approaches are different, and mutually exclusive.

NaNs can be a sign of numerical instability, caused either by the numerical format or by the optimizer's hyperparameters.
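As a toy illustration of the optimizer case, here is a minimal sketch. It is not code from this repo: the function name `first_nonfinite_step` and the quadratic loss are made up for the example. It shows plain gradient descent diverging when the learning rate is too large, until the loss is no longer finite:

```python
import math


def first_nonfinite_step(lr, steps=2000, w0=1.0):
    """Run gradient descent on f(w) = w*w (hypothetical toy loss).

    Returns the first step whose loss is inf or NaN,
    or None if every loss stayed finite.
    """
    w = w0
    for step in range(steps):
        loss = w * w  # float multiplication overflows to inf rather than raising
        if not math.isfinite(loss):
            return step
        grad = 2.0 * w
        w = w - lr * grad  # equivalent to w <- (1 - 2*lr) * w
    return None


stable = first_nonfinite_step(lr=0.1)    # |1 - 2*lr| < 1, so w shrinks toward 0
unstable = first_nonfinite_step(lr=1.5)  # |1 - 2*lr| > 1, so |w| grows every step
print("lr=0.1:", stable)
print("lr=1.5:", unstable)
```

A similar finiteness check on the loss in your own training loop should tell you at which step the values first blow up, which helps decide whether to lower the learning rate or switch to a higher-precision float format.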

Best,
Antoine
