Stage is now included in the name of the model directory.
tomasvr committed May 1, 2023
1 parent afbc0d0 commit 8fe4bd3
Showing 37 changed files with 51 additions and 46 deletions.
54 changes: 27 additions & 27 deletions README.md
@@ -196,18 +196,18 @@ You should see the gazebo GUI come up with the robot model loaded and two moving

In a second terminal run
```
-ros2 run turtlebot3_drl drl_gazebo
+ros2 run turtlebot3_drl gazebo_goals
```

In a third terminal run
```
-ros2 run turtlebot3_drl drl_environment
+ros2 run turtlebot3_drl environment
```

And lastly, in the fourth terminal run the ddpg agent
For DDPG:
```
-ros2 run turtlebot3_drl drl_agent ddpg 1
+ros2 run turtlebot3_drl train_agent ddpg
```

The first argument indicates whether we are testing or training (0 = testing, 1 = training)
@@ -218,12 +218,12 @@ The first argument indicates whether we are testing or training (0 = testing, 1

for TD3:
```
-ros2 run turtlebot3_drl drl_agent td3 1
+ros2 run turtlebot3_drl train_agent td3
```

for DQN:
```
-ros2 run turtlebot3_drl drl_agent dqn 1
+ros2 run turtlebot3_drl train_agent dqn
```

Your robot should now be moving and training progress is being printed to the terminals!
@@ -239,21 +239,16 @@ The current state of the agent (weights, parameters, replay buffer and graphs) w
In order to load a model for testing (e.g. ddpg_0 at episode 500) the following command should be used:

```
-ros2 run turtlebot3_drl drl_agent ddpg 0 "ddpg_0" 500
+ros2 run turtlebot3_drl test_agent ddpg "ddpg_0" 500
```

In order to load a model to continue training (e.g. ddpg_0 at episode 500) the following command should be used:

```
-ros2 run turtlebot3_drl drl_agent ddpg 1 "ddpg_0" 500
+ros2 run turtlebot3_drl train_agent ddpg "ddpg_0" 500
```

-**Note:** If you are loading a model on a different stage than it was trained on (e.g. for transfer learning or testing generalizabilty) you have to add a 4th argument specifying the current stage. For example, model ddpg_0 which was trained on stage 4 can be evaluated in stage 3 using the following command
-```
-ros2 run turtlebot3_drl drl_agent ddpg 0 "ddpg_0" 500 3
-```
-
-(the original training stage is specified in training logfile (e.g _train_**stage2**_*.txt)
+**Note:** You can also test (or continue training) a model on a different stage than where it was originally trained on.

### Loading one of the included example models

@@ -266,28 +261,26 @@ ros2 launch turtlebot3_gazebo turtlebot3_drl_stage9.launch.py

Terminal 2:
```
-ros2 run turtlebot3_drl drl_gazebo
+ros2 run turtlebot3_drl gazebo_goals
```

Terminal 3:
```
-ros2 run turtlebot3_drl drl_environment
+ros2 run turtlebot3_drl environment
```

Terminal 4:
For DDPG:
```
-ros2 run turtlebot3_drl drl_agent ddpg 0 'examples/ddpg_0' 8000
+ros2 run turtlebot3_drl test_agent ddpg 'examples/ddpg_0' 8000
```

Or, for TD3
```
-ros2 run turtlebot3_drl drl_agent td3 0 'examples/td3_0' 7400
+ros2 run turtlebot3_drl test_agent td3 'examples/td3_0' 7400
```

-The pretrained model should then start to navigate successfully.
-
-Note: Do not include 'examples/' in the command when running models trained on your own machine.
+You should then see the example model navigate successfully towards the goal

### Switching environments

@@ -356,15 +349,22 @@ The visual should mainly be used during evaluation as it can slow down training

## Command Specification

-**drl_agent:**
+**train_agent:**

+```ros2 run turtlebot3_drl train_agent [algorithm=dqn/ddpg/td3] [loadmodel=\path\to\model] [loadepisode=episode] ```
+
+* `algorithm`: algorithm to run, one of either: `dqn`, `ddpg`, `td3`
+* `modelpath`: path to the model to be loaded to continue training
+* `loadepisode`: is the episode to load from `modelpath`
+
+**test_agent:**
+
+```ros2 run turtlebot3_drl test_agent [algorithm=dqn/ddpg/td3] [loadmodel=\path\to\model] [loadepisode=episode] ```
-
-```ros2 run turtlebot3_drl drl_agent [algorithm=dqn/ddpg/td3] [mode=0/1] [loadmodel=\path\to\model] [loadepisode=episode] [trainingstage=stage]```
+* `algorithm`: algorithm to run, one of either: `dqn`, `ddpg`, `td3`
+* `modelpath`: path to model to be loaded for testing
+* `loadepisode`: is the episode to load from `modelpath`

-`algorithm` can be either: `dqn`, `ddpg`, `td3`
-`mode` is either: `0` (training) or `1` (evaluating)
-`modelpath` is the path to the model to load
-`loadepisode` is the episode to load from `modelpath`
-`trainingstage` is the original training stage of `modelpath` (if different from current stage)
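As a sketch, the positional layout shared by the two new commands can be captured in a tiny parser. This is purely illustrative: the helper name `parse_agent_args` is invented for this example and does not exist in the repository.

```python
# Hypothetical helper mirroring the train_agent / test_agent argument layout:
#   [algorithm] [loadmodel] [loadepisode]
# Only `algorithm` is required; the load arguments are optional.

def parse_agent_args(args):
    if not args or args[0] not in ('dqn', 'ddpg', 'td3'):
        raise ValueError("algorithm must be one of: dqn, ddpg, td3")
    return {
        'algorithm': args[0],
        'loadmodel': args[1] if len(args) > 1 else '',
        'loadepisode': int(args[2]) if len(args) > 2 else 0,
    }

print(parse_agent_args(['ddpg', 'ddpg_0', '500']))
# → {'algorithm': 'ddpg', 'loadmodel': 'ddpg_0', 'loadepisode': 500}
```

Omitting the load arguments corresponds to starting a fresh training session.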

## Physical Robot

@@ -0,0 +1,3 @@
+episode, outcome, step, episode_duration, distance, s/cw/co/t
+1, 1, 1974, 1.432716391998838, 5.1544623374938965, 1/0/0/0/0
+2, 1, 11785, 8.245651064997219, 1.9160521030426025, 2/0/0/0/0
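The rows above form a small CSV log. Here is a sketch of reading it back; note it is illustrative only — interpreting the trailing `s/cw/co/t` field as a running per-outcome tally is an assumption (and the header names four parts while the rows carry five values).

```python
import csv
import io

# Sample taken verbatim from the logfile added in this commit.
LOG = """episode, outcome, step, episode_duration, distance, s/cw/co/t
1, 1, 1974, 1.432716391998838, 5.1544623374938965, 1/0/0/0/0
2, 1, 11785, 8.245651064997219, 1.9160521030426025, 2/0/0/0/0
"""

def read_log(text):
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        rows.append({
            'episode': int(row['episode']),
            'outcome': int(row['outcome']),
            'step': int(row['step']),
            'duration': float(row['episode_duration']),
            'distance': float(row['distance']),
            # Assumed: slash-separated running tally of episode outcomes.
            'tally': [int(x) for x in row['s/cw/co/t'].split('/')],
        })
    return rows

print(read_log(LOG)[0]['episode'])  # → 1
```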
4 changes: 2 additions & 2 deletions src/turtlebot3_drl/turtlebot3_drl/common/settings.py
@@ -3,7 +3,7 @@
ENABLE_STACKING = False
ENABLE_VISUAL = False # Meant to be used only during evaluation/testing phase
ENABLE_TRUE_RANDOM_GOALS = False # If false, goals are taken randomly from a list of known valid goal positions
-MODEL_STORE_INTERVAL = 100 # Store the model weights every N episodes
+MODEL_STORE_INTERVAL = 3 # Store the model weights every N episodes

# DRL parameters
ACTION_SIZE = 2 # Not used for DQN, see DQN_ACTION_SIZE
@@ -15,7 +15,7 @@
LEARNING_RATE = 0.003
TAU = 0.003

-OBSERVE_STEPS = 25000 # At training start random actions are taken for N steps for better exploration
+OBSERVE_STEPS = 0 # At training start random actions are taken for N steps for better exploration
STEP_TIME = 0.01 # Delay between steps, can be set to 0
EPSILON_DECAY = 0.9995 # Epsilon decay per step
EPSILON_MINIMUM = 0.05
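The `EPSILON_DECAY` and `EPSILON_MINIMUM` settings above imply a standard per-step multiplicative decay clamped at a floor. A minimal sketch of that schedule follows; the surrounding training loop is assumed, not shown in this diff.

```python
EPSILON_DECAY = 0.9995   # per-step multiplicative decay (from settings.py)
EPSILON_MINIMUM = 0.05   # exploration floor (from settings.py)

def decay_epsilon(epsilon):
    # Shrink epsilon each step, never dropping below the minimum.
    return max(EPSILON_MINIMUM, epsilon * EPSILON_DECAY)

epsilon = 1.0
for _ in range(10000):
    epsilon = decay_epsilon(epsilon)
print(epsilon)  # → 0.05 (clamped at the floor after roughly 6000 steps)
```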
10 changes: 5 additions & 5 deletions src/turtlebot3_drl/turtlebot3_drl/common/storagemanager.py
@@ -6,7 +6,7 @@
import torch

class StorageManager:
-def __init__(self, name, stage, load_session, load_episode, device):
+def __init__(self, name, load_session, load_episode, device, stage):
if load_session and name not in load_session:
print(f"ERROR: wrong combination of command and model! make sure command is: {name}_agent")
while True:
@@ -15,18 +15,18 @@ def __init__(self, name, stage, load_session, load_episode, device):
if 'examples' in load_session:
self.machine_dir = (os.getenv('DRLNAV_BASE_PATH') + '/src/turtlebot3_drl/model/')
self.name = name
-self.stage = stage
+self.stage = load_session[-1] if load_session else stage
self.session = load_session
self.load_episode = load_episode
self.session_dir = os.path.join(self.machine_dir, self.session)
self.map_location = device

-def new_session_dir(self):
+def new_session_dir(self, stage):
i = 0
-session_dir = os.path.join(self.machine_dir, f"{self.name}_{i}")
+session_dir = os.path.join(self.machine_dir, f"{self.name}_{i}_stage{stage}")
while(os.path.exists(session_dir)):
i += 1
-session_dir = os.path.join(self.machine_dir, f"{self.name}_{i}")
+session_dir = os.path.join(self.machine_dir, f"{self.name}_{i}_stage{stage}")
self.session = f"{self.name}_{i}"
print(f"making new model dir: {self.session}")
os.makedirs(session_dir)
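The probing loop in `new_session_dir` above is easy to exercise in isolation. This sketch mirrors the new `{name}_{i}_stage{stage}` pattern with the filesystem check replaced by a set lookup; it is illustrative, not repository code.

```python
def new_session_name(name, stage, existing):
    # Find the first free index i such that "{name}_{i}_stage{stage}" is
    # unused, mirroring the directory-probing loop in new_session_dir.
    i = 0
    while f"{name}_{i}_stage{stage}" in existing:
        i += 1
    return f"{name}_{i}_stage{stage}"

print(new_session_name("ddpg", 4, {"ddpg_0_stage4", "ddpg_1_stage4"}))
# → ddpg_2_stage4
```

Note that the companion change `self.stage = load_session[-1] if load_session else stage` recovers the stage from the final character of the session name, which only works for single-digit stage numbers.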
2 changes: 1 addition & 1 deletion src/turtlebot3_drl/turtlebot3_drl/common/utilities.py
@@ -12,7 +12,7 @@
import xml.etree.ElementTree as ET

with open('/tmp/drlnav_current_stage.txt', 'r') as f:
-test_stage = int(f.read())
+stage = int(f.read())

def check_gpu():
print("gpu torch available: ", torch.cuda.is_available())
24 changes: 13 additions & 11 deletions src/turtlebot3_drl/turtlebot3_drl/drl_agent/drl_agent.py
@@ -43,19 +43,17 @@
from ..common.replaybuffer import ReplayBuffer

class DrlAgent(Node):
-def __init__(self, training, algorithm, load_session="", load_episode=0, train_stage=util.test_stage):
+def __init__(self, training, algorithm, load_session="", load_episode=0):
super().__init__(algorithm + '_agent')
self.algorithm = algorithm
self.training = int(training)
self.load_session = load_session
self.episode = int(load_episode)
-self.train_stage = train_stage
if (not self.training and not self.load_session):
-quit("ERROR no test agent specified")
+quit("Invalid command: Testing but no model to load specified (example format: ros2 run turtlebot3_drl test_agent ddpg ddpg_0_stage4 1)")
self.device = util.check_gpu()
-self.sim_speed = util.get_simulation_speed(self.train_stage)
-print(f"{'training' if (self.training) else 'testing' } on stage: {util.test_stage}")
+self.sim_speed = util.get_simulation_speed(util.stage)
+print(f"{'training' if (self.training) else 'testing' } on stage: {util.stage}")
self.total_steps = 0
self.observe_steps = OBSERVE_STEPS

@@ -66,7 +64,7 @@ def __init__(self, training, algorithm, load_session="", load_episode=0, train_s
elif self.algorithm == 'td3':
self.model = TD3(self.device, self.sim_speed)
else:
-quit(f"invalid algorithm specified: {self.algorithm}, chose one of: ddpg, td3, td3conv")
+quit(f"invalid algorithm specified: {self.algorithm}, choose one of: dqn, ddpg, td3")

self.replay_buffer = ReplayBuffer(self.model.buffer_size)
self.graph = Graph()
@@ -75,24 +73,24 @@
# Model loading #
# ===================================================================== #

-self.sm = StorageManager(self.algorithm, self.train_stage, self.load_session, self.episode, self.device)
+self.sm = StorageManager(self.algorithm, self.load_session, self.episode, self.device, util.stage)

if self.load_session:
del self.model
self.model = self.sm.load_model()
self.model.device = self.device
self.sm.load_weights(self.model.networks)
if self.training:
-self.replay_buffer.buffer = self.sm.load_replay_buffer(self.model.buffer_size, os.path.join(self.load_session, 'stage'+str(self.train_stage)+'_latest_buffer.pkl'))
+self.replay_buffer.buffer = self.sm.load_replay_buffer(self.model.buffer_size, os.path.join(self.load_session, 'stage'+str(self.sm.stage)+'_latest_buffer.pkl'))
self.total_steps = self.graph.set_graphdata(self.sm.load_graphdata(), self.episode)
print(f"global steps: {self.total_steps}")
print(f"loaded model {self.load_session} (eps {self.episode}): {self.model.get_model_parameters()}")
else:
-self.sm.new_session_dir(util.test_stage)
+self.sm.new_session_dir(util.stage)
self.sm.store_model(self.model)

self.graph.session_dir = self.sm.session_dir
-self.logger = Logger(self.training, self.sm.machine_dir, self.sm.session_dir, self.sm.session, self.model.get_model_parameters(), self.model.get_model_configuration(), str(util.test_stage), self.algorithm, self.episode)
+self.logger = Logger(self.training, self.sm.machine_dir, self.sm.session_dir, self.sm.session, self.model.get_model_parameters(), self.model.get_model_configuration(), str(util.stage), self.algorithm, self.episode)
if ENABLE_VISUAL:
self.visual = DrlVisual(self.model.state_size, self.model.hidden_size)
self.model.attach_visual(self.visual)
@@ -202,5 +200,9 @@ def main_test(args=sys.argv[1:]):
args = ['0'] + args
main(args)

+def main_real(args=sys.argv[1:]):
+args = ['0'] + args
+main(args)

if __name__ == '__main__':
main()
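The `main_test` / `main_real` entry points in the diff above share one pattern: prepend the mode flag, then delegate to a shared `main`. A minimal standalone sketch of that dispatch follows; the real `main` sets up the whole DRL agent node, the dict return here is purely illustrative, and `main_train` is an assumed counterpart prepending `'1'`.

```python
import sys

def main(args):
    # First element selects the mode: '1' = training, '0' = testing.
    training = int(args[0])
    algorithm = args[1] if len(args) > 1 else None
    return {'training': bool(training), 'algorithm': algorithm}

def main_train(args=None):
    return main(['1'] + (args if args is not None else sys.argv[1:]))

def main_test(args=None):
    return main(['0'] + (args if args is not None else sys.argv[1:]))

print(main_test(['ddpg']))  # → {'training': False, 'algorithm': 'ddpg'}
```

Keeping one `main` and several thin wrappers is what lets the commit expose `train_agent` and `test_agent` as separate console commands without duplicating the agent setup.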
