Action space, State space, Reward function Locations #118

Open
naa1824 opened this issue Feb 18, 2025 · 4 comments

Comments


naa1824 commented Feb 18, 2025

Hello @AndrejOrsula

Sorry to bother you again.

I have been studying the project for the last two weeks, trying to locate the action space, state space, and reward function so that I can understand how they work and create my own.

But I am getting lost in the files.

Could you guide me through this?
Which files contain these spaces?

I found the class that creates the space, but I didn't understand it.

Is this the action space? env->task->grasp->grasp.py

    def set_action(self, action: Action):

        if self.__preload_replay_buffer:
            action = self._demonstrate_action()

        self.get_logger().debug(f"action: {action}")

        # Gripper action
        gripper_action = action[0]
        if gripper_action < -self.__gripper_dead_zone:
            self.gripper.close()
        elif gripper_action > self.__gripper_dead_zone:
            self.gripper.open()
        else:
            # No-op for the gripper if action is within the dead zone
            pass

        # Motion
        if self._use_servo:
            linear = action[1:4]
            if self._restrict_position_goal_to_workspace:
                linear = self.restrict_servo_translation_to_workspace(linear)
            if self.__full_3d_orientation:
                angular = action[4:7]
            else:
                angular = [0.0, 0.0, action[4]]
            self.servo(linear=linear, angular=angular)
        else:
            position = self.get_relative_ee_position(action[1:4])
            if self.__full_3d_orientation:
                quat_xyzw = self.get_relative_ee_orientation(
                    rotation=action[4:10], representation="6d"
                )
            else:
                quat_xyzw = self.get_relative_ee_orientation(
                    rotation=action[4], representation="z"
                )
            self.moveit2.move_to_pose(position=position, quat_xyzw=quat_xyzw)

It looks too small!!

@AndrejOrsula
Owner

Hello @naa1824,

Taking GraspOctree as an example:

The observation space is defined here:

    def create_observation_space(self) -> ObservationSpace:
        # 0:n - octree
        # Note: octree is currently padded with zeros to have constant size
        # TODO: Customize replay buffer to support variable sized observations
        # If enabled, proprioceptive observations will be embedded inside octree in a hacky way
        # (replace with Dict once https://github.com/DLR-RM/stable-baselines3/pull/243 is merged)
        # 0    - (gripper) Gripper state
        #        - 1.0: opened
        #        - -1.0: closed
        # 1:4  - (x, y, z) displacement
        #        - metric units, unbound
        # 4:10 - (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z) 3D orientation in "6D representation"
        #        - normalised
        return gym.spaces.Box(
            low=0,
            high=255,
            shape=(self._octree_n_stacked, self._octree_max_size),
            dtype=np.uint8,
        )

The action space is defined here:

    def create_action_space(self) -> ActionSpace:
        if self.__full_3d_orientation:
            if self._use_servo:
                # 0   - (gripper) Gripper action
                #       - Open if positive (i.e. increase width)
                #       - Close if negative (i.e. decrease width)
                # 1:4 - (x, y, z) displacement
                #       - rescaled to metric units before use
                # 4:7 - (angular_x, angular_y, angular_z) relative rotation for moveit_servo
                return gym.spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)
            else:
                # 0    - (gripper) Gripper action
                #        - Open if positive (i.e. increase width)
                #        - Close if negative (i.e. decrease width)
                # 1:4  - (x, y, z) displacement
                #        - rescaled to metric units before use
                # 4:10 - (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z) relative 3D orientation in "6D representation"
                return gym.spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32)
        else:
            # 0   - (gripper) Gripper action
            #       - Open if positive (i.e. increase width)
            #       - Close if negative (i.e. decrease width)
            # 1:4 - (x, y, z) displacement
            #       - rescaled to metric units before use
            # 4   - (yaw) relative orientation around Z
            return gym.spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
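For illustration only (not a snippet from the repository), a 5-dimensional action from the last case can be read according to the comments above:

    import gym  # gymnasium behaves equivalently for this example
    import numpy as np

    # Sample a random action: gripper command + (x, y, z) displacement + yaw
    action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
    action = action_space.sample()

    gripper_action = action[0]  # open if positive, close if negative (dead zone around 0)
    displacement = action[1:4]  # (x, y, z), rescaled to metric units before use
    yaw = action[4]             # relative orientation around Z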

The reward function comes from the Curriculum:

    def get_reward(self) -> Reward:
        return self.curriculum.get_reward()
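If you want to replace the curriculum with your own reward, a minimal sparse-reward sketch could look like this (illustrative only; `object_is_grasped` is a hypothetical helper, not part of the repository):

    def get_reward(self) -> Reward:
        # Illustrative sketch of a sparse reward: 1.0 on grasp success, 0.0 otherwise.
        # `object_is_grasped()` is a hypothetical helper, not part of the repository.
        return 1.0 if self.object_is_grasped() else 0.0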


The function you posted above is what actually applies the actions to the robot.


naa1824 commented Feb 18, 2025

I see.

Are you using the default SB3 policies like MlpPolicy, or did you create your own?

I could not find the policy code.

@AndrejOrsula
Owner

All custom policies for the octree observations are located here.
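If you prefer to keep a standard SB3 policy and only swap in your own feature extractor, here is a minimal sketch using the public SB3 API (not the repository's actual octree extractor; names such as `MyFeaturesExtractor` are placeholders):

    import numpy as np
    import torch as th
    from torch import nn
    from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

    class MyFeaturesExtractor(BaseFeaturesExtractor):
        """Placeholder extractor that flattens the observation and maps it to features_dim."""

        def __init__(self, observation_space, features_dim: int = 128):
            super().__init__(observation_space, features_dim)
            n_input = int(np.prod(observation_space.shape))
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(n_input, features_dim),
                nn.ReLU(),
            )

        def forward(self, observations: th.Tensor) -> th.Tensor:
            return self.net(observations.float())

    # Any standard policy (e.g. "MlpPolicy" of TD3/SAC) can then use it via policy_kwargs:
    # model = TD3("MlpPolicy", env, policy_kwargs=dict(
    #     features_extractor_class=MyFeaturesExtractor,
    #     features_extractor_kwargs=dict(features_dim=128),
    # ))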


AndrejOrsula commented Feb 18, 2025

For reference, the Directory Structure is also described in the README.

Directory Structure

.
├── drl_grasping/        # [dir] Primary Python module of this project
│   ├── drl_octree/      # [dir] Submodule for end-to-end learning from 3D octree observations
│   ├── envs/            # [dir] Submodule for environments
│   │   ├── control/     # [dir] Interfaces for the control of agents
│   │   ├── models/      # [dir] Functional models for simulation environments
│   │   ├── perception/  # [dir] Interfaces for the perception of agents
│   │   ├── randomizers/ # [dir] Domain randomization of the simulated environments
│   │   ├── runtimes/    # [dir] Runtime implementations of the task (sim/real)
│   │   ├── tasks/       # [dir] Implementation of tasks
│   │   ├── utils/       # [dir] Environment-specific utilities used across the submodule
│   │   └── worlds/      # [dir] Minimal templates of worlds for simulation environments
│   └── utils/           # [dir] Submodule for training and evaluation scripts boilerplate (using SB3)
├── examples/            # [dir] Examples for training and evaluating RL agents
├── hyperparams/         # [dir] Default hyperparameters for training RL agents
├── launch/              # [dir] ROS 2 launch scripts that can be used to interact with this repository
├── pretrained_agents/   # [dir] Collection of pre-trained agents
├── rviz/                # [dir] RViz2 config for visualization
├── scripts/             # [dir] Helpful scripts for training, evaluation and other utilities
├── CMakeLists.txt       # Colcon-enabled CMake recipe
└── package.xml          # ROS 2 package metadata
