Action space, State space, Reward function Locations #118

Open
naa1824 opened this issue Feb 18, 2025 · 4 comments

Comments


naa1824 commented Feb 18, 2025

Hello @AndrejOrsula

Sorry to bother you again.

I have been studying the project for the last two weeks, trying to locate the action space, state space, and reward function so that I can understand how they work and create my own.

But I am getting lost in the files.

Could you guide me through this?
Which files contain these spaces?

I found the class that creates the space, but I didn't understand it.

Is this the action space? env->task->grasp->grasp.py

    def set_action(self, action: Action):

        if self.__preload_replay_buffer:
            action = self._demonstrate_action()

        self.get_logger().debug(f"action: {action}")

        # Gripper action
        gripper_action = action[0]
        if gripper_action < -self.__gripper_dead_zone:
            self.gripper.close()
        elif gripper_action > self.__gripper_dead_zone:
            self.gripper.open()
        else:
            # No-op for the gripper if action is within the dead zone
            pass

        # Motion
        if self._use_servo:
            linear = action[1:4]
            if self._restrict_position_goal_to_workspace:
                linear = self.restrict_servo_translation_to_workspace(linear)
            if self.__full_3d_orientation:
                angular = action[4:7]
            else:
                angular = [0.0, 0.0, action[4]]
            self.servo(linear=linear, angular=angular)
        else:
            position = self.get_relative_ee_position(action[1:4])
            if self.__full_3d_orientation:
                quat_xyzw = self.get_relative_ee_orientation(
                    rotation=action[4:10], representation="6d"
                )
            else:
                quat_xyzw = self.get_relative_ee_orientation(
                    rotation=action[4], representation="z"
                )
            self.moveit2.move_to_pose(position=position, quat_xyzw=quat_xyzw)

It looks too small!!

@AndrejOrsula
Owner

Hello @naa1824,

Taking GraspOctree as an example:

The observation space is defined here:

    def create_observation_space(self) -> ObservationSpace:
        # 0:n - octree
        # Note: octree is currently padded with zeros to have constant size
        # TODO: Customize replay buffer to support variable sized observations
        # If enabled, proprioceptive observations will be embedded inside octree in a hacky way
        # (replace with Dict once https://github.com/DLR-RM/stable-baselines3/pull/243 is merged)
        # 0    - (gripper) Gripper state
        #        - 1.0: opened
        #        - -1.0: closed
        # 1:4  - (x, y, z) displacement
        #        - metric units, unbound
        # 4:10 - (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z) 3D orientation in "6D representation"
        #        - normalised
        return gym.spaces.Box(
            low=0,
            high=255,
            shape=(self._octree_n_stacked, self._octree_max_size),
            dtype=np.uint8,
        )

The action space is defined here:

    def create_action_space(self) -> ActionSpace:
        if self.__full_3d_orientation:
            if self._use_servo:
                # 0   - (gripper) Gripper action
                #       - Open if positive (i.e. increase width)
                #       - Close if negative (i.e. decrease width)
                # 1:4 - (x, y, z) displacement
                #       - rescaled to metric units before use
                # 4:7 - (angular_x, angular_y, angular_z) relative rotation for moveit_servo
                return gym.spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)
            else:
                # 0    - (gripper) Gripper action
                #        - Open if positive (i.e. increase width)
                #        - Close if negative (i.e. decrease width)
                # 1:4  - (x, y, z) displacement
                #        - rescaled to metric units before use
                # 4:10 - (v1_x, v1_y, v1_z, v2_x, v2_y, v2_z) relative 3D orientation in "6D representation"
                return gym.spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32)
        else:
            # 0   - (gripper) Gripper action
            #       - Open if positive (i.e. increase width)
            #       - Close if negative (i.e. decrease width)
            # 1:4 - (x, y, z) displacement
            #       - rescaled to metric units before use
            # 4   - (yaw) relative orientation around Z
            return gym.spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
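For illustration only (not a snippet from the repository), a 5-dimensional action from the last case can be read according to the comments above:

    import gym  # gymnasium behaves equivalently for this example
    import numpy as np

    # Sample a random action: gripper command + (x, y, z) displacement + yaw
    action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
    action = action_space.sample()

    gripper_action = action[0]  # open if positive, close if negative (dead zone around 0)
    displacement = action[1:4]  # (x, y, z), rescaled to metric units before use
    yaw = action[4]             # relative orientation around Z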

The reward function comes from the Curriculum:

    def get_reward(self) -> Reward:
        return self.curriculum.get_reward()
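If you want to replace the curriculum with your own reward, a minimal sparse-reward sketch could look like this (illustrative only; `object_is_grasped` is a hypothetical helper, not part of the repository):

    def get_reward(self) -> Reward:
        # Illustrative sketch of a sparse reward: 1.0 on grasp success, 0.0 otherwise.
        # `object_is_grasped()` is a hypothetical helper, not part of the repository.
        return 1.0 if self.object_is_grasped() else 0.0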


The function you posted above is what actually applies the actions to the robot.


naa1824 commented Feb 18, 2025

I see.

Are you using the default SB3 policies like MlpPolicy, or did you create your own?

I could not find the policy code.

@AndrejOrsula
Owner

All custom policies for the octree observations are located here.
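If you prefer to keep a standard SB3 policy and only swap in your own feature extractor, here is a minimal sketch using the public SB3 API (not the repository's actual octree extractor; names such as `MyFeaturesExtractor` are placeholders):

    import numpy as np
    import torch as th
    from torch import nn
    from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

    class MyFeaturesExtractor(BaseFeaturesExtractor):
        """Placeholder extractor that flattens the observation and maps it to features_dim."""

        def __init__(self, observation_space, features_dim: int = 128):
            super().__init__(observation_space, features_dim)
            n_input = int(np.prod(observation_space.shape))
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(n_input, features_dim),
                nn.ReLU(),
            )

        def forward(self, observations: th.Tensor) -> th.Tensor:
            return self.net(observations.float())

    # Any standard policy (e.g. "MlpPolicy" of TD3/SAC) can then use it via policy_kwargs:
    # model = TD3("MlpPolicy", env, policy_kwargs=dict(
    #     features_extractor_class=MyFeaturesExtractor,
    #     features_extractor_kwargs=dict(features_dim=128),
    # ))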


AndrejOrsula commented Feb 18, 2025

For reference, the Directory Structure is also described in the README.

Directory Structure

.
├── drl_grasping/        # [dir] Primary Python module of this project
│   ├── drl_octree/      # [dir] Submodule for end-to-end learning from 3D octree observations
│   ├── envs/            # [dir] Submodule for environments
│   │   ├── control/     # [dir] Interfaces for the control of agents
│   │   ├── models/      # [dir] Functional models for simulation environments
│   │   ├── perception/  # [dir] Interfaces for the perception of agents
│   │   ├── randomizers/ # [dir] Domain randomization of the simulated environments
│   │   ├── runtimes/    # [dir] Runtime implementations of the task (sim/real)
│   │   ├── tasks/       # [dir] Implementation of tasks
│   │   ├── utils/       # [dir] Environment-specific utilities used across the submodule
│   │   └── worlds/      # [dir] Minimal templates of worlds for simulation environments
│   └── utils/           # [dir] Submodule for training and evaluation scripts boilerplate (using SB3)
├── examples/            # [dir] Examples for training and evaluating RL agents
├── hyperparams/         # [dir] Default hyperparameters for training RL agents
├── launch/              # [dir] ROS 2 launch scripts that can be used to interact with this repository
├── pretrained_agents/   # [dir] Collection of pre-trained agents
├── rviz/                # [dir] RViz2 config for visualization
├── scripts/             # [dir] Helpful scripts for training, evaluation and other utilities
├── CMakeLists.txt       # Colcon-enabled CMake recipe
└── package.xml          # ROS 2 package metadata
