Architecture:
- 3 convolutional layers to extract hierarchical features from input data.
- 3 fully connected layers for the final classification.
- Batch Normalization layers to accelerate the training process.
- ReLU and sigmoid activation functions to introduce non-linearity between the layers.
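A minimal PyTorch sketch of a network with this shape is given below. The notes above do not state the channel counts, kernel sizes, input resolution, number of classes, or where the sigmoid sits, so all of those are assumptions made for illustration. The class name ConvClassifier and the adaptive pooling layer (added so the sketch accepts any reasonable image size) are also not from the original code.

```python
import torch
import torch.nn as nn


class ConvClassifier(nn.Module):
    """Hypothetical 3-conv + 3-FC classifier; every size here is an assumption."""

    def __init__(self, in_channels: int = 1, num_classes: int = 5):
        super().__init__()
        # Three convolutional layers, each followed by batch norm and ReLU.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # Assumption: pool to a fixed 4x4 grid so the FC input size is known.
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Three fully connected layers; the sigmoid placement is a guess.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.Sigmoid(),
            nn.Linear(64, num_classes),  # raw class scores (logits)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = ConvClassifier()
model.eval()  # use running batch-norm statistics for inference
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 96, 96))  # assumed 96x96 grayscale frames
```

The last layer returns unnormalized scores, which suits a loss such as nn.CrossEntropyLoss. If the real model ends in a sigmoid instead, the loss would need to change accordingly.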
After training the regression models, I used DAgger (Dataset Aggregation) to improve the reward.
Each DAgger iteration runs the following steps:
- Policy Execution: The learned policy (infer_action) is executed in the environment, generating an episode of behavior.
- Expert Correction: An expert provides a corrective action for each state encountered during policy execution. All the states in the episode are saved. Expert data is collected using a timer.
- Data Aggregation: The new observations and expert actions are added to the training data.
- Policy Refinement: The policy is re-trained using the aggregated dataset.
- Model Saving: The updated model is saved after each iteration.
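The five steps above can be sketched as a single loop. This is not the code from dagger.py: the function dagger, its callback parameters, and the toy one-dimensional environment are all hypothetical stand-ins, chosen so the example runs without a simulator or a human expert. Only the name infer_action comes from the notes above.

```python
from typing import Callable, List, Tuple


def dagger(
    reset: Callable[[], int],
    step: Callable[[int, int], Tuple[int, bool]],
    infer_action: Callable[[int], int],
    expert_action: Callable[[int], int],
    train: Callable[[List[Tuple[int, int]]], None],
    save: Callable[[int], None],
    n_iters: int = 3,
    max_steps: int = 5,
) -> List[Tuple[int, int]]:
    """Run n_iters DAgger iterations and return the aggregated dataset."""
    dataset: List[Tuple[int, int]] = []
    for iteration in range(n_iters):
        # Policy execution: roll out the learned policy, recording every state.
        state, visited = reset(), []
        for _ in range(max_steps):
            visited.append(state)
            state, done = step(state, infer_action(state))
            if done:
                break
        # Expert correction: label each visited state with the expert's action.
        labels = [expert_action(s) for s in visited]
        # Data aggregation: append to, never replace, the earlier data.
        dataset.extend(zip(visited, labels))
        # Policy refinement: re-train on the full aggregated dataset.
        train(dataset)
        # Model saving: checkpoint after each iteration.
        save(iteration)
    return dataset


# Toy setup (assumption): states are integers and the goal is to reach 0.
table = {}   # lookup-table "policy": state -> action
saved = []   # stands in for checkpoints written to disk


def toy_infer(state: int) -> int:
    return table.get(state, 1)  # untrained default walks away from the goal


def toy_expert(state: int) -> int:
    return -1 if state > 0 else 1


def toy_train(data: List[Tuple[int, int]]) -> None:
    table.update(dict(data))


def toy_step(state: int, action: int) -> Tuple[int, bool]:
    nxt = state + action
    return nxt, nxt == 0


data = dagger(lambda: 3, toy_step, toy_infer, toy_expert, toy_train, saved.append)
```

In this toy run the policy only learns the correct action for states 2 and 1 because it reaches them under its own control in later iterations. That is the reason for labeling the learner's states rather than the expert's.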
The DAgger implementation can be found in dagger.py.
Packages required:
- pytorch
- opencv
- box2d
- numpy