Overview
- Rotary Pendulum with PPO and Domain Randomization is a personal project.
Goal
- To achieve robust control of a rotary (Furuta) pendulum using PPO and Domain Randomization (DR) in simulation, in preparation for a real-world implementation where the physical parameters may differ from the simulated model.
Description
The source code is written in Python; the XML model of the rotary pendulum comes from macstepien’s furuta_pendulum repository.
The project uses the PPO implementation from the Stable-Baselines3 library as its deep reinforcement learning (DRL) framework.
Simulations are run in MuJoCo.
Domain Randomization (DR) treats the real-world environment as one of many possible random variations of the simulation, so the policy is trained under diverse physical conditions rather than a single nominal model.
Three physical properties are randomized: the mass of the pendulum, the length of the pendulum, and the mass of the rod (arm1) connecting the pendulum to the central cylinder.
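A minimal sketch of how the three properties could be resampled, assuming each one is scaled by an independent uniform factor around a nominal value. The names `PendulumParams` and `sample_params`, the nominal values, and the 20% spread are all hypothetical and are not taken from the project; the step that writes the sampled values into the MuJoCo model is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class PendulumParams:
    pendulum_mass: float
    pendulum_length: float
    arm1_mass: float


# Hypothetical nominal values (placeholders, not read from the project's XML model).
NOMINAL = PendulumParams(pendulum_mass=0.024, pendulum_length=0.129, arm1_mass=0.095)


def sample_params(rng, nominal=NOMINAL, spread=0.2):
    """Scale each nominal value by an independent factor in [1 - spread, 1 + spread]."""

    def scale(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)

    return PendulumParams(
        pendulum_mass=scale(nominal.pendulum_mass),
        pendulum_length=scale(nominal.pendulum_length),
        arm1_mass=scale(nominal.arm1_mass),
    )


rng = random.Random(0)      # seeded for reproducibility
params = sample_params(rng)  # in training, this would typically run at each episode reset
```

Resampling at every reset means each episode sees a slightly different pendulum, which is what pushes the policy toward behavior that does not depend on exact parameter values.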
The state space includes the angles and angular velocities of arm1 and the pendulum.
The action space consists of actuator torques ranging from -0.4 to 0.4.
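As an illustration of the state and action definitions above, the sketch below builds a four-element observation and clips a commanded torque to the stated range. The function names and the ordering of the observation entries are assumptions; only the quantities themselves and the limit of 0.4 come from the description.

```python
TORQUE_LIMIT = 0.4  # bound stated in the description; units follow the simulation model


def make_observation(arm1_angle, pendulum_angle, arm1_velocity, pendulum_velocity):
    """Pack angles and angular velocities into one state vector (ordering is an assumption)."""
    return [arm1_angle, pendulum_angle, arm1_velocity, pendulum_velocity]


def clip_torque(torque, limit=TORQUE_LIMIT):
    """Clamp a commanded torque to the interval [-limit, limit]."""
    return max(-limit, min(limit, torque))


obs = make_observation(0.1, 3.0, 0.0, -0.5)
safe_torque = clip_torque(1.0)  # a command beyond the limit is saturated at the bound
```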
The reward function is designed to encourage the pendulum to maintain the upright position.
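The exact reward used in the project is not given here. One common shape for an upright-keeping reward, shown only as a hypothetical example, is the cosine of the pendulum angle minus a small velocity penalty. It assumes the angle is measured from the upright position (0 rad = upright); `upright_reward` and `velocity_weight` are illustrative names and values.

```python
import math


def upright_reward(pendulum_angle, pendulum_velocity, velocity_weight=0.01):
    """Hypothetical reward: highest when upright and at rest.

    Assumes pendulum_angle = 0 is upright and pi is hanging down.
    """
    return math.cos(pendulum_angle) - velocity_weight * pendulum_velocity ** 2
```

With this shape the reward falls smoothly as the pendulum tilts away from upright, and the velocity term discourages fast swinging through the top.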
References
- rotary_pendulum_ppo [GitHub link]