
Rotary Pendulum with PPO and Domain Randomization

Overview


  • Rotary Pendulum with PPO and Domain Randomization is a personal project.

Goal


  • To achieve robust control of a rotary (Furuta) pendulum using PPO and Domain Randomization (DR) in simulation, in preparation for a real-world implementation where the hardware's physical parameters may differ from the simulated model.

Description


  • The source code is written in Python, and the XML model of the rotary pendulum comes from macstepien’s furuta_pendulum repository.

  • The project uses the PPO implementation from the Stable-Baselines3 library as its deep reinforcement learning (DRL) framework.

  • Simulations are run in MuJoCo.

  • Domain Randomization (DR) treats the real-world environment as one of many possible random variations of the simulation, so the agent learns a policy under diverse physical conditions rather than a single fixed model.

  • Three physical properties are randomized: the mass of the pendulum, the length of the pendulum, and the mass of the rod (arm1) connecting the pendulum to the central cylinder.

  • The state space includes the angles and angular velocities of arm1 and the pendulum.

  • The action space consists of actuator torques ranging from -0.4 to 0.4.

  • The reward function is designed to encourage the pendulum to maintain the upright position.
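
The randomization, action bounds, and reward described above can be sketched in plain Python. This is only an illustrative sketch, not the project's code: the nominal values, the ±20% uniform spread, the names `PendulumParams`, `sample_params`, `clip_action`, and `upright_reward`, and the cosine-shaped reward are all assumptions made for the example. Only the three randomized properties and the torque limit of 0.4 come from the description.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumParams:
    """The three physical properties that are randomized."""
    pendulum_mass: float
    pendulum_length: float
    arm1_mass: float


# Placeholder nominal values for illustration only (not taken from the project).
NOMINAL = PendulumParams(pendulum_mass=0.02, pendulum_length=0.1, arm1_mass=0.05)

# From the description: actuator torques range from -0.4 to 0.4.
MAX_TORQUE = 0.4


def sample_params(rng, nominal=NOMINAL, spread=0.2):
    """Draw each property uniformly within +/- `spread` of its nominal value.

    The uniform distribution and the 20% spread are assumptions.
    """
    def jitter(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)

    return PendulumParams(
        pendulum_mass=jitter(nominal.pendulum_mass),
        pendulum_length=jitter(nominal.pendulum_length),
        arm1_mass=jitter(nominal.arm1_mass),
    )


def clip_action(torque):
    """Keep a commanded torque inside the action space [-0.4, 0.4]."""
    return max(-MAX_TORQUE, min(MAX_TORQUE, torque))


def upright_reward(pendulum_angle):
    """Hypothetical reward: largest when the pendulum is upright.

    Assumes the angle is in radians and measured from the upright position,
    so the reward is 1 when upright and -1 when hanging straight down.
    """
    return math.cos(pendulum_angle)
```

In a training loop, `sample_params(random.Random(seed))` would be called at the start of each episode and the sampled values written into the simulation model before the reset, so that every episode sees a slightly different pendulum.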

[Figures: rotary_1, rotary_2, rotary_3]

This post is licensed under CC BY 4.0 by the author.