Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
OpenAI Gym CartPole-v1
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson.
Observation type: `Box(4)`. Action type: `Discrete(2)`:
- 0: Push cart to the left
- 1: Push cart to the right
Note: The amount the velocity is reduced or increased is not fixed; it depends on the angle the pole is pointing. This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it.
Reward: 1 for every step taken, including the termination step.
Starting state: all observations are assigned a uniform random value in [-0.05, 0.05].
Episode termination:
- Pole angle is more than 12 degrees
- Cart position is more than 2.4 (center of the cart reaches the edge of the display)
- Episode length is greater than 200
Solved requirements: considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials.
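These limits are registered with Gym's environment spec, so they can be read programmatically. A quick check (assuming `gym` is installed; `CartPole-v0` is used here because its registered values match the numbers above, while `CartPole-v1` uses 500 steps and 475.0):

```python
import gym

# Inspect the registered spec for CartPole-v0.
spec = gym.spec('CartPole-v0')
print(spec.max_episode_steps)   # 200
print(spec.reward_threshold)    # 195.0
```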
[Openai Gym CartPole-v1 Github](https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py)
Here's a minimal example of getting something running. This will run an instance of the `CartPole-v0` environment for 1000 timesteps, rendering the environment at each step. You should see a window pop up rendering the classic cart-pole problem:
```python
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()
```
The environment's `step` function returns exactly what we need. In fact, `step` returns four values. These are:
- `observation` (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.
- `reward` (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
- `done` (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and `done` being `True` indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
- `info` (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment's last state change). However, official evaluations of your agent are not allowed to use this for learning.
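A single call makes these four values concrete. A minimal check, assuming the classic Gym API where `step` returns a 4-tuple (newer `gym`/`gymnasium` releases return five values, so the indexing below is kept version-tolerant):

```python
import gym

env = gym.make('CartPole-v0')
env.reset()
result = env.step(env.action_space.sample())  # take one random action
# classic API: (observation, reward, done, info); newer API inserts `truncated`
observation, reward, done, info = result[0], result[1], result[2], result[-1]
print(observation)                       # a length-4 array of floats
print(reward, done, isinstance(info, dict))
env.close()
```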
The following is an implementation of the classic "agent-environment loop". Each timestep, the agent chooses an action, and the environment returns an observation and a reward.
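The loop just described can be sketched as follows. This is a minimal sketch of the classic Gym pattern; rendering is omitted so it runs headless, and the tuple indexing tolerates both the 4-value and newer 5-value `step` return:

```python
import gym

env = gym.make('CartPole-v0')
for i_episode in range(5):
    env.reset()
    for t in range(200):
        action = env.action_space.sample()  # random policy for now
        result = env.step(action)
        done = result[2]                    # `done`/`terminated` flag
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()
```

With a random policy the pole tips over quickly, so each episode usually ends after a couple dozen steps; a learning agent replaces `env.action_space.sample()` with its own action choice.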
In the examples above, we've been sampling random actions from the environment's action space. But what actually are those actions? Every environment comes with an `action_space` and an `observation_space`. These attributes are of type `Space`, and they describe the format of valid actions and observations:
For `CartPole-v0`, `observation_space.high` is [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38] and `observation_space.low` is [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38].
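These spaces and bounds can be inspected directly (a short sketch, assuming `gym` is installed):

```python
import gym

env = gym.make('CartPole-v0')
print(env.action_space)             # Discrete(2)
print(env.observation_space.shape)  # (4,)
print(env.observation_space.high)   # the upper bounds listed above
print(env.observation_space.low)    # the lower bounds listed above
env.close()
```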