Play with CartPole
Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.
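Because every environment exposes the same reset/step interface, the same code works no matter which environment you create. A quick illustration (MountainCar-v0 is just another built-in environment, used here only to show that the calls are identical):

```python
import gym

# The same calls work for any registered environment id.
for env_id in ['CartPole-v1', 'MountainCar-v0']:
    env = gym.make(env_id)
    observation = env.reset()
    observation, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space, env.action_space)
    env.close()
```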
CartPole-v1
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
Source:
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson.
Observation:
Type: Box(4)
Actions:
Type: Discrete(2)
Reward:
Reward is 1 for every step taken, including the termination step.
Starting State:
All observations are assigned a uniform random value in [-0.05, 0.05].
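That starting-state rule is easy to check directly; the snippet below (a quick sanity check, not part of the environment spec itself) resets the environment and verifies that the initial observation stays inside that range:

```python
import gym

env = gym.make('CartPole-v1')
# reset() should return four values, each drawn uniformly from [-0.05, 0.05]
observation = env.reset()
print(observation)
assert all(abs(x) <= 0.05 for x in observation)
env.close()
```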
Observations
The environment's `step` function returns exactly what we need. In fact, `step` returns four values. These are:
- `observation` (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.
- `reward` (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
- `done` (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
- `info` (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment's last state change). However, official evaluations of your agent are not allowed to use this for learning.
The following is an implementation of the classic "agent-environment loop": each timestep, the agent chooses an action, and the environment returns an observation and a reward.
```python
import gym
```
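The rest of the loop follows the standard pattern from the Gym documentation; a minimal sketch (the number of episodes and steps per episode are arbitrary choices):

```python
import gym

env = gym.make('CartPole-v1')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        # The agent here is as simple as possible: pick a random valid action.
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()
```

Random actions rarely keep the pole balanced for long, so most episodes terminate well before the 100-step limit.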
Spaces
In the examples above, we have been sampling random actions from the environment's action space. But what actually are those actions? Every environment comes with an `action_space` and an `observation_space`. These attributes are of type `Space`, and they describe the format of valid actions and observations:
```python
import gym

env = gym.make('CartPole-v1')
print(env.action_space)
print(env.observation_space)
```

which prints:

```
Discrete(2)
Box(4,)
```
```python
print(env.observation_space.high)
```

```
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
```

```python
print(env.observation_space.low)
```

```
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
```
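`Space` objects can also be constructed and sampled on their own, which is handy for sanity-checking an agent's actions; a small example with an arbitrary `Discrete(8)` space:

```python
from gym import spaces

space = spaces.Discrete(8)  # the set {0, 1, ..., 7}
x = space.sample()          # draw a random valid action
assert space.contains(x)
assert space.n == 8
```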
Play by yourself
```python
import gym
```
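One simple way to drive the cart yourself is to read an action from the keyboard at every step; the sketch below is just one possible approach, using plain `input()` rather than any Gym utility, with 0 pushing the cart left and 1 pushing it right:

```python
import gym

env = gym.make('CartPole-v1')
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    env.render()
    # Type 0 to push the cart left, 1 to push it right.
    action = int(input("action (0=left, 1=right): "))
    observation, reward, done, info = env.step(action)
    total_reward += reward
print("Episode return:", total_reward)
env.close()
```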
- Title: Play with CartPole
- Author: Oliver xu
- Created: 2019-10-25 18:30:21
- Updated: 2024-11-20 21:07:04
- Link: https://blog.oliverxu.cn/2019/10/25/Play-with-CartPole/
- Copyright: This post is licensed under CC BY-NC-SA 4.0.