Oliver xu's Blog

使用强化学习求解鲁棒控制器
问题背景
不确定动态系统的鲁棒控制近年来受到了控制界的广泛关注[1]，对于很多场景，例如化学过程，电力系统，机器人和航天工程中，被控系统往往不能够得到精确的数学模型或被控系统存在不确定性，因此获得系统的鲁棒性能对系统的精准控制起着重要作用。
鲁棒稳定性和最优控制器的设计具有一定的关系[2]，求解鲁棒控制器在某些条件下可以转化为求解一个最优控制器的问题，对于离散时间线性系统来说，求解最优控制器，就是求解代数Riccatic方程，对于非线性系统，就是求解HJB方程。然而，对于一般的非线性系统，HJB方程的解析解可能不存在，通常，使用迭代的算法进行求解，可以采用ADP（approximate dynamic programming）的方法进行求解。
近些年，强化学习在解决不确定环境下的决策问题上取得了巨大的成功[3]，通常，强化学习算法可以分为on-policy和off-policy两类算法，on-policy算法将每次迭代后的策略运用到被控对象上，而off-policy优化的策略和与环境交互的策略不一定是一个策略，即策略的更新可以在多步迭代以后。
目前，对于不确定性离散时间线性系统的鲁棒控制问题，已经有一些团队采用强化学习方法，在系统动力学信息完全未知或部分未知的情况下，使用自适应动态规划的算法，对贝尔曼方程进行求解。
本文章对使用强化学习求解线性离散时间不确定性鲁棒控制器的问题进行综述。
2020-06-28
Reinforcement Learning
Reinforcement Learning
阅读全文使用强化学习求解鲁棒控制器
Word与Latex
当阅读完一篇论文后，可能会对论文做一些整理，论文中的公式有时候非常多且复杂，本篇博客记录一些提高生产力的工具
Mathpix：将论文中的公式识别成latex或者mathml格式
有一款软件：Mathpix，能够将公式识别成latex或者mathml格式。
使用该软件需要使用邮箱注册账户，免费版一个月只能识别50次，也有一个开发版，一个月能识别1000次，但是需要信用卡验证，对于国内用户来说，比较麻烦。
这里给出一个解决办法：使用临时邮箱，待免费套餐使用完后，重新注册账户，测试了几个免费邮箱服务，大多被mathpix识别了，https://temp-mail.org/en/ 这个我测试了一下，没有被mathpix识别，可以用于注册。
2020-06-27
Latex
Latex
阅读全文Word与Latex
Playing Cartpole with natural deep reinforcement learning
Introduction
There are two kinds of methods in reinforcement learning, tabular methods and approximate methods. The purpose of RL is to get an optimal policy, which tells you to choose what action A when you at state S. If the state and aciton spaces are small enough, value fucntion can be represented as arrays, or tables. The problem with large state space is not just the memory needed for large tables, but the time and data needed to fill them accurately. If the state and aciton spaces are too large, due to the limitions of time and data, value functions need to be approximated with limited computational resources. In this case, out goal instead is to find a good enough approximate solution compared to optimal solution.
2019-12-01
Reinforcement Learning
Reinforcement Learning
阅读全文Playing Cartpole with natural deep reinforcement learning
Play with MountainCar
MountainCar is a task of driving an underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car's engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way.
State
Its state space has only two entries. The first is the position of the car. The second is the speed of the car. They are limited in:
The negative speed means that the direction of the movement of the car is opposite to the right, which is left in this environement.
2019-11-15
Reinforcement Learning
Reinforcement Learning
阅读全文Play with MountainCar
Mysql Command
This post will record all the mysql command that i used in my projects.
2019-11-05
Mysql
Mysql
阅读全文Mysql Command
Tensorflow tutorial
This tutorial will provide a brief overview of the core concepts and functionality of Tensorflow. This tutorial will cover the following:
1. What is Tensorflow
2. How to input data
3. How to perform computations
4. How to create variables
5. How to train a neural network for a simple regression problem
6. Tips and tricks
2019-11-01
Tensorflow
Tensorflow
阅读全文Tensorflow tutorial
Play with CartPole
Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.
CartPole-v1
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent if from falling over. Areward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
Openai Gym CartPole-v1
2019-10-25
Reinforcement Learning
Reinforcement Learning
阅读全文Play with CartPole
Monte Carlo Methods
Unlike Dynamic Programming Methods, Monte Carlo Methods do not assume complete knowledge of the environment. MC only requires experience--sample sequences of states, actions, and rewards from actual or simulated interaction with an environment.
Monte Carlo Prediction
The idea underlies all Monte Carlo methods is that as more returns are observed the average should converge to the expected value. So we begin by considerng Monte Carlo methods for learning the state-value function for a given policy. A way to estimate the value of a state from experience is simply to average the returns observed after visits to that state.
2019-10-21
阅读全文Monte Carlo Methods
Simplify: Python VS Matlab
Background
The homework of Linear System Theory is challenging. Here is Problem 1:
Consider a linear system with a state transition matrix
Compute A(t).
Since the given system is linear time variant system, by using some properties of , we can easily compute . However, the expression is so complicate that i could not simplify it by hand. I call for some tools for help. The tools are Python and Matlab.
This post compare the difference of simplify function in Python and Matlab.
2019-10-20
Python
Python
阅读全文Simplify: Python VS Matlab
controllability and observability
1
2
3
from sympy import *
import numpy as np
Problem 4.1
Determine whether the following continuous-time linear time-invariant system is fully controllable
2019-08-13
control theory
control theory
阅读全文controllability and observability

1…4 567 8…10

问题背景

Mathpix：将论文中的公式识别成latex或者mathml格式

Introduction

State

CartPole-v1

Monte Carlo Prediction

Background

Problem 4.1