• Solving Robust Controllers with Reinforcement Learning

    Problem Background

    Robust control of uncertain dynamic systems has received wide attention from the control community in recent years [1]. In many applications, such as chemical processes, power systems, robotics, and aerospace engineering, an accurate mathematical model of the plant is often unavailable, or the plant itself is subject to uncertainty; robustness therefore plays an important role in controlling such systems precisely.

    Robust stability is closely related to optimal controller design [2]: under certain conditions, finding a robust controller can be recast as solving an optimal control problem. For discrete-time linear systems, solving the optimal control problem amounts to solving an algebraic Riccati equation; for nonlinear systems, it amounts to solving the Hamilton-Jacobi-Bellman (HJB) equation. For general nonlinear systems, however, an analytical solution of the HJB equation may not exist, so iterative algorithms are typically used instead, for example approximate dynamic programming (ADP).
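    For reference, the discrete-time linear-quadratic problem referred to above has the following standard form (the weights Q and R are design choices, shown here only to fix notation):

    \begin{aligned}
    &\text{plant: } x_{k+1} = A x_k + B u_k, \qquad
     \text{cost: } J = \sum_{k=0}^{\infty} \bigl( x_k^\top Q x_k + u_k^\top R u_k \bigr) \\
    &\text{Riccati equation: } P = A^\top P A - A^\top P B \bigl( R + B^\top P B \bigr)^{-1} B^\top P A + Q \\
    &\text{optimal feedback: } u_k = -K x_k, \qquad K = \bigl( R + B^\top P B \bigr)^{-1} B^\top P A
    \end{aligned}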

    In recent years, reinforcement learning has achieved great success in decision-making problems under uncertain environments [3]. Reinforcement learning algorithms are usually divided into on-policy and off-policy methods: an on-policy algorithm applies the policy obtained after each iteration directly to the plant, whereas in an off-policy algorithm the policy being optimized and the policy interacting with the environment are not necessarily the same, so the policy update may take place only after multiple steps of interaction.
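    As a concrete illustration of the distinction (these are the standard textbook update rules, not taken from the surveyed papers), compare the on-policy SARSA update with the off-policy Q-learning update:

    \begin{aligned}
    \text{SARSA (on-policy):} \quad & Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \bigr] \\
    \text{Q-learning (off-policy):} \quad & Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \bigr]
    \end{aligned}

    SARSA bootstraps from the action a_{t+1} actually taken by the behaviour policy, while Q-learning bootstraps from the greedy action regardless of what the behaviour policy does.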

    For the robust control problem of uncertain discrete-time linear systems, several groups have already applied reinforcement learning: with the system dynamics completely or partially unknown, they use adaptive dynamic programming algorithms to solve the Bellman equation.

    This article surveys the use of reinforcement learning to solve robust controllers for uncertain discrete-time linear systems.

  • Word and LaTeX

    After reading a paper, you may want to write up some notes on it. The formulas in a paper are sometimes numerous and complicated, so this post records a few tools that boost productivity.

    Mathpix: recognize formulas in a paper as LaTeX or MathML

    Mathpix is a tool that can recognize formulas and convert them into LaTeX or MathML.

    Using the tool requires registering an account with an email address. The free plan only allows 50 recognitions per month. There is also a developer plan with 1000 recognitions per month, but it requires credit card verification, which is inconvenient for users in China.

    Here is a workaround: use a temporary email address, and once the free quota is used up, register a new account. I tested several disposable email services and most of them are detected by Mathpix, but https://temp-mail.org/en/ was not detected in my test and can be used for registration.

  • Playing Cartpole with natural deep reinforcement learning

    Introduction

    There are two kinds of methods in reinforcement learning: tabular methods and approximate methods. The purpose of RL is to obtain an optimal policy, which tells you which action A to choose when you are at state S. If the state and action spaces are small enough, value functions can be represented as arrays, or tables. The problem with a large state space is not just the memory needed for large tables, but the time and data needed to fill them accurately. If the state and action spaces are too large, then due to the limitations of time and data, value functions need to be approximated with limited computational resources. In this case, our goal instead is to find a good enough approximate solution compared to the optimal solution.
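    A minimal sketch of the contrast (the sizes and the tiny network are illustrative, not taken from any particular algorithm):

    import numpy as np

    # Tabular case: one value per (state, action) pair fits in memory.
    n_states, n_actions = 500, 3              # illustrative sizes
    Q_table = np.zeros((n_states, n_actions))

    # Approximate case: a parameterized function maps a state vector to
    # action values; the number of parameters no longer grows with the
    # number of distinct states.
    def q_approx(state, W1, b1, W2, b2):
        """Tiny two-layer network: state (dim 4) -> action values (dim 2)."""
        h = np.tanh(state @ W1 + b1)
        return h @ W2 + b2

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 32)), np.zeros(32)
    W2, b2 = rng.normal(size=(32, 2)), np.zeros(2)
    print(q_approx(np.ones(4), W1, b1, W2, b2))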

  • Play with MountainCar

    MountainCar is a task of driving an underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car's engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way.

    State

    Its state space has only two entries: the first is the position of the car and the second is its velocity. Both are limited to fixed ranges (roughly [-1.2, 0.6] for position and [-0.07, 0.07] for velocity in MountainCar-v0).

    A negative velocity means that the car is moving to the left in this environment, i.e. away from the goal on the right.
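    A quick way to check the exact ranges is to read them off the environment itself; this sketch assumes the classic Gym API and the MountainCar-v0 environment:

    import gym

    env = gym.make('MountainCar-v0')
    # observation_space is a Box of [position, velocity];
    # for MountainCar-v0 the bounds are roughly
    # position in [-1.2, 0.6] and velocity in [-0.07, 0.07].
    print(env.observation_space.low)
    print(env.observation_space.high)
    print(env.action_space)   # Discrete(3): push left, no push, push right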

  • MySQL Commands

    This post records all the MySQL commands that I have used in my projects.

  • TensorFlow tutorial

    This tutorial provides a brief overview of the core concepts and functionality of TensorFlow. It will cover the following:

    1. What is TensorFlow
    2. How to input data
    3. How to perform computations
    4. How to create variables
    5. How to train a neural network for a simple regression problem
    6. Tips and tricks
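    As a small taste of items 2-5, here is a minimal linear-regression sketch; it assumes TensorFlow 2.x with eager execution, so an older graph-and-session version of the tutorial would look different:

    import numpy as np
    import tensorflow as tf

    # Synthetic data for y = 3x + 2 plus noise (item 2: input data).
    x = np.random.rand(100, 1).astype(np.float32)
    y = 3.0 * x + 2.0 + 0.05 * np.random.randn(100, 1).astype(np.float32)

    # Trainable variables (item 4).
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    for step in range(200):
        with tf.GradientTape() as tape:          # computations (item 3)
            pred = w * x + b
            loss = tf.reduce_mean(tf.square(pred - y))
        grads = tape.gradient(loss, [w, b])
        opt.apply_gradients(zip(grads, [w, b]))  # training (item 5)

    print(w.numpy(), b.numpy())   # should approach 3 and 2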

  • Play with CartPole

    Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.

    CartPole-v1

    A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.

    OpenAI Gym CartPole-v1
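    A minimal interaction loop with this environment, assuming the classic Gym step API that returns four values, looks like this:

    import gym

    env = gym.make('CartPole-v1')
    for episode in range(3):
        obs = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = env.action_space.sample()        # random policy as a placeholder
            obs, reward, done, info = env.step(action)
            total_reward += reward
        print(f'episode {episode}: return = {total_reward}')
    env.close()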

  • Monte Carlo Methods

    Unlike Dynamic Programming Methods, Monte Carlo Methods do not assume complete knowledge of the environment. MC only requires experience--sample sequences of states, actions, and rewards from actual or simulated interaction with an environment.

    Monte Carlo Prediction

    The idea underlying all Monte Carlo methods is that as more returns are observed, the average should converge to the expected value. So we begin by considering Monte Carlo methods for learning the state-value function for a given policy. A natural way to estimate the value of a state from experience is simply to average the returns observed after visits to that state.
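    A first-visit Monte Carlo prediction sketch along these lines is shown below; the env and policy objects are assumptions (a classic Gym-style interface with hashable states):

    from collections import defaultdict

    def mc_prediction(env, policy, num_episodes=1000, gamma=1.0):
        """First-visit MC: estimate V(s) by averaging the returns observed
        after the first visit to s in each episode."""
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        V = defaultdict(float)

        for _ in range(num_episodes):
            # Generate one episode by following the given policy.
            episode = []
            state, done = env.reset(), False
            while not done:
                action = policy(state)
                next_state, reward, done, _ = env.step(action)
                episode.append((state, reward))
                state = next_state

            # Walk backwards through the episode, accumulating the return G.
            G = 0.0
            for t in reversed(range(len(episode))):
                s, r = episode[t]
                G = gamma * G + r
                # First-visit check: only record G at the earliest visit to s.
                if s not in (x[0] for x in episode[:t]):
                    returns_sum[s] += G
                    returns_count[s] += 1
                    V[s] = returns_sum[s] / returns_count[s]
        return V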

  • Simplify: Python vs. MATLAB

    Background

    The homework for Linear System Theory is challenging. Here is Problem 1:

    Consider a linear system with a state transition matrix

    Compute A(t).

    Since the given system is a linear time-variant system, by using some properties of the state transition matrix we can easily compute A(t). However, the resulting expression is so complicated that I could not simplify it by hand, so I called on some tools for help. The tools are Python and MATLAB.

    This post compares the simplify functions of Python and MATLAB.
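    The original transition matrix is not reproduced here, so the snippet below only illustrates the Python side with a made-up symbolic expression; SymPy's simplify and trigsimp play the role of MATLAB's simplify:

    from sympy import symbols, sin, cos, exp, simplify, trigsimp

    t = symbols('t')
    # A hypothetical messy expression standing in for the hand-derived result.
    expr = sin(t)**2 + cos(t)**2 + exp(t) * exp(-t) * (t + 1) - 1
    print(simplify(expr))               # -> t + 1
    print(trigsimp(2 * sin(t) * cos(t)))  # -> sin(2*t)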

  • Controllability and Observability

    from sympy import *
    import numpy as np

    Problem 4.1

    Determine whether the following continuous-time linear time-invariant system is fully controllable
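    The matrices of Problem 4.1 are not shown here, so the sketch below uses a hypothetical pair (A, B) to illustrate the Kalman rank test with SymPy:

    from sympy import Matrix

    # Hypothetical (A, B); replace with the matrices from Problem 4.1.
    A = Matrix([[0, 1, 0],
                [0, 0, 1],
                [-2, -3, -4]])
    B = Matrix([[0], [0], [1]])

    n = A.shape[0]
    # Controllability matrix [B, AB, A^2 B, ...]
    C = Matrix.hstack(*[A**k * B for k in range(n)])
    print(C.rank() == n)   # fully controllable iff rank equals n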
