
This tutorial gives a brief overview of the core concepts and functionality of TensorFlow. It covers the following:

  1. What TensorFlow is
  2. How to input data
  3. How to perform computations
  4. How to create variables
  5. How to train a neural network for a simple regression problem
  6. Tips and tricks

Gym is a toolkit for developing and comparing reinforcement learning algorithms. The Gym library is a collection of test problems - environments - that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.

CartPole-v1

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
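The reward and termination rule described above can be sketched in plain Python. This is only an illustration of the rule as stated in the text, not Gym's implementation: the names `is_terminal` and `episode_return` are made up for this sketch, and the exact thresholds and reward timing in the library may differ.

```python
# Illustrative sketch of the CartPole rule as described in the text.
# The function names are hypothetical; this is not Gym's own code.

ANGLE_LIMIT_DEG = 15.0   # pole angle threshold quoted in the text
POSITION_LIMIT = 2.4     # cart position threshold quoted in the text


def is_terminal(pole_angle_deg, cart_position):
    """Return True when the pole has tipped too far or the cart has left the track."""
    return abs(pole_angle_deg) > ANGLE_LIMIT_DEG or abs(cart_position) > POSITION_LIMIT


def episode_return(trajectory):
    """Sum +1 for every timestep in which the pole is still upright.

    trajectory: iterable of (pole_angle_deg, cart_position) pairs, one per timestep.
    Assumption: no reward is counted for the timestep that ends the episode.
    """
    total = 0
    for pole_angle_deg, cart_position in trajectory:
        if is_terminal(pole_angle_deg, cart_position):
            break
        total += 1
    return total


if __name__ == "__main__":
    # Two upright timesteps, then the pole passes 15 degrees.
    print(episode_return([(0.0, 0.0), (5.0, 1.0), (16.0, 1.0), (0.0, 0.0)]))
```

A longer episode therefore means a larger return, which is why keeping the pole balanced is equivalent to maximizing reward here.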

OpenAI Gym CartPole-v1


Unlike dynamic programming methods, Monte Carlo (MC) methods do not assume complete knowledge of the environment. MC requires only experience: sample sequences of states, actions, and rewards from actual or simulated interaction with an environment.

Monte Carlo Prediction

The idea underlying all Monte Carlo methods is that, as more returns are observed, their average should converge to the expected value. So we begin by considering Monte Carlo methods for learning the state-value function for a given policy. A simple way to estimate the value of a state from experience is to average the returns observed after visits to that state.
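As a minimal sketch of this averaging idea, the following implements first-visit MC prediction on pre-recorded episodes. The episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state) and the function name are assumptions made for this example.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging the returns that follow the first visit to s.

    episodes: list of episodes; each episode is a list of (state, reward) pairs,
              where reward is received after leaving that state (assumed format).
    gamma:    discount factor.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so the return can be accumulated incrementally.
        # Earlier visits are processed last and overwrite later ones,
        # so each state ends up with the return from its FIRST visit.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g

        for state, ret in first_visit_return.items():
            returns_sum[state] += ret
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    episodes = [
        [("A", 1.0), ("B", 1.0)],
        [("A", 0.0), ("B", 3.0)],
    ]
    print(first_visit_mc_prediction(episodes))
```

With more episodes, each average is taken over more sampled returns, so the estimates approach the true state values under the policy that generated the data.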


Background

The homework for Linear System Theory is challenging. Here is Problem 1:

Consider a linear system with state transition matrix $\phi(t,t_0)$:

$$\phi(t,t_0)=\displaystyle \left[\begin{matrix} e^{t} \cos{\left(2 t \right)} & e^{- 2 t} \sin{\left(2 t \right)}\\ -e^{t} \sin{\left(2 t \right)} & e^{- 2 t} \cos{\left(2 t \right)} \end{matrix}\right]$$

Compute $A(t)$.

Since the given system is a linear time-varying system, we can compute $A(t)$ directly from the properties of $\phi$. However, the resulting expression is so complicated that I could not simplify it by hand, so I turned to two tools for help: Python and Matlab.
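The property used here is the standard one: the state transition matrix satisfies the system's own differential equation, so $A(t)$ can be recovered from $\phi$ and its time derivative.

$$\frac{\partial}{\partial t}\phi(t,t_0) = A(t)\,\phi(t,t_0) \quad\Longrightarrow\quad A(t) = \left[\frac{\partial}{\partial t}\phi(t,t_0)\right]\phi^{-1}(t,t_0)$$

As a hand-worked sketch (worth re-checking with a symbolic tool), taking the matrix exactly as given above, its determinant is $e^{-t}$, and carrying out the product gives

$$A(t)=\left[\begin{matrix} \cos^{2}{\left(2 t \right)} - 2\sin^{2}{\left(2 t \right)} & 2 - 3\sin{\left(2 t \right)}\cos{\left(2 t \right)}\\ -2 - 3\sin{\left(2 t \right)}\cos{\left(2 t \right)} & \sin^{2}{\left(2 t \right)} - 2\cos^{2}{\left(2 t \right)} \end{matrix}\right]$$

A quick consistency check is that the trace of $A(t)$ is $-1$, which matches $\det\phi = e^{-t}$.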

This post compares the `simplify` functions in Python and Matlab.


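This post is a translation of Chapter 3 of the book Reinforcement Learning: An Introduction.

In this chapter we introduce the problem in the form of finite Markov decision processes (finite MDPs), which the rest of the book sets out to solve. The problem involves evaluative feedback, as in bandit problems, but also has an associative aspect: choosing different actions in different situations. MDPs are a classical formalization of sequential decision making. Actions influence not only immediate rewards but also subsequent states. MDPs therefore involve future rewards, and immediate reward must be traded off against future reward. In bandit problems, we estimate the value $q_*(a)$ of each action $a$.

In MDPs, we estimate the value $q_*(s,a)$ of each action $a$ in each state $s$.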
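For reference, the two quantities can be written side by side. This is a sketch in the book's usual notation, where $R_t$ is the reward at step $t$, $G_t$ is the return following step $t$, and $\pi$ is a policy:

$$q_*(a) \doteq \mathbb{E}\left[R_t \mid A_t = a\right]$$

$$q_*(s,a) \doteq \max_{\pi}\, \mathbb{E}_{\pi}\left[G_t \mid S_t = s,\ A_t = a\right]$$

The bandit value depends only on the action, while the MDP value also conditions on the state and looks at the whole return rather than a single reward.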


**CBDict: a select-to-translate dictionary for reading papers, designed for academics working on Linux**

ClipBoardDictionary

This program monitors the system clipboard and translates words from English to Chinese using the YouDao API. It is designed for students working under Linux, where no simple translator is available for reading papers.
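The monitoring idea can be sketched as a simple polling loop. This is not CBDict's actual code: `read_clipboard`, `translate`, and `on_result` are hypothetical callables supplied by the caller, standing in for a real clipboard reader and a real call to the translation service.

```python
import time


def watch_clipboard(read_clipboard, translate, on_result,
                    poll_interval=0.5, max_polls=None):
    """Poll the clipboard and translate its content whenever it changes.

    read_clipboard: hypothetical callable returning the current clipboard text.
    translate:      hypothetical callable mapping an English word to a translation.
    on_result:      callable invoked with (text, translation) for each new text.
    max_polls:      stop after this many polls (None means run forever).
    """
    last_text = None
    polls = 0
    while max_polls is None or polls < max_polls:
        text = read_clipboard()
        # Only react to non-empty text that differs from the last one seen.
        if text and text != last_text:
            last_text = text
            on_result(text, translate(text))
        polls += 1
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Stand-ins for a real clipboard and a real translation backend.
    fake_clipboard = iter(["hello", "hello", "world"])
    watch_clipboard(
        read_clipboard=lambda: next(fake_clipboard),
        translate=lambda word: word.upper(),
        on_result=lambda word, result: print(word, "->", result),
        poll_interval=0.0,
        max_polls=3,
    )
```

Passing the clipboard reader and translator in as arguments keeps the loop independent of any particular clipboard library or API client.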


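Energy units in weight loss

Before looking at how weight loss works, you first need to know the common energy units: the calorie (also called the small calorie), the kilocalorie (also called the large calorie), and the kilojoule. The packaging of the food we buy usually lists the nutritional content per 100 grams, and these are the units it uses.

  • 1 calorie (cal) = 4.184 joules
  • 1 kilocalorie (kcal) = 1 large calorie (Cal) = 1000 calories = 4184 joules = 4.184 kilojoules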
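The conversions above are easy to apply in code. Here is a small sketch; the function names are made up for this example.

```python
# Conversion factor from the list above: 1 kcal = 4.184 kJ (and 1 cal = 4.184 J).
KJ_PER_KCAL = 4.184


def kcal_to_kj(kcal):
    """Convert kilocalories to kilojoules."""
    return kcal * KJ_PER_KCAL


def kj_to_kcal(kj):
    """Convert kilojoules to kilocalories."""
    return kj / KJ_PER_KCAL


if __name__ == "__main__":
    # A label listing 2000 kJ per 100 g corresponds to roughly 478 kcal.
    print(round(kj_to_kcal(2000)))
```

This is handy when a food label gives energy in kilojoules but your daily target is in kilocalories.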