
Oliver xu


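This post will be updated over the long term and is divided into five parts: Abstract, Introduction, Method, Simulation, and Conclusion. I aim to update it every day, reading one paper a day and, while studying its content, collecting the sentence patterns and grammar worth borrowing and sorting them by category.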





We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.


The hyper-parameters have intuitive interpretations and typically require little tuning.


Some connections to related algorithms, by which Adam was inspired, are discussed.


We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.


Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.


Because the available data typically only covers a small manifold of the possible space of inputs, a principal challenge is to be able to construct algorithms that can reason about uncertainty and out-of-distribution values, since a naive optimizer can easily exploit an estimated model to return adversarial inputs.


We propose to tackle this problem by leveraging the normalized maximum-likelihood (NML) estimator, which provides a principled approach to handling uncertainty and out-of-distribution inputs.


We demonstrate that our method can effectively optimize high-dimensional design problems in a variety of disciplines such as chemistry, biology, and materials engineering.


The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.



Stochastic gradient-based optimization is of core practical importance in many fields of science and engineering.


Many problems in these fields can be cast as the optimization of some scalar parameterized objective function requiring maximization or minimization with respect to its parameters.


If the function is differentiable w.r.t. its parameters, gradient descent is a relatively efficient optimization method, since the computation of first-order partial derivatives w.r.t. all the parameters is of the same computational complexity as just evaluating the function.

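Paraphrase: When the function is differentiable with respect to its parameters, gradient descent is relatively efficient, because computing the first-order partial derivatives with respect to all parameters costs about as much as evaluating the function itself.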

The focus of this paper is on the optimization of stochastic objectives with high-dimensional parameter spaces.


Some of Adam’s advantages are that the magnitudes of parameter updates are invariant to rescaling of the gradient, its stepsizes are approximately bounded by the stepsize hyperparameter, it does not require a stationary objective, it works with sparse gradients, and it naturally performs a form of step size annealing.

Paraphrase: Adam's update magnitudes do not change when the gradient is rescaled, its step sizes are roughly capped by the step-size hyperparameter, it needs no stationary objective, it handles sparse gradients, and it anneals the step size naturally.
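The first two advantages in the sentence above can be checked numerically. The following is a minimal sketch, assuming the standard Adam update with its usual default hyperparameters; the quadratic objective, the starting point, and the helper names `adam_step` and `adam_updates` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based timestep."""
    m = beta1 * m + (1.0 - beta1) * grad        # EMA of the gradient (first moment)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # EMA of the squared gradient (second raw moment)
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    new_theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_theta, m, v

def adam_updates(grad_scale, steps=5, lr=1e-3):
    """Record the parameter updates on a toy objective grad_scale * ||theta||^2."""
    theta = np.array([1.0, -3.0])               # hypothetical starting point
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    updates = []
    for t in range(1, steps + 1):
        grad = grad_scale * 2.0 * theta         # gradient of the scaled quadratic
        new_theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
        updates.append(new_theta - theta)
        theta = new_theta
    return np.array(updates)

updates_small = adam_updates(grad_scale=1.0)
updates_large = adam_updates(grad_scale=1000.0)
```

Scaling the gradient by 1000 leaves the updates essentially unchanged (up to the effect of `eps`), and on this toy problem each update stays within roughly `lr` in magnitude.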

Many real-world optimization problems involve function evaluations that are the result of an expensive or time-consuming process.


Rather than settling for a slow and expensive optimization process through repeated function evaluations, one may instead adopt a data-driven approach, where a large dataset of previously collected input-output pairs is given in lieu of running expensive function queries.


A straightforward approach to solving offline MBO problems would be to estimate a proxy $\hat{f}_\theta$ of the ground truth function using supervised learning, and to optimize the input $x$ with respect to this proxy.
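The naive baseline described in the sentence above can be sketched in a few lines. This is an illustration under stated assumptions: the ground-truth function, the one-dimensional dataset, and the quadratic proxy fitted with `np.polyfit` are all hypothetical stand-ins, not the models used in the paper.

```python
import numpy as np

def true_function(x):
    # Hypothetical ground truth; used only to build the offline dataset.
    return -(x - 0.5) ** 2

# Offline dataset of previously collected input-output pairs.
xs = np.linspace(-2.0, 2.0, 21)
ys = true_function(xs)

# Supervised learning step: fit a quadratic proxy to the dataset.
coeffs = np.polyfit(xs, ys, deg=2)
dproxy = np.polyder(coeffs)                  # derivative of the proxy w.r.t. x

# Optimization step: gradient ascent on the proxy, starting from the best data point.
x = xs[np.argmax(ys)]
for _ in range(200):
    x = x + 0.1 * np.polyval(dproxy, x)
x_opt = float(x)
```

Here the proxy is accurate everywhere, so ascent lands near the true optimum at 0.5. With a flexible model and inputs far from the data, the same loop can instead exploit proxy errors, which is the failure mode that offline MBO methods try to prevent.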


The main contribution of this work is to develop an offline MBO algorithm that utilizes a novel approximation to the NML distribution to obtain an uncertainty-aware forward model for optimization, which we call NEMO (Normalized maximum likelihood Estimation for Model-based Optimization).


Designing new deep reinforcement learning algorithms that can efficiently solve a wide variety of problems generally requires a tremendous amount of manual effort.


Learning to design reinforcement learning algorithms or even small sub-components of algorithms would help ease this burden and could result in better algorithms than researchers could design manually.


While learning from scratch is generally less biased, encoding existing human knowledge into the learning process can speed up the optimization and also make the learned algorithm more interpretable.


We learn two new RL algorithms which outperform existing algorithms in both sample efficiency and final performance on the training and test environments.


The contribution of this paper is a method for searching over the space of RL algorithms, which we instantiate by developing a formal language that describes a broad class of value-based model-free reinforcement learning methods.


However, all of the aforementioned methods focus on the active or online setting, whereas in this work, we are concerned with the offline setting where additional function evaluations are not available.


Bibas et al. (2019) apply this framework for prediction using deep neural networks, but require an expensive fine tuning process for every input.


The goal of our work is to provide a scalable and tractable method to approximate the CNML distribution, and we apply this framework to offline optimization problems.


The estimation of distribution algorithm (Bengoetxea et al., 2001) alternates between searching in the input space and model space using a maximum likelihood objective.


One is in contextual bandits under the batch learning from bandit feedback setting, where learning is often done on logged experience (Swaminathan & Joachims, 2015; Joachims et al., 2018), or offline reinforcement learning (Levine et al., 2020), where model-based methods construct estimates of the MDP parameters.

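Paraphrase (needs revision): One is contextual bandits in the batch-learning-from-bandit-feedback setting, where learning usually relies on logged experience (Swaminathan & Joachims, 2015; Joachims et al., 2018); another is offline reinforcement learning (Levine et al., 2020), where model-based methods construct estimates of the MDP parameters.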


This can be understood as establishing a trust region around the current parameter value, beyond which the current gradient estimate does not provide sufficient information.


For many machine learning models, for instance, we often know in advance that good optima are with high probability within some set region in parameter space.


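This is a desirable property, since a smaller SNR means that there is greater uncertainty about whether the direction of $\hat{m}_t$ corresponds to the direction of the true gradient.

Paraphrase: This property is desirable because the smaller the SNR, the less certain it is that the direction of $\hat{m}_t$ matches the direction of the true gradient.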

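Let $g$ be the gradient of the stochastic objective $f$, and we wish to estimate its second raw moment (uncentered variance) using an exponential moving average of the squared gradient, with decay rate $\beta_2$. Let $g_1, \ldots, g_T$ be the gradients at subsequent timesteps, each a draw from an underlying gradient distribution $g_t \sim p(g_t)$.

Paraphrase: With $g$ the gradient of the stochastic objective $f$, the goal is to estimate its second raw moment (uncentered variance) from an exponential moving average of the squared gradient with decay rate $\beta_2$. The gradients $g_1, \ldots, g_T$ at successive timesteps are each drawn from an underlying distribution $g_t \sim p(g_t)$.

We wish to know how $\mathbb{E}[v_t]$, the expected value of the exponential moving average at timestep $t$, relates to the true second moment $\mathbb{E}[g_t^2]$, so we can correct for the discrepancy between the two.

Paraphrase: The question is how $\mathbb{E}[v_t]$, the expected moving average at timestep $t$, relates to the true second moment $\mathbb{E}[g_t^2]$, so that the gap between the two can be corrected.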
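A small numerical sketch makes the discrepancy concrete. It assumes a zero-initialized moving average and, for simplicity, a constant gradient of 2.0 so that the true second moment is exactly 4.0; the function name `ema_second_moment` and the input values are illustrative assumptions.

```python
def ema_second_moment(grads, beta2=0.999):
    """Return (v_t, v_hat_t) pairs for a zero-initialized EMA of squared gradients."""
    v = 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g ** 2   # raw EMA, biased toward zero early on
        v_hat = v / (1.0 - beta2 ** t)           # dividing by (1 - beta2^t) removes the bias
        history.append((v, v_hat))
    return history

# Constant gradient, so the true second moment is 2.0 ** 2 = 4.0.
history = ema_second_moment([2.0] * 10)
first_raw, first_corrected = history[0]
```

After one step the raw average is about 0.004, far below 4.0, while the corrected value recovers 4.0. In this constant-gradient case the raw average equals the true second moment times (1 - beta2^t), which is why dividing by that factor removes the initialization bias.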

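Since the nature of the sequence is unknown in advance, we evaluate our algorithm using the regret, that is the sum of all the previous differences between the online prediction $f_t(\theta_t)$ and the best fixed point parameter $f_t(\theta^*)$ from a feasible set $\mathcal{X}$ for all the previous steps.

Paraphrase: Because the sequence is not known in advance, the algorithm is judged by its regret: the sum, over all previous steps, of the differences between the online prediction $f_t(\theta_t)$ and the value $f_t(\theta^*)$ at the best fixed parameter in the feasible set $\mathcal{X}$.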
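Written as a formula, the regret described above takes the usual online convex optimization form, with $T$ denoting the number of steps:

```latex
R(T) = \sum_{t=1}^{T} \left[ f_t(\theta_t) - f_t(\theta^*) \right],
\qquad
\theta^* = \arg\min_{\theta \in \mathcal{X}} \sum_{t=1}^{T} f_t(\theta)
```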

However, in offline MBO, the algorithm is not allowed to query the true function $f(y \mid x)$, and must find the best possible point $x^*$ using only the guidance of a fixed dataset $\mathcal{D} = \{x_{1:N}, y_{1:N}\}$.


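While adversarial ground truth functions can easily be constructed where this is the best one can do (e.g., if $f(x) = -\infty$ on any $x \notin \mathcal{D}$), in many reasonable domains it should be possible to perform better than the best point in the dataset.

Paraphrase (needs revision): It is easy to construct adversarial ground truth functions for which nothing better is possible (for example, $f(x) = -\infty$ for every $x \notin \mathcal{D}$), but in many reasonable domains it should be possible to beat the best point in the dataset.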

One of the primary contributions of this paper is to discuss how to approximate this intractable computation with a tractable one that is sufficient for optimization on challenging problems, which we discuss in Section 4.


To remedy this problem, we propose to amortize the learning process by incrementally learning the NML distribution while optimizing the iterate xt.


While quantization has the potential to introduce additional rounding errors into the optimization process, we find in our experiments in Section 5 that using a moderate value such as K = 20 or K = 40 provides a reasonably accurate solution without being excessively demanding on computation.

Paraphrase: Quantization can add rounding error to the optimization, but the experiments in Section 5 show that a moderate value such as K = 20 or K = 40 gives reasonable accuracy without demanding too much computation.
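The trade-off in the sentence above can be illustrated with a short sketch. It assumes the output range is split into K equally spaced bins and each value is replaced by its bin center; the function name `quantize_targets` and the sample data are hypothetical, and the paper's own discretization may differ in detail.

```python
import numpy as np

def quantize_targets(y, num_bins=20):
    """Map each target to one of num_bins equal-width bins over [min(y), max(y)]."""
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), num_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Digitizing against the interior edges yields bin indices in [0, num_bins - 1].
    idx = np.clip(np.digitize(y, edges[1:-1]), 0, num_bins - 1)
    return idx, centers

y = np.linspace(0.0, 1.0, 101)               # hypothetical target values
idx, centers = quantize_targets(y, num_bins=20)
rounding_error = np.abs(y - centers[idx])    # error introduced by the quantization
```

The rounding error is bounded by half the bin width, so doubling K halves the worst-case error while doubling the number of output classes the model has to handle.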

We now highlight some theoretical motivation for using CNML in the MBO setting, and show that estimating the true function with the CNML distribution is close to an expert even if the test label is chosen adversarially, which makes it difficult for an optimizer to exploit the model.

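Paraphrase (needs revision): This part gives theoretical motivation for using CNML in the MBO setting, showing that an estimate of the true function based on the CNML distribution stays close to an expert even when the test label is chosen adversarially, which makes the model hard for an optimizer to exploit.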


To empirically evaluate the proposed method, we investigated different popular machine learning models, including logistic regression, multilayer fully connected neural networks and deep convolutional neural networks.


Logistic regression has a well-studied convex objective, making it suitable for comparison of different optimizers without worrying about local minimum issues.


In our experiments, we made model choices that are consistent with previous publications in the area; a neural network model with two fully connected hidden layers with 1000 hidden units each and ReLU activation is used for this experiment with a minibatch size of 128.

Paraphrase: The model choices follow earlier publications in the area: the network has two fully connected hidden layers of 1000 units each with ReLU activation, trained with a minibatch size of 128.

Due to the cost of updating curvature information, SFO is 5-10x slower per iteration compared to Adam, and has a memory requirement that is linear in the number of minibatches.


In contrast, reducing the minibatch variance through the first moment is more important in CNNs and contributes to the speed-up.


Though Adam shows only marginal improvement over SGD with momentum, it adapts the learning rate scale for different layers instead of requiring it to be picked by hand as in SGD.


The details for the tasks, baselines, and experimental setup are as follows, and hyperparameter choices with additional implementation details can be found in Appendix A.2.


Because we do not have access to a real physical process for evaluating the material and molecule design tasks, Design-bench follows the experimental protocol used in prior work (Brookes et al., 2019; Fannjiang & Listgarten, 2020), which obtains a ground truth evaluation function by training a separate regressor model to evaluate the performance of designs.

Paraphrase: No real physical process is available for evaluating the material and molecule design tasks, so Design-bench follows the protocol of prior work (Brookes et al., 2019; Fannjiang & Listgarten, 2020) and obtains a ground truth evaluation function by training a separate regression model to score designs.

CbAS uses a generative model of p(x) as a trust region to prevent model exploitation, and autofocused oracles expand upon CbAS by iteratively updating the learned proxy function and iterating within a minimax game based on a quantity known as the oracle gap.


NEMO outperforms all methods on the Superconductor task by a very large margin, under both the 100th and 50th percentile metrics, and in the HopperController task under the 100th percentile metric.


These results are promising in that NEMO performs consistently well across all 6 domains evaluated, and they indicate that a significant number of designs found in the GFP and Superconductor tasks were better than the best performing design in the dataset.



Our method is aimed towards machine learning problems with large datasets and/or high-dimensional parameter spaces.


Overall, we found Adam to be robust and well-suited to a wide range of non-convex optimization problems in the field of machine learning.


We have presented NEMO (Normalized Maximum Likelihood Estimation for Model-Based Optimization), an algorithm that mitigates model exploitation on MBO problems by constructing a conservative model of the true function.


We evaluated NEMO on a number of design problems in materials science, robotics, biology, and chemistry, where we show that it attains very large improvements on two tasks, while performing competitively with respect to prior methods on the other four.


We design a general language for representing algorithms which compute the loss function for value-based model-free RL agents to optimize.


We highlight two learned algorithms which, although relatively simple, can obtain good generalization performance over a wide range of environments.


Our analysis of the learned algorithms sheds light on their benefit as regularization terms which are similar to recently proposed algorithms.


  • Title: A Collection of Common Sentence Patterns and Grammar in Research Papers
  • Author: Oliver xu
  • Created: 2021-01-31 19:09:00
  • Updated: 2025-02-28 21:06:43
  • Link: https://blog.oliverxu.cn/2021/01/31/科研论文常用句式语法汇总整理/
  • License: This article is licensed under CC BY-NC-SA 4.0.