A Collection of Common Sentence Patterns and Grammar for Research Papers

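Writing a research paper is not like everyday writing. In my own experience, the English sentences I write come out stiff and lack the feel of text written by native-speaking authors, so deliberate practice is needed. The way to practice is to imitate how those authors write.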

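This blog post will be updated over the long term and is divided into five parts: Abstract, Introduction, Method, Simulation, and Conclusion. I aim to update it daily: read one paper a day and, while learning its content, collect the grammar and sentence patterns worth borrowing, sorted by category.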

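For each paper, I pick five distinctive sentences from every section and record both the Chinese meaning and the English expression. In actual writing we often cannot produce a suitable English expression, partly because we cannot state precisely what we mean even in Chinese, so translating English into Chinese is also an important part of the exercise.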

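Over time this amounts to building a database that can be queried in Chinese. For example, if I search for "关系" (relation), I can directly find the related expressions collected here.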

While reading papers, mark good expressions in different colors.
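The idea of a collection that can be queried in Chinese can be sketched in a few lines of Python. This is only a minimal illustration, not a finished tool: the names `Entry`, `ENTRIES`, and `search` are hypothetical, and the three entries are copied from the sections of this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One collected sentence pair (hypothetical structure for illustration)."""
    section: str   # Abstract, Introduction, Method, Simulation, or Conclusion
    english: str   # sentence copied from the paper
    chinese: str   # Chinese gloss


# A tiny sample of the collection; a real one would be loaded from a file.
ENTRIES = [
    Entry(
        "Abstract",
        "The hyper-parameters have intuitive interpretations and typically "
        "require little tuning.",
        "超参数具有直观的解释，通常需要很少的调整。",
    ),
    Entry(
        "Method",
        "We wish to know how E[v_t], the expected value of the exponential "
        "moving average at timestep t, relates to the true second moment "
        "E[g_t^2], so we can correct for the discrepancy between the two.",
        "我们希望知道时间步t处指数移动平均的期望值E[v_t]与真实的二阶矩"
        "E[g_t^2]之间的关系，从而校正两者之间的差异。",
    ),
    Entry(
        "Conclusion",
        "Our method is aimed towards machine learning problems with large "
        "datasets and/or high-dimensional parameter spaces.",
        "我们的方法旨在解决大型数据集和/或高维参数空间的机器学习问题。",
    ),
]


def search(keyword, entries=ENTRIES):
    """Return entries whose Chinese gloss or English sentence contains keyword."""
    key = keyword.casefold()
    return [
        e for e in entries
        if keyword in e.chinese or key in e.english.casefold()
    ]


if __name__ == "__main__":
    for entry in search("关系"):
        print(f"[{entry.section}] {entry.english}")
        print(f"    {entry.chinese}")
```

Searching for "关系" returns the Method entry, because its gloss contains "之间的关系"; searching for an English word such as "tuning" works the same way.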

Abstract

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

我们介绍了Adam，一种基于自适应低阶矩估计的用于随机目标函数的一阶梯度优化算法。

The hyper-parameters have intuitive interpretations and typically require little tuning.

超参数具有直观的解释，通常需要很少的调整。

Some connections to related algorithms, on which Adam was inspired, are discussed.

文中还讨论了Adam与若干启发了它的相关算法之间的联系。

We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

我们还分析了算法的理论收敛性，并提供了与在线凸优化框架下已知的最好的结果相当的收敛率的后悔界。

Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.

实证结果表明，Adam在实践中效果很好，并且优于其他随机优化方法。

Because the available data typically only covers a small manifold of the possible space of inputs, a principal challenge is to be able to construct algorithms that can reason about uncertainty and out-of-distribution values, since a naive optimizer can easily exploit an estimated model to return adversarial inputs.

由于可用数据通常只覆盖可能的输入空间中的一个小流形，一个主要挑战是构造能够对不确定性和分布外取值进行推理的算法，因为朴素的优化器很容易利用估计出的模型返回对抗性输入。

We propose to tackle this problem by leveraging the normalized maximum-likelihood (NML) estimator, which provides a principled approach to handling uncertainty and out-of-distribution inputs.

我们提出利用归一化最大似然（NML）估计器来解决这一问题，它为处理不确定性和分布外输入提供了一种有原则的方法。

We demonstrate that our method can effectively optimize high-dimensional design problems in a variety of disciplines such as chemistry, biology, and materials engineering.

我们证明了我们的方法可以有效地优化化学，生物学和材料工程等众多学科中的高维设计问题。

The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.

对学习到的算法行为的分析表明，它与最近提出的RL算法相似，后者解决了基于值的方法中的高估问题。

Introduction

Stochastic gradient-based optimization is of core practical importance in many fields of science and engineering.

基于随机梯度的优化在科学和工程学的许多领域中具有核心的实践重要性。

Many problems in these fields can be cast as the optimization of some scalar parameterized objective function requiring maximization or minimization with respect to its parameters.

这些领域中的许多问题都可以归结为某些标量参数化目标函数的优化，要求对其参数进行最大化或最小化。

If the function is differentiable w.r.t. its parameters, gradient descent is a relatively efficient optimization method, since the computation of first-order partial derivatives w.r.t. all the parameters is of the same computational complexity as just evaluating the function.

如果函数关于其参数可微，梯度下降就是一种相对高效的优化方法，因为计算关于所有参数的一阶偏导数，其计算复杂度与仅对函数求值相同。

The focus of this paper is on the optimization of stochastic objectives with high-dimensional parameter spaces.

本文的重点是优化具有高维参数空间的随机目标。

Some of Adam's advantages are that the magnitudes of parameter updates are invariant to rescaling of the gradient, its stepsizes are approximately bounded by the stepsize hyperparameter, it does not require a stationary objective, it works with sparse gradients, and it naturally performs a form of step size annealing.

Adam的一些优点是：参数更新的幅度对梯度的重新缩放保持不变，其步长近似地以步长超参数为界，它不要求目标函数是平稳的，它适用于稀疏梯度，并且会自然地进行一种形式的步长退火。

Many real-world optimization problems involve function evaluations that are the result of an expensive or time-consuming process.

许多实际的优化问题都涉及函数求值，而这些求值是昂贵或耗时的过程的结果。

Rather than settling for a slow and expensive optimization process through repeated function evaluations, one may instead adopt a data-driven approach, where a large dataset of previously collected input-output pairs is given in lieu of running expensive function queries.

与其将就于通过反复函数求值进行的缓慢而昂贵的优化过程，不如采用数据驱动的方法：给定一个由先前收集的输入输出对构成的大型数据集，以代替运行昂贵的函数查询。

A straightforward approach to solving offline MBO problems would be to estimate a proxy f̂_θ of the ground truth function using supervised learning, and to optimize the input x with respect to this proxy.

解决离线MBO问题的一种直接方法是使用监督学习估计真实函数的替代函数，并且针对该替代函数优化输入x。

The main contribution of this work is to develop an offline MBO algorithm that utilizes a novel approximation to the NML distribution to obtain an uncertainty-aware forward model for optimization, which we call NEMO (Normalized maximum likelihood Estimation for Model-based Optimization).

这项工作的主要贡献是开发一种离线MBO算法，该算法利用一个新颖的近似算法来近似NML分布，来获得不确定性感知的正向模型以进行优化，我们将其称为NEMO（基于模型的优化的归一化最大似然估计）。

Designing new deep reinforcement learning algorithms that can efficiently solve a wide variety of problems generally requires a tremendous amount of manual effort.

设计能够有效解决各种问题的新的深度强化学习算法通常需要大量的人工工作。

Learning to design reinforcement learning algorithms or even small sub-components of algorithms would help ease this burden and could result in better algorithms than researchers could design manually.

学习设计强化学习算法甚至算法的较小子组件将有助于减轻这种负担，并且可能会产生比研究人员可以手动设计的更好的算法。

While learning from scratch is generally less biased, encoding existing human knowledge into the learning process can speed up the optimization and also make the learned algorithm more interpretable.

虽然从零开始学习通常偏差更小，但将已有的人类知识编码进学习过程可以加快优化，并使学到的算法更具可解释性。

We learn two new RL algorithms which outperform existing algorithms in both sample efficiency and final performance on the training and test environments.

我们学习了两种新的RL算法，它们在训练和测试环境中的采样效率和最终性能均优于现有算法。

The contribution of this paper is a method for searching over the space of RL algorithms, which we instantiate by developing a formal language that describes a broad class of value-based model-free reinforcement learning methods.

本文的贡献是一种用于搜索RL算法空间的方法，我们通过开发一种形式语言来实例化该方法，该语言描述了一大类基于值的无模型强化学习方法。

However, all of the aforementioned methods focus on the active or online setting, whereas in this work, we are concerned with the offline setting where additional function evaluations are not available.

然而，上述所有方法都聚焦于主动或在线的设定，而在本工作中，我们关注的是无法进行额外函数求值的离线设定。

Bibas et al. (2019) apply this framework for prediction using deep neural networks, but require an expensive fine tuning process for every input.

Bibas等将这种框架应用于使用深度神经网络的预测，但需要为每个输入进行昂贵的微调过程。

The goal of our work is to provide a scalable and tractable method to approximate the CNML distribution, and we apply this framework to offline optimization problems.

我们工作的目标是提供一种可扩展且易于处理的方法来近似CNML分布，并将此框架应用于离线优化问题。

The estimation of distribution algorithm (Bengoetxea et al., 2001) alternates between searching in the input space and model space using a maximum likelihood objective.

分布估计算法（Bengoetxea等，2001）使用最大似然目标，在输入空间中的搜索与模型空间中的搜索之间交替进行。

One is in contextual bandits under the batch learning from bandit feedback setting, where learning is often done on logged experience (Swaminathan & Joachims, 2015; Joachims et al., 2018), or offline reinforcement learning (Levine et al., 2020), where model-based methods construct estimates of the MDP parameters.

其一是老虎机反馈批量学习（batch learning from bandit feedback）设定下的上下文老虎机问题，其中学习通常基于已记录的经验进行（Swaminathan & Joachims，2015；Joachims等，2018）；其二是离线强化学习（Levine等，2020），其中基于模型的方法会构造MDP参数的估计。

Method

This can be understood as establishing a trust region around the current parameter value, beyond which the current gradient estimate does not provide sufficient information.

这可以理解为在当前参数值周围建立信任区域，超过该范围当前梯度估计将无法提供足够的信息。

For many machine learning models, for instance, we often know in advance that good optima are with high probability within some set region in parameter space.

例如，对于许多机器学习模型，我们通常事先就知道，好的最优解以高概率位于参数空间的某个给定区域内。

This is a desirable property, since a smaller SNR means that there is greater uncertainty about whether the direction of m̂_t corresponds to the direction of the true gradient.

这是一个理想的特性，因为SNR越小，意味着m̂_t的方向是否对应于真实梯度的方向就越不确定。

Let g be the gradient of the stochastic objective f, and we wish to estimate its second raw moment (uncentered variance) using an exponential moving average of the squared gradient, with decay rate β_2. Let g_1, ..., g_T be the gradients at subsequent timesteps, each a draw from an underlying gradient distribution g_t ∼ p(g_t).

令g为随机目标f的梯度，我们希望使用平方梯度的指数移动平均（衰减率为β_2）来估计其二阶原点矩（非中心化方差）。令g_1, ..., g_T为后续各时间步的梯度，每个梯度都是从基础梯度分布g_t ∼ p(g_t)中抽取的。

We wish to know how E[v_t], the expected value of the exponential moving average at timestep t, relates to the true second moment E[g_t^2], so we can correct for the discrepancy between the two.

我们希望知道时间步t处指数移动平均的期望值E[v_t]与真实的二阶矩E[g_t^2]之间的关系，从而校正两者之间的差异。
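To make "relates to" and "correct for the discrepancy" concrete, the relation can be written out. This is a sketch under an assumption not stated in the collected sentences: the moving average starts at v_0 = 0 and follows the standard Adam update shown on the first line.

```latex
% Assumed update (standard Adam with v_0 = 0), unrolled over t steps
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
    = (1-\beta_2) \sum_{i=1}^{t} \beta_2^{\,t-i}\, g_i^2

% Taking expectations: the weights sum to (1 - \beta_2^t)
\mathbb{E}[v_t] = \mathbb{E}[g_t^2]\,(1-\beta_2^{\,t}) + \zeta

% Dividing by that factor removes the initialization bias
\hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}
```

Here ζ collects the error from a non-stationary second moment and is zero when E[g_i^2] is the same at every step.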

Since the nature of the sequence is unknown in advance, we evaluate our algorithm using the regret, that is the sum of all the previous difference between the online prediction f_t(θ_t) and the best fixed point parameter f_t(θ*) from a feasible set X for all the previous steps.

由于序列的性质事先未知，我们用遗憾（regret）来评估算法，即此前所有步骤中，在线预测f_t(θ_t)与可行集X中最佳固定点参数对应的f_t(θ*)之差的总和。
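The verbal definition of regret above is easier to check against a formula. The following is the usual online convex optimization form, written here as an aid to reading the sentence; the symbol R(T) is introduced for illustration.

```latex
R(T) = \sum_{t=1}^{T} \bigl[ f_t(\theta_t) - f_t(\theta^{*}) \bigr],
\qquad
\theta^{*} = \arg\min_{\theta \in \mathcal{X}} \sum_{t=1}^{T} f_t(\theta)
```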

However, in offline MBO, the algorithm is not allowed to query the true function f(y|x), and must find the best possible point x* using only the guidance of a fixed dataset D = {x_{1:N}, y_{1:N}}.

但是，在离线MBO中，算法不允许查询真实函数f(y|x)，必须仅依靠固定数据集D = {x_{1:N}, y_{1:N}}的指导来找到尽可能好的点x*。

While adversarial ground truth functions can easily be constructed where this is the best one can do (e.g., if f(x) = −∞ on any x ∉ D), in many reasonable domains it should be possible to perform better than the best point in the dataset.

尽管很容易构造出对抗性的真实函数，使得这已是所能做到的最好结果（例如，对任意x ∉ D都有f(x) = −∞），但在许多合理的领域中，应当有可能取得比数据集中最优点更好的结果。

One of the primary contributions of this paper is to discuss how to approximate this intractable computation with a tractable one that is sufficient for optimization on challenging problems, which we discuss in Section 4.

本文的主要贡献之一，是讨论如何用一种易于处理、且足以在有挑战性的问题上进行优化的计算来近似这一难以处理的计算，详见第4节。

To remedy this problem, we propose to amortize the learning process by incrementally learning the NML distribution while optimizing the iterate x_t.

为了解决这个问题，我们提出在优化迭代点x_t的同时增量地学习NML分布，从而摊销学习过程的开销。

While quantization has the potential to induce additional rounding errors to the optimization process, we find in our experiments in Section 5 that using a moderate value such as K = 20 or K = 40 provides a reasonably accurate solution while not being excessively demanding on computation.

尽管量化有可能在优化过程中引起额外的舍入误差，但我们在第5节的实验中发现，使用中等值（例如K = 20或K = 40）既可以提供合理的精度，又不会对计算产生过多的要求。

We now highlight some theoretical motivation for using CNML in the MBO setting, and show that estimating the true function with the CNML distribution is close to an expert even if the test label is chosen adversarially, which makes it difficult for an optimizer to exploit the model.

接下来我们阐述在MBO设定中使用CNML的一些理论动机，并说明即使测试标签是被对抗性地选取的，用CNML分布估计真实函数的效果也接近专家，这使优化器难以利用模型的漏洞。

Simulation

To empirically evaluate the proposed method, we investigated different popular machine learning models, including logistic regression, multilayer fully connected neural networks and deep convolutional neural networks.

为了从经验上评估所提出的方法，我们研究了各种流行的机器学习模型，包括逻辑回归，多层完全连接神经网络和深度卷积神经网络。

Logistic regression has a well-studied convex objective, making it suitable for comparison of different optimizers without worrying about local minimum issues.

Logistic回归具有经过充分研究的凸目标，使其适合比较不同的优化器，而不必担心局部最小问题。

In our experiments, we made model choices that are consistent with previous publications in the area; a neural network model with two fully connected hidden layers with 1000 hidden units each and ReLU activation is used for this experiment with minibatch size of 128.

在我们的实验中，我们做出了与该领域以前的出版物一致的模型选择：本实验使用的神经网络模型含两个全连接隐藏层，每层1000个隐藏单元，采用ReLU激活，小批量大小为128。

Due to the cost of updating curvature information, SFO is 5-10x slower per iteration compared to Adam, and has a memory requirement that is linear in the number of minibatches.

由于更新曲率信息的开销，SFO每次迭代比Adam慢5到10倍，并且其内存需求随小批量的数量线性增长。

Whereas, reducing the minibatch variance through the first moment is more important in CNNs and contributes to the speed-up.

而在CNN中，通过一阶矩来降低小批量方差更为重要，并有助于加速。

Though Adam shows marginal improvement over SGD with momentum, it adapts learning rate scale for different layers instead of hand picking manually as in SGD.

尽管Adam相比带动量的SGD只有微小的提升，但它能为不同层自适应地调整学习率的尺度，而不必像SGD那样手动挑选。

The details for the tasks, baselines, and experimental setup are as follows, and hyperparameter choices with additional implementation details can be found in Appendix A.2.

任务，基准和实验设置的详细信息如下，附录A.2中提供了带有其他实现详细信息的超参数选择。

Because we do not have access to a real physical process for evaluating the material and molecule design tasks, Design-bench follows the experimental protocol used in prior work (Brookes et al., 2019; Fannjiang & Listgarten, 2020) which obtains a ground truth evaluation function by training a separate regressor model to evaluate the performance of designs.

由于我们无法借助真实的物理过程来评估材料和分子设计任务，Design-bench沿用了先前工作（Brookes等，2019；Fannjiang & Listgarten，2020）的实验方案，即通过训练一个单独的回归模型来评估设计的性能，从而得到真实评估函数。

CbAS uses a generative model of p(x) as a trust region to prevent model exploitation, and autofocused oracles expands upon CbAS by iteratively updating the learned proxy function and iterates within a minimax game based on a quantity known as the oracle gap.

CbAS使用p(x)的生成模型作为信任域来防止模型被利用；autofocused oracles在CbAS的基础上加以扩展，迭代地更新学到的代理函数，并在一个基于oracle gap这一量的极小极大博弈中进行迭代。

NEMO outperforms all methods on the Superconductor task by a very large margin, under both the 100th and 50th percentile metrics, and in the HopperController task under the 100th percentile metric.

在第100和第50百分位指标下，NEMO在Superconductor任务上都以非常大的优势胜过所有方法；在第100百分位指标下，它在HopperController任务上同样如此。

These results are promising in that NEMO performs consistently well across all 6 domains evaluated, and indicate that a significant number of designs found in the GFP and Superconductor tasks were better than the best performing design in the dataset.

这些结果令人鼓舞，因为NEMO在所有评估的6个域中始终表现良好，并且表明在GFP和超导体任务中发现的大量设计都比数据集中表现最佳的设计要好。

Conclusion

Our method is aimed towards machine learning problems with large datasets and/or high-dimensional parameter spaces.

我们的方法旨在解决大型数据集和/或高维参数空间的机器学习问题。

Overall, we found Adam to be robust and well-suited to a wide range of non-convex optimization problems in the field of machine learning.

总体而言，我们发现Adam稳健可靠，非常适合机器学习领域中各种各样的非凸优化问题。

We have presented NEMO (Normalized Maximum Likelihood Estimation for Model-Based Optimization), an algorithm that mitigates model exploitation on MBO problems by constructing a conservative model of the true function.

我们提出了NEMO（基于模型的优化的归一化最大似然估计），该算法通过构造真实函数的保守模型来缓解MBO问题中的模型利用（model exploitation）。

We evaluated NEMO on a number of design problems in materials science, robotics, biology, and chemistry, where we show that it attains very large improvements on two tasks, while performing competitively with respect to prior methods on the other four.

我们在材料科学、机器人、生物学和化学领域的多个设计问题上对NEMO进行了评估，结果表明它在其中两项任务上取得了非常大的提升，而在其余四项任务上与先前方法相比也具有竞争力。

We design a general language for representing algorithms which compute the loss function for value-based model-free RL agents to optimize.

我们设计了一种通用语言来表示算法，该算法可为基于价值的无模型RL代理计算损失函数以进行优化。

We highlight two learned algorithms which although relatively simple, can obtain good generalization performance over a wide range of environments.

我们重点介绍了两种学习算法，这些算法虽然相对简单，但可以在广泛的环境中获得良好的泛化性能。

Our analysis of the learned algorithms sheds insight on their benefit as regularization terms which are similar to recently proposed algorithms.

我们对所学算法的分析揭示了它们的益处：其作用相当于正则化项，与最近提出的算法类似。