
In this note we compute the logarithmic energy of points in the interval $[-1,1]$ drawn from different Gegenbauer determinantal point processes (DPPs). We check that all the families of Gegenbauer polynomials yield the same asymptotic result to third order, compute the value exactly for Chebyshev polynomials, and give a closed expression for the minimal possible logarithmic energy. The comparison suggests that DPPs cannot match the value of the minimum beyond the third asymptotic term.
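
As a minimal sketch of the quantity involved, the snippet below evaluates a discrete logarithmic energy for a finite point set in $[-1,1]$. It assumes the common convention of summing $-\log|x_i - x_j|$ over ordered pairs $i \neq j$, which may differ from the note's normalization. The function names are hypothetical, and the two deterministic point sets (Chebyshev zeros and equispaced points) are illustrations only, not samples from a DPP.

```python
import math


def log_energy(points):
    """Sum of -log|x_i - x_j| over all ordered pairs i != j (assumed convention)."""
    total = 0.0
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            if i != j:
                total -= math.log(abs(x - y))
    return total


def chebyshev_zeros(n):
    """Zeros of the Chebyshev polynomial T_n, all inside (-1, 1)."""
    return [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]


def equispaced(n):
    """n equally spaced points on [-1, 1], endpoints included (n >= 2)."""
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


if __name__ == "__main__":
    n = 10
    print("Chebyshev zeros:", log_energy(chebyshev_zeros(n)))
    print("Equispaced     :", log_energy(equispaced(n)))
```

Point sets that cluster toward the endpoints, such as the Chebyshev zeros, tend to have lower logarithmic energy than equispaced points with the same number of points.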

Related content

Chebyshev polynomials


Chebyshev polynomials are important special functions named after the renowned Russian mathematician Chebyshev (also transliterated Tschebyscheff, 1821-1894). They comprise the Chebyshev polynomials of the first kind $T_n$ and of the second kind $U_n$, together referred to simply as Chebyshev polynomials. They originate from the expansions of the cosine and sine of multiple angles, and form polynomial sequences that are related to de Moivre's theorem and defined recursively. As a class of special functions in computational mathematics, they play a very important role in approximate computation in mathematics, physics, and the technical sciences, for problems such as the approximation of continuous functions and impedance transformation.
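
The recursive definition and the link to multiple-angle formulas can be sketched as follows. This is a minimal illustration using the standard three-term recurrence $P_{k+1}(x) = 2x\,P_k(x) - P_{k-1}(x)$ with $T_0 = 1$, $T_1 = x$ and $U_0 = 1$, $U_1 = 2x$. The function names are hypothetical.

```python
import math


def chebyshev_T(n, x):
    """First-kind T_n(x) from T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x
    for _ in range(n - 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr


def chebyshev_U(n, x):
    """Second-kind U_n(x) from U_0 = 1, U_1 = 2x, same recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * x
    for _ in range(n - 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr


if __name__ == "__main__":
    theta = 0.7
    x = math.cos(theta)
    for n in range(6):
        # Multiple-angle identities: T_n(cos t) = cos(n t),
        # U_n(cos t) * sin(t) = sin((n + 1) t).
        print(n, chebyshev_T(n, x) - math.cos(n * theta),
              chebyshev_U(n, x) * math.sin(theta) - math.sin((n + 1) * theta))
```

Both printed differences should be at the level of floating-point rounding, which is the multiple-angle origin mentioned above.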

Adversarial learning · Hypothesis space · Data-generating distribution · GANs · Learning

December 3, 2021

A Convenient Infinite Dimensional Framework for Generative Adversarial Learning

Hayk Asatryan,Hanno Gottschalk,Marieke Lippert,Matthias Rottmann

In recent years, generative adversarial networks (GANs) have demonstrated impressive experimental results while there are only a few works that foster statistical learning theory for GANs. In this work, we propose an infinite dimensional theoretical framework for generative adversarial learning. Assuming the class of uniformly bounded $k$-times $\alpha$-H\"older differentiable and uniformly positive densities, we show that the Rosenblatt transformation induces an optimal generator, which is realizable in the hypothesis space of $\alpha$-H\"older differentiable generators. With a consistent definition of the hypothesis space of discriminators, we further show that in our framework the Jensen-Shannon divergence between the distribution induced by the generator from the adversarial learning procedure and the data generating distribution converges to zero. Under sufficiently strict regularity assumptions on the density of the data generating process, we also provide rates of convergence based on concentration and chaining.

Performer · China Electronics Corporation · state-of-the-art · Examples · Path

December 3, 2021

New formulations and branch-and-cut procedures for the longest induced path problem

Ruslán G. Marzo,Rafael A. Melo,Celso C. Ribeiro,Marcio C. Santos

Given an undirected graph $G=(V,E)$, the longest induced path problem (LIPP) consists of obtaining a maximum cardinality subset $W\subseteq V$ such that $W$ induces a simple path in $G$. In this paper, we propose two new formulations with an exponential number of constraints for the problem, together with effective branch-and-cut procedures for its solution. While the first formulation (cec) is based on constraints that explicitly eliminate cycles, the second one (cut) ensures connectivity via cutset constraints. We compare, both theoretically and experimentally, the newly proposed approaches with a state-of-the-art formulation recently proposed in the literature. More specifically, we show that the polyhedra defined by formulation cut and that of the formulation available in the literature are the same. Besides, we show that these two formulations are stronger in theory than cec. We also propose a new branch-and-cut procedure using the new formulations. Computational experiments show that the newly proposed formulation cec, although less strong from a theoretical point of view, is the best performing approach as it can solve all but one of the 1065 benchmark instances used in the literature within the given time limit. In addition, our newly proposed approaches outperform the state-of-the-art formulation when it comes to the median times to solve the instances to optimality. Furthermore, we perform extended computational experiments considering more challenging and hard-to-solve larger instances and evaluate the impacts on the results when offering initial feasible solutions (warm starts) to the formulations.

Networking · Neural Networks · Approximation · Activation function · Functional

December 3, 2021

Quantitative approximation results for complex-valued neural networks

A. Caragea,D. G. Lee,J. Maly,G. Pfander,F. Voigtlaender

Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $\mathbb{C}^d$ by complex-valued neural networks that employ the modReLU activation function, given by $\sigma(z) = \mathrm{ReLU}(|z| - 1) \, \mathrm{sgn} (z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.
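
The activation in the abstract, $\sigma(z) = \mathrm{ReLU}(|z| - 1)\,\mathrm{sgn}(z)$, can be sketched for a single complex number as below. The sketch assumes the convention $\mathrm{sgn}(z) = z/|z|$ for $z \neq 0$ and $\mathrm{sgn}(0) = 0$. The function name `modrelu` is hypothetical.

```python
def modrelu(z: complex) -> complex:
    """modReLU with threshold 1: shrink the modulus by 1, keep the phase.

    Assumes sgn(z) = z / |z| for z != 0 and sgn(0) = 0.
    """
    r = abs(z)
    if r <= 1.0:
        # ReLU(|z| - 1) is zero inside the closed unit disc.
        return 0j
    return (r - 1.0) * (z / r)


if __name__ == "__main__":
    for z in (0j, 0.5j, 3 + 4j, -2 + 0j):
        print(z, "->", modrelu(z))
```

Inputs inside the closed unit disc map to zero, and every other input keeps its phase while its modulus is reduced by one.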

ROC · Estimation/estimator · Inference · Receiver operating characteristic · Estimation error

December 3, 2021

Inference for ROC Curves Based on Estimated Predictive Indices

Yu-Chin Hsu,Robert P. Lieli

We provide a comprehensive theory of conducting in-sample statistical inference about receiver operating characteristic (ROC) curves that are based on predicted values from a first stage model with estimated parameters (such as a logit regression). The term "in-sample" refers to the practice of using the same data for model estimation (training) and subsequent evaluation, i.e., the construction of the ROC curve. We show that in this case the first stage estimation error has a generally non-negligible impact on the asymptotic distribution of the ROC curve and develop the appropriate pointwise and functional limit theory. We propose methods for simulating the distribution of the limit process and show how to use the results in practice in comparing ROC curves.

Robustness · Bayes optimal classifier · Surrogate loss · Optimizer · Critic

December 3, 2021

On the Existence of the Adversarial Bayes Classifier (Extended Version)

Pranjal Awasthi,Natalie S. Frank,Mehryar Mohri

from arxiv, 49 pages, 8 figures. Extended version of the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS

Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results can provide a useful tool for a subsequent study of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended version of the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS. The results of the original paper did not apply to some non-strictly convex norms. Here we extend our results to all possible norms.

Greedy layer-wise pretraining · Greedy · Orthogonal · Optimizer · Notability

December 2, 2021

Optimal Convergence Rates for the Orthogonal Greedy Algorithm

Jonathan W. Siegel,Jinchao Xu

We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-\alpha})$ for $\alpha > 0$, then the orthogonal greedy algorithm converges at the same rate on the variation space of $\mathbb{D}$. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. These results hold under no additional assumptions on the dictionary beyond the decay rate of the entropy of its convex hull. In addition, they are robust to noise in the target function and can be extended to convergence rates on the interpolation spaces of the variation norm. Finally, we show that these improved rates are sharp and prove a negative result showing that the iterates generated by the orthogonal greedy algorithm cannot in general be bounded in the variation norm of $\mathbb{D}$.
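
For readers unfamiliar with the algorithm, here is a minimal sketch of the orthogonal greedy algorithm in its textbook form, for a finite dictionary of vectors in $\mathbb{R}^d$. It is not the authors' code. Each step selects the atom with the largest absolute inner product with the current residual, then replaces the approximation by the orthogonal projection of the target onto the span of all selected atoms. The function names and the toy dictionary are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_greedy(f, dictionary, steps):
    """Return (approximation, chosen atom indices) after at most `steps` iterations."""
    residual = list(f)
    basis = []    # orthonormal basis of the span of the selected atoms
    chosen = []
    for _ in range(steps):
        # Greedy selection: atom most correlated with the residual.
        k = max(range(len(dictionary)),
                key=lambda i: abs(dot(residual, dictionary[i])))
        # Gram-Schmidt: orthogonalize the new atom against the current basis.
        q = list(dictionary[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:
            break  # the atom adds nothing new to the span
        q = [qi / norm for qi in q]
        basis.append(q)
        chosen.append(k)
        # The residual is already orthogonal to the old basis, so removing
        # its component along q yields the projection residual.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    approximation = [fi - ri for fi, ri in zip(f, residual)]
    return approximation, chosen


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [s, s, 0.0]]
    print(orthogonal_greedy([3.0, 1.0, 0.0], atoms, steps=2))
```

The projection step is what distinguishes this method from the pure greedy algorithm, which only subtracts the component along the newly selected atom.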

contrastive · Inference · Performer · Better · Reducible

October 19, 2021

Contrastive Active Inference

Pietro Mazzaglia,Tim Verbelen,Bart Dhoedt

from arxiv, Accepted as a conference paper at 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

Active inference is a unifying theory for perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited because of the shortcomings in scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden in learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare to reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the case of distractors in the environment and that our method is able to generalize goals to variations in the background.

Continuity · Policy evaluation · Principle · Extensibility · Learning

April 13, 2021

Learning and Planning in Complex Action Spaces

Thomas Hubert,Julian Schrittwieser,Ioannis Antonoglou,Mohammadamin Barekatain,Simon Schmitt,David Silver

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.

Layer normalization · Normalized · Layer · Transformation · Learning rate

February 12, 2020

On Layer Normalization in the Transformer Architecture

Ruibin Xiong,Yunchang Yang,Di He,Kai Zheng,Shuxin Zheng,Chen Xing,Huishuai Zhang,Yanyan Lan,Liwei Wang,Tie-Yan Liu

The Transformer is widely used in natural language processing tasks. To train a Transformer, however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but slows down optimization and requires more hyper-parameter tuning. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove with mean field theory that at initialization, for the originally designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Therefore, using a large learning rate on those gradients makes the training unstable. The warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (recently proposed as the Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach results comparable to the baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.
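
The two placements of layer normalization described in the abstract can be sketched on a single vector as follows. This is a simplified illustration, not the authors' implementation. It assumes layer normalization without learned gain and bias, and it takes the sublayer (attention or feed-forward in a real Transformer) as an arbitrary function `sublayer`. All names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Post-LN: normalize after the residual addition, LN(x + F(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x, sublayer):
    """Pre-LN: normalize inside the residual branch, x + F(LN(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


if __name__ == "__main__":
    def toy_sublayer(v):
        # Stand-in for attention or a feed-forward layer.
        return [0.5 * t for t in v]

    x = [1.0, 2.0, 3.0, 4.0]
    print("Post-LN:", post_ln_block(x, toy_sublayer))
    print("Pre-LN :", pre_ln_block(x, toy_sublayer))
```

In the Pre-LN form the input reaches the output through an unnormalized identity path, whereas in the Post-LN form every block output passes through the normalization.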

Networking · Neural Networks · Optimizer · contrastive · CASE

August 3, 2018

A Dual Approach to Scalable Verification of Deep Networks

Krishnamurthy Dvijotham,Robert Stanforth,Sven Gowal,Timothy Mann,Pushmeet Kohli

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.