PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate operators between infinite-dimensional function spaces. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction: First, a novel universal approximation result is derived, under minimal assumptions on the underlying operator and the data-generating distribution. Then, two potential obstacles to efficient operator learning with PCA-Net are identified, and made precise through lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The other obstacle relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived. A suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Furthermore, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and the Navier-Stokes equations.

Related content

PCA

In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two-, three-, or higher-dimensional space, a "best-fit" line can be defined as the line that minimizes the mean squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.
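The description above can be sketched with a singular value decomposition. This is a minimal illustration, assuming NumPy and a small synthetic dataset; the function name `pca` and the mixing matrix are made up for the example.

```python
import numpy as np


def pca(data, n_components):
    """Return (components, scores) for the rows of `data` using the SVD."""
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return components, scores


rng = np.random.default_rng(0)
# Hypothetical correlated 3-D data: 500 points.
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.1]])
points = rng.normal(size=(500, 3)) @ mixing
components, scores = pca(points, n_components=2)
covariance = np.cov(scores, rowvar=False)
```

The off-diagonal entries of `covariance` vanish up to rounding, which is the "uncorrelated dimensions" property, and the rows of `components` form an orthonormal set.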

Estimation/estimator · Linear · Scenario · Optimizer · Linear regression ·

July 13, 2023

A zero-estimator approach for estimating the signal level in a high-dimensional regression setting

Ilan Livne,David Azriel,Yair Goldberg

from arxiv, arXiv admin note: text overlap with arXiv:2205.05341, arXiv:2102.07203

Analysis of high-dimensional data, where the number of covariates is larger than the sample size, is a topic of current interest. In such settings, an important goal is to estimate the signal level $\tau^2$ and noise level $\sigma^2$, i.e., to quantify how much variation in the response variable can be explained by the covariates, versus how much of the variation is left unexplained. This thesis considers the estimation of these quantities in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no responses $Y$. Our main research question is: how can one use the unlabeled data to better estimate $\tau^2$ and $\sigma^2$? We consider two frameworks: a linear regression model and a linear projection model in which linearity is not assumed. In the first framework, while linear regression is used, no sparsity assumptions on the coefficients are made. In the second framework, the linearity assumption is also relaxed and we aim to estimate the signal and noise levels defined by the linear projection. We first propose a naive estimator which is unbiased and consistent, under some assumptions, in both frameworks. We then show how the naive estimator can be improved by using zero-estimators, where a zero-estimator is a statistic arising from the unlabeled data, whose expected value is zero. In the first framework, we calculate the optimal zero-estimator improvement and discuss ways to approximate the optimal improvement. In the second framework, such optimality no longer holds and we suggest two zero-estimators that improve the naive estimator, although not necessarily optimally. Furthermore, we show that our approach reduces the variance for general initial estimators and we present an algorithm that potentially improves any initial estimator. Lastly, we consider four datasets and study the performance of our suggested methods.
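Why subtracting a zero-estimator can help follows from a generic variance calculation. The notation here is an assumption made for illustration and is not taken from the paper: $T$ is an initial estimator, $Z$ is a single zero-estimator with $\mathbb{E}[Z] = 0$ and $\operatorname{Var}(Z) > 0$, and $c$ is a fixed constant.

```latex
\begin{align*}
\mathbb{E}[T - cZ] &= \mathbb{E}[T] - c\,\mathbb{E}[Z] = \mathbb{E}[T], \\
\operatorname{Var}(T - cZ) &= \operatorname{Var}(T) - 2c\,\operatorname{Cov}(T, Z) + c^2 \operatorname{Var}(Z), \\
c^\ast &= \frac{\operatorname{Cov}(T, Z)}{\operatorname{Var}(Z)}, \qquad
\operatorname{Var}(T - c^\ast Z) = \operatorname{Var}(T) - \frac{\operatorname{Cov}(T, Z)^2}{\operatorname{Var}(Z)} \le \operatorname{Var}(T).
\end{align*}
```

The bias is unchanged for any $c$, and the variance strictly decreases whenever $T$ and $Z$ are correlated. In practice $c^\ast$ would itself have to be estimated, which this sketch ignores.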

Linear · Optimizer · Analysis · BASIC · CASE ·

July 12, 2023

Linearization Algorithms for Fully Composite Optimization

Maria-Luiza Vladarean,Nikita Doikov,Martin Jaggi,Nicolas Flammarion

This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets. We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts. This provides us with new generalizations of the classical Frank-Wolfe method and the Conditional Gradient Sliding algorithm, that cater to a subclass of non-differentiable problems. Our algorithms rely on a stronger version of the linear minimization oracle, which can be efficiently implemented in several practical applications. We provide the basic version of our method with an affine-invariant analysis and prove global convergence rates for both convex and non-convex objectives. Furthermore, in the convex case, we propose an accelerated method with correspondingly improved complexity. Finally, we provide illustrative experiments to support our theoretical results.
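For orientation, the classical Frank-Wolfe method that the paper generalizes can be sketched as follows. This is the textbook algorithm on a smooth objective, not the fully composite variant proposed in the paper; the probability simplex, the quadratic objective, and the name `frank_wolfe_simplex` are assumptions chosen to keep the example small.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iters=2000):
    """Classical Frank-Wolfe over the probability simplex."""
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        # Linear minimization oracle on the simplex: pick the best vertex.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Standard open-loop step size; iterates stay convex combinations.
        step = 2.0 / (k + 2.0)
        x = (1.0 - step) * x + step * s
    return x


# Minimize 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: x - target, np.array([1.0, 0.0, 0.0]))
```

Each iteration only needs a linear minimization over the feasible set, never a projection, which is the property the linearization-based methods build on.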

Reducible · Estimation/estimator · Analysis · CASES · Range ·

July 12, 2023

Reduced basis method for non-symmetric eigenvalue problems: application to the multigroup neutron diffusion equations

Yonah Conjungo Taumhas,Geneviève Dusson,Virginie Ehrlacher,Tony Lelièvre,François Madiot

In this article, we propose a reduced basis method for parametrized non-symmetric eigenvalue problems arising in the loading pattern optimization of a nuclear core in neutronics. To this end, we derive a posteriori error estimates for the eigenvalue and left and right eigenvectors. The practical computation of these estimators requires the estimation of a constant called prefactor, which we can express as the spectral norm of some operator. We provide some elements of theoretical analysis which illustrate the link between the expression of the prefactor we obtain here and its well-known expression in the case of symmetric eigenvalue problems, either using the notion of numerical range of the operator, or via a perturbative analysis. Lastly, we propose a practical method in order to estimate this prefactor which yields interesting numerical results on actual test cases. We provide detailed numerical simulations on two-dimensional examples including a multigroup neutron diffusion equation.

contrastive · Learning · Contrastive learning · Performer · Sample ·

July 12, 2023

Contrastive Learning for Conversion Rate Prediction

Wentao Ouyang,Rui Dong,Xiuwu Zhang,Chaofeng Guo,Jinmei Luo,Xiangzheng Liu,Yanlong Du

from arxiv, SIGIR 2023

Conversion rate (CVR) prediction plays an important role in advertising systems. Recently, supervised deep neural network-based models have shown promising performance in CVR prediction. However, they are data hungry and require an enormous amount of training data. In online advertising systems, although there are millions to billions of ads, users tend to click only a small set of them and to convert on an even smaller set. This data sparsity issue restricts the power of these deep models. In this paper, we propose the Contrastive Learning for CVR prediction (CL4CVR) framework. It associates the supervised CVR prediction task with a contrastive learning task, which can learn better data representations exploiting abundant unlabeled data and improve the CVR prediction performance. To tailor the contrastive learning task to the CVR prediction problem, we propose embedding masking (EM), rather than feature masking, to create two views of augmented samples. We also propose a false negative elimination (FNE) component to eliminate samples with the same feature as the anchor sample, to account for the natural property in user behavior data. We further propose a supervised positive inclusion (SPI) component to include additional positive samples for each anchor sample, in order to make full use of sparse but precious user conversion events. Experimental results on two real-world conversion datasets demonstrate the superior performance of CL4CVR. The source code is available at //github.com/DongRuiHust/CL4CVR.

July 11, 2023

Experimental designs for controlling the correlation of estimators in two parameter models

Edgar Benitez,Jesús López-Fidalgo

from arxiv, 30 pages, 8 figures, 5 tables

The state of the art related to parameter correlation in two-parameter models has been reviewed in this paper. The apparent contradictions between the different authors regarding the ability of D-optimality to simultaneously reduce the correlation and the area of the confidence ellipse in two-parameter models were analyzed. Two main approaches were found: 1) those who consider that the optimality criteria simultaneously control the precision and correlation of the parameter estimators; and 2) those that consider a combination of criteria to achieve the same objective. An analytical criterion combining in its structure both the optimality of the precision of the estimators of the parameters and the reduction of the correlation between their estimators is provided. The criterion was tested both in a simple linear regression model, considering all possible design spaces, and in a non-linear model with strong correlation of the estimators of the parameters (Michaelis-Menten) to show its performance. This criterion outperformed all the other strategies and criteria in controlling the precision and the correlation at the same time.

Generalization error · Learning · Generalization theory · Statistic · Optimizer ·

July 11, 2023

Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

Kevin Scaman,Mathieu Even,Laurent Massoulié

from arxiv, 18 pages, 0 figures

In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows us to derive nearly matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication-efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.

Optimizer · Learning · Processing (programming language) · Controller · Reinforcement learning ·

July 11, 2023

Reinforcement Learning with Non-Cumulative Objective

Wei Cui,Wei Yu

from arxiv, 13 pages, 6 figures. To appear in IEEE Transactions on Machine Learning in Communications and Networking (TMLCN)

In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
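The replacement of the summation by a generalized operation can be illustrated on the bottleneck objective. The sketch below is an assumption-laden toy, not the authors' implementation: it uses a tiny deterministic routing graph made up for this example, no discounting, and tabular value iteration in which `min` takes the place of `+`.

```python
import math

# Hypothetical deterministic routing problem: edges[state] = [(next_state, reward)].
edges = {
    "A": [("B", 5.0), ("C", 3.0)],
    "B": [("D", 2.0), ("C", 6.0)],
    "C": [("D", 4.0)],
    "D": [],  # terminal state
}


def bottleneck_value_iteration(edges, terminal, n_sweeps=10):
    """Value iteration where the return of a path is its minimum reward."""
    values = {state: -math.inf for state in edges}
    values[terminal] = math.inf  # identity element for min
    for _ in range(n_sweeps):
        for state, actions in edges.items():
            if state == terminal:
                continue
            # The usual update is max over (reward + V(next)); here "+" becomes min.
            values[state] = max(min(reward, values[nxt]) for nxt, reward in actions)
    return values


values = bottleneck_value_iteration(edges, terminal="D")
```

From `A`, the route A, B, C, D has bottleneck min(5, 6, 4) = 4, which beats A, C, D (bottleneck 3) and A, B, D (bottleneck 2), so `values["A"]` converges to 4.0.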

Learning · Neural Networks · Networking · Reducible · Networks ·

September 1, 2022

Learning with Differentiable Algorithms

Felix Petersen

from arxiv, PhD thesis (summa cum laude), University of Konstanz, 162 pages

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.

Few-shot learning · Annotation · Learned · Extensibility · Noise ·

April 12, 2022

Few-shot Learning with Noisy Labels

Kevin J Liang,Samrudhdhi B. Rangrej,Vladan Petrovic,Tal Hassner

from arxiv, Accepted to CVPR 2022

Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple, yet effective, feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on-par with leading FSL methods on clean support sets, yet outperforms them, by far, in the presence of label noise.
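To see why the choice of feature aggregation matters for prototypes, compare the plain mean used by ProtoNet with a coordinate-wise median. The median is an assumption picked for illustration, since the abstract does not name the aggregation methods it studies, and the embeddings below are made up.

```python
import numpy as np

# Hypothetical 2-D embeddings of one class's support set; the last row is mislabeled.
support = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [8.0, 8.0]])
clean_center = np.array([1.0, 1.0])

# ProtoNet-style prototype: the mean of the support embeddings.
mean_prototype = support.mean(axis=0)
# A more outlier-tolerant alternative: the coordinate-wise median.
median_prototype = np.median(support, axis=0)

mean_error = np.linalg.norm(mean_prototype - clean_center)
median_error = np.linalg.norm(median_prototype - clean_center)
```

A single mislabeled sample drags the mean prototype far from the clean cluster, while the median prototype barely moves.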

Vision · Model evaluation · Reducible · Computer vision · DNN ·

March 24, 2020

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Abhinav Goel,Caleb Tung,Yung-Hsiang Lu,George K. Thiruvathukal

from arxiv, Accepted for publication at 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA 2020

Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
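As a concrete instance of the first category, parameter quantization, here is a minimal sketch of symmetric uniform 8-bit quantization. It assumes NumPy and random stand-in weights; the helper names `quantize_int8` and `dequantize` are hypothetical, and the survey covers many more refined schemes.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric uniform quantization of a float array to int8."""
    scale = np.max(np.abs(weights)) / 127.0  # assumes at least one nonzero weight
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float32 weights."""
    return quantized.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)  # stand-in for a layer's weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = float(np.max(np.abs(weights - restored)))
```

Storage drops from 4 bytes to 1 byte per weight, and the rounding error per weight is bounded by about half of `scale`.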