This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering -- notably, those arising from parametric modelling and uncertainty quantification. It is common to use Monte Carlo (MC) sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that this strategy is theoretically suboptimal: there are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically in $n$. This well-documented phenomenon has led to a concerted effort to design improved, in fact near-optimal, strategies whose sample complexities scale log-linearly, or even linearly, in $n$. Paradoxically, in this work we show that MC is actually a perfectly good strategy in high dimensions. We first document this phenomenon via several numerical examples. Next, we present a theoretical analysis that resolves this paradox for holomorphic functions of infinitely many variables. We show that there is a least-squares scheme based on $m$ MC samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial space in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes. Overall, our findings demonstrate that MC sampling is eminently suitable for smooth function approximation when the dimension is sufficiently high. Hence the benefits of improved sampling strategies are generically limited to lower-dimensional settings.
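To make the least-squares setting in the abstract concrete, the following is a minimal sketch of polynomial least squares from Monte Carlo samples. It is not the authors' scheme: the total-degree index set, the uniform measure on $[-1,1]^d$, and the names `total_degree_indices`, `design_matrix` and `mc_least_squares` are all assumptions chosen for illustration.

```python
import itertools
import numpy as np
from numpy.polynomial import legendre


def total_degree_indices(d, k):
    """All multi-indices in d variables with total degree <= k (assumed index set)."""
    return [nu for nu in itertools.product(range(k + 1), repeat=d) if sum(nu) <= k]


def design_matrix(X, indices):
    """Evaluate tensor-product Legendre polynomials, orthonormal with respect
    to the uniform probability measure on [-1, 1]^d, at the rows of X."""
    m, d = X.shape
    kmax = max(max(nu) for nu in indices)
    scale = np.sqrt(2 * np.arange(kmax + 1) + 1)  # sqrt(2q + 1) normalises P_q
    V = [legendre.legvander(X[:, j], kmax) * scale for j in range(d)]
    A = np.ones((m, len(indices)))
    for col, nu in enumerate(indices):
        for j, q in enumerate(nu):
            A[:, col] *= V[j][:, q]
    return A


def mc_least_squares(f, d, k, m, rng):
    """Fit polynomial coefficients from m i.i.d. uniform (Monte Carlo) samples."""
    indices = total_degree_indices(d, k)
    X = rng.uniform(-1.0, 1.0, size=(m, d))  # Monte Carlo sample points
    A = design_matrix(X, indices)
    coef, *_ = np.linalg.lstsq(A, f(X), rcond=None)
    return indices, coef


# Usage: approximate a smooth function of two variables.
rng = np.random.default_rng(0)
f = lambda X: np.exp(-X.sum(axis=1) / 2)
indices, coef = mc_least_squares(f, d=2, k=6, m=400, rng=rng)
```

The sketch fixes the polynomial space in advance, which mirrors the non-constructive aspect mentioned in the abstract: a good space has to be known, or found adaptively, before the fit.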

Related content

Monte Carlo

Smoothing · FAST · Backward · Online · Minimum point

October 21, 2022

Fast and numerically stable particle-based online additive smoothing: the AdaSmooth algorithm

Alessandro Mastrototaro,Jimmy Olsson,Johan Alenlöv

from arxiv, 67 pages, 5 figures, 2 tables. Added initial note: "This is an original manuscript of an article published by Taylor & Francis in the Journal of the American Statistical Association (JASA) on 10 October 2022, available online: //www.tandfonline.com/doi/full/10.1080/01621459.2022.2118602"

We present a novel sequential Monte Carlo approach to online smoothing of additive functionals in a very general class of path-space models. Hitherto, the solutions proposed in the literature suffer from either long-term numerical instability due to particle-path degeneracy or, in the case that degeneracy is remedied by particle approximation of the so-called backward kernel, high computational demands. In order to balance optimally computational speed against numerical stability, we propose to furnish a (fast) naive particle smoother, propagating recursively a sample of particles and associated smoothing statistics, with an adaptive backward-sampling-based updating rule which allows the number of (costly) backward samples to be kept at a minimum. This yields a new, function-specific additive smoothing algorithm, AdaSmooth, which is computationally fast, numerically stable and easy to implement. The algorithm is provided with rigorous theoretical results guaranteeing its consistency, asymptotic normality and long-term stability as well as numerical results demonstrating empirically the clear superiority of AdaSmooth to existing algorithms.

Machine translation · Model evaluation · Better · Online · Extensibility

October 21, 2022

A Semi-supervised Approach for a Better Translation of Sentiment in Dialectical Arabic UGT

Hadeel Saadany,Constantin Orasan,Emad Mohamed,Ashraf Tantawy

from arxiv, WANLP2022 at EMNLP 2022

In the online world, Machine Translation (MT) systems are extensively used to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards the topic of the text. However, MT systems still lack accuracy in some low-resource languages and sometimes make critical translation errors that completely flip the sentiment polarity of the target word or phrase and hence deliver a wrong affect message. This is particularly noticeable in texts that do not follow common lexico-grammatical standards, such as the dialectical Arabic (DA) used on online platforms. In this research, we aim to improve the translation of sentiment in UGT written in the dialectical versions of the Arabic language to English. Given the scarcity of gold-standard parallel data for DA-EN in the UGT domain, we introduce a semi-supervised approach that exploits both monolingual and parallel data for training an NMT system initialised by a cross-lingual language model trained with supervised and unsupervised modeling objectives. We assess the accuracy of sentiment translation by our proposed system through a numerical 'sentiment-closeness' measure as well as human evaluation. We show that our semi-supervised MT system can significantly help with correcting sentiment errors detected in the online translation of dialectical Arabic UGT.

Rank · Kernelization · Kernel function · Functional · Computational cost

October 21, 2022

Numerical rank of kernel functions

Ritesh Khan,V A Kandappan,Sivaram Ambikasaran

from arxiv, 23 pages, 23 figures, 14 tables

We study the rank of sub-matrices arising out of kernel functions, $F(\pmb{x},\pmb{y}): \mathbb{R}^d \times \mathbb{R}^d \mapsto \mathbb{R}$, where $\pmb{x},\pmb{y} \in \mathbb{R}^d$ and $F(\pmb{x},\pmb{y})$ is smooth everywhere except along the line $\pmb{x}=\pmb{y}$. Such kernel functions are frequently encountered in a wide range of applications such as $N$-body problems, Green's functions, integral equations, geostatistics, kriging, Gaussian processes, etc. One of the challenges in dealing with these kernel functions is that the matrix associated with these kernels is large and dense, so the computational cost of matrix operations is high. In this article, we prove new theorems bounding the numerical rank of sub-matrices arising out of these kernel functions. Under reasonably mild assumptions, we prove that certain sub-matrices are rank-deficient in finite precision. Their numerical rank depends on the dimension of the ambient space and also on the type of interaction between the hyper-cubes containing the corresponding set of particles. This rank structure can be leveraged to reduce the computational cost of certain matrix operations such as matrix-vector products, solving linear systems, etc. We also present numerical results on the growth of rank of certain sub-matrices in $1$D, $2$D, $3$D and $4$D, which, not surprisingly, agree with the theoretical results.
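The notion of numerical rank used above can be illustrated with a small experiment. This is only a sketch under assumptions not taken from the paper: the kernel $1/\|\pmb{x}-\pmb{y}\|$, the relative tolerance `1e-10`, the box positions, and the helper names `numerical_rank` and `kernel_block` are illustrative choices.

```python
import numpy as np


def numerical_rank(K, tol=1e-10):
    """Number of singular values above tol relative to the largest one."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


def kernel_block(X, Y):
    """Sub-matrix K[i, j] = 1 / ||x_i - y_j|| (assumed example kernel)."""
    diff = X[:, None, :] - Y[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=2)


rng = np.random.default_rng(0)
n, d = 300, 2
X = rng.uniform(0.0, 1.0, size=(n, d))                              # points in [0, 1]^2
Y_far = rng.uniform(0.0, 1.0, size=(n, d)) + np.array([3.0, 0.0])   # well-separated box
Y_adj = rng.uniform(0.0, 1.0, size=(n, d)) + np.array([1.0, 0.0])   # box sharing an edge

rank_far = numerical_rank(kernel_block(X, Y_far))
rank_adj = numerical_rank(kernel_block(X, Y_adj))
print(rank_far, rank_adj)
```

In this setup the well-separated block should come out with a much smaller numerical rank than the edge-sharing block, which is the kind of dependence on interaction type that the abstract describes.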

Learning · Optimizer · Robustness · Facebook AI Research · Extensibility

October 19, 2022

On Tilted Losses in Machine Learning: Theory and Applications

Tian Li,Ahmad Beirami,Maziar Sanjabi,Virginia Smith

from arxiv, arXiv admin note: substantial text overlap with arXiv:2007.01162

Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
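As a concrete illustration of how tilting tunes the impact of individual losses, the sketch below evaluates a tilted empirical risk of the form $\frac{1}{t}\log\big(\frac{1}{N}\sum_i e^{t\,\ell_i}\big)$. The abstract does not state this formula, so treat the exact form, the function name `tilted_risk`, and the convention that $t=0$ recovers the plain average as assumptions of this sketch.

```python
import numpy as np


def tilted_risk(losses, t):
    """Tilted average of per-sample losses: (1/t) * log(mean(exp(t * loss))).

    t > 0 emphasises large losses (moves towards the max),
    t < 0 suppresses them (moves towards the min),
    t = 0 is taken to be the ordinary mean (ERM).
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0.0:
        return float(losses.mean())
    z = t * losses
    zmax = z.max()
    # Stable log-mean-exp: subtract the largest exponent before exponentiating.
    return float((zmax + np.log(np.mean(np.exp(z - zmax)))) / t)


losses = [1.0, 2.0, 3.0]
print(tilted_risk(losses, -50.0), tilted_risk(losses, 0.0), tilted_risk(losses, 50.0))
```

Sweeping `t` from negative to positive moves the objective smoothly from the smallest loss, through the mean, to the largest loss, which is one way to read the robustness versus fairness trade-off mentioned in the abstract.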

Color · Edge · JACM · SICOMP · Mutually independent

October 19, 2022

Improved Distributed Algorithms for the Lovász Local Lemma and Edge Coloring

Peter Davies

from arxiv, Accepted at SODA 2023

The Lov\'asz Local Lemma is a classic result in probability theory that is often used to prove the existence of combinatorial objects via the probabilistic method. In its simplest form, it states that if we have $n$ `bad events', each of which occurs with probability at most $p$ and is independent of all but $d$ other events, then under certain criteria on $p$ and $d$, all of the bad events can be avoided with positive probability. While the original proof was existential, there has been much study on the algorithmic Lov\'asz Local Lemma: that is, designing an algorithm which finds an assignment of the underlying random variables such that all the bad events are indeed avoided. Notably, the celebrated result of Moser and Tardos [JACM '10] also implied an efficient distributed algorithm for the problem, running in $O(\log^2 n)$ rounds. For instances with low $d$, this was improved to $O(d^2+\log^{O(1)}\log n)$ by Fischer and Ghaffari [DISC '17], a result that has proven highly important in distributed complexity theory (Chang and Pettie [SICOMP '19]). We give an improved algorithm for the Lov\'asz Local Lemma, providing a trade-off between the strength of the criterion relating $p$ and $d$, and the distributed round complexity. In particular, in the same regime as Fischer and Ghaffari's algorithm, we improve the round complexity to $O(\frac{d}{\log d}+\log^{O(1)}\log n)$. At the other end of the trade-off, we obtain a $\log^{O(1)}\log n$ round complexity for a substantially wider regime than previously known. As our main application, we also give the first $\log^{O(1)}\log n$-round distributed algorithm for the problem of $\Delta+o(\Delta)$-edge coloring a graph of maximum degree $\Delta$. This is an almost exponential improvement over previous results: no prior $\log^{o(1)} n$-round algorithm was known even for $2\Delta-2$-edge coloring.

APN · UniFormer · Functional · INFORMS · TransE

October 19, 2022

Two low differentially uniform power permutations over odd characteristic finite fields: APN and differentially $4$-uniform functions

Haode Yan,Sihem Mesnager,Xiantong Tan

Permutation polynomials over finite fields are fundamental objects as they are used in various theoretical and practical applications in cryptography, coding theory, combinatorial design, and related topics. This family of polynomials constitutes an active research area in which advances are being made constantly. In particular, constructing infinite classes of permutation polynomials over finite fields with good differential properties (namely, low differential uniformity) remains an exciting problem despite much research in this direction for many years. This article exhibits low differentially uniform power permutations over finite fields of odd characteristic. Specifically, its objective is twofold concerning the power functions $F(x)=x^{\frac{p^n+3}{2}}$ defined over the finite field $F_{p^n}$ of order $p^n$, where $p$ is an odd prime, and $n$ is a positive integer. The first is to complement some former results initiated by Helleseth and Sandberg in \cite{HS} by solving the problem left open for more than twenty years concerning the determination of the differential spectrum of $F$ when $p^n\equiv3\pmod 4$ and $p\neq 3$. The second is to determine the exact value of its differential uniformity. Our achievements are obtained firstly by evaluating some exponential sums over $F_{p^n}$ (which amounts to evaluating the number of $F_{p^n}$-rational points on some related curves), and secondly by computing the number of solutions in $(F_{p^n})^4$ of a system of equations presented by Helleseth, Rong, and Sandberg in ["New families of almost perfect nonlinear power mappings," IEEE Trans. Inform. Theory, vol. 45. no. 2, 1999], which naturally appears while determining the differential spectrum of $F$. We show that in the considered case ($p^n\equiv3\pmod 4$ and $p\neq 3$), $F$ is an APN power permutation when $p^n=11$, and a differentially $4$-uniform power permutation otherwise.

Performer · Estimation/Estimator · Machine Learning · Learning · Better

October 19, 2022

Performance of different machine learning methods on activity recognition and pose estimation datasets

Love Trivedi,Raviit Vij

from arxiv, 14

With advancements in computer vision taking place day by day, activity recognition has recently received a lot of attention. As the range of real-world applications of this field grows across a multitude of industries such as security and healthcare, it becomes crucial for businesses to distinguish which machine learning methods perform better than others in the area. This paper strives to aid in this predicament: building upon previous related work, it employs both classical and ensemble approaches on rich pose estimation (OpenPose) and HAR datasets. With appropriate metrics used to evaluate the performance of each model, the results show that, overall, random forest yields the highest accuracy in classifying ADLs. Nearly all the models perform excellently across both datasets, except for logistic regression and AdaBoost, which perform poorly on the HAR dataset. The limitations of this paper are discussed at the end; the scope for further research is vast, and such research can use this paper as a base with the aim of producing better results.

Gaussian process regression · Data points · Kernelization · Model evaluation · Processing (programming language)

October 18, 2022

Equispaced Fourier representations for efficient Gaussian process regression from a billion data points

Philip Greengard,Manas Rachh,Alex Barnett

We introduce a Fourier-based fast algorithm for Gaussian process regression. It approximates a translationally-invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of $M$ nodes. This results in a weight-space $M\times M$ system matrix with Toeplitz structure, which can thus be applied to a vector in ${\mathcal O}(M \log{M})$ operations via the fast Fourier transform (FFT), independent of the number of data points $N$. The linear system can be set up in ${\mathcal O}(N + M \log{M})$ operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large $M$). We include a rigorous error analysis of the kernel approximation, the resulting accuracy (relative to "exact" GP regression), and the condition number. Numerical experiments for squared-exponential and Mat\'ern kernels in one, two and three dimensions often show 1-2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows 2D Mat\'ern-${\small \frac{3}{2}}$ regression from $N=10^9$ data points to be performed in 2 minutes on a standard desktop, with posterior mean accuracy $10^{-3}$. This opens up spatial statistics applications 100 times larger than previously possible.
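The ${\mathcal O}(M \log{M})$ cost of applying a Toeplitz matrix rests on a standard trick: embed the matrix in a circulant one and multiply with FFTs. The sketch below shows the one-dimensional case only; it is a generic illustration rather than the authors' code, and the function name `toeplitz_matvec` is hypothetical.

```python
import numpy as np


def toeplitz_matvec(first_col, first_row, x):
    """Compute T @ x for the M x M Toeplitz matrix T with the given first
    column and first row, using O(M log M) operations.

    T is embedded in a (2M - 1) x (2M - 1) circulant matrix whose first
    column is [c_0, ..., c_{M-1}, r_{M-1}, ..., r_1]; a circulant matrix is
    diagonalised by the discrete Fourier transform.
    """
    M = len(x)
    c = np.concatenate([first_col, first_row[:0:-1]])
    # Zero-pad x to the circulant size, multiply in Fourier space, go back.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, n=len(c)))
    return y[:M]


# Usage: compare against a dense Toeplitz matrix built entry by entry.
rng = np.random.default_rng(1)
M = 16
col = rng.standard_normal(M) + 1j * rng.standard_normal(M)
row = rng.standard_normal(M) + 1j * rng.standard_normal(M)
row[0] = col[0]  # the diagonal entry is shared by the first row and column
x = rng.standard_normal(M)
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(M)]
              for i in range(M)])
print(np.allclose(toeplitz_matvec(col, row, x), T @ x))
```

Inside an iterative solver only this matrix-vector product is needed, so the dense matrix never has to be formed.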

Sampling methods · Variance · GPU · INFORMS · Generalization theory

June 24, 2020

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Weilin Cong,Rana Forsati,Mahmut Kandemir,Mehrdad Mahdavi

Sampling methods (e.g., node-wise, layer-wise, or subgraph) have become an indispensable strategy to speed up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to the existing methods.

Samples · Classes · Loss · Performer · SimPLe

January 16, 2019

Class-Balanced Loss Based on Effective Number of Samples

Yin Cui,Menglin Jia,Tsung-Yi Lin,Yang Song,Serge Belongie

from arxiv, Code is available at: //github.com/richardaecn/class-balanced-loss

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
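The effective-number formula $(1-\beta^{n})/(1-\beta)$ translates directly into per-class weights. The sketch below is a minimal illustration: weighting each class by the inverse of its effective number follows the abstract, while the normalisation (weights summing to the number of classes) and the names `effective_number` and `class_balanced_weights` are assumptions of this sketch.

```python
import numpy as np


def effective_number(n, beta):
    """Effective number of samples (1 - beta**n) / (1 - beta), for beta in [0, 1)."""
    n = np.asarray(n, dtype=float)
    return (1.0 - np.power(beta, n)) / (1.0 - beta)


def class_balanced_weights(samples_per_class, beta):
    """Per-class loss weights inversely proportional to the effective number.

    Assumed normalisation: the weights are rescaled to sum to the number of classes.
    """
    w = 1.0 / effective_number(samples_per_class, beta)
    return w * len(w) / w.sum()


# Usage: a long-tailed class distribution.
counts = [5000, 500, 50, 5]
weights = class_balanced_weights(counts, beta=0.999)
print(weights)
```

With $\beta = 0$ every class has effective number 1 and the weights are uniform; as $\beta \to 1$ the effective number approaches $n$ and the weights approach inverse class frequency.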