
Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso -- which we call the {\it Thresholded Lasso} -- can accurately estimate a sparse vector $\beta \in {\mathbb R}^p$ in a linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$-norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition, it is possible to achieve an $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an {\em oracle}, while selecting a sufficiently sparse model -- hence achieving {\it sparse oracle inequalities}; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show that for the Gauss-Dantzig selector (Cand\`{e}s and Tao, 2007), if $X$ obeys a uniform uncertainty principle, one achieves the sparse oracle inequalities above while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \leq s$ is the smallest integer such that, for $\lambda = \sqrt{2 \log p/n}$, $\sum_{i=1}^p \min(\beta_i^2, \lambda^2 \sigma^2) \leq s_0 \lambda^2 \sigma^2$. Our simulation results for the Thresholded Lasso agree closely with our theoretical analysis.
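
As a purely illustrative companion to the procedure described above, the following sketch runs a two-step Thresholded Lasso in Python: a Lasso fit, hard thresholding of the estimated coefficients at a multiple of $\lambda\sigma$, and an ordinary least squares refit on the selected support. The threshold constant `c`, the penalty level, and the use of scikit-learn's `Lasso` are illustrative choices, not the tuning prescribed by the theoretical analysis.

```python
# Minimal sketch of a two-step Thresholded Lasso: Lasso fit, hard thresholding
# of the coefficients, then an OLS refit on the survivors.  The penalty level
# and threshold constant are illustrative, not the tuning from the analysis.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def thresholded_lasso(X, y, sigma, c=1.0):
    n, p = X.shape
    lam = np.sqrt(2 * np.log(p) / n)          # lambda = sqrt(2 log p / n)
    step1 = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, y)
    support = np.flatnonzero(np.abs(step1.coef_) > c * lam * sigma)  # hard threshold
    beta_hat = np.zeros(p)
    if support.size:                           # refit OLS on the selected support
        ols = LinearRegression(fit_intercept=False).fit(X[:, support], y)
        beta_hat[support] = ols.coef_
    return beta_hat, support

# Example: sparse beta with n << p
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 500, 5, 1.0
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # columns normalized to l2-norm sqrt(n)
beta = np.zeros(p); beta[:s] = 3.0
y = X @ beta + sigma * rng.standard_normal(n)
beta_hat, support = thresholded_lasso(X, y, sigma)
```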

Related content

The essential variety is an algebraic subvariety of dimension $5$ in real projective space $\mathbb R\mathrm P^{8}$ which encodes the relative pose of two calibrated pinhole cameras. The $5$-point algorithm in computer vision computes the real points in the intersection of the essential variety with a linear space of codimension $5$. The degree of the essential variety is $10$, so this intersection consists of $10$ complex points in general. We compute the expected number of real intersection points when the linear space is random. We focus on two probability distributions for linear spaces. The first distribution is invariant under the action of the orthogonal group $\mathrm{O}(9)$ on linear spaces in $\mathbb R\mathrm P^{8}$. In this case, the expected number of real intersection points is equal to $4$. The second distribution is motivated by computer vision and is defined by choosing 5 point correspondences in the image planes $\mathbb R\mathrm P^2\times \mathbb R\mathrm P^2$ uniformly at random. A Monte Carlo computation suggests that, with high probability, the expected value lies in the interval $(3.95 - 0.05,\ 3.95 + 0.05)$.
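
A hedged sketch of the Monte Carlo computation for the second distribution is given below. The function `five_point_solver` is a hypothetical placeholder for any minimal 5-point solver that returns all real essential matrices compatible with five correspondences (it is not implemented here), and points on $\mathbb R\mathrm P^2$ are represented by unit vectors in $\mathbb R^3$.

```python
# Monte Carlo sketch: sample 5 point correspondences uniformly at random on
# RP^2 x RP^2 and count the real solutions of the minimal problem.  The
# function `five_point_solver` is a hypothetical placeholder for a routine
# returning all real essential matrices fitting the five correspondences.
import numpy as np

def random_rp2_points(rng, k):
    v = rng.standard_normal((k, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)   # uniform on S^2, hence on RP^2

def mean_real_solutions(five_point_solver, trials=10_000, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.empty(trials)
    for i in range(trials):
        x1, x2 = random_rp2_points(rng, 5), random_rp2_points(rng, 5)
        counts[i] = len(five_point_solver(x1, x2))  # number of real essential matrices
    return counts.mean(), counts.std(ddof=1) / np.sqrt(trials)  # estimate, standard error
```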

This work considers the low-rank approximation of a matrix $A(t)$ depending on a parameter $t$ in a compact set $D \subset \mathbb{R}^d$. Application areas that give rise to such problems include computational statistics and dynamical systems. Randomized algorithms are an increasingly popular approach for performing low-rank approximation; they usually proceed by multiplying the matrix with random dimension reduction matrices (DRMs). Applying such algorithms directly to $A(t)$ would involve different, independent DRMs for every $t$, which is not only expensive but also leads to inherently non-smooth approximations. In this work, we propose to use constant DRMs, that is, $A(t)$ is multiplied with the same DRM for every $t$. The resulting parameter-dependent extensions of two popular randomized algorithms, the randomized singular value decomposition and the generalized Nystr\"{o}m method, are computationally attractive, especially when $A(t)$ admits an affine linear decomposition with respect to $t$. We perform a probabilistic analysis for both algorithms, deriving bounds on the expected value as well as failure probabilities for the approximation error when using Gaussian random DRMs. Both the theoretical results and the numerical experiments show that the use of constant DRMs does not impair their effectiveness; our methods reliably return quasi-best low-rank approximations.
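
The following is a minimal sketch of the parameter-dependent randomized SVD with a constant Gaussian DRM; the rank, oversampling parameter, and example matrices are illustrative choices, and the generalized Nystr\"{o}m variant is analogous.

```python
# Sketch: randomized SVD of A(t) reusing one Gaussian dimension reduction
# matrix Omega for every parameter value t.  Rank and oversampling are
# illustrative choices.
import numpy as np

def randomized_svd_constant_drm(A_of_t, ts, n_cols, rank, oversampling=5, seed=0):
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n_cols, rank + oversampling))  # constant DRM, drawn once
    factors = []
    for t in ts:
        A_t = A_of_t(t)
        Q, _ = np.linalg.qr(A_t @ Omega)          # orthonormal basis for the sampled range
        B = Q.T @ A_t
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        factors.append((Q @ Ub[:, :rank], s[:rank], Vt[:rank]))   # rank-r factors of A(t)
    return factors

# Example: affine parameter dependence A(t) = A0 + t * A1 with low-rank terms
rng = np.random.default_rng(1)
m, n, r = 300, 200, 8
A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A1 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A_of_t = lambda t: A0 + t * A1
factors = randomized_svd_constant_drm(A_of_t, np.linspace(0.0, 1.0, 11), n_cols=n, rank=2 * r)
```

For an affine decomposition such as $A(t) = A_0 + t A_1$, the products $A_0\Omega$ and $A_1\Omega$ could be precomputed once and combined for every $t$, which is what makes the constant-DRM approach computationally attractive in this setting.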

We consider the general problem of Bayesian binary regression and we introduce a new class of distributions, the Perturbed Unified Skew Normal (pSUN, henceforth), which generalizes the Unified Skew-Normal (SUN) class. We show that the new class is conjugate to any binary regression model, provided that the link function may be expressed as a scale mixture of Gaussian densities. We discuss in detail the popular logit case, and we show that, when a logistic regression model is combined with a Gaussian prior, posterior summaries such as cumulants and normalizing constants can easily be obtained through an importance sampling approach, opening the way to straightforward variable selection procedures. For more general priors, the proposed methodology is based on a simple Gibbs sampler algorithm. We also show that, in the $p > n$ case, the proposed methodology performs better -- both in terms of mixing and accuracy -- than existing methods. We illustrate the performance through several simulation studies and two data analyses.
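
To illustrate the kind of posterior summaries discussed above, the following is a generic importance-sampling sketch for logistic regression with a Gaussian prior, using a Laplace-type Gaussian proposal; it is not the pSUN-based scheme developed in the paper, and the prior variance `tau2` and the proposal are illustrative assumptions.

```python
# Generic importance-sampling sketch for Bayesian logistic regression with a
# Gaussian prior: a Gaussian proposal centred at the posterior mode is used to
# estimate the normalizing constant and posterior mean.  This illustrates the
# quantities discussed in the text; it is NOT the pSUN-based scheme.
import numpy as np
from scipy import optimize, stats

def log_posterior_unnorm(beta, X, y, tau2):
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))     # logistic log-likelihood
    logprior = -0.5 * np.dot(beta, beta) / tau2            # N(0, tau2 I) kernel (no constant)
    return loglik + logprior

def importance_summaries(X, y, tau2=10.0, draws=20_000, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Laplace approximation: mode and curvature of the unnormalized posterior
    res = optimize.minimize(lambda b: -log_posterior_unnorm(b, X, y, tau2),
                            np.zeros(p), method="BFGS")
    proposal = stats.multivariate_normal(mean=res.x, cov=res.hess_inv)
    betas = proposal.rvs(size=draws, random_state=rng)
    logw = np.array([log_posterior_unnorm(b, X, y, tau2) for b in betas]) \
           - proposal.logpdf(betas)
    w = np.exp(logw - logw.max())
    post_mean = (w[:, None] * betas).sum(0) / w.sum()      # posterior mean of beta
    log_const = np.log(w.mean()) + logw.max() \
                - 0.5 * p * np.log(2 * np.pi * tau2)       # log marginal likelihood
    return post_mean, log_const

# Tiny synthetic example
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)); beta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
print(importance_summaries(X, y))
```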

The development of cubical type theory inspired the idea of "extension types", which has since found applications in other type theories that are unrelated to homotopy type theory or cubical type theory. This article describes these applications, including to records, metaprogramming, controlling unfolding, and some more exotic ones.

We study the problem of lossless feature selection for a $d$-dimensional feature vector $X=(X^{(1)},\dots ,X^{(d)})$ and label $Y$ for binary classification as well as nonparametric regression. For an index set $S\subset \{1,\dots ,d\}$, consider the selected $|S|$-dimensional feature subvector $X_S=(X^{(i)}, i\in S)$. If $L^*$ and $L^*(S)$ stand for the minimum risk based on $X$ and $X_S$, respectively, then $X_S$ is called lossless if $L^*=L^*(S)$. For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor based test statistics to test the hypothesis that $X_S$ is lossless. For the threshold $a_n=\log n/\sqrt{n}$, the corresponding tests are proved to be consistent under conditions on the distribution of $(X,Y)$ that are significantly milder than in previous work. Also, our threshold is dimension-independent, in contrast to earlier methods where for large $d$ the threshold becomes too large to be useful in practice.
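
An illustrative sketch of such a nearest-neighbor test for the regression case is given below. The particular 1-NN risk estimate is a stand-in rather than the exact statistic analyzed here, while the threshold $a_n = \log n/\sqrt{n}$ matches the one above.

```python
# Illustrative sketch of a nearest-neighbour test for lossless feature
# selection in regression: compare first-nearest-neighbour estimates of the
# residual variance based on X and on the subvector X_S, and reject
# "X_S is lossless" when the difference exceeds a_n = log n / sqrt(n).
# The particular 1-NN statistic is a stand-in, not the paper's exact statistic.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def one_nn_risk(X, y):
    # 1-NN estimate of E[(Y - Y_NN)^2] / 2, a classical proxy for the residual variance
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    _, idx = nn.kneighbors(X)           # idx[:, 0] is the point itself
    return 0.5 * np.mean((y - y[idx[:, 1]]) ** 2)

def lossless_test(X, y, S):
    n = len(y)
    a_n = np.log(n) / np.sqrt(n)        # dimension-independent threshold
    T = one_nn_risk(X[:, list(S)], y) - one_nn_risk(X, y)
    return T, a_n, T > a_n              # True: reject "X_S is lossless"

# Example: Y depends only on the first two coordinates
rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.uniform(size=(n, d))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
print(lossless_test(X, y, S={0, 1}))    # small statistic: S is (nearly) lossless
print(lossless_test(X, y, S={0}))       # larger statistic: dropping X^(2) loses information
```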

Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; because achieving such interpretable representations is challenging, many causal learning algorithms rely on prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each mixture component with an outcome of the DAG nodes; this identification enables feature discovery with causal relationships. Tested on a synthetic dataset and a scientific dataset, our algorithm demonstrates the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
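
The parametrization used by causalPIMA is new; purely as a generic illustration of what a differentiable DAG constraint can look like, the sketch below evaluates the well-known NOTEARS-style acyclicity penalty $h(W) = \operatorname{tr}(e^{W\circ W}) - d$, which is not the parametrization introduced in this work. In an end-to-end framework, a term of this kind would be combined with the variational lower-bound loss.

```python
# Generic sketch of a differentiable acyclicity penalty for a candidate DAG
# adjacency matrix: h(W) = tr(exp(W ∘ W)) - d, which is zero iff the weighted
# graph W has no directed cycles.  Shown only as an illustration; it is NOT
# the new parametrization used by causalPIMA.
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W):
    """h(W) = tr(exp(W ∘ W)) - d; equals 0 iff the weighted graph W is acyclic."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

# A directed chain 0 -> 1 -> 2 is acyclic ...
W_chain = np.array([[0., 1., 0.],
                    [0., 0., 1.],
                    [0., 0., 0.]])
# ... while adding the edge 2 -> 0 creates a cycle.
W_cycle = W_chain.copy(); W_cycle[2, 0] = 1.0
print(acyclicity_penalty(W_chain))   # ~0.0
print(acyclicity_penalty(W_cycle))   # > 0
```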

Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally $\textit{new}$ tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a $\textit{task diversity threshold}$ for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks, instead behaving like a Bayesian estimator with the $\textit{non-diverse pretraining task distribution}$ as the prior. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over $\textit{all tasks}$, including those not seen during pretraining. Thus, when pretrained on data with task diversity greater than the threshold, transformers $\textit{can}$ optimally solve fundamentally new tasks in-context. Importantly, this capability hinges on the transformer deviating from the Bayes optimal estimator that uses the pretraining distribution as the prior. This study also explores the effects of regularization, model capacity, and task structure, and underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at //github.com/mansheej/icl-task-diversity.
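
To make the two reference estimators concrete, the following sketch compares, for an unseen linear-regression task, the Bayes posterior mean under a discrete prior supported on a finite set of pretraining tasks with ridge regression, the posterior mean under a Gaussian prior over all tasks. The dimensions, noise level, and number of pretraining tasks are illustrative.

```python
# Sketch of the two reference estimators for in-context linear regression:
# (i) the Bayes posterior mean under a discrete prior on a finite set of
# pretraining tasks, and (ii) ridge regression, i.e. the posterior mean under
# a Gaussian prior over all tasks.  Dimensions and noise level are illustrative.
import numpy as np

def discrete_prior_bayes(X, y, tasks, sigma2):
    # posterior over the M pretraining tasks, then posterior-mean weight vector
    resid = y[None, :] - tasks @ X.T                     # (M, n) residuals
    loglik = -0.5 * np.sum(resid ** 2, axis=1) / sigma2
    post = np.exp(loglik - loglik.max()); post /= post.sum()
    return post @ tasks

def ridge(X, y, sigma2, tau2):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
d, n, M, sigma2, tau2 = 8, 16, 32, 0.25, 1.0
tasks = rng.standard_normal((M, d))                      # "pretraining" tasks
w_new = rng.standard_normal(d)                           # an unseen task
X = rng.standard_normal((n, d)); y = X @ w_new + np.sqrt(sigma2) * rng.standard_normal(n)
for est in (discrete_prior_bayes(X, y, tasks, sigma2), ridge(X, y, sigma2, tau2)):
    print(np.linalg.norm(est - w_new))                   # ridge tracks the unseen task far better
```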

A graph $G$ contains a graph $H$ as a pivot-minor if $H$ can be obtained from $G$ by applying a sequence of vertex deletions and edge pivots. Pivot-minors play an important role in the study of rank-width. Pivot-minors have mainly been studied from a structural perspective. In this paper we perform the first systematic computational complexity study of pivot-minors. We first prove that the Pivot-Minor problem, which asks if a given graph $G$ contains a pivot-minor isomorphic to a given graph $H$, is NP-complete. If $H$ is not part of the input, we denote the problem by $H$-Pivot-Minor. We give a certifying polynomial-time algorithm for $H$-Pivot-Minor when (1) $H$ is an induced subgraph of $P_3+tP_1$ for some integer $t\geq 0$, (2) $H=K_{1,t}$ for some integer $t\geq 1$, or (3) $|V(H)|\leq 4$ except when $H \in \{K_4,C_3+ P_1\}$. Let ${\cal F}_H$ be the set of induced-subgraph-minimal graphs that contain a pivot-minor isomorphic to $H$. To prove the above statement, we either show that there is an integer $c_H$ such that all graphs in ${\cal F}_H$ have at most $c_H$ vertices, or we determine ${\cal F}_H$ precisely, for each of the above cases.
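
For readers unfamiliar with the operation, the sketch below implements the standard description of an edge pivot on $uv$: partition the other neighbors of $u$ and $v$ into those adjacent only to $u$, only to $v$, and to both, complement every edge between distinct classes, and exchange the labels of $u$ and $v$ (a relabelling some authors omit, since it only changes the graph up to isomorphism).

```python
# Sketch of an edge pivot under the standard three-class description:
# complement all edges between vertices seen only by u, only by v, and by
# both, then exchange the labels of u and v.
import networkx as nx
from itertools import product

def pivot(G, u, v):
    assert G.has_edge(u, v), "pivoting is defined on an edge uv"
    H = G.copy()
    Nu, Nv = set(G[u]) - {v}, set(G[v]) - {u}
    only_u, only_v, both = Nu - Nv, Nv - Nu, Nu & Nv
    for A, B in ((only_u, only_v), (only_u, both), (only_v, both)):
        for x, y in product(A, B):
            if H.has_edge(x, y):
                H.remove_edge(x, y)
            else:
                H.add_edge(x, y)
    return nx.relabel_nodes(H, {u: v, v: u})     # exchange u and v

# Example: pivoting the middle edge of the path P4 yields the cycle C4
G = nx.path_graph(4)          # edges 0-1, 1-2, 2-3
H = pivot(G, 1, 2)
print(sorted(H.edges()))
```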

When modeling a vector of risk variables, extreme scenarios are often of special interest. The peaks-over-thresholds method hinges on the notion that, asymptotically, the excesses over a vector of high thresholds follow a multivariate generalized Pareto distribution. However, the existing literature has primarily concentrated on the setting in which all risk variables are always large simultaneously. In reality, this assumption is often not met, especially in high dimensions. In response to this limitation, we study scenarios where distinct groups of risk variables may exhibit joint extremes while others do not. These discernible groups are derived from the angular measure inherent in the corresponding max-stable distribution, whence the term extreme direction. We explore such extreme directions within the framework of multivariate generalized Pareto distributions, with a focus on their probability density functions with respect to an appropriate dominating measure. Furthermore, we provide a stochastic construction that allows any prespecified set of risk groups to constitute the distribution's extreme directions. This construction takes the form of a smoothed max-linear model and accommodates the full spectrum of conceivable max-stable dependence structures. Additionally, we introduce a generic simulation algorithm tailored to multivariate generalized Pareto distributions, offering specific implementations for extensions of the logistic and H\"usler-Reiss families capable of carrying arbitrary extreme directions.
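
As a generic illustration (not the simulation algorithm proposed here), the sketch below draws from a standard-scale multivariate generalized Pareto distribution via the representation $Z = E + T - \max_j T_j$, with $E$ a unit exponential independent of a generator vector $T$; taking $T$ multivariate Gaussian is assumed to give a H\"usler-Reiss-type dependence.

```python
# Generic simulation sketch for a (standard-scale) multivariate generalized
# Pareto vector via Z = E + T - max_j T_j, with E a unit exponential
# independent of an arbitrary d-dimensional vector T.  Taking T multivariate
# Gaussian is assumed to give a Husler-Reiss-type dependence; this is a
# generic sketch, not the paper's algorithm for arbitrary extreme directions.
import numpy as np

def simulate_mgpd(n, mean, cov, seed=0):
    rng = np.random.default_rng(seed)
    T = rng.multivariate_normal(mean, cov, size=n)     # generator vector T
    E = rng.exponential(size=n)                        # unit exponential, independent of T
    Z = E[:, None] + T - T.max(axis=1, keepdims=True)  # exceedances on the exponential scale
    return Z                                           # max_j Z_j = E > 0 for every sample

d = 3
Z = simulate_mgpd(10_000, mean=np.zeros(d), cov=0.5 * np.eye(d) + 0.5)
print(Z.max(axis=1).min())   # every row exceeds the threshold 0 in at least one coordinate
```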

Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2022) offer an alternative route to accomplishing this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
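
For Gaussian data, one standard example of such a split (in the spirit of the additive-noise randomization referenced above) takes a particularly simple form: with external noise $Z \sim N(0,\sigma^2 I)$ independent of $X$ and a tuning parameter $\tau>0$, the parts $f(X)=X+\tau Z$ and $g(X)=X-Z/\tau$ are independent, and $X=(f(X)+\tau^2 g(X))/(1+\tau^2)$ recovers $X$ exactly. The sketch below checks this numerically; the value of $\tau$ and the simulation sizes are illustrative.

```python
# Numerical check of a Gaussian split: with external noise Z ~ N(0, sigma^2 I)
# and tuning parameter tau > 0, f(X) = X + tau*Z and g(X) = X - Z/tau are
# independent, and X = (f(X) + tau^2 * g(X)) / (1 + tau^2) recovers X exactly.
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma, tau = 100_000, 1.5, 2.0, 0.7
X = rng.normal(mu, sigma, size=n)
Z = rng.normal(0.0, sigma, size=n)            # external noise with the same variance
f, g = X + tau * Z, X - Z / tau

print(np.corrcoef(f, g)[0, 1])                               # ~ 0: the parts are uncorrelated
print(np.max(np.abs(X - (f + tau**2 * g) / (1 + tau**2))))   # ~ 0: exact reconstruction
```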
