
The sign-constrained Stiefel manifold in $\mathbb{R}^{n\times r}$ is a segment of the Stiefel manifold with fixed signs (nonnegative or nonpositive) for some columns of the matrices. It includes the nonnegative Stiefel manifold as a special case. We present global and local error bounds that provide an inequality with easily computable residual functions and explicit coefficients to bound the distance from matrices in $\mathbb{R}^{n\times r}$ to the sign-constrained Stiefel manifold. Moreover, we show that the error bounds cannot be improved except for the multiplicative constants under some mild conditions, which explains why two square-root terms are necessary in the bounds when $1< r <n$ and why the $\ell_1$ norm can be used in the bounds when $r = n$ or $r = 1$ for the sign constraints and orthogonality, respectively. The error bounds are applied to derive exact penalty methods for minimizing a Lipschitz continuous function with orthogonality and sign constraints.
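
As a rough numerical companion to the abstract above, the sketch below computes two easily computable residuals for a matrix $X \in \mathbb{R}^{n\times r}$ with nonnegativity imposed on some columns: the orthogonality residual $\|X^\top X - I_r\|_F$ and the norm of the negative parts in the constrained columns, together with the Frobenius distance to the unconstrained Stiefel manifold via the polar factor. The exact residual functions and coefficients used in the paper's error bounds may differ; this is only an illustration, not the paper's construction.

import numpy as np

def residuals(X, nonneg_cols):
    """Generic residuals for X in R^{n x r} with nonnegativity on some columns.

    orth_res: Frobenius norm of X^T X - I_r (orthogonality violation).
    sign_res: Frobenius norm of the negative parts in the constrained columns.
    These are illustrative; the paper's residual functions may be defined differently.
    """
    r = X.shape[1]
    orth_res = np.linalg.norm(X.T @ X - np.eye(r), ord="fro")
    sign_res = np.linalg.norm(np.minimum(X[:, nonneg_cols], 0.0), ord="fro")
    return orth_res, sign_res

def nearest_stiefel(X):
    """Polar factor U V^T from the thin SVD: the nearest point on the
    (unconstrained) Stiefel manifold in Frobenius norm; sign constraints ignored."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
print(residuals(X, nonneg_cols=[0, 1]))
print(np.linalg.norm(X - nearest_stiefel(X), ord="fro"))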

Related content

Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$ and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires $\log_2 e(P)$ linear oracle queries in the worst case to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $\Theta(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state of the art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro, who at STOC'10 managed to separate the linear and partial oracle queries into a preprocessing and query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $\Theta(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time -- which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $\Theta(c \log e(P))$ queries and time. We show a matching lower bound: there exist positive constants $(\alpha, \beta)$ such that for any constant $c \geq 1$, any algorithm that uses at most $\alpha \cdot n^{1 + \frac{1}{c}}$ preprocessing queries must, in the worst case, use at least $\beta \cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
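
The quantity $e(P)$ driving all of these bounds is the number of linear extensions of $P$, and $\log_2 e(P)$ is the information-theoretic query lower bound. The brute-force sketch below (exponential time, only sensible for tiny posets) counts $e(P)$ and prints this lower bound; it is meant purely to make the quantity concrete, not to reflect the algorithms above.

from itertools import permutations
from math import log2

def linear_extensions(n, relations):
    """Count linear extensions of the partial order P on {0, ..., n-1} given by
    `relations` (a set of pairs (a, b) meaning a < b in P). Brute force over all
    n! permutations; only for tiny posets."""
    count = 0
    for perm in permutations(range(n)):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            count += 1
    return count

# Example: P = {0 < 2, 1 < 2} on a 4-element ground set (element 3 incomparable).
e_P = linear_extensions(4, {(0, 2), (1, 2)})
print(e_P, log2(e_P))  # 8 and 3.0: any algorithm needs >= log2 e(P) linear-oracle queries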

We consider the problem of sampling from the posterior distribution of a $d$-dimensional coefficient vector $\boldsymbol{\theta}$, given linear observations $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta}+\boldsymbol{\varepsilon}$. In general, such posteriors are multimodal, and therefore challenging to sample from. This observation has prompted the exploration of various heuristics that aim at approximating the posterior distribution. In this paper, we study a different approach based on decomposing the posterior distribution into a log-concave mixture of simple product measures. This decomposition allows us to reduce sampling from a multimodal distribution of interest to sampling from a log-concave one, which is tractable and has been investigated in detail. We prove that, under mild conditions on the prior, for random designs, such measure decomposition is generally feasible when the number of samples per parameter $n/d$ exceeds a constant threshold. We thus obtain a provably efficient (polynomial time) sampling algorithm in a regime where this was previously not known. Numerical simulations confirm that the algorithm is practical, and reveal that it has attractive statistical properties compared to state-of-the-art methods.
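
Schematically, writing the model with Gaussian noise of variance $\sigma^2$ for concreteness and a coordinate-wise prior $p$, the decomposition described above represents the (generally multimodal) posterior $\mu$ as a mixture over an auxiliary variable $\boldsymbol{m}$ with a log-concave mixing measure $\rho$ and product-form components $Q_{\boldsymbol{m}}$; sampling then reduces to drawing $\boldsymbol{m} \sim \rho$ (log-concave, hence tractable) followed by $\boldsymbol{\theta} \sim Q_{\boldsymbol{m}}$ (coordinate-wise). The display is only a restatement of the abstract in symbols, not the paper's exact construction:

$$\mu(\mathrm{d}\boldsymbol{\theta}) \;\propto\; \exp\!\Big(-\tfrac{1}{2\sigma^2}\,\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\|_2^2\Big)\prod_{i=1}^{d} p(\mathrm{d}\theta_i), \qquad \mu(\cdot) \;=\; \int Q_{\boldsymbol{m}}(\cdot)\,\rho(\mathrm{d}\boldsymbol{m}).$$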

Let $R \cup B$ be a set of $n$ points in $\mathbb{R}^2$, and let $k \in \{1, \ldots, n\}$. Our goal is to compute a line that "best" separates the "red" points $R$ from the "blue" points $B$ with at most $k$ outliers. We present an efficient semi-online dynamic data structure that can maintain whether such a separator exists. Furthermore, we present efficient exact and approximation algorithms that compute a linear separator that is guaranteed to misclassify at most $k$ points and minimizes the distance to the farthest outlier. Our exact algorithm runs in $O(nk + n \log n)$ time, and our $(1+\varepsilon)$-approximation algorithm runs in $O(\varepsilon^{-1/2}((n + k^2) \log n))$ time. Based on our $(1+\varepsilon)$-approximation algorithm we then also obtain a semi-online data structure to maintain such a separator efficiently.
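
For intuition only, the decision version ("is there a line misclassifying at most $k$ points?") has a naive $O(n^3)$ baseline: a standard perturbation argument suggests it suffices to test the lines spanned by pairs of input points, counting misclassifications under both orientations. The sketch below is that baseline, not the $O(nk + n \log n)$ algorithm described above.

import numpy as np
from itertools import combinations

def misclassified(a, b, c, red, blue):
    """Count points on the wrong side of the line a*x + b*y + c = 0, with red
    expected on the positive side; take the better of the two orientations.
    Points exactly on the line are counted as correctly classified."""
    def count(sign):
        wrong = np.sum(sign * (red @ np.array([a, b]) + c) < 0)
        wrong += np.sum(sign * (blue @ np.array([a, b]) + c) > 0)
        return wrong
    return min(count(+1), count(-1))

def best_separator_bruteforce(red, blue):
    """O(n^3) baseline: try every line through two input points and return the
    minimum number of misclassified points over all candidates."""
    pts = np.vstack([red, blue])
    best = len(pts)
    for p, q in combinations(pts, 2):
        a, b = q[1] - p[1], p[0] - q[0]          # normal of the line through p and q
        c = -(a * p[0] + b * p[1])
        best = min(best, misclassified(a, b, c, red, blue))
    return best

rng = np.random.default_rng(1)
red = rng.normal(loc=(0, 0), scale=1.0, size=(20, 2))
blue = rng.normal(loc=(3, 3), scale=1.0, size=(20, 2))
print(best_separator_bruteforce(red, blue))   # compare against a given budget k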

We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{//github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{//huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{//arxiv.org/abs/2404.03828}{arXiv}.
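
The attention score ${\rm Softmax}_1$ referenced above is commonly defined by adding $1$ to the softmax normalizer, ${\rm Softmax}_1(z)_i = e^{z_i} / (1 + \sum_j e^{z_j})$, which lets a head withhold probability mass instead of forcing it onto some token. A minimal numerical sketch of this score function (the actual $\mathrm{OutEffHop}$ layers involve more than this):

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)            # standard, numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_1(z):
    """Softmax_1: an extra 1 in the denominator (an implicit zero logit), so the
    outputs can sum to less than 1 and a head can attend to "nothing"."""
    m = np.maximum(z.max(axis=-1, keepdims=True), 0.0)   # keep the implicit 0 logit stable
    e = np.exp(z - m)
    return e / (np.exp(-m) + e.sum(axis=-1, keepdims=True))

scores = np.array([[-4.0, -5.0, -6.0]])
print(softmax(scores).sum())     # ~1.0: probability mass is forced somewhere
print(softmax_1(scores).sum())   # << 1.0: mass can be withheld, limiting output outliers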

In the Maximum Independent Set of Hyperrectangles problem, we are given a set of $n$ (possibly overlapping) $d$-dimensional axis-aligned hyperrectangles, and the goal is to find a subset of non-overlapping hyperrectangles of maximum cardinality. For $d=1$, this corresponds to the classical Interval Scheduling problem, where a simple greedy algorithm returns an optimal solution. In the offline setting, for $d$-dimensional hyperrectangles, polynomial time $(\log n)^{O(d)}$-approximation algorithms are known. However, the problem becomes notably challenging in the online setting, where the input objects (hyperrectangles) appear one by one in an adversarial order, and on the arrival of an object, the algorithm needs to make an immediate and irrevocable decision whether or not to select the object while maintaining the feasibility. Even for interval scheduling, an $\Omega(n)$ lower bound is known on the competitive ratio. To circumvent these negative results, in this work, we study the online maximum independent set of axis-aligned hyperrectangles in the random-order arrival model, where the adversary specifies the set of input objects which then arrive in a uniformly random order. Starting from the prototypical secretary problem, the random-order model has received significant attention to study algorithms beyond the worst-case competitive analysis. Surprisingly, we show that the problem in the random-order model almost matches the best-known offline approximation guarantees, up to polylogarithmic factors. In particular, we give a simple $(\log n)^{O(d)}$-competitive algorithm for $d$-dimensional hyperrectangles in this model, which runs in $\tilde{O_d}(n)$ time. Our approach also yields $(\log n)^{O(d)}$-competitive algorithms in the random-order model for more general objects such as $d$-dimensional fat objects and ellipsoids. Furthermore, our guarantees hold with high probability.
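
For reference, the optimal greedy algorithm for the $d = 1$ case (Interval Scheduling) mentioned above is the classical earliest-finishing-time rule; the short sketch below implements it (the random-order algorithm for hyperrectangles is considerably more involved and is not reproduced here).

def max_independent_intervals(intervals):
    """Classical greedy for Interval Scheduling (d = 1): sort by right endpoint
    and keep every interval that starts after the last kept one ends. Returns a
    maximum-cardinality set of pairwise non-overlapping intervals."""
    chosen, last_end = [], float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left >= last_end:          # convention: intervals touching at an endpoint do not overlap
            chosen.append((left, right))
            last_end = right
        # otherwise the interval overlaps the last chosen one; skip it
    return chosen

print(max_independent_intervals([(1, 4), (2, 3), (3, 5), (6, 7)]))
# [(2, 3), (3, 5), (6, 7)]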

We prove that for every $D \in \mathbb{N}$, and large enough constant $d \in \mathbb{N}$, with high probability over the choice of $G \sim G(n,d/n)$, the Erdős-Rényi random graph distribution, the canonical degree $2D$ Sum-of-Squares relaxation fails to certify that the largest independent set in $G$ is of size $o(\frac{n}{\sqrt{d} D^4})$. In particular, degree $D$ sum-of-squares strengthening can reduce the integrality gap of the classical Lovász theta SDP relaxation by at most an $O(D^4)$ factor. This is the first lower bound for degree $>4$ Sum-of-Squares (SoS) relaxations for any problem on \emph{ultra-sparse} random graphs (i.e., average degree an absolute constant). Such ultra-sparse graphs were a known barrier for previous methods and were explicitly identified as a major open direction (e.g.,~\cite{deshpande2019threshold, kothari2021stressfree}). Indeed, the only other example of an SoS lower bound on ultra-sparse random graphs was a degree-4 lower bound for Max-Cut. Our main technical result is a new method to obtain spectral norm estimates on graph matrices (a class of low-degree matrix-valued polynomials in $G(n,d/n)$) that are accurate to within an absolute constant factor. All prior works lose $\mathrm{poly}\log n$ factors that trivialize any lower bound on $o(\log n)$-degree random graphs. We combine these new bounds with several upgrades on the machinery for analyzing lower-bound witnesses constructed by pseudo-calibration so that our analysis does not lose any $\omega(1)$ factors that would trivialize our results. In addition to other SoS lower bounds, we believe that our methods for establishing spectral norm estimates on graph matrices will be useful in the analyses of numerical algorithms on average-case inputs.
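
For context, the Lovász theta relaxation whose integrality gap is discussed above is the basic SDP that the higher-degree SoS levels strengthen. Below is a small sketch of the standard formulation, assuming cvxpy with its default SDP solver is available; this is background, not the paper's machinery.

import cvxpy as cp

def lovasz_theta(n, edges):
    """Lovasz theta SDP: maximize the sum of entries of a PSD matrix X with
    trace(X) = 1 and X_ij = 0 for every edge (i, j). theta(G) upper-bounds the
    independence number of G."""
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.trace(X) == 1]
    constraints += [X[i, j] == 0 for (i, j) in edges]
    prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)
    prob.solve()
    return prob.value

# 5-cycle: theta(C_5) = sqrt(5) ~ 2.236, while its independence number is 2.
edges = [(i, (i + 1) % 5) for i in range(5)]
print(lovasz_theta(5, edges))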

As a promising field in open-world learning, \textit{Novel Class Discovery} (NCD) is usually a task to cluster unseen novel classes in an unlabeled set based on the prior knowledge of labeled data within the same domain. However, the performance of existing NCD methods can be severely compromised when the novel classes are sampled from a different distribution than the labeled ones. In this paper, we explore and establish the solvability of NCD in the cross-domain setting, with the necessary condition that style information must be removed. Based on the theoretical analysis, we introduce an exclusive style removal module for extracting style information that is distinctive from the baseline features, thereby facilitating inference. Moreover, this module is easy to integrate with other NCD methods, acting as a plug-in to improve performance on novel classes whose distribution differs from that of the seen labeled set. Additionally, recognizing the non-negligible influence of different backbones and pre-training strategies on the performance of NCD methods, we build a fair benchmark for future NCD research. Extensive experiments on three common datasets demonstrate the effectiveness of our proposed module.
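
Purely as an illustrative schematic (the abstract does not pin down the architecture), such a plug-in can be pictured as a small head that extracts a style code from backbone features and a content branch obtained by removing the span of that code; the layer sizes, names, and the subtraction-based removal below are assumptions for illustration, and how the two branches are supervised is left to the surrounding NCD method.

import torch
import torch.nn as nn

class StyleRemoval(nn.Module):
    """Illustrative plug-in: split backbone features into a 'style' code and a
    'content' residual. This is a sketch under assumed dimensions, not the
    paper's exclusive style removal module."""
    def __init__(self, feat_dim, style_dim=64):
        super().__init__()
        self.style_head = nn.Linear(feat_dim, style_dim)   # extracts a style code
        self.style_back = nn.Linear(style_dim, feat_dim)   # reconstructs its span

    def forward(self, feats):
        style = self.style_head(feats)
        content = feats - self.style_back(style)           # remove the style span
        return content, style

feats = torch.randn(8, 512)            # e.g. backbone embeddings
content, style = StyleRemoval(512)(feats)
print(content.shape, style.shape)      # torch.Size([8, 512]) torch.Size([8, 64])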

The design of uniformly spread sequences on $[0,1)$ has been extensively studied since the work of Weyl and van der Corput in the early $20^{\text{th}}$ century. The current best sequences are based on the Kronecker sequence with the golden ratio and a permutation of the van der Corput sequence by Ostromoukhov. Despite extensive efforts, it is still unclear whether it is possible to improve these constructions further. We show, using numerical experiments, that a radically different approach introduced by Kritzinger seems to perform better than the existing methods. In particular, this construction is based on a \emph{greedy} approach, and yet outperforms very delicate number-theoretic constructions. Furthermore, we are also able to provide the first numerical results in dimensions 2 and 3, and show that the sequence remains highly regular in this new setting.
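
The two number-theoretic baselines mentioned above are easy to write down explicitly; the sketch below generates the golden-ratio Kronecker sequence and the base-2 van der Corput sequence (Kritzinger's greedy construction, which selects each new point by optimizing a discrepancy-type functional, is not reproduced here).

import math

def kronecker_golden(n):
    """Kronecker sequence x_k = frac(k * phi), with phi the golden ratio."""
    phi = (1 + math.sqrt(5)) / 2
    return [(k * phi) % 1.0 for k in range(1, n + 1)]

def van_der_corput(n, base=2):
    """van der Corput sequence: the radical inverse of k in the given base."""
    out = []
    for k in range(1, n + 1):
        x, denom, m = 0.0, 1.0, k
        while m > 0:
            m, digit = divmod(m, base)
            denom *= base
            x += digit / denom
        out.append(x)
    return out

print(kronecker_golden(5))
print(van_der_corput(5))    # [0.5, 0.25, 0.75, 0.125, 0.625]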

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique into coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
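
The leverage scores behind the sampling sketch are the squared row norms of an orthonormal basis for the column span of the data matrix. The sketch below is a minimal numpy illustration of leverage-score row sampling for an $\ell_2$-subspace embedding applied to least squares; the replication, uniform-sampling transition, and gradient-coding machinery of the paper are not shown.

import numpy as np

def leverage_scores(A):
    """Leverage score of row i is ||U_i||^2, where A = U S V^T is the thin SVD."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

def leverage_sample_sketch(A, b, s, rng):
    """Sample s rows with replacement, proportionally to the leverage scores,
    and rescale so the sketched least-squares objective is unbiased."""
    ell = leverage_scores(A)
    p = ell / ell.sum()
    idx = rng.choice(A.shape[0], size=s, p=p)
    w = 1.0 / np.sqrt(s * p[idx])           # standard rescaling weights
    return w[:, None] * A[idx], w * b[idx]

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.01 * rng.normal(size=2000)
SA, Sb = leverage_sample_sketch(A, b, s=200, rng=rng)
x_hat = np.linalg.lstsq(SA, Sb, rcond=None)[0]
print(np.linalg.norm(x_hat - x_true))       # small: the sketched solution is close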

We study the problems of data compression, gambling and prediction of a sequence $x^n=x_1x_2...x_n$ from an alphabet ${\cal X}$, in terms of regret and expected regret (redundancy) with respect to various smooth families of probability distributions. We evaluate the regret of Bayes mixture distributions compared to maximum likelihood, under the condition that the maximum likelihood estimate is in the interior of the parameter space. For general exponential families (including the non-i.i.d.\ case), the asymptotically minimax value is achieved when variants of Jeffreys prior are used. Interestingly, we also obtain a modification of Jeffreys prior which has measure outside the given family of densities, to achieve minimax regret with respect to non-exponential-type families. This modification enlarges the family using local exponential tilting (a fiber bundle). Our conditions are confirmed for certain non-exponential families, including curved families and mixture families (where either the mixture components or their weights of combination are parameterized) as well as contamination models. Furthermore, for mixture families we show how to deal with the full simplex of parameters. These results also provide a characterization of Rissanen's stochastic complexity.
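
For orientation, under standard regularity conditions and with the maximum restricted to sequences whose maximum likelihood estimate lies in a compact interior subset of the parameter space, the asymptotic minimax regret of a $k$-parameter smooth family takes the classical form below (achieved by Jeffreys-type priors), where $I(\theta)$ is the Fisher information; the paper's contribution concerns the variants and enlargements of Jeffreys prior needed beyond exponential families:

$$\min_{q}\;\max_{x^n}\;\log\frac{p_{\hat\theta(x^n)}(x^n)}{q(x^n)} \;=\; \frac{k}{2}\log\frac{n}{2\pi} \;+\; \log\int_{\Theta}\sqrt{\det I(\theta)}\,\mathrm{d}\theta \;+\; o(1).$$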
