
The sign-constrained Stiefel manifold in $\mathbb{R}^{n\times r}$ is a segment of the Stiefel manifold with fixed signs (nonnegative or nonpositive) for some columns of the matrices. It includes the nonnegative Stiefel manifold as a special case. We present global and local error bounds that provide an inequality with easily computable residual functions and explicit coefficients to bound the distance from matrices in $\mathbb{R}^{n\times r}$ to the sign-constrained Stiefel manifold. Moreover, we show that the error bounds cannot be improved except for the multiplicative constants under some mild conditions, which explains why two square-root terms are necessary in the bounds when $1< r <n$ and why the $\ell_1$ norm can be used in the bounds when $r = n$ or $r = 1$ for the sign constraints and orthogonality, respectively. The error bounds are applied to derive exact penalty methods for minimizing a Lipschitz continuous function with orthogonality and sign constraints.
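
As a rough numerical companion to the abstract above, the sketch below computes two easily computable residuals for a matrix $X \in \mathbb{R}^{n\times r}$ with nonnegativity imposed on some columns: the orthogonality residual $\|X^\top X - I_r\|_F$ and the norm of the negative parts in the constrained columns, together with the Frobenius distance to the unconstrained Stiefel manifold via the polar factor. The exact residual functions and coefficients used in the paper's error bounds may differ; this is only an illustration, not the paper's construction.

import numpy as np

def residuals(X, nonneg_cols):
    """Generic residuals for X in R^{n x r} with nonnegativity on some columns.

    orth_res: Frobenius norm of X^T X - I_r (orthogonality violation).
    sign_res: Frobenius norm of the negative parts in the constrained columns.
    These are illustrative; the paper's residual functions may be defined differently.
    """
    r = X.shape[1]
    orth_res = np.linalg.norm(X.T @ X - np.eye(r), ord="fro")
    sign_res = np.linalg.norm(np.minimum(X[:, nonneg_cols], 0.0), ord="fro")
    return orth_res, sign_res

def nearest_stiefel(X):
    """Polar factor U V^T from the thin SVD: the nearest point on the
    (unconstrained) Stiefel manifold in Frobenius norm; sign constraints ignored."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
print(residuals(X, nonneg_cols=[0, 1]))
print(np.linalg.norm(X - nearest_stiefel(X), ord="fro"))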

Related content

Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$ and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires $\log_2 e(P)$ linear oracle queries in the worst case to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $\Theta(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state of the art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro, who at STOC'10 managed to separate the linear and partial oracle queries into a preprocessing and query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $\Theta(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time -- which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $\Theta(c \log e(P))$ queries and time. We show a matching lower bound: there exist positive constants $(\alpha, \beta)$ such that for any constant $c \geq 1$, any algorithm that uses at most $\alpha \cdot n^{1 + \frac{1}{c}}$ preprocessing queries must, in the worst case, use at least $\beta \cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
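
The quantity $e(P)$ driving all of these bounds is the number of linear extensions of $P$, and $\log_2 e(P)$ is the information-theoretic query lower bound. The brute-force sketch below (exponential time, only sensible for tiny posets) counts $e(P)$ and prints this lower bound; it is meant purely to make the quantity concrete, not to reflect the algorithms above.

from itertools import permutations
from math import log2

def linear_extensions(n, relations):
    """Count linear extensions of the partial order P on {0, ..., n-1} given by
    `relations` (a set of pairs (a, b) meaning a < b in P). Brute force over all
    n! permutations; only for tiny posets."""
    count = 0
    for perm in permutations(range(n)):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            count += 1
    return count

# Example: P = {0 < 2, 1 < 2} on a 4-element ground set (element 3 incomparable).
e_P = linear_extensions(4, {(0, 2), (1, 2)})
print(e_P, log2(e_P))  # 8 and 3.0: any algorithm needs >= log2 e(P) linear-oracle queries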

We consider the problem of sampling from the posterior distribution of a $d$-dimensional coefficient vector $\boldsymbol{\theta}$, given linear observations $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta}+\boldsymbol{\varepsilon}$. In general, such posteriors are multimodal, and therefore challenging to sample from. This observation has prompted the exploration of various heuristics that aim at approximating the posterior distribution. In this paper, we study a different approach based on decomposing the posterior distribution into a log-concave mixture of simple product measures. This decomposition allows us to reduce sampling from a multimodal distribution of interest to sampling from a log-concave one, which is tractable and has been investigated in detail. We prove that, under mild conditions on the prior, for random designs, such measure decomposition is generally feasible when the number of samples per parameter $n/d$ exceeds a constant threshold. We thus obtain a provably efficient (polynomial time) sampling algorithm in a regime where this was previously not known. Numerical simulations confirm that the algorithm is practical, and reveal that it has attractive statistical properties compared to state-of-the-art methods.
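
Schematically, writing the model with Gaussian noise of variance $\sigma^2$ for concreteness and a coordinate-wise prior $p$, the decomposition described above represents the (generally multimodal) posterior $\mu$ as a mixture over an auxiliary variable $\boldsymbol{m}$ with a log-concave mixing measure $\rho$ and product-form components $Q_{\boldsymbol{m}}$; sampling then reduces to drawing $\boldsymbol{m} \sim \rho$ (log-concave, hence tractable) followed by $\boldsymbol{\theta} \sim Q_{\boldsymbol{m}}$ (coordinate-wise). The display is only a restatement of the abstract in symbols, not the paper's exact construction:

$$\mu(\mathrm{d}\boldsymbol{\theta}) \;\propto\; \exp\!\Big(-\tfrac{1}{2\sigma^2}\,\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\|_2^2\Big)\prod_{i=1}^{d} p(\mathrm{d}\theta_i), \qquad \mu(\cdot) \;=\; \int Q_{\boldsymbol{m}}(\cdot)\,\rho(\mathrm{d}\boldsymbol{m}).$$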

Let $R \cup B$ be a set of $n$ points in $\mathbb{R}^2$, and let $k \in \{1, \ldots, n\}$. Our goal is to compute a line that "best" separates the "red" points $R$ from the "blue" points $B$ with at most $k$ outliers. We present an efficient semi-online dynamic data structure that can maintain whether such a separator exists. Furthermore, we present efficient exact and approximation algorithms that compute a linear separator that is guaranteed to misclassify at most $k$ points and minimizes the distance to the farthest outlier. Our exact algorithm runs in $O(nk + n \log n)$ time, and our $(1+\varepsilon)$-approximation algorithm runs in $O(\varepsilon^{-1/2}((n + k^2) \log n))$ time. Based on our $(1+\varepsilon)$-approximation algorithm we then also obtain a semi-online data structure to maintain such a separator efficiently.
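
For intuition only, the decision version ("is there a line misclassifying at most $k$ points?") has a naive $O(n^3)$ baseline: a standard perturbation argument suggests it suffices to test the lines spanned by pairs of input points, counting misclassifications under both orientations. The sketch below is that baseline, not the $O(nk + n \log n)$ algorithm described above.

import numpy as np
from itertools import combinations

def misclassified(a, b, c, red, blue):
    """Count points on the wrong side of the line a*x + b*y + c = 0, with red
    expected on the positive side; take the better of the two orientations.
    Points exactly on the line are counted as correctly classified."""
    def count(sign):
        wrong = np.sum(sign * (red @ np.array([a, b]) + c) < 0)
        wrong += np.sum(sign * (blue @ np.array([a, b]) + c) > 0)
        return wrong
    return min(count(+1), count(-1))

def best_separator_bruteforce(red, blue):
    """O(n^3) baseline: try every line through two input points and return the
    minimum number of misclassified points over all candidates."""
    pts = np.vstack([red, blue])
    best = len(pts)
    for p, q in combinations(pts, 2):
        a, b = q[1] - p[1], p[0] - q[0]          # normal of the line through p and q
        c = -(a * p[0] + b * p[1])
        best = min(best, misclassified(a, b, c, red, blue))
    return best

rng = np.random.default_rng(1)
red = rng.normal(loc=(0, 0), scale=1.0, size=(20, 2))
blue = rng.normal(loc=(3, 3), scale=1.0, size=(20, 2))
print(best_separator_bruteforce(red, blue))   # compare against a given budget k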

We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{//github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{//huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{//arxiv.org/abs/2404.03828}{arXiv}.
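
The attention score ${\rm Softmax}_1$ referenced above is commonly defined by adding $1$ to the softmax normalizer, ${\rm Softmax}_1(z)_i = e^{z_i} / (1 + \sum_j e^{z_j})$, which lets a head withhold probability mass instead of forcing it onto some token. A minimal numerical sketch of this score function (the actual $\mathrm{OutEffHop}$ layers involve more than this):

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)            # standard, numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_1(z):
    """Softmax_1: an extra 1 in the denominator (an implicit zero logit), so the
    outputs can sum to less than 1 and a head can attend to "nothing"."""
    m = np.maximum(z.max(axis=-1, keepdims=True), 0.0)   # keep the implicit 0 logit stable
    e = np.exp(z - m)
    return e / (np.exp(-m) + e.sum(axis=-1, keepdims=True))

scores = np.array([[-4.0, -5.0, -6.0]])
print(softmax(scores).sum())     # ~1.0: probability mass is forced somewhere
print(softmax_1(scores).sum())   # << 1.0: mass can be withheld, limiting output outliers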

In the Maximum Independent Set of Hyperrectangles problem, we are given a set of $n$ (possibly overlapping) $d$-dimensional axis-aligned hyperrectangles, and the goal is to find a subset of non-overlapping hyperrectangles of maximum cardinality. For $d=1$, this corresponds to the classical Interval Scheduling problem, where a simple greedy algorithm returns an optimal solution. In the offline setting, for $d$-dimensional hyperrectangles, polynomial time $(\log n)^{O(d)}$-approximation algorithms are known. However, the problem becomes notably challenging in the online setting, where the input objects (hyperrectangles) appear one by one in an adversarial order, and on the arrival of an object, the algorithm needs to make an immediate and irrevocable decision whether or not to select the object while maintaining the feasibility. Even for interval scheduling, an $\Omega(n)$ lower bound is known on the competitive ratio. To circumvent these negative results, in this work, we study the online maximum independent set of axis-aligned hyperrectangles in the random-order arrival model, where the adversary specifies the set of input objects which then arrive in a uniformly random order. Starting from the prototypical secretary problem, the random-order model has received significant attention to study algorithms beyond the worst-case competitive analysis. Surprisingly, we show that the problem in the random-order model almost matches the best-known offline approximation guarantees, up to polylogarithmic factors. In particular, we give a simple $(\log n)^{O(d)}$-competitive algorithm for $d$-dimensional hyperrectangles in this model, which runs in $\tilde{O_d}(n)$ time. Our approach also yields $(\log n)^{O(d)}$-competitive algorithms in the random-order model for more general objects such as $d$-dimensional fat objects and ellipsoids. Furthermore, our guarantees hold with high probability.
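
For reference, the optimal greedy algorithm for the $d = 1$ case (Interval Scheduling) mentioned above is the classical earliest-finishing-time rule; the short sketch below implements it (the random-order algorithm for hyperrectangles is considerably more involved and is not reproduced here).

def max_independent_intervals(intervals):
    """Classical greedy for Interval Scheduling (d = 1): sort by right endpoint
    and keep every interval that starts after the last kept one ends. Returns a
    maximum-cardinality set of pairwise non-overlapping intervals."""
    chosen, last_end = [], float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left >= last_end:          # convention: intervals touching at an endpoint do not overlap
            chosen.append((left, right))
            last_end = right
        # otherwise the interval overlaps the last chosen one; skip it
    return chosen

print(max_independent_intervals([(1, 4), (2, 3), (3, 5), (6, 7)]))
# [(2, 3), (3, 5), (6, 7)]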

We prove that for every $D \in \mathbb{N}$, and large enough constant $d \in \mathbb{N}$, with high probability over the choice of $G \sim G(n,d/n)$, the Erdős-Rényi random graph distribution, the canonical degree $2D$ Sum-of-Squares relaxation fails to certify that the largest independent set in $G$ is of size $o(\frac{n}{\sqrt{d} D^4})$. In particular, degree $D$ sum-of-squares strengthening can reduce the integrality gap of the classical Lovász theta SDP relaxation by at most an $O(D^4)$ factor. This is the first lower bound for degree $>4$ Sum-of-Squares (SoS) relaxations for any problem on \emph{ultra-sparse} random graphs (i.e., average degree an absolute constant). Such ultra-sparse graphs were a known barrier for previous methods and were explicitly identified as a major open direction (e.g.,~\cite{deshpande2019threshold, kothari2021stressfree}). Indeed, the only other example of an SoS lower bound on ultra-sparse random graphs was a degree-4 lower bound for Max-Cut. Our main technical result is a new method to obtain spectral norm estimates on graph matrices (a class of low-degree matrix-valued polynomials in $G(n,d/n)$) that are accurate to within an absolute constant factor. All prior works lose $\mathrm{poly}\log n$ factors that trivialize any lower bound on $o(\log n)$-degree random graphs. We combine these new bounds with several upgrades on the machinery for analyzing lower-bound witnesses constructed by pseudo-calibration so that our analysis does not lose any $\omega(1)$ factors that would trivialize our results. In addition to other SoS lower bounds, we believe that our methods for establishing spectral norm estimates on graph matrices will be useful in the analyses of numerical algorithms on average-case inputs.
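
For context, the Lovász theta relaxation whose integrality gap is discussed above is the basic SDP that the higher-degree SoS levels strengthen. Below is a small sketch of the standard formulation, assuming cvxpy with its default SDP solver is available; this is background, not the paper's machinery.

import cvxpy as cp

def lovasz_theta(n, edges):
    """Lovasz theta SDP: maximize the sum of entries of a PSD matrix X with
    trace(X) = 1 and X_ij = 0 for every edge (i, j). theta(G) upper-bounds the
    independence number of G."""
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.trace(X) == 1]
    constraints += [X[i, j] == 0 for (i, j) in edges]
    prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)
    prob.solve()
    return prob.value

# 5-cycle: theta(C_5) = sqrt(5) ~ 2.236, while its independence number is 2.
edges = [(i, (i + 1) % 5) for i in range(5)]
print(lovasz_theta(5, edges))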

As a promising field in open-world learning, \textit{Novel Class Discovery} (NCD) is usually a task to cluster unseen novel classes in an unlabeled set based on the prior knowledge of labeled data within the same domain. However, the performance of existing NCD methods can be severely compromised when the novel classes are sampled from a different distribution than the labeled ones. In this paper, we explore and establish the solvability of NCD in the cross-domain setting, with the necessary condition that style information must be removed. Based on the theoretical analysis, we introduce an exclusive style removal module for extracting style information that is distinctive from the baseline features, thereby facilitating inference. Moreover, this module is easy to integrate with other NCD methods, acting as a plug-in to improve performance on novel classes whose distribution differs from that of the seen labeled set. Additionally, recognizing the non-negligible influence of different backbones and pre-training strategies on the performance of NCD methods, we build a fair benchmark for future NCD research. Extensive experiments on three common datasets demonstrate the effectiveness of our proposed module.
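
Purely as an illustrative schematic (the abstract does not pin down the architecture), such a plug-in can be pictured as a small head that extracts a style code from backbone features and a content branch obtained by removing the span of that code; the layer sizes, names, and the subtraction-based removal below are assumptions for illustration, and how the two branches are supervised is left to the surrounding NCD method.

import torch
import torch.nn as nn

class StyleRemoval(nn.Module):
    """Illustrative plug-in: split backbone features into a 'style' code and a
    'content' residual. This is a sketch under assumed dimensions, not the
    paper's exclusive style removal module."""
    def __init__(self, feat_dim, style_dim=64):
        super().__init__()
        self.style_head = nn.Linear(feat_dim, style_dim)   # extracts a style code
        self.style_back = nn.Linear(style_dim, feat_dim)   # reconstructs its span

    def forward(self, feats):
        style = self.style_head(feats)
        content = feats - self.style_back(style)           # remove the style span
        return content, style

feats = torch.randn(8, 512)            # e.g. backbone embeddings
content, style = StyleRemoval(512)(feats)
print(content.shape, style.shape)      # torch.Size([8, 512]) torch.Size([8, 64])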

The design of uniformly spread sequences on $[0,1)$ has been extensively studied since the work of Weyl and van der Corput in the early $20^{\text{th}}$ century. The current best sequences are based on the Kronecker sequence with the golden ratio and a permutation of the van der Corput sequence by Ostromoukhov. Despite extensive efforts, it is still unclear whether it is possible to improve these constructions further. We show, using numerical experiments, that a radically different approach introduced by Kritzinger seems to perform better than the existing methods. In particular, this construction is based on a \emph{greedy} approach, and yet outperforms very delicate number-theoretic constructions. Furthermore, we are also able to provide the first numerical results in dimensions 2 and 3, and show that the sequence remains highly regular in this new setting.
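
The two number-theoretic baselines mentioned above are easy to write down explicitly; the sketch below generates the golden-ratio Kronecker sequence and the base-2 van der Corput sequence (Kritzinger's greedy construction, which selects each new point by optimizing a discrepancy-type functional, is not reproduced here).

import math

def kronecker_golden(n):
    """Kronecker sequence x_k = frac(k * phi), with phi the golden ratio."""
    phi = (1 + math.sqrt(5)) / 2
    return [(k * phi) % 1.0 for k in range(1, n + 1)]

def van_der_corput(n, base=2):
    """van der Corput sequence: the radical inverse of k in the given base."""
    out = []
    for k in range(1, n + 1):
        x, denom, m = 0.0, 1.0, k
        while m > 0:
            m, digit = divmod(m, base)
            denom *= base
            x += digit / denom
        out.append(x)
    return out

print(kronecker_golden(5))
print(van_der_corput(5))    # [0.5, 0.25, 0.75, 0.125, 0.625]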

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique into coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
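
The leverage scores behind the sampling sketch are the squared row norms of an orthonormal basis for the column span of the data matrix. The sketch below is a minimal numpy illustration of leverage-score row sampling for an $\ell_2$-subspace embedding applied to least squares; the replication, uniform-sampling transition, and gradient-coding machinery of the paper are not shown.

import numpy as np

def leverage_scores(A):
    """Leverage score of row i is ||U_i||^2, where A = U S V^T is the thin SVD."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

def leverage_sample_sketch(A, b, s, rng):
    """Sample s rows with replacement, proportionally to the leverage scores,
    and rescale so the sketched least-squares objective is unbiased."""
    ell = leverage_scores(A)
    p = ell / ell.sum()
    idx = rng.choice(A.shape[0], size=s, p=p)
    w = 1.0 / np.sqrt(s * p[idx])           # standard rescaling weights
    return w[:, None] * A[idx], w * b[idx]

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.01 * rng.normal(size=2000)
SA, Sb = leverage_sample_sketch(A, b, s=200, rng=rng)
x_hat = np.linalg.lstsq(SA, Sb, rcond=None)[0]
print(np.linalg.norm(x_hat - x_true))       # small: the sketched solution is close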

We study the problems of data compression, gambling and prediction of a sequence $x^n=x_1x_2...x_n$ from an alphabet ${\cal X}$, in terms of regret and expected regret (redundancy) with respect to various smooth families of probability distributions. We evaluate the regret of Bayes mixture distributions compared to maximum likelihood, under the condition that the maximum likelihood estimate is in the interior of the parameter space. For general exponential families (including the non-i.i.d.\ case), the asymptotically minimax value is achieved when variants of Jeffreys prior are used. Interestingly, we also obtain a modification of Jeffreys prior which has measure outside the given family of densities, to achieve minimax regret with respect to non-exponential-type families. This modification enlarges the family using local exponential tilting (a fiber bundle). Our conditions are confirmed for certain non-exponential families, including curved families and mixture families (where either the mixture components or their weights of combination are parameterized) as well as contamination models. Furthermore, for mixture families we show how to deal with the full simplex of parameters. These results also provide a characterization of Rissanen's stochastic complexity.
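
For orientation, under standard regularity conditions and with the maximum restricted to sequences whose maximum likelihood estimate lies in a compact interior subset of the parameter space, the asymptotic minimax regret of a $k$-parameter smooth family takes the classical form below (achieved by Jeffreys-type priors), where $I(\theta)$ is the Fisher information; the paper's contribution concerns the variants and enlargements of Jeffreys prior needed beyond exponential families:

$$\min_{q}\;\max_{x^n}\;\log\frac{p_{\hat\theta(x^n)}(x^n)}{q(x^n)} \;=\; \frac{k}{2}\log\frac{n}{2\pi} \;+\; \log\int_{\Theta}\sqrt{\det I(\theta)}\,\mathrm{d}\theta \;+\; o(1).$$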
