
This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus-Spielman-Srivastava to derive an asymptotically sharp upper bound on the minimal approximation error, and propose a deterministic polynomial-time algorithm that achieves this error bound (up to a computational error). Furthermore, we extend our result to a column partition problem in which the columns of $\mathbf{A}$ can be partitioned into $r\geq 2$ subsets such that $\mathbf{A}$ can be well approximated using the columns in the complement of each subset. We show that the machinery of interlacing polynomials also works in this context, and establish a connection between the relevant expected characteristic polynomials and the $r$-characteristic polynomials introduced by Ravichandran and Leake. As a consequence, we prove that the columns of a rank-$d$ matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ can be partitioned into $r$ subsets $S_1,\ldots,S_r$, such that the column space of $\mathbf{A}$ can be well approximated by the span of the columns in the complement of $S_i$ for each $1\leq i\leq r$.
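To make the objective concrete, here is a minimal NumPy sketch of the spectral-norm column subset selection criterion, together with an exhaustive search over $k$-subsets for tiny instances. The function names (`css_residual_spectral_norm`, `brute_force_css`) are illustrative only; the paper's deterministic algorithm is based on interlacing polynomials, not brute force.

```python
import itertools
import numpy as np

def css_residual_spectral_norm(A, cols):
    """Spectral norm of (I - P_S) A, where P_S projects onto the span
    of the selected columns (assumed linearly independent here)."""
    Q, _ = np.linalg.qr(A[:, list(cols)])   # orthonormal basis for the selection
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)

def brute_force_css(A, k):
    """Exhaustive search over all k-subsets -- exponential, for tiny demos only;
    the paper gives a deterministic polynomial-time algorithm instead."""
    return min(itertools.combinations(range(A.shape[1]), k),
               key=lambda S: css_residual_spectral_norm(A, S))

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
S = brute_force_css(A, 3)
print(S, css_residual_spectral_norm(A, S))
```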

Related content

We study the problem of detecting the correlation between two Gaussian databases $\mathsf{X}\in\mathbb{R}^{n\times d}$ and $\mathsf{Y}\in\mathbb{R}^{n\times d}$, each composed of $n$ users with $d$ features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, these two databases are statistically independent. Under the alternative, however, there exists an unknown permutation $\sigma$ over the set of $n$ users (or, row permutation), such that $\mathsf{X}$ is $\rho$-correlated with $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of $n$ and $d$. Specifically, we prove that if $\rho^2d\to0$, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespective of the value of $n$. This complements the performance of a simple test that thresholds the sum of all entries of $\mathsf{X}^T\mathsf{Y}$. Furthermore, when $d$ is fixed, we prove that strong detection (vanishing error probability) is impossible for any $\rho<\rho^\star$, where $\rho^\star$ is an explicit function of $d$, while weak detection is again impossible as long as $\rho^2d\to0$. These results close significant gaps in recent related studies.
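The following toy simulation illustrates the testing problem and a sum-type statistic. Note the convention chosen here: we sum the entries of $\mathsf{X}\mathsf{Y}^T$, which equals the inner product of the two databases' per-feature column sums and is therefore invariant to the hidden row permutation; the paper's exact statistic and conventions may differ. All names are illustrative.

```python
import numpy as np

def sum_statistic(X, Y):
    # the sum of all entries of X Y^T equals <column sums of X, column sums of Y>,
    # hence it does not depend on the hidden permutation of Y's rows
    return np.sum(X @ Y.T)

def sample(n, d, rho, correlated, rng):
    X = rng.standard_normal((n, d))
    Z = rng.standard_normal((n, d))
    if not correlated:
        return X, Z                                   # null: independent databases
    sigma = rng.permutation(n)                        # unknown row permutation
    Y = np.empty_like(X)
    Y[sigma] = rho * X + np.sqrt(1.0 - rho**2) * Z    # X is rho-correlated with Y^sigma
    return X, Y

rng = np.random.default_rng(1)
n, d, rho = 200, 100, 0.5                  # separation scales roughly like rho * sqrt(d)
null_vals = [sum_statistic(*sample(n, d, rho, False, rng)) for _ in range(200)]
alt_vals = [sum_statistic(*sample(n, d, rho, True, rng)) for _ in range(200)]
print(np.mean(null_vals), np.std(null_vals), np.mean(alt_vals))
```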

The support vector machine (SVM) is a supervised learning algorithm that finds a maximum-margin linear classifier, often after mapping the data to a high-dimensional feature space via the kernel trick. Recent work has demonstrated that in certain sufficiently overparameterized settings, the SVM decision function coincides exactly with the minimum-norm label interpolant. This phenomenon of support vector proliferation (SVP) is especially interesting because it allows us to understand SVM performance by leveraging recent analyses of harmless interpolation in linear and kernel models. However, previous work on SVP has made restrictive assumptions on the data/feature distribution and spectrum. In this paper, we present a new and flexible analysis framework for proving SVP in an arbitrary reproducing kernel Hilbert space with a general class of generative models for the labels. We present conditions for SVP for features in the families of general bounded orthonormal systems (e.g., Fourier features) and independent sub-Gaussian features. In both cases, we show that SVP occurs in many interesting settings not covered by prior work, and we leverage these results to prove novel generalization results for kernel SVM classification.
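As a concrete illustration, the sketch below checks a criterion used in this line of work: for linearly separable data, the hard-margin SVM coincides with the minimum-norm interpolant exactly when every dual coefficient of that interpolant is positive, i.e. $y_i[(\mathbf{X}\mathbf{X}^T)^{-1}y]_i>0$ for all $i$. The experiment with Gaussian features is only a sanity check of how SVP emerges with increasing overparameterization, not a reproduction of the paper's results.

```python
import numpy as np

def svp_holds(X, y):
    """True iff the hard-margin SVM equals the minimum-norm interpolant:
    all dual coefficients y_i * [(X X^T)^{-1} y]_i must be positive."""
    beta = np.linalg.solve(X @ X.T, y)    # min-norm interpolant is w = X^T beta
    return bool(np.all(y * beta > 0))

rng = np.random.default_rng(0)
n = 50
for d in (60, 200, 2000, 20000):          # increasing overparameterization
    X = rng.standard_normal((n, d))       # i.i.d. (sub-)Gaussian features
    y = np.sign(rng.standard_normal(n))   # labels independent of the features
    print(d, svp_holds(X, y))
```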

A minimum chain cover (MCC) of a $k$-width directed acyclic graph (DAG) $G = (V, E)$ is a set of $k$ chains (paths in the transitive closure) of $G$ such that every vertex appears in at least one chain in the cover. The state-of-the-art solutions for MCC run in time $\tilde{O}(k(|V|+|E|))$ [M\"akinen et al., TALG], $O(T_{MF}(|E|) + k|V|)$, $O(k^2|V| + |E|)$ [C\'aceres et al., SODA 2022], $\tilde{O}(|V|^{3/2} + |E|)$ [Kogan and Parter, ICALP 2022] and $\tilde{O}(T_{MCF}(|E|) + \sqrt{k}|V|)$ [Kogan and Parter, SODA 2023], where $T_{MF}(|E|)$ and $T_{MCF}(|E|)$ are the running times for solving maximum flow (MF) and minimum-cost flow (MCF), respectively. In this work we present an algorithm running in time $O(T_{MF}(|E|) + (|V|+|E|)\log{k})$. By considering the recent result for solving MF [Chen et al., FOCS 2022], our algorithm is the first running in almost linear time. Moreover, our techniques are deterministic and yield a deterministic near-linear time algorithm for MCC whenever one is provided for MF. At the core of our solution we use a modified version of the mergeable dictionaries [Farach and Thorup, Algorithmica], [Iacono and \"Ozkan, ICALP 2010] data structure boosted with the SIZE-SPLIT operation and answering queries in amortized logarithmic time, which can be of independent interest.
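For contrast with the near-linear algorithm above, here is a sketch of the classical quadratic-style baseline (Fulkerson's reduction): the minimum number of chains covering a DAG equals $|V|$ minus a maximum matching in a bipartite graph built on the transitive closure. Names and the tiny example are illustrative.

```python
def min_chain_cover_size(n, edges):
    """Classical baseline (Fulkerson-style reduction), NOT the near-linear
    algorithm above: MCC size = |V| - maximum matching on the transitive closure."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    reach = [set() for _ in range(n)]      # transitive closure, O(|V||E|), demo only
    def dfs(s, u):
        for w in adj[u]:
            if w not in reach[s]:
                reach[s].add(w)
                dfs(s, w)
    for s in range(n):
        dfs(s, s)

    match_to = [-1] * n                    # Kuhn's augmenting-path matching
    def augment(u, seen):
        for w in reach[u]:
            if w not in seen:
                seen.add(w)
                if match_to[w] == -1 or augment(match_to[w], seen):
                    match_to[w] = u
                    return True
        return False

    matching = sum(augment(u, set()) for u in range(n))
    return n - matching                    # Dilworth/Fulkerson: width = |V| - |M|

# a 2-width DAG: two parallel paths plus a cross edge
print(min_chain_cover_size(6, [(0, 1), (1, 2), (3, 4), (4, 5), (1, 4)]))  # -> 2
```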

In [Meurant, Pape\v{z}, Tich\'y; Numerical Algorithms 88, 2021], we presented an adaptive estimate for the energy norm of the error in the conjugate gradient (CG) method. In this paper, we extend the estimate to algorithms for solving linear approximation problems with a general, possibly rectangular matrix that are based on applying CG to a system with a positive (semi-)definite matrix built from the original matrix. We show that the resulting estimate preserves its key properties: it can be very cheaply evaluated, and it is numerically reliable in finite-precision arithmetic under some mild assumptions. We discuss algorithms based on a Hestenes-Stiefel-like implementation (often called CGLS and CGNE in the literature) as well as on bidiagonalization (LSQR and CRAIG), in both unpreconditioned and preconditioned variants. The numerical experiments confirm the robustness and very satisfactory behaviour of the estimate.
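A minimal sketch of the classical Hestenes-Stiefel-type estimate that underlies this line of work: in exact arithmetic $\|x-x_j\|_A^2=\sum_{i\geq j}\gamma_i\|r_i\|^2$, so summing a few ("delay") terms gives a cheap lower bound on the energy norm of the error. This is plain CG on a square SPD system, not the CGLS/CGNE/LSQR/CRAIG variants or the adaptive delay choice developed in the paper.

```python
import numpy as np

def cg_with_error_estimate(A, b, delay=4, maxit=300, tol=1e-12):
    """CG storing gamma_i * ||r_i||^2 at every step; partial sums of `delay`
    consecutive terms lower-bound the squared A-norm of the error."""
    x = np.zeros_like(b)
    r = b.copy(); p = r.copy()
    rr = r @ r
    terms, xs = [], [x.copy()]
    while len(terms) < maxit and np.sqrt(rr) > tol:
        Ap = A @ p
        gamma = rr / (p @ Ap)
        terms.append(gamma * rr)           # gamma_i * ||r_i||^2
        x = x + gamma * p
        xs.append(x.copy())
        r = r - gamma * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    x_star = np.linalg.solve(A, b)         # reference solution, for checking only
    for j in range(0, len(terms) - delay, 10):
        est = np.sqrt(sum(terms[j:j + delay]))
        e = xs[j] - x_star
        print(f"step {j:3d}: estimate {est:.3e} <= A-norm error {np.sqrt(e @ (A @ e)):.3e}")

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = Q @ np.diag(np.logspace(0, 4, 100)) @ Q.T      # SPD test matrix, cond ~ 1e4
cg_with_error_estimate(A, rng.standard_normal(100))
```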

Given a graph $G$, an edge-coloring is an assignment of colors to edges of $G$ such that any two edges sharing an endpoint receive different colors. By Vizing's celebrated theorem, any graph of maximum degree $\Delta$ needs at least $\Delta$ and at most $(\Delta + 1)$ colors to be properly edge colored. In this paper, we study edge colorings in the streaming setting. The edges arrive one by one in an arbitrary order. The algorithm takes a single pass over the input and must output a solution using space much smaller than the input size. Since the output of edge coloring is as large as its input, the assigned colors should also be reported in a streaming fashion. The streaming edge coloring problem has been studied in a series of works over the past few years. The main challenge is that the algorithm cannot "remember" all the color assignments that it returns. To ensure the validity of the solution, existing algorithms use many more colors than Vizing's bound. Namely, in $n$-vertex graphs, the state-of-the-art algorithm with $\widetilde{O}(n s)$ space requires $O(\Delta^2/s + \Delta)$ colors. Note, in particular, that for an asymptotically optimal $O(\Delta)$ coloring, this algorithm requires $\Omega(n\Delta)$ space, which is as large as the input. Whether such a coloring can be achieved with sublinear space has been left open. In this paper, we answer this question in the affirmative. We present a randomized algorithm that returns an asymptotically optimal $O(\Delta)$ edge coloring using $\widetilde{O}(n \sqrt{\Delta})$ space. More generally, our algorithm returns a proper $O(\Delta^{1.5}/s + \Delta)$ edge coloring with $\widetilde{O}(n s)$ space, improving prior algorithms for the whole range of $s$.
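To see why memory is the bottleneck, consider the standard greedy baseline below: it properly colors edges in arrival order with at most $2\Delta-1$ colors, but only because it remembers the colors already used at every vertex, which costs $\Omega(|E|)$ space in the worst case, exactly the budget a streaming algorithm does not have. This is an illustration of the problem, not the paper's algorithm.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Offline greedy baseline: always picks the smallest color unused at both
    endpoints, so at most (deg(u)-1) + (deg(v)-1) colors are blocked and
    2*Delta - 1 colors suffice. It stores every assignment it makes."""
    used = defaultdict(set)              # vertex -> colors on its incident edges
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c); used[v].add(c)
    return coloring

stream = [(0, 1), (1, 2), (2, 0), (0, 3)]
print(greedy_edge_coloring(stream))      # a proper coloring with <= 2*Delta - 1 colors
```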

Regression can be difficult for big datasets, since huge volumes of data must be processed. The demand for computational resources in the modeling process grows with the scale of the dataset, because traditional approaches to regression involve inverting large data matrices. The main obstacle is the sheer data size, so a standard remedy is subsampling, which aims to extract the most informative portion of the big data. In the current paper we consider an approach based on leverage scores that already exists in the literature. This approach was originally proposed for selecting subdata for linear model discrimination; here we highlight its value for selecting the data points that are most informative for estimating unknown parameters. We conclude that the approach based on leverage scores improves on existing approaches, and support this with simulation experiments as well as a real data application.
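A minimal NumPy sketch of leverage-score-based subdata selection: compute the leverage scores $h_{ii}$ of the hat matrix via a QR factorization, keep the $m$ rows with the largest scores, and fit ordinary least squares on the subsample. The deterministic top-$m$ rule and all names here are illustrative; the approach studied in the paper may rank or sample points differently.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T, computed stably via QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def leverage_subsample_ols(X, y, m):
    """Keep the m highest-leverage rows and fit OLS on them."""
    idx = np.argsort(leverage_scores(X))[-m:]
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 100_000, 10
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)
print(np.linalg.norm(leverage_subsample_ols(X, y, 1000) - beta_true))
```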

The conic bundle implementation of the spectral bundle method for large scale semidefinite programming solves in each iteration a semidefinite quadratic subproblem by an interior point approach. For larger cutting model sizes the limiting operation is collecting and factorizing a Schur complement of the primal-dual KKT system. We explore possibilities to improve on this by an iterative approach that exploits structural low rank properties. Two preconditioning approaches are proposed and analyzed. Both might be of interest for rank-structured positive definite systems in general. The first employs projections onto random subspaces, the second projects onto a subspace that is chosen deterministically based on structural interior point properties. For both approaches theoretical bounds are derived for the associated condition number. In the instances tested the deterministic preconditioner provides surprisingly efficient control on the actual condition number. The results suggest that for large scale instances the iterative solver is usually the better choice if precision requirements are moderate or if the size of the Schur complemented system clearly exceeds the active dimension within the subspace giving rise to the cutting model of the bundle method.
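The sketch below shows a generic deflation-style preconditioner built from a random subspace, in the spirit of the first of the two approaches (it is not the paper's construction): a randomized range finder captures the dominant eigenspace of a rank-structured positive semidefinite matrix $A$, and the preconditioner inverts $A+\mu I$ exactly on that subspace while scaling by $1/\mu$ on its complement.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def random_subspace_preconditioner(A, mu, ell, rng):
    """Capture the dominant eigenspace of A with a randomized range finder,
    invert A + mu*I exactly there, and scale by 1/mu on the complement."""
    n = A.shape[0]
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))  # range finder
    lam, V = np.linalg.eigh(Q.T @ A @ Q)                    # Rayleigh-Ritz values
    U = Q @ V
    def apply(x):
        t = U.T @ x
        return U @ (t / (lam + mu)) + (x - U @ t) / mu
    return LinearOperator((n, n), matvec=apply)

rng = np.random.default_rng(0)
n, k, mu = 2000, 40, 1e-2
G = rng.standard_normal((n, k))
A = G @ G.T                                # rank-structured positive semidefinite matrix
b = rng.standard_normal(n)
M = random_subspace_preconditioner(A, mu, 2 * k, rng)
x, info = cg(A + mu * np.eye(n), b, M=M, maxiter=50)
print(info, np.linalg.norm((A + mu * np.eye(n)) @ x - b))
```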

In the online bisection problem one has to maintain a partition of $n$ elements into two clusters of cardinality $n/2$. During runtime, an online algorithm is given a sequence of requests, each being a pair of elements: an inter-cluster request costs one unit while an intra-cluster one is free. The algorithm may change the partition, paying a unit cost for each element that changes its cluster. This natural problem admits a simple deterministic $O(n^2)$-competitive algorithm [Avin et al., DISC 2016]. While several significant improvements over this result have been obtained since the original work, all of them either limit the generality of the input or assume some form of resource augmentation (e.g., larger clusters). Moreover, the algorithm of Avin et al. achieves the best known competitive ratio even if randomization is allowed. In this paper, we present the first randomized online algorithm that breaks this natural barrier and achieves a competitive ratio of $\tilde{O}(n^{27/14})$, without resource augmentation and for an arbitrary sequence of requests.
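To fix the cost model, here is a toy harness for the online bisection problem: an inter-cluster request costs one unit, and each element that changes its cluster costs one unit. The class and the tiny trace are illustrative; they implement the cost accounting, not any competitive algorithm.

```python
class BisectionCosts:
    """Cost accounting for online bisection: inter-cluster requests cost one
    unit; each migrated element costs one unit (a toy harness only)."""
    def __init__(self, n):
        assert n % 2 == 0
        self.cluster = [i % 2 for i in range(n)]   # balanced start: even/odd split
        self.cost = 0

    def request(self, u, v):
        if self.cluster[u] != self.cluster[v]:
            self.cost += 1                         # inter-cluster request

    def swap(self, u, v):
        """Exchange two elements across clusters (keeps both sizes at n/2)."""
        assert self.cluster[u] != self.cluster[v]
        self.cluster[u], self.cluster[v] = self.cluster[v], self.cluster[u]
        self.cost += 2                             # two elements migrate

sim = BisectionCosts(4)
for u, v in [(0, 1), (0, 1), (0, 1)]:
    sim.request(u, v)                              # three paid inter-cluster requests
sim.swap(1, 2)                                     # collocate elements 0 and 1
sim.request(0, 1)                                  # now free
print(sim.cost)                                    # 3 requests + 2 migrations = 5
```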

In this paper, we present a low-diameter decomposition algorithm in the LOCAL model of distributed computing that succeeds with probability $1 - 1/poly(n)$. Specifically, we show how to compute an $\left(\epsilon, O\left(\frac{\log n}{\epsilon}\right)\right)$ low-diameter decomposition in $O\left(\frac{\log^3(1/\epsilon)\log n}{\epsilon}\right)$ rounds. Further developing our techniques, we show new distributed algorithms for approximating general packing and covering integer linear programs in the LOCAL model. For packing problems, our algorithm finds a $(1-\epsilon)$-approximate solution in $O\left(\frac{\log^3 (1/\epsilon) \log n}{\epsilon}\right)$ rounds with probability $1 - 1/poly(n)$. For covering problems, our algorithm finds a $(1+\epsilon)$-approximate solution in $O\left(\frac{\left(\log \log n + \log (1/\epsilon)\right)^3 \log n}{\epsilon}\right)$ rounds with probability $1 - 1/poly(n)$. These results improve upon the previous $O\left(\frac{\log^3 n}{\epsilon}\right)$-round algorithm by Ghaffari, Kuhn, and Maus [STOC 2017], which is based on network decompositions. Our algorithms are near-optimal for many fundamental combinatorial graph optimization problems in the LOCAL model, such as minimum vertex cover and minimum dominating set, whose $(1\pm \epsilon)$-approximate solutions require $\Omega\left(\frac{\log n}{\epsilon}\right)$ rounds to compute.
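For intuition, here is the classical sequential ball-growing argument behind $(\epsilon, O(\frac{\log n}{\epsilon}))$ low-diameter decompositions: grow a BFS ball until the number of edges leaving it is at most an $\epsilon$ fraction of its volume, cut, and repeat. Each expansion multiplies the volume by at least $1+\epsilon$, which bounds the radius. The paper's contribution is achieving this in few LOCAL rounds; the sketch below is sequential and purely illustrative.

```python
import itertools

def ldd(n, adj, eps):
    """Sequential ball growing: cut once at most eps * volume edges leave the
    ball; otherwise volume grows by a (1+eps) factor, so radii are O(log n / eps)."""
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        ball = {next(iter(unassigned))}
        while True:
            cut = [(u, w) for u in ball for w in adj[u]
                   if w in unassigned and w not in ball]
            vol = len(ball) + sum(1 for u in ball for w in adj[u] if w in ball) // 2
            if len(cut) <= eps * vol:
                break                       # few inter-cluster edges charged to this ball
            ball |= {w for _, w in cut}     # add the whole next BFS layer
        clusters.append(ball)
        unassigned -= ball
    return clusters

# demo on a 10x10 grid graph
n_side = 10
idx = lambda i, j: i * n_side + j
adj = [[] for _ in range(n_side * n_side)]
for i, j in itertools.product(range(n_side), repeat=2):
    for di, dj in ((0, 1), (1, 0)):
        if i + di < n_side and j + dj < n_side:
            adj[idx(i, j)].append(idx(i + di, j + dj))
            adj[idx(i + di, j + dj)].append(idx(i, j))
print([len(c) for c in ldd(n_side * n_side, adj, 0.3)])
```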

Image segmentation is still an open problem, especially when the intensities of the objects of interest overlap due to the presence of intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
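A minimal sketch of the IEOPF ingredient: represent the smooth bias field as a linear combination of orthogonal primary functions (here, tensor products of Legendre polynomials, one possible choice) and fit the coefficients by least squares. In the full model this fit is interleaved with the level set clustering energy and membership functions; the demo only shows that a smooth inhomogeneity is recovered from noisy data.

```python
import numpy as np

def orthogonal_basis(shape, degree):
    """Tensor products of Legendre polynomials on [-1, 1]^2: smooth, mutually
    orthogonal primary functions over the image domain."""
    py = [np.polynomial.legendre.Legendre.basis(i)(np.linspace(-1, 1, shape[0]))
          for i in range(degree)]
    px = [np.polynomial.legendre.Legendre.basis(j)(np.linspace(-1, 1, shape[1]))
          for j in range(degree)]
    return np.stack([np.outer(a, b) for a in py for b in px])  # (degree^2, H, W)

def fit_bias(field, degree=4):
    """Least-squares fit of the field as a linear combination of the basis."""
    G = orthogonal_basis(field.shape, degree).reshape(degree**2, -1).T
    w, *_ = np.linalg.lstsq(G, field.ravel(), rcond=None)
    return (G @ w).reshape(field.shape)

rng = np.random.default_rng(0)
H = W = 64
ys = np.linspace(-1, 1, H)[:, None]
xs = np.linspace(-1, 1, W)[None, :]
true_bias = 1.0 + 0.4 * xs + 0.3 * ys**2           # smooth intensity inhomogeneity
noisy = true_bias + 0.1 * rng.standard_normal((H, W))
print(np.abs(fit_bias(noisy) - true_bias).max())   # small: the basis captures the bias
```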
