亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem in the case with a general objective function subject to noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis on the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.

相關內容

Practitioners are often left tuning Metropolis-Hastings algorithms by trial and error or using optimal scaling guidelines to avoid poor empirical performance. We develop lower bounds on the convergence rates of geometrically ergodic accept-reject-based Markov chains (i.e. Metropolis-Hastings, non-reversible Metropolis-Hastings) to study their computational complexity. If the target density concentrates with a parameter $n$ (e.g. Bayesian posterior concentration, Laplace approximations), we show the convergence rate can tend to $1$ exponentially fast if the tuning parameters do not depend carefully on $n$. We show this is the case for random-walk Metropolis in Bayesian logistic regression with Zellner's g-prior when the dimension and sample size $d/n \to \gamma \in (0, 1)$. We focus on more general target densities using a special class of Metropolis-Hastings algorithms with a Gaussian proposal (e.g. random walk and Metropolis-adjusted Langevin algorithms) where we give more general conditions. An application to flat prior Bayesian logistic regression as $n \to \infty$ is studied. We also develop lower bounds in the Wasserstein distances which have become popular in the convergence analysis of high-dimensional MCMC algorithms with similar conclusions.

The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high-probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical setting. The theoretical results are verified with numerical experiments.

We improve the current best running time value to invert sparse matrices over finite fields, lowering it to an expected $O\big(n^{2.2131}\big)$ time for the current values of fast rectangular matrix multiplication. We achieve the same running time for the computation of the rank and nullspace of a sparse matrix over a finite field. This improvement relies on two key techniques. First, we adopt the decomposition of an arbitrary matrix into block Krylov and Hankel matrices from Eberly et al. (ISSAC 2007). Second, we show how to recover the explicit inverse of a block Hankel matrix using low displacement rank techniques for structured matrices and fast rectangular matrix multiplication algorithms. We generalize our inversion method to block structured matrices with other displacement operators and strengthen the best known upper bounds for explicit inversion of block Toeplitz-like and block Hankel-like matrices, as well as for explicit inversion of block Vandermonde-like matrices with structured blocks. As a further application, we improve the complexity of several algorithms in topological data analysis and in finite group theory.

Modern network datasets are often composed of multiple layers, either as different views, time-varying observations, or independent sample units, resulting in collections of networks over the same set of vertices but with potentially different connectivity patterns on each network. These data require models and methods that are flexible enough to capture local and global differences across the networks, while at the same time being parsimonious and tractable to yield computationally efficient and theoretically sound solutions that are capable of aggregating information across the networks. This paper considers the multilayer degree-corrected stochastic blockmodel, where a collection of networks share the same community structure, but degree-corrections and block connection probability matrices are permitted to be different. We establish the identifiability of this model and propose a spectral clustering algorithm for community detection in this setting. Our theoretical results demonstrate that the misclustering error rate of the algorithm improves exponentially with multiple network realizations, even in the presence of significant layer heterogeneity with respect to degree corrections, signal strength, and spectral properties of the block connection probability matrices. Simulation studies show that this approach improves on existing multilayer community detection methods in this challenging regime. Furthermore, in a case study of US airport data through January 2016 -- September 2021, we find that this methodology identifies meaningful community structure and trends in airport popularity influenced by pandemic impacts on travel.

In this work, we study the numerical solution of inverse eigenvalue problems from a machine learning perspective. Two different problems are considered: the inverse Strum-Liouville eigenvalue problem for symmetric potentials and the inverse transmission eigenvalue problem for spherically symmetric refractive indices. Firstly, we solve the corresponding direct problems to produce the required eigenvalues datasets in order to train the machine learning algorithms. Next, we consider several examples of inverse problems and compare the performance of each model to predict the unknown potentials and refractive indices respectively, from a given small set of the lowest eigenvalues. The supervised regression models we use are k-Nearest Neighbours, Random Forests and Multi-Layer Perceptron. Our experiments show that these machine learning methods, under appropriate tuning on their parameters, can numerically solve the examined inverse eigenvalue problems.

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.

Quantum algorithms for optimization problems are of general interest. Despite recent progress in classical lower bounds for nonconvex optimization under different settings and quantum lower bounds for convex optimization, quantum lower bounds for nonconvex optimization are still widely open. In this paper, we conduct a systematic study of quantum query lower bounds on finding $\epsilon$-approximate stationary points of nonconvex functions, and we consider the following two important settings: 1) having access to $p$-th order derivatives; or 2) having access to stochastic gradients. The classical query lower bounds is $\Omega\big(\epsilon^{-\frac{1+p}{p}}\big)$ regarding the first setting, and $\Omega(\epsilon^{-4})$ regarding the second setting (or $\Omega(\epsilon^{-3})$ if the stochastic gradient function is mean-squared smooth). In this paper, we extend all these classical lower bounds to the quantum setting. They match the classical algorithmic results respectively, demonstrating that there is no quantum speedup for finding $\epsilon$-stationary points of nonconvex functions with $p$-th order derivative inputs or stochastic gradient inputs, whether with or without the mean-squared smoothness assumption. Technically, our quantum lower bounds are obtained by showing that the sequential nature of classical hard instances in all these settings also applies to quantum queries, preventing any quantum speedup other than revealing information of the stationary points sequentially.

This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for the inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. We, furthermore, propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provide principled way to build future architectures yet to be invented.

Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.

北京阿比特科技有限公司