
We theoretically investigate the typical learning performance of $\ell_{1}$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular (RR) graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained, demonstrating that, for an Ising model with $N$ variables, $\ell_1$-LinR is model selection consistent with $M=\mathcal{O}\left(\log N\right)$ samples. Moreover, we provide a computationally efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M$ and $N$, such as the precision and recall rates. Simulations show fairly good agreement between the theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper focuses on $\ell_1$-LinR, our method is readily applicable to precisely investigating the typical learning performance of a wide class of $\ell_{1}$-regularized M-estimators, including $\ell_{1}$-regularized logistic regression and interaction screening.
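To make the estimator concrete, here is a minimal numpy sketch of $\ell_1$-LinR used for neighborhood selection: regress one spin on the rest with an $\ell_1$ penalty and read off the nonzero coefficients. Everything below (the chain graph, coupling $J=0.5$, the crude Gibbs sampler, the ISTA solver, $\lambda$, and the support threshold) is an illustrative choice, not the paper's replica setup.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_linr(X, y, lam, n_iter=500):
    """l1-regularized linear regression solved by ISTA (proximal gradient)."""
    M, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / M).max()
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / M
        w = soft_threshold(w - step * grad, step * lam)
    return w

def gibbs_ising_chain(N, J, M, burn=500, thin=5):
    """Crude Gibbs sampler for a chain Ising model with uniform coupling J."""
    rng = np.random.default_rng(0)
    s = rng.choice([-1.0, 1.0], size=N)
    samples = []
    for t in range(burn + thin * M):
        for i in range(N):
            field = J * (s[i - 1] if i > 0 else 0.0) + J * (s[i + 1] if i < N - 1 else 0.0)
            s[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * field)) else -1.0
        if t >= burn and (t - burn) % thin == 0:
            samples.append(s.copy())
    return np.array(samples)

S = gibbs_ising_chain(N=8, J=0.5, M=3000)
X, y = S[:, 1:], S[:, 0]                    # regress spin 0 on the remaining spins
w = l1_linr(X, y, lam=0.05)
neighbors = np.flatnonzero(np.abs(w) > 0.1)  # estimated neighborhood of spin 0
```

On this toy chain, the estimated neighborhood of spin 0 should reduce to its single true neighbor, spin 1 (column 0 of `X`).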

Related content

Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K>2$. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in several cases Gaussian mixtures closely capture the learning curves of classification tasks on real data sets.
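Task a) above can be sketched in a few lines of numpy: a two-Gaussian mixture with a sparse mean, classified by ERM with the logistic loss under either penalty. The dimensions, signal strength, and solver settings below are illustrative assumptions, not the paper's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 400
mu = np.zeros(d); mu[:5] = 1.5                      # sparse means: 5 informative coordinates
ylab = rng.choice([-1.0, 1.0], size=n)              # balanced cluster labels
X = ylab[:, None] * mu + rng.normal(size=(n, d))    # two-Gaussian mixture

def fit_logistic(X, y, lam, l1=False, n_iter=2000, step=0.1):
    """ERM with the logistic loss and an l2 (ridge) or l1 (proximal GD) penalty."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        m = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(m)))) / len(y)
        if l1:
            w = w - step * grad
            w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
        else:
            w = w - step * (grad + lam * w)
    return w

w_l1 = fit_logistic(X, ylab, lam=0.05, l1=True)
w_l2 = fit_logistic(X, ylab, lam=0.05, l1=False)
acc_l1 = np.mean(np.sign(X @ w_l1) == ylab)
```

The $\ell_1$ estimate concentrates on the informative coordinates, while the ridge estimate spreads mass over all of them, which is the efficiency question the paper studies quantitatively.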

Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via predictive equations) to target outcomes of interest (say minima); repeat after acquiring output under those conditions and updating the fit. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives on benchmark problems.
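One simple flavour of the candidate construction can be sketched with `scipy.spatial.Delaunay` (which wraps the Qhull convex hull library): triangulate the existing design and take one candidate per simplex. The paper's "tricands" construction is richer than a plain centroid rule; this sketch only illustrates the geometric idea.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 2))              # existing input design in the unit square
tri = Delaunay(X)                         # triangulation via a convex hull library (Qhull)
cands = X[tri.simplices].mean(axis=1)     # one candidate per simplex: its centroid
```

A discrete acquisition search would then evaluate the criterion only at `cands` instead of running a numerical optimizer over the continuous input space.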

We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from either of the aforementioned low-rank approximations to its kernel, as well as between their corresponding predictive densities; we also bound the error between the predictive mean vectors and between the predictive covariance matrices computed using the exact versus the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
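The first of the two approximations is easy to demonstrate: with random Fourier features, inner products of a randomized cosine feature map approximate the RBF kernel in expectation. The dimensions, feature count, and lengthscale below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, ell = 3, 4000, 1.0                       # input dim, number of features, lengthscale

W = rng.normal(scale=1.0 / ell, size=(D, d))   # spectral samples for the RBF kernel
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    """Random Fourier feature map: inner products approximate the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
k_exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * ell**2))
k_approx = phi(x) @ phi(y)
```

A rank-$D$ GP built from `phi` replaces the exact kernel matrix with an inner-product matrix of these features; the paper's bounds quantify how far the resulting posterior can drift from the exact one.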

We present a numerical method to model dynamical systems from data. We use the recently introduced method Scalable Probabilistic Approximation (SPA) to project points from a Euclidean space to convex polytopes and represent these projected states of a system in new, lower-dimensional coordinates denoting their position in the polytope. We then introduce a specific nonlinear transformation to construct a model of the dynamics in the polytope and to transform back into the original state space. To overcome the potential loss of information from the projection to a lower-dimensional polytope, we use memory in the sense of the delay-embedding theorem of Takens. By construction, our method produces stable models. We illustrate the capacity of the method to reproduce even chaotic dynamics and attractors with multiple connected components on various examples.
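Two of the ingredients above have compact standalone forms: projecting a point onto a convex polytope (here the probability simplex, via the classical sort-based algorithm, standing in for the SPA projection) and building delayed coordinates in the sense of Takens. Both snippets are generic sketches, not the paper's full SPA pipeline.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def delay_embed(x, K):
    """Takens-style delay embedding: stack K consecutive states into one vector."""
    return np.stack([x[i:len(x) - K + 1 + i] for i in range(K)], axis=1)

p = project_simplex(np.array([0.5, 1.2, -0.3]))   # barycentric-style coordinates
E = delay_embed(np.arange(10.0), 3)               # memory via delayed coordinates
```

The projected coordinates are nonnegative and sum to one, i.e., they locate the state inside the polytope, and the embedding supplies the memory used to compensate for information lost in the projection.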

We show that external randomization may enforce the convergence of test statistics to their limiting distributions in particular cases. This results in a sharper inference. Our approach is based on a central limit theorem for weighted sums. We apply our method to a family of rank-based test statistics and a family of phi-divergence test statistics and prove that, with overwhelming probability with respect to the external randomization, the randomized statistics converge at the rate $O(1/n)$ (up to some logarithmic factors) to the limiting chi-square distribution in Kolmogorov metric.
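For reference, here is the base (non-randomized) statistic from the phi-divergence family used above: Pearson's chi-square, whose distribution the randomization scheme drives to its chi-square limit at rate $O(1/n)$. The counts and probabilities are made-up inputs; the external-randomization weighting itself is specific to the paper and not reproduced here.

```python
import numpy as np

def pearson_stat(counts, probs):
    """Pearson's chi-square statistic, one member of the phi-divergence family."""
    n = counts.sum()
    expected = n * probs
    return np.sum((counts - expected) ** 2 / expected)

counts = np.array([30, 25, 20, 25])     # observed category counts, n = 100
probs = np.full(4, 0.25)                # hypothesised category probabilities
T = pearson_stat(counts, probs)         # compared against a chi-square with 3 dof
```

Under the null, `T` is asymptotically chi-square distributed with (number of categories − 1) degrees of freedom, and this limiting approximation in Kolmogorov metric is what the randomization sharpens.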

In practice, functional data are sampled on a discrete set of observation points and are often susceptible to noise. We consider in this paper the setting where such data are used as explanatory variables in a regression problem. If the primary goal is prediction, we show that the gain from embedding the problem into a scalar-on-function regression is limited. Instead, we impose a factor model on the predictors and suggest regressing the response on an appropriate number of factor scores. This approach is shown to be consistent under mild technical assumptions and numerically efficient, and it gives good practical performance in both simulations and real data settings.
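The factor-score regression can be sketched directly: estimate scores by PCA of the discretely observed curves, then regress the response on those scores. All sizes, noise levels, and coefficients below are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 100, 3                          # curves, observation points, factors
F = rng.normal(size=(n, k))                    # latent factor scores
Lmat = rng.normal(size=(k, p))                 # loadings ("basis curves" on the grid)
X = F @ Lmat + 0.1 * rng.normal(size=(n, p))   # noisy, discretely observed predictors
beta = np.array([1.0, -2.0, 0.5])
y = F @ beta + 0.1 * rng.normal(size=n)        # response driven by the factors

Xc = X - X.mean(axis=0)                        # estimate scores by PCA of the curves...
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T
Z = np.column_stack([np.ones(n), scores])      # ...then regress y on the k scores
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
r2 = 1.0 - np.sum((y - Z @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
```

Because the response depends on the predictors only through the factors, the k-dimensional score regression recovers essentially all of the predictive signal without a full scalar-on-function fit.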

Devising optimal interventions for constraining stochastic systems is a challenging endeavour that has to confront the interplay between randomness and nonlinearity. Existing methods for identifying the necessary dynamical adjustments resort either to space discretising solutions of ensuing partial differential equations, or to iterative stochastic path sampling schemes. Yet, both approaches become computationally demanding for increasing system dimension. Here, we propose a generally applicable and practically feasible non-iterative methodology for obtaining optimal dynamical interventions for diffusive nonlinear systems. We estimate the necessary controls from an interacting particle approximation to the logarithmic gradient of two forward probability flows evolved following deterministic particle dynamics. Applied to several biologically inspired models, we show that our method provides the necessary optimal controls in settings with terminal-, transient-, or generalised collective-state constraints and arbitrary system dynamics.
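The central quantity above, the logarithmic gradient (score) of a probability flow, can be illustrated in the simplest possible setting: a one-dimensional kernel-density estimate of the score from a cloud of particles, checked against the analytic score of a standard Gaussian. This is only the score-from-particles idea, not the paper's interacting-particle approximation or control scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(size=5000)       # particles from N(0, 1); the true score is -x

def kde_score(x, xs, h=0.3):
    """Score (logarithmic gradient) of a Gaussian KDE built from the particles."""
    w = np.exp(-((x - xs) ** 2) / (2.0 * h**2))
    return np.sum(w * (xs - x)) / (h**2 * np.sum(w))
```

At `x = 1` the estimate should be close to the true value −1 (slightly shrunk toward $-x/(1+h^2)$ by the kernel smoothing), and near 0 at the mode.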

A structural version of the Gaussian mixture vector autoregressive model is introduced. The shocks are identified by combining simultaneous diagonalization of the error term covariance matrices with constraints on the time-varying B-matrix. This leads to more flexible identification conditions than in conventional SVAR models, and some of the constraints are also testable. The empirical application considers quarterly U.S. data covering the period from 1953Q3 to 2021Q1. Our model identifies two regimes: a stable inflation regime and an unstable inflation regime, the latter of which mainly prevails in the late 1950s, the 1970s, the early 1980s, and during the COVID-19 crisis. While the effects of monetary policy shocks are relatively symmetric in the unstable inflation regime, we find strong asymmetries with respect to the sign and size of the shock, as well as to the initial state of the economy, in the stable inflation regime. Large expansionary shocks, in particular, often drive the economy towards the unstable inflation regime and propagate high and persistent inflation. Consequently, the interest rate rises significantly, which appears to cause a strong contraction in GDP after the initial short-term expansion. The accompanying CRAN-distributed R package gmvarkit provides easy-to-use tools for estimating the models and applying the introduced methods.
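The simultaneous-diagonalization step of the identification scheme has a standard linear-algebra form: whiten one covariance matrix, then eigendecompose the other in the whitened coordinates. The toy matrices below are made-up SPD inputs; the B-matrix constraints in the paper are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)); S1 = A @ A.T + 3.0 * np.eye(3)   # toy SPD covariance, regime 1
B = rng.normal(size=(3, 3)); S2 = B @ B.T + 3.0 * np.eye(3)   # toy SPD covariance, regime 2

# whiten S1, then eigendecompose the whitened S2: W' S1 W = I and W' S2 W is diagonal
L = np.linalg.cholesky(S1)
Linv = np.linalg.inv(L)
M = Linv @ S2 @ Linv.T
d, V = np.linalg.eigh(M)
W = Linv.T @ V                                 # simultaneous diagonalizer

D1 = W.T @ S1 @ W                              # should be the identity
D2 = W.T @ S2 @ W                              # should be diag(d)
```

The columns of `W` play the role of candidate structural shock directions: a single transformation renders both regime covariances diagonal at once.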

It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as \textit{over-smoothing}. Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem. However, there is little explanation from the viewpoint of learning theory of why they work well empirically. In this study, we derive optimization and generalization guarantees for transductive learning algorithms that include multi-scale GNNs. Using boosting theory, we prove the convergence of the training error under weak-learning-type conditions. By combining this with generalization gap bounds in terms of transductive Rademacher complexity, we show that a test error bound of a specific type of multi-scale GNN decreases with the depth under those conditions. Our results offer theoretical explanations for the effectiveness of the multi-scale structure against the over-smoothing problem. We apply boosting algorithms to the training of multi-scale GNNs for real-world node prediction tasks and confirm that their performance is comparable to existing GNNs and that their practical behavior is consistent with the theoretical observations. Code is available at //github.com/delta2323/GB-GNN
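A common way to build the multi-scale representation is to keep the node features at every propagation depth instead of only the deepest one, e.g. concatenating powers of the normalized adjacency applied to the features. The sketch below shows that construction only; it is a generic variant, not the boosting-trained architecture of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, K = 6, 4, 3                               # nodes, features, number of extra scales
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                  # random symmetric adjacency, no self-loops
X = rng.normal(size=(n, f))

Ahat = A + np.eye(n)                            # add self-loops
deg = Ahat.sum(axis=1)
Ahat = Ahat / np.sqrt(np.outer(deg, deg))       # symmetric normalization D^{-1/2}(A+I)D^{-1/2}

scales, H = [X], X
for _ in range(K):
    H = Ahat @ H                                # one more step of propagation
    scales.append(H)
Z = np.concatenate(scales, axis=1)              # multi-scale node representation
```

Because the spectrum of the normalized operator lies in $[-1, 1]$, repeated propagation smooths the features toward a dominant eigendirection; keeping every intermediate scale in `Z` is what preserves the information that deep single-scale stacks over-smooth away.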

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
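The key property exploited above, that the CIR diffusion can be simulated without discretisation error, follows from its transition density being a scaled noncentral chi-square. The sketch below simulates the plain CIR process exactly with illustrative parameters; the stochastic (gradient-estimated) version used inside SGMCMC is more involved and not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 1.0, 2.0, 0.5            # mean-reversion rate, long-run mean, volatility
dt, n = 0.1, 20000

# exact CIR transition: a scaled noncentral chi-square, hence no discretisation error
c = sigma**2 * (1.0 - np.exp(-a * dt)) / (4.0 * a)
df = 4.0 * a * b / sigma**2
x = np.empty(n); x[0] = b
for t in range(1, n):
    nonc = x[t - 1] * np.exp(-a * dt) / c
    x[t] = c * rng.noncentral_chisquare(df, nonc)
```

Every sample stays strictly positive (no Euler step can push it past the boundary), and the long-run average matches the stationary mean `b`, which is exactly the boundary behaviour that breaks discretised samplers on sparse simplexes.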
