
Maximum margin binary classification is one of the most fundamental algorithms in machine learning, yet the role of featurization maps and the high-dimensional asymptotics of the misclassification error for non-Gaussian features are still poorly understood. We consider settings in which we observe binary labels $y_i$ and either $d$-dimensional covariates ${\boldsymbol z}_i$ that are mapped to a $p$-dimensional space via a randomized featurization map ${\boldsymbol \phi}:\mathbb{R}^d \to\mathbb{R}^p$, or $p$-dimensional features with independent non-Gaussian entries. In this context, we study two fundamental questions: $(i)$ At what overparametrization ratio $p/n$ do the data become linearly separable? $(ii)$ What is the generalization error of the max-margin classifier? Working in the high-dimensional regime in which the number of features $p$, the number of samples $n$, and the input dimension $d$ (in the nonlinear featurization setting) diverge with ratios of order one, we prove a universality result establishing that the asymptotic behavior is completely determined by the expected covariance of the feature vectors and by the covariance between features and labels. In particular, the overparametrization threshold and the generalization error can be computed within a simpler Gaussian model. The main technical challenge lies in the fact that the max-margin classifier is not the maximizer (or minimizer) of an empirical average, but the maximizer of a minimum over the samples. We address this by representing the classifier as an average over support vectors. Crucially, we find that in high dimensions the number of support vectors is proportional to the number of samples, which ultimately yields universality.
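
The following Python sketch gives a feel for this universality claim; it is an illustration under simplified assumptions (a random ground-truth direction, identity feature covariance, and a large-$C$ linear SVC as a stand-in for the hard-margin classifier), not the paper's experimental setup. It compares the max-margin test error and support-vector fraction for Gaussian features and for Rademacher features with the same first two moments.

```python
# Minimal numerical sketch of max-margin universality (illustrative only):
# test error and support-vector fraction are compared for Gaussian features
# and non-Gaussian features with matched first two moments.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, p, n_test = 400, 800, 2000                 # overparametrized regime p/n = 2
theta = rng.standard_normal(p) / np.sqrt(p)   # ground-truth direction

def labels(X):
    return np.sign(X @ theta + 1e-12)

for name, draw in {
    "gaussian":   lambda size: rng.standard_normal(size),
    "rademacher": lambda size: rng.choice([-1.0, 1.0], size=size),  # same mean/covariance
}.items():
    X, Xt = draw((n, p)), draw((n_test, p))
    y, yt = labels(X), labels(Xt)
    # Hard-margin SVM approximated by a linear SVC with a very large C.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    err = np.mean(clf.predict(Xt) != yt)
    sv_frac = len(clf.support_) / n           # fraction of samples on the margin
    print(f"{name:10s}  test error = {err:.3f}   support vectors / n = {sv_frac:.2f}")
```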

Related Content

We present a numerical iterative optimization algorithm for minimizing a cost function consisting of a linear combination of three convex terms: one differentiable, one prox-simple, and one given by the composition of a prox-simple function with a linear map. The algorithm's special feature lies in its ability to approximate, in a single iteration run, the minimizers of the cost function for many different values of the parameters determining the relative weights of the three terms. A proof of convergence of the algorithm, based on an inexact variable metric approach, is also provided. As special cases, one recovers a generalization of the primal-dual algorithm of Chambolle and Pock, and of the proximal-gradient algorithm. Finally, we show how the algorithm relates to a primal-dual iterative scheme based on inexact proximal evaluations of the non-smooth terms of the cost function.
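
This three-term structure can be illustrated with a Condat-Vu-type primal-dual iteration, which schemes of this kind generalize (Chambolle-Pock is recovered when the differentiable term vanishes). The sketch below is not the paper's algorithm; the test problem (least squares plus $\ell_1$ plus total variation) and the step sizes are illustrative assumptions.

```python
# Sketch of a Condat-Vu-type primal-dual iteration for
#   min_x f(x) + g(x) + h(Kx),
# with f differentiable, g and h prox-simple.
import numpy as np

rng = np.random.default_rng(1)
m, d = 60, 100
A, b = rng.standard_normal((m, d)), rng.standard_normal(m)
K = np.diff(np.eye(d), axis=0)          # discrete difference operator
lam1, lam2 = 0.1, 0.5

grad_f  = lambda x: A.T @ (A @ x - b)   # f = 0.5 * ||Ax - b||^2
prox_g  = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t * lam1, 0)
prox_h  = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t * lam2, 0)
# prox of the conjugate h* via the Moreau identity
prox_hc = lambda u, s: u - s * prox_h(u / s, 1 / s)

L  = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
nK = np.linalg.norm(K, 2)
sigma = 1.0
tau = 0.9 / (L / 2 + sigma * nK ** 2)   # standard step-size condition

x, u = np.zeros(d), np.zeros(K.shape[0])
for _ in range(2000):
    x_new = prox_g(x - tau * (grad_f(x) + K.T @ u), tau)
    u = prox_hc(u + sigma * K @ (2 * x_new - x), sigma)
    x = x_new
print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2
      + lam1 * np.abs(x).sum() + lam2 * np.abs(K @ x).sum())
```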

Generalized linear models (GLMs) are popular for data analysis in almost all quantitative sciences, but the choice of likelihood family and link function is often difficult. This motivates the search for likelihoods and links that minimize the impact of potential misspecification. We perform a large-scale simulation study on double-bounded and lower-bounded response data in which we systematically vary both the true and the assumed likelihoods and links. In contrast to previous studies, we also examine posterior calibration and uncertainty metrics in addition to point-estimate accuracy. Our results indicate that certain likelihoods and links can be remarkably robust to misspecification, performing almost on par with their respective true counterparts. Additionally, normal likelihood models with identity link (i.e., linear regression) often achieve calibration comparable to that of the more structurally faithful alternatives, at least in the studied scenarios. On the basis of our findings, we provide practical suggestions for robust likelihood and link choices in GLMs.
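
A toy version of such a misspecification experiment, for intuition only: double-bounded responses are simulated from a beta model with a logit link, then fit both with a normal likelihood and identity link (linear regression) and with the correctly specified beta likelihood, comparing point-estimate accuracy of the mean. The data-generating parameters are arbitrary choices, and this sketch does not reproduce the paper's calibration metrics.

```python
# Toy misspecification experiment on double-bounded (0, 1) responses.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist
from scipy.special import expit

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
b_true, phi = np.array([0.3, 0.8]), 20.0          # phi = precision
mu = expit(X @ b_true)
y = beta_dist.rvs(mu * phi, (1 - mu) * phi, random_state=rng)

# (a) normal likelihood, identity link: ordinary least squares
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# (b) beta likelihood, logit link: maximum likelihood
def nll(params):
    b, log_phi = params[:2], params[2]
    m, ph = expit(X @ b), np.exp(log_phi)
    return -beta_dist.logpdf(y, m * ph, (1 - m) * ph).sum()

res = minimize(nll, x0=np.zeros(3), method="BFGS")
mu_beta = expit(X @ res.x[:2])

for name, pred in [("normal+identity", X @ b_ols), ("beta+logit", mu_beta)]:
    print(f"{name:16s} RMSE of mean: {np.sqrt(np.mean((pred - mu) ** 2)):.4f}")
```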

Numerical methods for computing the solutions of Markov backward stochastic differential equations (BSDEs) driven by continuous-time Markov chains (CTMCs) are explored. The main contributions of this paper are as follows: (1) we observe that Euler-Maruyama temporal discretization methods for solving Markov BSDEs driven by CTMCs are equivalent to exponential integrators for solving the associated systems of ordinary differential equations (ODEs); (2) we introduce multi-stage Euler-Maruyama methods for effectively solving "stiff" Markov BSDEs driven by CTMCs; these BSDEs typically arise from the spatial discretization of Markov BSDEs driven by Brownian motion; (3) we propose a multilevel spatial discretization method on sparse grids that efficiently approximates high-dimensional Markov BSDEs driven by Brownian motion with a combination of multiple Markov BSDEs driven by CTMCs on grids with different resolutions. We also illustrate the effectiveness of the presented methods with a number of numerical experiments in which we treat nonlinear BSDEs arising from option pricing problems in finance.
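
Observation (1) can be illustrated on a generic stiff linear system $y' = Ay$: the explicit Euler step $y \leftarrow (I + hA)y$ blows up once $h$ exceeds the stability limit, while the exponential-integrator step $y \leftarrow e^{hA}y$ is exact for any $h$. The generator below is an arbitrary stiff CTMC-like matrix chosen for illustration, not one arising from the paper's BSDE discretizations.

```python
# Explicit Euler vs. exponential integrator on a stiff linear ODE y' = A y.
import numpy as np
from scipy.linalg import expm

A = np.array([[-100.0, 100.0],
              [   1.0,  -1.0]])        # stiff CTMC generator: rates 100 vs 1
y0 = np.array([1.0, 0.0])
h, T = 0.05, 1.0                       # h exceeds Euler's stability limit 2/101
steps = int(T / h)

y_euler, y_expo = y0.copy(), y0.copy()
E = expm(h * A)                        # one-step exponential integrator
for _ in range(steps):
    y_euler = y_euler + h * (A @ y_euler)   # explicit Euler, unstable here
    y_expo = E @ y_expo                     # exact for linear systems

print("explicit Euler:        ", y_euler)
print("exponential integrator:", y_expo)
print("exact solution:        ", expm(T * A) @ y0)
```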

Convergence is a crucial issue in iterative algorithms, and damping is commonly employed to ensure it. Conventional damping schemes are scalar-wise and either heuristic or empirical. Recently, an analytically optimized vector damping was proposed for memory message-passing (iterative) algorithms; it gives rise to a special class of covariance matrices called L-banded matrices. In this paper, we show that these matrices have rich algebraic properties arising from their L-banded structure. In particular, we derive compact analytic expressions for the LDL decomposition, the Cholesky decomposition, the determinant after a column substitution, minors, and cofactors. Furthermore, we give necessary and sufficient conditions for an L-banded matrix to be definite, a recurrence for the characteristic polynomial, and several other properties, along with new derivations of the determinant and the inverse. We note that some works have independently studied matrices with this special structure under the name L-matrices; in that terminology, L-banded matrices are L-matrices with real, finite entries.
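
As a numerical sanity check, the following sketch assumes the L-banded structure $A_{ij} = a_{\max(i,j)}$ used in the memory message-passing literature (our reading of the setting, stated as an assumption) and verifies the resulting determinant formula $\det(A) = a_n \prod_{k<n}(a_k - a_{k+1})$ against a dense computation.

```python
# Sanity check of L-banded matrix properties, assuming A[i, j] = a_max(i, j).
import numpy as np

rng = np.random.default_rng(3)
n = 6
a = np.sort(rng.uniform(1.0, 2.0, n))[::-1]     # decreasing a_1 > ... > a_n > 0

i, j = np.indices((n, n))
A = a[np.maximum(i, j)]                          # L-banded matrix

det_formula = a[-1] * np.prod(a[:-1] - a[1:])    # a_n * prod(a_k - a_{k+1})
print("numpy det :", np.linalg.det(A))
print("formula   :", det_formula)

# With strictly decreasing positive a_k, A is positive definite:
print("eigenvalues > 0:", np.all(np.linalg.eigvalsh(A) > 0))
```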

In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to explore the minimum architectural requirements for $\textit{recurrent}$ neural circuits to sample from complex distributions. We first consider the traditional sampling model consisting of a network of neurons whose outputs directly represent the samples (sampler-only network). We argue that synaptic current and firing-rate dynamics in the traditional model have limited capacity to sample from a complex probability distribution. We show that the firing rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution. We call such circuits reservoir-sampler networks (RSNs). We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling. We empirically demonstrate our model's ability to sample from several complex data distributions using the proposed neural dynamics and discuss its applicability to developing the next generation of sampling-based brain models.
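
For intuition about the dynamics an RSN is trained to implement, the sketch below runs plain Langevin sampling with an analytic score for a two-component Gaussian mixture in place of learned recurrent and output weights; the target distribution, step size, and burn-in are illustrative choices, not the paper's trained model.

```python
# Langevin sampling with an analytic score (stand-in for a trained RSN).
import numpy as np

rng = np.random.default_rng(4)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])

def score(x):
    """grad log p(x) for an equal-weight mixture of two unit Gaussians."""
    d = x[None, :] - means                       # (2, 2): x - mu_k per component
    w = np.exp(-0.5 * (d ** 2).sum(axis=1))
    w = w / w.sum()                              # posterior component weights
    return -(w[:, None] * d).sum(axis=0)

eps, n_steps = 0.01, 5000
x = rng.standard_normal(2)
samples = []
for t in range(n_steps):
    # Langevin update: x <- x + (eps/2) * score(x) + sqrt(eps) * noise
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(2)
    if t > 1000:                                 # discard burn-in
        samples.append(x.copy())

samples = np.array(samples)
print("sample mean of |x_1|:", np.abs(samples[:, 0]).mean())  # ~2: two modes
```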

Simply verifiable mathematical conditions for the existence, uniqueness, and explicit analytical computation of minimal adversarial paths (MAP) and minimal adversarial distances (MAD) are formulated and proven for (locally) uniquely-invertible classifiers, for generalized linear models (GLM), and for entropic AI (EAI). Practical computation of MAP and MAD, together with their comparison and interpretation for various classes of AI tools (neural networks, boosted random forests, GLM, and EAI), is demonstrated on common synthetic benchmarks (a double Swiss roll spiral and its extensions) and on two biomedical data problems (health insurance claim prediction and heart attack lethality classification). In the biomedical applications, we demonstrate how MAP provides unique, minimal, patient-specific risk-mitigating interventions within predefined subsets of accessible control variables.
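
For a model with a linear decision boundary $\boldsymbol{w}\cdot\boldsymbol{x} + b = 0$ (as in a GLM), the MAD and the endpoint of the MAP have a well-known closed form, which the following sketch evaluates; the weights and the query point are arbitrary, and the general conditions of the paper cover far more than this linear special case.

```python
# Closed-form MAD/MAP for a linear decision boundary w.x + b = 0.
import numpy as np

w = np.array([2.0, -1.0])     # GLM coefficients (illustrative)
b = 0.5
x0 = np.array([1.0, 1.0])     # point to be minimally perturbed

margin = w @ x0 + b
mad = abs(margin) / np.linalg.norm(w)          # minimal adversarial distance
x_adv = x0 - w * margin / (w @ w)              # endpoint of the minimal path

print("MAD            :", mad)
print("MAP endpoint   :", x_adv)
print("on the boundary:", np.isclose(w @ x_adv + b, 0.0))
```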

We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks. In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed to obtain a trained network. The sampling is based on the idea of random feature models. However, instead of a data-agnostic distribution, e.g., a normal distribution, we use both the input and the output training data to sample shallow and deep networks. We prove that sampled networks are universal approximators. For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decreases with the square root of the number of neurons. Our sampling scheme is invariant to rigid body transformations and scaling of the input data, which implies many popular pre-processing techniques are not required. In numerical experiments, we demonstrate that sampled networks achieve accuracy comparable to iteratively trained ones, but can be constructed orders of magnitude faster. Our test cases involve a classification benchmark from OpenML, sampling of neural operators to represent maps in function spaces, and transfer learning using well-known architectures.
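
The sketch below conveys the data-driven sampling idea in its simplest form: hidden weights are constructed from random pairs of training points rather than from a data-agnostic Gaussian, and only the linear output layer is solved in closed form. The pair construction, activation, and scaling are simplified guesses, not the paper's exact scheme.

```python
# Data-driven sampled network: no gradient-based training of hidden weights.
import numpy as np

rng = np.random.default_rng(5)
n, d, width = 512, 2, 200
X = rng.uniform(-1, 1, (n, d))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]            # target function (example)

# Sample weight/bias pairs from random pairs of inputs: each neuron's
# hyperplane is oriented along x2 - x1 and centered between the two points.
i1, i2 = rng.integers(0, n, width), rng.integers(0, n, width)
diff = X[i2] - X[i1]
norms2 = np.maximum((diff ** 2).sum(axis=1), 1e-8)
W = diff / norms2[:, None]                       # (width, d)
c = (X[i1] + X[i2]) / 2
B = -(W * c).sum(axis=1)                         # bias centers neuron between pair

H = np.tanh(X @ W.T + B)                         # hidden features, never trained
coef, *_ = np.linalg.lstsq(H, y, rcond=None)     # closed-form output layer

pred = H @ coef
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```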

Approximate forms of the $R_{\rm II}$ and $R_{\rm III}$ redistribution matrices are frequently applied to simplify the numerical solution of the radiative transfer problem for polarized radiation, taking partial frequency redistribution (PRD) effects into account. A widely used approximation for $R_{\rm III}$ is to consider its expression under the assumption of complete frequency redistribution (CRD) in the observer frame ($R_{\rm III}^{\rm CRD}$). The adequacy of this approximation for modeling intensity profiles has been firmly established. By contrast, its suitability for modeling scattering polarization signals has only been analyzed in a few studies that considered simplified settings. In this work, we aim at quantitatively assessing the impact and range of validity of the $R_{\rm III}^{\rm CRD}$ approximation in the modeling of scattering polarization. We first present an analytic comparison between $R_{\rm III}$ and $R_{\rm III}^{\rm CRD}$. We then compare the results of radiative transfer calculations, out of local thermodynamic equilibrium, performed with $R_{\rm III}$ and $R_{\rm III}^{\rm CRD}$ in realistic one-dimensional atmospheric models, focusing on the chromospheric Ca I line at 4227 Å and the photospheric Sr I line at 4607 Å.

Statistical learning methods are widely used to tackle complex problems because of their flexibility, good predictive performance, and ability to capture complex relationships among variables. In addition, recently developed automatic workflows provide a standardized approach to implementing statistical learning methods across applications. However, these tools highlight one of the main drawbacks of statistical learning: the lack of interpretability of the results. In the past few years, a substantial amount of research has focused on methods for interpreting black-box models, since interpretable statistical learning methods allow a deeper understanding of the fitted model. In problems where spatial information is relevant, combining interpretable methods with spatial data can lead to a better understanding of the problem and a better interpretation of the results. This paper focuses on the individual conditional expectation plot (ICE-plot), a model-agnostic method for interpreting statistical learning models, and combines it with spatial information. We propose an extension of the ICE-plot in which spatial information is used as a restriction to define spatial ICE (SpICE) curves. Spatial ICE curves are estimated on real data in the context of an economic problem concerning property valuation in Montevideo, Uruguay. Understanding the key factors that influence property valuation is essential for decision-making, and spatial data play a relevant role in this regard.
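
As a building block, the following sketch constructs plain ICE curves by hand (one predicted curve per observation, obtained by sweeping a single feature); SpICE then restricts these curves using spatial information. The model and data here are purely illustrative.

```python
# Minimal ICE-curve construction: one curve per observation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 300
X = rng.uniform(0, 1, (n, 3))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(n)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

feature, grid = 0, np.linspace(0, 1, 25)
ice = np.empty((n, grid.size))
for k, v in enumerate(grid):
    Xv = X.copy()
    Xv[:, feature] = v             # fix the feature of interest for everyone
    ice[:, k] = model.predict(Xv)  # one curve per observation

pdp = ice.mean(axis=0)             # averaging ICE curves gives the PDP
print("ICE curve matrix:", ice.shape, "  PDP range:", pdp.min(), pdp.max())
```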

Given a finite field F_q and a positive integer n, a flag is a sequence of nested F_q-subspaces of a vector space F_q^n and a flag code is a nonempty collection of flags. The projected codes of a flag code are the constant dimension codes containing all the subspaces of prescribed dimensions that form the flags in the flag code. In this paper we address the notion of equivalence for flag codes and explore in which situations such an equivalence can be reduced to the equivalence of the corresponding projected codes. In addition, this study leads to new results concerning the automorphism group of certain families of flag codes, some of them also introduced in this paper.
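
For readers less familiar with these objects, here is a concrete illustration of the definitions (our example, not taken from the paper), written in LaTeX:

```latex
% A flag of type (1, 2) on F_2^3: a nested sequence of subspaces of the
% prescribed dimensions.
\[
  \mathcal{F} = \bigl( \langle e_1 \rangle,\; \langle e_1, e_2 \rangle \bigr),
  \qquad
  \langle e_1 \rangle \subsetneq \langle e_1, e_2 \rangle \subsetneq \mathbb{F}_2^3 .
\]
% For a flag code C of type (1, 2), the projected codes collect the subspaces
% of each prescribed dimension; each is a constant dimension code:
\[
  \mathcal{C}_1 = \{ \mathcal{F}_1 : \mathcal{F} \in \mathcal{C} \}, \qquad
  \mathcal{C}_2 = \{ \mathcal{F}_2 : \mathcal{F} \in \mathcal{C} \} .
\]
```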
