
Parameter estimation in logistic regression is a well-studied problem, with the Newton-Raphson method being one of the most prominent optimization techniques used in practice. A number of monotone optimization methods, including minorization-maximization (MM) algorithms, expectation-maximization (EM) algorithms and related variational Bayes approaches, offer a family of useful alternatives guaranteed to increase the logistic regression likelihood at every iteration. In this article, we propose a modified version of a logistic regression EM algorithm which can substantially improve computational efficiency while preserving the monotonicity of EM and the simplicity of the EM parameter updates. By introducing an additional latent parameter and selecting this parameter to maximize the penalized observed-data log-likelihood at every iteration, our iterative algorithm can be interpreted as a parameter-expanded expectation-conditional maximization either (ECME) algorithm, and we demonstrate how to use the parameter-expanded ECME with an arbitrary choice of weights and penalty function. In addition, we describe a generalized version of our parameter-expanded ECME algorithm that can be tailored to the challenges encountered in specific high-dimensional problems, and we study several interesting connections between this generalized algorithm and other well-known methods. Performance comparisons between our method, the EM algorithm, and several other optimization methods are presented using a series of simulation studies based upon both real and synthetic datasets.
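
For context, here is a minimal sketch of the classical monotone EM baseline that such methods build on: a Polya-Gamma-style quadratic minorization reduces each M-step to a weighted ridge least-squares solve. This is the standard EM update, not the parameter-expanded ECME algorithm proposed above, and the ridge term stands in for an arbitrary penalty.

```python
import numpy as np

def em_logistic(X, y, n_iter=200, ridge=1e-4):
    """Monotone EM for logistic regression via a Polya-Gamma-style quadratic
    minorization; y is assumed to take values in {0, 1}."""
    n, p = X.shape
    beta = np.zeros(p)
    kappa = X.T @ (y - 0.5)                         # fixed across iterations
    for _ in range(n_iter):
        eta = X @ beta
        safe = np.where(np.abs(eta) < 1e-10, 1.0, eta)
        # E-step: expected weights tanh(t/2) / (2t), with limit 1/4 at t = 0
        w = np.where(np.abs(eta) < 1e-10, 0.25, np.tanh(safe / 2.0) / (2.0 * safe))
        # M-step: weighted ridge least squares, guaranteeing monotone ascent
        beta = np.linalg.solve(X.T @ (w[:, None] * X) + ridge * np.eye(p), kappa)
    return beta
```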

Related content

Logistic regression (English: logistic regression or logit regression), i.e., the logit model (also translated as "assessment model" or "classification assessment model"), is one of the discrete choice models. It belongs to the realm of multivariate analysis and is a commonly used method for empirical statistical analysis in sociology, biostatistics, clinical research, quantitative psychology, econometrics, marketing, and other fields. In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event occurring, such as pass/fail, win/lose, alive/dead, or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each object detected in the image is assigned a probability between 0 and 1, with the probabilities summing to one.
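
As a concrete illustration of the model just described, a generic sketch of the binary logistic function and its multi-class softmax extension (standard formulas, not tied to any particular implementation):

```python
import numpy as np

def sigmoid(z):
    """Binary logistic model: P(y = 1 | x) = 1 / (1 + exp(-x' beta))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multi-class extension: probabilities in (0, 1) that sum to one."""
    e = np.exp(z - np.max(z))                       # shift for numerical stability
    return e / e.sum()
```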

Score-based generative models are a popular class of generative modelling techniques relying on stochastic differential equations (SDE). From their inception, it was realized that it was also possible to perform generation using ordinary differential equations (ODE) rather than SDE. This led to the introduction of the probability flow ODE approach and denoising diffusion implicit models. Flow matching methods have recently further extended these ODE-based approaches and approximate a flow between two arbitrary probability distributions. Previous work derived bounds on the approximation error of diffusion models under the stochastic sampling regime, given assumptions on the $L^2$ loss. We present error bounds for the flow matching procedure using fully deterministic sampling, assuming an $L^2$ bound on the approximation error and a certain regularity condition on the data distributions.
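
To make the deterministic-sampling setting concrete, here is a minimal probability-flow-style sampler: given a learned velocity field (assumed available; the toy field below is purely illustrative), generation reduces to numerically integrating an ODE.

```python
import numpy as np

def integrate_flow(velocity, x0, n_steps=1000):
    """Deterministic sampling: explicit Euler integration of dx/dt = v(x, t)
    from t = 0 to t = 1, starting from reference samples x0."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x += dt * velocity(x, k * dt)
    return x

# toy usage: a contraction field pulling Gaussian samples toward the point (2, 2)
samples = integrate_flow(lambda x, t: 2.0 - x, np.random.randn(256, 2))
```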

In this paper, we design sublinear-space streaming algorithms for estimating three fundamental parameters -- maximum independent set, minimum dominating set and maximum matching -- on sparse graph classes, i.e., graphs which satisfy $m=O(n)$, where $m$ and $n$ denote the number of edges and vertices, respectively. Each of the three graph parameters we consider can have size $\Omega(n)$ even on sparse graph classes, and hence sublinear-space algorithms are restricted to parameter estimation rather than attempting to find a solution.

In this paper we derive sufficient conditions for the convergence of two popular alternating minimisation algorithms for dictionary learning: the Method of Optimal Directions (MOD) and Online Dictionary Learning (ODL), which can also be thought of as approximate K-SVD. We show that given a well-behaved initialisation that is either within distance at most $1/\log(K)$ of the generating dictionary or has a special structure ensuring that each element of the initialisation points to only one generating element, both algorithms converge with geometric rate to the generating dictionary. This holds even for data models with non-uniform distributions on the supports of the sparse coefficients, which allow the appearance frequency of the dictionary elements to vary heavily and thus model real data more closely.
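
A minimal sketch of one MOD iteration under thresholding-based sparse coding, as commonly assumed in such analyses; `s` is the assumed sparsity level, and the final rescaling keeps the atoms unit-norm.

```python
import numpy as np

def mod_step(Y, D, s):
    """One MOD iteration: thresholding-based sparse coding, then the
    least-squares dictionary update D = Y X^+ with renormalised atoms."""
    # sparse coding: keep the s largest-magnitude correlations per signal
    A = D.T @ Y                                     # (K, N) correlations
    X = np.zeros_like(A)
    idx = np.argsort(-np.abs(A), axis=0)[:s, :]     # top-s atom indices per column
    cols = np.arange(Y.shape[1])
    X[idx, cols] = A[idx, cols]
    # dictionary update: least squares, then rescale atoms to unit norm
    D_new = Y @ np.linalg.pinv(X)
    D_new /= np.maximum(np.linalg.norm(D_new, axis=0, keepdims=True), 1e-12)
    return D_new, X
```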

Many machine learning approaches for decision making, such as reinforcement learning, rely on simulators or predictive models to forecast the time-evolution of quantities of interest, e.g., the state of an agent or the reward of a policy. Forecasts of such complex phenomena are commonly described by highly nonlinear dynamical systems, making their use in optimization-based decision-making challenging. Koopman operator theory offers a beneficial paradigm for addressing this problem by characterizing forecasts via linear dynamical systems. This makes system analysis and long-term predictions simple -- involving only matrix multiplications. However, the transformation to a linear system is generally non-trivial and unknown, requiring learning-based approaches. While there exists a variety of approaches, they usually lack crucial learning-theoretic guarantees, such that the behavior of the obtained models with increasing data and dimensionality is often unclear. We address these shortcomings by deriving a novel reproducing kernel Hilbert space (RKHS) that solely spans transformations into linear dynamical systems. The resulting Koopman Kernel Regression (KKR) framework enables the use of statistical learning tools from function approximation for novel convergence results and generalization risk bounds under weaker assumptions than existing work. Our numerical experiments indicate advantages over state-of-the-art statistical learning approaches for Koopman-based predictors.
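
For intuition, a generic kernel-EDMD-style predictor fitted by kernel ridge regression on snapshot pairs; this is a standard baseline rather than the paper's KKR construction, and `gamma` and `lam` are illustrative hyperparameters.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between row-stacked states A (N, d) and B (M, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_koopman_predictor(X, Y, lam=1e-6, gamma=1.0):
    """Kernel ridge regression from states X[t] to successors Y[t] = X[t+1]."""
    K = rbf(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)    # (N, d) coefficients
    return lambda x: rbf(np.atleast_2d(x), X, gamma) @ alpha

# toy usage: learn a slightly noisy planar rotation from snapshot pairs
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
X = np.random.randn(200, 2)
predict = fit_koopman_predictor(X, X @ R.T + 0.01 * np.random.randn(200, 2))
```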

We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases for which a large class of proximal MCMC methods have been developed: (i) a nonsmooth prior combined with a Gaussian mixture likelihood; (ii) a Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling tasks arise from a wide range of applications in Bayesian inference and imaging inverse problems such as image deconvolution. We perform numerical simulations to compare the performance of the most commonly used Langevin Monte Carlo algorithms.
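
The simplest member of this family, the unadjusted Langevin algorithm, discretises the overdamped diffusion with an Euler-Maruyama step; a minimal sketch on a two-mode Gaussian mixture target (where multimodality makes mixing slow):

```python
import numpy as np

def ula(grad_log_pi, x0, step=1e-2, n_steps=10_000):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretisation of
    dX_t = grad log pi(X_t) dt + sqrt(2) dW_t (no Metropolis correction)."""
    x, xs = x0.copy(), []
    for _ in range(n_steps):
        noise = np.random.randn(*x.shape)
        x = x + step * grad_log_pi(x) + np.sqrt(2 * step) * noise
        xs.append(x.copy())
    return np.array(xs)

# target: two-mode Gaussian mixture 0.5 N(-2, 1) + 0.5 N(2, 1), whose score
# is grad log pi(x) = -x + 2 tanh(2 x)
chain = ula(lambda x: -x + 2.0 * np.tanh(2.0 * x), np.zeros(1))
```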

Modelling in biology must adapt to increasingly complex and massive data. The efficiency of the inference algorithms used to estimate model parameters is therefore questioned. Many of these are based on stochastic optimization processes which waste a significant part of the computation time due to their rejection sampling approaches. We introduce the Fixed Landscape Inference MethOd (\textit{flimo}), a new likelihood-free inference method for continuous state-space stochastic models. It applies deterministic gradient-based optimization algorithms to obtain a point estimate of the parameters, minimizing the distance between the observed data and simulations as measured through prescribed summary statistics. In this sense, it is analogous to Approximate Bayesian Computation (ABC). Like ABC, it can also provide an approximation of the distribution of the parameters. Three applications are proposed: a standard theoretical example, namely inference of the parameters of g-and-k distributions; a population genetics problem that is less simple than it appears, namely inference of a selective value from time series in a Wright-Fisher model; and simulations from a Ricker model, representing chaotic population dynamics. In the first two applications, the results show a drastic reduction of the computation time needed for the inference phase compared to other methods, with equivalent accuracy. Even when likelihood-based methods are applicable, the simplicity and efficiency of \textit{flimo} make it a compelling alternative. Implementations in Julia and in R are available on \url{//metabarcoding.org/flimo}. To run \textit{flimo}, the user must simply be able to simulate data according to the chosen model.
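
The fixed-landscape idea can be sketched as follows: freeze the random draws once so that the simulator becomes a deterministic function of the parameters, then apply an off-the-shelf optimiser. The function names below are illustrative and do not reflect the actual flimo interface.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def flimo_style_fit(theta0, observed_stats, simulate_with, n_draws=200, seed=0):
    """Freeze the uniform draws once, making the simulator a deterministic
    function of theta, then run a standard optimiser on the summary-statistic
    mismatch. Names are illustrative, not the flimo package API."""
    u = np.random.default_rng(seed).uniform(size=n_draws)   # fixed randomness
    def objective(theta):
        return np.sum((simulate_with(theta, u) - observed_stats) ** 2)
    return minimize(objective, theta0, method="L-BFGS-B").x

# toy normal model: X_i = mu + sigma * Phi^{-1}(U_i); summaries = (mean, std)
sim = lambda th, u: np.array([(th[0] + th[1] * norm.ppf(u)).mean(),
                              (th[0] + th[1] * norm.ppf(u)).std()])
theta_hat = flimo_style_fit(np.array([1.0, 2.0]), np.array([0.0, 1.0]), sim)
```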

Kakutani's Fixed Point Theorem is a fundamental theorem in topology with numerous applications in game theory and economics. Computational formulations of Kakutani exist only in special cases and are too restrictive to be useful in reductions. In this paper, we provide a general computational formulation of Kakutani's Fixed Point Theorem and we prove that it is PPAD-complete. As an application of our theorem we characterize the computational complexity of the following fundamental problems: (1) Concave Games. Introduced by the celebrated works of Debreu and Rosen in the 1950s and 60s, concave $n$-person games have found many important applications in Economics and Game Theory. We characterize the computational complexity of finding an equilibrium in such games. We show that a general formulation of this problem belongs to PPAD, and that finding an equilibrium is PPAD-hard even for rather restricted games of this kind: strongly-concave utilities that can be expressed as multivariate polynomials of constant degree with axis-aligned box constraints. (2) Walrasian Equilibrium. Using our computational formulation of Kakutani's theorem, the tool with which Arrow and Debreu proved the existence of Walrasian equilibria, we resolve an open problem related to Walras's theorem on the existence of price equilibria in general economies. There are many results about the PPAD-hardness of Walrasian equilibria, but inclusion in PPAD is only known for piecewise-linear utilities. We show that the problem with general convex utilities is in PPAD. Along the way we provide a Lipschitz-continuous version of Berge's maximum theorem that may be of independent interest.

Stein Variational Gradient Descent (SVGD) can transport particles along trajectories that reduce the KL divergence between the particle distribution and the target, but it requires the target score function to compute the update. We introduce a new perspective on SVGD that views it as a local estimator of the reversed KL gradient flow. This perspective inspires us to propose new estimators that use local linear models to achieve the same purpose. The proposed estimators can be computed using only samples from the target and particle distributions, without needing the target score function, and the local linear models keep them computationally simple while maintaining estimation biases comparable to SVGD. Additionally, we demonstrate that under a mild assumption, the estimation of high-dimensional gradient flow can be translated into a lower-dimensional estimation problem, leading to improved estimation accuracy. We validate our claims with experiments on both simulated and real-world datasets.
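
For reference, one standard SVGD step with an RBF kernel, i.e. the score-based update that the proposed sample-only estimators replace:

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update with an RBF kernel k(a, b) = exp(-||a - b||^2 / (2h)):
    phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # diff[i, j] = x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h))      # (n, n) kernel matrix
    # repulsive term: grad_{x_j} k(x_j, x_i) = (x_i - x_j) k / h
    repulsion = (diff * K[:, :, None]).sum(axis=1) / h
    phi = (K @ grad_log_p(x) + repulsion) / n
    return x + step * phi

# usage: particles drifting toward a standard normal target (score = -x)
particles = np.random.randn(100, 2) * 3.0
for _ in range(200):
    particles = svgd_step(particles, lambda x: -x)
```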

In the rapidly evolving field of deep learning, the performance of model inference has become a pivotal concern as models grow more complex and are deployed in diverse applications. Among these, autoregressive models stand out due to their state-of-the-art performance in numerous generative tasks. These models, by design, harness a temporal dependency structure, where the current token's probability distribution is conditioned on preceding tokens. This inherently sequential characteristic, however, adheres to the Markov chain assumption and lacks temporal parallelism, which poses unique challenges. The absence of parallelism is even more pronounced in industrial contexts, where inference requests arrive following a Poisson time distribution and require diverse response lengths. Existing solutions, such as dynamic batching and concurrent model instances, come with severe overheads and a lack of flexibility; these coarse-grained methods fall short of achieving optimal latency and throughput. To address these shortcomings, we propose Flover, a temporal fusion framework for efficient inference in autoregressive models that eliminates the need for heuristic settings and applies to a wide range of inference scenarios. By providing finer-grained parallelism on the temporality of requests and employing an efficient memory shuffle algorithm, Flover achieves up to 11x faster inference on GPT models compared to the cutting-edge solutions provided by NVIDIA Triton FasterTransformer. Crucially, by leveraging the advanced tensor parallel technique, Flover proves efficacious across diverse computational landscapes, from single-GPU setups to multi-node scenarios, thereby offering robust performance optimization that transcends hardware boundaries.

Missing values challenge probabilistic wind power forecasting at both the parameter estimation and operational forecasting stages. In this paper, we show that forecasting functions can be conveniently estimated for each missing pattern, and we propose an adaptive quantile regression model whose parameters adapt to the missingness pattern. To that end, we design a feature extraction block within the quantile regression model whose parameters are set as a function of the missingness pattern and account only for observed values. To avoid the quantile-crossing phenomenon, we design a multi-task model that ensures the monotonicity of quantiles, where higher quantiles are derived by adding non-negative increments, modeled by neural networks, to lower quantiles. The proposed approach is distribution-free and applicable to both missing-at-random and missing-not-at-random cases. Case studies demonstrate that the proposed approach achieves state-of-the-art performance in terms of the continuous ranked probability score.
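
The monotone-quantile construction can be sketched directly: the lowest quantile plus cumulative non-negative increments, trained with the pinball loss. A minimal sketch assuming hypothetical network outputs `base` and `raw_increments`:

```python
import numpy as np

def monotone_quantiles(base, raw_increments):
    """Crossing-free quantiles: q_1 = base, q_{k+1} = q_k + softplus(r_k),
    so higher quantiles never fall below lower ones."""
    inc = np.log1p(np.exp(raw_increments))          # softplus keeps increments >= 0
    upper = base[:, None] + np.cumsum(inc, axis=1)
    return np.concatenate([base[:, None], upper], axis=1)

def pinball_loss(y, q_pred, taus):
    """Average pinball (quantile) loss over quantile levels taus."""
    diff = y[:, None] - q_pred
    return np.mean(np.maximum(taus * diff, (taus - 1.0) * diff))

# toy usage with assumed network outputs for two samples and levels (.1, .5, .9)
taus = np.array([0.1, 0.5, 0.9])
q = monotone_quantiles(np.array([1.0, 2.0]), np.random.randn(2, 2))
loss = pinball_loss(np.array([1.2, 1.8]), q, taus)
```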
