We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP `denoising' functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
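
As a rough illustration of the two special cases described above, the following minimal numpy sketch generates observations from a mixed linear regression and from a max-affine regression; the dimensions, noise level, and variable names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, L = 2000, 50, 3                      # observations, dimension, number of signals (illustrative)
B = rng.normal(size=(p, L))                # the L signal vectors, stored as columns
X = rng.normal(size=(n, p)) / np.sqrt(p)   # design matrix
noise = 0.1 * rng.normal(size=n)

# Mixed linear regression: each observation uses one latent signal, unknown to the estimator.
c = rng.integers(L, size=n)                                  # latent labels
y_mixed = np.einsum('ij,ij->i', X, B[:, c].T) + noise        # <x_i, beta_{c_i}> + noise

# Max-affine regression: each observation is the maximum of L affine functions.
b = rng.normal(size=L)                                       # intercepts
y_maxaffine = np.max(X @ B + b, axis=1) + noise
```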

Related content

Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative relationship of mutual dependence between two or more variables; it is very widely used. It is expressed as y = w'x + e, where e is an error term following a normal distribution with mean 0.
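
A minimal sketch of this model: simulate data from y = w'x + e with mean-zero Gaussian noise and recover w by ordinary least squares (all sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
w_true = rng.normal(size=p)
X = rng.normal(size=(n, p))
e = rng.normal(scale=0.5, size=n)               # noise with mean 0
y = X @ w_true + e

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate of w
print(np.round(w_true, 2))
print(np.round(w_hat, 2))
```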


This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. More specifically, we consider a structured stochastic weakly convex objective function which is the composition of a convex function and a smooth nonconvex function. Recently, Xu et al. 2022 showed that an inertial stochastic subgradient method converges at a rate of $\mathcal{O}(\tau/\sqrt{K})$, which suffers a significant penalty from the maximum information delay $\tau$. To alleviate this issue, we propose a new delayed stochastic prox-linear ($\texttt{DSPL}$) method in which the master performs the proximal update of the parameters and the workers only need to linearly approximate the inner smooth function. Somewhat surprisingly, we show that the delays only affect the higher-order term in the complexity rate and are hence negligible after a certain number of $\texttt{DSPL}$ iterations. Moreover, to further improve the empirical performance, we propose a delayed stochastic extrapolated prox-linear ($\texttt{DSEPL}$) method which employs Polyak-type momentum to speed up convergence. Building on the tools for analyzing $\texttt{DSPL}$, we also develop an improved analysis of the delayed stochastic subgradient method ($\texttt{DSGD}$). In particular, for general weakly convex problems, we show that the convergence of $\texttt{DSGD}$ depends only on the expected delay.
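
The structure of the prox-linear update described above (a worker linearizes the inner smooth function, the master solves a proximal subproblem) can be sketched on a toy weakly convex composite problem. The sketch below is not the paper's DSPL implementation: it simulates a single worker with a fixed delay of one iteration, uses robust phase retrieval as the composite objective, and solves the master's subproblem with cvxpy; all names and constants are illustrative.

```python
import numpy as np
import cvxpy as cp

# Toy weakly convex composite: f(x) = (1/m) * sum_i |(a_i^T x)^2 - b_i|  (robust phase retrieval).
rng = np.random.default_rng(0)
d, m = 10, 200
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
b = (A @ x_true) ** 2

def worker_linearize(x_stale):
    """Worker: evaluate the inner smooth map c(x) = (Ax)^2 - b and its Jacobian
    at a (possibly stale) iterate; this is the only work the worker does."""
    r = A @ x_stale
    return r ** 2 - b, 2.0 * r[:, None] * A

def master_prox_step(x, x_stale, lam=0.05):
    """Master: proximal update of the parameters using the worker's stale linearization."""
    c_val, J = worker_linearize(x_stale)
    z = cp.Variable(d)
    model = cp.sum(cp.abs(c_val + J @ (z - x_stale))) / m
    cp.Problem(cp.Minimize(model + cp.sum_squares(z - x) / (2 * lam))).solve()
    return z.value

x = rng.normal(size=d)
x_stale = x.copy()                     # simulate a fixed information delay of one iteration
for _ in range(30):
    x_new = master_prox_step(x, x_stale)
    x_stale, x = x, x_new

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)) / np.linalg.norm(x_true)
print("relative error (up to sign):", round(err, 3))
```

In the actual method the workers compute stochastic linearizations over mini-batches and the delays are arbitrary; the point of the sketch is only the division of labor between linearization and proximal update.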

Sampling-based planning algorithms are a powerful tool for solving planning problems in high-dimensional state spaces. In this article, we present a novel approach to sampling in the most promising regions, which significantly reduces planning time. The RRT# algorithm defines the Relevant Region based on the cost-to-come provided by the optimal forward-searching tree; however, it uses the cumulative cost of a direct connection between the current state and the goal state as the cost-to-go. To improve path planning efficiency, we propose a batch sampling method that samples in a refined Relevant Region with a direct sampling strategy, defined according to the optimal cost-to-come and an adaptive cost-to-go that takes advantage of various sources of heuristic information. The proposed sampling approach allows the algorithm to grow the search tree toward the most promising area, resulting in better initial solution quality and lower overall computation time than related work. To validate the effectiveness of our method, we conducted several simulations in both $SE(2)$ and $SE(3)$ state spaces; the results demonstrate the superiority of the proposed algorithm.
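
A hedged 2D illustration of relevant-region sampling: given a toy tree with cost-to-come values, a sample is kept only if passing through some tree node and then heading to the goal could still beat the current best solution cost. The adaptive cost-to-go and batch mechanics of the proposed method are not reproduced here; the Euclidean distance to the goal simply stands in as an admissible heuristic, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy search tree: node positions and stand-in cost-to-come values g(v).
nodes = rng.uniform(0, 10, size=(200, 2))
start, goal = np.array([0.5, 0.5]), np.array([9.5, 9.5])
g = 1.2 * np.linalg.norm(nodes - start, axis=1)        # stand-in cost-to-come
c_best = 16.0                                          # cost of the current best solution

def in_relevant_region(x):
    """Keep x only if some node can reach x and then the goal below the best cost;
    straight-line distance to the goal is used as an admissible cost-to-go heuristic."""
    through_tree = g + np.linalg.norm(nodes - x, axis=1)
    return np.min(through_tree) + np.linalg.norm(goal - x) < c_best

batch = [x for x in rng.uniform(0, 10, size=(5000, 2)) if in_relevant_region(x)]
print(f"kept {len(batch)} of 5000 samples inside the relevant region")
```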

This paper presents an algorithm that generates the conditional moment inequalities that characterize the identified set of the common parameter of various semi-parametric panel multinomial choice models. I consider both static and dynamic models, and consider various weak stochastic restrictions on the distribution of observed and unobserved components of the models. For a broad class of such stochastic restrictions, the paper demonstrates that the inequalities characterizing the identified set of the common parameter can be obtained as solutions of multiple objective linear programs (MOLPs), thereby transforming the task of finding these inequalities into a purely computational problem. The algorithm that I provide reproduces many well-known results, including the conditional moment inequalities derived in Manski 1987, Pakes and Porter 2023, and Khan, Ponomareva, and Tamer 2023. Moreover, I use the algorithm to generate some new results, by providing characterizations of the identified set in some cases that were left open in Pakes and Porter 2023 and Khan, Ponomareva, and Tamer 2023, as well as characterizations of the identified set under alternative stochastic restrictions.

In this paper, we explore the feasibility of finding algorithm implementations from code. Successfully matching code and algorithms can help understand unknown code, provide reference implementations, and automatically collect data for learning-based program synthesis. To achieve this goal, we designed a new language named p-language to specify algorithms and a static analyzer for the p-language that automatically extracts control flow, math, and natural language information from the algorithm descriptions. We embedded the output of the p-language (p-code) and source code in a common vector space using self-supervised machine learning methods to match algorithms with code without any manual annotation. We developed a tool named Beryllium: it takes pseudo code as a query and returns a list of ranked code snippets that likely match the algorithm query. Our evaluation on the Stony Brook Algorithm Repository and popular GitHub projects shows that Beryllium significantly outperformed state-of-the-art code search tools in both C and Java. Specifically, for 98.5%, 93.8%, and 66.2% of queries, we found the algorithm implementations in the top 25, top 10, and top 1 ranked results, respectively. Given 87 algorithm queries, we found implementations for 74 algorithms in GitHub projects where we did not previously know the algorithms.
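
The retrieval step described above (embed the query and the candidate code snippets in a common vector space, then rank by similarity) can be illustrated with a toy stand-in: here a TF-IDF embedding replaces the paper's self-supervised p-code and code embeddings, purely to show the ranking interface. The snippets and query below are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus of code snippets; in Beryllium these would be real repository code,
# and the vectors would come from the learned model rather than TF-IDF.
snippets = [
    "def bfs(graph, s): queue = [s]; visited = {s} ...",
    "def quicksort(a): pivot = a[0]; left = [x for x in a[1:] if x < pivot] ...",
    "def dijkstra(graph, src): dist = {src: 0}; heap = [(0, src)] ...",
]
query = "shortest path with priority queue relax edge distances"   # would be p-code in the paper

vec = TfidfVectorizer().fit(snippets + [query])
scores = cosine_similarity(vec.transform([query]), vec.transform(snippets))[0]
for score, snip in sorted(zip(scores, snippets), reverse=True):
    print(f"{score:.2f}  {snip[:40]}")
```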

We consider a type of pull voting suitable for discrete numeric opinions which can be compared on a linear scale, for example, 1 ('disagree strongly'), 2 ('disagree'), $\ldots,$ 5 ('agree strongly'). On observing the opinion of a random neighbour, a vertex changes its opinion incrementally towards the value of the neighbour's opinion, if different. For opinions drawn from a set $\{1,2,\ldots,k\}$, the opinion of the vertex would change by $+1$ if the opinion of the neighbour is larger, or by $-1$ if it is smaller. It is not clear how to predict the outcome of this process, but we observe that the total weight of the system, that is, the sum of the individual opinions of all vertices, is a martingale. This allows us to analyse the outcome of the process on some classes of dense expanders such as clique graphs $K_n$ and random graphs $G_{n,p}$ for suitably large $p$. If the average of the original opinions satisfies $i \le c \le i+1$ for some integer $i$, then the asymptotic probability that opinion $i$ wins is $i+1-c$, and the probability that opinion $i+1$ wins is $c-i$. With high probability, the winning opinion cannot be other than $i$ or $i+1$. To contrast this, we show that for a path and opinions $0,1,2$ arranged initially in non-decreasing order along the path, the outcome is very different. Any of the opinions can win with constant probability, provided that each of the two extreme opinions $0$ and $2$ is initially supported by a constant fraction of vertices.
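
A short simulation of these dynamics on the complete graph $K_n$ is sketched below; the graph size and opinion scale are illustrative. The initial average is printed because the total weight is a martingale, so the winning opinion should be its floor or ceiling with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5                               # vertices, opinion scale 1..k (illustrative)
opinion = rng.integers(1, k + 1, size=n)
print("initial average opinion:", opinion.mean())

while opinion.min() != opinion.max():
    v = rng.integers(n)                     # vertex that updates
    u = rng.integers(n)                     # random pulled neighbour (any other vertex on K_n)
    if u == v:
        continue
    if opinion[u] > opinion[v]:
        opinion[v] += 1                     # move one step towards the neighbour's opinion
    elif opinion[u] < opinion[v]:
        opinion[v] -= 1

print("winning opinion:", opinion[0])
```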

Signature transforms are iterated path integrals of continuous and discrete-time time series data, and their universal nonlinearity linearizes the problem of feature selection. This paper revisits the consistency issue of Lasso regression for the signature transform, both theoretically and numerically. Our study shows that, for processes and time series that are closer to Brownian motion or random walk with weaker inter-dimensional correlations, the Lasso regression is more consistent for their signatures defined by It\^o integrals; for mean reverting processes and time series, their signatures defined by Stratonovich integrals have more consistency in the Lasso regression. Our findings highlight the importance of choosing appropriate definitions of signatures and stochastic models in statistical inference and machine learning.
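
The two conventions can be made concrete for a discrete path: the sketch below computes depth-2 signature terms using either a left-point (Itô-type) or a midpoint (Stratonovich-type) iterated sum and feeds them to a Lasso fit on random-walk paths with an invented response. This only illustrates the definitions; it is not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import Lasso

def signature_level2(path, stratonovich=False):
    """Depth-2 signature terms of a discrete path of shape (T, d)."""
    inc = np.diff(path, axis=0)                          # increments dX_t
    level1 = path[-1] - path[0]                          # S^(i) = X^i_T - X^i_0
    left = path[:-1] - path[0]                           # left-point value before each increment
    run = left + 0.5 * inc if stratonovich else left     # midpoint value for Stratonovich
    level2 = np.einsum('ti,tj->ij', run, inc)            # S^(i,j) = sum_t (X^i_t - X^i_0) dX^j_t
    return np.concatenate([level1, level2.ravel()])

rng = np.random.default_rng(0)
n_paths, T, d = 300, 100, 2
paths = np.cumsum(rng.normal(size=(n_paths, T, d)), axis=1)   # random-walk paths
y = np.array([p[-1, 0] * p[-1, 1] for p in paths])            # toy response (illustrative)
y = (y - y.mean()) / y.std()

for name, strat in [("Ito", False), ("Stratonovich", True)]:
    F = np.array([signature_level2(p, stratonovich=strat) for p in paths])
    F = F / F.std(axis=0)                                      # put features on a common scale
    model = Lasso(alpha=0.1, max_iter=100000).fit(F, y)
    print(name, "selected signature features:", np.flatnonzero(model.coef_))
```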

In this paper, we study the problem of robust sparse mean estimation, where the goal is to estimate a $k$-sparse mean from a collection of partially corrupted samples drawn from a heavy-tailed distribution. Existing estimators face two critical challenges in this setting. First, they are limited by a conjectured computational-statistical tradeoff, implying that any computationally efficient algorithm needs $\tilde\Omega(k^2)$ samples, while its statistically-optimal counterpart only requires $\tilde O(k)$ samples. Second, the existing estimators fall short of practical use as they scale poorly with the ambient dimension. This paper presents a simple mean estimator that overcomes both challenges under moderate conditions: it runs in near-linear time and memory (both with respect to the ambient dimension) while requiring only $\tilde O(k)$ samples to recover the true mean. At the core of our method lies an incremental learning phenomenon: we introduce a simple nonconvex framework that can incrementally learn the top-$k$ nonzero elements of the mean while keeping the zero elements arbitrarily small. Unlike existing estimators, our method does not need any prior knowledge of the sparsity level $k$. We prove the optimality of our estimator by providing a matching information-theoretic lower bound. Finally, we conduct a series of simulations to corroborate our theoretical findings. Our code is available at //github.com/huihui0902/Robust_mean_estimation.

These notes are an overview of some classical linear methods in Multivariate Data Analysis. This is a good old domain, well established since the 60's, and regularly refreshed as a key step in statistical learning. It can be presented as part of statistical learning, or as dimensionality reduction with a geometric flavor. Both approaches are tightly linked: it is easier to learn patterns from data in low-dimensional spaces than in high-dimensional ones. It is shown how a diversity of methods and tools boil down to a single core method, PCA computed with the SVD, so that efforts to optimize code for analyzing massive data sets (such as distributed-memory and task-based programming) or to improve the efficiency of the algorithms (such as randomized SVD) can focus on this shared core method and benefit all of the methods.
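
The shared core method, PCA computed via the SVD of the centered data matrix, fits in a few lines of numpy; the data here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated synthetic data, 200 x 10

Xc = X - X.mean(axis=0)                          # center the columns
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]                        # principal component scores (first two PCs)
explained = s**2 / np.sum(s**2)                  # fraction of variance carried by each component
print("variance explained by PC1, PC2:", np.round(explained[:2], 3))
```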

While many languages possess processes of joining two or more words to create compound words, previous studies have been typically limited only to languages with excessively productive compound formation (e.g., German, Dutch) and there is no public dataset containing compound and non-compound words across a large number of languages. In this work, we systematically study decompounding, the task of splitting compound words into their constituents, at a wide scale. We first address the data gap by introducing a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary. We then use this dataset to evaluate an array of Large Language Models (LLMs) on the decompounding task. We find that LLMs perform poorly, especially on words which are tokenized unfavorably by subword tokenization. We thus introduce a novel methodology to train dedicated models for decompounding. The proposed two-stage procedure relies on a fully self-supervised objective in the first stage, while the second, supervised learning stage optionally fine-tunes the model on the annotated Wiktionary data. Our self-supervised models outperform the prior best unsupervised decompounding models by 13.9% accuracy on average. Our fine-tuned models outperform all prior (language-specific) decompounding tools. Furthermore, we use our models to leverage decompounding during the creation of a subword tokenizer, which we refer to as CompoundPiece. CompoundPiece tokenizes compound words more favorably on average, leading to improved performance on decompounding over an otherwise equivalent model using SentencePiece tokenization.

Nonparametric maximum likelihood estimators (MLEs) in inverse problems often have non-normal limit distributions, like Chernoff's distribution. However, if one considers smooth functionals of the model, with corresponding functionals of the MLE, one gets normal limit distributions and faster rates of convergence. We demonstrate this for a model for the incubation time of a disease. The usual approach in the latter models is to use parametric distributions, like Weibull and gamma distributions, which leads to inconsistent estimators. Smoothed bootstrap methods are discussed for constructing confidence intervals. The classical bootstrap, based on the nonparametric MLE itself, has been proved to be inconsistent in this situation.
