
In this paper, I propose a general algorithm for multiple change point analysis via multivariate distribution-free nonparametric testing based on the concept of ranks defined by measure transportation. Multivariate ranks share an important property with the usual one-dimensional ranks: both are distribution-free. This property allows for the construction of nonparametric tests that are distribution-free under the null hypothesis. The method has applications in a variety of fields; in this paper I apply the algorithm to a microarray dataset for individuals with bladder tumors, an ECoG snapshot for a patient with epilepsy, and trajectories of CASI scores by education level and dementia status, where each change point denotes a shift in the rate of change of the Cognitive Abilities score over the years, indicating the presence of preclinical dementia. I estimate the number of change points and each of their locations within a multivariate series of time-ordered observations. Rather than assuming, as many works in this area do, that the time series observations follow a parametric model or that there is a single change point, this paper examines the multiple change point question in a broad setting in which the observed distributions and the number of change points are unspecified. The objective is an algorithm for change point detection that makes as few assumptions about the dataset as possible. I present the theoretical properties of this new algorithm and the conditions under which the approximate number of change points and their locations can be estimated. The algorithm is implemented in the R package recp, which is available on GitHub; a section of this paper is dedicated to the execution of the procedure and the use of the recp package.
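As a minimal illustration of the measure-transportation ranks the abstract builds on, the sketch below computes empirical multivariate ranks by optimally matching the sample to a fixed set of reference points under squared Euclidean cost. This is a generic construction under our own assumptions; the helper name `multivariate_ranks` is ours and is not part of the recp package.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def multivariate_ranks(x, rng=None):
    """Empirical multivariate ranks via optimal assignment.

    Each of the n sample points is matched to one of n fixed
    reference points (here, a uniform sample on [0, 1]^d) so that
    the total squared Euclidean transport cost is minimized; the
    reference point matched to observation i serves as its rank.
    Hypothetical helper, not the recp API.
    """
    rng = np.random.default_rng(rng)
    n, d = x.shape
    u = rng.uniform(size=(n, d))                # reference ("rank") points
    cost = ((x[:, None, :] - u[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)    # optimal matching
    ranks = np.empty_like(u)
    ranks[rows] = u[cols]
    return ranks

# The ranks are distribution-free: their joint law depends only on the
# reference points, not on the distribution that generated x.
x = np.random.default_rng(0).normal(size=(50, 3))
r = multivariate_ranks(x, rng=1)
```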

Related Content

We study regression discontinuity designs in which many covariates, possibly many more than the number of observations, are available. We consider a two-step algorithm which first selects the set of covariates to be used through a localized Lasso-type procedure, and then, in a second step, estimates the treatment effect by including the selected covariates into the usual local linear estimator. We provide an in-depth analysis of the algorithm's theoretical properties, showing that, under an approximate sparsity condition, the resulting estimator is asymptotically normal, with asymptotic bias and variance that are conceptually similar to those obtained in low-dimensional settings. Bandwidth selection and inference can be carried out using standard methods. We also provide simulations and an empirical application.
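A minimal sketch of the two-step idea, under our own simplifying assumptions (a triangular kernel, one shared bandwidth, and scikit-learn's Lasso for the localized selection step); it is not the authors' exact estimator.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rdd_two_step(y, running, covariates, cutoff=0.0, h=1.0, lam=0.1):
    """Hedged sketch of the two-step RDD estimator (names are ours).

    Step 1: a kernel-localized Lasso selects covariates near the
    cutoff.  Step 2: the treatment effect is the coefficient on the
    treatment indicator in a kernel-weighted local linear regression
    that adds back the selected covariates.
    """
    r = running - cutoff
    w = np.clip(1.0 - np.abs(r) / h, 0.0, None)   # triangular kernel
    keep = w > 0
    y_, r_, z_, w_ = y[keep], r[keep], covariates[keep], w[keep]
    t_ = (r_ >= 0).astype(float)                  # treatment indicator
    base = np.column_stack([t_, r_, t_ * r_])     # local linear terms

    # Step 1: localized Lasso over [local terms | covariates].
    lasso = Lasso(alpha=lam)
    lasso.fit(np.column_stack([base, z_]), y_, sample_weight=w_)
    selected = np.flatnonzero(lasso.coef_[base.shape[1]:])

    # Step 2: kernel-weighted local linear fit with selected covariates.
    X = np.column_stack([np.ones(len(y_)), base, z_[:, selected]])
    sw = np.sqrt(w_)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_ * sw, rcond=None)
    return beta[1]    # jump at the cutoff: the treatment-effect estimate
```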

We develop the theory of a metric, which we call the $\nu$-based Wasserstein metric and denote by $W_\nu$, on the set of probability measures $\mathcal P(X)$ on a domain $X \subseteq \mathbb{R}^m$. This metric is based on a slight refinement of the notion of generalized geodesics with respect to a base measure $\nu$ and is relevant in particular for the case when $\nu$ is singular with respect to $m$-dimensional Lebesgue measure; it is also closely related to the concept of linearized optimal transport. The $\nu$-based Wasserstein metric is defined in terms of an iterated variational problem involving optimal transport to $\nu$; we also characterize it in terms of integrals of the classical Wasserstein distance between conditional probabilities, and through limits of certain multi-marginal optimal transport problems. As we vary the base measure $\nu$, the $\nu$-based Wasserstein metric interpolates between the usual quadratic Wasserstein distance and a metric associated with the uniquely defined generalized geodesics obtained when $\nu$ is sufficiently regular. When $\nu$ concentrates on a lower dimensional submanifold of $\mathbb{R}^m$, we prove that the variational problem in the definition of the $\nu$-based Wasserstein distance has a unique solution. We establish geodesic convexity of the usual class of functionals and of the set of source measures $\mu$ such that optimal transport between $\mu$ and $\nu$ satisfies a strengthening of the generalized nestedness condition introduced in \cite{McCannPass20}. We also present two applications of the ideas introduced here. First, our dual metric is used to prove convergence of an iterative scheme to solve a variational problem arising in game theory. We also use the multi-marginal formulation to characterize solutions to the multi-marginal problem by an ordinary differential equation, yielding a new numerical method for it.
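For orientation, the closely related linearized optimal transport distance mentioned in the abstract has the following standard form, where $T_{\mu_i}$ denotes the optimal transport map from $\nu$ to $\mu_i$ (assuming such maps exist); the $\nu$-based metric itself refines this construction via the iterated variational problem described above.

```latex
% Linearized optimal transport with base measure \nu: embed each
% measure \mu_i via its optimal map T_{\mu_i} from \nu and compare
% the maps in L^2(\nu).
\[
  W_{2,\nu}(\mu_0, \mu_1)
  = \Bigl( \int_X \bigl| T_{\mu_0}(x) - T_{\mu_1}(x) \bigr|^2 \, d\nu(x) \Bigr)^{1/2},
  \qquad (T_{\mu_i})_{\#}\,\nu = \mu_i .
\]
```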

We propose a new method for multivariate response regression and covariance estimation when elements of the response vector are of mixed types, for example some continuous and some discrete. Our method is based on a model which assumes the observable mixed-type response vector is connected to a latent multivariate normal response linear regression through a link function. We explore the properties of this model and show its parameters are identifiable under reasonable conditions. We impose no parametric restrictions on the covariance of the latent normal other than positive definiteness, thereby avoiding assumptions about unobservable variables which can be difficult to verify in practice. To accommodate this generality, we propose a novel algorithm for approximate maximum likelihood estimation that works "off-the-shelf" with many different combinations of response types, and which scales well in the dimension of the response vector. Our method typically gives better predictions and parameter estimates than fitting separate models for the different response types and allows for approximate likelihood ratio testing of relevant hypotheses such as independence of responses. The usefulness of the proposed method is illustrated in simulations and in one biomedical and one genomic data example.
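A minimal generative sketch of the model class described, assuming an identity link for continuous coordinates and a probit-style threshold link for binary ones; it illustrates the latent structure only and is not the paper's estimation algorithm.

```python
import numpy as np

def simulate_mixed_response(X, B, Sigma, binary_cols, rng=None):
    """Draw mixed-type responses from a latent normal regression.

    Latent layer:  W = X @ B + E,  with rows of E drawn from N(0, Sigma).
    Link:          continuous coordinates are observed directly;
                   coordinates in `binary_cols` are observed as
                   1{W > 0}, a probit-style threshold link.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    W = X @ B + rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    Y = W.copy()
    Y[:, binary_cols] = (W[:, binary_cols] > 0).astype(float)
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
B = rng.normal(size=(3, 4))
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + np.eye(4)   # positive definite, otherwise unrestricted
Y = simulate_mixed_response(X, B, Sigma, binary_cols=[2, 3], rng=1)
```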

We present two algorithms designed to learn a pattern of correspondence between two data sets in situations where it is desirable to match elements that exhibit a relationship belonging to a known parametric model. In the motivating case study, the challenge is to better understand micro-RNA (miRNA) regulation in the striatum of Huntington's disease (HD) model mice. The two data sets contain miRNA and messenger-RNA (mRNA) data, respectively, each data point consisting of a multi-dimensional profile. The biological hypothesis is that if a miRNA induces the degradation of a target mRNA or blocks its translation into proteins, or both, then the profile of the former should be similar to minus the profile of the latter (a particular form of affine relationship). The algorithms unfold in two stages. During the first stage, an optimal transport plan P and an optimal affine transformation are learned, using the Sinkhorn-Knopp algorithm and a mini-batch gradient descent. During the second stage, P is exploited to derive either several co-clusters or several sets of matched elements. We share code that implements our algorithms. A simulation study illustrates how they work and perform. A brief summary of the real data application in the motivating case study further illustrates the applicability and interest of the algorithms.
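The first stage alternates an optimal-transport step with an affine-fit step; the sketch below shows only the standard Sinkhorn-Knopp iteration for the OT step on a toy problem. The affine update and mini-batch gradient descent are omitted, and the cost construction assumes the hypothesized "minus the profile" relationship.

```python
import numpy as np

def sinkhorn(C, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan between two uniform marginals.

    Standard Sinkhorn-Knopp: alternately rescale rows and columns of
    the Gibbs kernel K = exp(-C / eps) until both marginals match.
    """
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # the transport plan P

# Matching step: pair miRNA profile x_i with mRNA profile y_j when
# P[i, j] is large, after mapping y to -y (the hypothesized relationship).
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(30, 5)), rng.normal(size=(40, 5))
C = ((X[:, None, :] - (-Y)[None, :, :]) ** 2).sum(axis=2)
P = sinkhorn(C / C.max())   # rescale the cost to avoid underflow in exp
```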

Existing high-dimensional statistical methods are largely established for analyzing individual-level data. In this work, we study estimation and inference for high-dimensional linear models where we only observe "proxy data", which include the marginal statistics and sample covariance matrix that are computed based on different sets of individuals. We develop a rate optimal method for estimation and inference for the regression coefficient vector and its linear functionals based on the proxy data. Moreover, we show the intrinsic limitations of proxy-data-based inference: the minimax optimal rate for estimation is slower than that in the conventional case where individual data are observed, and the power for testing and multiple testing does not go to one as the signal strength goes to infinity. These interesting findings are illustrated through simulation studies and an analysis of a dataset concerning the genetic associations of hindlimb muscle weight in a mouse population.
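To make the proxy-data setting concrete, here is a hedged sketch of a Lasso computed from summary statistics alone, taking the sample covariance matrix and marginal statistics as inputs in place of individual-level data; the paper's rate-optimal procedure is more involved than this.

```python
import numpy as np

def summary_stat_lasso(S, s, lam, n_iter=200):
    """Lasso from proxy data only (a hedged sketch, not the paper's
    exact estimator).

    Inputs are summary statistics rather than individual data:
      S -- sample covariance (Gram) matrix, p x p
      s -- marginal statistics, e.g. X^T y / n, length p
    Solves  min_b  0.5 * b' S b - s' b + lam * ||b||_1
    by cyclic coordinate descent with soft-thresholding.
    """
    p = len(s)
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual correlation for coordinate j.
            rho = s[j] - S[j] @ b + S[j, j] * b[j]
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / S[j, j]
    return b
```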

The study of multiphase flow is essential for understanding the complex interactions of various materials. In particular, when designing chemical reactors such as fluidized bed reactors (FBRs), a detailed understanding of the hydrodynamics is critical for optimizing reactor performance and stability. An FBR allows experts to conduct different types of chemical reactions involving multiphase materials, especially interactions between gases and solids. During such complex chemical processes, the formation of void regions in the reactor, generally termed bubbles, is an important phenomenon, and the study of these bubbles has deep implications for predicting the reactor's overall efficiency. But the physical experiments needed to understand bubble dynamics are costly and non-trivial. Therefore, to study such chemical processes and bubble dynamics, a state-of-the-art massively parallel computational fluid dynamics discrete element model (CFD-DEM), MFIX-Exa, is being developed for simulating multiphase flows. Despite the proven accuracy of MFIX-Exa in modeling bubbling phenomena, the very large size of the output data makes traditional post hoc analysis prohibitive in both storage and I/O time. To address these issues and allow application scientists to explore bubble dynamics in an efficient and timely manner, we have developed an end-to-end visual analytics pipeline that enables in situ detection of bubbles using statistical techniques, followed by flexible and interactive visual exploration of bubble dynamics in the post hoc analysis phase. Positive feedback from the experts indicates the efficacy of the proposed approach for exploring bubble dynamics in very-large-scale multiphase flow simulations.
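The abstract does not specify the in situ statistical detector; as a rough illustration of the idea, void ("bubble") regions in a particle field can be found by binning positions onto a grid, thresholding low-density cells, and labeling connected components. Everything below, including the threshold values, is our own toy construction.

```python
import numpy as np
from scipy import ndimage

def detect_bubbles(particles, bins=(64, 64), density_thresh=2):
    """Rough bubble detector for a 2D slice (illustrative only).

    Bin particle positions onto a grid, mark cells whose particle
    count falls below `density_thresh` as void, and label connected
    void regions; each label is one candidate bubble.
    """
    counts, _, _ = np.histogram2d(particles[:, 0], particles[:, 1], bins=bins)
    void = counts < density_thresh
    labels, n_bubbles = ndimage.label(void)      # connected components
    sizes = ndimage.sum(void, labels, index=range(1, n_bubbles + 1))
    return labels, sizes                          # per-bubble footprints

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(20000, 2))
labels, sizes = detect_bubbles(pts)
```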

In this paper, we show that the diagonal of a high-dimensional sample covariance matrix stemming from $n$ independent observations of a $p$-dimensional time series with finite fourth moments can be approximated in spectral norm by the diagonal of the population covariance matrix. We assume that $n,p\to \infty$ with $p/n$ tending to a constant which might be positive or zero. As applications, we provide an approximation of the sample correlation matrix ${\mathbf R}$ and derive a variety of results for its eigenvalues. We identify the limiting spectral distribution of ${\mathbf R}$ and construct an estimator for the population correlation matrix and its eigenvalues. Finally, the almost sure limits of the extreme eigenvalues of ${\mathbf R}$ in a generalized spiked correlation model are analyzed.
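An illustrative numerical check of the statement, not its proof: for diagonal matrices the spectral norm is the maximum absolute entry, so the approximation can be eyeballed by comparing the sample and population variances. The simulation below uses i.i.d. rows, a simplification of the paper's time-series setting.

```python
import numpy as np

# Compare the diagonal of the sample covariance with the population
# diagonal; for diagonal matrices the spectral norm is max |entry|.
rng = np.random.default_rng(0)
n, p = 2000, 1000                      # the p/n -> 1/2 regime
pop_var = np.linspace(0.5, 2.0, p)     # population variances
X = rng.normal(size=(n, p)) * np.sqrt(pop_var)
S = X.T @ X / n                        # sample covariance
err = np.max(np.abs(np.diag(S) - pop_var))
print(f"spectral-norm error of the diagonal: {err:.4f}")
```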

Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to \textit{approximate} kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximation of the cost or kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims to solve, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low-rank couplings as a product of \textit{sub-coupling} factors linked by a common marginal; similar to an NMF approach, we alternately update these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.
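The factorization can be made concrete as follows: a rank-r coupling is written as P = Q diag(1/g) R^T, with Q and R sub-couplings sharing the common inner marginal g. The sketch below, with toy uniform factors of our own choosing, evaluates the transport cost without ever forming P; the paper's alternating updates of the factors are omitted.

```python
import numpy as np

def lowrank_cost(C, Q, R, g):
    """Transport cost <C, P> for a rank-r coupling P = Q diag(1/g) R^T.

    Q (n x r) and R (m x r) are sub-couplings whose columns share the
    common inner marginal g (length r).  Evaluating the cost never
    forms the n x m matrix P explicitly:
        <C, Q diag(1/g) R^T> = sum_k (Q[:, k]^T C R[:, k]) / g[k].
    """
    return sum((Q[:, k] @ C @ R[:, k]) / g[k] for k in range(len(g)))

# Toy factors with uniform marginals: rows of Q sum to 1/n, columns to g.
n, m, r = 6, 8, 2
g = np.full(r, 1.0 / r)
Q = np.full((n, r), 1.0 / (n * r))
R = np.full((m, r), 1.0 / (m * r))
C = np.abs(np.subtract.outer(np.linspace(0, 1, n), np.linspace(0, 1, m)))
print(lowrank_cost(C, Q, R, g))
```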

Leveraging biased click data for optimizing learning to rank systems has been a popular approach in information retrieval. Because click data is often noisy and biased, a variety of methods have been proposed to construct unbiased learning to rank (ULTR) algorithms for the learning of unbiased ranking models. Among them, automatic unbiased learning to rank (AutoULTR) algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their differences in theories and algorithm design, existing studies on ULTR usually use uni-variate ranking functions to score each document or result independently. On the other hand, recent advances in context-aware learning-to-rank models have shown that multivariate scoring functions, which read multiple documents together and predict their ranking scores jointly, are more powerful than uni-variate ranking functions in ranking tasks with human-annotated relevance labels. Whether such superior performance would hold in ULTR with noisy data, however, is mostly unknown. In this paper, we investigate existing multivariate scoring functions and AutoULTR algorithms in theory and prove that permutation invariance is a crucial factor that determines whether a context-aware learning-to-rank model could be applied to the existing AutoULTR framework. Our experiments with synthetic clicks on two large-scale benchmark datasets show that AutoULTR models with permutation-invariant multivariate scoring functions significantly outperform those with uni-variate scoring functions and permutation-variant multivariate scoring functions.
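A toy sketch of what permutation invariance means for a multivariate scoring function, using DeepSets-style sum pooling in plain NumPy; this is our own illustration, not one of the models studied in the paper.

```python
import numpy as np

def permutation_invariant_scores(docs, W_phi, W_rho):
    """Toy multivariate scoring function in the DeepSets style.

    Each document is scored from its own features together with a
    pooled (summed) representation of the whole list.  Because sum
    pooling ignores document order, permuting the input list permutes
    the output scores identically, which is the invariance property
    the paper identifies as crucial for AutoULTR compatibility.
    """
    pooled = np.tanh(docs @ W_phi).sum(axis=0)           # order-free context
    ctx = np.broadcast_to(pooled, (docs.shape[0], len(pooled)))
    return np.concatenate([docs, ctx], axis=1) @ W_rho   # one score per doc

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 4))                 # 5 documents, 4 features
W_phi, W_rho = rng.normal(size=(4, 8)), rng.normal(size=(12,))
s = permutation_invariant_scores(docs, W_phi, W_rho)
perm = rng.permutation(5)
assert np.allclose(s[perm], permutation_invariant_scores(docs[perm], W_phi, W_rho))
```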

This paper introduces the Hawkes skeleton and the Hawkes graph. These objects summarize the branching structure of a multivariate Hawkes point process in a compact, yet meaningful way. We demonstrate how graph-theoretic vocabulary (`ancestor sets', `parent sets', `connectivity', `walks', `walk weights', ...) is very convenient for the discussion of multivariate Hawkes processes. For example, we reformulate the classic eigenvalue-based subcriticality criterion of multitype branching processes in graph terms. Beyond these terminological contributions, we show how the graph view may be used for the specification and estimation of Hawkes models from large, multitype event streams. Based on earlier work, we give a nonparametric statistical procedure to estimate the Hawkes skeleton and the Hawkes graph from data. We show how the graph estimation may then be used for specifying and fitting parametric Hawkes models. Our estimation method avoids the a priori model assumptions required by a straightforward MLE approach and is numerically more flexible than the latter. Our method has two tuning parameters: one controlling numerical complexity, the other controlling the sparseness of the estimated graph. A simulation study confirms that the presented procedure works as desired. We pay special attention to computational issues in the implementation. This makes our results applicable to high-dimensional event-stream data, such as dozens of event streams and thousands of events per component.
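The eigenvalue-based subcriticality criterion mentioned above is easy to state concretely: with a branching matrix whose (i, j) entry is the expected number of type-j events directly triggered by one type-i event (our convention here), the process is subcritical exactly when the spectral radius of that matrix is below one. A minimal check:

```python
import numpy as np

def is_subcritical(branching_matrix):
    """Eigenvalue-based subcriticality check for a multitype Hawkes
    process (the classic criterion the paper restates in graph terms).

    Entry (i, j) holds the expected number of type-j events directly
    triggered by one type-i event, i.e. the integral of the excitation
    kernel h_{ij}.  The process is subcritical iff the spectral radius
    of this matrix is strictly below one.
    """
    rho = np.max(np.abs(np.linalg.eigvals(branching_matrix)))
    return rho < 1.0

# The Hawkes graph keeps an edge i -> j only where the branching
# matrix entry is nonzero, so sparsity of the graph is sparsity here.
W = np.array([[0.3, 0.2, 0.0],
              [0.0, 0.4, 0.1],
              [0.1, 0.0, 0.2]])
print(is_subcritical(W))   # True: spectral radius < 1
```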
