
We consider sequential hypothesis testing based on observations which are received in groups of random size. The observations are assumed to be independent both within and between the groups. We assume that the group sizes are independent, that their distributions are known, and that the groups are formed independently of the observations. We are concerned with the problem of testing a simple hypothesis against a simple alternative. For any (group-)sequential test, we take into account three characteristics: its type I error probability, its type II error probability, and the average cost of observations. Under mild conditions, we characterize the structure of sequential tests minimizing the average cost of observations among all sequential tests whose type I and type II error probabilities do not exceed prescribed levels.
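
A minimal sketch of the setting, assuming a Wald-style stopping rule with fixed log-likelihood-ratio thresholds; this is an illustration only, not the optimal structure characterized in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def group_sequential_lr_test(sample_group, log_lr, a, b, max_groups=1000):
    """Accumulate the log-likelihood ratio over groups of random size and
    stop once it leaves (a, b), in the spirit of Wald's SPRT."""
    llr = 0.0
    for _ in range(max_groups):
        group = sample_group()          # a new group of random size
        llr += log_lr(group).sum()      # independence within/between groups
        if llr >= b:
            return "reject H0"
        if llr <= a:
            return "accept H0"
    return "no decision"

# Example: H0: N(0,1) vs H1: N(1,1); group sizes are Geometric(1/3).
sample_group = lambda: rng.normal(1.0, 1.0, size=rng.geometric(1 / 3))
log_lr = lambda x: x - 0.5              # log(f1/f0) for unit-variance normals
print(group_sequential_lr_test(sample_group, log_lr, a=-2.2, b=2.2))
```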


Optimal transport (OT) theory underlies many emerging machine learning (ML) methods solving a wide range of tasks such as generative modeling, transfer learning and information retrieval. These works, however, usually build upon the traditional OT setup with two distributions, while leaving the more general multi-marginal OT formulation relatively unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference-of-convex (DC) programming problem, which allows us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually of the same quality as those obtained using currently employed optimization schemes.
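
For context, a minimal NumPy sketch of the classical two-marginal entropic OT solver (Sinkhorn iterations) that such methods build on; the multi-marginal, structure-promoting formulation and its DC solver are beyond this sketch:

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=200):
    """Entropic-regularized OT between marginals a and b with cost C."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)               # alternate marginal projections
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # the transport coupling

# Two uniform marginals over 5 points on a line, squared-distance cost.
x = np.linspace(0, 1, 5)
C = (x[:, None] - x[None, :]) ** 2
a = b = np.full(5, 1 / 5)
P = sinkhorn(a, b, C)
print(P.sum(axis=0), P.sum(axis=1))     # both close to the marginals
```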

We show that any nonzero polynomial in the ideal generated by the $r \times r$ minors of an $n \times n$ matrix $X$ can be used to efficiently approximate the determinant. For any nonzero polynomial $f$ in this ideal, we construct a small depth-three $f$-oracle circuit that approximates the determinant of size $\Theta(r^{1/3})$ in the sense of border complexity. For many classes of algebraic circuits, this implies that every nonzero polynomial in the ideal generated by $r \times r$ minors is at least as hard to approximately compute as the determinant of size $\Theta(r^{1/3})$. We also prove an analogous result for the Pfaffian of a $2n \times 2n$ skew-symmetric matrix and the ideal generated by Pfaffians of $2r \times 2r$ principal submatrices. This answers a recent question of Grochow about complexity in polynomial ideals in the setting of border complexity. We give several applications of our result, two of which are highlighted below. $\bullet$ We prove super-polynomial lower bounds for Ideal Proof System refutations computed by low-depth circuits. This extends the recent breakthrough low-depth circuit lower bounds of Limaye, Srinivasan, and Tavenas to the setting of proof complexity. For many natural circuit classes, we show that the approximative proof complexity of our hard instance is governed by the approximative circuit complexity of the determinant. $\bullet$ We construct new hitting set generators for polynomial-size low-depth circuits. For any $\varepsilon > 0$, we construct generators with seed length $O(n^\varepsilon)$ that attain a near-optimal tradeoff between their seed length and degree, and are computable by low-depth circuits of near-linear size (with respect to the size of their output). This matches the seed length of the generators recently obtained by Limaye, Srinivasan, and Tavenas, but improves on the generator's degree and circuit complexity.
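
A quick sanity check of the algebra underlying the setup: every polynomial in the ideal generated by the $r \times r$ minors of $X$ vanishes on matrices of rank less than $r$. The sketch below verifies this for one generator with SymPy:

```python
import random
import sympy as sp

random.seed(0)
n, r = 4, 3
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"x{i}{j}"))

# One generator of the ideal: the top-left r x r minor of X.
f = X[:r, :r].det()

# Build a matrix of rank r - 1 = 2 as a sum of two rank-one terms.
u = [sp.Matrix(n, 1, lambda i, _: random.randint(-3, 3)) for _ in range(r - 1)]
v = [sp.Matrix(1, n, lambda _, j: random.randint(-3, 3)) for _ in range(r - 1)]
A = sum((ui * vi for ui, vi in zip(u, v)), sp.zeros(n, n))

# The minor (and hence every element of the ideal) vanishes at A.
subs = {X[i, j]: A[i, j] for i in range(n) for j in range(n)}
print(f.subs(subs))   # 0
```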

We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including the average treatment effect, the optimal treatment rule, and the average reward, with the majority of them merging one historical data source, containing covariates, actions, and rewards, with a second data source containing the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical experiments, we show marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.
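
A toy numerical illustration of the kind of efficiency gain at stake, assuming two sources that both identify a common mean and fusing the two unbiased source-specific estimates by inverse-variance weighting; this is a stand-in, not the paper's semiparametric efficient estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

mu, reps = 2.0, 5000
est1 = rng.normal(mu, 1.0, (reps, 50)).mean(axis=1)    # source 1: n=50, sd=1
est2 = rng.normal(mu, 2.0, (reps, 100)).mean(axis=1)   # source 2: n=100, sd=2
v1, v2 = 1.0 / 50, 4.0 / 100                           # sampling variances
w1 = (1 / v1) / (1 / v1 + 1 / v2)                      # inverse-variance weight
fused = w1 * est1 + (1 - w1) * est2
print(est1.var(), est2.var(), fused.var())             # fused is smallest
```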

We consider the problem of testing between two Gibbs probability measures $\mu_0$ and $\mu_1$ defined for a dynamical system $(\Omega,T)$. Since full orbits are in general neither observable nor computable, one needs to restrict to subclasses of tests defined by a finite time series $h(x_0), h(x_1)=h(T(x_0)),\dots, h(x_n)=h(T^n(x_0))$, $x_0\in \Omega$, $n\ge 0$, where $h:\Omega\to\mathbb R$ denotes a suitable measurable function. We determine in each class the Neyman-Pearson tests, the minimax tests, and the Bayes solutions, and show the asymptotic decay of their risk functions as $n\to\infty$. In the case where $\Omega$ is a symbolic space, for each $n\in \mathbb{N}$ these optimal tests rely on the information of the measures for cylinder sets of size $n$.
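
A minimal sketch of the symbolic-space case, assuming the two Gibbs measures are Bernoulli product measures so that cylinder probabilities are explicit; the threshold below is an arbitrary choice for illustration, not a calibrated critical value:

```python
import numpy as np

rng = np.random.default_rng(2)

p0, p1, n = 0.4, 0.6, 200

def log_cylinder_ratio(word):
    """Log-likelihood ratio of the observed length-n cylinder under
    Bernoulli(p1) vs Bernoulli(p0) product measures."""
    k = word.sum()
    return k * np.log(p1 / p0) + (n - k) * np.log((1 - p1) / (1 - p0))

word = (rng.random(n) < p1).astype(int)   # a sample path from mu_1
threshold = 0.0                           # arbitrary, for illustration
print("reject H0" if log_cylinder_ratio(word) > threshold else "accept H0")
```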

A bipartite experiment consists of one set of units being assigned treatments and another set of units for which we measure outcomes. The two sets of units are connected by a bipartite graph, governing how the treated units can affect the outcome units. In this paper, we consider estimation of the average total treatment effect in the bipartite experimental framework under a linear exposure-response model. We introduce the Exposure Reweighted Linear (ERL) estimator, and show that the estimator is unbiased, consistent and asymptotically normal, provided that the bipartite graph is sufficiently sparse. To facilitate inference, we introduce an unbiased and consistent estimator of the variance of the ERL point estimator. In addition, we introduce a cluster-based design, Exposure-Design, that uses heuristics to increase the precision of the ERL estimator by realizing a desirable exposure distribution.
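
A hypothetical sketch of the bipartite-exposure setup: treatments are randomized on one side, each outcome unit's exposure is its treated-neighbor fraction, and outcomes are regressed on exposures. The plain least-squares step is a stand-in, not the paper's exact ERL estimator or its variance estimator:

```python
import numpy as np

rng = np.random.default_rng(3)

n_out, n_trt = 500, 100
W = (rng.random((n_out, n_trt)) < 0.05).astype(float)  # bipartite graph
W /= np.maximum(W.sum(axis=1, keepdims=True), 1)       # row-normalized weights

z = rng.integers(0, 2, n_trt)                          # randomized treatments
exposure = W @ z                                       # exposures in [0, 1]
y = 1.0 + 2.0 * exposure + rng.normal(0, 0.1, n_out)   # linear exposure-response

# Regress outcomes on exposures; under this linear model the slope is the
# average total effect (exposure 1 for all-treated vs 0 for all-control).
X = np.column_stack([np.ones(n_out), exposure])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated average total effect:", beta[1])
```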

The Neyman-Pearson (NP) binary classification paradigm constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). This paradigm is suitable for applications such as severe disease diagnosis and fraud detection. A series of NP classifiers has been developed to guarantee type I error control with high probability. However, these existing classifiers involve a sample-splitting step: a mixture of class 0 and class 1 observations is used to construct a scoring function, and some left-out class 0 observations are used to construct a threshold. This splitting enables classifier construction built upon independence, but it amounts to insufficient use of data for training and a potentially higher type II error. Leveraging a canonical linear discriminant analysis model, we derive a quantitative CLT for a certain functional of quadratic forms of the inverse of the sample and population covariance matrices, and based on this result, we develop for the first time NP classifiers that do not split the training sample. Numerical experiments confirm the advantages of our new non-splitting parametric strategy.
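
For contrast, a sketch of the sample-splitting construction described above, i.e. the approach the non-splitting strategy avoids; the plug-in quantile threshold is a simplification of the more conservative order statistic used by umbrella-type NP methods:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

X0 = rng.normal(0.0, 1.0, (400, 5))                 # class 0
X1 = rng.normal(0.7, 1.0, (400, 5))                 # class 1
X0_train, X0_holdout = X0[:200], X0[200:]           # the splitting step

# Scoring function trained on a mixture of both classes.
X = np.vstack([X0_train, X1])
y = np.r_[np.zeros(200), np.ones(400)]
score = LogisticRegression().fit(X, y)

# Threshold from left-out class-0 scores to target type I error alpha.
alpha = 0.05
s0 = score.predict_proba(X0_holdout)[:, 1]
thresh = np.quantile(s0, 1 - alpha)

classify = lambda Xn: (score.predict_proba(Xn)[:, 1] > thresh).astype(int)
print("empirical type I error:",
      classify(rng.normal(0.0, 1.0, (2000, 5))).mean())
```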

Defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a solution by transporting data points to a grid approximating a reference measure (Chernozhukov et al., 2017; Hallin, 2017; Hallin et al., 2021a). We take up this new perspective to develop and study multivariate analogues of popular correlation measures, including the sign covariance, Kendall's tau, and Spearman's rho. Our tests are genuinely distribution-free, hence valid irrespective of the actual (absolutely continuous) distributions of the observations. We present asymptotic distribution theory for these new statistics, providing asymptotic approximations to the critical values to be used for testing independence, as well as an analysis of the power of the resulting tests. Interestingly, we are able to establish a multivariate elliptical Chernoff-Savage property, which guarantees that, under ellipticity, our nonparametric tests of independence enjoy an asymptotic relative efficiency of one or larger when compared to Gaussian procedures. Hence, the nonparametric tests constitute a safe replacement for procedures based on multivariate Gaussianity.
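
A minimal sketch of the transport-based ranks, matching the sample to a reference grid with a linear assignment; the random reference points below stand in for the grids used in the cited works:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(5)

n, d = 100, 2
X = rng.normal(size=(n, d))                       # the sample
grid = rng.uniform(-1, 1, size=(n, d))            # reference points (assumed)

# Optimal matching of sample points to the grid under squared distance;
# the rank of X[i] is the grid point it is transported to.
cost = ((X[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
rows, cols = linear_sum_assignment(cost)
ranks = grid[cols]
print(ranks[:3])
```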

Data in non-Euclidean spaces are commonly encountered in many fields of science and engineering. For instance, in robotics, attitude sensors capture orientation, which is an element of a Lie group. In the recent past, several researchers have reported methods that take into account the geometry of Lie groups in designing parameter estimation algorithms in nonlinear spaces. Maximum likelihood estimators (MLEs) are quite commonly used for such tasks, and it is well known in statistics that Stein's shrinkage estimators dominate the MLE in a mean-squared sense when the observations are from a normal population. In this paper, we present a novel shrinkage estimator for data residing in Lie groups, specifically abelian or compact Lie groups. The key theoretical results presented in this paper are: (i) Stein's lemma and its proof for Lie groups and (ii) a proof of dominance of the proposed shrinkage estimator over the MLE for abelian and compact Lie groups. We present simulation studies of the dominance of the proposed shrinkage estimator and an application of shrinkage estimation to multiple-robot localization.
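
In the abelian case the group is $(\mathbb{R}^d, +)$ and the phenomenon reduces to classical Stein shrinkage; the sketch below reproduces the textbook dominance of the James-Stein estimator over the MLE for $d \ge 3$:

```python
import numpy as np

rng = np.random.default_rng(6)

d, reps = 10, 20000
theta = np.full(d, 0.5)
X = rng.normal(theta, 1.0, (reps, d))            # one observation per replicate

# James-Stein shrinkage toward the origin (dominates the MLE for d >= 3).
norms2 = (X ** 2).sum(axis=1, keepdims=True)
js = (1 - (d - 2) / norms2) * X

mse_mle = ((X - theta) ** 2).sum(axis=1).mean()
mse_js = ((js - theta) ** 2).sum(axis=1).mean()
print(mse_mle, mse_js)                           # JS risk is strictly smaller
```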

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
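
For contrast, a sketch of the folklore baseline that reads a random submatrix and tests its rank; the sampling size is an assumption for illustration, and the paper's algorithm instead reads a carefully chosen non-submatrix pattern of entries:

```python
import numpy as np

rng = np.random.default_rng(7)

def submatrix_rank_test(A, d, eps):
    """Read a random k x k submatrix, k = O(d/eps), and accept iff its
    rank is at most d (the baseline the non-submatrix algorithm beats)."""
    k = int(np.ceil(2 * d / eps))
    rows = rng.choice(A.shape[0], size=min(k, A.shape[0]), replace=False)
    cols = rng.choice(A.shape[1], size=min(k, A.shape[1]), replace=False)
    return np.linalg.matrix_rank(A[np.ix_(rows, cols)]) <= d

n, d = 300, 3
low_rank = rng.normal(size=(n, d)) @ rng.normal(size=(d, n))
print(submatrix_rank_test(low_rank, d, eps=0.1))                  # True
print(submatrix_rank_test(rng.normal(size=(n, n)), d, eps=0.1))   # False (w.h.p.)
```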

We propose a new method of estimation in topic models that is not a variation on existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of the topic matrix A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p), and number of topics (K); both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though we start out with the computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.
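
A minimal data-generating sketch in the abstract's notation (dictionary size p, K topics, topic matrix A, n documents of lengths N_i), useful for benchmarking any estimator of A and K on synthetic corpora:

```python
import numpy as np

rng = np.random.default_rng(8)

p, K, n = 50, 3, 200
A = rng.dirichlet(np.ones(p), size=K).T         # p x K, columns on the simplex
W = rng.dirichlet(np.ones(K), size=n).T         # K x n topic weights
N = rng.poisson(80, size=n) + 1                 # document lengths N_i

probs = A @ W                                   # p x n word distributions
probs /= probs.sum(axis=0)                      # guard against rounding
counts = np.column_stack(
    [rng.multinomial(N[i], probs[:, i]) for i in range(n)]
)
freqs = counts / N                              # observed word frequencies
print(freqs.shape)                              # (p, n)
```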
