
Conventional information-theoretic quantities assume access to probability distributions, and estimating such distributions is not trivial. Here, we consider function-based formulations of cross-entropy that sidestep this a priori estimation requirement. We propose three measures of R\'enyi's $\alpha$-cross-entropies in the setting of reproducing-kernel Hilbert spaces. Each measure has its own appeal. We prove that these measures can be estimated in an unbiased, non-parametric, and minimax-optimal way via sample-constructed Gram matrices. This yields matrix-based estimators of R\'enyi's $\alpha$-cross-entropies. These estimators satisfy all of the axioms that R\'enyi established for divergences. Our cross-entropies can thus be used to assess distributional differences. They are also well suited to high-dimensional distributions, since the convergence rate of our estimators is independent of the sample dimensionality. Python code implementing these measures can be found at //github.com/isledge/MBRCE
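As a concrete illustration of the Gram-matrix route, the sketch below computes a matrix-based R\'enyi $\alpha$-entropy from a sample-constructed kernel Gram matrix, the kind of functional on which matrix-based cross-entropy estimators are built. The kernel choice, bandwidth, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gram_rbf(X, sigma=1.0):
    """RBF (Gaussian) kernel Gram matrix of the n x d sample matrix X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def matrix_renyi_entropy(K, alpha=2.0):
    """Matrix-based Renyi alpha-entropy of a PSD Gram matrix K:
    normalize K to unit trace and apply the alpha-entropy functional
    to its eigenvalues (which then sum to one)."""
    A = K / np.trace(K)
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]            # drop numerically zero eigenvalues
    return float(np.log2(np.sum(lam**alpha)) / (1.0 - alpha))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
H = matrix_renyi_entropy(gram_rbf(X), alpha=2.0)
```

The value lies between $0$ (all samples identical) and $\log_2 n$ (a perfectly diagonal normalized Gram matrix), mirroring the range of the classical R\'enyi entropy.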


Large observational data are increasingly available in disciplines such as health, economics, and the social sciences, where researchers are interested in causal questions rather than prediction. In this paper, we examine the problem of estimating heterogeneous treatment effects using non-parametric regression-based methods, starting from an empirical study aimed at investigating the effect of participation in school meal programs on health indicators. First, we introduce the setup and the issues related to conducting causal inference with observational or non-fully randomized data, and how these issues can be tackled with the help of statistical learning tools. Then, we review and develop a unifying taxonomy of the existing state-of-the-art frameworks that allow for individual treatment effect estimation via non-parametric regression models. After a brief overview of the problem of model selection, we illustrate the performance of some of the methods in three different simulation studies. We conclude by demonstrating the use of some of the methods in an empirical analysis of the school meal program data.
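One of the simplest frameworks in this family is the T-learner: fit separate outcome regressions on treated and control units and take the difference of their predictions as the estimated conditional average treatment effect (CATE). The sketch below uses linear outcome models on simulated data with a randomized treatment; the data-generating process and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(0.0, 1.0, size=n)
t = rng.integers(0, 2, size=n)          # randomized binary treatment, for simplicity
y = 1.0 + 2.0 * x + t * (0.5 + x) + 0.1 * rng.normal(size=n)

def fit_linear(xs, ys):
    """Least-squares fit of ys ~ a + b * xs; returns (a, b)."""
    X = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return coef

# T-learner: separate outcome models for treated and control units,
# with the CATE estimated as the difference of their predictions.
a1, b1 = fit_linear(x[t == 1], y[t == 1])
a0, b0 = fit_linear(x[t == 0], y[t == 0])
cate = lambda xq: (a1 + b1 * xq) - (a0 + b0 * xq)   # true CATE here is 0.5 + x
```

Replacing `fit_linear` with any non-parametric regressor (trees, kernels, neural networks) gives the general non-parametric T-learner that the taxonomy covers.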

Existing results for low-rank matrix recovery largely focus on quadratic loss, which enjoys favorable properties such as restricted strong convexity/smoothness (RSC/RSM) and well conditioning over all low rank matrices. However, many interesting problems involve more general, non-quadratic losses, which do not satisfy such properties. For these problems, standard nonconvex approaches such as rank-constrained projected gradient descent (a.k.a. iterative hard thresholding) and Burer-Monteiro factorization could have poor empirical performance, and there is no satisfactory theory guaranteeing global and fast convergence for these algorithms. In this paper, we show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle. This oracle restricts iterates to low-rank matrices within an appropriate bounded set, over which the loss function is well behaved and satisfies a set of approximate RSC/RSM conditions. Accordingly, we analyze an (averaged) projected gradient method equipped with such an oracle, and prove that it converges globally and linearly. Our results apply to a wide range of non-quadratic low-rank estimation problems including one bit matrix sensing/completion, individualized rank aggregation, and more broadly generalized linear models with rank constraints.
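For orientation, the following sketch shows the basic rank-projection step in the quadratic-loss baseline (projected gradient descent / iterative hard thresholding for matrix completion). It is the well-conditioned case the paper contrasts against; the paper's contribution, a regularity projection oracle for non-quadratic losses, would add a further projection onto a bounded set. Problem sizes and step choices are illustrative.

```python
import numpy as np

def svd_project(M, r):
    """Project M onto the set of matrices of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
n, r = 20, 1
M_star = np.outer(rng.normal(size=n), rng.normal(size=n))   # ground-truth rank-1
mask = rng.random((n, n)) < 0.5                             # observed entries

M = np.zeros((n, n))
for _ in range(500):
    grad = mask * (M - M_star)     # gradient of 0.5 * ||P_Omega(M - M*)||_F^2
    M = svd_project(M - grad, r)   # gradient step followed by rank projection

loss0 = 0.5 * np.sum((mask * M_star) ** 2)           # loss at the zero initialization
loss = 0.5 * np.sum((mask * (M - M_star)) ** 2)      # loss after the iterations
```

With a non-quadratic loss, `grad` would be the corresponding gradient and the projection would additionally restrict iterates to the bounded set where the approximate RSC/RSM conditions hold.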

The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\mathbf{y}_n)_{n \in \mathbb{Z}}$ with independent components is studied under the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\mathbf{y}$ and the smoothing span of the estimator grow to infinity at the same rate, in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent identically distributed $\mathcal{N}_{\mathbb{C}}(0,\mathbf{I}_M)$ sequence, and that its empirical eigenvalue distribution converges towards the Marchenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is of the order of $\frac{1}{M} + \frac{\sqrt{M}}{N}+ (\frac{M}{N})^{3}$. Numerical simulations support our results.
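A minimal sanity check of the limiting behaviour invoked above: for an i.i.d. standard complex Gaussian sequence (the benchmark to which the estimated coherency matrix is compared), the eigenvalues of the sample covariance matrix concentrate on the Marchenko-Pastur support $[(1-\sqrt{c})^2, (1+\sqrt{c})^2]$ with $c = M/N$. This is not the smoothed-periodogram estimator itself, only the reference model.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 100, 5000                      # dimension and sample size, with M/N small
# i.i.d. standard complex Gaussian samples, unit variance per entry
Z = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2.0)
S = (Z @ Z.conj().T) / N              # sample covariance matrix
eig = np.linalg.eigvalsh(S)           # real eigenvalues of the Hermitian matrix S

c = M / N
edges = ((1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2)   # Marchenko-Pastur support
```

Up to edge fluctuations that vanish as $N \rightarrow \infty$, all eigenvalues fall inside `edges`.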

This paper presents some new results on maximum likelihood estimation for incomplete data. Finite-sample properties of conditional observed information matrices are established. In particular, they possess the same Loewner partial ordering properties as the expected information matrices. In its new form, the observed Fisher information (OFI) simplifies the conditional expectation of the outer product of the complete-data score function appearing in the general matrix formula of Louis (1982). It is positive definite and consistent for the expected Fisher information as the sample size increases. Furthermore, it reveals the information loss incurred by the incompleteness of the data. For this reason, the OFI may not be the right (consistent and efficient) estimator for deriving the standard error (SE) of maximum likelihood estimates (MLE) under incomplete data. A sandwich estimator of the covariance matrix is developed to provide consistent and efficient estimates of the SE. The proposed sandwich estimator coincides with the Huber sandwich estimator for model misspecification under complete data (Huber, 1967; Freedman, 2006; Little and Rubin, 2020). In contrast to the latter, however, the new estimator does not involve the OFI, which is an appealing feature in applications. Recursive algorithms for the MLE, the observed information, and the sandwich estimator are presented. Application to parameter estimation of a regime-switching conditional Markov jump process is considered to verify the results. The recursive equations for the inverse OFI generalize the algorithm of Hero and Fessler (1994). A simulation study confirms that the MLEs are accurate, consistent, and asymptotically normal. The sandwich estimator produces standard errors of the MLE close to their analytic values, whereas the OFI overestimates them.
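To make the sandwich form $A^{-1} B A^{-1}$ concrete, here is the classical scalar Huber construction for a Poisson MLE fitted to deliberately overdispersed data: the "bread" $A$ is the observed information and the "meat" $B$ is the sum of outer products of the per-observation scores. This illustrates the complete-data case the paper's estimator generalizes; the data-generating choice is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 2 * rng.poisson(3.0, size=2000)   # overdispersed counts: var = 12, mean = 6

lam = x.mean()                        # Poisson MLE of the rate
score = x / lam - 1.0                 # per-observation score at the MLE
A = np.sum(x / lam**2)                # observed information ("bread")
B = np.sum(score**2)                  # sum of squared scores ("meat")

var_sandwich = B / A**2               # A^{-1} B A^{-1} in the scalar case
var_model = lam / len(x)              # naive model-based (OFI-type) variance
```

Here the sandwich variance reduces algebraically to the empirical variance of the data divided by $n$, so it remains valid under the misspecified (overdispersed) model, while the model-based variance understates the true sampling variability by a factor of about two.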

We give a new algorithm for estimating the cross-covariance matrix $\mathbb{E} XY'$ of two large-dimensional signals $X\in\mathbb{R}^n$, $Y\in \mathbb{R}^p$ in the context where the number $T$ of observations of the pair $(X,Y)$ is itself large, but with $T/n$ and $T/p$ not assumed to be small. In the asymptotic regime where $n,p,T$ are large, with high probability, this algorithm is optimal for the Frobenius norm among rotationally invariant estimators, i.e. estimators derived from the empirical estimator by cleaning the singular values while leaving the singular vectors unchanged.
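The structure of a rotationally invariant estimator can be sketched in a few lines: take the SVD of the empirical cross-covariance, replace the singular values by cleaned ones, and recombine with the original singular vectors. The cleaning rule below is simple hard thresholding, an illustrative stand-in, not the optimal formula the paper derives; the planted-signal setup is likewise an assumption for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 50, 40, 200
# toy signals sharing one common factor g, so E[XY'] = u v^T is rank one
u, v = rng.normal(size=(n, 1)), rng.normal(size=(p, 1))
g = rng.normal(size=(1, T))
X = rng.normal(size=(n, T)) + u @ g
Y = rng.normal(size=(p, T)) + v @ g

E = (X @ Y.T) / T                      # empirical cross-covariance estimator
U, s, Vt = np.linalg.svd(E, full_matrices=False)

# Rotationally invariant estimator: keep the singular vectors of E and
# replace each singular value by a "cleaned" value.
threshold = np.median(s) * 2.0         # illustrative rule, not the paper's formula
s_clean = np.where(s > threshold, s, 0.0)
E_clean = (U * s_clean) @ Vt
```

The planted direction survives the cleaning while most of the noise bulk is suppressed; the paper's contribution is the provably optimal replacement for the thresholding rule.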

Many well-known matrices $Z$ are associated to fast transforms corresponding to factorizations of the form $Z = X^J \ldots X^1$, where each factor $X^\ell$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations. Our first main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits a unique factorization into $J$ butterfly factors (where $N = 2^J$), and that the factors can be recovered by a hierarchical factorization method. This contrasts with existing approaches which fit the product of the butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorizations of the Hadamard or the Discrete Fourier Transform matrices of size $2^J$. Computing such factorizations costs $\mathcal{O}(N^2)$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting that was recently established. While the butterfly structure corresponds to a fixed prescribed support for each factor, our second contribution is to obtain identifiability results with more general families of allowed sparsity patterns, taking into account permutation ambiguities when they are unavoidable. Typically, we show through the hierarchical paradigm that the butterfly factorization of the Discrete Fourier Transform matrix of size $2^J$ admits a unique sparse factorization into $J$ factors, when enforcing only $2$-sparsity by column and a block-diagonal structure on each factor.
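The butterfly factorization of the DFT matrix mentioned above can be written down explicitly: $F_N = X_1 X_2 \cdots X_J P$, where $X_j = I_{2^{j-1}} \otimes A_{N/2^{j-1}}$ with radix-2 butterfly blocks $A_m = \begin{psmallmatrix} I & D \\ I & -D \end{psmallmatrix}$, and $P$ is the bit-reversal permutation. The sketch below builds these factors and checks the product against the DFT matrix numerically; it constructs a known factorization rather than running the paper's hierarchical recovery method.

```python
import numpy as np

def butterfly_factor(m):
    """A_m = [[I, D], [I, -D]] with D = diag(omega_m^k), the radix-2 butterfly."""
    h = m // 2
    D = np.diag(np.exp(-2j * np.pi * np.arange(h) / m))
    I = np.eye(h)
    return np.block([[I, D], [I, -D]])

def bit_reversal(N):
    """Permutation matrix sending index k to its bit-reversed counterpart."""
    J = int(np.log2(N))
    idx = [int(format(k, f'0{J}b')[::-1], 2) for k in range(N)]
    P = np.zeros((N, N))
    P[np.arange(N), idx] = 1.0
    return P

N = 16
# F_N = X_1 X_2 ... X_J P with X_j = I_{2^(j-1)} kron A_{N / 2^(j-1)}
factors = [np.kron(np.eye(2**j), butterfly_factor(N >> j))
           for j in range(int(np.log2(N)))]
product = np.linalg.multi_dot(factors + [bit_reversal(N)])
F = np.fft.fft(np.eye(N))              # the N x N DFT matrix
```

Each factor has two nonzeros per column, matching the $2$-sparsity and block-diagonal structure under which the paper proves essential uniqueness.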

The matrix normal model, the family of Gaussian matrix-variate distributions whose covariance matrix is the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor models. We show nonasymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics. In contrast to existing bounds, our results do not rely on the factors being well-conditioned or sparse. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for the overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that an iterative procedure to compute the MLE known as the flip-flop algorithm converges linearly with high probability. Our main tool is geodesic strong convexity in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels. We also provide numerical evidence that combining the flip-flop algorithm with a simple shrinkage estimator can improve performance in the undersampled regime.
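The flip-flop iteration itself is short: holding one Kronecker factor fixed, the MLE of the other has a closed form, and the algorithm alternates the two updates. A minimal sketch for the matrix normal model, with illustrative ground-truth factors (the individual factors are identified only up to a scalar, so we compare the identifiable Kronecker product):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 2, 3, 2000
U_true = np.array([[2.0, 0.5], [0.5, 1.0]])        # row covariance factor
V_true = np.diag([1.0, 2.0, 0.5])                  # column covariance factor
Lu, Lv = np.linalg.cholesky(U_true), np.linalg.cholesky(V_true)
# samples X_i = Lu Z Lv^T with Z i.i.d. standard normal, so cov(vec X) = V kron U
X = np.array([Lu @ rng.normal(size=(p, q)) @ Lv.T for _ in range(n)])

# Flip-flop: alternately update each factor's closed-form MLE given the other.
U, V = np.eye(p), np.eye(q)
for _ in range(20):
    Vinv = np.linalg.inv(V)
    U = sum(Xi @ Vinv @ Xi.T for Xi in X) / (n * q)
    Uinv = np.linalg.inv(U)
    V = sum(Xi.T @ Uinv @ Xi for Xi in X) / (n * p)

Sigma_hat = np.kron(V, U)                 # identifiable overall covariance
Sigma_true = np.kron(V_true, U_true)
```

Each flip-flop update effectively pools $nq$ (respectively $np$) observations, which is why the factors can be estimated accurately even when $n$ alone would be small relative to $pq$.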

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
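Of the numerical properties mentioned, the stable rank is the simplest to state: $\|A\|_F^2 / \|A\|_2^2$, a smooth, perturbation-robust proxy for the rank that never exceeds it. A short reference computation (exact, not the testing algorithm):

```python
import numpy as np

def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2: at most rank(A), and robust
    to small perturbations, unlike the exact rank."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    return float(np.sum(s**2) / s[0]**2)

A = np.diag([4.0, 3.0])
# ||A||_F^2 = 25 and ||A||_2^2 = 16, so the stable rank is 25/16
```

Property testers in the bounded entry model aim to approximate this quantity from far fewer than $n^2$ entry reads.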

Implicit probabilistic models are defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, yet can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
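The one-pass structure of such sketch-based methods can be sketched as follows: draw two random test matrices, record a range sketch $Y = A\Omega$ and a co-range sketch $W = \Psi A$ (the only accesses to $A$), then combine them as $Q\,(\Psi Q)^{+} W$ with $Q$ an orthonormal basis for the range of $Y$. This is a generic single-pass reconstruction in the spirit of the suite described; sketch sizes, distributions, and names are illustrative, and the paper's algorithms add refinements such as structure preservation and fixed-rank truncation.

```python
import numpy as np

def sketch_low_rank(A, k, s, rng):
    """One-pass low-rank approximation of A from two random sketches:
    a range sketch Y = A @ Omega and a co-range sketch W = Psi @ A,
    combined as Q @ pinv(Psi @ Q) @ W."""
    m, n = A.shape
    Omega = rng.normal(size=(n, k))       # range-sketch test matrix
    Psi = rng.normal(size=(s, m))         # co-range-sketch test matrix
    Y, W = A @ Omega, Psi @ A             # the only accesses to A
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for range(Y)
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
    return Q @ X                          # rank <= k approximation of A

rng = np.random.default_rng(0)
m, n, r = 60, 40, 5
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # exactly rank r
A_hat = sketch_low_rank(A, k=10, s=20, rng=rng)
```

When $A$ has exact rank $r \le k$, the reconstruction is exact up to floating-point error, which is the easy case behind the a priori error bounds for approximately low-rank inputs.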
