亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results, (1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$. (2) We give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.

相關內容

We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional nonparametric posterior, we target the functional of interest by transforming the entire distribution with a Bayesian bootstrap correction. We provide conditions for the resulting $\textit{one-step posterior}$ to possess calibrated frequentist properties and specialize the results for several canonical examples: the integrated squared density, the mean of a missing-at-random outcome, and the average causal treatment effect on the treated. The procedure is computationally attractive, requiring only a simple, efficient post-processing step that can be attached onto any arbitrary posterior sampling algorithm. Using the ACIC 2016 causal data analysis competition, we illustrate that our approach can outperform the existing state-of-the-art through the propagation of Bayesian uncertainty.

In his seminal paper "Computing Machinery and Intelligence", Alan Turing introduced the "imitation game" as part of exploring the concept of machine intelligence. The Turing Test has since been the subject of much analysis, debate, refinement and extension. Here we sidestep the question of whether a particular machine can be labeled intelligent, or can be said to match human capabilities in a given context. Instead, but inspired by Turing, we draw attention to the seemingly simpler challenge of determining whether one is interacting with a human or with a machine, in the context of everyday life. We are interested in reflecting upon the importance of this Human-or-Machine question and the use one may make of a reliable answer thereto. Whereas Turing's original test is widely considered to be more of a thought experiment, the Human-or-Machine question as discussed here has obvious practical significance. And while the jury is still not in regarding the possibility of machines that can mimic human behavior with high fidelity in everyday contexts, we argue that near-term exploration of the issues raised here can contribute to development methods for computerized systems, and may also improve our understanding of human behavior in general.

We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out parameters, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using this estimate for the group-wise drop-out rate, we construct a re-weighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. Finally, we present an algorithm encapsulating this learning and re-weighting process, and we provide strong PAC-style guarantees that, with high probability, our estimate of the risk of the hypothesis over the true distribution will be arbitrarily close to the true risk.

This paper considers an ML inspired approach to hypothesis testing known as classifier/classification-accuracy testing ($\mathsf{CAT}$). In $\mathsf{CAT}$, one first trains a classifier by feeding it labeled synthetic samples generated by the null and alternative distributions, which is then used to predict labels of the actual data samples. This method is widely used in practice when the null and alternative are only specified via simulators (as in many scientific experiments). We study goodness-of-fit, two-sample ($\mathsf{TS}$) and likelihood-free hypothesis testing ($\mathsf{LFHT}$), and show that $\mathsf{CAT}$ achieves (near-)minimax optimal sample complexity in both the dependence on the total-variation ($\mathsf{TV}$) separation $\epsilon$ and the probability of error $\delta$ in a variety of non-parametric settings, including discrete distributions, $d$-dimensional distributions with a smooth density, and the Gaussian sequence model. In particular, we close the high probability sample complexity of $\mathsf{LFHT}$ for each class. As another highlight, we recover the minimax optimal complexity of $\mathsf{TS}$ over discrete distributions, which was recently established by Diakonikolas et al. (2021). The corresponding $\mathsf{CAT}$ simply compares empirical frequencies in the first half of the data, and rejects the null when the classification accuracy on the second half is better than random.

Differentially private (DP) training preserves the data privacy usually at the cost of slower convergence (and thus lower accuracy), as well as more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous time analysis through the lens of neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training, for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that while DP models trained with small clipping norm usually achieve the best accurate, but are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly more \textit{calibrated}. Our code can be found at \url{//github.com/woodyx218/opacus_global_clipping}.

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.

In this study, we examine numerical approximations for 2nd-order linear-nonlinear differential equations with diverse boundary conditions, followed by the residual corrections of the first approximations. We first obtain numerical results using the Galerkin weighted residual approach with Bernstein polynomials. The generation of residuals is brought on by the fact that our first approximation is computed using numerical methods. To minimize these residuals, we use the compact finite difference scheme of 4th-order convergence to solve the error differential equations in accordance with the error boundary conditions. We also introduce the formulation of the compact finite difference method of fourth-order convergence for the nonlinear BVPs. The improved approximations are produced by adding the error values derived from the approximations of the error differential equation to the weighted residual values. Numerical results are compared to the exact solutions and to the solutions available in the published literature to validate the proposed scheme, and high accuracy is achieved in all cases

We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast, clustering methods seemingly robust to adversarial perturbations are not known to satisfy the optimal statistical guarantees. We propose a simple algorithm that obtains the optimal mislabeling rate even when we allow adversarial outliers to be present. Our algorithm achieves the optimal error rate in constant iterations when a weak initialization condition is satisfied. In the absence of outliers, in fixed dimensions, our theoretical guarantees are similar to that of the Lloyd algorithm. Extensive experiments on various simulated data sets are conducted to support the theoretical guarantees of our method.

We consider the multiple testing of the general regression framework aiming at studying the relationship between a univariate response and a p-dimensional predictor. To test the hypothesis of the effect of each predictor, we construct an Angular Balanced Statistic (ABS) based on the estimator of the sliced inverse regression without assuming a model of the conditional distribution of the response. According to the developed limiting distribution results in this paper, we have shown that ABS is asymptotically symmetric with respect to zero under the null hypothesis. We then propose a Model-free multiple Testing procedure using Angular balanced statistics (MTA) and show theoretically that the false discovery rate of this method is less than or equal to a designated level asymptotically. Numerical evidence has shown that the MTA method is much more powerful than its alternatives, subject to the control of the false discovery rate.

There has a major problem in the current theory of hypothesis testing in which no unified indicator to evaluate the goodness of various test methods since the cost function or utility function usually relies on the specific application scenario, resulting in no optimal hypothesis testing method. In this paper, the problem of optimal hypothesis testing is investigated based on information theory. We propose an information-theoretic framework of hypothesis testing consisting of five parts: test information (TI) is proposed to evaluate the hypothesis testing, which depends on the a posteriori probability distribution function of hypotheses and independent of specific test methods; accuracy with the unit of bit is proposed to evaluate the degree of validity of specific test methods; the sampling a posteriori (SAP) probability test method is presented, which makes stochastic selections on the hypotheses according to the a posteriori probability distribution of the hypotheses; the probability of test failure is defined to reflect the probability of the failed decision is made; test theorem is proved that all accuracy lower than the TI is achievable. Specifically, for every accuracy lower than TI, there exists a test method with the probability of test failure tending to zero. Conversely, there is no test method whose accuracy is more than TI. Numerical simulations are performed to demonstrate that the SAP test is asymptotically optimal. In addition, the results show that the accuracy of the SAP test and the existing test methods, such as the maximum a posteriori probability, expected a posteriori probability, and median a posteriori probability tests, are not more than TI.

北京阿比特科技有限公司