亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We expand our effective framework for weak convergence of measures on the real line by showing that effective convergence in the Prokhorov metric is equivalent to effective weak convergence. In addition, we establish a framework for the study of the effective theory of vague convergence of measures. We introduce a uniform notion and a non-uniform notion of vague convergence, and we show that both these notions are equivalent. However, limits under effective vague convergence may not be computable even when they are finite. We give an example of a finite incomputable effective vague limit measure, and we provide a necessary and sufficient condition so that effective vague convergence produces a computable limit. Finally, we determine a sufficient condition for which effective weak and vague convergence of measures coincide. As a corollary, we obtain an effective version of the equivalence between classical weak and vague convergence of sequences of probability measures.

相關內容

We study the classical expander codes, introduced by Sipser and Spielman \cite{SS96}. Given any constants $0< \alpha, \varepsilon < 1/2$, and an arbitrary bipartite graph with $N$ vertices on the left, $M < N$ vertices on the right, and left degree $D$ such that any left subset $S$ of size at most $\alpha N$ has at least $(1-\varepsilon)|S|D$ neighbors, we show that the corresponding linear code given by parity checks on the right has distance at least roughly $\frac{\alpha N}{2 \varepsilon }$. This is strictly better than the best known previous result of $2(1-\varepsilon ) \alpha N$ \cite{Sudan2000note, Viderman13b} whenever $\varepsilon < 1/2$, and improves the previous result significantly when $\varepsilon $ is small. Furthermore, we show that this distance is tight in general, thus providing a complete characterization of the distance of general expander codes. Next, we provide several efficient decoding algorithms, which vastly improve previous results in terms of the fraction of errors corrected, whenever $\varepsilon < \frac{1}{4}$. Finally, we also give a bound on the list-decoding radius of general expander codes, which beats the classical Johnson bound in certain situations (e.g., when the graph is almost regular and the code has a high rate). Our techniques exploit novel combinatorial properties of bipartite expander graphs. In particular, we establish a new size-expansion tradeoff, which may be of independent interests.

We formulate em algorithm in the framework of Bregman divergence, which is a general problem setting of information geometry. That is, we address the minimization problem of the Bregman divergence between an exponential subfamily and a mixture subfamily in a Bregman divergence system. Then, we show the convergence and its speed under several conditions. We apply this algorithm to rate distortion and its variants including the quantum setting, and show the usefulness of our general algorithm.

The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusions that underpin each individual contribution may no longer be valid. To meet these challenges coherently, in this study, we offer a unified robust framework that includes a broad variety of linear prediction problems on a Hilbert space, coupled with a generic class of loss functions. Notably, we do not require any assumptions on the distribution of the outlying data points ($\mathcal{O}$) nor the compactness of the support of the inlying ones ($\mathcal{I}$). Under mild conditions on the dual norm, we show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1} \right\}+\epsilon)$, matching the best-known rates in literature. This rate is slightly slower than the classical rates of $O(n^{-1/2})$, indicating that we need to pay a price in terms of error rates to obtain robust estimates. Additionally, we show that this rate can be improved to achieve so-called ``fast rates" under additional assumptions.

Stochastic majorization-minimization (SMM) is an online extension of the classical principle of majorization-minimization, which consists of sampling i.i.d. data points from a fixed data distribution and minimizing a recursively defined majorizing surrogate of an objective function. In this paper, we introduce stochastic block majorization-minimization, where the surrogates can now be only block multi-convex and a single block is optimized at a time within a diminishing radius. Relaxing the standard strong convexity requirements for surrogates in SMM, our framework gives wider applicability including online CANDECOMP/PARAFAC (CP) dictionary learning and yields greater computational efficiency especially when the problem dimension is large. We provide an extensive convergence analysis on the proposed algorithm, which we derive under possibly dependent data streams, relaxing the standard i.i.d. assumption on data samples. We show that the proposed algorithm converges almost surely to the set of stationary points of a nonconvex objective under constraints at a rate $O((\log n)^{1+\eps}/n^{1/2})$ for the empirical loss function and $O((\log n)^{1+\eps}/n^{1/4})$ for the expected loss function, where $n$ denotes the number of data samples processed. Under some additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\eps}/n^{1/2})$. Our results provide first convergence rate bounds for various online matrix and tensor decomposition algorithms under a general Markovian data setting.

The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a `base divergence' between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique have not yet been well-established. In this paper, we aim at bridging this gap and derive various theoretical properties of sliced probability divergences. First, we show that slicing preserves the metric axioms and the weak continuity of the divergence, implying that the sliced divergence will share similar topological properties. We then precise the results in the case where the base divergence belongs to the class of integral probability metrics. On the other hand, we establish that, under mild conditions, the sample complexity of a sliced divergence does not depend on the problem dimension. We finally apply our general results to several base divergences, and illustrate our theory on both synthetic and real data experiments.

Many-user MAC is an important model for understanding energy efficiency of massive random access in 5G and beyond. Introduced in Polyanskiy'2017 for the AWGN channel, subsequent works have provided improved bounds on the asymptotic minimum energy-per-bit required to achieve a target per-user error at a given user density and payload, going beyond the AWGN setting. The best known rigorous bounds use spatially coupled codes along with the optimal AMP algorithm. But these bounds are infeasible to compute beyond a few (around 10) bits of payload. In this paper, we provide new achievability bounds for the many-user AWGN and quasi-static Rayleigh fading MACs using the spatially coupled codebook design along with a scalar AMP algorithm. The obtained bounds are computable even up to 100 bits and outperform the previous ones at this payload.

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic ``expansion'' assumption, which states that a low-probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime i.e. it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.

北京阿比特科技有限公司