
Rough path theory provides the notion of signature, a graded family of tensors which characterises, up to a negligible equivalence class, an ordered stream of vector-valued data. In the last few years, use of the signature has gained traction in time-series analysis, machine learning, deep learning and, more recently, kernel methods. In this article, we lay down the theoretical foundations for a connection between signature asymptotics, the theory of empirical processes, and Wasserstein distances, opening up the landscape and toolkit of the second and third in the study of the first. Our main contribution is to show that the Hambly-Lyons limit can be reinterpreted as a statement about the asymptotic behaviour of Wasserstein distances between two independent empirical measures of samples from the same underlying distribution. In the setting studied here, these measures are derived from samples from a probability distribution which is determined by geometrical properties of the underlying path. The general question of rates of convergence for these objects has been studied in depth in the recent monograph of Bobkov and Ledoux. By using these results, we generalise the original result of Hambly and Lyons from $C^3$ curves to a broad class of $C^2$ ones. We conclude by providing an explicit way to compute the limit in terms of a second-order differential equation.
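As a rough illustration of the quantity discussed above, the following Python sketch computes the $p$-Wasserstein distance between two equal-size empirical measures on the real line, where the optimal coupling simply matches sorted samples. The uniform distribution on $[0,1]$ and the helper names `empirical_wasserstein` and `mean_distance` are assumptions made for this example only; the distribution in the article is determined by the geometry of the path and is not reproduced here.

```python
import random


def empirical_wasserstein(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    In one dimension the optimal transport plan pairs the i-th smallest
    point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    n = len(xs)
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1.0 / p)


def mean_distance(n, trials=50, seed=0):
    """Average distance between two independent samples of size n.

    Assumption: both samples are drawn from Uniform[0, 1], chosen only
    to keep the illustration simple.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        acc += empirical_wasserstein(xs, ys)
    return acc / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mean_distance(n))
```

Running the script prints the average distance for growing sample sizes, which should shrink as $n$ increases; the precise rate is the subject of the results cited in the abstract.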

Related content

We revisit the problem of computing with noisy information considered in Feige et al. 1994, which includes computing the OR function from noisy queries, and computing the MAX, SEARCH and SORT functions from noisy pairwise comparisons. For $K$ given elements, the goal is to correctly recover the desired function with probability at least $1-\delta$ when the outcome of each query is flipped with probability $p$. We consider both the adaptive sampling setting, where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting, where the query cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on $K$. However, the upper and lower bounds do not match in terms of the dependence on $\delta$ and $p$. We improve the lower bounds for all four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either $p$ or $\delta$ is bounded away from $0$, while the ratio between the best prior upper and lower bounds goes to infinity when $p\rightarrow 0$ or $p\rightarrow 1/2$. On the other hand, we also provide matching upper and lower bounds for the number of queries in expectation, improving both the upper and lower bounds for the variable-length query model.
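To make the noisy comparison model concrete, here is a minimal Python sketch in which each pairwise comparison is flipped with probability $p$ and a naive baseline repeats every comparison and takes a majority vote. The function names and the fixed repetition count `repeats` are assumptions for illustration; this is a simple repetition scheme, not the algorithm or the bounds from the abstract.

```python
import random


def noisy_less_than(a, b, p, rng):
    """Answer 'a < b', with the true answer flipped with probability p."""
    truth = a < b
    return (not truth) if rng.random() < p else truth


def majority_less_than(a, b, p, repeats, rng):
    """Repeat the noisy comparison and return the majority answer."""
    votes = sum(noisy_less_than(a, b, p, rng) for _ in range(repeats))
    return 2 * votes > repeats


def noisy_max(items, p, repeats, rng):
    """Sequential scan for the maximum using majority-voted comparisons.

    Uses (len(items) - 1) * repeats noisy queries in total.
    """
    best = items[0]
    for x in items[1:]:
        if majority_less_than(best, x, p, repeats, rng):
            best = x
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(noisy_max(data, p=0.2, repeats=31, rng=rng))
```

Choosing an odd `repeats` avoids ties in the vote. Increasing it lowers the error probability at the cost of more queries, which is the trade-off that query-complexity bounds of this kind quantify.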

In compact settings, the convergence rate of the empirical optimal transport cost to its population value is well understood for a wide class of spaces and cost functions. In unbounded settings, however, hitherto available results require strong assumptions on the ground costs and the concentration of the involved measures. In this work, we pursue a decomposition-based approach to generalize the convergence rates found in compact spaces to unbounded settings under generic moment assumptions that are sharp up to an arbitrarily small $\epsilon > 0$. Hallmark properties of empirical optimal transport on compact spaces, like the recently established adaptation to lower complexity, are shown to carry over to the unbounded case.

Bayesian model-averaged meta-analysis allows quantification of evidence for both treatment effectiveness $\mu$ and across-study heterogeneity $\tau$. We use the Cochrane Database of Systematic Reviews to develop discipline-wide empirical prior distributions for $\mu$ and $\tau$ for meta-analyses of binary and time-to-event clinical trial outcomes. First, we use 50% of the database to estimate parameters of different required parametric families. Second, we use the remaining 50% of the database to select the best-performing parametric families and explore essential assumptions about the presence or absence of the treatment effectiveness and across-study heterogeneity in real data. We find that most meta-analyses of binary outcomes are more consistent with the absence of the meta-analytic effect or heterogeneity while meta-analyses of time-to-event outcomes are more consistent with the presence of the meta-analytic effect or heterogeneity. Finally, we use the complete database - with close to half a million trial outcomes - to propose specific empirical prior distributions, both for the field in general and for specific medical subdisciplines. An example from acute respiratory infections demonstrates how the proposed prior distributions can be used to conduct a Bayesian model-averaged meta-analysis in the open-source software R and JASP.

Gaussian Process Networks (GPNs) are a class of directed graphical models which employ Gaussian processes as priors for the conditional expectation of each variable given its parents in the network. The model allows describing continuous joint distributions in a compact but flexible manner with minimal parametric assumptions on the dependencies between variables. Bayesian structure learning of GPNs requires computing the posterior over graphs of the network and is computationally infeasible even in low dimensions. This work implements Monte Carlo and Markov Chain Monte Carlo methods to sample from the posterior distribution of network structures. As such, the approach follows the Bayesian paradigm, comparing models via their marginal likelihood and computing the posterior probability of the GPN features. Simulation studies show that our method outperforms state-of-the-art algorithms in recovering the graphical structure of the network and provides an accurate approximation of its posterior distribution.

Large-scale administrative or observational datasets are increasingly used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arisen because selection bias and other forms of distribution shift often plague observational data. Previous attempts to provide robust inferences have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of shifts which are possible. To leverage such information, we propose a framework that enables statistical inference in the presence of distribution shifts which obey user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value an estimand takes on the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds, and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks.

In extreme value theory and other related risk analysis fields, probability weighted moments (PWM) have been frequently used to estimate the parameters of classical extreme value distributions. This method-of-moment technique can be applied when second moments are finite, a reasonable assumption in many environmental domains like climatological and hydrological studies. Three advantages of PWM estimators can be put forward: their simple interpretations, their rapid numerical implementation and their close connection to the well-studied class of U-statistics. Concerning the latter, this connection leads to precise asymptotic properties, but non-asymptotic bounds have been lacking when off-the-shelf techniques (Chernoff method) cannot be applied, as exponential moment assumptions become unrealistic in many extreme value settings. In addition, large values analysis is not immune to the undesirable effect of outliers, for example, defective readings in satellite measurements or possible anomalies in climate model runs. Recently, the treatment of outliers has sparked some interest in extreme value theory, but results about finite sample bounds in a robust extreme value theory context are yet to be found, in particular for PWMs or tail index estimators. In this work, we propose a new class of robust PWM estimators, inspired by the median-of-means framework of Devroye et al. (2016). This class of robust estimators is shown to satisfy a sub-Gaussian inequality when the assumption of finite second moments holds. Such non-asymptotic bounds are also derived under the general contamination model. Our main proposition confirms theoretically a trade-off between efficiency and robustness. Our simulation study indicates that classical estimators of PWMs can be highly sensitive to outliers.
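The median-of-means idea mentioned above can be sketched in a few lines of Python for the plain mean: split the sample into blocks, average each block, and report the median of the block averages. This is a generic illustration only; the function name `median_of_means`, the random shuffling step and the block count are assumptions of this sketch, and it does not implement the robust PWM estimators proposed in the paper.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`.

    The data are shuffled, split into n_blocks blocks of equal size
    (leftover points are dropped), and the median of the block means
    is returned.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must lie between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // n_blocks
    block_means = [
        statistics.fmean(shuffled[i * size:(i + 1) * size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # 99 clean observations and one gross outlier.
    data = [1.0] * 99 + [1e6]
    print("sample mean:    ", statistics.fmean(data))
    print("median of means:", median_of_means(data, n_blocks=5))
```

With a single outlier, at most one block mean is contaminated, so the median of the block means is unaffected, whereas the ordinary sample mean is dragged far from 1. Using more blocks tolerates more outliers but averages over fewer points per block, which reflects the efficiency versus robustness trade-off noted in the abstract.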

Gaussian processes (GPs) are widely used tools in spatial statistics and machine learning. The formulae for the mean function and covariance kernel of a GP $v$ that is the image of another GP $u$ under a linear transformation $T$ acting on the sample paths of $u$ are well known, almost to the point of being folklore. However, these formulae are often used without rigorous attention to technical details, particularly when $T$ is an unbounded operator such as a differential operator, which is common in several modern applications. This note provides a self-contained proof of the claimed formulae for the case of a closed, densely-defined operator $T$ acting on the sample paths of a square-integrable stochastic process. Our proof technique relies upon Hille's theorem for the Bochner integral of a Banach-valued random variable.

Reliable probabilistic primality tests are fundamental in public-key cryptography. In adversarial scenarios, a composite with a high probability of passing a specific primality test could be chosen. In such cases, we need worst-case error estimates for the test. However, in many scenarios the numbers are randomly chosen and thus have significantly smaller error probability. Therefore, we are interested in average-case error estimates. In this paper, we establish such bounds for the strong Lucas primality test, as only worst-case, but no average-case, error bounds are currently available. This allows us to use this test with more confidence. We examine an algorithm that draws odd $k$-bit integers uniformly and independently, runs $t$ independent iterations of the strong Lucas test with randomly chosen parameters, and outputs the first number that passes all $t$ consecutive rounds. We attain numerical upper bounds on the probability of returning a composite. Furthermore, we consider a modified version of this algorithm that excludes integers divisible by small primes, resulting in improved bounds. Additionally, we classify the numbers that contribute most to our estimate.

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
