
Sangwon Hyun,Aditya Mishra,Christopher L. Follett,Bror Jonsson,Gemma Kulk,Gael Forget,Marie-Fanny Racault,Thomas Jackson,Stephanie Dutkiewicz,Christian L. Müller,Jacob Bien

from arxiv, 6 figures

Modern ocean datasets are large, multi-dimensional, and inherently spatiotemporal. A common oceanographic analysis task is the comparison of such datasets along one or several dimensions (latitude, longitude, depth, time) as well as across different data modalities. Here, we show that the Wasserstein distance, also known as the earth mover's distance, provides a promising optimal transport metric for quantifying differences in ocean spatiotemporal data. The Wasserstein distance complements commonly used point-wise difference methods, such as the root mean squared error, by quantifying deviations in terms of apparent displacements (in distance units of space or time) rather than magnitudes of a measured quantity. Using large-scale gridded remote sensing and ocean simulation data of Chlorophyll concentration, a proxy for phytoplankton biomass, in the North Pacific, we show that the Wasserstein distance enables meaningful low-dimensional embeddings of marine seasonal cycles, provides oceanographically relevant summaries of Chlorophyll depth profiles, and captures hitherto overlooked trends in the temporal variability of Chlorophyll in a warming climate. We also illustrate how the optimal transport vectors underlying the Wasserstein distance calculation can serve as a novel interpretable visual aid in other exploratory ocean data analysis tasks, e.g., in tracking ocean province boundaries across space and time.
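The contrast between displacement-based and point-wise comparison can be sketched in one dimension. The example below is only an illustration under simplifying assumptions: it uses synthetic Gaussian bumps on an evenly spaced grid rather than Chlorophyll data, and the helper names (`wasserstein_1d`, `rmse`, `bump`) are hypothetical. It relies on the standard fact that, in one dimension, the Wasserstein-1 distance equals the area between the two cumulative distribution functions.

```python
import math

def normalize(values):
    """Rescale non-negative values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]

def wasserstein_1d(p, q, spacing=1.0):
    """Wasserstein-1 distance between two histograms on the same evenly
    spaced 1-D grid, computed as the area between their CDFs."""
    p, q = normalize(p), normalize(q)
    cdf_gap, distance = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap) * spacing
    return distance

def rmse(a, b):
    """Point-wise root mean squared error between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def bump(center, n=50, width=2.0):
    """Synthetic profile: a Gaussian bump centred at grid index `center`."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

reference = bump(10)
near = bump(15)   # same shape, displaced by 5 grid cells
far = bump(30)    # same shape, displaced by 20 grid cells

w_near, w_far = wasserstein_1d(reference, near), wasserstein_1d(reference, far)
e_near, e_far = rmse(reference, near), rmse(reference, far)
print(f"W1:   near={w_near:.3f}  far={w_far:.3f}")
print(f"RMSE: near={e_near:.3f}  far={e_far:.3f}")
```

In this toy setting the Wasserstein distance tracks the size of the shift in grid units, while the RMSE changes little once the two bumps stop overlapping.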

Related content


Kernel ridge regression · Kernelization · MoDELS · Ridge regression · Functional

January 20, 2022

Distributional data analysis of accelerometer data from the NHANES database using nonparametric survey regression models

Marcos Matabuena,Alexander Petersen

Accelerometers enable an objective measurement of physical activity levels among groups of individuals in free-living environments, providing high-resolution detail about physical activity changes at different time scales. Current approaches used in the literature for analyzing such data typically employ summary measures such as total inactivity time or compositional metrics. However, at the conceptual level, these methods have the potential disadvantage of discarding important information from the recorded data, since the summaries and metrics typically depend on cut-offs related to exercise intensity zones that are chosen subjectively or even arbitrarily. Furthermore, much of the data collected in these studies follows complex survey designs, so estimation strategies adapted to the particular sampling mechanism are mandatory. The aim of this paper is two-fold. First, a new functional representation of accelerometer data, distributional in nature, is introduced to build a complete individualized profile of each subject's physical activity levels. Second, we extend two nonparametric functional regression models, kernel smoothing and kernel ridge regression, to handle survey data and to obtain reliable conclusions about the influence of physical activity in the analyses performed on the NHANES cohort, which follows a complex sampling design, thereby showing the advantages of the representation.

PDE · Neural Networks · Optimizer · Networking · Approximation error

January 20, 2022

Derivative-informed projected neural network for large-scale Bayesian optimal experimental design

Keyi Wu,Thomas O'Leary-Roseberry,Peng Chen,Omar Ghattas

We address the solution of large-scale Bayesian optimal experimental design (OED) problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields. The OED problem seeks to find sensor locations that maximize the expected information gain (EIG) in the solution of the underlying Bayesian inverse problem. Computation of the EIG is usually prohibitive for PDE-based OED problems. To make the evaluation of the EIG tractable, we approximate the (PDE-based) parameter-to-observable map with a derivative-informed projected neural network (DIPNet) surrogate, which exploits the geometry, smoothness, and intrinsic low-dimensionality of the map using a small and dimension-independent number of PDE solves. The surrogate is then deployed within a greedy algorithm-based solution of the OED problem such that no further PDE solves are required. We analyze the EIG approximation error in terms of the generalization error of the DIPNet, and demonstrate the efficiency and accuracy of the method via numerical experiments involving inverse scattering and inverse reactive transport.

Estimation/Estimator · Processing (programming language) · Statistic · Inference · Discretization

January 19, 2022

Adaptive inference for small diffusion processes based on sampled data

Tetsuya Kawai,Masayuki Uchida

from arxiv, 38 pages, 3 figures

We consider parametric estimation and tests for multi-dimensional diffusion processes with a small dispersion parameter $\varepsilon$ from discrete observations. For parametric estimation of diffusion processes, the main target is to estimate the drift parameter and the diffusion parameter. In this paper, we propose two types of adaptive estimators for both parameters and show their asymptotic properties under $\varepsilon\to0$, $n\to\infty$ and the balance condition that $(\varepsilon n^\rho)^{-1} = O(1)$ for some $\rho>0$. Using these adaptive estimators, we also introduce consistent adaptive testing methods and prove that the test statistics for the adaptive tests have asymptotic distributions under the null hypothesis. In simulation studies, we examine and compare the asymptotic behaviors of the two kinds of adaptive estimators and test statistics. Moreover, as a biological application, we treat the SIR model, which describes a simple epidemic spread.
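To make the setting concrete, the following sketch simulates discrete observations of a one-dimensional small-diffusion process with an Euler-Maruyama scheme. It is only an illustration under assumptions not taken from the abstract: the model is an Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \varepsilon\,dW_t$ with a constant diffusion coefficient, and the function name `simulate_small_ou` is hypothetical. It shows sampled paths only, not the adaptive estimators or tests.

```python
import math
import random

def simulate_small_ou(theta, epsilon, x0=1.0, horizon=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = -theta * X dt + epsilon dW on [0, horizon].

    Returns the n_steps + 1 sampled values X_0, X_{dt}, ..., X_{horizon}.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(x - theta * x * dt + epsilon * dw)
    return path

# With epsilon = 0 the scheme reduces to Euler's method for dx/dt = -theta * x.
deterministic = simulate_small_ou(theta=2.0, epsilon=0.0)
noisy = simulate_small_ou(theta=2.0, epsilon=0.01)

max_gap = max(abs(a - b) for a, b in zip(deterministic, noisy))
print(f"X_1 without noise: {deterministic[-1]:.4f} (exp(-2) = {math.exp(-2.0):.4f})")
print(f"largest deviation of the noisy path: {max_gap:.4f}")
```

As $\varepsilon$ shrinks, the sampled path stays close to the solution of the underlying ordinary differential equation, which is the regime the asymptotics above refer to.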

GANs · MoDELS · Maximum · Generative adversarial networks · Statistic

January 19, 2022

Modelling and simulating spatial extremes by combining extreme value theory with generative adversarial networks

Younes Boulaguiem,Jakob Zscheischler,Edoardo Vignotto,Karin van der Wiel,Sebastian Engelke

Modelling dependencies between climate extremes is important for climate risk assessment, for instance when allocating emergency management funds. In statistics, multivariate extreme value theory is often used to model spatial extremes. However, most commonly used approaches require strong assumptions and are either too simplistic or over-parameterized. From a machine learning perspective, Generative Adversarial Networks (GANs) are a powerful tool to model dependencies in high-dimensional spaces. Yet in the standard setting, GANs do not represent dependencies in the extremes well. Here we combine GANs with extreme value theory (evtGAN) to model spatial dependencies in summer maxima of temperature and winter maxima of precipitation over a large part of western Europe. We use data from a stationary 2000-year climate model simulation to validate the approach and explore its sensitivity to small sample sizes. Our results show that evtGAN outperforms classical GANs and standard statistical approaches to model spatial extremes. Already with about 50 years of data, which corresponds to commonly available climate records, we obtain reasonably good performance. In general, dependencies between temperature extremes are better captured than dependencies between precipitation extremes due to the high spatial coherence in temperature fields. Our approach can be applied to other climate variables and can be used to emulate climate models when running very long simulations to determine dependencies in the extremes is deemed infeasible.

Normalized · Approximation · Reducible · Standard normal distribution · Maximal

January 19, 2022

Refined normal approximations for the central and noncentral chi-square distributions and some applications

Frédéric Ouimet

from arxiv, 20 pages, 2 figures

In this paper, we prove a local limit theorem for the chi-square distribution with $r > 0$ degrees of freedom and noncentrality parameter $\lambda \geq 0$. We use it to develop refined normal approximations for the survival function. Our maximal errors go down to an order of $r^{-2}$, which is significantly smaller than the maximal error bounds of order $r^{-1/2}$ recently found by Horgan & Murphy (2013) and Seri (2015). Our results allow us to drastically reduce the number of observations required to obtain negligible errors in the energy detection problem, from $250$, as recommended in the seminal work of Urkowitz (1967), to only $8$ here with our new approximations. We also obtain an upper bound on several probability metrics between the central and noncentral chi-square distributions and the standard normal distribution, and we obtain an approximation for the median that improves the lower bound previously obtained by Robert (1990).
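For context, the sketch below measures the error of the classical central-limit normal approximation to the central chi-square survival function, whose maximal error decays at the $r^{-1/2}$ rate mentioned above. It does not implement the refined approximations of the paper. It assumes an even number of degrees of freedom so that the exact survival function has a closed-form finite sum, and the helper names are hypothetical.

```python
import math

def chi2_sf_even(x, r):
    """Exact survival function P(X > x) of a central chi-square variable
    with an even number r of degrees of freedom (finite Poisson-type sum)."""
    if r <= 0 or r % 2:
        raise ValueError("r must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, r // 2):
        term *= half / j
        total += term
    return total

def normal_sf(z):
    """Survival function of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def clt_approx_sf(x, r):
    """Classical approximation: chi-square(r) is roughly N(r, 2r)."""
    return normal_sf((x - r) / math.sqrt(2.0 * r))

def max_abs_error(r, points=400):
    """Largest error of the classical approximation on a grid spanning
    four standard deviations on either side of the mean."""
    sd = math.sqrt(2.0 * r)
    worst = 0.0
    for i in range(points + 1):
        x = r + sd * (-4.0 + 8.0 * i / points)
        worst = max(worst, abs(chi2_sf_even(x, r) - clt_approx_sf(x, r)))
    return worst

err_8, err_250 = max_abs_error(8), max_abs_error(250)
print(f"max error, r=8:   {err_8:.4f}")
print(f"max error, r=250: {err_250:.4f}")
```

The two values of $r$ mirror the sample sizes $8$ and $250$ quoted in the abstract. With the classical approximation, the error at $r = 8$ is several times larger than at $r = 250$.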

Optimizer · Minibatch · Better · Estimation/Estimator · Unbiased

March 5, 2021

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Kilian Fatras,Thibault Séjourné,Nicolas Courty,Rémi Flamary

Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large-scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e., minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
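One limit of the plain minibatch strategy can be seen in a small one-dimensional experiment. The sketch below covers balanced minibatch optimal transport only, not the unbalanced variant proposed in the paper. It assumes synthetic Gaussian data, uses the fact that sorting and matching is the exact optimal coupling for equal-size uniform samples on the line, and the function names are hypothetical.

```python
import random

def w1_equal_size(xs, ys):
    """Exact 1-D Wasserstein-1 distance between two equal-size samples with
    uniform weights: sort both samples and match them in order."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def minibatch_w1(xs, ys, batch_size, n_batches, rng):
    """Average W1 over independently drawn pairs of minibatches."""
    total = 0.0
    for _ in range(n_batches):
        batch_x = rng.sample(xs, batch_size)
        batch_y = rng.sample(ys, batch_size)
        total += w1_equal_size(batch_x, batch_y)
    return total / n_batches

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(512)]

full_self = w1_equal_size(data, data)                  # a dataset against itself
mini_small = minibatch_w1(data, data, 8, 200, rng)     # batches of 8
mini_large = minibatch_w1(data, data, 128, 200, rng)   # batches of 128

print(f"full W1(data, data):        {full_self:.3f}")
print(f"minibatch estimate, m=8:    {mini_small:.3f}")
print(f"minibatch estimate, m=128:  {mini_large:.3f}")
```

The full distance of a dataset to itself is zero, yet the minibatch estimate stays positive and shrinks only as the batch size grows, so the minibatch quantity is not a faithful stand-in for the full distance.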

Contextualized representation · Image captioning · Performer · Generalization theory · Correlation coefficient

September 26, 2019

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Wei Zhao,Maxime Peyrard,Fei Liu,Yang Gao,Christian M. Meyer,Steffen Eger

from arxiv, EMNLP19 Camera-Ready

A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.

Likelihood · Estimation/Estimator · Maximum likelihood estimation · Maximum likelihood · MoDELS

September 24, 2018

Implicit Maximum Likelihood Estimation

Ke Li,Jitendra Malik

from arxiv, 21 pages, 4 figures. In the interest of promoting discussion, we make the reviews available at //people.eecs.berkeley.edu/~ke.li/papers/imle_reviews.pdf

Implicit probabilistic models are models defined naturally in terms of a sampling procedure; they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

Optimizer · Extensibility · Dual problem · Smoothing · INTERACT

December 1, 2017

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedić

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.

Topic model · Continuity · MoDELS · Topic · Perplexity

May 16, 2015

Continuous Time Dynamic Topic Models

Chong Wang,David Blei,David Heckerman

from arxiv, Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a "topic" is a pattern of word use that we expect to evolve over the course of the collection. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. In contrast to the cDTM, the original discrete-time dynamic topic model (dDTM) requires that time be discretized. Moreover, the complexity of variational inference for the dDTM grows quickly as time granularity increases, a drawback which limits fine-grained discretization. We demonstrate the cDTM on two news corpora, reporting both predictive perplexity and the novel task of time stamp prediction.