顾美玲国产一区二区三区-欧美日韩一区不卡在线看片

Reliably measuring the collinearity of bivariate data is crucial in statistics, particularly for time-series analysis or ongoing studies in which incoming observations can significantly impact current collinearity estimates. Leveraging identities from Welford's online algorithm for sample variance, we develop a rigorous theoretical framework for analyzing the maximal change to the Pearson correlation coefficient and its p-value that can be induced by additional data. Further, we show that the resulting optimization problems yield elegant closed-form solutions that can be accurately computed by linear- and constant-time algorithms. Our work not only creates new theoretical avenues for robust correlation measures, but also has broad practical implications for disciplines that span econometrics, operations research, clinical trials, climatology, differential privacy, and bioinformatics. Software implementations of our algorithms in Cython-wrapped C are made available at //github.com/marc-harary/sensitivity for reproducibility, practical deployment, and future theoretical development.

相關內容

相關系數

關注 0

Learning · Cognition · Processing（編程語言） · Engineering · 論文 ·

2024 年 7 月 3 日

A Review of the Applications of Deep Learning-Based Emergent Communication

Brendon Boldt,David Mortensen

from arxiv, 49 pages, 15 figures

Emergent communication, or emergent language, is the field of research which studies how human language-like communication systems emerge de novo in deep multi-agent reinforcement learning environments. The possibilities of replicating the emergence of a complex behavior like language have strong intuitive appeal, yet it is necessary to complement this with clear notions of how such research can be applicable to other fields of science, technology, and engineering. This paper comprehensively reviews the applications of emergent communication research across machine learning, natural language processing, linguistics, and cognitive science. Each application is illustrated with a description of its scope, an explication of emergent communication's unique role in addressing it, a summary of the extant literature working towards the application, and brief recommendations for near-term research directions.

MoDELS · OCT · 去噪 · 生成模型 · Performer ·

2024 年 7 月 3 日

An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis

Marawan Elbatel,Konstantinos Kamnitsas,Xiaomeng Li

from arxiv, MICCAI 2024

Generative modeling seeks to approximate the statistical properties of real data, enabling synthesis of new data that closely resembles the original distribution. Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) represent significant advancements in generative modeling, drawing inspiration from game theory and thermodynamics, respectively. Nevertheless, the exploration of generative modeling through the lens of biological evolution remains largely untapped. In this paper, we introduce a novel family of models termed Generative Cellular Automata (GeCA), inspired by the evolution of an organism from a single cell. GeCAs are evaluated as an effective augmentation tool for retinal disease classification across two imaging modalities: Fundus and Optical Coherence Tomography (OCT). In the context of OCT imaging, where data is scarce and the distribution of classes is inherently skewed, GeCA significantly boosts the performance of 11 different ophthalmological conditions, achieving a 12% increase in the average F1 score compared to conventional baselines. GeCAs outperform both diffusion methods that incorporate UNet or state-of-the art variants with transformer-based denoising models, under similar parameter constraints. Code is available at: //github.com/xmed-lab/GeCA.

變換 · 可理解性 · MoDELS · 層 · 近似 ·

2024 年 7 月 3 日

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

Mingze Wang,Weinan E

from arxiv, 70 pages

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.

連結 · 評論員 · 講稿 · 設計 · 表示 ·

2024 年 7 月 2 日

Dynamically Maintaining the Persistent Homology of Time Series

Sebastiano Cultrera di Montesano,Herbert Edelsbrunner,Monika Henzinger,Lara Ost

from arxiv, Corrected the statement and proof of Theorem 5.2; added a missing edge-case to the anti-cancellation algorithm

We present a dynamic data structure for maintaining the persistent homology of a time series of real numbers. The data structure supports local operations, including the insertion and deletion of an item and the cutting and concatenating of lists, each in time $O(\log n + k)$, in which $n$ counts the critical items and $k$ the changes in the augmented persistence diagram. To achieve this, we design a tailor-made tree structure with an unconventional representation, referred to as banana tree, which may be useful in its own right.

蒙特卡羅 · tuning · 優化器 · 調參 · 穩健性 ·

2024 年 7 月 1 日

Parameter Tuning of the Firefly Algorithm by Standard Monte Carlo and Quasi-Monte Carlo Methods

Geethu Joy,Christian Huyck,Xin-She Yang

from arxiv, International Conference on Computational Science (ICCS2024)

Almost all optimization algorithms have algorithm-dependent parameters, and the setting of such parameter values can significantly influence the behavior of the algorithm under consideration. Thus, proper parameter tuning should be carried out to ensure that the algorithm used for optimization performs well and is sufficiently robust for solving different types of optimization problems. In this study, the Firefly Algorithm (FA) is used to evaluate the influence of its parameter values on its efficiency. Parameter values are randomly initialized using both the standard Monte Carlo method and the Quasi Monte-Carlo method. The values are then used for tuning the FA. Two benchmark functions and a spring design problem are used to test the robustness of the tuned FA. From the preliminary findings, it can be deduced that both the Monte Carlo method and Quasi-Monte Carlo method produce similar results in terms of optimal fitness values. Numerical experiments using the two different methods on both benchmark functions and the spring design problem showed no major variations in the final fitness values, irrespective of the different sample values selected during the simulations. This insensitivity indicates the robustness of the FA.

數據集 · 流形 · 優化器 · 潛在 · Principle ·

2024 年 7 月 1 日

Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets

Boris Landa,Yuval Kluger,Rong Ma

Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose \textit{Entropic Optimal Transport (EOT) eigenmaps}, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align the datasets accordingly in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We then analyze a data-generative model where two observed high-dimensional datasets share latent variables on a common low-dimensional manifold, but each dataset is subject to data-specific translation, scaling, nuisance structures, and noise. We show that in a high-dimensional asymptotic regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables. Subsequently, we provide a geometric interpretation of our embedding by relating it to the eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.

估計/估計量 · 線性的 · 高斯混合（模型） · 均值 · Analysis ·

2024 年 7 月 1 日

Linear and Nonlinear MMSE Estimation in One-Bit Quantized Systems under a Gaussian Mixture Prior

Benedikt Fesl,Wolfgang Utschick

We present new fundamental results for the mean square error (MSE)-optimal conditional mean estimator (CME) in one-bit quantized systems for a Gaussian mixture model (GMM) distributed signal of interest, possibly corrupted by additive white Gaussian noise (AWGN). We first derive novel closed-form analytic expressions for the Bussgang estimator, the well-known linear minimum mean square error (MMSE) estimator in quantized systems. Afterward, closed-form analytic expressions for the CME in special cases are presented, revealing that the optimal estimator is linear in the one-bit quantized observation, opposite to higher resolution cases. Through a comparison to the recently studied Gaussian case, we establish a novel MSE inequality and show that that the signal of interest is correlated with the auxiliary quantization noise. We extend our analysis to multiple observation scenarios, examining the MSE-optimal transmit sequence and conducting an asymptotic analysis, yielding analytic expressions for the MSE and its limit. These contributions have broad impact for the analysis and design of various signal processing applications.

有限差分 · 確切的 · Analysis · 線性的 · 樣例 ·

2024 年 6 月 29 日

Stability and Convergence Analysis of an Exact Finite Difference Scheme for Fredholm Integro-Differential Equations

Mehebub Alam,Rajni Kant Pandey

This report addresses the boundary value problem for a second-order linear singularly perturbed FIDE. Traditional methods for solving these equations often face stability issues when dealing with small perturbation parameters. We propose an exact finite difference method to solve these equations and provide a detailed stability and $\varepsilon$-uniform convergence analysis. Our approach is validated with an example, demonstrating its uniform convergence and applicability, with a convergence order of 1. The results illustrate the method's robustness in handling perturbation effects efficiently.

可辨認的 · Integration · 潛在 · 優化器 · 可約的 ·

2024 年 6 月 27 日

Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data

Yubai Yuan,Babak Shahbaba,Norbert Fortin,Keiland Cooper,Qing Nie,Annie Qu

Detecting dynamic patterns of task-specific responses shared across heterogeneous datasets is an essential and challenging problem in many scientific applications in medical science and neuroscience. In our motivating example of rodent electrophysiological data, identifying the dynamical patterns in neuronal activity associated with ongoing cognitive demands and behavior is key to uncovering the neural mechanisms of memory. One of the greatest challenges in investigating a cross-subject biological process is that the systematic heterogeneity across individuals could significantly undermine the power of existing machine learning methods to identify the underlying biological dynamics. In addition, many technically challenging neurobiological experiments are conducted on only a handful of subjects where rich longitudinal data are available for each subject. The low sample sizes of such experiments could further reduce the power to detect common dynamic patterns among subjects. In this paper, we propose a novel heterogeneous data integration framework based on optimal transport to extract shared patterns in complex biological processes. The key advantages of the proposed method are that it can increase discriminating power in identifying common patterns by reducing heterogeneity unrelated to the signal by aligning the extracted latent spatiotemporal information across subjects. Our approach is effective even with a small number of subjects, and does not require auxiliary matching information for the alignment. In particular, our method can align longitudinal data across heterogeneous subjects in a common latent space to capture the dynamics of shared patterns while utilizing temporal dependency within subjects.

MoDELS · 語言模型化 · Learning · 逼真度 · 講稿 ·

2024 年 4 月 11 日

Best Practices and Lessons Learned on Synthetic Data for Language Models

Ruibo Liu,Jerry Wei,Fangyu Liu,Chenglei Si,Yanzhe Zhang,Jinmeng Rao,Steven Zheng,Daiyi Peng,Diyi Yang,Denny Zhou,Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.