
We develop a novel multiple hypothesis testing correction with family-wise error rate (FWER) control that efficiently exploits positive dependencies between potentially correlated statistical hypothesis tests. Our proposed algorithm $\texttt{max-rank}$ is conceptually straightforward, relying on the use of a $\max$-operator in the rank domain of computed test statistics. We compare our approach to the frequently employed Bonferroni correction, theoretically and empirically demonstrating its superiority over Bonferroni when positive dependencies exist, and its equivalence otherwise. Our advantage over Bonferroni increases as the number of tests rises, and we maintain high statistical power whilst ensuring FWER control. We specifically frame our algorithm in the context of parallel permutation testing, a scenario that arises in our primary application of conformal prediction, a recently popularized approach for quantifying uncertainty in complex predictive settings.
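For illustration, the sketch below implements a Westfall-Young-style max correction carried out in the rank domain of parallel permutation tests, which is the spirit of the description above; the array layout, the shared-permutation setup, and the exact rejection rule are assumptions made for this example rather than the paper's algorithm.

```python
import numpy as np

def max_rank_correction(stats, alpha=0.05):
    """FWER-controlling correction for K parallel permutation tests.

    stats: array of shape (B + 1, K). Row 0 holds the observed test
    statistics; rows 1..B hold statistics from B shared permutations
    (the same permutations applied to every test, which preserves any
    positive dependency between the tests).
    Returns a boolean array of length K marking rejected hypotheses.
    """
    # Rank every entry within its own test (column); larger statistic -> larger rank.
    ranks = stats.argsort(axis=0).argsort(axis=0)
    # Null distribution of the maximum rank across tests, one value per permutation.
    max_rank_null = ranks[1:].max(axis=1)
    # Reject tests whose observed rank exceeds the (1 - alpha) quantile of that null.
    threshold = np.quantile(max_rank_null, 1.0 - alpha)
    return ranks[0] > threshold

# Toy example: 5 positively dependent tests, 999 shared permutations.
rng = np.random.default_rng(0)
shared = rng.normal(size=(999, 1))                 # common component -> positive dependency
perms = 0.7 * shared + 0.7 * rng.normal(size=(999, 5))
observed = np.array([4.0, 3.5, 0.1, 0.2, 0.0])     # two strong signals, three nulls
print(max_rank_correction(np.vstack([observed, perms])))
```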

Related Content

Locally recoverable codes address the task of reconstructing a lost symbol from a subset of the remaining coordinates smaller than an information set. We consider the case of codes over finite chain rings, generalizing known results and bounds for codes over fields. In particular, we propose a new family of locally recoverable codes by extending a construction proposed in 2014 by Tamo and Barg, and we discuss its optimality.
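To make the locality idea concrete, here is a toy binary example (over a field, not a chain ring, and unrelated to the Tamo-Barg construction): every symbol sits in a small local group with its own parity, so a single erased symbol is repaired from two helpers rather than from a full information set.

```python
import numpy as np

# Toy binary code of length 6 with locality r = 2: four information bits plus
# one local parity per group.
#   group A = (c0, c1, c2) with c2 = c0 XOR c1
#   group B = (c3, c4, c5) with c5 = c3 XOR c4
GROUPS = [(0, 1, 2), (3, 4, 5)]

def encode(m):
    m = np.asarray(m) % 2
    return np.array([m[0], m[1], m[0] ^ m[1], m[2], m[3], m[2] ^ m[3]])

def recover(codeword, erased):
    """Repair a single erased coordinate using only its local group."""
    for group in GROUPS:
        if erased in group:
            a, b = (i for i in group if i != erased)
            return codeword[a] ^ codeword[b]

c = encode([1, 0, 1, 1])
assert recover(c, erased=2) == c[2]   # lost symbol rebuilt from 2 helpers, not 4
```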

There is a constant need for high-performing and computationally efficient neural network models for image super-resolution: computationally efficient models can run on low-capacity devices and reduce carbon footprints. One way to obtain such models is model compression, e.g. quantization. Another is neural architecture search, which automatically discovers new, more efficient solutions. We propose QuantNAS, a novel quantization-aware procedure that combines the advantages of these two approaches by searching for quantization-friendly super-resolution models. The approach utilizes entropy regularization, quantization noise, and an Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure. The entropy regularization technique prioritizes a single operation within each block of the search space. Adding quantization noise to parameters and activations approximates model degradation after quantization, resulting in more quantization-friendly architectures. ADQ helps to alleviate problems caused by Batch Norm blocks in super-resolution models. Our experimental results show that the proposed approximations serve the search procedure better than direct model quantization. QuantNAS discovers architectures with a better PSNR/BitOps trade-off than uniform or mixed precision quantization of fixed architectures. We showcase the effectiveness of our method through its application to two search spaces inspired by state-of-the-art SR models and RFDN. Thus, anyone can design a suitable search space based on an existing architecture and apply our method to obtain better quality and efficiency. The proposed procedure is 30\% faster than direct weight quantization and is more stable.
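As a rough illustration of the quantization-noise idea (the exact noise model used by QuantNAS may differ), one can perturb weights by up to half a quantization step instead of hard rounding, giving the search a smooth proxy for post-quantization degradation:

```python
import numpy as np

def hard_quantize(w, bits=8):
    """Uniform quantization of a weight tensor to the given bit width."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((w - lo) / step) * step

def quantization_noise(w, bits=8, rng=None):
    """Additive-noise proxy: perturb weights by up to half a quantization step."""
    rng = rng or np.random.default_rng()
    step = (w.max() - w.min()) / (2 ** bits - 1)
    return w + rng.uniform(-step / 2, step / 2, size=w.shape)

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
err_hard = np.abs(hard_quantize(w, bits=4) - w).mean()
err_noise = np.abs(quantization_noise(w, bits=4) - w).mean()
print(err_hard, err_noise)   # perturbations of comparable average magnitude
```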

A general theory of efficient estimation for ergodic diffusion processes sampled at high frequency with an infinite time horizon is presented. High frequency sampling is common in many applications, with finance as a prominent example. The theory is formulated in terms of approximate martingale estimating functions and covers a large class of estimators including most of the previously proposed estimators for diffusion processes. Easily checked conditions ensuring that an estimating function is an approximate martingale are derived, and general conditions ensuring consistency and asymptotic normality of estimators are given. Most importantly, simple conditions are given that ensure rate optimality and efficiency. Rate optimal estimators of parameters in the diffusion coefficient converge faster than estimators of drift coefficient parameters because they take advantage of the information in the quadratic variation. The conditions facilitate the choice among the multitude of estimators that have been proposed for diffusion models. Optimal martingale estimating functions in the sense of Godambe and Heyde and their high frequency approximations are, under weak conditions, shown to satisfy the conditions for rate optimality and efficiency. This provides a natural and feasible method of constructing explicit rate optimal and efficient estimating functions by solving a linear equation.
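To fix notation for the setup above (symbols follow common usage in this literature and are assumed here rather than quoted from the paper): given observations $X_{t_0}, X_{t_1}, \dots, X_{t_n}$ of the diffusion at sampling interval $\Delta_n$, an estimator $\hat\theta_n$ solves $G_n(\hat\theta_n) = 0$ with
$$ G_n(\theta) = \sum_{i=1}^{n} g\bigl(\Delta_n, X_{t_{i-1}}, X_{t_i}; \theta\bigr), $$
and $G_n$ is a martingale estimating function when $\mathrm{E}_\theta\bigl[\, g(\Delta_n, X_{t_{i-1}}, X_{t_i}; \theta) \mid X_{t_{i-1}} \bigr] = 0$; an approximate martingale estimating function only requires this conditional expectation to vanish sufficiently fast as $\Delta_n \to 0$.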

We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the available training samples, which is then implemented in a discrete-time Langevin sampler to generate new samples. By setting the kernel bandwidth to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with time-stepping stiff stochastic differential equations. More precisely, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.
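The sketch below conveys the general recipe in a few lines of numpy, but with deliberate simplifications: a Gaussian kernel density estimate of the score stands in for the diffusion-map drift, a plain unadjusted Langevin step replaces the proposed split-step scheme, and the choice $\epsilon = h^2$ tying step size to bandwidth is illustrative.

```python
import numpy as np

def kde_score(x, train, h):
    """Gradient of the log-density of a Gaussian KDE built on the training samples."""
    diffs = train - x                                # rows: x_i - x
    logw = -0.5 * np.sum(diffs ** 2, axis=1) / h**2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / h**2   # sum_i w_i (x_i - x) / h^2

def langevin_sample(train, n_steps=500, h=0.2, rng=None):
    """Unadjusted Langevin sampler driven by the KDE score estimate."""
    rng = rng or np.random.default_rng()
    eps = h ** 2                                     # tie step size to kernel bandwidth
    x = train[rng.integers(len(train))].astype(float)
    for _ in range(n_steps):
        x = x + eps * kde_score(x, train, h) + np.sqrt(2 * eps) * rng.normal(size=x.shape)
    return x

# Toy training data: a noisy 2-D ring; generated samples should stay near it.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=2000)
train = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(2000, 2))
print(langevin_sample(train, rng=rng))
```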

This study introduces a novel methodology for modelling patient emotions from online patient experience narratives. We employ metadata network topic modelling to analyse patient-reported experiences from Care Opinion, revealing key emotional themes linked to patient-caregiver interactions and clinical outcomes. We develop a probabilistic, context-specific emotion recommender system capable of predicting both multilabel emotions and binary sentiments, using a naive Bayes classifier with contextually meaningful topics as predictors. The superior performance of the emotions predicted under this model, compared to baseline models, was assessed using the information retrieval metrics nDCG and Q-measure, and our predicted sentiments achieved an F1 score of 0.921, significantly outperforming standard sentiment lexicons. This method offers a transparent, cost-effective way to understand patient feedback, enhancing traditional collection methods and informing individualised patient care. Our findings are accessible via an R package and interactive dashboard, providing valuable tools for healthcare researchers and practitioners.
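For the classifier component, a minimal illustration in Python (the toy data, binary topic indicators, and scikit-learn estimator are assumptions for this sketch; the authors' tooling is an R package):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy design matrix: one row per narrative, one column per topic; an entry is 1
# if the topic model assigned that topic to the narrative.
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])
y = np.array([1, 1, 0, 0, 1, 0])   # 1 = positive sentiment, 0 = negative

clf = BernoulliNB().fit(X, y)
new_doc = np.array([[1, 0, 1, 1]])
print(clf.predict(new_doc), clf.predict_proba(new_doc))
```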

Confounding remains one of the major challenges to causal inference with observational data. This problem is paramount in medicine, where we would like to answer causal questions from large observational datasets like electronic health records (EHRs) and administrative claims. Modern medical data typically contain tens of thousands of covariates. Such a large set carries hope that many of the confounders are directly measured, and further hope that others are indirectly measured through their correlation with measured covariates. How can we exploit these large sets of covariates for causal inference? To help answer this question, this paper examines the performance of the large-scale propensity score (LSPS) approach on causal analysis of medical data. We demonstrate that LSPS may adjust for indirectly measured confounders by including tens of thousands of covariates that may be correlated with them. We present conditions under which LSPS removes bias due to indirectly measured confounders, and we show that LSPS may avoid bias when inadvertently adjusting for variables (like colliders) that otherwise can induce bias. We demonstrate the performance of LSPS with both simulated medical data and real medical data.
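A bare-bones sketch of this kind of workflow (not the authors' LSPS implementation): fit a regularized propensity model on the full covariate set, then use the estimated scores, here via simple inverse-probability weighting, to estimate the treatment effect. The simulated data and the plain IPW estimator are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_estimate(X, treatment, outcome):
    """Propensity model over a wide covariate set, then inverse-probability weighting."""
    # L1 regularization keeps the propensity model sparse when X has many columns.
    ps_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    ps = ps_model.fit(X, treatment).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                      # guard against extreme weights
    w = treatment / ps + (1 - treatment) / (1 - ps)
    treated = np.average(outcome[treatment == 1], weights=w[treatment == 1])
    control = np.average(outcome[treatment == 0], weights=w[treatment == 0])
    return treated - control

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))                              # stand-in for many covariates
treatment = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))       # confounder drives treatment...
outcome = 2.0 * treatment + X[:, 0] + rng.normal(size=5000)   # ...and the outcome
print(ipw_estimate(X, treatment, outcome))                    # roughly recovers the true effect 2.0
```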

Invariant causal prediction (ICP, Peters et al. (2016)) provides a novel way to identify causal predictors of a response by utilizing heterogeneous data from different environments. One advantage of ICP is that it guarantees, with high probability, to make no false causal discoveries. Such a guarantee, however, can be too conservative in some applications, resulting in few or no discoveries. To address this, we propose simultaneous false discovery bounds for ICP, which provide users with extra flexibility in exploring causal predictors and can yield more informative results. These additional inferences come for free, in the sense that they do not require additional assumptions, and the same information obtained by the original ICP is retained. We demonstrate the practical usage of our method through simulations and a real dataset.
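For intuition, a stripped-down version of the invariance check behind ICP is sketched below: for each candidate predictor set, regress the response on that set and test whether the residual means differ across environments, then intersect all accepted sets. The one-way ANOVA test and the toy data are simplifications assumed for this sketch; they do not reproduce the full ICP test nor the proposed false discovery bounds.

```python
import itertools
import numpy as np
from scipy import stats

def icp_accepted_sets(X, y, env, alpha=0.05):
    """Return candidate predictor sets whose residuals look invariant across environments."""
    accepted = []
    p = X.shape[1]
    for r in range(p + 1):
        for S in itertools.combinations(range(p), r):
            if S:
                beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
                resid = y - X[:, S] @ beta
            else:
                resid = y
            groups = [resid[env == e] for e in np.unique(env)]
            if stats.f_oneway(*groups).pvalue > alpha:   # equal residual means?
                accepted.append(S)
    return accepted

# Toy data: x1 causes y; x2 is a child of y with environment-dependent noise.
rng = np.random.default_rng(0)
env = np.repeat([0, 1, 2], 200)
x1 = rng.normal(size=env.size) + env
y = 2.0 * x1 + rng.normal(size=env.size)
x2 = y + (1 + env) * rng.normal(size=env.size)
accepted = icp_accepted_sets(np.c_[x1, x2], y, env)
causal = set.intersection(*(set(S) for S in accepted)) if accepted else set()
print(accepted, causal)   # predictors in every accepted set; here x1 (index 0)
```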

The two-sample spiked model is an important topic in multivariate statistical inference. This paper focuses on testing the number of spikes in a high-dimensional generalized two-sample spiked model, which is free of the Gaussian population assumption and of the diagonal or block-wise diagonal restriction on the population covariance matrix, and which does not require the spiked eigenvalues to be bounded. In order to determine the number of spikes, we first propose a general test, which relies on partial linear spectral statistics. We establish its asymptotic normality under the null hypothesis. We then apply this result to two statistical problems: variable selection in large-dimensional linear regression and change point detection when change points and additive outliers exist simultaneously. Simulations and empirical analysis are conducted to illustrate the good performance of our methods.
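As a generic, illustrative look at the two-sample spiked structure (this is not the paper's partial linear spectral statistic test): eigenvalues of $S_2^{-1} S_1$ computed from two samples separate the spiked directions from the bulk when the spikes are strong enough.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n1, n2 = 100, 500, 500

# Population covariances: Sigma2 = I, Sigma1 = I except for three spiked directions.
sigma1_diag = np.ones(p)
sigma1_diag[:3] = [50.0, 30.0, 20.0]

X1 = rng.normal(size=(n1, p)) * np.sqrt(sigma1_diag)   # sample carrying the spikes
X2 = rng.normal(size=(n2, p))                          # reference sample
S1 = X1.T @ X1 / n1
S2 = X2.T @ X2 / n2

# Eigenvalues of S2^{-1} S1: the three largest stand far above the remaining bulk.
evals = np.sort(np.linalg.eigvals(np.linalg.solve(S2, S1)).real)[::-1]
print(evals[:6])
```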

This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation. We investigate the behavior of AD across different precision levels (16, 32, 64 bits) and convolutional architectures (LeNet, VGG, and ResNet) on various datasets (MNIST, CIFAR10, SVHN, and ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations (such as MaxPool and ReLU). On the other hand, in practice, AD operates with floating-point numbers (not real numbers), and there is, therefore, a need to explore subsets on which AD can be numerically incorrect. These subsets include a bifurcation zone (where AD is incorrect over reals) and a compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using SGD for the training process, we study the impact of different choices of the nonsmooth Jacobian for the MaxPool function at 16- and 32-bit precision. Our findings suggest that nonsmooth MaxPool Jacobians with lower norms help maintain stable and efficient test accuracy, whereas those with higher norms can result in instability and decreased performance. We also observe that the influence of MaxPool's nonsmooth Jacobians on learning can be reduced by using batch normalization, Adam-like optimizers, or increasing the precision level.
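A small framework-independent example of the underlying ambiguity: when two entries of a pooling window tie for the maximum, routing the full gradient to one winner or splitting it among all winners are both valid nonsmooth Jacobian choices for MaxPool, and the split has the smaller norm. Both conventions below are illustrative, not necessarily those used in the experiments.

```python
import numpy as np

def maxpool_backward_single(window, grad_out):
    """Send the entire incoming gradient to the first maximal entry (higher-norm choice)."""
    g = np.zeros_like(window, dtype=float)
    g[np.argmax(window)] = grad_out
    return g

def maxpool_backward_split(window, grad_out):
    """Split the incoming gradient evenly among all maximal entries (lower-norm choice)."""
    mask = (window == window.max()).astype(float)
    return grad_out * mask / mask.sum()

window = np.array([0.5, 0.5, 0.1])            # a tie: both conventions are valid
print(maxpool_backward_single(window, 1.0))   # [1. 0. 0.]
print(maxpool_backward_split(window, 1.0))    # [0.5 0.5 0. ]
```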

Far-field speech recognition is a challenging task that conventionally relies on signal-processing beamforming to address noise and interference, but performance is usually limited by heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system, extending the end-to-end speech recognition system to include speech enhancement. The framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) is adopted to form the neural beamforming. Several pooling strategies for combining look directions are then compared in order to find the optimal approach. Moreover, information about the source direction is also integrated into the beamforming to explore its usefulness as a prior, since source direction is often available, especially in multi-modality scenarios. Experiments on different microphone array geometries are conducted to evaluate robustness against spacing variance of the microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework, and the proposed method achieves a 19.26\% improvement compared with a strong baseline.
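A minimal numpy sketch of the direction-pooling comparison (shapes and names are assumed; the fCLP beamformer and the LAS recognizer are not reproduced here): given per-look-direction feature maps, the look directions can be merged by max or average pooling before recognition.

```python
import numpy as np

# Hypothetical beamformed features: (num_look_directions, time_frames, feature_dim).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 100, 40))

max_pooled = features.max(axis=0)    # keep the strongest look direction per feature bin
avg_pooled = features.mean(axis=0)   # average the evidence from all look directions
print(max_pooled.shape, avg_pooled.shape)   # both (100, 40), the shape fed to the recognizer
```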
