
Motivated by simultaneous association analysis in the presence of latent confounders, this paper studies the large-scale hypothesis testing problem for high-dimensional confounded linear models with both non-asymptotic and asymptotic false discovery control. Such models cover a wide range of practical settings where both the response and the predictors may be confounded. In the presence of high-dimensional predictors and unobservable confounders, simultaneous inference with provable guarantees becomes highly challenging, and the unknown strong dependence among the confounded covariates makes the challenge even more pronounced. This paper first introduces a decorrelating procedure that shrinks the confounding effect and weakens the correlations among the predictors, then performs debiasing under the decorrelated design based on a biased initial estimator. Following that, an asymptotic normality result for the debiased estimator is established, and standardized test statistics are then constructed. Furthermore, a simultaneous inference procedure is proposed to identify significant associations, and both finite-sample and asymptotic false discovery bounds are provided. The non-asymptotic result is general and model-free, and is of independent interest. We also prove that, under a minimal signal strength condition, all associations can be successfully detected with probability tending to one. Simulation and real data studies are carried out to evaluate the performance of the proposed approach and compare it with other competing methods.
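One concrete way to realize such a decorrelating step is a spectral "trim" transform from the deconfounding literature: capping the singular values of the design matrix shrinks the strong common variation that latent confounders induce and weakens correlations among predictors. The sketch below is illustrative (the median threshold is one common default), not necessarily the paper's exact procedure.

```python
import numpy as np

def trim_transform(X, tau=None):
    # Spectral transform that caps the singular values of X at tau,
    # shrinking confounder-driven common variation and weakening
    # correlations among the predictors. Apply as F @ X (and F @ y).
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    if tau is None:
        tau = np.median(s)  # a common default threshold choice
    return U @ np.diag(np.minimum(s, tau) / s) @ U.T

rng = np.random.default_rng(0)
n, p = 100, 20
H = rng.normal(size=(n, 2))                          # unobserved confounders
X = H @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))
F = trim_transform(X)
s_before = np.linalg.svd(X, compute_uv=False)
s_after = np.linalg.svd(F @ X, compute_uv=False)     # spiked spectrum is flattened
```

After the transform, debiasing can proceed on the transformed design, whose leading spectrum no longer reflects the confounders.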

Related content

In the domain of Federated Learning (FL) systems, recent state-of-the-art methods rely heavily on convergence analysis under idealized conditions. Specifically, these approaches assume that the training datasets on IoT devices share the attributes of the global data distribution. This assumption fails to capture the full spectrum of data characteristics in real-time sensing FL systems. To overcome this limitation, we propose a new system specifically designed for IoT networks with real-time sensing capabilities. Our approach accounts for the generalization gap introduced by each user's data sampling process: by controlling this sampling process, we can mitigate overfitting and improve overall accuracy. In particular, we first formulate an optimization problem that harnesses the sampling process to reduce overfitting while maximizing accuracy. The resulting surrogate optimization problem also handles energy efficiency while optimizing for accuracy with good generalization. To solve this high-complexity optimization problem, we introduce an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL), built on the Soft Actor-Critic (SAC) framework. This enables the agent to dynamically adapt and find the global optimum even in changing environments. By leveraging the capabilities of SCFL, our system offers a promising solution for resource allocation in FL systems with real-time sensing capabilities.
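The scalarized trade-off at the heart of such a surrogate objective can be sketched in a few lines. The functional forms and weights below are toy assumptions for illustration, not the paper's actual model: accuracy saturates with the sampling rate, the generalization gap grows when a skewed stream is oversampled, and energy cost grows superlinearly.

```python
import numpy as np

def surrogate_objective(rate, acc, gap, energy, lam=0.5, mu=0.1):
    # Scalarized trade-off: reward accuracy, penalize the generalization
    # gap and the energy cost of sampling. lam and mu are illustrative.
    return acc(rate) - lam * gap(rate) - mu * energy(rate)

acc = lambda r: 1 - np.exp(-5 * r)   # accuracy saturates with more samples
gap = lambda r: 0.3 * r              # oversampling a skewed stream overfits
energy = lambda r: r ** 2            # sensing/transmission cost grows fast

rates = np.linspace(0.01, 1.0, 100)
scores = [surrogate_objective(r, acc, gap, energy) for r in rates]
best_rate = rates[int(np.argmax(scores))]   # interior optimum, not "sample everything"
```

An RL agent such as SAC would, in effect, search this trade-off online as the environment (channel quality, data skew) changes.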

Neural shape representation generally refers to representing 3D geometry using neural networks, e.g., to compute a signed distance or occupancy value at a specific spatial position. Previous methods tend to rely on the auto-decoder paradigm, which often requires densely-sampled and accurate signed distances to be known during training and testing, as well as an additional optimization loop during inference. This introduces considerable computational overhead, in addition to requiring signed distances to be computed analytically, even during testing. In this paper, we present a novel encoder-decoder neural network for embedding 3D shapes in a single forward pass. Our architecture is based on a multi-scale hybrid system incorporating graph-based and voxel-based components, as well as a continuously differentiable decoder. Furthermore, the network is trained to solve the Eikonal equation and only requires knowledge of the zero-level set for training and inference. Additional volumetric samples can be generated on-the-fly and incorporated in an unsupervised manner. This means that, in contrast to most previous work, our network is able to output valid signed distance fields without explicit prior knowledge of non-zero distance values or shape occupancy. In other words, our network computes approximate solutions to the boundary value problem of the Eikonal equation. It also requires only a single forward pass during inference, instead of the common latent code optimization. We further propose a modification of the loss function for cases where surface normals are not well defined, e.g., in the context of non-watertight surface meshes and non-manifold geometry. We finally demonstrate the efficacy, generalizability and scalability of our method on datasets consisting of deforming 3D shapes, single-class encoding and multi-class encoding, showcasing a wide range of possible applications.
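The Eikonal constraint the network is trained to satisfy is that a valid signed distance field has unit gradient norm everywhere. A small numerical check makes the idea concrete: the exact SDF of a circle satisfies the constraint, while a function with the same zero-level set but a non-unit gradient does not. The finite-difference check below is illustrative, not the paper's training loss.

```python
import numpy as np

def eikonal_residual(f, xs, ys, h=1e-4):
    # A valid signed distance field satisfies |grad f| = 1 (the Eikonal
    # equation); measure the pointwise violation via central differences.
    gx = (f(xs + h, ys) - f(xs - h, ys)) / (2 * h)
    gy = (f(xs, ys + h) - f(xs, ys - h)) / (2 * h)
    return np.abs(np.sqrt(gx ** 2 + gy ** 2) - 1.0)

circle_sdf = lambda x, y: np.sqrt(x ** 2 + y ** 2) - 0.5  # exact SDF of a circle
same_zero_set = lambda x, y: x ** 2 + y ** 2 - 0.25       # same boundary, not an SDF

rng = np.random.default_rng(1)
xs, ys = rng.uniform(0.2, 1.0, size=(2, 256))   # samples away from the center
res_sdf = eikonal_residual(circle_sdf, xs, ys).mean()     # near zero
res_bad = eikonal_residual(same_zero_set, xs, ys).mean()  # clearly nonzero
```

Penalizing this residual at arbitrary volumetric samples is what allows training with only the zero-level set as supervision.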

Spatially misaligned data, where the response and covariates are observed at different spatial locations, commonly arise in many environmental studies. Much of the statistical literature on handling spatially misaligned data has been devoted to the case of a single covariate and a linear relationship between the response and this covariate. Motivated by spatially misaligned data collected on air pollution and weather in China, we propose a cokrig-and-regress (CNR) method to estimate spatial regression models involving multiple covariates and potentially non-linear associations. The CNR estimator is constructed by replacing the unobserved covariates (at the response locations) by their cokriging predictor derived from the observed but misaligned covariates under a multivariate Gaussian assumption, where a generalized Kronecker product covariance is used to account for spatial correlations within and between covariates. A parametric bootstrap approach is employed to bias-correct the CNR estimates of the spatial covariance parameters and for uncertainty quantification. Simulation studies demonstrate that CNR outperforms several existing methods for handling spatially misaligned data, such as nearest-neighbor interpolation. Applying CNR to the spatially misaligned air pollution and weather data in China reveals a number of non-linear relationships between PM$_{2.5}$ concentration and several meteorological covariates.
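The two-stage idea behind CNR can be sketched in one dimension with a single covariate: first predict the covariate at the response locations by kriging from its own (misaligned) observation sites, then regress the response on the predicted covariate. The exponential covariance, its range parameter, and the simple (zero-mean, single-covariate) kriging below are illustrative simplifications of the paper's cokriging with a generalized Kronecker product covariance.

```python
import numpy as np

def simple_krige(obs_loc, obs_val, pred_loc, range_=1.0, nugget=1e-4):
    # Simple kriging with an exponential covariance: a 1-D, single-covariate
    # sketch of the first CNR stage (the real method cokriges multiple
    # covariates jointly).
    dist = lambda a, b: np.abs(a[:, None] - b[None, :])
    K = np.exp(-dist(obs_loc, obs_loc) / range_) + nugget * np.eye(len(obs_loc))
    k = np.exp(-dist(pred_loc, obs_loc) / range_)
    return k @ np.linalg.solve(K, obs_val)

rng = np.random.default_rng(2)
obs_loc = np.sort(rng.uniform(0, 10, 50))   # where the covariate is observed
obs_val = np.sin(obs_loc)                   # a smooth spatial covariate
resp_loc = rng.uniform(1, 9, 20)            # misaligned response locations
x_hat = simple_krige(obs_loc, obs_val, resp_loc, range_=2.0)

# Second stage: regress the response on the kriged covariate.
y = 2.0 * np.sin(resp_loc) + rng.normal(scale=0.01, size=20)
slope = np.polyfit(x_hat, y, 1)[0]          # recovers the true coefficient 2
```

The parametric bootstrap in the paper then corrects the bias this plug-in step introduces into the spatial covariance estimates.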

To tackle the high inference latency exhibited by autoregressive language models, previous studies have proposed an early-exiting framework that allocates adaptive computation paths for each token based on the complexity of generating the subsequent token. However, we observed several shortcomings, including performance degradation caused by a state copying mechanism or numerous exit paths, and sensitivity to exit confidence thresholds. Consequently, we propose a Fast and Robust Early-Exiting (FREE) framework, which incorporates a shallow-deep module and synchronized parallel decoding. Our framework enables faster inference by synchronizing the decoding process of the current token with previously stacked early-exited tokens. Furthermore, as parallel decoding allows us to observe predictions from both shallow and deep models, we present a novel adaptive threshold estimator that exploits a Beta mixture model to determine suitable confidence thresholds. We empirically demonstrate the superiority of our proposed framework on extensive generation tasks.
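The threshold-estimation idea is that exit confidences form two populations, one where the shallow prediction agrees with the deep model and one where it does not, each well modeled by a Beta density; the exit threshold is set where the "agree" density starts to dominate. The sketch below uses a method-of-moments Beta fit as a lightweight stand-in for the EM-fitted mixture, with synthetic confidences; all parameter values are illustrative.

```python
import math
import numpy as np

def beta_mom(x):
    # Method-of-moments Beta fit; a simple stand-in for an EM-fitted
    # Beta mixture component.
    m, v = x.mean(), x.var()
    k = m * (1 - m) / v - 1
    return m * k, (1 - m) * k

def beta_pdf(x, a, b):
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

rng = np.random.default_rng(3)
conf_agree    = rng.beta(8, 2, 1000)  # confidences when shallow matches deep
conf_disagree = rng.beta(2, 5, 1000)  # confidences when they disagree
a1, b1 = beta_mom(conf_agree)
a0, b0 = beta_mom(conf_disagree)

# Exit threshold: the smallest confidence where the "agree" density dominates.
grid = np.linspace(0.01, 0.99, 99)
threshold = grid[int(np.argmax(beta_pdf(grid, a1, b1) > beta_pdf(grid, a0, b0)))]
```

Because parallel decoding keeps producing both shallow and deep predictions, the two populations can be re-fit on the fly, which is what makes the threshold adaptive.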

Mapping two modalities, speech and text, into a shared representation space is a research direction that uses text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, the lengths of the speech representation and the text representation are inconsistent. Although previous methods up-sample the text representation to align with the acoustic modality, the result may not match the expected actual duration. In this paper, we propose a novel representation-matching strategy that down-samples the acoustic representation to align with the text modality. By introducing a continuous integrate-and-fire (CIF) module that generates acoustic representations consistent with the token length, our ASR model can better learn unified representations from both modalities, allowing for domain adaptation using text-only data from the target domain. Experimental results on new-domain data demonstrate the effectiveness of the proposed method.
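The CIF mechanism accumulates a per-frame firing weight and emits one token-level embedding each time the accumulated weight crosses a threshold, splitting the boundary frame's contribution between adjacent tokens. The sketch below shows this down-sampling step in isolation, with uniform weights for clarity; in a real model the weights are predicted per frame.

```python
import numpy as np

def cif_downsample(frames, alphas, threshold=1.0):
    # Continuous integrate-and-fire: accumulate per-frame weights and emit
    # a weighted sum of frames each time the accumulator crosses the
    # threshold; the boundary frame is split between adjacent tokens.
    out, acc, buf = [], 0.0, np.zeros(frames.shape[1])
    for h, a in zip(frames, alphas):
        if acc + a < threshold:
            acc += a
            buf = buf + a * h
        else:
            r = threshold - acc          # part of this frame closes the token
            out.append(buf + r * h)
            acc = a - r                  # leftover weight starts the next token
            buf = acc * h
    return np.array(out)

rng = np.random.default_rng(4)
frames = rng.normal(size=(12, 4))        # 12 acoustic frames, dim 4
alphas = np.full(12, 0.5)                # uniform firing weights (toy choice)
tokens = cif_downsample(frames, alphas)  # 12 frames -> 6 token-level vectors
```

Because the output length equals the token count, the down-sampled acoustic sequence can be matched one-to-one against text representations.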

Many scientific questions in biomedical, environmental, and psychological research involve understanding the impact of multiple factors on outcomes. While randomized factorial experiments are ideal for this purpose, randomization is infeasible in many empirical studies. Therefore, investigators often rely on observational data, where drawing reliable causal inferences for multiple factors remains challenging. As the number of treatment combinations grows exponentially with the number of factors, some treatment combinations can be rare or even missing by chance in observed data, further complicating factorial effects estimation. To address these challenges, we propose a novel weighting method tailored to observational studies with multiple factors. Our approach uses weighted observational data to emulate a randomized factorial experiment, enabling simultaneous estimation of the effects of multiple factors and their interactions. Our investigations reveal a crucial nuance: achieving balance among covariates, as in single-factor scenarios, is necessary but insufficient for unbiased estimation of factorial effects. Our findings suggest that balancing the treatment factors themselves is also essential in multi-factor settings. Moreover, we extend our weighting method to handle missing treatment combinations in observed data. Finally, we study the asymptotic behavior of the new weighting estimators and propose a consistent variance estimator, providing reliable inferences on factorial effects in observational studies.
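The emulation idea can be illustrated with the simplest case: a 2x2 factorial where both factors depend on a confounder, and each unit is weighted by the inverse probability of its observed factor combination. The simulation below uses the true propensities for clarity (the paper's method constructs weights from data) and recovers the main effect of factor 1 averaged over the levels of factor 2, here 0.5 * (1.5 + 1.0) = 1.25.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
X = rng.normal(size=n)                          # confounder
sigmoid = lambda z: 1 / (1 + np.exp(-z))
p1, p2 = sigmoid(0.5 * X), sigmoid(-0.5 * X)    # factor propensities depend on X
A1, A2 = rng.binomial(1, p1), rng.binomial(1, p2)
Y = 1.0 * A1 + 2.0 * A2 + 0.5 * A1 * A2 + X + rng.normal(size=n)

# Weight by the inverse probability of the observed factor combination,
# emulating a randomized 2x2 factorial (true propensities, for clarity).
w = 1.0 / (np.where(A1 == 1, p1, 1 - p1) * np.where(A2 == 1, p2, 1 - p2))

def wmean(mask):
    return np.sum(w[mask] * Y[mask]) / np.sum(w[mask])

# Main effect of factor 1, averaged over the levels of factor 2.
main1 = 0.5 * ((wmean((A1 == 1) & (A2 == 1)) - wmean((A1 == 0) & (A2 == 1)))
             + (wmean((A1 == 1) & (A2 == 0)) - wmean((A1 == 0) & (A2 == 0))))
```

The paper's key observation is that in the multi-factor setting the weights must balance the factors against each other as well as against the covariates; the oracle weights above happen to do both.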

The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.
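The explicit control that shallow fusion offers, which the abstract contrasts with implicit deep biasing, amounts to adding a tunable bonus to the posterior scores of contextual-phrase tokens and renormalizing. The sketch below shows that single step in isolation; the token distribution and bonus value are toy assumptions.

```python
import numpy as np

def bias_posterior(log_probs, bias_ids, bonus=2.0):
    # Shallow-fusion-style explicit biasing: add a log-domain bonus to the
    # scores of tokens in contextual phrases, then renormalize. The bonus
    # directly controls the degree of bias.
    biased = log_probs.copy()
    biased[bias_ids] += bonus
    return biased - np.log(np.sum(np.exp(biased)))

logp = np.log(np.array([0.5, 0.3, 0.15, 0.05]))  # toy token posterior
out = bias_posterior(logp, bias_ids=[2], bonus=2.0)
# Token 2 (a contextual-phrase token) now outranks the original top token.
```

Deep biasing instead injects context through attention, so there is no single knob like `bonus`; supporting both modes is what the spike-triggered method provides.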

Across various sectors such as healthcare, criminal justice, national security, finance, and technology, large-scale machine learning (ML) and artificial intelligence (AI) systems are being deployed to make critical data-driven decisions. Many have asked if we can and should trust these ML systems to be making these decisions. Two critical components are prerequisites for trust in ML systems: interpretability, or the ability to understand why the ML system makes the decisions it does, and fairness, which ensures that ML systems do not exhibit bias against certain individuals or groups. Both interpretability and fairness are important and have separately received abundant attention in the ML literature, but so far, there have been very few methods developed to directly interpret models with regard to their fairness. In this paper, we focus on arguably the most popular type of ML interpretation: feature importance scores. Inspired by the use of decision trees in knowledge distillation, we propose to leverage trees as interpretable surrogates for complex black-box ML models. Specifically, we develop a novel fair feature importance score for trees that can be used to interpret how each feature contributes to fairness or bias in trees, tree-based ensembles, or tree-based surrogates of any complex ML system. Like the popular mean decrease in impurity for trees, our Fair Feature Importance Score is defined based on the mean decrease (or increase) in group bias. Through simulations as well as real examples on benchmark fairness datasets, we demonstrate that our Fair Feature Importance Score offers valid interpretations for both tree-based ensembles and tree-based surrogates of other ML systems.
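A small simulation conveys what a fair feature importance score measures. The sketch below uses a permutation-style analogue (how much group bias drops when a feature is scrambled) rather than the paper's MDI-style mean decrease in group bias over tree splits, and a hand-built "black box" with one group-proxy feature and one group-independent feature; all names and values are illustrative.

```python
import numpy as np

def demographic_parity_gap(pred, group):
    # Group bias: difference in mean prediction between the two groups.
    return abs(pred[group == 1].mean() - pred[group == 0].mean())

def fair_permutation_importance(model, X, group, j, rng):
    # Permutation-style analogue of a fair feature importance score:
    # how much does group bias drop when feature j is scrambled?
    base = demographic_parity_gap(model(X), group)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return base - demographic_parity_gap(model(Xp), group)

rng = np.random.default_rng(6)
n = 5000
group = rng.binomial(1, 0.5, n)
x0 = group + rng.normal(size=n)   # feature 0 is a proxy for group membership
x1 = rng.normal(size=n)           # feature 1 is independent of group
X = np.column_stack([x0, x1])
model = lambda X: (X[:, 0] + X[:, 1] > 0.5).astype(float)  # stand-in "black box"

imp0 = fair_permutation_importance(model, X, group, 0, rng)  # large: drives bias
imp1 = fair_permutation_importance(model, X, group, 1, rng)  # near zero
```

The paper's tree-based score attributes bias changes split by split rather than by permutation, but the interpretation is the same: positive scores flag features that contribute to group bias.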

In this paper, we discuss adaptive approximations of an elliptic eigenvalue optimization problem in a phase-field setting by a conforming finite element method. An adaptive algorithm is proposed and implemented for several two-dimensional numerical examples to illustrate its efficiency and accuracy. Our theoretical findings consist of the vanishing limit of a subsequence of the error estimators and the convergence of the corresponding subsequence of adaptively generated solutions to a solution of the continuous optimality system.
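A standard ingredient of such adaptive algorithms is the marking step that selects which elements to refine from the local error estimators. The sketch below implements Dörfler (bulk) marking, a common choice in adaptive FEM; the paper's exact marking rule may differ.

```python
import numpy as np

def dorfler_mark(eta, theta=0.5):
    # Dörfler (bulk) marking: refine the smallest set of elements whose
    # squared estimator mass reaches a fraction theta of the total.
    order = np.argsort(eta)[::-1]            # elements by decreasing estimator
    csum = np.cumsum(eta[order] ** 2)
    k = int(np.searchsorted(csum, theta * csum[-1])) + 1
    return np.sort(order[:k])

eta = np.array([0.1, 0.9, 0.2, 0.05, 0.4])   # toy per-element estimators
marked = dorfler_mark(eta, theta=0.5)        # only the dominant element is marked
```

Iterating solve–estimate–mark–refine produces the adaptively generated sequence of solutions whose convergence the paper analyzes.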

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
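Among the simplest text-classification augmentations the survey's taxonomy covers are token-level transforms in the style of "easy data augmentation" (EDA): random deletion and random swap. The sketch below is a minimal, self-contained version of those two transforms; parameters are illustrative.

```python
import random

def random_deletion(tokens, p=0.2, rng=None):
    # EDA-style random deletion: drop each token independently with
    # probability p, keeping at least one token.
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n=1, rng=None):
    # EDA-style random swap: exchange the positions of two random tokens,
    # n times; the multiset of tokens is preserved.
    rng = rng or random.Random(0)
    out = list(tokens)
    for _ in range(n):
        i, j = rng.randrange(len(out)), rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

sent = "data augmentation creates new training examples".split()
aug1 = random_deletion(sent, p=0.3, rng=random.Random(42))
aug2 = random_swap(sent, n=2, rng=random.Random(42))
```

More sophisticated groupings in the survey (back-translation, generative models, embedding-space mixing) operate on the same principle: produce label-preserving variants of the training text.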
