This paper is motivated by the need to quantify human immune responses to environmental challenges. Specifically, the genome of the selected cell population from a blood sample is amplified by the well-known PCR process of successive heating and cooling, producing a large number of reads, roughly 30,000 to 300,000. Each read corresponds to a particular rearrangement of so-called V(D)J sequences. In the end, the observation consists of a set of read counts corresponding to different V(D)J sequences. The underlying relative frequencies of distinct V(D)J sequences can be summarized by a probability vector whose dimension is the number of distinct V(D)J rearrangements present in the blood. The statistical question is how to make inferences on a summary parameter of the probability vector based on a single multinomial-type observation of large dimension. Popular summaries of the diversity of a cell population include clonality and entropy or, more generally, a suitable function of the probability vector. A point estimator of the clonality based on multiple replicates from the same blood sample has been proposed previously. After obtaining a point estimator of a particular function, the remaining challenge is to construct a confidence interval for the parameter that appropriately reflects its uncertainty. In this paper, we propose coupling the empirical Bayes method with a resampling-based calibration procedure to construct a robust confidence interval for different population diversity parameters. The method is illustrated via an extensive numerical study and real data examples.
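To make the two diversity summaries named in the abstract concrete, the following is a minimal sketch of naive plug-in estimates computed from a vector of read counts. It is not the paper's estimator or its confidence interval procedure. The function names and the example counts are hypothetical, and clonality is assumed here to be one minus the normalized Shannon entropy, which is one common convention; the paper may define it differently.

```python
import math

def relative_frequencies(read_counts):
    """Turn raw read counts into observed relative frequencies (zeros dropped)."""
    total = sum(read_counts)
    return [c / total for c in read_counts if c > 0]

def shannon_entropy(p):
    """Plug-in Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p)

def clonality(p):
    """ASSUMED definition: 1 - entropy / log(number of observed rearrangements)."""
    if len(p) < 2:
        return 1.0  # a single observed clone is treated as maximally clonal
    return 1.0 - shannon_entropy(p) / math.log(len(p))

# Hypothetical read counts for four distinct V(D)J rearrangements.
counts = [500, 300, 150, 50]
p = relative_frequencies(counts)
print("entropy:", shannon_entropy(p))
print("clonality:", clonality(p))
```

Plug-in estimates of this kind ignore rearrangements that were present in the blood but never sequenced, so they can be biased when the dimension is large relative to the number of reads. That limitation is one reason a calibrated interval, rather than a point estimate alone, is of interest.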

Related content

Confidence

Diversity · Metric · Turn · Consistency · Dataset

April 6, 2023

Pragmatically Appropriate Diversity for Dialogue Evaluation

Katherine Stasaski, Marti A. Hearst

Linguistic pragmatics states that a conversation's underlying speech acts can constrain the type of response which is appropriate at each turn in the conversation. When generating dialogue responses, neural dialogue agents struggle to produce diverse responses. Currently, dialogue diversity is assessed using automatic metrics, but the underlying speech acts do not inform these metrics. To remedy this, we propose the notion of Pragmatically Appropriate Diversity, defined as the extent to which a conversation creates and constrains the creation of multiple diverse responses. Using a human-created multi-response dataset, we find significant support for the hypothesis that speech acts provide a signal for the diversity of the set of next responses. Building on this result, we propose a new human evaluation task where creative writers predict the extent to which conversations inspire the creation of multiple diverse responses. Our studies find that writers' judgments align with the Pragmatically Appropriate Diversity of conversations. Our work suggests that expectations for diversity metric scores should vary depending on the speech act.

Causal effect · Measure · Causal inference · Inference · Key property

April 6, 2023

Independence weights for causal inference with continuous treatments

Jared D. Huling, Noah Greifer, Guanhua Chen

Studying causal effects of continuous treatments is important for gaining a deeper understanding of many interventions, policies, or medications, yet researchers are often left with observational studies for doing so. In the observational setting, confounding is a barrier to the estimation of causal effects. Weighting approaches seek to control for confounding by reweighting samples so that confounders are comparable across different treatment values. Yet, for continuous treatments, weighting methods are highly sensitive to model misspecification. In this paper we elucidate the key property that makes weights effective in estimating causal quantities involving continuous treatments. We show that to eliminate confounding, weights should make treatment and confounders independent on the weighted scale. We develop a measure that characterizes the degree to which a set of weights induces such independence. Further, we propose a new model-free method for weight estimation by optimizing our measure. We study the theoretical properties of our measure and our weights, and prove that our weights can explicitly mitigate treatment-confounder dependence. The empirical effectiveness of our approach is demonstrated in a suite of challenging numerical experiments, where we find that our weights are quite robust and work well under a broad range of settings.

Backpropagation · Predictive coding · Morphology · Assignment algorithm · Variational Bayes

April 5, 2023

Predictive Coding as a Neuromorphic Alternative to Backpropagation: A Critical Evaluation

Umais Zahid, Qinghai Guo, Zafeirios Fountas

Backpropagation has rapidly become the workhorse credit assignment algorithm for modern deep learning methods. Recently, modified forms of predictive coding (PC), an algorithm with origins in computational neuroscience, have been shown to result in approximately or exactly equal parameter updates to those under backpropagation. Due to this connection, it has been suggested that PC can act as an alternative to backpropagation with desirable properties that may facilitate implementation in neuromorphic systems. Here, we explore these claims using the different contemporary PC variants proposed in the literature. We obtain time complexity bounds for these PC variants which we show are lower-bounded by backpropagation. We also present key properties of these variants that have implications for neurobiological plausibility and their interpretations, particularly from the perspective of standard PC as a variational Bayes algorithm for latent probabilistic models. Our findings shed new light on the connection between the two learning frameworks and suggest that, in its current forms, PC may have more limited potential as a direct replacement of backpropagation than previously envisioned.

Continuous space · Measure · Sample space · Robust · Structure

April 5, 2023

Local Intrinsic Dimensional Entropy

Rohan Ghosh, Mehul Motani

from arxiv, Proceedings of the AAAI Conference on Artificial Intelligence 2023

Most entropy measures depend on the spread of the probability distribution over the sample space X, and the maximum entropy achievable scales proportionately with the sample space cardinality |X|. For a finite |X|, this yields robust entropy measures which satisfy many important properties, such as invariance to bijections, while the same is not true for continuous spaces (where |X| is infinite). Furthermore, since R and R^d (d in Z+) have the same cardinality (from Cantor's correspondence argument), cardinality-dependent entropy measures cannot encode the data dimensionality. In this work, we question the role of cardinality and distribution spread in defining entropy measures for continuous spaces, which can undergo multiple rounds of transformations and distortions, e.g., in neural networks. We find that the average value of the local intrinsic dimension of a distribution, denoted as ID-Entropy, can serve as a robust entropy measure for continuous spaces, while capturing the data dimensionality. We find that ID-Entropy satisfies many desirable properties and can be extended to conditional entropy, joint entropy and mutual-information variants. ID-Entropy also yields new information bottleneck principles and links to causality. In the context of deep learning, for feedforward architectures, we show, theoretically and empirically, that the ID-Entropy of a hidden layer directly controls the generalization gap for both classifiers and auto-encoders, when the target function is Lipschitz continuous. Our work primarily shows that, for continuous spaces, taking a structural rather than a statistical approach yields entropy measures which preserve intrinsic data dimensionality, while being relevant for studying various architectures.

Bayesian · Basis function · Fractal · Numerical solution · Estimation error

April 4, 2023

A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

Mingwei Xu, Samuel W. K. Wong, Peijun Sang

Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these methods can be hindered by estimation error, especially when only sparse time-course observations are available. We present a Bayesian collocation framework that operates on the integrated form of the ODEs and also avoids the expensive use of numerical solvers. Our methodology has the capability to handle general nonlinear ODE systems. We demonstrate the accuracy of the proposed method through a simulation study, where the estimated parameters and recovered system trajectories are compared with other recent methods. A real data example is also provided.

Metric space · Metric · Empirical risk · Probability · Estimation error

April 3, 2023

On the Concentration of the Minimizers of Empirical Risks

Paul Escande

Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that might also be unbounded. This work identifies a set of assumptions describing a regime that seems to govern the concentration in many estimation problems, where the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.

Text quality · Reference-free · ChatGPT · Empirical study · Large language model

April 3, 2023

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study

Yi Chen, Rui Wang, Haiyun Jiang, Shuming Shi, Ruifeng Xu

from arxiv, Technical Report, 13 pages

Evaluating the quality of generated text is a challenging task in natural language processing. This difficulty arises from the inherent complexity and diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model (LLM), has garnered significant attention due to its impressive performance in various tasks. Therefore, we present this report to investigate the effectiveness of LLMs, especially ChatGPT, and explore ways to optimize their use in assessing text quality. We compared three kinds of reference-free evaluation methods based on ChatGPT or similar LLMs. The experimental results show that ChatGPT is capable of evaluating text quality effectively from various perspectives without a reference and demonstrates performance superior to most existing automatic metrics. In particular, the Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three approaches explored. However, directly comparing the quality of two texts using ChatGPT may lead to suboptimal results. We hope this report will provide valuable insights into selecting appropriate methods for evaluating text quality with LLMs such as ChatGPT.

Statistical modeling · Maximum likelihood estimation · Robust statistics · Robust · Fitting

April 2, 2023

Robust statistical modeling of monthly rainfall: The minimum density power divergence approach

Arnab Hazra, Abhik Ghosh

from arxiv, 30 pages, 5 Tables, 7 Figures

Statistical modeling of monthly, seasonal, or annual total rainfall is a crucial area of research in meteorology, mainly from the perspective of rainfed agriculture, where a proper assessment of the future availability of rainwater is necessary. The rainfall amount during a wet period can take any positive value, and some simple (one or two-parameter) probability models supported over the positive real line that are generally used for rainfall modeling are exponential, gamma, Weibull, lognormal, Pearson Type-V/VI, log-logistic, etc., where the unknown model parameters are routinely estimated using the maximum likelihood estimator (MLE). However, the presence of outliers or extreme observations is a common issue in rainfall data, and the MLE, being highly sensitive to them, often leads to spurious inference. Here, we discuss a robust parameter estimation approach based on the minimum density power divergence estimator (MDPDE). We fit the above four parametric models to the areally-weighted monthly rainfall data from the 36 meteorological subdivisions of India for the years 1951-2014 and compare the fits based on MLE and the proposed optimum MDPDE; the superior performance of MDPDE is showcased for several cases. For all month-subdivision combinations, we discuss the best-fit models and the estimated median rainfall amounts.
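The contrast between the MLE and the MDPDE described above can be illustrated with a small sketch for the one-parameter exponential model. It minimizes the standard density power divergence objective, the integral of f^(1+alpha) minus (1 + 1/alpha) times the sample mean of f(x_i)^alpha, which for an exponential density with rate r has the closed-form integral term r^alpha / (1 + alpha). Everything else is an assumption made for illustration and is not taken from the paper: the function names, the fixed tuning parameter alpha = 0.5 (the paper selects an optimum value), the plain grid search, and the synthetic data with a single outlier.

```python
import math

def dpd_objective(rate, data, alpha):
    """Density power divergence objective for an Exp(rate) model, alpha > 0."""
    # Closed form of the integral of f(x)^(1 + alpha) over x > 0.
    integral_term = rate ** alpha / (1.0 + alpha)
    # Sample mean of the model density raised to the power alpha.
    mean_term = sum((rate * math.exp(-rate * x)) ** alpha for x in data) / len(data)
    return integral_term - (1.0 + 1.0 / alpha) * mean_term

def mdpde_exponential_rate(data, alpha=0.5, grid_size=5000, max_rate=10.0):
    """Minimize the objective over a grid of candidate rates (illustrative only)."""
    grid = [max_rate * (i + 1) / grid_size for i in range(grid_size)]
    return min(grid, key=lambda r: dpd_objective(r, data, alpha))

def mle_exponential_rate(data):
    """Maximum likelihood estimate of the rate: one over the sample mean."""
    return len(data) / sum(data)

# Synthetic data (assumption): quantiles of Exp(1) plus one extreme observation.
n = 50
clean = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
contaminated = clean + [100.0]

print("MLE rate:  ", mle_exponential_rate(contaminated))
print("MDPDE rate:", mdpde_exponential_rate(contaminated))
```

In this toy setting the single outlier pulls the MLE well away from the rate that generated the clean points, while the MDPDE stays close to it, because the outlier's contribution f(x)^alpha to the objective is nearly zero. A grid search is used only to keep the sketch dependency-free; a numerical optimizer would be the usual choice in practice.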

Physical layer · Concurrent transmission · Carrier frequency offset · RF interference · Transmission

April 1, 2023

Understanding Concurrent Transmissions: The Impact of Carrier Frequency Offset and RF Interference on Physical Layer Performance

Michael Baddeley, Carlo Alberto Boano, Antonio Escobar-Molero, Ye Liu, Xiaoyuan Ma, Victor Marot, Usman Raza, Kay Römer, Markus Schuss, Aleksandar Stanoev

from arxiv, arXiv admin note: substantial text overlap with arXiv:2005.13816

The popularity of concurrent transmissions (CT) has soared after recent studies have shown their feasibility on the four physical layers specified by BLE 5, hence providing an alternative to the use of IEEE 802.15.4 for the design of reliable and efficient low-power wireless protocols. However, to date, the extent to which physical layer properties affect the performance of CT has not yet been investigated in detail. This paper fills this gap and provides an extensive study on the impact of the physical layer on CT-based solutions using IEEE 802.15.4 and BLE 5. We first highlight through simulation how the impact of errors induced by relative carrier frequency offsets on the performance of CT highly depends on the choice of the underlying physical layer. We then confirm these observations experimentally on real hardware and with varying environmental conditions through an analysis of the bit error distribution across received packets, unveiling possible techniques to effectively handle these errors. We further study the performance of CT-based data collection and dissemination protocols in the presence of RF interference on a large-scale testbed, deriving insights on how the employed physical layer affects their dependability.

Machine Learning · Learned · Identifiable · Statistic · Topic

April 3, 2020

Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

Eyke Hüllermeier, Willem Waegeman

from arxiv, 52 pages

The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of existing attempts at handling uncertainty in general and formalizing this distinction in particular.