
Homogeneous normalized random measures with independent increments (hNRMIs) form a broad class of Bayesian nonparametric priors and are therefore widely used. In this paper, we obtain the strong law of large numbers, the central limit theorem, and the functional central limit theorem for hNRMIs as the concentration parameter $a$ approaches infinity. To quantify the convergence rate in the central limit theorem, we further study the Berry-Esseen bound, which turns out to be of order $O \left( \frac{1}{\sqrt{a}}\right)$. As an application of the central limit theorem, we present the functional delta method, which can be employed to obtain the limit of the quantile process of hNRMIs. As an illustration of the central limit theorems, we demonstrate the convergence numerically for Dirichlet processes and normalized inverse Gaussian processes with various choices of the concentration parameter.
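
As a hedged numerical sketch of this scaling, the snippet below uses the Dirichlet process special case, where $P(A) \sim \mathrm{Beta}(aH(A), a(1-H(A)))$ for a fixed set $A$; the base-measure mass $H(A)$ and the values of $a$ are illustrative assumptions, and the paper's experiments cover general hNRMIs rather than this single marginal.

```python
# Sketch of the sqrt(a) central limit behavior in the Dirichlet process case:
# sqrt(a) * (P(A) - H(A)) is approximately N(0, H(A) * (1 - H(A))) for large a.
import numpy as np

rng = np.random.default_rng(0)
HA = 0.3                                  # assumed base-measure mass H(A) of a fixed set A
for a in (10, 100, 1000):
    PA = rng.beta(a * HA, a * (1 - HA), size=200_000)   # marginal law of P(A) under DP(a, H)
    z = np.sqrt(a) * (PA - HA)
    print(f"a={a:5d}  empirical var={z.var():.4f}  limiting var={HA * (1 - HA):.4f}")
```

The empirical variance approaches $H(A)(1-H(A))$ as $a$ grows, matching the advertised $\sqrt{a}$ rate in this special case.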

Related Content

This thesis is a corpus-based, quantitative, and typological analysis of the functions of Early Slavic participle constructions and their finite competitors ($jegda$-'when'-clauses). The first part leverages detailed linguistic annotation of Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels to obtain indirect evidence for the different potential functions of participle clauses and of their main finite competitor, and to understand the roles of compositionality and default discourse reasoning as explanations for the distribution of participle constructions and $jegda$-clauses in the corpus. The second part uses massively parallel data to analyze typological variation in how languages express the semantic space of English $when$, whose scope encompasses that of Early Slavic participle constructions and $jegda$-clauses. Probabilistic semantic maps are generated, and statistical methods (including Kriging, Gaussian Mixture Modelling, and precision and recall analysis) are used to induce cross-linguistically salient dimensions from the parallel corpus and to study conceptual variation within the semantic space of the hypothetical concept WHEN.

Matching on a low-dimensional vector of scalar covariates consists of constructing groups of individuals in which each individual in a group is within a pre-specified distance of an individual in another group. Matching in high-dimensional spaces, however, is more challenging because the distance can be sensitive to implementation details, caliper width, and measurement error in the observations. To partially address these problems, we propose to use extensive sensitivity analyses and to identify the main sources of variation and bias. We illustrate these concepts by examining the racial disparity in all-cause mortality in the US using the National Health and Nutrition Examination Survey (NHANES 2003-2006). In particular, we match African Americans to Caucasian Americans on age, gender, BMI, and objectively measured physical activity (PA). PA is measured every minute using accelerometers for up to seven days and then transformed into an empirical distribution of all of the minute-level observations. The Wasserstein metric is used as the measure of distance between these participant-specific distributions.
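
A minimal sketch of the distance computation used for matching, assuming synthetic minute-level activity arrays in place of NHANES data and SciPy's `wasserstein_distance` for the one-dimensional optimal transport cost:

```python
# Wasserstein-1 distance between two participants' empirical distributions of
# minute-level physical activity (synthetic stand-in data, not NHANES).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
pa_participant_a = rng.gamma(shape=2.0, scale=50.0, size=7 * 24 * 60)  # ~7 days of minutes
pa_participant_b = rng.gamma(shape=2.5, scale=45.0, size=7 * 24 * 60)

# In practice this would be computed for every candidate pair, and the resulting
# distance matrix fed to an optimal matching routine.
d = wasserstein_distance(pa_participant_a, pa_participant_b)
print(f"Wasserstein distance between activity distributions: {d:.2f}")
```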

We obtain several inequalities on the generalized means of dependent p-values. In particular, the weighted harmonic mean of p-values is strictly sub-uniform under several dependence assumptions of p-values, including independence, weak negative association, the class of extremal mixture copulas, and some Clayton copulas. Sub-uniformity of the harmonic mean of p-values has an important implication in multiple hypothesis testing: It is statistically invalid to merge p-values using the harmonic mean unless a proper threshold or multiplier adjustment is used, and this invalidity applies across all significance levels. The required multiplier adjustment on the harmonic mean explodes as the number of p-values increases, and hence there does not exist a constant multiplier that works for any number of p-values, even under independence.
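
The invalidity of the unadjusted harmonic mean can be illustrated with a small Monte Carlo sketch under independence; the number of p-values, the number of replications, and the level below are arbitrary choices, not values from the paper.

```python
# Sub-uniformity of the harmonic mean of independent p-values:
# P(HM(p_1, ..., p_K) <= t) exceeds t, so comparing the unadjusted harmonic
# mean to a significance level is anti-conservative.
import numpy as np

rng = np.random.default_rng(0)
K, reps, t = 20, 100_000, 0.05

p = rng.uniform(size=(reps, K))              # independent uniform p-values (all nulls true)
hm = K / np.sum(1.0 / p, axis=1)             # unweighted harmonic mean

print("P(HM <= t) =", np.mean(hm <= t), "vs nominal t =", t)
# The empirical probability exceeds t, illustrating why a threshold or
# multiplier adjustment is required before merging p-values this way.
```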

Efficiently enumerating all the extreme points of a polytope described by a system of linear inequalities is a well-known challenging problem. We consider a special case and present an algorithm that enumerates all the extreme points of a bisubmodular polyhedron in $\mathcal{O}(n^4|V|)$ time and $\mathcal{O}(n^2)$ space, where $n$ is the dimension of the underlying space and $V$ is the set of output extreme points. We use reverse search and the signed poset associated with extreme points to avoid redundant search. Our algorithm generalizes the enumeration of all the extreme points of a base polyhedron, which encompasses several combinatorial enumeration problems.
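
For the base-polyhedron special case mentioned above, the extreme points coincide with the greedy (Edmonds) vectors over linear orders of the ground set; the brute-force sketch below enumerates them over all $n!$ permutations and is only an illustration of that special case, not the paper's reverse-search algorithm.

```python
# Extreme points of the base polyhedron of a submodular f with f(emptyset) = 0,
# i.e. B(f) = {x : x(S) <= f(S) for all S, x(V) = f(V)}, via the greedy rule.
from itertools import permutations

def base_polyhedron_extreme_points(ground_set, f):
    points = set()
    for order in permutations(ground_set):
        x, prefix = {}, set()
        for v in order:
            # Greedy (Edmonds) rule: marginal gain of v given the current prefix.
            x[v] = f(prefix | {v}) - f(prefix)
            prefix.add(v)
        points.add(tuple(x[v] for v in ground_set))
    return points

# Toy submodular function f(S) = sqrt(|S|) on a 3-element ground set.
V = (0, 1, 2)
f = lambda S: len(S) ** 0.5
print(base_polyhedron_extreme_points(V, f))
```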

Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, current knowledge regarding the quality of the clusterings produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation of the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is to produce compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.
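
A small empirical sketch of the comparison motivating these bounds, assuming SciPy's standard linkage routines and synthetic planar data; it simply reports the maximum cluster diameter produced by complete-link and single-link at the same number of clusters.

```python
# Compare the largest within-cluster diameter under complete-link vs. single-link.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # toy metric data in the plane
D = squareform(pdist(X))               # pairwise Euclidean distances
k = 5                                  # number of clusters to extract

def max_diameter(labels, D):
    """Largest within-cluster pairwise distance over all clusters."""
    diams = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        diams.append(D[np.ix_(idx, idx)].max() if len(idx) > 1 else 0.0)
    return max(diams)

for method in ("complete", "single"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(method, "max diameter:", round(max_diameter(labels, D), 3))
```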

Spectral estimation is a fundamental task in signal processing. Recent algorithms in quantum phase estimation are concerned with the large-noise, large-frequency regime of the spectral estimation problem. Recent work of Ding, Epperly, Lin, and Zhang shows that the ESPRIT algorithm exhibits superconvergence behavior for the spike locations in terms of the maximum frequency. This note provides a perturbative analysis to explain this behavior. It also extends the discussion to the case where the noise grows with the sampling frequency.
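
For reference, a compact least-squares ESPRIT sketch in the setting analyzed above, recovering spike locations $f_j$ from noisy uniform samples $y_n = \sum_j a_j e^{2\pi i f_j n} +$ noise; the number of samples, window length, and noise level are illustrative assumptions.

```python
# Least-squares ESPRIT: estimate spike locations via the shift invariance of
# the signal subspace of a Hankel matrix of the samples.
import numpy as np

rng = np.random.default_rng(0)
freqs_true = np.array([0.11, 0.38, 0.40])      # spike locations in [0, 1)
amps = np.array([1.0, 0.8, 1.2])
N, sigma = 256, 0.05

n = np.arange(N)
y = (amps[None, :] * np.exp(2j * np.pi * n[:, None] * freqs_true[None, :])).sum(axis=1)
y += sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

def esprit(y, k, L=None):
    L = L or len(y) // 2
    # Hankel matrix H[i, j] = y[i + j] of size L x (N - L + 1).
    H = np.array([y[i:i + len(y) - L + 1] for i in range(L)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :k]                               # signal subspace
    # Shift invariance: Us[1:] ~ Us[:-1] @ Psi, eigenvalues of Psi near exp(2*pi*i*f_j).
    Psi = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)[0]
    return np.sort(np.angle(np.linalg.eigvals(Psi)) / (2 * np.pi) % 1.0)

print("estimated:", esprit(y, k=3))
print("true:     ", np.sort(freqs_true))
```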

We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, such that (i) no edge is monochromatic and (ii) each edge has a unique maximum color. It is an open question whether a 2-LO-colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and Zivn\'{y} recently gave a polynomial-time algorithm that colors such hypergraphs with $\widetilde{O}(n^{1/3})$ colors and asked whether SDP methods can be used directly to obtain improved bounds. Our main result shows how to use SDP-based rounding methods to produce an LO coloring with $\widetilde{O}(n^{1/5})$ colors for such hypergraphs. We first show that we can reduce the problem to instances with highly structured SDP solutions, which we call balanced hypergraphs. We then show how to apply classic SDP-rounding tools in this case. We believe that the reduction to balanced hypergraphs is novel and could be of independent interest.
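
The two defining conditions of an LO coloring can be checked directly; the small helper below only illustrates the definition stated above, not the SDP rounding.

```python
# Check that a coloring is an LO coloring: no edge is monochromatic, and each
# edge has a unique vertex of maximum color (colors are integers ordered by <).
def is_lo_coloring(edges, coloring):
    for edge in edges:
        colors = [coloring[v] for v in edge]
        if len(set(colors)) == 1:              # (i) monochromatic edge
            return False
        if colors.count(max(colors)) != 1:     # (ii) maximum color not unique
            return False
    return True

# A 3-uniform example with 2 ordered colors {1, 2}.
edges = [(0, 1, 2), (1, 2, 3)]
print(is_lo_coloring(edges, {0: 1, 1: 1, 2: 2, 3: 1}))   # True
print(is_lo_coloring(edges, {0: 2, 1: 1, 2: 2, 3: 1}))   # False: edge (0,1,2) has two maxima
```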

Overdamped Langevin dynamics are reversible stochastic differential equations commonly used to sample probability measures in high-dimensional spaces, such as those arising in computational statistical physics and Bayesian inference. By varying the diffusion coefficient, there are in fact infinitely many overdamped Langevin dynamics that are reversible with respect to the target probability measure at hand. This suggests optimizing the diffusion coefficient in order to increase the convergence rate of the dynamics, as measured by the spectral gap of the generator associated with the stochastic differential equation. We study this problem analytically, obtaining in particular necessary conditions on the optimal diffusion coefficient. We also derive an explicit expression for the optimal diffusion in an appropriate homogenized limit. Numerical results, relying both on discretizations of the spectral-gap problem and on Monte Carlo simulations of the stochastic dynamics, demonstrate the improved quality of the sampling arising from an appropriate choice of the diffusion coefficient.
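
As a concrete, hedged illustration of the role of the diffusion coefficient, the one-dimensional Euler-Maruyama sketch below simulates an overdamped Langevin dynamics that remains reversible with respect to a target $\pi \propto e^{-U}$ for a position-dependent diffusion $D$; the potential, the choice of $D$, and the step size are assumptions, not the paper's setup.

```python
# One-dimensional overdamped Langevin dynamics reversible w.r.t. pi ~ exp(-U):
#     dX_t = (-D(X_t) U'(X_t) + D'(X_t)) dt + sqrt(2 D(X_t)) dW_t.
# The divergence term D'(x) keeps pi invariant for a variable diffusion D.
import numpy as np

rng = np.random.default_rng(0)

U = lambda x: x ** 4 / 4 - x ** 2 / 2          # double-well potential (assumed)
dU = lambda x: x ** 3 - x
D = lambda x: 1.0 + 0.5 * np.tanh(x)           # smooth, positive diffusion coefficient (assumed)
dD = lambda x: 0.5 / np.cosh(x) ** 2

dt, n_steps = 1e-3, 100_000
x = 0.0
samples = np.empty(n_steps)
for i in range(n_steps):
    drift = -D(x) * dU(x) + dD(x)
    x = x + drift * dt + np.sqrt(2.0 * D(x) * dt) * rng.standard_normal()
    samples[i] = x

# The histogram of `samples` approximates pi for any admissible D; D only
# affects how fast the chain mixes, which is what the spectral gap quantifies.
print(samples.mean(), samples.std())
```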

We develop an inferential toolkit for analyzing object-valued responses, which correspond to data situated in general metric spaces, paired with Euclidean predictors within the conformal framework. To this end, we introduce conditional profile average transport costs: we compare distance profiles, the one-dimensional distributions of probability mass falling into balls of increasing radius, through the optimal transport cost of moving from one distance profile to another. The average cost of transporting a given distance profile to all others is crucial for statistical inference in metric spaces and underpins the proposed conditional profile scores. A key feature of the proposed approach is to use the distribution of conditional profile average transport costs as the conformity score for general metric space-valued responses, which facilitates the construction of prediction sets by the split conformal algorithm. We derive the uniform convergence rate of the proposed conformity score estimators and establish asymptotic conditional validity of the prediction sets. The finite-sample performance on synthetic data in various metric spaces demonstrates that the proposed conditional profile score outperforms existing methods in terms of both coverage level and size of the resulting prediction sets, even in the special case of scalar and thus Euclidean responses. We also demonstrate the practical utility of conditional profile scores for network data from New York taxi trips and for compositional data reflecting the energy sourcing of U.S. states.
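
The split conformal step itself is generic; the sketch below uses a plain absolute-residual conformity score for a scalar response as a stand-in for the conditional profile average transport cost, and the data, model, and miscoverage level are assumptions for illustration.

```python
# Split conformal prediction with a generic conformity score (absolute residual).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(n)

# Split: fit a simple model on one half, calibrate conformity scores on the other.
fit_idx, cal_idx = np.arange(n // 2), np.arange(n // 2, n)
coef = np.polyfit(x[fit_idx], y[fit_idx], deg=3)

alpha = 0.1
scores = np.abs(y[cal_idx] - np.polyval(coef, x[cal_idx]))   # conformity scores
k = int(np.ceil((1 - alpha) * (len(cal_idx) + 1)))
q = np.sort(scores)[k - 1]                                   # conformal quantile

# Prediction set at a new point: all responses within q of the fitted value.
# Marginal coverage >= 1 - alpha holds under exchangeability, whatever the model.
x_new = 0.5
center = np.polyval(coef, x_new)
print(f"prediction interval at x=0.5: [{center - q:.3f}, {center + q:.3f}]")
```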

Real-world misinformation can be partially correct and even factual yet still misleading. It undermines public trust in science and democracy, particularly on social media, where it can spread rapidly. High-quality and timely correction of misinformation that identifies and explains its (in)accuracies has been shown to effectively reduce false beliefs. Despite its wide acceptance, manual correction is difficult to perform in a timely and scalable way, a concern as technologies like large language models (LLMs) make misinformation easier to produce. LLMs also have versatile capabilities that could accelerate misinformation correction; however, they struggle due to a lack of recent information, a tendency to produce false content, and limitations in addressing multimodal information. We propose MUSE, an LLM augmented with access to, and credibility evaluation of, up-to-date information. By retrieving evidence as refutations or contexts, MUSE identifies and explains (in)accuracies in a piece of content, not presupposed to be misinformation, with references. It also describes images and conducts multimodal searches to verify and correct multimodal content. Fact-checking experts evaluate responses to social media content that is not presupposed to be (non-)misinformation but broadly includes incorrect, partially correct, and correct posts that may or may not be misleading. We propose and evaluate 13 dimensions of misinformation correction quality, ranging from the accuracy of identifications and the factuality of explanations to the relevance and credibility of references. The results demonstrate MUSE's ability to promptly write high-quality responses to potential misinformation on social media: overall, MUSE outperforms GPT-4 by 37% and even high-quality responses from laypeople by 29%. This work reveals LLMs' potential to help combat real-world misinformation effectively and efficiently.
