
This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal prediction and jackknife+. While the standard theoretical guarantee obtained by the conformal prediction framework is a marginal predictive coverage guarantee, in the special case of independent repeated measurements it is possible to achieve a stronger form of coverage -- the "second-moment coverage" property -- which provides better control of conditional miscoverage rates; distribution-free prediction sets achieving this property are constructed. Simulations illustrate that this guarantee indeed leads to uniformly small conditional miscoverage rates. Empirically, the stronger guarantee comes at the cost of wider prediction sets in scenarios where the fitted model is poorly calibrated, but this cost is very mild in cases where the fitted model is accurate.
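As a concrete illustration of the kind of method being extended, the following Python sketch shows split conformal prediction adapted to grouped data, under the simplifying assumption that collapsing each calibration group to a single score restores exchangeability across groups; the function name and the one-score-per-group reduction are illustrative, not the paper's construction.

import numpy as np

def grouped_split_conformal(residuals_by_group, alpha=0.1, seed=0):
    # residuals_by_group: list of 1-D arrays of holdout residuals, one per group.
    # Collapse each group to a single calibration score (one random draw per
    # group) so that the scores are exchangeable across groups.
    rng = np.random.default_rng(seed)
    scores = np.array([rng.choice(np.abs(r)) for r in residuals_by_group])
    k = len(scores)
    # Conformal quantile with the finite-sample (k + 1) correction.
    level = min(1.0, np.ceil((1 - alpha) * (k + 1)) / k)
    return np.quantile(scores, level, method="higher")  # interval half-width

A prediction interval for a new group is then the point prediction plus or minus the returned half-width; marginal coverage over groups holds under the stated exchangeability assumption.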

Related Content

This article presents factor copula approaches to model temporal dependency in non-Gaussian (continuous/discrete) longitudinal data. Factor copula models are canonical vine copulas which explain the underlying dependence structure of multivariate data through latent variables, and can therefore be easily interpreted and applied to unbalanced longitudinal data. We develop regression models for continuous, binary and ordinal longitudinal data including covariates, by using factor copula constructions with subject-specific latent variables. Considering homogeneous within-subject dependence, our proposed models allow for feasible parametric inference in moderate to high dimensional situations, using the two-stage inference functions for margins (IFM) estimation method. We assess the finite sample performance of the proposed models with extensive simulation studies. In the empirical analysis, the proposed models are applied to analyse different longitudinal responses in two real-world data sets. Moreover, we compare the performance of these models with some widely used random effect models using standard model selection techniques and find substantial improvements. Our studies suggest that factor copula models can be good alternatives to random effect models and can provide better insight into the temporal dependency of longitudinal data of arbitrary nature.
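To fix ideas, here is a hedged Python sketch of the simplest member of this family, a one-factor Gaussian copula with a subject-specific latent variable inducing homogeneous within-subject dependence; the probit margin and all names are illustrative placeholders rather than the paper's specification.

import numpy as np
from scipy.stats import norm

def simulate_one_factor_copula(n_subjects=500, n_times=4, rho=0.6, seed=1):
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n_subjects, 1))          # latent factor, one per subject
    eps = rng.standard_normal((n_subjects, n_times))  # observation-level noise
    Z = rho * V + np.sqrt(1 - rho**2) * eps           # within-subject correlation rho**2
    U = norm.cdf(Z)                                   # uniform margins: the copula data
    y = (U > 0.5).astype(int)                         # e.g. a binary longitudinal response
    return U, y

Unbalanced designs are handled naturally in this construction: dropping entries of U for unobserved time points changes nothing at the subject level.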

This paper presents a formal framework and proposes algorithms to extend forecast reconciliation to discrete-valued data, including low counts. A novel method is introduced based on recasting the optimisation of scoring rules as an assignment problem, which is solved using quadratic programming. The proposed framework produces coherent joint probabilistic forecasts for count hierarchical time series. Two discrete reconciliation algorithms are also proposed and compared against generalisations of the top-down and bottom-up approaches for count data. Two simulation experiments and two empirical examples are conducted to validate that the proposed reconciliation algorithms improve forecast accuracy. The empirical applications are forecasting criminal offences in Washington D.C. and product unit sales in the M5 dataset. Compared to benchmarks, the proposed framework shows superior performance in both simulations and empirical studies.
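For orientation, the bottom-up benchmark against which such methods are compared can be sketched in a few lines of Python: sampling from the bottom-level predictive distributions and aggregating yields joint forecasts that are coherent by construction. The Poisson predictive distributions below are illustrative placeholders.

import numpy as np

def bottom_up_samples(bottom_means, S, n_samples=1000, seed=0):
    # bottom_means: predictive means of the bottom-level count series.
    # S: summing matrix mapping bottom series to all series in the hierarchy.
    rng = np.random.default_rng(seed)
    b = rng.poisson(bottom_means, size=(n_samples, len(bottom_means)))
    return b @ S.T  # every sample path satisfies the aggregation constraints

# Two bottom series feeding one total: rows of S are (total, bottom1, bottom2).
S = np.array([[1, 1], [1, 0], [0, 1]])
paths = bottom_up_samples(np.array([2.0, 5.0]), S)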

We study many-valued coalgebraic logics with semi-primal algebras of truth-degrees. We provide a systematic way to lift endofunctors defined on the variety of Boolean algebras to endofunctors on the variety generated by a semi-primal algebra. We show that this can be extended to a technique to lift classical coalgebraic logics to many-valued ones, and that (one-step) completeness and expressivity are preserved under this lifting. For specific classes of endofunctors, we also describe how to obtain an axiomatization of the lifted many-valued logic directly from an axiomatization of the original classical one. In particular, we apply all of these techniques to classical modal logic.

Determining which parameters of a non-linear model best describe a set of experimental data is a fundamental problem in science, and it has gained much traction lately with the rise of complex large-scale simulators (a.k.a. black-box simulators). The likelihood of such models is typically intractable, which is why classical MCMC methods cannot be used. Simulation-based inference (SBI) stands out in this context by only requiring a dataset of simulations to train deep generative models capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available and one wishes to leverage their shared information to better infer the parameters of the model. The method we propose is built upon recent developments from the flourishing score-based diffusion literature and allows us to estimate the tall data posterior distribution simply using information from the score network trained on individual observations. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
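The compositional identity that makes such constructions possible can be stated at noise level zero (how the composition is carried out at intermediate diffusion times is the delicate part, and is not captured here): since the $n$ observations are conditionally independent given the parameters $\theta$, Bayes' rule gives
$$\nabla_\theta \log p(\theta \mid x_{1:n}) \;=\; \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i) \;-\; (n-1)\,\nabla_\theta \log p(\theta),$$
so a score network trained on single observations, combined with the prior score, is in principle enough to target the tall data posterior.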

This paper introduces a new algorithm to improve the accuracy of numerical phase-averaging in oscillatory, multiscale differential equations. Phase-averaging is a technique that applies averaging to a mapped variable to remove highly oscillatory linear terms from the differential equation. This retains the main contribution of the fast oscillations to the low frequencies without needing to resolve the rapid oscillations themselves. However, this comes at the cost of an averaging error, which we aim to offset with a modified mapping. The new mapping includes a mean correction which encodes an average measure of the nonlinear interactions. This mapping was introduced in Tao (2019) for weak nonlinearity and relied on classical time averaging. Our algorithm extends this work to the case where 1) the nonlinearity is not weak but the linear oscillations are fast, and 2) finite averaging windows are applied via a smooth kernel, which has the advantage of retaining low frequencies whilst still eliminating the fastest oscillations. We show that the new algorithm reduces phase errors in the mapped variable for the swinging spring ODE. We also demonstrate accuracy improvements compared to standard phase-averaging in numerical experiments with the one-dimensional Rotating Shallow Water Equations, a useful test case for weather and climate applications. As the mean correction term can be computed in parallel, this new mapping has potential as a more accurate, yet still computationally cheap, coarse propagator for the oscillatory parareal method.
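To make the mapping concrete, the Python sketch below implements plain kernel-weighted phase averaging for a scalar oscillator $du/dt = i\omega u + N(u)$: the mapped variable $v = e^{-i\omega t}u$ has the fast rotation removed, and its tendency is averaged over a finite window with a smooth bump kernel. This illustrates the standard averaging the paper starts from, not its proposed mean-corrected mapping; the kernel and window are arbitrary choices.

import numpy as np

def averaged_rhs(v, t, omega, N, window=1.0, n_quad=65):
    # Kernel-weighted average of the mapped tendency exp(-iws) N(exp(iws) v)
    # over s in [t - window/2, t + window/2].
    s = np.linspace(-window / 2, window / 2, n_quad)
    x = 2 * s / window
    rho = np.where(np.abs(x) < 1, np.exp(-1.0 / np.maximum(1 - x**2, 1e-12)), 0.0)
    rho /= rho.sum()                                  # normalised smooth bump kernel
    vals = [np.exp(-1j * omega * (t + si)) * N(np.exp(1j * omega * (t + si)) * v)
            for si in s]
    return np.dot(rho, vals)

# Example: cubic nonlinearity with a fast linear frequency.
dvdt = averaged_rhs(1.0 + 0.0j, 0.0, omega=50.0, N=lambda u: 1j * abs(u) ** 2 * u)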

This paper studies the asymptotics of resampling without replacement in the proportional regime where dimension $p$ and sample size $n$ are of the same order. For a given dataset $(\bm{X},\bm{y})\in\mathbb{R}^{n\times p}\times \mathbb{R}^n$ and fixed subsample ratio $q\in(0,1)$, the practitioner samples independently of $(\bm{X},\bm{y})$ iid subsets $I_1,...,I_M$ of $\{1,...,n\}$ of size $q n$ and trains estimators $\bm{\hat{\beta}}(I_1),...,\bm{\hat{\beta}}(I_M)$ on the corresponding subsets of rows of $(\bm{X},\bm{y})$. Understanding the performance of the bagged estimate $\bm{\bar{\beta}} = \frac{1}{M}\sum_{m=1}^M \bm{\hat{\beta}}(I_m)$, for instance its squared error, requires us to understand correlations between two distinct $\bm{\hat{\beta}}(I_m)$ and $\bm{\hat{\beta}}(I_{m'})$ trained on different subsets $I_m$ and $I_{m'}$. In robust linear regression and logistic regression, we characterize the limit in probability of the correlation between two estimates trained on different subsets of the data. The limit is characterized as the unique solution of a simple nonlinear equation. We further provide data-driven estimators that are consistent for estimating this limit. These estimators of the limiting correlation allow us to estimate the squared error of the bagged estimate $\bm{\bar{\beta}}$, and for instance perform parameter tuning to choose the optimal subsample ratio $q$. As a by-product of the proof argument, we obtain the limiting distribution of the bivariate pair $(\bm{x}_i^T \bm{\hat{\beta}}(I_m), \bm{x}_i^T \bm{\hat{\beta}}(I_{m'}))$ for observations $i\in I_m\cap I_{m'}$, i.e., for observations used to train both estimates.
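A hedged Python sketch of the subsampling scheme just described: $M$ independent size-$qn$ subsets, one estimator per subset, averaged into the bagged estimate. Ridge-penalised least squares stands in for the robust or logistic fits analysed in the paper, and all names are illustrative.

import numpy as np

def subsample_bag(X, y, q=0.5, M=20, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = []
    for _ in range(M):
        I = rng.choice(n, size=int(q * n), replace=False)   # subset of rows
        XI, yI = X[I], y[I]
        betas.append(np.linalg.solve(XI.T @ XI + lam * np.eye(p), XI.T @ yI))
    betas = np.array(betas)
    beta_bar = betas.mean(axis=0)        # the bagged estimate
    inner = betas[0] @ betas[1]          # inner product of two estimates trained
    return beta_bar, inner               # on different subsets

In the proportional regime such inner products concentrate, which is what makes a data-driven estimate of the limiting correlation, and hence tuning of $q$, feasible.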

Functional data analysis is an important research field in statistics which treats data as random functions drawn from an infinite-dimensional functional space, and functional principal component analysis (FPCA) based on eigen-decomposition plays a central role in data reduction and representation. After nearly three decades of research, a key problem remains unsolved, namely, the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This is fundamental for studying models and methods based on FPCA, yet there has not been substantial progress since Hall, M\"uller and Wang (2006)'s result for a fixed number of eigenfunction estimates. In this work, we aim to establish a unified theory for this problem, obtaining upper bounds for eigenfunctions with diverging indices in both the $\mathcal{L}^2$ and supremum norms, and deriving the asymptotic distributions of eigenvalues for a wide range of sampling schemes. Our results reveal when the $\mathcal{L}^{2}$ bound for eigenfunction estimates with diverging indices is minimax optimal, as if the curves were fully observed, and capture the transition of convergence rates from nonparametric to parametric regimes in connection with sparse or dense sampling. We also develop a double truncation technique to handle the uniform convergence of the estimated covariance and eigenfunctions. The technical arguments in this work are useful for handling perturbation series with noisy and discretely observed functional data and can be applied to models involving inverse problems that use FPCA as regularization, such as functional linear regression.
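For readers less familiar with the setup, here is a minimal Python sketch of FPCA for densely observed noisy curves on a common grid: centre the curves, form the discretised covariance operator, and eigen-decompose. The diagonal noise correction and the smoothing needed for sparse designs, both central to the regimes studied in the paper, are deliberately omitted.

import numpy as np

def fpca_dense(Y, grid):
    # Y: (n_curves, n_grid) noisy observations on a common equispaced grid.
    Yc = Y - Y.mean(axis=0)                 # centred curves
    h = grid[1] - grid[0]                   # grid spacing (quadrature weight)
    C = Yc.T @ Yc / len(Y)                  # raw covariance; noise inflates diagonal
    evals, evecs = np.linalg.eigh(C * h)    # discretised covariance operator
    order = np.argsort(evals)[::-1]
    eigenvalues = evals[order]                        # estimated eigenvalues
    eigenfunctions = evecs[:, order] / np.sqrt(h)     # L2-normalised on the grid
    return eigenvalues, eigenfunctions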

In this paper, we introduce the cumulative past information generating function (CPIG) and the relative cumulative past information generating function (RCPIG), and study their properties. We establish their relation with the generalized cumulative past entropy (GCPE). We define a CPIG stochastic order and relate it to the dispersive order. We provide results for the CPIG measure of convoluted random variables in terms of the measures of its components, and derive inequalities relating Shannon entropy, CPIG and GCPE. Some characterization and estimation results for CPIG are also discussed. Finally, we define divergence measures between two random variables: the Jensen cumulative past information generating function (JCPIG), the Jensen fractional cumulative past entropy measure, the cumulative past Taneja entropy, and the Jensen cumulative past Taneja entropy information measure.
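Assuming the convention common in the cumulative-entropy literature (an assumption, since the abstract does not spell out the definition), for a random variable $X$ with distribution function $F$ supported on $[0,b]$ the CPIG can be written as
$$\mathrm{CPIG}_X(s) = \int_0^b F(x)^s\,dx, \qquad -\frac{d}{ds}\,\mathrm{CPIG}_X(s)\Big|_{s=1} = -\int_0^b F(x)\log F(x)\,dx,$$
so differentiating at $s=1$ recovers the cumulative past entropy; this is the sense in which the function "generates" the entropy measures discussed above.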

In data assimilation, an ensemble provides a nonintrusive way to evolve a probability density described by a nonlinear prediction model. Although a large ensemble size is required for statistical accuracy, the ensemble size is typically limited to a small number due to the computational cost of running the prediction model, which leads to a sampling error. Several methods, such as localization, exist to mitigate the sampling error, but they often require problem-dependent fine-tuning and design. This work introduces another sampling error mitigation method using a smoothness constraint in Fourier space. In particular, the method smooths out the spectrum of the system to increase stability and accuracy even under a small ensemble size. The efficacy of the new idea is validated through a suite of stringent test problems, including the Lorenz 96 and Kuramoto-Sivashinsky turbulence models.
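A hedged Python sketch of the general idea: damp the high-wavenumber part of the ensemble perturbation spectrum to suppress small-ensemble sampling noise. The Gaussian roll-off and cutoff fraction below are illustrative choices, not the paper's specific smoothness constraint.

import numpy as np

def smooth_ensemble(E, cutoff_frac=0.25):
    # E: (n_members, n_grid) ensemble states on a periodic 1-D grid.
    mean = E.mean(axis=0)
    pert = np.fft.rfft(E - mean, axis=1)     # spectra of the perturbations
    k = np.arange(pert.shape[1])
    kc = cutoff_frac * k[-1]                 # wavenumbers beyond kc are damped
    damp = np.exp(-np.maximum(k - kc, 0.0) ** 2 / (2 * (0.1 * k[-1]) ** 2))
    return mean + np.fft.irfft(pert * damp, n=E.shape[1], axis=1)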

We consider the problem of causal inference based on observational data (or the related missing data problem) with a binary or discrete treatment variable. In that context we study counterfactual density estimation, which provides more nuanced information than counterfactual mean estimation (i.e., the average treatment effect). We impose the shape constraint of log-concavity (a unimodality constraint) on the counterfactual densities, develop a doubly robust estimator of the log-concave counterfactual density (based on an augmented inverse probability weighted pseudo-outcome), and establish its consistency in various global metrics. Based on that estimator, we also develop asymptotically valid pointwise confidence intervals for the counterfactual density.
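Here is a minimal Python sketch of the augmented inverse probability weighted (AIPW) pseudo-outcome underlying such doubly robust constructions, written for the indicator $1\{Y \le t\}$ under treatment level $a$; pi_hat and mu_hat are fitted propensity and outcome-regression values, and all names are illustrative placeholders.

import numpy as np

def aipw_cdf_pseudo_outcome(Y, A, a, t, pi_hat, mu_hat):
    # pi_hat: estimated P(A = a | X) for each unit.
    # mu_hat: estimated P(Y <= t | A = a, X) for each unit.
    ind = (A == a).astype(float)
    return ind / pi_hat * ((Y <= t).astype(float) - mu_hat) + mu_hat

Averaging this pseudo-outcome over the sample gives a doubly robust estimate of the counterfactual distribution function at $t$: it remains consistent if either the propensity model or the outcome-regression model is correctly specified. The log-concavity-constrained density estimator is then built from such ingredients.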
