
High-dimensional variable selection, with many more covariates than observations, is widely documented in standard regression models, but there are still few tools to address it in non-linear mixed-effects models, where data are collected repeatedly on several individuals. In this work, variable selection is approached from a Bayesian perspective, and a selection procedure is proposed that combines a spike-and-slab prior with the SAEM algorithm. As in Lasso regression, the set of relevant covariates is selected by exploring a grid of values for the penalisation parameter. The SAEM approach is much faster than a classical MCMC algorithm, and our method shows very good selection performance on simulated data. Its flexibility is demonstrated by implementing it for a variety of non-linear mixed-effects models. The usefulness of the proposed method is illustrated on a problem of genetic marker identification, relevant for genomic-assisted selection in plant breeding.
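The abstract does not give the prior's exact form. As a hedged illustration, assume a continuous spike-and-slab prior in which each coefficient is N(0, nu0) under the spike and N(0, nu1) under the slab, with prior inclusion probability alpha (all three names are hypothetical). A quantity an EM-type algorithm such as SAEM would update is the posterior probability that a coefficient comes from the slab; sweeping the spike variance over a grid then plays a role loosely analogous to the Lasso penalty path. This is a sketch of that single ingredient, not the authors' procedure.

```python
import math

def slab_probability(beta: float, nu0: float, nu1: float, alpha: float) -> float:
    """Posterior probability that `beta` comes from the slab N(0, nu1) rather than
    the spike N(0, nu0), given prior inclusion probability `alpha` (hypothetical model)."""
    log_slab = math.log(alpha) - 0.5 * math.log(2 * math.pi * nu1) - 0.5 * beta ** 2 / nu1
    log_spike = math.log(1 - alpha) - 0.5 * math.log(2 * math.pi * nu0) - 0.5 * beta ** 2 / nu0
    d = log_spike - log_slab  # log odds of spike versus slab
    # Numerically stable evaluation of 1 / (1 + exp(d)).
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))

def select(betas, nu0, nu1, alpha, threshold=0.5):
    """Indices of coefficients whose slab probability exceeds `threshold`."""
    return [j for j, b in enumerate(betas)
            if slab_probability(b, nu0, nu1, alpha) > threshold]

# Sweep a grid of spike variances, analogous to walking along a Lasso penalty path.
betas = [1.2, 0.3, 0.1, 0.01]
for nu0 in (0.1, 0.01, 0.001):
    print(nu0, select(betas, nu0=nu0, nu1=1.0, alpha=0.5))
```

With these inputs, narrowing the spike from variance 0.1 to 0.001 admits progressively smaller coefficients, much as lowering the Lasso penalty does.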

Related content

MoDELS


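The 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Analysis · Functional · Reducible · Maximum ·

June 28, 2023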

Functional and variables selection in extreme value models for regional flood frequency analysis

Aldo Gardini

The problem of estimating return levels of river discharge, relevant in flood frequency analysis, is tackled by relying on extreme value theory. The Generalized Extreme Value (GEV) distribution is assumed to model annual maxima of river discharge recorded at multiple gauging stations belonging to the same river basin. The specific features of the data from the Upper Danube basin drive the definition of the proposed statistical model. Firstly, Bayesian P-splines are considered to account for the non-linear effects of station-specific covariates on the GEV parameters. Secondly, the problem of functional and variable selection is addressed by imposing a grouped horseshoe prior on the coefficients, to encourage the shrinkage of non-relevant components to zero. A cross-validation study is conducted to compare the proposed modeling solution with other models, showing its potential to reduce the uncertainty of predictions at ungauged sites without affecting their calibration.
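The target quantity, the T-year return level, is the (1 - 1/T) quantile of the GEV distribution of annual maxima: z_T = mu - (sigma/xi)[1 - y^(-xi)] with y = -log(1 - 1/T), and z_T = mu - sigma log(y) in the Gumbel limit xi = 0. The sketch below assumes the location `mu`, scale `sigma` and shape `xi` at a station are already known; in the model described above they would be covariate-dependent posterior draws, which this does not attempt.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """CDF of the GEV(mu, sigma, xi) distribution."""
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0) or above the upper one (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def return_level(T, mu, sigma, xi):
    """Level exceeded by the annual maximum with probability 1/T."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))

for xi in (-0.2, 0.0, 0.1):
    print(xi, return_level(100, mu=50.0, sigma=10.0, xi=xi))
```

Evaluating `gev_cdf` at a computed return level gives back 1 - 1/T, which is a quick consistency check on the two formulas.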

Performer · Monte Carlo · Scaling · MoDELS · Marginalization ·

June 28, 2023

Generalized Bayesian Multidimensional Scaling and Model Comparison

Jiarui Zhang, Liangliang Wang

Multidimensional scaling is widely used to reconstruct a map of the points' coordinates in a low-dimensional space from the original high-dimensional space while preserving the pairwise distances. In a Bayesian framework, the current approach using Markov chain Monte Carlo algorithms has limitations in terms of model generalization and performance comparison. To address these limitations, a general framework is developed that incorporates non-Gaussian errors and robustness to fit different types of dissimilarities. Then, an adaptive inference method using an annealed Sequential Monte Carlo algorithm for Bayesian multidimensional scaling is proposed. This algorithm performs inference sequentially in time and provides an approximate posterior distribution over the points' coordinates in a low-dimensional space and an unbiased estimator of the marginal likelihood. We compare the performance of different models based on marginal likelihoods, which are produced as a byproduct of the adaptive annealed Sequential Monte Carlo algorithm. Using synthetic and real data, we demonstrate the effectiveness of the proposed algorithm. Our results show that, under the same computational budget, the proposed algorithm outperforms other benchmark algorithms on common metrics used in the literature. The implementation of our proposed method and applications are available at //github.com/nunujiarui/GBMDS.
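The abstract does not say how the annealing schedule adapts. One common approach in annealed SMC, assumed here, chooses each next tempering exponent so that the conditional effective sample size (CESS) of the reweighted particles stays at a fixed fraction of the particle count. Moving from exponent phi to phi + delta multiplies each particle's weight by exp(delta × log-likelihood), and the log of the weighted average of those increments is that step's contribution to the log marginal-likelihood estimate. The function names and the 0.9 target are illustrative choices, not taken from the paper or its repository.

```python
import math

def _scaled_increments(log_lik, delta):
    """exp(delta * l_i - m) with m = max_i delta * l_i, plus m itself (for stability)."""
    scaled = [delta * l for l in log_lik]
    m = max(scaled)
    return [math.exp(s - m) for s in scaled], m

def conditional_ess_ratio(weights, log_lik, delta):
    """CESS / N for raising the exponent by delta; `weights` must sum to one."""
    inc, _ = _scaled_increments(log_lik, delta)
    num = sum(w * u for w, u in zip(weights, inc)) ** 2
    den = sum(w * u * u for w, u in zip(weights, inc))
    return num / den  # invariant to the common rescaling by exp(-m)

def next_temperature(weights, log_lik, phi, target=0.9, tol=1e-10):
    """Next exponent in (phi, 1] at which the CESS ratio is about `target` (bisection)."""
    if conditional_ess_ratio(weights, log_lik, 1.0 - phi) >= target:
        return 1.0
    lo, hi = 0.0, 1.0 - phi  # invariant: ratio(lo) >= target > ratio(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_ess_ratio(weights, log_lik, mid) >= target:
            lo = mid
        else:
            hi = mid
    return phi + lo

def log_evidence_increment(weights, log_lik, delta):
    """log sum_i W_i exp(delta * l_i): this step's contribution to log Z."""
    inc, m = _scaled_increments(log_lik, delta)
    return m + math.log(sum(w * u for w, u in zip(weights, inc)))

w = [0.25] * 4
ll = [0.0, -50.0, -100.0, -200.0]
phi_next = next_temperature(w, ll, phi=0.0)
print(phi_next, log_evidence_increment(w, ll, phi_next))
```

Summing `log_evidence_increment` over all steps until the exponent reaches 1 gives the log marginal likelihood estimate used for model comparison (resampling and MCMC move steps are omitted here).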

Bayesian network · Networking · Structured learning · Performer · Learning ·

June 27, 2023

The Dual PC Algorithm and the Role of Gaussianity for Structure Learning of Bayesian Networks

Enrico Giudice, Jack Kuipers, Giusi Moffa

Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships that hold in the variable distribution. The dual PC algorithm is a novel scheme for carrying out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies to Gaussian copula models and demonstrate its performance in that setting.
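The covariance/precision relationship can be illustrated for the full-order case: the partial correlation of X_i and X_j given all other variables is -P_ij / sqrt(P_ii P_jj), where P is the inverse of the covariance matrix. The sketch below computes it (with a small hand-written inverse to stay dependency-free) and applies the Fisher z-test commonly used in Gaussian PC implementations. It does not implement the block-inversion step for complementary conditioning sets; all function names are illustrative.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def full_order_partial_corr(cov, i, j):
    """Partial correlation of variables i and j given all others, via the precision matrix."""
    prec = invert(cov)
    return -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])

def fisher_z_pvalue(r, n, k):
    """Two-sided Fisher z-test p-value for partial correlation r from n samples,
    with a conditioning set of size k."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Chain X1 -> X2 -> X3 with unit-variance noise: X1 and X3 are independent given X2.
cov = [[1, 1, 1], [1, 2, 2], [1, 2, 3]]
print(full_order_partial_corr(cov, 0, 2), full_order_partial_corr(cov, 0, 1))
```

For this chain the precision matrix is tridiagonal, so the full-order partial correlation between X1 and X3 vanishes while X1 and X2 remain dependent.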

Identifiable · Pairwise · Interpretability · Analysis · Networking ·

June 27, 2023

A new classification framework for high-dimensional data

Xiangbo Mo, Hao Chen

Classification is a classic problem but faces many challenges when dealing with a large number of features, which is common in many modern applications, such as identifying tumor subtypes from genomic data or categorizing customer attitudes based on online reviews. We propose a new framework that uses the ranks of pairwise distances among observations and identifies a common pattern, previously overlooked, that arises under moderate to high dimensions. The proposed method exhibits superior classification power over existing methods under a variety of scenarios. Furthermore, it can be applied to non-Euclidean data objects, such as network data. We illustrate the method through an analysis of Neuropixels data in which neurons are classified based on their firing activity. Additionally, we explore a related approach that is simpler to understand and investigate key quantities that play essential roles in our novel approach.

Random field · binary · Performer · Sensor · FC ·

June 26, 2023

Binary Spatial Random Field Reconstruction from Non-Gaussian Inhomogeneous Time-series Observations

Shunan Sheng, Qikun Xiang, Ido Nevat, Ariel Neufeld

We develop a new model for spatial random field reconstruction of a binary-valued spatial phenomenon. In our model, sensors are deployed in a wireless sensor network across a large geographical region. Each sensor measures a non-Gaussian inhomogeneous temporal process that depends on the spatial phenomenon. Two types of sensors are employed: one collects point observations at specific time points, while the other collects integral observations over time intervals. The sensors then transmit these time-series observations to a Fusion Center (FC), which infers the spatial phenomenon from them. We show that the resulting posterior predictive distribution is intractable and develop a tractable two-step procedure to perform inference. Firstly, we develop algorithms to perform approximate Likelihood Ratio Tests on the time-series observations, compressing them to a single bit for both point sensors and integral sensors. Secondly, once the compressed observations are transmitted to the FC, we use a Spatial Best Linear Unbiased Estimator (S-BLUE) to reconstruct the binary spatial random field at any desired spatial location. The performance of the proposed approach is studied through simulations. We further illustrate the effectiveness of our method using a weather dataset from the National Environment Agency (NEA) of Singapore, with variables including temperature and relative humidity.
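The one-bit compression step can be illustrated with an exact likelihood ratio test under a simple assumed model. Suppose a point sensor's counts in each time bin are independent Poisson with known rates `rate0[t]` if the local field value is 0 and `rate1[t]` if it is 1. The abstract only says the process is non-Gaussian and inhomogeneous, so this model and all names are hypothetical, and the paper's tests are approximate ones for its own model. The log-likelihood ratio is then sum_t [y_t log(rate1/rate0) - (rate1 - rate0)].

```python
import math

def poisson_llr(counts, rate0, rate1):
    """log p(y | H1) - log p(y | H0) for independent Poisson counts per time bin.
    The log(y!) terms are identical under both hypotheses and cancel."""
    return sum(y * math.log(b / a) - (b - a)
               for y, a, b in zip(counts, rate0, rate1))

def compress_to_bit(counts, rate0, rate1, threshold=0.0):
    """One-bit summary sent to the fusion centre: 1 if the LLR exceeds `threshold`.
    threshold = 0 corresponds to equal prior probabilities for the two hypotheses."""
    return 1 if poisson_llr(counts, rate0, rate1) > threshold else 0

# Inhomogeneous rates: activity peaks mid-series under both hypotheses.
rate0 = [1.0, 2.0, 4.0, 2.0, 1.0]
rate1 = [2.0, 4.0, 8.0, 4.0, 2.0]
print(compress_to_bit([2, 4, 9, 3, 2], rate0, rate1))
print(compress_to_bit([1, 1, 3, 2, 0], rate0, rate1))
```

The fusion centre would then combine such bits across sensors; that spatial (S-BLUE) step is not sketched here.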

Identifiable · Markov · CASES · AIM · Estimation/Estimator ·

June 25, 2023

The Proximal ID Algorithm

Ilya Shpitser, Zach Wood-Doughty, Eric J. Tchetgen Tchetgen

Unobserved confounding is a fundamental obstacle to establishing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification using fortuitous external aids, such as instrumental variables or proxies, or using the ID algorithm, which relies on Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to synthesize these two approaches to identification in causal inference, yielding the most general identification algorithm in multivariate systems currently known: the proximal ID algorithm. In addition to obtaining nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for unobserved confounders that would otherwise have prevented identification. We also outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach with simulation studies and a data application.

INFORMS · False positive · Overestimation · MoDELS · Estimation/Estimator ·

June 24, 2023

Information criteria for structured parameter selection in high dimensional tree and graph models

Maarten Jansen

Parameter selection in high-dimensional models is typically fine-tuned to keep the (relative) number of false positives under control, because otherwise the few true positives may be dominated by the many possible false positives. This happens, for instance, when the selection follows from a naive optimisation of an information criterion, such as AIC or Mallows's Cp. It can be argued that the overestimation of the selection comes from the optimisation process itself changing the statistics of the selected variables, so that the information criterion no longer reflects the true divergence between the selection and the data-generating process. In lasso, the overestimation can also be linked to the shrinkage estimator, which makes the selection too tolerant of false positives. For these reasons, this paper studies refined information criteria that carefully balance false positives and false negatives, for use with estimators without shrinkage. In particular, the paper develops corrected Mallows's Cp criteria for structured selection in trees and graphical models.
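The naive baseline criticised above is easy to state for an orthonormal design such as a wavelet transform. Keeping the k largest of n coefficients c_j leaves a residual sum of squares equal to the sum of the discarded c_j^2, so Cp(k) = RSS_k / sigma^2 - n + 2k. The sketch below, assuming a known noise variance `sigma2`, minimises this uncorrected criterion directly; for an orthonormal design this amounts to hard thresholding at sqrt(2)·sigma, which under Gaussian noise lets roughly 16% of pure-noise coefficients through. It is not the paper's corrected criterion.

```python
def mallows_cp_path(coefs, sigma2):
    """Cp(k) for keeping the k largest-magnitude coefficients, k = 0..n,
    in an orthonormal design with known noise variance sigma2."""
    n = len(coefs)
    energy = sorted((c * c for c in coefs), reverse=True)
    rss = sum(energy)
    path = [rss / sigma2 - n]  # k = 0: every coefficient discarded
    for k, e in enumerate(energy, start=1):
        rss -= e  # keeping one more coefficient removes its energy from the RSS
        path.append(rss / sigma2 - n + 2 * k)
    return path

def naive_cp_selection(coefs, sigma2):
    """Number of coefficients kept by minimising Cp along the path."""
    path = mallows_cp_path(coefs, sigma2)
    return min(range(len(path)), key=path.__getitem__)

print(naive_cp_selection([5.0, 3.0, 0.5, 0.1], sigma2=1.0))
```

Each extra coefficient costs 2 in the penalty and gains c_j^2 / sigma^2 in fit, which is where the sqrt(2)·sigma threshold comes from.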

Outlier · Estimation/Estimator · Mean · Robustness · Weight ·

June 24, 2023

High-dimensional outlier detection and variable selection via adaptive weighted mean regression

Jiaqi Li, Linglong Kong, Bei Jiang, Wei Tu

From arXiv: 8 tables, 4 figures

This paper proposes an adaptive penalized weighted mean regression for outlier detection in high-dimensional data. In comparison with existing approaches based on the mean shift model, the proposed estimators are robust against outliers in the response variables, the covariates, or both. By using the adaptive Huber loss function, the proposed method is effective in high-dimensional linear models with heavy-tailed and heteroscedastic error distributions. The proposed framework enables simultaneous and collaborative estimation of regression parameters and outlier detection. Under regularity conditions, outlier detection consistency and oracle inequalities for robust estimates in high-dimensional settings are established. Additionally, theoretical robustness properties, such as the breakdown point and a smoothed limiting influence function, are derived. Extensive simulation studies and a breast cancer survival dataset are used to evaluate the numerical performance of the proposed method, demonstrating comparable or superior variable selection and outlier detection capabilities.
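The robustness that the Huber loss provides is easiest to see when estimating a single location parameter. The sketch below uses iteratively reweighted least squares with a fixed robustification parameter `tau`; the adaptive method described above instead ties this parameter to the sample size and dimension and works within penalised high-dimensional regression, neither of which is shown. Function names are illustrative.

```python
import statistics

def huber_loss(r, tau):
    """Quadratic for |r| <= tau and linear beyond, so large residuals have bounded influence."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * a - 0.5 * tau * tau

def huber_location(x, tau, max_iter=100, tol=1e-10):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = statistics.median(x)  # robust starting value
    for _ in range(max_iter):
        # IRLS weight psi(r) / r: 1 inside [-tau, tau], tau / |r| outside.
        w = [1.0 if abs(v - mu) <= tau else tau / abs(v - mu) for v in x]
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [1.0, 2.0, 3.0, 100.0]
print(statistics.mean(data), huber_location(data, tau=1.0))
```

Here the sample mean is pulled to 26.5 by the single outlier, whereas the Huber estimate stays at 2.5, where the clipped residuals balance.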

Group Lasso · GROUP · Estimation/Estimator · Inference · MoDELS ·

June 24, 2023

Selective inference using randomized group lasso estimators for general models

Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens

From arXiv: 51 pages, 4 figures, 1 table

Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method covers exponential family distributions as well as, for example, quasi-likelihood modeling of overdispersed count data, and allows for categorical or grouped covariates as well as continuous covariates. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood, which we show to be adequate for selective inference when conditioning on the event of the selection of the grouped covariates. This likelihood also provides a selective point estimator that accounts for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for the group lasso is illustrated on data from the National Health and Nutrition Examination Survey, while simulations showcase its behaviour and favorable comparison with other methods.
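Group lasso selects whole groups because its penalty, lambda times the Euclidean norm of each group's coefficient block, has a proximal operator that shrinks a block towards zero and sets it exactly to zero whenever its norm is at most lambda. The sketch below shows only that operator, as it would be used inside a proximal-gradient solver; the randomization term and the post-selection likelihood from the abstract are not represented, and the function names are illustrative.

```python
import math

def group_soft_threshold(block, lam):
    """Proximal operator of lam * ||block||_2: max(0, 1 - lam / ||block||) * block."""
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= lam:
        return [0.0] * len(block)  # the whole group is dropped together
    scale = 1.0 - lam / norm
    return [scale * v for v in block]

def prox_all_groups(beta, groups, lam):
    """Apply the operator group by group, e.g. to the dummy columns of one categorical covariate."""
    out = list(beta)
    for idx in groups:
        shrunk = group_soft_threshold([beta[i] for i in idx], lam)
        for i, v in zip(idx, shrunk):
            out[i] = v
    return out

print(prox_all_groups([3.0, 4.0, 0.3, 0.4, 2.0], groups=[[0, 1], [2, 3], [4]], lam=1.0))
```

The first group (norm 5) is shrunk by a factor of 0.8, the second (norm 0.5) is removed entirely, and the singleton group reduces to ordinary soft thresholding.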

Likelihood · Analysis · Extensibility · Stationary · Markov chain Monte Carlo ·

June 23, 2023

A nonparametrically corrected likelihood for Bayesian spectral analysis of multivariate time series

Yixuan Liu, Claudia Kirch, Jeong Eun Lee, Renate Meyer

This paper presents a novel approach to Bayesian nonparametric spectral analysis of stationary multivariate time series. Starting with a parametric vector-autoregressive model, the parametric likelihood is nonparametrically adjusted in the frequency domain to account for potential deviations from parametric assumptions. We show mutual contiguity of the nonparametrically corrected likelihood, the multivariate Whittle likelihood approximation and the exact likelihood for Gaussian time series. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. An infinite series representation of this prior is then used to develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution. The code is made publicly available for ease of use and reproducibility. With this novel approach we provide a generalization of the multivariate Whittle-likelihood-based method of Meier et al. (2020) as well as an extension of the nonparametrically corrected likelihood for univariate stationary time series of Kirch et al. (2019) to the multivariate case. We demonstrate that the nonparametrically corrected likelihood combines the efficiency of a parametric model with the robustness of a nonparametric one. Its numerical accuracy is illustrated in a comprehensive simulation study. We illustrate its practical advantages by a spectral analysis of two environmental time series data sets: a bivariate time series of the Southern Oscillation Index and fish recruitment, and time series of wind speed data at six locations in California.
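For orientation, the univariate Whittle likelihood that this line of work builds on replaces the exact Gaussian likelihood by a sum over Fourier frequencies, log L(f) ≈ -Σ_j [log f(λ_j) + I(λ_j) / f(λ_j)], where I is the periodogram. The sketch below evaluates it for an AR(1) spectral density, used here as a hypothetical parametric working model; the nonparametric correction and the matrix-valued multivariate case are not shown, and all names are illustrative.

```python
import cmath
import math
import random

def periodogram(x):
    """I(lam_j) = |sum_t x_t exp(-i lam_j t)|^2 / (2 pi n) at lam_j = 2 pi j / n,
    j = 1..(n-1)//2 (zero and Nyquist frequencies excluded)."""
    n = len(x)
    freqs, values = [], []
    for j in range(1, (n - 1) // 2 + 1):
        lam = 2.0 * math.pi * j / n
        d = sum(xt * cmath.exp(-1j * lam * t) for t, xt in enumerate(x))
        freqs.append(lam)
        values.append(abs(d) ** 2 / (2.0 * math.pi * n))
    return freqs, values

def ar1_spectral_density(lam, phi, sigma2):
    """Spectral density of a stationary AR(1): sigma2 / (2 pi |1 - phi e^{-i lam}|^2)."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(lam) + phi * phi))

def whittle_loglik(x, spec):
    """Whittle log-likelihood, up to an additive constant, for spectral density `spec`."""
    freqs, values = periodogram(x)
    return -sum(math.log(spec(lam)) + i_val / spec(lam)
                for lam, i_val in zip(freqs, values))

# Simulate an AR(1) series with phi = 0.7 and compare two candidate models.
random.seed(1)
x, prev = [], 0.0
for _ in range(256):
    prev = 0.7 * prev + random.gauss(0.0, 1.0)
    x.append(prev)
for phi in (0.7, -0.7):
    print(phi, whittle_loglik(x, lambda lam: ar1_spectral_density(lam, phi, 1.0)))
```

The periodogram and spectral density use the same 1/(2π) convention, so the ratio I/f is on a comparable scale; the correctly specified phi = 0.7 should score higher.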