97SE亚洲国产综合在线,亚洲精品无码国产爽快A片百度,91成人精品爽啪在线观看,色欲91精品国产免费观看

Undirected, binary network data consist of indicators of symmetric relations between pairs of actors. Regression models of such data allow for the estimation of effects of exogenous covariates on the network and for prediction of unobserved data. Ideally, estimators of the regression parameters should account for the inherent dependencies among relations in the network that involve the same actor. To account for such dependencies, researchers have developed a host of latent variable network models, however, estimation of many latent variable network models is computationally onerous and which model is best to base inference upon may not be clear. We propose the Probit Exchangeable (PX) model for undirected binary network data that is based on an assumption of exchangeability, which is common to many of the latent variable network models in the literature. The PX model can represent the first two moments of any exchangeable network model. We leverage the EM algorithm to obtain an approximate maximum likelihood estimator of the PX model that is extremely computationally efficient. Using simulation studies, we demonstrate the improvement in estimation of regression coefficients of the proposed model over existing latent variable network models. In an analysis of purchases of politically-aligned books, we demonstrate political polarization in purchase behavior and show that the proposed estimator significantly reduces runtime relative to estimators of latent variable network models, while maintaining predictive performance.

相關內容

Networking

關注 22

Networking：IFIP International Conferences on Networking。 Explanation：國際網絡會議。 Publisher：IFIP。 SIT：

Analysis · 訓練數據 · 模型評估 · MoDELS · 輸出 ·

2023 年 7 月 14 日

Global sensitivity analysis in the limited data setting with application to char combustion

Dongjin Lee,Elle Lavichant,Boris Kramer

from arxiv, 26 pages, 11 figures

In uncertainty quantification, variance-based global sensitivity analysis quantitatively determines the effect of each input random variable on the output by partitioning the total output variance into contributions from each input. However, computing conditional expectations can be prohibitively costly when working with expensive-to-evaluate models. Surrogate models can accelerate this, yet their accuracy depends on the quality and quantity of training data, which is expensive to generate (experimentally or computationally) for complex engineering systems. Thus, methods that work with limited data are desirable. We propose a diffeomorphic modulation under observable response preserving homotopy (D-MORPH) regression to train a polynomial dimensional decomposition surrogate of the output that minimizes the number of training data. The new method first computes a sparse Lasso solution and uses it to define the cost function. A subsequent D-MORPH regression minimizes the difference between the D-MORPH and Lasso solution. The resulting D-MORPH surrogate is more robust to input variations and more accurate with limited training data. We illustrate the accuracy and computational efficiency of the new surrogate for global sensitivity analysis using mathematical functions and an expensive-to-simulate model of char combustion. The new method is highly efficient, requiring only 15% of the training data compared to conventional regression.

估計/估計量 · 似然 · 廣義線性模型 · 線性的 · 線性模型 ·

2023 年 7 月 14 日

Bounded-memory adjusted scores estimation in generalized linear models with large data sets

Patrick Zietkiewicz,Ioannis Kosmidis

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression, in particular, are supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are also always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically linearly and slower than the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. We assess the procedures through a real-data application with millions of observations, and in high-dimensional logistic regression, where a large-scale simulation experiment produces concrete evidence for the existence of a simple adjustment to the maximum Jeffreys'-penalized likelihood estimates that delivers high accuracy in terms of signal recovery even in cases where estimates from ML and other recently-proposed corrective methods do not exist.

估計/估計量 · 分解的 · Copulas · MoDELS · 潛變量/隱變量 ·

2023 年 7 月 14 日

On factor copula-based mixed regression models

Pavel Krupskii,Bouchra R Nasri,Bruno N Remillard

In this article, a copula-based method for mixed regression models is proposed, where the conditional distribution of the response variable, given covariates, is modelled by a parametric family of continuous or discrete distributions, and the effect of a common latent variable pertaining to a cluster is modelled with a factor copula. We show how to estimate the parameters of the copula and the parameters of the margins, and we find the asymptotic behaviour of the estimation errors. Numerical experiments are performed to assess the precision of the estimators for finite samples. An example of an application is given using COVID-19 vaccination hesitancy from several countries. Computations are based on R package CopulaGAMM.

MoDELS · 樣本 · 頻率主義學派 · 可約的 · 樣本方差 ·

2023 年 7 月 13 日

Extreme data compression for Bayesian model comparison

Alan F. Heavens,Arrykrishna Mootoovaloo,Roberto Trotta,Elena Sellentin

from arxiv, 15 pages, 5 figures. Invited paper for JCAP 20th anniversary edition

We develop extreme data compression for use in Bayesian model comparison via the MOPED algorithm, as well as more general score compression. We find that Bayes factors from data compressed with the MOPED algorithm are identical to those from their uncompressed datasets when the models are linear and the errors Gaussian. In other nonlinear cases, whether nested or not, we find negligible differences in the Bayes factors, and show this explicitly for the Pantheon-SH0ES supernova dataset. We also investigate the sampling properties of the Bayesian Evidence as a frequentist statistic, and find that extreme data compression reduces the sampling variance of the Evidence, but has no impact on the sampling distribution of Bayes factors. Since model comparison can be a very computationally-intensive task, MOPED extreme data compression may present significant advantages in computational time.

估計/估計量 · 線性的 · 情景 · 優化器 · 線性回歸 ·

2023 年 7 月 13 日

A zero-estimator approach for estimating the signal level in a high-dimensional regression setting

Ilan Livne,David Azriel,Yair Goldberg

from arxiv, arXiv admin note: text overlap with arXiv:2205.05341, arXiv:2102.07203

Analysis of high-dimensional data, where the number of covariates is larger than the sample size, is a topic of current interest. In such settings, an important goal is to estimate the signal level $\tau^2$ and noise level $\sigma^2$, i.e., to quantify how much variation in the response variable can be explained by the covariates, versus how much of the variation is left unexplained. This thesis considers the estimation of these quantities in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no responses $Y$. Our main research question is: how can one use the unlabeled data to better estimate $\tau^2$ and $\sigma^2$? We consider two frameworks: a linear regression model and a linear projection model in which linearity is not assumed. In the first framework, while linear regression is used, no sparsity assumptions on the coefficients are made. In the second framework, the linearity assumption is also relaxed and we aim to estimate the signal and noise levels defined by the linear projection. We first propose a naive estimator which is unbiased and consistent, under some assumptions, in both frameworks. We then show how the naive estimator can be improved by using zero-estimators, where a zero-estimator is a statistic arising from the unlabeled data, whose expected value is zero. In the first framework, we calculate the optimal zero-estimator improvement and discuss ways to approximate the optimal improvement. In the second framework, such optimality does no longer hold and we suggest two zero-estimators that improve the naive estimator although not necessarily optimally. Furthermore, we show that our approach reduces the variance for general initial estimators and we present an algorithm that potentially improves any initial estimator. Lastly, we consider four datasets and study the performance of our suggested methods.

MoDELS · 預測器/決策函數 · 方差 · 線性的 · 情景 ·

2023 年 7 月 13 日

Spatial regression modeling via the R2D2 framework

Eric Yanchenko,Howard D. Bondell,Brian J. Reich

Spatially dependent data arises in many applications, and Gaussian processes are a popular modelling choice for these scenarios. While Bayesian analyses of these problems have proven to be successful, selecting prior distributions for these complex models remains a difficult task. In this work, we propose a principled approach for setting prior distributions on model variance components by placing a prior distribution on a measure of model fit. In particular, we derive the distribution of the prior coefficient of determination. Placing a beta prior distribution on this measure induces a generalized beta prime prior distribution on the global variance of the linear predictor in the model. This method can also be thought of as shrinking the fit towards the intercept-only (null) model. We derive an efficient Gibbs sampler for the majority of the parameters and use Metropolis-Hasting updates for the others. Finally, the method is applied to a marine protection area data set. We estimate the effect of marine policies on biodiversity and conclude that no-take restrictions lead to a slight increase in biodiversity and that the majority of the variance in the linear predictor comes from the spatial effect.\vspace{12pt}

高斯分布 · MoDELS · 線性的 · CASE · 變換 ·

2023 年 7 月 12 日

Distribution-on-Distribution Regression with Wasserstein Metric: Multivariate Gaussian Case

Ryo Okano,Masaaki Imaizumi

from arxiv, 32 pages

Distribution data refers to a data set where each sample is represented as a probability distribution, a subject area receiving burgeoning interest in the field of statistics. Although several studies have developed distribution-to-distribution regression models for univariate variables, the multivariate scenario remains under-explored due to technical complexities. In this study, we introduce models for regression from one Gaussian distribution to another, utilizing the Wasserstein metric. These models are constructed using the geometry of the Wasserstein space, which enables the transformation of Gaussian distributions into components of a linear matrix space. Owing to their linear regression frameworks, our models are intuitively understandable, and their implementation is simplified because of the optimal transport problem's analytical solution between Gaussian distributions. We also explore a generalization of our models to encompass non-Gaussian scenarios. We establish the convergence rates of in-sample prediction errors for the empirical risk minimizations in our models. In comparative simulation experiments, our models demonstrate superior performance over a simpler alternative method that transforms Gaussian distributions into matrices. We present an application of our methodology using weather data for illustration purposes.

異常點 · 線性的 · binary · Continuity · Performer ·

2023 年 7 月 12 日

Outlier detection in regression: conic quadratic formulations

Andrés Gómez,José Neto

In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders-of-magnitude faster than existing big-M formulations in the literature for this problem.

Networking · 泛函 · binary · 分解的 · 可約的 ·

2023 年 7 月 12 日

fabisearch: A Package for Change Point Detection in and Visualization of the Network Structure of Multivariate High-Dimensional Time Series in R

Martin Ondrus,Ivor Cribben

Change point detection is a commonly used technique in time series analysis, capturing the dynamic nature in which many real-world processes function. With the ever increasing troves of multivariate high-dimensional time series data, especially in neuroimaging and finance, there is a clear need for scalable and data-driven change point detection methods. Currently, change point detection methods for multivariate high-dimensional data are scarce, with even less available in high-level, easily accessible software packages. To this end, we introduce the R package fabisearch, available on the Comprehensive R Archive Network (CRAN), which implements the factorized binary search (FaBiSearch) methodology. FaBiSearch is a novel statistical method for detecting change points in the network structure of multivariate high-dimensional time series which employs non-negative matrix factorization (NMF), an unsupervised dimension reduction and clustering technique. Given the high computational cost of NMF, we implement the method in C++ code and use parallelization to reduce computation time. Further, we also utilize a new binary search algorithm to efficiently identify multiple change points and provide a new method for network estimation for data between change points. We show the functionality of the package and the practicality of the method by applying it to a neuroimaging and a finance data set. Lastly, we provide an interactive, 3-dimensional, brain-specific network visualization capability in a flexible, stand-alone function. This function can be conveniently used with any node coordinate atlas, and nodes can be color coded according to community membership (if applicable). The output is an elegantly displayed network laid over a cortical surface, which can be rotated in the 3-dimensional space.

binary · 通道 · 相互獨立的 · 情景 · Less ·

2023 年 7 月 7 日

On Pseudolinear Codes for Correcting Adversarial Errors

Eric Ruzomberka,Homa Nikbakht,Christopher G. Brinton,H. Vincent Poor

We consider error-correction coding schemes for adversarial wiretap channels (AWTCs) in which the channel can a) read a fraction of the codeword bits up to a bound $r$ and b) flip a fraction of the bits up to a bound $p$. The channel can freely choose the locations of the bit reads and bit flips via a process with unbounded computational power. Codes for the AWTC are of broad interest in the area of information security, as they can provide data resiliency in settings where an attacker has limited access to a storage or transmission medium. We investigate a family of non-linear codes known as pseudolinear codes, which were first proposed by Guruswami and Indyk (FOCS 2001) for constructing list-decodable codes independent of the AWTC setting. Unlike general non-linear codes, pseudolinear codes admit efficient encoders and have succinct representations. We focus on unique decoding and show that random pseudolinear codes can achieve rates up to the binary symmetric channel (BSC) capacity $1-H_2(p)$ for any $p,r$ in the less noisy region: $p<1/2$ and $r<1-H_2(p)$ where $H_2(\cdot)$ is the binary entropy function. Thus, pseudolinear codes are the first known optimal-rate binary code family for the less noisy AWTC that admit efficient encoders. The above result can be viewed as a derandomization result of random general codes in the AWTC setting, which in turn opens new avenues for applying derandomization techniques to randomized constructions of AWTC codes. Our proof applies a novel concentration inequality for sums of random variables with limited independence which may be of interest as an analysis tool more generally.