
One of the most fundamental problems in network analysis is community detection. The stochastic block model (SBM) is a widely used model for which various estimation methods have been developed, with their community detection consistency established. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable in practice. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the covariate coefficients as well as the community assignments, and show that both are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to fit PCABM efficiently. Under certain conditions, we derive the community detection error bound for SCWA and show that it is community detection consistent. In addition, we investigate model selection for the number of communities and feature selection for the pairwise covariates, and propose two corresponding algorithms. PCABM compares favorably with the SBM and the degree-corrected stochastic block model (DCBM) on a wide range of simulated and real networks when covariate information is accessible.
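As a rough illustration of the kind of model PCABM describes, the sketch below simulates a network whose expected edge counts follow a block structure multiplied by a pairwise-covariate factor. The specific Poisson form, parameter values, and variable names are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, p = 60, 2, 2                       # nodes, communities, covariate dim

c = rng.integers(0, K, size=n)           # latent community labels
B = np.array([[0.8, 0.2],
              [0.2, 0.8]])               # assortative block intensities
beta = np.array([0.5, -0.3])             # covariate coefficients (assumed)

# symmetric pairwise covariates z_ij; adjusted rate B[c_i, c_j] * exp(beta' z_ij)
Z = rng.normal(size=(n, n, p))
Z = (Z + Z.transpose(1, 0, 2)) / 2
rate = B[c[:, None], c[None, :]] * np.exp(Z @ beta)

A = rng.poisson(rate)                    # Poisson edge counts
A = np.triu(A, 1)
A = A + A.T                              # undirected, no self-loops

# under this assortative B, within-community edges dominate on average
within = A[c[:, None] == c[None, :]].mean()
between = A[c[:, None] != c[None, :]].mean()
```

The covariate factor exp(beta' z_ij) is what distinguishes this from a plain SBM draw: two dyads in the same block pair can have different expected edge counts.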

Related Content

In linear models, omitting a covariate that is orthogonal to the covariates in the model does not bias the coefficient estimates. This in general does not hold for longitudinal data, where assumptions beyond the orthogonality between the omitted longitudinal covariates and those in the model are needed to obtain unbiased coefficient estimates. We propose methods to mitigate the omitted-variable bias under weaker assumptions. A two-step estimation procedure is proposed for inference about asynchronous longitudinal covariates when such covariates are observed. For mixed synchronous and asynchronous longitudinal covariates, the two-step method attains the parametric rate of convergence for the coefficient estimates of the synchronous longitudinal covariates. Extensive simulation studies provide numerical support for the theoretical findings. We illustrate the performance of our method on a dataset from the Alzheimer's Disease Neuroimaging Initiative study.
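The opening claim, that omitting a covariate orthogonal to those in a linear model leaves the remaining coefficient estimates unbiased, can be checked in a few lines. This is a minimal cross-sectional sketch (the longitudinal setting the abstract addresses is more delicate); all names and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                 # independent of (orthogonal to) x1
y = 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# full model: regress y on both covariates
X_full = np.column_stack([x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# omitted-variable model: regress y on x1 alone
b_omit = (x1 @ y) / (x1 @ x1)

# because x2 is orthogonal to x1, both estimates of the x1 coefficient
# recover the true value 2.0 up to sampling noise
```

With correlated covariates, or with the asynchronous-measurement structure of longitudinal data, the omitted-variable estimate would no longer be unbiased, which is the gap the abstract's two-step procedure targets.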

A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
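To make the edge-mistake objective concrete, here is a hedged sketch of a plurality-vote baseline for plain (non-budgeted) edge-colored clustering on a toy hypergraph: each node takes the most common color among its incident hyperedges, and an edge counts as a mistake if any endpoint disagrees with its color. The budgeted overlap and deletion variants in the abstract are more involved; the toy instance below is an assumption.

```python
from collections import Counter

# hyperedges as (color, set-of-nodes); toy instance (assumed data)
hyperedges = [
    ("red",  {1, 2, 3}),
    ("red",  {1, 2}),
    ("blue", {3, 4}),
    ("blue", {4, 5}),
    ("blue", {3, 5}),
]

# plurality baseline: give each node its most frequent incident color
votes = {}
for color, nodes in hyperedges:
    for v in nodes:
        votes.setdefault(v, Counter())[color] += 1
assignment = {v: cnt.most_common(1)[0][0] for v, cnt in votes.items()}

# an edge is a "mistake" if any endpoint disagrees with the edge's color
mistakes = sum(any(assignment[v] != color for v in nodes)
               for color, nodes in hyperedges)
```

Here node 3 is the source of contention: it sees one red edge and two blue ones, goes blue, and the red hyperedge {1, 2, 3} becomes the single mistake. Allowing node 3 one overlapping assignment, or deleting it, is exactly the kind of budgeted relaxation the abstract studies.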

The Bayesian hierarchical model (BHM) has been widely used to synthesize information across subgroups. Identifying heterogeneity in the data and determining the proper strength of borrowing have long been central goals pursued by researchers. Because these two goals are interconnected, we must consider them together. This joint consideration presents two fundamental challenges: (1) How can we balance the trade-off between homogeneity within a cluster and information gain through borrowing? (2) How can we determine the borrowing strength dynamically in different clusters? To tackle these challenges, we first develop a theoretical framework for heterogeneity identification and dynamic information borrowing in the BHM. We then propose two novel overlapping indices: the overlapping clustering index (OCI) for identifying the optimal clustering result and the overlapping borrowing index (OBI) for assigning proper borrowing strength to clusters. By incorporating these indices, we develop a new method, BHMOI (Bayesian hierarchical model with overlapping indices), which includes a novel weighted K-means clustering algorithm that maximizes OCI to obtain optimal clustering results, and embeds OBI into the BHM for dynamic borrowing within clusters. BHMOI achieves efficient and robust information borrowing with desirable properties. Examples and simulation studies demonstrate the effectiveness of BHMOI in heterogeneity identification and dynamic information borrowing.
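The notion of distributional overlap behind indices such as OCI and OBI can be illustrated with the classic overlapping coefficient, the area under the pointwise minimum of two densities. This is a generic measure for intuition only, not the paper's actual indices; the subgroup parameters below are made up.

```python
import numpy as np

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def overlap(m1, s1, m2, s2, num=20001):
    # overlapping coefficient: area under the pointwise minimum of two densities
    lo = min(m1, m2) - 6 * max(s1, s2)
    hi = max(m1, m2) + 6 * max(s1, s2)
    x = np.linspace(lo, hi, num)
    dx = x[1] - x[0]
    return np.minimum(normal_pdf(x, m1, s1), normal_pdf(x, m2, s2)).sum() * dx

# similar subgroups overlap heavily (cluster them, borrow strongly);
# dissimilar subgroups overlap little (keep apart, borrow weakly)
high = overlap(0.0, 1.0, 0.2, 1.0)
low = overlap(0.0, 1.0, 4.0, 1.0)
```

The intuition is the same as in the abstract: high overlap between subgroup distributions argues for pooling and strong borrowing, while low overlap argues for separate clusters and weak borrowing.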

In the end-of-line test of geared motors, the evaluation of product quality is important. Due to time constraints and the high diversity of variants, acoustic measurements are more economical than vibration measurements. However, the acoustic data is affected by industrial disturbing noise. Therefore, the aim of this study is to investigate the robustness of features used for anomaly detection in geared motor end-of-line testing. A real-world dataset with typical faults and acoustic disturbances is recorded by an acoustic array. This includes industrial noise from the production and systematically produced disturbances, used to compare the robustness. Overall, it is proposed to apply features extracted from a log-envelope spectrum together with psychoacoustic features. The anomaly detection is done by using the isolation forest or the more universal bagging random miner. Most disturbances can be circumvented, while the use of a hammer or air pressure often causes problems. In general, these results are important for condition monitoring tasks that are based on acoustic or vibration measurements. Furthermore, a real-world problem description is presented to improve common signal processing and machine learning tasks.
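A minimal sketch of the log-envelope-spectrum idea, assuming an amplitude-modulated test signal whose modulation rate plays the role of a fault frequency; the study's actual feature pipeline and parameters are not specified here, so everything below is illustrative.

```python
import numpy as np

def log_envelope_spectrum(x, fs):
    """Log-magnitude spectrum of the amplitude envelope, a common
    feature for gear/bearing faults (details here are an assumption)."""
    n = len(x)
    # analytic signal via the FFT-based Hilbert transform
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    envelope = np.abs(np.fft.ifft(X * h))
    env = envelope - envelope.mean()          # drop the DC component
    spec = np.abs(np.fft.rfft(env)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.log1p(spec)

# toy signal: 200 Hz carrier amplitude-modulated at 13 Hz (a "fault" rate)
fs = 4000
t = np.arange(0, 2, 1.0 / fs)
x = (1 + 0.5 * np.cos(2 * np.pi * 13 * t)) * np.sin(2 * np.pi * 200 * t)

freqs, spec = log_envelope_spectrum(x, fs)
peak_hz = freqs[np.argmax(spec)]              # modulation rate shows up here
```

The envelope demodulation is what makes such features attractive for end-of-line testing: the fault-related modulation frequency is recovered regardless of the carrier, which is one reason the envelope-based features can stay robust under some disturbances.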

Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates. Here we consider estimating \emph{which} data values are incorrect along a numerical column. We present a model-agnostic approach that can utilize \emph{any} regressor (i.e.\ statistical or machine learning model) which was fit to predict values in this column based on the other variables in the dataset. By accounting for various uncertainties, our approach distinguishes between genuine anomalies and natural data fluctuations, conditioned on the available information in the dataset. We establish theoretical guarantees for our method and show that other approaches like conformal inference struggle to detect errors. We also contribute a new error detection benchmark involving 5 regression datasets with real-world numerical errors (for which the true values are also known). In this benchmark and additional simulation studies, our method identifies incorrect values with better precision/recall than other approaches.
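A bare-bones version of the residual-scoring idea, using least squares as the interchangeable regressor and a crude global noise scale (the paper's uncertainty handling is more refined); the data and corruption indices are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 3.0 * x + 0.1 * rng.normal(size=n)  # true relation + natural fluctuation

# corrupt a few recorded values: these are the "errors" to detect
bad = np.array([10, 200, 333])
y[bad] += 5.0

# model-agnostic scoring: fit any regressor (here, least squares),
# then score each value by its residual relative to an estimated noise scale
b = (x @ y) / (x @ x)
resid = y - b * x
score = np.abs(resid) / resid.std()     # crude global uncertainty scaling

flagged = np.argsort(score)[-3:]        # flag the 3 most suspicious values
```

Any regressor can be dropped in for the least-squares line; the abstract's point is that distinguishing genuine errors from natural fluctuation hinges on how carefully the residuals are normalized by the relevant uncertainties.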

Undirected, binary network data consist of indicators of symmetric relations between pairs of actors. Regression models of such data allow for the estimation of effects of exogenous covariates on the network and for prediction of unobserved data. Ideally, estimators of the regression parameters should account for the inherent dependencies among relations in the network that involve the same actor. To account for such dependencies, researchers have developed a host of latent variable network models; however, estimation of many of these models is computationally onerous, and it may not be clear which model to base inference upon. We propose the Probit Exchangeable (PX) model for undirected binary network data, which is based on an assumption of exchangeability that is common to many of the latent variable network models in the literature. The PX model can represent the first two moments of any exchangeable network model. We leverage the EM algorithm to obtain an approximate maximum likelihood estimator of the PX model that is extremely computationally efficient. Using simulation studies, we demonstrate the improvement in estimation of regression coefficients of the proposed model over existing latent variable network models. In an analysis of purchases of politically-aligned books, we demonstrate political polarization in purchase behavior and show that the proposed estimator significantly reduces runtime relative to estimators of latent variable network models, while maintaining predictive performance.
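The latent-variable probit form underlying such models can be sketched by thresholding a Gaussian latent surface built from a dyadic covariate plus additive actor effects; the shared actor effects are what induce the exchangeable dependence among relations involving the same actor. The exact PX specification is not reproduced here, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=(n, n))             # one exogenous dyadic covariate
x = (x + x.T) / 2                       # symmetric, as relations are undirected

a = rng.normal(size=n)                  # actor effects: dyads sharing an actor
                                        # are dependent through a_i
eps = rng.normal(size=(n, n))
eps = (eps + eps.T) / np.sqrt(2)        # symmetric unit-variance noise

beta0, beta1 = -1.0, 0.8                # intercept and covariate effect (assumed)
latent = beta0 + beta1 * x + a[:, None] + a[None, :] + eps
Y = (latent > 0).astype(int)            # probit thresholding

iu = np.triu_indices(n, 1)              # distinct unordered dyads
hi = Y[iu][x[iu] > 0].mean()            # tie rate for high-covariate dyads
lo = Y[iu][x[iu] <= 0].mean()           # tie rate for low-covariate dyads
```

Estimating beta from Y while properly accounting for the actor-induced dependence is the computational crux that the PX model's EM-based estimator is designed to make cheap.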

Using turn signals to convey a driver's intention to change lanes provides a direct and unambiguous way of communicating with nearby drivers. Nonetheless, past research has indicated that drivers may not always use their turn signals prior to starting a lane change. In this study, we analyze realistic driving data to investigate turn signal usage during lane changes on highways in and around Gothenburg, Sweden. We examine turn signal usage and identify factors that have an influence on it by employing Bayesian hierarchical modelling (BHM). The results showed that a turn signal was used in approximately 60% of cases before starting a lane change, while it was only used after the start of a lane change in 33% of cases. In 7% of cases, a turn signal was not used at all. Additionally, the BHM results reveal that various factors influence turn signal usage. The study concludes that understanding the factors that affect turn signal usage is crucial for improving traffic safety through policy-making and designing algorithms for autonomous vehicles for future mixed traffic.

It is well-known that real-world changes constituting distribution shift adversely affect model performance. How to characterize those changes in an interpretable manner is poorly understood. Existing techniques to address this problem take the form of shift explanations that elucidate how to map samples from the original distribution toward the shifted one by reducing the disparity between these two distributions. However, these methods can introduce group irregularities, leading to explanations that are less feasible and robust. To address these issues, we propose Group-aware Shift Explanations (GSE), a method that produces interpretable explanations by leveraging worst-group optimization to rectify group irregularities. We demonstrate how GSE not only maintains group structures, such as demographic and hierarchical subpopulations, but also enhances feasibility and robustness in the resulting explanations in a wide range of tabular, language, and image settings.
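The group-irregularity problem can be seen already with a one-dimensional mean-shift explanation: when two groups shift by different amounts, the shift fit to the pooled data leaves the minority group poorly explained, while a worst-group objective does not. This is an illustrative caricature of the idea behind GSE, not its actual algorithm; the groups and shifts below are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
# two subpopulations whose distributions shift by different amounts
src_a = rng.normal(0.0, 1.0, 500)        # majority group
src_b = rng.normal(0.0, 1.0, 100)        # minority group
tgt_a, tgt_b = src_a + 1.0, src_b + 3.0  # group-specific shifts

# a single mean-shift explanation fit to the pooled samples
pooled_delta = np.r_[tgt_a, tgt_b].mean() - np.r_[src_a, src_b].mean()

def worst_group_gap(delta):
    # residual mean discrepancy of the worst-off group after shifting
    return max(abs((src_a + delta).mean() - tgt_a.mean()),
               abs((src_b + delta).mean() - tgt_b.mean()))

# group-aware: pick the shift minimizing the worst group's residual gap
grid = np.linspace(0.0, 4.0, 4001)
gse_delta = grid[np.argmin([worst_group_gap(d) for d in grid])]
```

The pooled shift is dragged toward the majority group, so the minority group's gap stays large; the worst-group shift balances the two, which is the flavor of robustness the abstract argues for.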

In this work, we propose a novel framework for estimating the dimension of the data manifold using a trained diffusion model. A diffusion model approximates the score function, i.e., the gradient of the log density of a noise-corrupted version of the target distribution, for varying levels of corruption. We prove that, if the data concentrate around a manifold embedded in the high-dimensional ambient space, then as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximal likelihood increase. Therefore, for small levels of corruption, the diffusion model provides us with access to an approximation of the normal bundle of the data manifold. This allows us to estimate the dimension of the tangent space and thus the intrinsic dimension of the data manifold. To the best of our knowledge, our method is the first estimator of the data manifold dimension based on diffusion models, and it outperforms well-established statistical estimators in controlled experiments on both Euclidean and image data.
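A toy version of the estimator, assuming exact access to the nearest-manifold projection so that the small-noise score, approximately (pi(x) - x) / sigma^2, can be written in closed form for a circle in R^3; the SVD of score vectors at nearby points then reveals the normal space and hence the codimension. A real application would query a trained diffusion model for these scores instead; all details below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
D, sigma = 3, 1e-3                      # ambient dimension, small corruption

def project_to_circle(x):
    # nearest point on the unit circle {(cos t, sin t, 0)} in R^3
    r = np.hypot(x[0], x[1])
    return np.array([x[0] / r, x[1] / r, 0.0])

def score(x):
    # small-sigma approximation of the smoothed density's score near
    # the manifold: it points toward the nearest manifold point
    return (project_to_circle(x) - x) / sigma**2

base = np.array([1.0, 0.0, 0.0])
pts = base + sigma * rng.normal(size=(50, D))   # points near the manifold
S = np.stack([score(p) for p in pts])

# score vectors span the normal space; its rank gives the codimension
sv = np.linalg.svd(S - S.mean(0), compute_uv=False)
codim = int((sv > sv[0] * 0.1).sum())
intrinsic_dim = D - codim               # circle: tangent space is 1-D
```

The singular-value gap is stark here because the normal components of the score scale like 1/sigma while the tangential ones stay bounded, which is exactly the mechanism the abstract's theory formalizes.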

Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
