
Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature, however, assumes that the participation mechanism for non-probability samples is ignorable. In this paper, we develop a pseudo-likelihood approach to estimate participation probabilities for nonignorable non-probability samples when auxiliary information is available from an existing reference probability sample. We further construct three estimators for the finite population mean using regression-based prediction, inverse probability weighting (IPW), and augmented IPW estimators, and study their asymptotic properties. Variance estimation for the proposed methods is considered within the same framework. The efficiency of our proposed methods is demonstrated through simulation studies and a real data analysis using the ESPACOV survey on the effects of the COVID-19 pandemic in Spain.
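As a rough illustration of the inverse probability weighting idea mentioned in the abstract, the sketch below computes a normalized (Hájek-type) IPW estimate of a population mean. It assumes the participation probabilities have already been estimated; the pseudo-likelihood step of the paper is not shown, and the function name `ipw_mean` and the choice of the normalized form are illustrative assumptions, not the authors' implementation.

```python
def ipw_mean(y, pi):
    """Normalized (Hajek-type) IPW estimate of a finite population mean.

    y  : observed responses for units in the non-probability sample
    pi : estimated participation probabilities for the same units
         (assumed given here; estimating them is the hard part)
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("participation probabilities must lie in (0, 1]")
    # Each sampled unit stands in for 1 / pi_i population units.
    weights = [1.0 / p for p in pi]
    weighted_total = sum(w * yi for w, yi in zip(weights, y))
    # Dividing by the sum of weights estimates the population size.
    return weighted_total / sum(weights)
```

When all participation probabilities are equal, this reduces to the plain sample mean; units with low participation probability are weighted up.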

Related content

Sample


Cluster · Robustness · Learning · Annotation · Vectorization

November 12, 2024

Robust Clustering on High-Dimensional Data with Stochastic Quantization

Anton Kozyriev, Vladimir Norkin

from arxiv, 22 pages, 5 figures, to be published in the International Scientific Technical Journal "Problems of Control and Informatics"

This paper addresses the limitations of conventional vector quantization algorithms, particularly K-Means and its variant K-Means++, and investigates the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning tasks. Traditional clustering algorithms often suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as Mini-Batch K-Means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the Stochastic Quantization algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data, comparing model accuracy across various ratios of labeled to unlabeled data. To address the challenge of high dimensionality, we employ a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of both the Stochastic Quantization algorithm and traditional quantization algorithms. Furthermore, we enhance the algorithm's convergence speed by introducing modifications with an adaptive learning rate.

Statistic · Inference · Extensibility · Functional · Less

November 11, 2024

Statistical Inference with Limited Memory: A Survey

Tomer Berg, Or Ordentlich, Ofer Shayevitz

from arxiv, Published in JSAIT Special Issue

The problem of statistical inference in its various forms has been the subject of decades-long extensive research. Most of the effort has been focused on characterizing the behavior as a function of the number of available samples, with far less attention given to the effect of memory limitations on performance. Recently, this latter topic has drawn much interest in the engineering and computer science literature. In this survey paper, we attempt to review the state-of-the-art of statistical inference under memory constraints in several canonical problems, including hypothesis testing, parameter estimation, and distribution property testing/estimation. We discuss the main results in this developing field, and by identifying recurrent themes, we extract some fundamental building blocks for algorithmic construction, as well as useful techniques for lower bound derivations.

Dataset · GaN · MoDELS · state-of-the-art · Facebook AI Research

November 11, 2024

Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation

Wilhelm Ågren, Victorio Úbeda Sosa

The generation of synthetic data is a state-of-the-art approach to leverage when access to real data is limited or privacy regulations limit the usability of sensitive data. A fair amount of research has been conducted on synthetic data generation for single-tabular datasets, but only a limited amount of research has been conducted on multi-tabular datasets with complex table relationships. In this paper we propose the algorithm HCTGAN to synthesize multi-tabular data from complex multi-tabular datasets. We compare our results to the probabilistic model HMA1. Our findings show that our proposed algorithm can more efficiently sample large amounts of synthetic data for deep and complex multi-tabular datasets, whilst achieving adequate data quality and always guaranteeing referential integrity. We conclude that the HCTGAN algorithm is suitable for generating large amounts of synthetic data efficiently for deep multi-tabular datasets with complex relationships. We additionally suggest that the HMA1 model should be used on smaller datasets when emphasis is on data quality.

Graph · Decomposed · Similarity · TOOLS · ONCE

November 11, 2024

Finite Variable Counting Logics with Restricted Requantification

Simon Raßmann, Georg Schindling, Pascal Schweitzer

Counting logics with a bounded number of variables form one of the central concepts in descriptive complexity theory. Although they restrict the number of variables that a formula can contain, the variables can be nested within scopes of quantified occurrences of themselves. In other words, the variables can be requantified. We study the fragments obtained from counting logics by restricting requantification for some but not necessarily all the variables. Similar to the logics without limitation on requantification, we develop tools to investigate the restricted variants. Specifically, we introduce a bijective pebble game in which certain pebbles can only be placed once and for all, and a corresponding two-parametric family of Weisfeiler-Leman algorithms. We show close correspondences between the three concepts. By using a suitable cops-and-robber game and adaptations of the Cai-Fürer-Immerman construction, we completely clarify the relative expressive power of the new logics. We show that the restriction of requantification has beneficial algorithmic implications in terms of graph identification. Indeed, we argue that with regard to space complexity, non-requantifiable variables only incur an additive polynomial factor when testing for equivalence. In contrast, for all we know, requantifiable variables incur a multiplicative linear factor. Finally, we observe that graphs of bounded tree-depth and 3-connected planar graphs can be identified using no, respectively, only a very limited number of requantifiable variables.

MoDELS · Likelihood · Sample · Learning · Maximum

November 10, 2024

Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Yaxuan Zhu, Jianwen Xie, Yingnian Wu, Ruiqi Gao

Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we adopt several practical design choices for EBM training to further improve the sample quality. Combining these advances, our approach significantly boosts the generation performance compared to existing EBM methods on CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models for several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting and out-of-distribution detection.

Stream · Regularization term · CASE · Regular expression · Online

November 8, 2024

Querying Perception Streams with Spatial Regular Expressions

Jacob Anderson, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov

from arxiv, This work has been submitted to the International Journal on Software Tools for Technology Transfer

Perception in fields like robotics, manufacturing, and data analysis generates large volumes of temporal and spatial data to effectively capture their environments. However, sorting through this data for specific scenarios is a meticulous and error-prone process, often dependent on the application, and lacks generality and reproducibility. In this work, we introduce SpREs as a novel querying language for pattern matching over perception streams containing spatial and temporal data derived from multi-modal dynamic environments. To highlight the capabilities of SpREs, we developed the STREM tool as both an offline and online pattern matching framework for perception data. We demonstrate the offline capabilities of STREM through a case study on a publicly available AV dataset (Woven Planet Perception) and its online capabilities through a case study integrating STREM in ROS with the CARLA simulator. We also conduct performance benchmark experiments on various SpRE queries. Using our matching framework, we are able to find over 20,000 matches within 296 ms making STREM applicable in runtime monitoring applications.

Neural Networks · Probabilistic graphical model · Networking · GM · Infinite

November 8, 2024

On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

Boyao Li, Alexandar J. Thomson, Houssam Nassif, Matthew M. Engelhard, David Page

from arxiv, Accepted to NeurIPS 2024

Deep neural networks (DNNs) lack the precise semantics and definitive probabilistic interpretation of probabilistic graphical models (PGMs). In this paper, we propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks. Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are precise in this alternative PGM structure. Not only does our research complement existing studies that describe neural networks as kernel machines or infinite-sized Gaussian processes, it also elucidates a more direct approximation that DNNs make to exact inference in PGMs. Potential benefits include improved pedagogy and interpretation of DNNs, and algorithms that can merge the strengths of PGMs and DNNs.

November 7, 2024

Multivariate Probabilistic Time Series Forecasting with Correlated Errors

Vincent Zhihao Zheng, Lijun Sun

from arxiv, 38th Conference on Neural Information Processing Systems (NeurIPS 2024). This paper extends the work presented in arXiv:2305.17028 to a multivariate setting

Accurately modeling the correlation structure of errors is critical for reliable uncertainty quantification in probabilistic time series forecasting. While recent deep learning models for multivariate time series have developed efficient parameterizations for time-varying contemporaneous covariance, they often assume temporal independence of errors for simplicity. However, real-world data often exhibit significant error autocorrelation and cross-lag correlation due to factors such as missing covariates. In this paper, we introduce a plug-and-play method that learns the covariance structure of errors over multiple steps for autoregressive models with Gaussian-distributed errors. To ensure scalable inference and computational efficiency, we model the contemporaneous covariance using a low-rank-plus-diagonal parameterization and capture cross-covariance through a group of independent latent temporal processes. The learned covariance matrix is then used to calibrate predictions based on observed residuals. We evaluate our method on probabilistic models built on RNNs and Transformer architectures, and the results confirm the effectiveness of our approach in improving predictive accuracy and uncertainty quantification without significantly increasing the parameter size.
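The low-rank-plus-diagonal parameterization named in this abstract can be sketched generically as a covariance of the form L Lᵀ + diag(d). The snippet below is only an illustration of that general form under assumed inputs; the function name `low_rank_plus_diag` and the example values are hypothetical, and the paper's actual model (including its latent temporal processes) is not reproduced here.

```python
def low_rank_plus_diag(L, d):
    """Build the n x n covariance matrix L @ L.T + diag(d).

    L : list of n rows, each with r factor loadings (r << n in practice)
    d : list of n positive diagonal variances
    """
    n = len(L)
    if len(d) != n:
        raise ValueError("d must have one entry per row of L")
    if any(v <= 0.0 for v in d):
        raise ValueError("diagonal entries must be positive")
    # Low-rank part: entry (i, j) is the inner product of rows i and j of L.
    cov = [[sum(a * b for a, b in zip(L[i], L[j])) for j in range(n)]
           for i in range(n)]
    # Diagonal part keeps the matrix positive definite.
    for i in range(n):
        cov[i][i] += d[i]
    return cov
```

This form needs n·r + n parameters instead of the n(n + 1)/2 of an unrestricted covariance matrix, which is why it scales to many series.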

Knowledge · Machine Learning · MoDELS · Learned · Conformer

May 10, 2022

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Julian Wörmann, Daniel Bogdoll, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Tobias Gleißner, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels, Sebastian Houben, Tim Joseph, Niklas Keil, Johann Kelsch, Hendrik Königshof, Erwin Kraft, Leonie Kreuser, Kevin Krone, Tobias Latka, Denny Mattern, Stefan Matthes, Mohsin Munir, Moritz Nekolla, Adrian Paschke, Maximilian Alexander Pintz, Tianming Qiu, Faraz Qureishi, Syed Tahseen Raza Rizvi, Jörg Reichardt, Laura von Rueden, Stefan Rudolph, Alexander Sagel, Gerhard Schunk, Hao Shen, Hendrik Stapelbroek, Vera Stehr, Gurucharan Srinivas, Anh Tuan Tran, Abhishek Vivekanandan, Ya Wang, Florian Wasserrab, Tino Werner, Christian Wirth, Stefan Zwicklbauer

from arxiv, 93 pages

The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Graph · Networking · INTERACT · INFORMS · GPU

November 25, 2020

Time-Series Event Prediction with Evolutionary State Graph

Wenjie Hu, Yang Yang, Ziqiang Cheng, Carl Yang, Xiang Ren

from arxiv, A long version of EvoNet (WSDM 2021)

The accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We conduct analysis on the dynamic graphs constructed from the time-series data and show that changes in the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, Evolutionary State Graph Network models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.