
A popular heuristic for improving clustering results is to apply dimensionality reduction before running clustering algorithms. It has been observed that spectral-based dimensionality reduction tools, such as PCA or SVD, improve the performance of clustering algorithms in many applications. This phenomenon indicates that the spectral method not only serves as a dimensionality reduction tool, but also contributes to the clustering procedure in some sense. It is an interesting question to understand the behavior of spectral steps in clustering problems. As an initial step in this direction, this paper studies the power of the vanilla-SVD algorithm in the stochastic block model (SBM). We show that, in the symmetric setting, the vanilla-SVD algorithm recovers all clusters correctly. This result answers an open question posed by Van Vu (Combinatorics, Probability and Computing, 2018) in the symmetric setting.
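
For illustration, the sketch below is a generic SVD-based spectral clustering pipeline, not necessarily the exact vanilla-SVD procedure analyzed in the paper: it generates a symmetric two-block SBM, projects the adjacency matrix onto its top singular vectors, and runs k-means on the projected rows. The edge probabilities `p`, `q` and the cluster sizes are arbitrary choices for the demo.

```python
# Generic SVD-based spectral clustering on a symmetric two-block SBM (illustrative;
# not necessarily the exact vanilla-SVD procedure analyzed in the paper).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k = 400, 2                      # nodes and clusters
p, q = 0.10, 0.02                  # within- and between-block edge probabilities
labels = np.repeat(np.arange(k), n // k)

# Symmetric SBM adjacency matrix
probs = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, 1)
A = (upper | upper.T).astype(float)

# SVD step: project rows onto the top-k left singular vectors
U, s, _ = np.linalg.svd(A)
embedding = U[:, :k] * s[:k]

pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
agreement = max(np.mean(pred == labels), np.mean(pred != labels))  # up to label swap
print(f"fraction of correctly clustered nodes: {agreement:.3f}")
```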

Related content

We propose a flexible nonparametric Bayesian modelling framework for multivariate time series of count data based on tensor factorisations. Our models can be viewed as infinite state space Markov chains of known maximal order with non-linear serial dependence through the introduction of appropriate latent variables. Alternatively, our models can be viewed as Bayesian hierarchical models with conditionally independent Poisson distributed observations. Inference about the important lags and their complex interactions is achieved via MCMC. When the observed counts are large, we deal with the resulting computational complexity of Bayesian inference via a two-step inferential strategy based on an initial analysis of a training set of the data. Our methodology is illustrated using simulation experiments and analysis of real-world data.
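
As a loose illustration of the flavour of this construction -- a latent variable inducing serial dependence while the observations stay conditionally independent Poisson -- the toy simulation below draws a latent lag indicator at each step and sets the Poisson rate from the selected lagged count. It is a hypothetical, much simpler model than the paper's tensor-factorised framework; the maximal order, mixing weights, and rate function are all assumptions for the demo.

```python
# Toy simulation (hypothetical, far simpler than the paper's tensor-factorised model):
# a latent variable z_t picks which lag drives the Poisson rate, giving a Markov chain
# of maximal order 2 with conditionally independent Poisson observations.
import numpy as np

rng = np.random.default_rng(1)
T, max_lag = 500, 2
lag_weights = np.array([0.7, 0.3])          # mixing probabilities over lags (assumed)
baseline = 1.0                               # baseline intensity (assumed)

y = np.ones(max_lag, dtype=int).tolist()     # initial counts
for t in range(max_lag, T):
    z = rng.choice(max_lag, p=lag_weights)   # latent lag indicator
    rate = baseline + 0.8 * y[t - 1 - z]     # serial dependence routed through z
    y.append(rng.poisson(rate))

y = np.array(y)
print("mean count:", y.mean(), "max count:", y.max())
```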

The arrival of multicore systems has created a new scenario in computing: parallel and distributed algorithms are fast replacing older sequential algorithms, and these techniques bring many challenges. Distributed algorithms provide distributed processing using distributed file systems and processing units, with the network modeled as a minimum cost spanning tree. Parallel processing, on the other hand, involves choices among language platforms, data-parallel versus parallel programming models, and GPUs. Processing units, memory elements, and storage are connected through dynamic distributed networks in the form of spanning trees. The article presents foundational algorithms, analysis, and efficiency considerations.
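
Since the network is modeled as a minimum cost spanning tree, the following generic sketch of Kruskal's algorithm (not code from the article) shows how such a tree can be computed with a union-find structure.

```python
# Illustrative Kruskal's algorithm for a minimum cost spanning tree; a generic
# sketch, not code from the article. Edges are (cost, u, v) tuples.
def kruskal_mst(n_nodes, edges):
    parent = list(range(n_nodes))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding the edge does not create a cycle
            parent[ru] = rv
            mst.append((u, v, cost))
            total += cost
    return mst, total

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 2, 3), (3, 1, 3)]
tree, cost = kruskal_mst(4, edges)
print(tree, cost)   # [(0, 1, 1), (1, 2, 2), (1, 3, 3)] with total cost 6
```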

With the development of biomedical science, researchers have increasing access to an abundance of studies focusing on similar research questions. There is a growing interest in integrating summary information from those studies to enhance the efficiency of estimation in one's own internal study. In this work, we present a comprehensive framework for integrating summary information from external studies when the data are modeled by semiparametric models. Our framework offers straightforward estimators that update conventional estimates with auxiliary information. It addresses computational challenges by capitalizing on the intricate mathematical structure inherent in the problem. We characterize the conditions under which the proposed estimators are theoretically more efficient than the initial estimate based solely on internal data. Several special cases, such as the proportional hazards model in survival analysis, are illustrated with numerical examples.
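
As a simplified illustration of the general idea of updating an internal estimate with external summary information, the sketch below uses plain inverse-variance (precision) weighting of two independent estimates of the same parameter. This is a generic meta-analytic combination, not the semiparametric estimators proposed in the paper; the numbers are made up for the demo.

```python
# Generic inverse-variance weighting of an internal estimate with an external
# summary estimate; an illustration of updating with auxiliary information,
# not the paper's semiparametric estimator.
def combine(theta_int, var_int, theta_ext, var_ext):
    """Precision-weighted combination of two independent estimates of the same
    parameter; the combined variance is never larger than the internal one."""
    w_int, w_ext = 1.0 / var_int, 1.0 / var_ext
    theta = (w_int * theta_int + w_ext * theta_ext) / (w_int + w_ext)
    var = 1.0 / (w_int + w_ext)
    return theta, var

theta, var = combine(theta_int=0.52, var_int=0.04, theta_ext=0.47, var_ext=0.01)
print(f"updated estimate: {theta:.3f}, variance: {var:.4f}")   # 0.480, 0.0080
```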

In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance analysis of OpenMP benchmarks regarding the variation of execution time, and determine the potential factors causing performance variability. Our work offers some understanding of performance distributions and directions for future work on how to mitigate variability for OpenMP-based applications. Two representative OpenMP benchmarks from the EPCC OpenMP micro-benchmark suite and BabelStream are run across two x86 multicore platforms featuring up to 256 threads. From the obtained results, we characterize and explain the execution time variability as a function of thread-pinning, simultaneous multithreading (SMT) and core frequency variation.
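
As a much-simplified illustration of quantifying run-to-run execution time variability, the sketch below times repeated executions of a multi-threaded NumPy kernel and reports the mean, standard deviation, and coefficient of variation. It is not the EPCC OpenMP or BabelStream benchmarking methodology used in the paper; the kernel, problem size, and repetition count are arbitrary.

```python
# Illustrative measurement of run-to-run execution time variability for a
# multi-threaded kernel (NumPy matrix multiply); not the EPCC or BabelStream
# benchmarks used in the paper.
import time
import numpy as np

def run_once(n=2000):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    _ = a @ b                               # threaded BLAS kernel
    return time.perf_counter() - start

times = np.array([run_once() for _ in range(20)])
mean, std = times.mean(), times.std(ddof=1)
print(f"mean {mean:.4f}s  std {std:.4f}s  coefficient of variation {std / mean:.2%}")
```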

Probabilistic principal component analysis (PPCA) is currently one of the most widely used statistical tools to reduce the ambient dimension of the data. From multidimensional scaling to the imputation of missing data, PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance. Despite this wide applicability in various fields, hardly any theoretical guarantees exist to justify the soundness of the maximum likelihood (ML) solution for this model. In fact, it is well known that maximum likelihood estimation (MLE) can only recover the true model parameters up to a rotation. The main obstruction is the inherent non-identifiability of the PPCA model, resulting from the rotational symmetry of the parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces; in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space. Furthermore, our consistency results encompass a more general class of estimators beyond the MLE. Strong consistency of the ML estimate, and consequently strong covariance estimation of the PPCA model, is also established under a compactness assumption.
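
For reference, the standard PPCA model and the rotational symmetry behind the identifiability issue can be written as follows (standard facts about PPCA, not results specific to this paper):

```latex
% Standard PPCA generative model with latent dimension q and data dimension d.
\[
x = W z + \mu + \varepsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^{2} I_d),
\]
\[
\operatorname{Cov}(x) = W W^{\top} + \sigma^{2} I_d .
\]
% For any orthogonal matrix R, the parameters (WR, \mu, \sigma^2) give the same
% covariance as (W, \mu, \sigma^2), so the MLE can recover W only up to rotation.
```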

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small number of task-agnostic visuomotor trajectories and a much larger set of task-conditioned object manipulation videos -- a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX's generalization on Meta-World and state-of-the-art performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX's transformers greatly helps in low-data regimes of learning from human-collected demonstrations. The paper's accompanying code and data are available at //microsoft.github.io/PLEX.
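
The sketch below shows one common form of relative positional encoding -- a learned bias indexed by the relative offset between positions, added to the attention scores. It is a minimal, single-head illustration and may differ from the encoding actually used in PLEX; see the repository linked above for the authors' implementation.

```python
# Minimal sketch of a learned relative-position bias added to self-attention
# scores (one common form of relative positional encoding); PLEX's actual
# scheme may differ -- see the repository linked above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosSelfAttention(nn.Module):
    def __init__(self, dim, max_len=128):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # one learnable bias per relative offset in [-(max_len-1), max_len-1]
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))

    def forward(self, x):                       # x: (batch, seq_len, dim)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale          # (b, t, t)
        offsets = torch.arange(t)[:, None] - torch.arange(t)[None, :]
        scores = scores + self.rel_bias[offsets + self.rel_bias.numel() // 2]
        return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 16, 32)
print(RelPosSelfAttention(32)(x).shape)        # torch.Size([2, 16, 32])
```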

The fundamental diagram has served as the foundation of traffic flow modeling for almost a century. With the increasing availability of road sensor data, deterministic parametric models have proved inadequate in describing the variability of real-world data, especially in the congested region of the density-flow diagram. In this paper we estimate the stochastic density-flow relation by introducing a nonparametric method called convex quantile regression (CQR). The proposed method does not depend on any prior functional form assumptions, but thanks to the concavity constraints, the estimated function satisfies the theoretical properties of the density-flow curve. The second contribution is to develop the new convex quantile regression with bags (CQRb) approach to facilitate practical implementation of CQR on real-world data. We illustrate the CQRb estimation process using road sensor data from Finland for the years 2016-2018. Our third contribution is to demonstrate the excellent out-of-sample predictive power of the proposed CQRb method in comparison to the standard parametric deterministic approach.
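
A minimal sketch of plain convex quantile regression for a concave density-flow curve is given below, using the standard Afriat-type concavity constraints and the pinball (check) loss. It does not implement the paper's CQR-with-bags (CQRb) variant, and the synthetic data and solver defaults are arbitrary choices for the demo.

```python
# Sketch of convex quantile regression (CQR) for a concave density-flow curve,
# using standard Afriat-type concavity constraints and the pinball loss.
# Illustrates plain CQR, not the paper's CQR-with-bags (CQRb) variant.
import numpy as np
import cvxpy as cp

def cqr_concave(x, y, tau=0.5):
    n = len(x)
    alpha, beta = cp.Variable(n), cp.Variable(n)      # per-point hyperplanes
    ep = cp.Variable(n, nonneg=True)                  # positive deviations
    em = cp.Variable(n, nonneg=True)                  # negative deviations

    constraints = [y == alpha + cp.multiply(beta, x) + ep - em]
    # Afriat concavity constraints: each point lies below every other hyperplane
    for i in range(n):
        for j in range(n):
            if i != j:
                constraints.append(alpha[i] + beta[i] * x[i]
                                   <= alpha[j] + beta[j] * x[i])

    objective = cp.Minimize(tau * cp.sum(ep) + (1 - tau) * cp.sum(em))
    cp.Problem(objective, constraints).solve()
    return alpha.value + beta.value * x               # fitted tau-quantile flow

density = np.linspace(5, 100, 30)                     # synthetic densities
flow = 80 * density - 0.7 * density ** 2 + np.random.default_rng(0).normal(0, 150, 30)
print(cqr_concave(density, flow, tau=0.9)[:5])
```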

This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.
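
As a generic illustration of one possible parallelization idea for sample-based clustering, the sketch below has independent workers cluster random samples in parallel and keeps the centroids with the lowest within-sample objective. It is not necessarily one of the four strategies evaluated in the study; the sample size, number of workers, and number of clusters are arbitrary choices.

```python
# Generic illustration of one parallelization idea for sample-based k-means
# (Big-means-style): workers cluster independent random samples in parallel
# and the centroids with the lowest objective are kept. Not necessarily one
# of the four strategies evaluated in the study.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.cluster import KMeans

def cluster_sample(args):
    data, sample_size, k, seed = args
    rng = np.random.default_rng(seed)
    sample = data[rng.choice(len(data), sample_size, replace=False)]
    km = KMeans(n_clusters=k, n_init=3, random_state=seed).fit(sample)
    return km.inertia_, km.cluster_centers_

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=(100_000, 10))
    jobs = [(data, 5_000, 8, seed) for seed in range(4)]     # 4 parallel workers
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cluster_sample, jobs))
    best_inertia, best_centers = min(results, key=lambda r: r[0])
    print("best sample inertia:", best_inertia)
```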

The fusion of causal models with deep learning, which introduces increasingly intricate data sets such as causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, broadening original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time series, images, text, and others. Indefinite data is an emergent research area that we infer from the progression of data forms. To comprehensively present these three data paradigms, we elaborate on their formal definitions, the differences manifested in datasets, resolution pathways, and the development of related research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, and present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data, increasing the risk that analyses are based on older versions of data. Here, I introduce SciDataFlow: a fast, concurrent command-line tool paired with a simple Data Manifest specification. SciDataFlow streamlines tracking data changes, uploading data to remote repositories, and pulling in all data necessary to reproduce a computational analysis.
