东京热加勒比中文无码-91 一区二区三区四季

We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models, which defines population-level clusters based on the independence of the maxima of a multivariate stationary mixing random process among clusters. This class of models is identifiable, meaning that there exists a maximal element with a partial order between partitions, allowing for statistical inference. We also present an algorithm for recovering the clusters of variables without specifying the number of clusters \emph{a priori}. Our work provides some theoretical insights into the consistency of our algorithm, demonstrating that under certain conditions it can effectively identify clusters in the data with a computational complexity that is polynomial in the dimension. This implies that groups can be learned nonparametrically in which block maxima of a dependent process are only sub-asymptotic. To further illustrate the significance of our work, we applied our method to neuroscience and environmental real-datasets. These applications highlight the potential and versatility of the proposed approach.

相關內容

簇

關注 1

多峰值 · Learning · DAG · 表示 · 無監督 ·

2023 年 11 月 8 日

Causal disentanglement of multimodal data

Elise Walker,Jonas A. Actor,Carianne Martinez,Nathaniel Trask

Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our innovative algorithm utilizes a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested against a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.

優化器 · MoDELS · 操作 · 可辨認的 · 設計 ·

2023 年 11 月 8 日

Optimization approaches for the design and operation of open-loop shallow geothermal systems

S. Halilovic,F. B?ttcher,K. Zosseder,T. Hamacher

from arxiv, 16 pages, 3 figures; submitted to Advances in Geosciences

The optimization of open-loop shallow geothermal systems, which includes both design and operational aspects, is an important research area aimed at improving their efficiency and sustainability and the effective management of groundwater as a shallow geothermal resource. This paper investigates various approaches to address optimization problems arising from these research and implementation questions about GWHP systems. The identified optimization approaches are thoroughly analyzed based on criteria such as computational cost and applicability. Moreover, a novel classification scheme is introduced that categorizes the approaches according to the types of groundwater simulation model and the optimization algorithm used. Simulation models are divided into two types: numerical and simplified (analytical or data-driven) models, while optimization algorithms are divided into gradient-based and derivative-free algorithms. Finally, a comprehensive review of existing approaches in the literature is provided, highlighting their strengths and limitations and offering recommendations for both the use of existing approaches and the development of new, improved ones in this field.

有向 · GROUP · 向量化 · 情景 · 概率密度函數 ·

2023 年 11 月 8 日

Multivariate generalized Pareto distributions along extreme directions

Anas Mourahib,Anna Kiriliouk,Johan Segers

When modeling a vector of risk variables, extreme scenarios are often of special interest. The peaks-over-thresholds method hinges on the notion that, asymptotically, the excesses over a vector of high thresholds follow a multivariate generalized Pareto distribution. However, existing literature has primarily concentrated on the setting when all risk variables are always large simultaneously. In reality, this assumption is often not met, especially in high dimensions. In response to this limitation, we study scenarios where distinct groups of risk variables may exhibit joint extremes while others do not. These discernible groups are derived from the angular measure inherent in the corresponding max-stable distribution, whence the term extreme direction. We explore such extreme directions within the framework of multivariate generalized Pareto distributions, with a focus on their probability density functions in relation to an appropriate dominating measure. Furthermore, we provide a stochastic construction that allows any prespecified set of risk groups to constitute the distribution's extreme directions. This construction takes the form of a smoothed max-linear model and accommodates the full spectrum of conceivable max-stable dependence structures. Additionally, we introduce a generic simulation algorithm tailored for multivariate generalized Pareto distributions, offering specific implementations for extensions of the logistic and H\"usler-Reiss families capable of carrying arbitrary extreme directions.

Integration · 線性的 · 數值分析 ·

2023 年 11 月 7 日

Semi-explicit integration of second order for weakly coupled poroelasticity

R. Altmann,R. Maier,B. Unger

We introduce a semi-explicit time-stepping scheme of second order for linear poroelasticity satisfying a weak coupling condition. Here, semi-explicit means that the system, which needs to be solved in each step, decouples and hence improves the computational efficiency. The construction and the convergence proof are based on the connection to a differential equation with two time delays, namely one and two times the step size. Numerical experiments confirm the theoretical results and indicate the applicability to higher-order schemes.

MoDELS · INFORMS · Performer · 估計/估計量 · 簇 ·

2023 年 11 月 7 日

Joint model for longitudinal and spatio-temporal survival data

Victor Medina-Olivares,Finn Lindgren,Raffaella Calabrese,Jonathan Crook

In credit risk analysis, survival models with fixed and time-varying covariates are widely used to predict a borrower's time-to-event. When the time-varying drivers are endogenous, modelling jointly the evolution of the survival time and the endogenous covariates is the most appropriate approach, also known as the joint model for longitudinal and survival data. In addition to the temporal component, credit risk models can be enhanced when including borrowers' geographical information by considering spatial clustering and its variation over time. We propose the Spatio-Temporal Joint Model (STJM) to capture spatial and temporal effects and their interaction. This Bayesian hierarchical joint model reckons the survival effect of unobserved heterogeneity among borrowers located in the same region at a particular time. To estimate the STJM model for large datasets, we consider the Integrated Nested Laplace Approximation (INLA) methodology. We apply the STJM to predict the time to full prepayment on a large dataset of 57,258 US mortgage borrowers with more than 2.5 million observations. Empirical results indicate that including spatial effects consistently improves the performance of the joint model. However, the gains are less definitive when we additionally include spatio-temporal interactions.

Learning · 特征空間 · MoDELS · AIM · 輸入空間 ·

2023 年 11 月 6 日

FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data

Lisa Weijler,Florian Kowarsch,Michael Reiter,Pedro Hermosilla,Margarita Maurer-Granofszky,Michael Dworzak

from arxiv, Accepted at WACV 2024

While model architectures and training strategies have become more generic and flexible with respect to different data modalities over the past years, a persistent limitation lies in the assumption of fixed quantities and arrangements of input features. This limitation becomes particularly relevant in scenarios where the attributes captured during data acquisition vary across different samples. In this work, we aim at effectively leveraging data with varying features, without the need to constrain the input space to the intersection of potential feature sets or to expand it to their union. We propose a novel architecture that can directly process data without the necessity of aligned feature modalities by learning a general embedding space that captures the relationship between features across data samples with varying sets of features. This is achieved via a set-transformer architecture augmented by feature-encoder layers, thereby enabling the learning of a shared latent feature space from data originating from heterogeneous feature spaces. The advantages of the model are demonstrated for automatic cancer cell detection in acute myeloid leukemia in flow cytometry data, where the features measured during acquisition often vary between samples. Our proposed architecture's capacity to operate seamlessly across incongruent feature spaces is particularly relevant in this context, where data scarcity arises from the low prevalence of the disease. The code is available for research purposes at //github.com/lisaweijler/FATE.

匯聚 · 優化器 · Learning · Processing（編程語言） · 可約的 ·

2023 年 11 月 6 日

Optimal data pooling for shared learning in maintenance operations

Collin Drent,Melvin Drent,Geert-Jan van Houtum

We study optimal data pooling for shared learning in two common maintenance operations: condition-based maintenance and spare parts management. We consider a set of systems subject to Poisson input -- the degradation or demand process -- that are coupled through an a-priori unknown rate. Decision problems involving these systems are high-dimensional Markov decision processes (MDPs) and hence notoriously difficult to solve. We present a decomposition result that reduces such an MDP to two-dimensional MDPs, enabling structural analyses and computations. Leveraging this decomposition, we (i) demonstrate that pooling data can lead to significant cost reductions compared to not pooling, and (ii) show that the optimal policy for the condition-based maintenance problem is a control limit policy, while for the spare parts management problem, it is an order-up-to level policy, both dependent on the pooled data.

近似 · 相互獨立的 · 推斷 · 近似誤差 · Automator ·

2023 年 11 月 5 日

Independent finite approximations for Bayesian nonparametric inference

Tin D. Nguyen,Jonathan Huggins,Lorenzo Masoero,Lester Mackey,Tamara Broderick

from arxiv, The paper has been accepted for publication in Bayesian Analysis. Currently, it is posted on Bayesian Analysis Advance Publication

Completely random measures (CRMs) and their normalizations (NCRMs) offer flexible models in Bayesian nonparametrics. But their infinite dimensionality presents challenges for inference. Two popular finite approximations are truncated finite approximations (TFAs) and independent finite approximations (IFAs). While the former have been well-studied, IFAs lack similarly general bounds on approximation error, and there has been no systematic comparison between the two options. In the present work, we propose a general recipe to construct practical finite-dimensional approximations for homogeneous CRMs and NCRMs, in the presence or absence of power laws. We call our construction the automated independent finite approximation (AIFA). Relative to TFAs, we show that AIFAs facilitate more straightforward derivations and use of parallel computing in approximate inference. We upper bound the approximation error of AIFAs for a wide class of common CRMs and NCRMs -- and thereby develop guidelines for choosing the approximation level. Our lower bounds in key cases suggest that our upper bounds are tight. We prove that, for worst-case choices of observation likelihoods, TFAs are more efficient than AIFAs. Conversely, we find that in real-data experiments with standard likelihoods, AIFAs and TFAs perform similarly. Moreover, we demonstrate that AIFAs can be used for hyperparameter estimation even when other potential IFA options struggle or do not apply.

相似度 · 簇 · 相互獨立的 · MoDELS · CASES ·

2023 年 11 月 3 日

Multilayer hypergraph clustering using the aggregate similarity matrix

Kalle Alaluusua,Konstantin Avrachenkov,B. R. Vinay Kumar,Lasse Leskel?

from arxiv, 16 pages, 3 tables. Reason for replacement on 3 Nov 2023: incorporating the possibility of non-uniform layers. Reason for replacement on 18 May 2023: improving clarity of the presentation and clarifying the contribution/novelty of the paper

We consider the community recovery problem on a multilayer variant of the hypergraph stochastic block model (HSBM). Each layer is associated with an independent realization of a d-uniform HSBM on N vertices. Given the similarity matrix containing the aggregated number of hyperedges incident to each pair of vertices, the goal is to obtain a partition of the N vertices into disjoint communities. In this work, we investigate a semidefinite programming (SDP) approach and obtain information-theoretic conditions on the model parameters that guarantee exact recovery both in the assortative and the disassortative cases.

估計/估計量 · 推斷 · 可理解性 · 分解的 · TOOLS ·

2023 年 11 月 2 日

Inference on summaries of a model-agnostic longitudinal variable importance trajectory

Brian D. Williamson,Erica E. M. Moodie,Susan M. Shortreed

from arxiv, 65 pages (29 main, 36 supplementary), 5 figures (3 main, 2 supplementary), 19 tables (2 main, 17 supplementary)

In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure as well as a null hypothesis testing procedure that are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.