
Identifying replicable signals across different studies provides stronger scientific evidence and more powerful inference. Existing literature on high dimensional replicability analysis either imposes strong modeling assumptions or has low power. We develop a powerful and robust empirical Bayes approach for high dimensional replicability analysis. Our method effectively borrows information from different features and studies while accounting for heterogeneity. We show, both empirically and theoretically, that the proposed method has better power than competing methods while controlling the false discovery rate. Analyzing datasets from genome-wide association studies reveals new biological insights that otherwise cannot be obtained by using existing methods.
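For intuition, the sketch below implements the classical max-p baseline for two-study replicability with Benjamini-Hochberg FDR control; this is one of the lower-power competitors alluded to above, not the proposed empirical Bayes procedure, and the simulated p-values are invented purely for illustration.

```python
# A minimal baseline for replicability analysis across two studies:
# the "maximum p-value" rule followed by Benjamini-Hochberg FDR control.
# This is a standard (conservative) competitor, NOT the empirical Bayes
# method proposed above; it is shown only to fix ideas.
import numpy as np

def max_p_replicability(p1, p2, alpha=0.05):
    """Declare a feature replicated if max(p1, p2) survives BH at level alpha."""
    p_max = np.maximum(p1, p2)          # evidence must be strong in BOTH studies
    m = len(p_max)
    order = np.argsort(p_max)
    thresh = alpha * (np.arange(1, m + 1) / m)
    passed = p_max[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True          # reject the k smallest max-p values
    return rejected

# Example with simulated p-values from two hypothetical GWAS
rng = np.random.default_rng(0)
p1 = np.concatenate([rng.beta(0.1, 1, 100), rng.uniform(size=900)])
p2 = np.concatenate([rng.beta(0.1, 1, 100), rng.uniform(size=900)])
print(max_p_replicability(p1, p2).sum(), "features declared replicated")
```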

Related content

Topological Data Analysis (TDA) provides a pipeline to extract quantitative topological descriptors from structured objects. This enables the definition of topological loss functions, which quantify the extent to which a given object exhibits some topological property. These losses can then be used to perform topological optimization via gradient descent routines. While theoretically sound, topological optimization faces an important challenge: gradients tend to be extremely sparse, in the sense that the loss function typically depends on only very few coordinates of the input object, yielding dramatically slow optimization schemes in practice. Focusing on the central case of topological optimization for point clouds, we propose in this work to overcome this limitation using diffeomorphic interpolation, turning sparse gradients into smooth vector fields defined on the whole space, with quantifiable Lipschitz constants. In particular, we show that our approach combines efficiently with subsampling techniques routinely used in TDA, as the diffeomorphism derived from the gradient computed on a subsample can be used to update the coordinates of the full input object, allowing us to perform topological optimization on point clouds at an unprecedented scale. Finally, we showcase the relevance of our approach for black-box autoencoder (AE) regularization, where we aim to enforce topological priors on the latent spaces associated with fixed, pre-trained, black-box AE models. We show that learning a diffeomorphic flow can be done once and then re-applied to new data in linear time, whereas vanilla topological optimization has to be re-run from scratch. Moreover, reverting the flow allows us to generate data by sampling the topologically optimized latent space directly, yielding better interpretability of the model.
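A minimal sketch of the key mechanism, under the assumption that a kernel-ridge (RBF) interpolant is an acceptable stand-in for the paper's diffeomorphic construction: a gradient known only at a small subsample is extended to a smooth vector field over the full point cloud.

```python
# Turn a sparse topological gradient into a smooth vector field via kernel
# interpolation, in the spirit of diffeomorphic interpolation. The Gaussian
# kernel and the bandwidth `sigma` are illustrative assumptions, not the
# paper's exact construction.
import numpy as np

def smooth_vector_field(anchors, sparse_grads, query, sigma=0.5, reg=1e-6):
    """Extend gradients known only at `anchors` to arbitrary `query` points.

    anchors:      (k, d) points where the topological gradient is nonzero
    sparse_grads: (k, d) gradient vectors at those points
    query:        (n, d) points at which to evaluate the smoothed field
    """
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    K = gram(anchors, anchors) + reg * np.eye(len(anchors))  # kernel ridge system
    alpha = np.linalg.solve(K, sparse_grads)                 # (k, d) coefficients
    return gram(query, anchors) @ alpha                      # field at query points

# Usage: a gradient computed on a 50-point subsample moves a 5000-point cloud.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(5000, 2))
sub = cloud[:50]
grads = rng.normal(size=(50, 2)) * 0.01   # stand-in for a persistence gradient
cloud -= 0.1 * smooth_vector_field(sub, grads, cloud)  # one descent step
```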

In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but varying SVG definitions and methodologies lead to incomparable results. We review 31 state-of-the-art methods, categorizing SVGs into three types: overall, cell-type-specific, and spatial-domain-marker SVGs. Our review explains the intuitions underlying these methods, summarizes their applications, and categorizes the hypothesis tests they employ, highlighting the trade-off between generality and specificity in SVG detection. We discuss challenges in SVG detection and propose future directions for improvement. Our review offers insights for method developers and users, advocating for category-specific benchmarking.
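As one concrete example of the kind of hypothesis test surveyed for overall SVG detection, the sketch below computes Moran's I spatial autocorrelation with a k-nearest-neighbour weight matrix; it is a generic illustration rather than any specific method among the 31 reviewed.

```python
# Moran's I spatial autocorrelation for a single gene over spatial spots,
# a common ingredient of "overall SVG" tests. Weight scheme (kNN, binary)
# is an illustrative choice.
import numpy as np
from scipy.spatial import cKDTree

def morans_i(coords, expr, k=6):
    """Moran's I for one gene: coords (n, 2) spot locations, expr (n,) values."""
    n = len(expr)
    _, idx = cKDTree(coords).query(coords, k=k + 1)   # first neighbour is self
    W = np.zeros((n, n))
    for i, nbrs in enumerate(idx[:, 1:]):
        W[i, nbrs] = 1.0                              # binary kNN weights
    z = expr - expr.mean()
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

coords = np.random.default_rng(2).uniform(size=(500, 2))
expr = np.sin(4 * np.pi * coords[:, 0])               # spatially structured signal
print(f"Moran's I = {morans_i(coords, expr):.3f}")    # near 0 would mean no pattern
```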

This paper develops efficient preconditioned iterative solvers for incompressible flow problems discretised by an enriched Taylor-Hood mixed approximation, in which the usual pressure space is augmented by a piecewise constant pressure to ensure local mass conservation. This enrichment process causes over-specification of the pressure when the pressure space is defined by the union of standard Taylor-Hood basis functions and piecewise constant pressure basis functions, which complicates the design and implementation of efficient solvers for the resulting linear systems. We first describe the impact of this choice of pressure space specification on the matrices involved. Next, we show how to recover effective solvers for Stokes problems, with preconditioners based on the singular pressure mass matrix, and for Oseen systems arising from linearised Navier-Stokes equations, by using a two-stage pressure convection-diffusion strategy. The codes used to generate the numerical results are available online.
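The block-diagonal preconditioning pattern can be illustrated on a toy saddle-point system; the random matrices below stand in for an actual enriched Taylor-Hood discretisation, and the exact inverse of the velocity block stands in for the multigrid cycle one would use in practice.

```python
# Classical block-diagonal preconditioning for a Stokes-type saddle-point
# system, with a pressure mass matrix proxy for the Schur complement. The
# tiny random system is an assumption for illustration only.
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(3)
n, m = 60, 20
A = rng.normal(size=(n, n)); A = A @ A.T + n * np.eye(n)   # SPD "velocity" block
B = rng.normal(size=(m, n))                                 # "divergence" block
K = np.block([[A, B.T], [B, np.zeros((m, m))]])             # saddle-point matrix
rhs = rng.normal(size=n + m)

Mp = np.eye(m)                      # stand-in for the pressure mass matrix
Ainv = np.linalg.inv(A)             # in practice: a multigrid V-cycle, not inv()
def apply_prec(v):
    # block-diagonal action: diag(A, Mp)^{-1} applied to [velocity; pressure]
    return np.concatenate([Ainv @ v[:n], np.linalg.solve(Mp, v[n:])])

P = LinearOperator((n + m, n + m), matvec=apply_prec)
x, info = minres(K, rhs, M=P)
print("converged" if info == 0 else "not converged",
      "| residual:", np.linalg.norm(K @ x - rhs))
```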

Sequential recommendation systems that model dynamic preferences based on a user's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of information, such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method that employs multimodal data of items, such as images, texts, and categories. First, we extract image and text features using pre-trained VGG and BERT models and convert categories into multi-labeled forms. Subsequently, attention operations are performed independently on the item sequence and the multimodal representations. Finally, the individual attention outputs are integrated through an attention fusion function. In addition, we apply a multitask learning loss for each modality to improve the generalization performance. The experimental results obtained from the Amazon datasets show that the proposed method outperforms conventional sequential recommendation systems.
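A schematic PyTorch layer showing per-modality attention followed by attention-based fusion; the dimensions, mean pooling, and gating rule below are illustrative assumptions rather than the paper's exact architecture.

```python
# Per-modality self-attention over item sequences, fused by a learned
# softmax gate across modalities (image / text / category).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=64, n_modalities=3):
        super().__init__()
        # one self-attention block per modality
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
             for _ in range(n_modalities)])
        self.gate = nn.Linear(dim, 1)   # scores each modality for fusion

    def forward(self, seqs):
        # seqs: list of (batch, seq_len, dim) tensors, one per modality
        pooled = []
        for seq, attn in zip(seqs, self.attn):
            out, _ = attn(seq, seq, seq)            # attention within a modality
            pooled.append(out.mean(dim=1))          # (batch, dim) summary
        stack = torch.stack(pooled, dim=1)          # (batch, n_mod, dim)
        w = torch.softmax(self.gate(stack), dim=1)  # fusion weights per modality
        return (w * stack).sum(dim=1)               # fused (batch, dim) user state

img, txt, cat = (torch.randn(8, 10, 64) for _ in range(3))
user_repr = AttentionFusion()([img, txt, cat])
print(user_repr.shape)  # torch.Size([8, 64]); score items against this state
```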

Material Flow Analysis (MFA) is used to quantify and understand the life cycles of materials from production to end of use, which enables the assessment of environmental, social, and economic impacts and interventions. MFA is challenging as available data are often limited and uncertain, leading to an underdetermined system with an infinite number of possible stock and flow values. Bayesian statistics is an effective way to address these challenges: it incorporates domain knowledge in a principled manner, quantifies uncertainty in the data, and provides probabilities associated with model solutions. This paper presents a novel MFA methodology under the Bayesian framework. By relaxing the mass balance constraints, we improve the computational scalability and reliability of the posterior samples compared to existing Bayesian MFA methods. We propose a mass-based, child-and-parent process framework to model systems with disaggregated processes and flows. We show that posterior predictive checks can be used to identify inconsistencies in the data and to aid noise and hyperparameter selection. The proposed approach is demonstrated on case studies, including a global aluminium cycle with significant disaggregation, under weakly informative priors and significant data gaps, to investigate the feasibility of Bayesian MFA. We illustrate that even a weakly informative prior can greatly improve the performance of Bayesian methods, in terms of both estimation accuracy and uncertainty quantification.
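The relaxation idea can be made concrete in a few lines: rather than imposing inflow equals outflow exactly at each process, the mass-balance residual receives a narrow Gaussian prior. The toy network, priors, and random-walk sampler below are invented for the sketch and stand in for the paper's actual model and sampler.

```python
# Relaxed mass balance: the balance residual A @ flows gets a narrow Gaussian
# prior instead of a hard equality, keeping the posterior well-defined when
# data conflict. All numbers here are illustrative assumptions.
import numpy as np

# Incidence matrix: rows = processes, columns = flows (+1 inflow, -1 outflow)
A = np.array([[ 1, -1, -1,  0],     # process 0 splits flow 0 into flows 1 and 2
              [ 0,  1,  0, -1]])    # process 1 passes flow 1 on as flow 3
y_obs   = np.array([10.0, np.nan, 4.2, 5.7])   # partial, noisy observations
sigma_y = 0.5                                  # observation noise scale
sigma_b = 0.1                                  # mass-balance relaxation scale

def log_posterior(flows):
    if np.any(flows < 0):
        return -np.inf                         # flows are non-negative
    balance = A @ flows                        # soft, not hard, constraint
    lp = -0.5 * np.sum((balance / sigma_b) ** 2)
    mask = ~np.isnan(y_obs)
    lp += -0.5 * np.sum(((flows[mask] - y_obs[mask]) / sigma_y) ** 2)
    return lp

# Crude random-walk Metropolis, standing in for the paper's actual sampler
rng, x, samples = np.random.default_rng(4), np.array([10., 5., 5., 5.]), []
for _ in range(20000):
    prop = x + rng.normal(scale=0.2, size=4)
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(x):
        x = prop
    samples.append(x.copy())
print("posterior mean flows:", np.mean(samples[5000:], axis=0).round(2))
```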

We propose a coefficient that measures dependence in paired samples of functions. It has properties similar to the Pearson correlation, but differs in significant ways: 1) it is designed to measure dependence between curves, 2) it focuses only on extreme curves. The new coefficient is derived within the framework of regular variation in Banach spaces. A consistent estimator is proposed and justified by an asymptotic analysis and a simulation study. The usefulness of the new coefficient is illustrated on financial and climate functional data.
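An illustrative estimator in the spirit of this construction (the paper's exact functional form may differ): restrict attention to curve pairs whose joint norm is extreme, then average the inner products of the normalised (angular) parts.

```python
# A sketch of the regular-variation recipe: keep only extreme curve pairs,
# then measure dependence through their angular components. The thresholding
# rule and the inner-product summary are illustrative assumptions.
import numpy as np

def extremal_functional_dependence(X, Y, q=0.9):
    """X, Y: (n_pairs, n_gridpoints) discretised curves on a common grid."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    joint = np.sqrt(nx**2 + ny**2)                 # joint magnitude of a pair
    extreme = joint > np.quantile(joint, q)        # keep only extreme pairs
    Xu = X[extreme] / nx[extreme, None]            # angular (unit-norm) parts
    Yu = Y[extreme] / ny[extreme, None]
    return np.mean(np.sum(Xu * Yu, axis=1))        # mean inner product in [-1, 1]

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 50)
shocks = rng.standard_t(df=3, size=(1000, 1))      # heavy-tailed common factor
X = shocks * np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(1000, 50))
Y = shocks * np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(1000, 50))
print(round(extremal_functional_dependence(X, Y), 3))  # near 1: extremes co-move
```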

Detecting differences in gene expression is an important part of single-cell RNA sequencing experiments, and many statistical methods have been developed to this end. Most differential expression analyses focus on comparing expression between two groups (e.g., treatment vs. control). But there is increasing interest in multi-condition differential expression analyses, in which expression is measured in many conditions and the aim is to accurately detect and estimate expression differences in all conditions. We show that directly modeling single-cell RNA-seq counts in all conditions simultaneously, while also inferring how expression differences are shared across conditions, leads to greatly improved performance for detecting and estimating expression differences compared to existing methods. We illustrate the potential of this new approach by analyzing data from a single-cell experiment studying the effects of cytokine stimulation on gene expression. We call our new method "Poisson multivariate adaptive shrinkage", and it is implemented in an R package available online at https://github.com/stephenslab/poisson.mash.alpha.
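A stylised version of the core idea, with a simple normal prior and moment-based shrinkage standing in for the adaptive ("mash"-style) prior actually used by the method:

```python
# Model counts in all conditions jointly with a Poisson likelihood, then
# shrink condition-specific log-fold-changes toward a shared baseline.
# The N(0, v) prior and the moment estimate of v are simplifying assumptions.
import numpy as np

def shrunk_lfc(counts, size_factors):
    """counts: (conditions, cells) for one gene; returns shrunk per-condition LFCs."""
    rate = counts.sum(axis=1) / size_factors        # Poisson MLE per condition
    base = np.exp(np.mean(np.log(rate + 1e-8)))     # shared baseline expression
    lfc = np.log((rate + 1e-8) / base)
    se2 = 1.0 / np.maximum(counts.sum(axis=1), 1)   # approx. sampling variance
    prior_var = max(np.var(lfc) - se2.mean(), 1e-8) # method-of-moments estimate
    shrink = prior_var / (prior_var + se2)          # posterior mean under N(0, v)
    return shrink * lfc

rng = np.random.default_rng(5)
counts = rng.poisson(lam=[[5.0]] * 4 + [[9.0]], size=(5, 200))  # condition 5 up
print(shrunk_lfc(counts, size_factors=np.full(5, 200.0)).round(3))
```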

Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether these techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unclear whether it is possible to align (stronger) LLMs with superhuman capabilities using (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using weaker (less capable) feedback to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes the limitations of fine-tuning. Finally, we demonstrate the practical applicability of the refinement approach with three LLM alignment tasks.

Data visualization and dimension reduction for regression between a general metric space-valued response and Euclidean predictors is proposed. Current Fr\'echet dimension reduction methods require that the response metric space be continuously embeddable into a Hilbert space, which imposes restrictions on the choice of metric and kernel. We relax this assumption by proposing a Euclidean embedding technique that avoids the use of kernels. Under this framework, classical dimension reduction methods such as ordinary least squares and sliced inverse regression are extended. An extensive simulation experiment demonstrates the superior performance of the proposed method on synthetic data compared to existing methods where applicable. We also present real data analyses of factors influencing the distribution of COVID-19 transmission in the U.S. and of the association between BMI and structural brain connectivity in healthy individuals.
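The two-step recipe can be sketched as follows, using classical multidimensional scaling as a generic Euclidean embedding choice (the paper's specific embedding may differ), followed by sliced inverse regression on the embedded response.

```python
# Step 1: embed metric-space responses via classical MDS on pairwise distances.
# Step 2: run sliced inverse regression on the embedded response.
import numpy as np

def classical_mds(D, dim=2):
    """Euclidean coordinates whose pairwise distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J                    # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

def sir_directions(X, y_embed, n_slices=5, n_dirs=1):
    """Sliced inverse regression, slicing on the first embedded coordinate."""
    order = np.argsort(y_embed[:, 0])
    slices = np.array_split(order, n_slices)
    Xc = X - X.mean(axis=0)
    M = sum(len(s) * np.outer(Xc[s].mean(0), Xc[s].mean(0)) for s in slices)
    Sigma = np.cov(X.T)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, M / len(X)))
    return vecs[:, np.argsort(vals.real)[::-1][:n_dirs]].real

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
D = np.abs(np.subtract.outer(X[:, 0], X[:, 0]))   # toy metric on the responses
print(sir_directions(X, classical_mds(D)).ravel().round(2))  # ~ first axis
```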

Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. This algorithm scales to thousands of features requiring minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, and hence we develop a dataset split procedure to generate pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
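A minimal Gromov-Wasserstein matching with the POT library illustrates the correlation-structure alignment GromovMatcher builds on; the correlation-distance construction and the toy data are assumptions for the sketch, not GromovMatcher itself.

```python
# Match two feature sets by comparing their intra-study correlation structures
# with the Gromov-Wasserstein coupling from the POT library.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(7)
latent = rng.normal(size=(100, 5))                # shared latent structure
mix = rng.normal(size=(5, 30))
base = latent @ mix                               # 30 correlated "metabolites"
study1 = base + 0.05 * rng.normal(size=base.shape)
study2 = base[:, ::-1] + 0.05 * rng.normal(size=base.shape)  # features reversed

# Intra-study structure: feature-feature distances from intensity correlations
C1 = 1 - np.abs(np.corrcoef(study1.T))
C2 = 1 - np.abs(np.corrcoef(study2.T))
p = np.full(30, 1 / 30)                           # uniform feature weights
q = np.full(30, 1 / 30)

T = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss')
matches = T.argmax(axis=1)                        # hard assignment from coupling
frac = (matches == np.arange(29, -1, -1)).mean()
print(f"fraction of reversed features recovered: {frac:.2f}")
```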
