Principal component analysis (PCA) is one of the most popular dimension reduction techniques in statistics and is especially powerful when a multivariate distribution is concentrated near a lower-dimensional subspace. Multivariate extreme value distributions have turned out to provide challenges for the application of PCA since their constrained support impedes the detection of lower-dimensional structures and heavy tails can imply that second moments do not exist, thereby preventing the application of classical variance-based techniques for PCA. We adapt PCA to max-stable distributions using a regression setting and employ max-linear maps to project the random vector to a lower-dimensional space while preserving max-stability. We also provide a characterization of those distributions which allow for a perfect reconstruction from the lower-dimensional representation. Finally, we demonstrate how an optimal projection matrix can be consistently estimated and show viability in practice with a simulation study and application to a benchmark dataset.
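
The idea of encoding and decoding with max-linear maps can be illustrated with a minimal sketch. It assumes the usual definition of a max-linear map, where entry i of the result is max_j A[i, j] * x[j]. The matrices A and W, the latent factors Z, and the helper name max_linear are hypothetical choices made for illustration. They are not taken from the paper, and no estimation of the projection matrix is attempted here.

```python
import numpy as np

def max_linear(A, x):
    """Max-linear map: entry i of the result is max_j A[i, j] * x[j].

    A has shape (m, k) with nonnegative entries; x has shape (n, k),
    one nonnegative observation per row. Returns an array of shape (n, m).
    """
    return np.max(A[None, :, :] * x[:, None, :], axis=2)

rng = np.random.default_rng(0)
n, p = 1000, 2

# Hypothetical latent factors with unit Frechet margins: 1/E for E ~ Exp(1).
Z = 1.0 / rng.exponential(size=(n, p))

# Hypothetical 4 x 2 loading matrix whose top block is the identity.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.3],
              [0.2, 0.8]])
X = max_linear(A, Z)          # observed 4-dimensional data

# Encoder (4 -> 2) that picks out the first two coordinates of X.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
Y = max_linear(W, X)          # lower-dimensional representation
X_hat = max_linear(A, Y)      # decoder (2 -> 4) reuses the loading matrix

print(np.allclose(X_hat, X))
```

In this constructed example the reconstruction is exact because the first two coordinates of X coincide with the latent factors. For general data, a lossless pair of encoder and decoder need not exist.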

Related content

PCA

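In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.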
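
The construction described above can be sketched in a few lines of NumPy. This is one common way to compute principal components, via the singular value decomposition of the centered data, and not the only one. The function name pca and the toy data are assumptions made for illustration.

```python
import numpy as np

def pca(X, k):
    """Return the first k principal directions, the projected scores,
    and the variance explained by each direction."""
    Xc = X - X.mean(axis=0)                      # center each column
    # Rows of Vt are orthonormal directions ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T                   # coordinates in the new basis
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Toy data lying close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=200)])

components, scores, var = pca(X, 2)
print(components[0])   # close to +/- (1, 2) / sqrt(5)
print(var)             # first variance dominates the second
```

The first direction is the best-fitting line through the centered points, and the scores along the two directions are uncorrelated, as described in the paragraph above.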

Linear · MoDELS · Generalized linear models · Linear models · Correlation coefficient

October 4, 2024

Permutation-based multiple testing when fitting many generalized linear models

Riccardo De Santis,Jelle J. Goeman,Samuel Davenport,Jesse Hemerik,Livio Finos

In many applied sciences a popular analysis strategy for high-dimensional data is to fit many multivariate generalized linear models in parallel. This paper presents a novel approach to address the resulting multiple testing problem by combining a recently developed sign-flip test with permutation-based multiple-testing procedures. Our method builds upon the univariate standardized flip-scores test which offers robustness against misspecified variances in generalized linear models, a crucial feature in high-dimensional settings where comprehensive model validation is particularly challenging. We extend this approach to the multivariate setting, enabling adaptation to unknown response correlation structures. This approach yields relevant power improvements over conventional multiple testing methods when correlation is present.

Functional · PCA · Processing (programming language) · Analysis · INFORMS

October 4, 2024

Functional principal component analysis with informative observation times

Peijun Sang,Dehan Kong,Shu Yang

Functional principal component analysis has been shown to be invaluable for revealing variation modes of longitudinal outcomes, which serve as important building blocks for forecasting and model building. Decades of research have advanced methods for functional principal component analysis, often assuming independence between the observation times and longitudinal outcomes. Yet such assumptions are fragile in real-world settings where observation times may be driven by outcome-related reasons. Rather than ignoring the informative observation time process, we explicitly model the observation times by a counting process dependent on time-varying prognostic factors. Identification of the mean, covariance function, and functional principal components ensues via inverse intensity weighting. We propose using weighted penalized splines for estimation and establish consistency and convergence rates for the weighted estimators. Simulation studies demonstrate that the proposed estimators are substantially more accurate than the existing ones in the presence of a correlation between the observation time process and the longitudinal outcome process. We further examine the finite-sample performance of the proposed method using the Acute Infection and Early Disease Research Program study.

Code · AI · Language modeling · Transformation · Learning

October 3, 2024

The why, what, and how of AI-based coding in scientific research

Tonghe Zhuang,Zhicheng Lin

from arXiv, 23 pages, 7 figures, 3 boxes

Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress.

Functional · PCA · Processing (programming language) · Analysis · Estimation/estimator

October 2, 2024

Functional principal component analysis for longitudinal observations with sampling at random

Peijun Sang,Dehan Kong,Shu Yang

Precision/accuracy · MoDELS · Edge · Performer · Tolerance

October 2, 2024

More precise edge detections

Hao Shu

from arXiv, 11 pages

Image edge detection (ED) is a basic task in computer vision. While the performance of ED algorithms has improved greatly with the introduction of CNN-based models, current models still suffer from unsatisfactory precision rates, especially when only a small error tolerance distance is allowed. Model architectures for more precise predictions therefore still need investigation. In addition, the unavoidable noise in human-provided training data leads to unsatisfactory model predictions even when the inputs are edge maps themselves, which also needs a solution. In this paper, more precise ED models are presented with cascaded skipping density blocks (CSDB). Our models obtain state-of-the-art (SOTA) predictions on several datasets, especially in average precision rate (AP), over a high-standard benchmark, which is confirmed by extensive experiments. Also, a novel modification of data augmentation for training is employed, which allows noiseless data to be used in model training for the first time and thus further improves model performance. The related Python code can be found at //github.com/Hao-B-Shu/SDPED.

MoDELS · Inference · Statistics · Generative models · Similarity

October 1, 2024

Embedding-based statistical inference on generative models

Hayden Helm,Aranyak Acharyya,Brandon Duderstadt,Youngser Park,Carey E. Priebe

The recent cohort of publicly available generative models can produce human expert level content across a variety of topics and domains. Given a model in this cohort as a base model, methods such as parameter efficient fine-tuning, in-context learning, and constrained decoding have further increased generative capabilities and improved both computational and data efficiency. Entire collections of derivative models have emerged as a byproduct of these methods and each of these models has a set of associated covariates such as a score on a benchmark, an indicator for if the model has (or had) access to sensitive information, etc. that may or may not be available to the user. For some model-level covariates, it is possible to use "similar" models to predict an unknown covariate. In this paper we extend recent results related to embedding-based representations of generative models -- the data kernel perspective space -- to classical statistical inference settings. We demonstrate that using the perspective space as the basis of a notion of "similar" is effective for multiple model-level inference tasks.

Processing (programming language) · Statistics · Microsoft Surface · Functional · Mutually independent

October 1, 2024

Functional summary statistics and testing for independence in marked point processes on the surface of three dimensional convex shapes

Scott Ward,Edward A. K. Cohen,Niall M. Adams

The fundamental functional summary statistics used for studying spatial point patterns are developed for marked homogeneous and inhomogeneous point processes on the surface of a sphere. These are extended to point processes on the surface of three dimensional convex shapes given the bijective mapping from the shape to the sphere is known. These functional summary statistics are used to test for independence between the marginals of multi-type spatial point processes with methods for sampling the null distribution proposed and discussed. This is illustrated on both simulated data and the RNGC galaxy point pattern, revealing attractive dependencies between different galaxy types.

Cluster · Inference · Conformer · Trial · Approximation

October 1, 2024

Conformal causal inference for cluster randomized trials: model-robust inference without asymptotic approximations

Bingkai Wang,Fan Li,Mengxin Yu

Traditional statistical inference in cluster randomized trials typically invokes the asymptotic theory that requires the number of clusters to approach infinity. In this article, we propose an alternative conformal causal inference framework for analyzing cluster randomized trials that achieves the target inferential goal in finite samples without the need for asymptotic approximations. Different from traditional inference focusing on estimating the average treatment effect, our conformal causal inference aims to provide prediction intervals for the difference of counterfactual outcomes, thereby providing a new decision-making tool for clusters and individuals in the same target population. We prove that this framework is compatible with arbitrary working outcome models -- including data-adaptive machine learning methods that maximally leverage information from baseline covariates, and enjoys robustness against misspecification of working outcome models. Under our conformal causal inference framework, we develop efficient computation algorithms to construct prediction intervals for treatment effects at both the cluster and individual levels, and further extend to address inferential targets defined based on pre-specified covariate subgroups. Finally, we demonstrate the properties of our methods via simulations and a real data application based on a completed cluster randomized trial for treating chronic pain.
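
The finite-sample flavour of conformal prediction intervals can be illustrated with a generic split conformal sketch for independent observations. This is the standard textbook procedure, not the cluster-level or treatment-effect construction proposed in the paper. The function name, the deliberately misspecified linear working model, and the simulated data are all assumptions made for illustration.

```python
import math
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction intervals with absolute-residual scores."""
    # Working model: a least-squares line (any fitted model could be used).
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * x + intercept

    # Conformity scores on the held-out calibration split, sorted ascending.
    scores = np.sort(np.abs(y_cal - predict(x_cal)))
    n = len(scores)
    # Finite-sample quantile index: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    q = scores[k - 1] if k <= n else np.inf
    center = predict(x_new)
    return center - q, center + q

# Simulated data whose true regression function is not linear.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=3000)
y = x + np.sin(3 * x) + 0.3 * rng.normal(size=3000)

lo, hi = split_conformal_interval(x[:500], y[:500],
                                  x[500:1000], y[500:1000],
                                  x[1000:], alpha=0.1)
coverage = np.mean((y[1000:] >= lo) & (y[1000:] <= hi))
print(coverage)   # empirical coverage on the held-out points
```

The working model is wrong on purpose. The marginal coverage guarantee of split conformal prediction relies on exchangeability of the calibration and new observations, not on correct model specification.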

Performer · Statistics · Identifiable · Model performance · Robustness

October 1, 2024

Targeted synthetic data generation for tabular data via hardness characterization

Tommaso Ferracci,Leonie Tabea Goldmann,Anton Hinel,Francesco Sanna Passino

Synthetic data generation has been proven successful in improving model performance and robustness in the context of scarce or low-quality data. Using the data valuation framework to statistically identify beneficial and detrimental observations, we introduce a novel augmentation pipeline that generates only high-value training points based on hardness characterization. We first demonstrate via benchmarks on real data that Shapley-based data valuation methods perform comparably with learning-based methods in hardness characterisation tasks, while offering significant theoretical and computational advantages. Then, we show that synthetic data generators trained on the hardest points outperform non-targeted data augmentation on simulated data and on a large scale credit default prediction task. In particular, our approach improves the quality of out-of-sample predictions and it is computationally more efficient compared to non-targeted methods.

Neural Networks · Networking · Oscillation · Variance · Variance reduction

September 30, 2024

Neural network approaches for variance reduction in fluctuation formulas

Grigorios Pavliotis,Renato Spacek,Gabriel Stoltz,Urbain Vaes

from arXiv, 42 pages, 8 figures

We propose a method utilizing physics-informed neural networks (PINNs) to solve Poisson equations that serve as control variates in the computation of transport coefficients via fluctuation formulas, such as the Green--Kubo and generalized Einstein-like formulas. By leveraging approximate solutions to the Poisson equation constructed through neural networks, our approach significantly reduces the variance of the estimator at hand. We provide an extensive numerical analysis of the estimators and detail a methodology for training neural networks to solve these Poisson equations. The approximate solutions are then incorporated into Monte Carlo simulations as effective control variates, demonstrating the suitability of the method for moderately high-dimensional problems where fully deterministic solutions are computationally infeasible.
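
The general control variate idea behind this variance reduction can be shown on a toy Monte Carlo problem. The sketch below estimates E[exp(U)] for U uniform on (0, 1) and uses U itself, whose mean 0.5 is known, as the control variate. It does not reproduce the neural-network or Poisson-equation construction of the paper. The function name and the toy integrand are assumptions made for illustration.

```python
import numpy as np

def control_variate_estimate(f_vals, g_vals, g_mean):
    """Monte Carlo mean of f_vals, corrected with a control variate
    g_vals whose exact mean g_mean is known."""
    # Coefficient that minimizes the variance, estimated from the sample.
    c = np.cov(f_vals, g_vals)[0, 1] / np.var(g_vals, ddof=1)
    adjusted = f_vals - c * (g_vals - g_mean)
    return adjusted.mean(), adjusted.var(ddof=1)

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
f_vals = np.exp(u)                 # target: E[exp(U)] = e - 1

plain_mean, plain_var = f_vals.mean(), f_vals.var(ddof=1)
cv_mean, cv_var = control_variate_estimate(f_vals, u, 0.5)

print(plain_mean, cv_mean)         # both close to e - 1
print(plain_var, cv_var)           # per-sample variance drops with the control variate
```

The stronger the correlation between the control variate and the quantity of interest, the larger the variance reduction.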