Estimation/Estimator · Functional · Scenario · Reducible · state-of-the-art

June 1, 2023

A novel approach for estimating functions in the multivariate setting based on an adaptive knot selection for B-splines with an application to a chemical system used in geoscience

Mary E. Savino, Céline Lévy-Leduc

from arxiv, 29 pages, 29 figures

In this paper, we outline a novel data-driven method for estimating functions in a multivariate nonparametric regression model based on an adaptive knot selection for B-splines. The underlying idea of our approach for selecting knots is to apply the generalized lasso, since the knots of the B-spline basis can be seen as changes in the derivatives of the function to be estimated. This method was then extended to functions depending on several variables by processing each dimension independently, thus reducing the problem to a univariate setting. The regularization parameters were chosen by means of a criterion based on EBIC. The nonparametric estimator was obtained using a multivariate B-spline regression with the corresponding selected knots. Our procedure was validated through numerical experiments by varying the number of observations and the level of noise to investigate its robustness. The influence of observation sampling was also assessed and our method was applied to a chemical system commonly used in geoscience. For each different framework considered in this paper, our approach performed better than state-of-the-art methods. Our completely data-driven method is implemented in the glober R package which will soon be available on the Comprehensive R Archive Network (CRAN).

Related content

Signed distance · Functional · Image segmentation · Statistic · Generative model

July 21, 2023

Score-Based Generative Models for Medical Image Segmentation using Signed Distance Functions

Lea Bogensperger, Dominik Narnhofer, Filip Ilic, Thomas Pock

Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in medical images. Generative approaches make it possible to capture the statistical properties of segmentation masks that are dependent on the respective structures. In this work we propose a conditional score-based generative modeling framework to represent the signed distance function (SDF) leading to an implicit distribution of segmentation masks. The advantage of leveraging the SDF is a more natural distortion when compared to that of binary masks. By learning the score function of the conditional distribution of SDFs we can accurately sample from the distribution of segmentation masks, allowing for the evaluation of statistical quantities. Thus, this probabilistic representation allows for the generation of uncertainty maps represented by the variance, which can aid in further analysis and enhance the predictive robustness. We qualitatively and quantitatively illustrate competitive performance of the proposed method on a public nuclei and gland segmentation data set, highlighting its potential utility in medical image segmentation applications.

Estimation/Estimator · Probability density function · MoDELS · Prediction accuracy · Model selection

July 21, 2023

Bayesian taut splines for estimating the number of modes

José E. Chacón, Javier Fernández Serrano

from arxiv, 20 pages, 8 figures (manuscript) + 19 pages, 16 figures (supplementary material)

The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing expert judgement to be incorporated in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.

Functional · Sample · Subspace · Periodic · Scenario

July 21, 2023

Efficient recovery of non-periodic multivariate functions from few scattered samples

Felix Bartel, Kai Lüttgen, Nicolas Nagel, Tino Ullrich

from arxiv, 6 pages, 5 figures. Published in the SampTA 2023 conference proceedings

It has been observed by several authors that well-known periodization strategies like tent or Chebychev transforms lead to remarkable results for the recovery of multivariate functions from few samples. So far, theoretical guarantees are missing. The goal of this paper is twofold. On the one hand, we give such guarantees and briefly describe the difficulties of the involved proof. On the other hand, we combine these periodization strategies with recent novel constructive methods for the efficient subsampling of finite frames in $\mathbb{C}$. As a result we are able to reconstruct non-periodic multivariate functions from very few samples. The sampling nodes used are the result of a two-step procedure. First, a random draw with respect to the Chebychev measure provides an initial node set. A further sparsification technique then selects a significantly smaller subset of these nodes with equal approximation properties. This set of sampling nodes scales linearly in the dimension of the subspace on which we project and works universally for the whole class of functions. The method is based on principles developed by Batson, Spielman, and Srivastava and can be numerically implemented. Samples on these nodes are then used in a (plain) least-squares sampling recovery step on a suitable hyperbolic cross subspace of functions resulting in a near-optimal behavior of the sampling error. Numerical experiments indicate the applicability of our results.
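The first and last steps described in this abstract (a random draw from the Chebyshev measure, then a plain least-squares fit) can be illustrated in one dimension. The following is only a minimal sketch under stated assumptions: it uses a univariate Chebyshev polynomial basis instead of a hyperbolic cross subspace, it omits the Batson-Spielman-Srivastava sparsification step entirely, and the test function `f`, the sample size, and the degree are hypothetical choices, not taken from the paper.

```python
import numpy as np

def chebyshev_nodes_random(n, rng):
    """Draw n i.i.d. points on [-1, 1] from the Chebyshev (arcsine) measure."""
    u = rng.uniform(0.0, 1.0, size=n)
    return np.cos(np.pi * u)

def least_squares_chebyshev(x, y, degree):
    """Plain least-squares fit of Chebyshev coefficients up to `degree`."""
    # Columns of V are T_0(x), ..., T_degree(x); shape (n, degree + 1).
    V = np.polynomial.chebyshev.chebvander(x, degree)
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coef

rng = np.random.default_rng(0)

# Hypothetical non-periodic test function (an assumption for illustration).
f = lambda t: np.exp(t) * np.sin(3.0 * t)

x = chebyshev_nodes_random(200, rng)           # initial random node set
coef = least_squares_chebyshev(x, f(x), 15)    # least-squares recovery step

# Measure the uniform error of the recovered function on a fine grid.
grid = np.linspace(-1.0, 1.0, 1001)
max_err = np.max(np.abs(np.polynomial.chebyshev.chebval(grid, coef) - f(grid)))
```

Drawing `x = cos(pi * u)` with uniform `u` concentrates nodes near the endpoints, which is what keeps the least-squares system for a Chebyshev basis well conditioned with a modest oversampling factor.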

INFORMS · Functional · Estimation/Estimator · Inference · Identifiable

July 21, 2023

Estimating and using information in inverse problems

Wolfgang Bangerth, Chris R. Johnson, Dennis K. Njeru, Bart van Bloemen Waanders

In inverse problems, one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of "information" is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have "the most information" for these places. Although referenced in many publications, the "information" that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas -- how to choose the discretization mesh for the function to be reconstructed -- using numerical experiments.

Subspace · MoDELS · Estimation/Estimator · CASE · Obvious

July 20, 2023

Discovering Active Subspaces for High-Dimensional Computer Models

Kellin N. Rumsey, Devin Francom, Scott Vander Wiel

Dimension reduction techniques have long been an important topic in statistics, and active subspaces (AS) have received much attention this past decade in the computer experiments literature. The most common approach towards estimating the AS is to use Monte Carlo with numerical gradient evaluation. While sensible in some settings, this approach has obvious drawbacks. Recent research has demonstrated that active subspace calculations can be obtained in closed form, conditional on a Gaussian process (GP) surrogate, which can be limiting in high-dimensional settings for computational reasons. In this paper, we produce the relevant calculations for a more general case when the model of interest is a linear combination of tensor products. These general equations can be applied to the GP, recovering previous results as a special case, or applied to the models constructed by other regression techniques including multivariate adaptive regression splines (MARS). Using a MARS surrogate has many advantages including improved scaling, better estimation of active subspaces in high dimensions and the ability to handle a large number of prior distributions in closed form. In one real-world example, we obtain the active subspace of a radiation-transport code with 240 inputs and 9,372 model runs in under half an hour.
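The baseline this abstract mentions, Monte Carlo estimation of the active subspace with numerical gradient evaluation, can be sketched as follows. This is a minimal illustration of that baseline only, not the closed-form MARS or GP calculations the paper develops. The test function `f`, the direction vector `a`, the uniform prior on `[-1, 1]^dim`, and the function names are assumptions made for the example.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def active_subspace_mc(f, dim, n_samples, rng):
    """Estimate C = E[grad f grad f^T] under a uniform prior on [-1, 1]^dim."""
    C = np.zeros((dim, dim))
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0, size=dim)
        g = numerical_gradient(f, x)
        C += np.outer(g, g)
    C /= n_samples
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical ridge function: it varies only along the unit direction a,
# so the active subspace is one-dimensional and spanned by a.
a = np.array([3.0, 4.0, 0.0, 0.0, 0.0]) / 5.0
f = lambda x: np.sin(a @ x)

rng = np.random.default_rng(1)
eigvals, eigvecs = active_subspace_mc(f, dim=5, n_samples=500, rng=rng)
leading = eigvecs[:, 0]   # estimated active direction (defined up to sign)
```

Each sample costs `2 * dim` model evaluations for the finite-difference gradient, which hints at why this approach becomes expensive for models with hundreds of inputs.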

Scenario · Training set · Machine Learning · Performer · MoDELS

July 20, 2023

Investigating minimizing the training set fill distance in machine learning regression

Paolo Climaco, Jochen Garcke

Many machine learning regression methods leverage large datasets for training predictive models. However, using large datasets may not be feasible due to computational limitations or high labelling costs. Therefore, sampling small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining computational efficiency. In this work, we study a sampling approach that aims to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error that depends linearly on the training set fill distance, conditional on the knowledge of data features. For empirical validation, we perform experiments using two regression models on two datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing the bound, significantly reduces the maximum prediction error of various regression models, outperforming existing sampling approaches by a large margin.
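The fill distance of a selected set is the largest distance from any point in the pool to its nearest selected point. A minimal sketch of the quantity, together with one common greedy heuristic for keeping it small (farthest point sampling), is given below. The abstract does not say which selection algorithm the authors use, so the heuristic, the function names, and the toy grid pool are assumptions for illustration.

```python
import math

def fill_distance(pool, selected):
    """Largest distance from any pool point to its nearest selected point."""
    return max(min(math.dist(p, s) for s in selected) for p in pool)

def farthest_point_sampling(pool, k, start=0):
    """Greedily pick k pool indices, each time taking the point that is
    farthest from the points selected so far."""
    chosen = [start]
    # nearest[i] = distance from pool[i] to its closest chosen point.
    nearest = [math.dist(p, pool[start]) for p in pool]
    while len(chosen) < k:
        idx = max(range(len(pool)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, pool[idx]))
                   for d, p in zip(nearest, pool)]
    return chosen

# Toy pool (hypothetical): a 10 x 10 grid of unlabelled points in [0, 1]^2.
pool = [(i / 9, j / 9) for i in range(10) for j in range(10)]

fps_idx = farthest_point_sampling(pool, k=9)
fps_set = [pool[i] for i in fps_idx]

# Naive baseline: simply take the first 9 points of the pool.
naive_set = pool[:9]
```

Comparing `fill_distance(pool, fps_set)` with `fill_distance(pool, naive_set)` shows the effect: the naive set sits along one edge of the square and leaves the opposite corner far from any selected point, while the greedy selection spreads over the whole pool.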

Predictor/Decision function · Optimizer · Approximation · Processing (programming language) · Infinite

July 20, 2023

On the asymptotic behavior of a finite-section of the optimal causal filter

Junho Yang

We establish an $L_1$-bound between the coefficients of the optimal causal filter applied to the data-generating process and its finite sample approximation. Here, we assume that the data-generating process is a second-order stationary time series with either short or long memory autocovariances. To derive the $L_1$-bound, we first provide an exact expression for the coefficients of the causal filter and their approximations in terms of the absolutely convergent series of the multistep ahead infinite and finite predictor coefficients, respectively. Then, we prove a so-called uniform Baxter's inequality to obtain a bound for the difference between the infinite and finite multistep ahead predictor coefficients in both short and long memory time series. The $L_1$-approximation error bound for the causal filter coefficients can be used to evaluate the performance of the linear predictions of time series through the mean squared error criterion.

INFORMS · PID · Biased · Estimation/Estimator · INTERACT

July 20, 2023

Gaussian Partial Information Decomposition: Bias Correction and Application to High-dimensional Data

Praveen Venkatesh, Corbett Bennett, Sam Gale, Tamina K. Ramirez, Greggory Heller, Severine Durand, Shawn Olsen, Stefan Mihalas

Recent advances in neuroscientific experimental techniques have enabled us to simultaneously record the activity of thousands of neurons across multiple brain regions. This has led to a growing need for computational tools capable of analyzing how task-relevant information is represented and communicated between several brain regions. Partial information decompositions (PIDs) have emerged as one such tool, quantifying how much unique, redundant and synergistic information two or more brain regions carry about a task-relevant message. However, computing PIDs is computationally challenging in practice, and statistical issues such as the bias and variance of estimates remain largely unexplored. In this paper, we propose a new method for efficiently computing and estimating a PID definition on multivariate Gaussian distributions. We show empirically that our method satisfies an intuitive additivity property, and recovers the ground truth in a battery of canonical examples, even at high dimensionality. We also propose and evaluate, for the first time, a method to correct the bias in PID estimates at finite sample sizes. Finally, we demonstrate that our Gaussian PID effectively characterizes inter-areal interactions in the mouse brain, revealing higher redundancy between visual areas when a stimulus is behaviorally relevant.

Estimation/Estimator · Threshold · Penalty term · Vectorization · International Digital Economy Academy (IDEA)

July 19, 2023

Pattern Recovery in Penalized and Thresholded Estimation and its Geometry

Piotr Graczyk, Ulrike Schneider, Tomasz Skalski, Patrick Tardivel

We consider the framework of penalized estimation where the penalty term is given by a real-valued polyhedral gauge, which encompasses methods such as LASSO (and many variants thereof such as the generalized LASSO), SLOPE, OSCAR, PACS and others. Each of these estimators can uncover a different structure or "pattern" of the unknown parameter vector. We define a general notion of patterns based on subdifferentials and formalize an approach to measure their complexity. For pattern recovery, we provide a minimal condition for a particular pattern to be detected by the procedure with positive probability, the so-called accessibility condition. Using our approach, we also introduce the stronger noiseless recovery condition. For the LASSO, it is well known that the irrepresentability condition is necessary for pattern recovery with probability larger than $1/2$ and we show that the noiseless recovery plays exactly the same role, thereby extending and unifying the irrepresentability condition of the LASSO to a broad class of penalized estimators. We show that the noiseless recovery condition can be relaxed when turning to thresholded penalized estimators, extending the idea of the thresholded LASSO: we prove that the accessibility condition is already sufficient (and necessary) for sure pattern recovery by thresholded penalized estimation provided that the signal of the pattern is large enough. Throughout the article, we demonstrate how our findings can be interpreted through a geometrical lens.

Analysis · Networking · Loss function (machine learning) · Pairwise · Outlier

July 19, 2023

Cryo-forum: A framework for orientation recovery with uncertainty measure with the application in cryo-EM image analysis

Szu-Chi Chung

from arxiv, 27 pages, 9 figures

In single-particle cryo-electron microscopy (cryo-EM), the efficient determination of orientation parameters for 2D projection images poses a significant challenge yet is crucial for reconstructing 3D structures. This task is complicated by the high noise levels present in the cryo-EM datasets, which often include outliers, necessitating several time-consuming 2D clean-up processes. Recently, solutions based on deep learning have emerged, offering a more streamlined approach to the traditionally laborious task of orientation estimation. These solutions often employ amortized inference, eliminating the need to estimate parameters individually for each image. However, these methods frequently overlook the presence of outliers and may not adequately concentrate on the components used within the network. This paper introduces a novel approach that uses a 10-dimensional feature vector to represent the orientation and applies a Quadratically-Constrained Quadratic Program to derive the predicted orientation as a unit quaternion, supplemented by an uncertainty metric. Furthermore, we propose a unique loss function that considers the pairwise distances between orientations, thereby enhancing the accuracy of our method. Finally, we also comprehensively evaluate the design choices involved in constructing the encoder network, a topic that has not received sufficient attention in the literature. Our numerical analysis demonstrates that our methodology effectively recovers orientations from 2D cryo-EM images in an end-to-end manner. Importantly, the inclusion of uncertainty quantification allows for direct clean-up of the dataset at the 3D level. Lastly, we package our proposed methods into a user-friendly software suite named cryo-forum, designed for easy accessibility by the developers.