
Two-sample tests for multivariate data, and especially for non-Euclidean data, are not well explored. This paper presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and to non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test utilizes a common pattern that was previously overlooked and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well for finite samples, facilitating its application to large data sets. The new test is illustrated on two applications: the assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
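
For intuition, the classical edge-count approach that this line of work builds on can be sketched in a few lines: construct a similarity graph (here a $k$-nearest-neighbor graph) on the pooled sample and compare the observed number of between-sample edges to its permutation null. This is a minimal illustrative sketch of the general framework, not the paper's new statistic; the Euclidean dissimilarity, the choice of $k$, and the permutation count are arbitrary assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def knn_edges(dist, k=5):
    """Edge set of a k-nearest-neighbor graph built from a distance matrix."""
    edges = set()
    for i in range(dist.shape[0]):
        for j in np.argsort(dist[i])[1:k + 1]:  # skip i itself at position 0
            edges.add((min(i, j), max(i, j)))
    return list(edges)

def edge_count_test(x, y, k=5, n_perm=2000, seed=0):
    """One-sided permutation p-value for the between-sample edge count;
    few between-sample edges suggest the two samples differ."""
    rng = np.random.default_rng(seed)
    labels = np.r_[np.zeros(len(x)), np.ones(len(y))]
    edges = knn_edges(squareform(pdist(np.vstack([x, y]))), k)

    def between(lab):
        return sum(lab[i] != lab[j] for i, j in edges)

    observed = between(labels)
    return np.mean([between(rng.permutation(labels)) <= observed
                    for _ in range(n_perm)])
```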

Related Content

This paper proposes a variance-based measure of importance for coherent systems with dependent and heterogeneous components. The particular cases of independent components and of homogeneous components are also considered. We model the dependence structure among the components with copulas. The proposed measure allows us to provide the best estimate of the system lifetime, in terms of mean squared error, under the assumption that the lifetime of one of its components is known. We include theoretical results that are useful for calculating a closed-form expression of our measure and for comparing two components of a system. We also provide procedures to approximate the importance measure by Monte Carlo simulation. Finally, we illustrate the main results with several examples.
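
Although the paper derives closed-form results, the Monte Carlo route it also proposes can be illustrated with a toy variance-based index of the form Var(E[T | T_i]) / Var(T) for a simple 2-out-of-3 system. The structure function, the independent exponential lifetimes, and the binning estimator below are all illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative 2-out-of-3 system: the structure function is the median of
# the three component lifetimes. Independent exponential lifetimes are an
# assumed special case; dependence would enter via a copula sample instead.
comps = rng.exponential(scale=[1.0, 2.0, 3.0], size=(n, 3))
T = np.median(comps, axis=1)

def importance(i, bins=100):
    """Crude Monte Carlo estimate of Var(E[T | T_i]) / Var(T): bin on the
    i-th component lifetime and average the system lifetime within bins."""
    q = np.quantile(comps[:, i], np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(q, comps[:, i]) - 1, 0, bins - 1)
    cond_means = np.array([T[idx == b].mean() for b in range(bins)])
    weights = np.bincount(idx, minlength=bins) / n
    return float(np.sum(weights * (cond_means - T.mean()) ** 2) / T.var())

for i in range(3):
    print(f"component {i}: estimated importance = {importance(i):.3f}")
```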

In this paper we propose an end-to-end algorithm for indirect data-driven control of bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data are affected by probabilistic noise with possibly unbounded support, and we leverage tools from statistical learning theory to derive finite-sample identification error bounds. To this end, we reduce the bilinear identification problem to a set of linear and affine identification problems through a particular choice of control input during the data-collection phase. We provide a priori as well as data-dependent finite-sample identification error bounds on the individual matrices, as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds into a robust controller design to obtain an exponentially stable closed loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
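
To make the identification step concrete, here is a minimal sketch of least-squares identification of a single-input bilinear system $x^+ = Ax + Bu + u\,Nx + w$ from one noisy trajectory. The system, input design, and noise level are invented for illustration, and this is plain regression on stacked regressors rather than the paper's tailored decomposition into linear and affine subproblems:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 3, 1, 500  # state dim, input dim, trajectory length

# Ground-truth single-input bilinear system (illustrative values).
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
N = 0.3 * rng.standard_normal((n, n))

X = np.zeros((T + 1, n))
U = rng.uniform(-1, 1, size=(T, m))
for t in range(T):
    X[t + 1] = A @ X[t] + B @ U[t] + U[t, 0] * (N @ X[t]) \
               + 0.01 * rng.standard_normal(n)

# Least squares on stacked regressors [x_t, u_t, u_t * x_t].
Phi = np.hstack([X[:-1], U, U[:, [0]] * X[:-1]])   # shape (T, 2n + m)
Theta, *_ = np.linalg.lstsq(Phi, X[1:], rcond=None)
A_hat, B_hat, N_hat = Theta[:n].T, Theta[n:n + m].T, Theta[n + m:].T
print("errors:", np.linalg.norm(A_hat - A), np.linalg.norm(N_hat - N))
```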

The indirect effect of an exposure on an outcome through an intermediate variable can be identified by a product of regression coefficients under certain causal and regression modeling assumptions. In this context, the null hypothesis of no indirect effect is a composite null hypothesis, as the null holds if either regression coefficient is zero. A consequence is that traditional hypothesis tests are severely underpowered near the origin (i.e., when both coefficients are small relative to their standard errors). We propose hypothesis tests that (i) preserve level-$\alpha$ type I error, (ii) meaningfully improve power when both true underlying effects are small relative to the sample size, and (iii) preserve power when at least one is not. One approach gives a closed-form test that is minimax optimal with respect to local power over the alternative parameter space. Another uses sparse linear programming to produce an approximately optimal test for a Bayes risk criterion. We discuss adaptations for performing large-scale hypothesis testing as well as modifications that yield improved interpretability. We provide an R package that implements the minimax optimal test.
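
For context, writing $\alpha$ and $\beta$ for the two regression coefficients (notation introduced here, not taken from the abstract), the two classical tests of the composite null $H_0: \alpha\beta = 0$ that such procedures improve upon can be sketched directly; both are conservative near the origin, which is exactly the regime the proposed tests target. The paper's minimax-optimal test itself is not reproduced here:

```python
import numpy as np
from scipy import stats

def mediation_tests(alpha_hat, se_alpha, beta_hat, se_beta):
    """Two classical tests of H0: alpha * beta = 0 (no indirect effect)."""
    # Sobel test: delta-method z-statistic for the product of coefficients.
    se_prod = np.sqrt(beta_hat**2 * se_alpha**2 + alpha_hat**2 * se_beta**2)
    p_sobel = 2 * stats.norm.sf(abs(alpha_hat * beta_hat) / se_prod)

    # Joint-significance ("max-p") test: reject only if both are significant.
    p_joint = max(2 * stats.norm.sf(abs(alpha_hat / se_alpha)),
                  2 * stats.norm.sf(abs(beta_hat / se_beta)))
    return p_sobel, p_joint

# Both p-values are far from significant when the estimates sit near the
# origin relative to their standard errors.
print(mediation_tests(0.10, 0.05, 0.08, 0.05))
```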

Computing domination-type parameters is a challenging problem in Cartesian product graphs. We present an algorithmic method, involving the $(\min,+)$ matrix product, to compute the $2$-domination number of the Cartesian product of any cycle with a path of small order. We establish the theoretical results that yield the algorithms needed to compute that parameter; the main challenge in running these algorithms comes from the large size of the matrices involved, which makes it necessary to improve the techniques used to handle these objects. We analyze the performance of the algorithms on modern multicore CPUs and on GPUs, and we show the advantages over a sequential implementation. The use of these platforms allows us to compute the $2$-domination number of cylinders whose paths have at most $12$ vertices.
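
The $(\min,+)$ (tropical) matrix product at the core of the method is simple to state: $C_{ij} = \min_k (A_{ik} + B_{kj})$. A vectorized NumPy sketch of just this primitive follows; the cylinder-specific transfer matrices it is applied to are not reproduced here:

```python
import numpy as np

def min_plus(A, B):
    """(min,+) matrix product: C[i, j] = min_k (A[i, k] + B[k, j]).
    Broadcasts over k; np.inf encodes 'no transition'."""
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

# Tiny example in the tropical semiring.
A = np.array([[0.0, 2.0], [np.inf, 0.0]])
B = np.array([[0.0, 1.0], [3.0, 0.0]])
print(min_plus(A, B))
```

Repeated $(\min,+)$ powers of a transfer matrix are the standard way such dynamic programs over one graph dimension are expressed, which is why fast (parallel) implementations of this product dominate the running time.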

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data has yet to be explored. In this article, in addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among attributes, and we provide a measure to quantify them in tabular data. Using this measure, we compare several state-of-the-art synthetic data generation algorithms and test their ability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when generating synthetic datasets. We also show that some tabular synthetic data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state of the art reveal research needs and opportunities for developing task-specific synthetic tabular data generation models.
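
As an illustration of the functional-dependency side of such an evaluation, an exact check of whether a dependency $A \to B$ holds in a table can be written as follows (a sketch with made-up column names; the paper's measure of logical dependencies is a separate, quantitative construction):

```python
import pandas as pd

def holds_fd(df: pd.DataFrame, lhs: list, rhs: str) -> bool:
    """Check whether the functional dependency lhs -> rhs holds exactly:
    every combination of lhs values must map to a single rhs value."""
    return bool((df.groupby(lhs)[rhs].nunique(dropna=False) <= 1).all())

# Toy real vs. synthetic tables (column names are illustrative).
real = pd.DataFrame({"zip": [10, 10, 20], "city": ["A", "A", "B"]})
synth = pd.DataFrame({"zip": [10, 10, 20], "city": ["A", "B", "B"]})
print(holds_fd(real, ["zip"], "city"))   # True: zip -> city preserved
print(holds_fd(synth, ["zip"], "city"))  # False: dependency broken
```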

Spatial variables can be observed in many different forms, such as regularly sampled random fields (lattice data), point processes, and randomly sampled spatial processes. Joint analysis of such collections of observations is clearly desirable, but it is complicated by the lack of an easily implementable analysis framework. It is well known that Fourier transforms provide such a framework, but its precise form has eluded data analysts. We formalize it by providing a multitaper analysis framework using coupled discrete and continuous data tapers, combined with the discrete Fourier transform for inference. This set of tools is important, as it forms the backbone of practical spectral analysis. In higher dimensions it is important not to be constrained to Cartesian product domains, so we develop the methodology for spectral analysis using irregular-domain data tapers and the tapered discrete Fourier transform. We discuss its fast implementation and its asymptotic as well as large-finite-domain properties. Estimators of partial association between different spatial processes are provided, as are principled methods to determine their significance, and we demonstrate their practical utility on a large-scale ecological dataset.
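
For the one-dimensional, regularly sampled special case, the multitaper idea reduces to averaging eigenspectra over DPSS (Slepian) tapers. The sketch below shows only this textbook special case, not the paper's coupled discrete/continuous tapers or its irregular-domain construction:

```python
import numpy as np
from scipy.signal import windows

def multitaper_psd(x, n_tapers=5, nw=4.0, dt=1.0):
    """Average the eigenspectra of DPSS-tapered copies of a 1-D series
    (scaling shown up to normalization conventions)."""
    n = len(x)
    tapers = windows.dpss(n, nw, Kmax=n_tapers)            # (n_tapers, n)
    eigenspecs = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return np.fft.rfftfreq(n, d=dt), dt * eigenspecs.mean(axis=0)

# Example on a short MA(1)-type series.
rng = np.random.default_rng(2)
x = np.convolve(rng.standard_normal(1024), [1.0, 0.8], mode="same")
freqs, psd = multitaper_psd(x)
```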

LLM text decoding is a key component of perceived LLM quality. We present two experiments showing that decoding methods can be improved by manipulating token probabilities. First, we test several LLMs on the SummEval summary-scoring dataset to measure reading comprehension. We compare scores obtained from greedy decoding with expected values over the next-token distribution. We scale the logits by a large temperature to increase the entropy of the scores. This yields substantial performance gains on SummEval (in terms of correlation with human judgement): from 6-8% to 13-28% for 7B Mistral and from 20-46% to 37-56% for Mixtral, beating the GPT-4 0314 result on two metrics. Part of the gain appears to be related to positional bias. Second, we use a probability-based tree-sampling algorithm to examine the most probable generations for a given prompt.
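
The expected-value scoring idea can be sketched independently of any particular model: flatten the next-token distribution with a large temperature, restrict it to the tokens that encode the ratings, and take a weighted average instead of the greedy argmax. The token ids and the 1-5 rating scale below are illustrative assumptions:

```python
import torch

def expected_score(next_token_logits, score_token_ids, temperature=10.0):
    """Expected rating under the next-token distribution instead of greedy
    argmax. A large temperature flattens the distribution (raises entropy)
    before it is restricted to the rating tokens."""
    probs = torch.softmax(next_token_logits / temperature, dim=-1)
    probs = probs[score_token_ids]
    probs = probs / probs.sum()                 # renormalize over ratings
    ratings = torch.arange(1, len(score_token_ids) + 1, dtype=probs.dtype)
    return (probs * ratings).sum().item()

# Toy example: a 10-token vocabulary where tokens 0..4 encode ratings "1".."5".
logits = torch.tensor([2.0, 1.5, 1.0, 0.5, 0.2, -1, -1, -1, -1, -1])
print(expected_score(logits, torch.arange(5)))  # weighted-average rating
```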

Consider a scenario in which a large number of explanatory features targeting a response variable are analyzed, and these features are partitioned into different groups according to their domain-specific structures. Furthermore, there may be several such partitions. Multiple partitions arise in many real-life scenarios; one example is spatial genome-wide association studies. Researchers may be interested not only in identifying the features relevant to the response but also in determining the relevant groups within each partition. A group is considered relevant if it contains at least one relevant feature. To ensure the replicability of findings at various resolutions, it is essential to control the false discovery rate (FDR) of findings at multiple layers simultaneously. This paper presents a general approach that leverages various existing controlled selection procedures to generate more stable results with multilayer FDR control. The key contributions of our proposal are the development of a generalized e-filter that provides multilayer FDR control and the construction of a specific type of generalized e-values to evaluate feature importance. A primary application of our method is an improved version of Data Splitting (DS), called the eDS-filter. Furthermore, we combine the eDS-filter with a version of the group knockoff filter (gKF), resulting in a more flexible approach called the eDS+gKF filter. Simulation studies demonstrate that the proposed methods effectively control the FDR at multiple levels while maintaining, or even improving, power compared with other approaches. Finally, we apply the proposed method to analyze HIV mutation data.
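
As background for the generalized e-filter, the base e-BH procedure for FDR control with e-values (Wang and Ramdas, 2022) can be sketched in a few lines. This covers one layer only; the multilayer filter and the construction of the e-values themselves are the paper's contributions and are not shown:

```python
import numpy as np

def e_bh(e_values, alpha=0.1):
    """e-BH: reject the k* hypotheses with the largest e-values, where k* is
    the largest k such that the k-th largest e-value is >= m / (alpha * k).
    Controls FDR at level alpha under arbitrary dependence."""
    e = np.asarray(e_values, dtype=float)
    m = len(e)
    order = np.argsort(-e)                       # indices, decreasing e-value
    thresh = m / (alpha * np.arange(1, m + 1))   # threshold for k-th largest
    passed = e[order] >= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

print(e_bh([30.0, 0.5, 12.0, 80.0, 1.1], alpha=0.2))
```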

This paper focuses on the construction of accurate and predictive data-driven reduced models of large-scale numerical simulations with complex dynamics and sparse training datasets. In these settings, standard single-domain approaches may be too inaccurate, or they may overfit and hence generalize poorly. Moreover, processing large-scale datasets typically requires significant memory and computing resources, which can render single-domain approaches computationally prohibitive. To address these challenges, we introduce a domain decomposition formulation into the construction of a data-driven reduced model. In doing so, the basis functions used in the reduced-model approximation become localized in space, which can increase the accuracy of the domain-decomposed approximation of the complex dynamics. The decomposition furthermore reduces the memory and computing requirements needed to process the underlying large-scale training dataset. We demonstrate the effectiveness and scalability of our approach on a large-scale three-dimensional unsteady rotating detonation rocket engine simulation scenario with over $75$ million degrees of freedom and a sparse training dataset. Our results show that, compared to the single-domain approach, the domain-decomposed version reduces both the training and prediction errors by up to $13\%$ for pressure and by up to $5\%$ for other key quantities, such as temperature and the fuel and oxidizer mass fractions. Lastly, our approach decreases the memory requirements for processing by almost a factor of four, which in turn reduces the computing requirements as well.
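
The localization step can be illustrated with a toy proper-orthogonal-decomposition (POD) construction: split the degrees of freedom into spatial blocks and compute one truncated basis per block instead of a single global basis. The snapshot data, block split, and energy criterion below are illustrative assumptions, and the paper's reduced-model construction involves more than the basis computation:

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """POD basis from a snapshot matrix (dofs x time) via thin SVD,
    truncated to capture the requested fraction of snapshot energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    r = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return U[:, :r]

# Domain decomposition: one local basis per subdomain of the dofs.
rng = np.random.default_rng(3)
snapshots = rng.standard_normal((1000, 50))      # toy global snapshot matrix
subdomains = np.array_split(np.arange(1000), 4)  # 4 spatial subdomains
local_bases = [pod_basis(snapshots[idx]) for idx in subdomains]
print([V.shape for V in local_bases])
```

Because each SVD acts on a block of rows rather than the full snapshot matrix, the peak memory needed for the basis computation drops with the number of subdomains, which is the mechanism behind the reported reduction in processing requirements.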

This paper deals with the nonlinear mechanics of an elevator brake system subjected to uncertainties. A deterministic model that relates the braking force to uncertain parameters is deduced from mechanical equilibrium conditions. To take parameter variability into account, a parametric probabilistic approach is employed. In this stochastic formalism, the uncertain parameters are modeled as random variables, with distributions specified by the maximum entropy principle. The uncertainties are propagated by the Monte Carlo method, which provides a detailed statistical characterization of the response. This work also considers the optimal design of the brake system, formulating and solving nonlinear optimization problems both with and without the effects of uncertainty.
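
A minimal sketch of the propagation step, using a placeholder braking-force model and maximum-entropy-style input distributions; the functional form, parameter ranges, and units below are invented for illustration and are not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 100_000

# Placeholder model F = mu * N / (1 - mu * c). Maximum-entropy choices:
# uniform for bounded parameters (known support only), gamma for a
# positive parameter with known mean and mean-log.
mu = rng.uniform(0.3, 0.5, n_samples)                   # friction coefficient
N = rng.gamma(shape=100.0, scale=10.0, size=n_samples)  # normal force [N]
c = rng.uniform(0.0, 0.2, n_samples)                    # geometry factor

F = mu * N / (1.0 - mu * c)  # Monte Carlo sample of the braking force

print(f"mean = {F.mean():.1f} N, std = {F.std():.1f} N")
print(f"95% band = [{np.quantile(F, 0.025):.1f}, {np.quantile(F, 0.975):.1f}] N")
```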
