
Data augmentation is arguably the most important regularization technique commonly used to improve the generalization performance of machine learning models. It primarily involves applying appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because creating and testing candidate augmentations and their hyperparameters manually requires time-consuming trial and error. Automated data augmentation methods aim to automate this process. State-of-the-art approaches typically rely on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration and data synthesis techniques. We present an extensive discussion of techniques for realizing each of the major subtasks of the data augmentation process: search space design, hyperparameter optimization and model evaluation. Finally, we carry out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.

Related content

Automator

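Automator is an application developed by Apple for its Mac OS X operating system. By clicking and dragging with the mouse, you can combine a series of actions into a workflow that completes complex tasks automatically and repeatably. Automator also works across many different kinds of programs, including Finder, the Safari web browser, iCal, Address Book, and others. It can also work with some third-party programs, such as Microsoft Office, Adobe Photoshop, or Pixelmator.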

Singular · CASES · Paper · Numerical analysis

April 24, 2024

Adapted Lie splitting method for convection-diffusion problems with singular convective term

Thi Tam Dang,Trung Hau Hoang,Giandomenico Orlandi

from arxiv, 14 pages, 6 figures

Splitting methods are widely used numerical schemes for solving convection-diffusion problems. However, they may lose stability in some situations, particularly when applied to convection-diffusion problems in the presence of an unbounded convective term. In this paper, we propose a new splitting method, called the "Adapted Lie splitting method", which successfully overcomes the observed instability in certain cases. Assuming that the unbounded coefficient belongs to a suitable Lorentz space, we show that the adapted Lie splitting converges with first order under the analytic semigroup framework. Furthermore, we provide numerical experiments to illustrate our newly proposed splitting approach.
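
To make the idea of first-order Lie splitting concrete, here is a minimal sketch of the classical (not the adapted) Lie splitting. It does not use the paper's convection-diffusion setting: as an assumption for illustration, it uses the scalar toy problem u' = -u - u^2, whose two parts can each be solved exactly. All function names are hypothetical.

```python
import math

def flow_linear(u, h):
    """Exact flow of u' = -u over a step h."""
    return u * math.exp(-h)

def flow_quadratic(u, h):
    """Exact flow of u' = -u**2 over a step h (valid for u >= 0)."""
    return u / (1.0 + u * h)

def lie_splitting(u0, t_end, n_steps):
    """Advance u' = -u - u**2 by composing the two exact sub-flows each step."""
    h = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = flow_quadratic(flow_linear(u, h), h)
    return u

def exact_solution(u0, t):
    """Closed-form solution of u' = -u - u**2 with u(0) = u0 > 0."""
    c = u0 / (1.0 + u0) * math.exp(-t)
    return c / (1.0 - c)

def observed_order(u0=1.0, t_end=1.0, n_steps=50):
    """Estimate the convergence order by halving the step size once."""
    ref = exact_solution(u0, t_end)
    err_coarse = abs(lie_splitting(u0, t_end, n_steps) - ref)
    err_fine = abs(lie_splitting(u0, t_end, 2 * n_steps) - ref)
    return math.log2(err_coarse / err_fine)
```

Calling `observed_order()` should return a value close to 1, since halving the step size roughly halves the error of a first-order scheme.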

BASIC · Linear combination · Linear · Reducible · Integration

April 23, 2024

Generalized extrapolation methods based on compositions of a basic 2nd-order scheme

Sergio Blanes,Fernando Casas,Luke Shaw

from arxiv, 17 figures

We propose new linear combinations of compositions of a basic second-order scheme with appropriately chosen coefficients to construct higher order numerical integrators for differential equations. They can be considered as a generalization of extrapolation methods and multi-product expansions. A general analysis is provided and new methods up to order 8 are built and tested. The new approach is shown to reduce the latency problem when implemented in a parallel environment and leads to schemes that are significantly more efficient than standard extrapolation when the linear combination is delayed by a number of steps.
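
For orientation, the following sketch shows the standard extrapolation that these methods generalize, not the new schemes themselves. It assumes, purely for illustration, the toy problem u' = -u - u^2 and a Strang splitting as the basic second-order scheme S_h; the linear combination (4 S_{h/2}^2 - S_h) / 3 cancels the leading local error term. All names are hypothetical.

```python
import math

def strang_step(u, h):
    """Symmetric second-order step for u' = -u - u**2 (half, full, half)."""
    u = u * math.exp(-h / 2)      # half step of u' = -u
    u = u / (1.0 + u * h)         # full step of u' = -u**2
    return u * math.exp(-h / 2)   # half step of u' = -u

def extrapolated_step(u, h):
    """Linear combination (4 * S_{h/2}^2 - S_h) / 3 of compositions of the basic step."""
    two_half_steps = strang_step(strang_step(u, h / 2), h / 2)
    return (4.0 * two_half_steps - strang_step(u, h)) / 3.0

def integrate(step, u0, t_end, n_steps):
    """Apply a one-step method n_steps times on [0, t_end]."""
    h = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = step(u, h)
    return u

def exact_solution(u0, t):
    """Closed-form solution of u' = -u - u**2 with u(0) = u0 > 0."""
    c = u0 / (1.0 + u0) * math.exp(-t)
    return c / (1.0 - c)
```

Comparing `integrate(strang_step, 1.0, 1.0, 20)` and `integrate(extrapolated_step, 1.0, 1.0, 20)` against `exact_solution(1.0, 1.0)` should show a clearly smaller error for the extrapolated step. The two compositions inside `extrapolated_step` are independent of each other, which is what makes this family of methods attractive in a parallel environment.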

MoDELS · Statistic · Robustness · Divergence · Processing (programming language)

April 23, 2024

Variational Bayesian surrogate modelling with application to robust design optimisation

Thomas A. Archbold,Ieva Kazlauskaite,Fehmi Cirak

from arxiv, 31 pages, 16 figures

Surrogate models provide a quick-to-evaluate approximation to complex computational models and are essential for multi-query problems like design optimisation. The inputs of current computational models are usually high-dimensional and uncertain. We consider Bayesian inference for constructing statistical surrogates with input uncertainties and intrinsic dimensionality reduction. The surrogates are trained by fitting to data from prevalent deterministic computational models. The assumed prior probability density of the surrogate is a Gaussian process. We determine the respective posterior probability density and parameters of the posited statistical model using variational Bayes. The non-Gaussian posterior is approximated by a simpler trial density with free variational parameters and the discrepancy between them is measured using the Kullback-Leibler (KL) divergence. We employ the stochastic gradient method to compute the variational parameters and other statistical model parameters by minimising the KL divergence. We demonstrate the accuracy and versatility of the proposed reduced dimension variational Gaussian process (RDVGP) surrogate on illustrative and robust structural optimisation problems with cost functions depending on a weighted sum of the mean and standard deviation of model outputs.

Statistic · Biased · Estimation/Estimator · Design · Precision/Accuracy

April 21, 2024

Test-negative designs with various reasons for testing: statistical bias and solution

Mengxin Yu,Kendrick Qijun Li,Nicholas Jewell,Eric Tchetgen Tchetgen,Dylan Small,Xu Shi,Bingkai Wang

Test-negative designs are widely used for post-market evaluation of vaccine effectiveness, particularly in cases where randomization is not feasible. Differing from classical test-negative designs where only healthcare-seekers with symptoms are included, recent test-negative designs have involved individuals with various reasons for testing, especially in an outbreak setting. While including these data can increase sample size and hence improve precision, concerns have been raised about whether they introduce bias into the current framework of test-negative designs, thereby demanding a formal statistical examination of this modified design. In this article, using statistical derivations, causal graphs, and numerical simulations, we show that the standard odds ratio estimator may be biased if various reasons for testing are not accounted for. To eliminate this bias, we identify three categories of reasons for testing, including symptoms, disease-unrelated reasons, and case contact tracing, and characterize associated statistical properties and estimands. Based on our characterization, we show how to consistently estimate each estimand via stratification. Furthermore, we describe when these estimands correspond to the same vaccine effectiveness parameter, and, when appropriate, propose a stratified estimator that can incorporate multiple reasons for testing and improve precision. The performance of our proposed method is demonstrated through simulation studies.
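
As a rough illustration of stratifying by reason for testing, the sketch below computes the usual test-negative quantity 1 - OR separately within each stratum. It is not the paper's proposed estimator, and the counts are hypothetical numbers invented for this example; the function names are also assumptions.

```python
def odds_ratio(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Odds of vaccination among test-positives over odds among test-negatives."""
    return (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)

def stratified_effectiveness(strata):
    """Return {reason: 1 - OR}, computed separately within each reason for testing."""
    return {reason: 1.0 - odds_ratio(*counts) for reason, counts in strata.items()}

# Hypothetical counts, invented purely for illustration:
# (vaccinated cases, unvaccinated cases, vaccinated controls, unvaccinated controls)
example_strata = {
    "symptoms": (20, 80, 100, 100),
    "disease-unrelated": (10, 30, 120, 90),
    "case contact tracing": (15, 45, 60, 60),
}

effectiveness_by_reason = stratified_effectiveness(example_strata)
```

Pooling all rows into a single 2x2 table before computing the odds ratio would ignore the reason for testing, which is the situation in which the abstract says the standard estimator may be biased.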

Estimation/Estimator · Performance · tuning · Functional · Optimizer

April 21, 2024

A nonstandard application of cross-validation to estimate density functionals

José E. Chacón,Carlos Tenreiro

from arxiv, 20 pages main text + 14 pages, 2 figures supplementary material

Cross-validation is usually employed to evaluate the performance of a given statistical methodology. When such a methodology depends on a number of tuning parameters, cross-validation proves to be helpful to select the parameters that optimize the estimated performance. In this paper, however, a very different and nonstandard use of cross-validation is investigated. Instead of focusing on the cross-validated parameters, the main interest is switched to the estimated value of the error criterion at optimal performance. It is shown that this approach is able to provide consistent and efficient estimates of some density functionals, with the noteworthy feature that these estimates do not rely on the choice of any further tuning parameter, so that, in that sense, they can be considered to be purely empirical. Here, a base case of application of this new paradigm is developed in full detail, while many other possible extensions are hinted as well.

Functional · Mixing · Minimizer · CC · Processing (programming language)

April 21, 2024

Circuit complexity and functionality: a thermodynamic perspective

Claudio Chamon,Andrei E. Ruckenstein,Eduardo R. Mucciolo,Ran Canetti

from arxiv, 11 pages

Circuit complexity, defined as the minimum circuit size required for implementing a particular Boolean computation, is a foundational concept in computer science. Determining circuit complexity is believed to be a hard computational problem [1]. Recently, in the context of black holes, circuit complexity has been promoted to a physical property, wherein the growth of complexity is reflected in the time evolution of the Einstein-Rosen bridge ("wormhole") connecting the two sides of an AdS "eternal" black hole [2]. Here we explore another link between complexity and thermodynamics for circuits of given functionality, making the physics-inspired approach relevant to real computational problems, for which functionality is the key element of interest. In particular, our thermodynamic framework provides a new perspective on the obfuscation of programs of arbitrary length -- an important problem in cryptography -- as thermalization through recursive mixing of neighboring sections of a circuit, which can be viewed as the mixing of two containers with "gases of gates". This recursive process equilibrates the average complexity and leads to the saturation of the circuit entropy, while preserving the functionality of the overall circuit. The thermodynamic arguments hinge on ergodicity in the space of circuits, which we conjecture is limited to disconnected ergodic sectors due to fragmentation. The notion of fragmentation has important implications for the problem of circuit obfuscation, as it implies that there are circuits with the same size and functionality that cannot be connected via local moves. Furthermore, we argue that fragmentation is unavoidable unless the complexity classes NP and coNP coincide, a statement that implies the collapse of the polynomial hierarchy of computational complexity theory to its first level.

Estimation/Estimator · Ensemble · Scalar · Training error · Square matrix

April 21, 2024

Corrected generalized cross-validation for finite ensembles of penalized estimators

Pierre C. Bellec,Jin-Hong Du,Takuya Koriyama,Pratik Patil,Kai Tan

from arxiv, 91 pages, 34 figures; this version adds general proof outlines (in Sections 4.3 and 5.3), adds more experiments with non-Gaussian data (in Sections D and E), relaxes an assumption (in Section A.7), clarifies explanations in several places, and corrects minor typos in several places

Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that employs a scalar degrees of freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees of freedom adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires neither sample splitting, model refitting, nor out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of the CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.

Estimation/Estimator · Bootstrap · Confidence · Coverage · Copulas

April 19, 2024

Bootstrap confidence intervals: A comparative simulation study

Vinícius Litvinoff Justus,Vitor Batista Rodrigues,Alex Rodrigo dos Santos Sousa

Bootstrap is a widely used technique that allows estimating the properties of a given estimator, such as its bias and standard error. In this paper, we evaluate and compare five bootstrap-based methods for making confidence intervals: two of them (Normal and Studentized) based on the bootstrap estimate of the standard error; another two (Quantile and Better) based on the estimated distribution of the parameter estimator; and finally, considering an interval constructed based on Bayesian bootstrap, relying on the notion of credible interval. The methods are compared through Monte Carlo simulations in different scenarios, including samples with autocorrelation induced by a copula model. The results are also compared with respect to the coverage rate, the median interval length and a novel indicator, proposed in this paper, combining both of them. The results show that the Studentized method has the best coverage rate, although the smallest intervals are attained by the Bayesian method. In general, all methods are appropriate and demonstrated good performance even in the scenarios violating the independence assumption.
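
Two of the interval types named above can be sketched in a few lines. The example below is a minimal illustration for the sample mean using only the Python standard library; it is not the authors' simulation code, and the function names, the 95% level, and the simple index-based quantile rule are assumptions made for this sketch.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples, rng):
    """Means of n_resamples resamples drawn with replacement from sample."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def normal_interval(sample, boot_stats, z=1.96):
    """Normal method: estimate +/- z times the bootstrap standard error."""
    estimate = statistics.fmean(sample)
    se = statistics.stdev(boot_stats)
    return estimate - z * se, estimate + z * se

def quantile_interval(boot_stats, alpha=0.05):
    """Quantile method: empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    lo_idx = int((alpha / 2) * (len(ordered) - 1))
    hi_idx = int((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]
```

A typical call draws the resamples once, for example `boot = bootstrap_means(sample, 1000, random.Random(0))`, and passes `boot` to both interval functions so that the two methods are compared on the same bootstrap distribution.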

INTERACT · Analysis · Noise · Channel · binary

April 18, 2024

Improved bounds on the interactive capacity via error pattern analysis

Mudit Aggarwal,Manuj Mukherjee

from arxiv, Shorter version accepted at ISIT 2024

Any interactive protocol between a pair of parties can be reliably simulated in the presence of noise with a multiplicative overhead on the number of rounds (Schulman 1996). The reciprocal of the best (least) overhead is called the interactive capacity of the noisy channel. In this work, we present lower bounds on the interactive capacity of the binary erasure channel. Our lower bound improves the best known bound, due to Ben-Yishai et al. (2021), by roughly a factor of 1.75. The improvement is due to a tighter analysis of the correctness of the simulation protocol using error pattern analysis. More precisely, instead of using the well-known technique of bounding the least number of erasures needed to make the simulation fail, we identify and bound the probability of specific erasure patterns causing simulation failure. We remark that error pattern analysis can be useful in solving other problems involving stochastic noise, such as bounding the interactive capacity of different channels.

Directed · Analysis · Performer · INFORMS · Better

April 17, 2024

A multi-level analysis of data quality for formal software citation

David Schindler,Tazin Hossain,Sascha Spors,Frank Krüger

Software is a central part of modern science, and knowledge of its use is crucial for the scientific community with respect to reproducibility and attribution of its developers. Several studies have investigated in-text mentions of software and their quality, while the quality of formal software citations has only been analyzed superficially. This study performs an in-depth evaluation of formal software citation based on a set of manually annotated software references. It examines which resources are cited for software usage, to what extent they allow proper identification of software and its specific version, how this information is made available by scientific publishers, and how well it is represented in large-scale bibliographic databases. The results show that software articles are the most cited resource for software, while direct software citations are better suited for identification of software versions. Moreover, we found current practices by both publishers and bibliographic databases to be unsuited to represent these direct software citations, hindering large-scale analyses such as assessing software impact. We argue that current practices for representing software citations -- the recommended way to cite software by current citation standards -- stand in the way of their adoption by the scientific community, and urge providers of bibliographic data to explicitly model scientific software.