亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Run to run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect reproducibility in iterative algorithms, due to accumulating errors. Non-reproducibility can critically affect the efficiency and effectiveness of correctness testing for stochastic programs. Recently, the sensitivity of deep learning training and inference pipelines to floating-point non-associativity has been found to sometimes be extreme. It can prevent certification for commercial applications, accurate assessment of robustness and sensitivity, and bug detection. New approaches in scientific computing applications have coupled deep learning models with high-performance computing, leading to an aggravation of debugging and testing challenges. Here we perform an investigation of the statistical properties of floating-point non-associativity within modern parallel programming models, and analyze performance and productivity impacts of replacing atomic operations with deterministic alternatives on GPUs. We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning, uncovering and quantifying the impacts of input parameters triggering run to run variability and reporting on the reliability and completeness of the documentation. Finally, we evaluate the strategy of exploiting automatic determinism that could be provided by deterministic hardware, using the Groq accelerator for inference portions of the deep learning pipeline. We demonstrate the benefits that a hardware-based strategy can provide within reproducibility and correctness efforts.

相關內容

Many optimization problems require hyperparameters, i.e., parameters that must be pre-specified in advance, such as regularization parameters and parametric regularizers in variational regularization methods for inverse problems, and dictionaries in compressed sensing. A data-driven approach to determine appropriate hyperparameter values is via a nested optimization framework known as bilevel learning. Even when it is possible to employ a gradient-based solver to the bilevel optimization problem, construction of the gradients, known as hypergradients, is computationally challenging, each one requiring both a solution of a minimization problem and a linear system solve. These systems do not change much during the iterations, which motivates us to apply recycling Krylov subspace methods, wherein information from one linear system solve is re-used to solve the next linear system. Existing recycling strategies often employ eigenvector approximations called Ritz vectors. In this work we propose a novel recycling strategy based on a new concept, Ritz generalized singular vectors, which acknowledge the bilevel setting. Additionally, while existing iterative methods primarily terminate according to the residual norm, this new concept allows us to define a new stopping criterion that directly approximates the error of the associated hypergradient. The proposed approach is validated through extensive numerical testing in the context of an inverse problem in imaging.

This manuscript describes the notions of blocker and interdiction applied to well-known optimization problems. The main interest of these two concepts is the capability to analyze the existence of a combinatorial structure after some modifications. We focus on graph modification, like removing vertices or links in a network. In the interdiction version, we have a budget for modification to reduce as much as possible the size of a given combinatorial structure. Whereas, for the blocker version, we minimize the number of modifications such that the network does not contain a given combinatorial structure. Blocker and interdiction problems have some similarities and can be applied to well-known optimization problems. We consider matching, connectivity, shortest path, max flow, and clique problems. For these problems, we analyze either the blocker version or the interdiction one. Applying the concept of blocker or interdiction to well-known optimization problems can change their complexities. Some optimization problems become harder when one of these two notions is applied. For this reason, we propose some complexity analysis to show when an optimization problem, or the associated decision problem, becomes harder. Another fundamental aspect developed in the manuscript is the use of exact methods to tackle these optimization problems. The main way to solve these problems is to use integer linear programming to model them. An interesting aspect of integer linear programming is the possibility to analyze theoretically the strength of these models, using cutting planes. For most of the problems studied in this manuscript, a polyhedral analysis is performed to prove the strength of inequalities or describe new families of inequalities. The exact algorithms proposed are based on Branch-and-Cut or Branch-and-Price algorithm, where dedicated separation and pricing algorithms are proposed.

This paper explores the utility of diffusion-based models for anomaly detection, focusing on their efficacy in identifying deviations in both compact and high-resolution datasets. Diffusion-based architectures, including Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs), are evaluated for their performance using reconstruction objectives. By leveraging the strengths of these models, this study benchmarks their performance against traditional anomaly detection methods such as Isolation Forests, One-Class SVMs, and COPOD. The results demonstrate the superior adaptability, scalability, and robustness of diffusion-based methods in handling complex real-world anomaly detection tasks. Key findings highlight the role of reconstruction error in enhancing detection accuracy and underscore the scalability of these models to high-dimensional datasets. Future directions include optimizing encoder-decoder architectures and exploring multi-modal datasets to further advance diffusion-based anomaly detection.

Weighting with the inverse probability of censoring is an approach to deal with censoring in regression analyses where the outcome may be missing due to right-censoring. In this paper, three separate approaches involving this idea in a setting where the Kaplan--Meier estimator is used for estimating the censoring probability are compared. In more detail, the three approaches involve weighted regression, regression with a weighted outcome, and regression of a jack-knife pseudo-observation based on a weighted estimator. Expressions of the asymptotic variances are given in each case and the expressions are compared to each other and to the uncensored case. In terms of low asymptotic variance, a clear winner cannot be found. Which approach will have the lowest asymptotic variance depends on the censoring distribution. Expressions of the limit of the standard sandwich variance estimator in the three cases are also provided, revealing an overestimation under the implied assumptions.

We prove that a classifier with a Barron-regular decision boundary can be approximated with a rate of high polynomial degree by ReLU neural networks with three hidden layers when a margin condition is assumed. In particular, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate that is typically only achievable when approximating a low-dimensional smooth function. We demonstrate how these expression rate bounds imply fast-rate learning bounds that are close to $n^{-1}$ where $n$ is the number of samples. In addition, we carry out comprehensive numerical experimentation on binary classification problems with various margins. We study three different dimensions, with the highest dimensional problem corresponding to images from the MNIST data set.

Tipping points occur in many real-world systems, at which the system shifts suddenly from one state to another. The ability to predict the occurrence of tipping points from time series data remains an outstanding challenge and a major interest in a broad range of research fields. Particularly, the widely used methods based on bifurcation theory are neither reliable in prediction accuracy nor applicable for irregularly-sampled time series which are commonly observed from real-world systems. Here we address this challenge by developing a deep learning algorithm for predicting the occurrence of tipping points in untrained systems, by exploiting information about normal forms. Our algorithm not only outperforms traditional methods for regularly-sampled model time series but also achieves accurate predictions for irregularly-sampled model time series and empirical time series. Our ability to predict tipping points for complex systems paves the way for mitigation risks, prevention of catastrophic failures, and restoration of degraded systems, with broad applications in social science, engineering, and biology.

Generalized linear models are a popular tool in applied statistics, with their maximum likelihood estimators enjoying asymptotic Gaussianity and efficiency. As all models are wrong, it is desirable to understand these estimators' behaviours under model misspecification. We study semiparametric multilevel generalized linear models, where only the conditional mean of the response is taken to follow a specific parametric form. Pre-existing estimators from mixed effects models and generalized estimating equations require specificaiton of a conditional covariance, which when misspecified can result in inefficient estimates of fixed effects parameters. It is nevertheless often computationally attractive to consider a restricted, finite dimensional class of estimators, as these models naturally imply. We introduce sandwich regression, that selects the estimator of minimal variance within a parametric class of estimators over all distributions in the full semiparametric model. We demonstrate numerically on simulated and real data the attractive improvements our sandwich regression approach enjoys over classical mixed effects models and generalized estimating equations.

This paper presents a loss-based generalized Bayesian methodology for high-dimensional robust regression with serially correlated errors and predictors. The proposed framework employs a novel scaled pseudo-Huber (SPH) loss function, which smooths the well-known Huber loss, achieving a balance between quadratic and absolute linear loss behaviors. This flexibility enables the framework to accommodate both thin-tailed and heavy-tailed data effectively. The generalized Bayesian approach constructs a working likelihood utilizing the SPH loss that facilitates efficient and stable estimation while providing rigorous estimation uncertainty quantification for all model parameters. Notably, this allows formal statistical inference without requiring ad hoc tuning parameter selection while adaptively addressing a wide range of tail behavior in the errors. By specifying appropriate prior distributions for the regression coefficients -- e.g., ridge priors for small or moderate-dimensional settings and spike-and-slab priors for high-dimensional settings -- the framework ensures principled inference. We establish rigorous theoretical guarantees for the accurate estimation of underlying model parameters and the correct selection of predictor variables under sparsity assumptions for a wide range of data generating setups. Extensive simulation studies demonstrate the superiority of our approach compared to traditional quadratic and absolute linear loss-based Bayesian regression methods, highlighting its flexibility and robustness in high-dimensional and challenging data contexts.

There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede handcrafted operators characteristic of message passing schemes. However, concerns over their empirical effectiveness, scalability, and complexity of the pre-processing steps have been raised, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn an effective representations of edges, while allowing for tackling possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs, and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar performance level or expressive power.

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.

北京阿比特科技有限公司