亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The training of modern machine learning models often consists in solving high-dimensional non-convex optimisation problems that are subject to large-scale data. In this context, momentum-based stochastic optimisation algorithms have become particularly widespread. The stochasticity arises from data subsampling which reduces computational cost. Both, momentum and stochasticity help the algorithm to converge globally. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the optimiser by an underdamped dynamical system and the data subsampling through a stochastic switching. We investigate longtime limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In experiments, we study our scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network in an image classification problem. Our algorithm {attains} competitive results compared to stochastic gradient descent with momentum.

相關內容

動量方法 (Polyak, 1964) 旨在加速學習,特別是處理高曲率、小但一致的梯度,或是帶噪聲的梯度。 動量算法積累了之前梯度指數級衰減的移動平均,并且繼續沿該方向移動。

Statistical learning under distribution shift is challenging when neither prior knowledge nor fully accessible data from the target distribution is available. Distributionally robust learning (DRL) aims to control the worst-case statistical performance within an uncertainty set of candidate distributions, but how to properly specify the set remains challenging. To enable distributional robustness without being overly conservative, in this paper, we propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate. More specifically, we assume the unknown density ratio between the target distribution and its estimate is isotonic with respect to some partial order. At the population level, we provide a solution to the shape-constrained optimization problem that does not involve the isotonic constraint. At the sample level, we provide consistency results for an empirical estimator of the target in a range of different settings. Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.

From environmental sciences to finance, there are growing needs for assessing the risk of more extreme events than those observed. Extrapolating extreme events beyond the range of the data is not obvious and requires advanced tools based on extreme value theory. Furthermore, the complexity of risk assessments often requires the inclusion of multiple variables. Extreme value theory provides very important tools for the analysis of multivariate or spatial extreme events, but these are not easily accessible to professionals without appropriate expertise. This article provides a minimal background on multivariate and spatial extremes and gives simple yet thorough instructions to analyse high-dimensional extremes using the R package ExtremalDep. After briefly introducing the statistical methodologies, we focus on road testing the package's toolbox through several real-world applications.

Online optimisation studies the convergence of optimisation methods as the data embedded in the problem changes. Based on this idea, we propose a primal dual online method for nonlinear time-discrete inverse problems. We analyse the method through regret theory and demonstrate its performance in real-time monitoring of moving bodies in a fluid with Electrical Impedance Tomography (EIT). To do so, we also prove the second-order differentiability of the Complete Electrode Model (CEM) solution operator on $L^\infty$.

We present a novel spatial discretization for the anisotropic heat conduction equation, aimed at improved accuracy at the high levels of anisotropy seen in a magnetized plasma, for example, for magnetic confinement fusion. The new discretization is based on a mixed formulation, introducing a form of the directional derivative along the magnetic field as an auxiliary variable and discretizing both the temperature and auxiliary fields in a continuous Galerkin (CG) space. Both the temperature and auxiliary variable equations are stabilized using the streamline upwind Petrov-Galerkin (SUPG) method, ensuring a better representation of the directional derivatives and therefore an overall more accurate solution. This approach can be seen as the CG-based version of our previous work (Wimmer, Southworth, Gregory, Tang, 2024), where we considered a mixed discontinuous Galerkin (DG) spatial discretization including DG-upwind stabilization. We prove consistency of the novel discretization, and demonstrate its improved accuracy over existing CG-based methods in test cases relevant to magnetic confinement fusion. This includes a long-run tokamak equilibrium sustainment scenario, demonstrating a 35% and 32% spurious heat loss for existing primal and mixed CG-based formulations versus 4% for our novel SUPG-stabilized discretization.

Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions, by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.

Dynamical theories of speech use computational models of articulatory control to generate quantitative predictions and advance understanding of speech dynamics. The addition of a nonlinear restoring force to task dynamic models is a significant improvement over linear models, but nonlinearity introduces challenges with parameterization and interpretability. We illustrate these problems through numerical simulations and introduce solutions in the form of scaling laws. We apply the scaling laws to a cubic model and show how they facilitate interpretable simulations of articulatory dynamics, and can be theoretically interpreted as imposing physical and cognitive constraints on models of speech movement dynamics.

Adaptive gradient methods have been increasingly adopted by deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate the F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in the overall training time by up to 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.

Detection of abrupt spatial changes in physical properties representing unique geometric features such as buried objects, cavities, and fractures is an important problem in geophysics and many engineering disciplines. In this context, simultaneous spatial field and geometry estimation methods that explicitly parameterize the background spatial field and the geometry of the embedded anomalies are of great interest. This paper introduces an advanced inversion procedure for simultaneous estimation using the domain independence property of the Karhunen-Lo\`eve (K-L) expansion. Previous methods pursuing this strategy face significant computational challenges. The associated integral eigenvalue problem (IEVP) needs to be solved repeatedly on evolving domains, and the shape derivatives in gradient-based algorithms require costly computations of the Moore-Penrose inverse. Leveraging the domain independence property of the K-L expansion, the proposed method avoids both of these bottlenecks, and the IEVP is solved only once on a fixed bounding domain. Comparative studies demonstrate that our approach yields two orders of magnitude improvement in K-L expansion gradient computation time. Inversion studies on one-dimensional and two-dimensional seepage flow problems highlight the benefits of incorporating geometry parameters along with spatial field parameters. The proposed method captures abrupt changes in hydraulic conductivity with a lower number of parameters and provides accurate estimates of boundary and spatial-field uncertainties, outperforming spatial-field-only estimation methods.

The present paper evaluates the learning behaviour of a transformer-based neural network with regard to an irregular inflectional paradigm. We apply the paradigm cell filling problem to irregular patterns. We approach this problem using the morphological reinflection task and model it as a character sequence-to-sequence learning problem. The test case under investigation are irregular verbs in Spanish. Besides many regular verbs in Spanish L-shaped verbs the first person singular indicative stem irregularly matches the subjunctive paradigm, while other indicative forms remain unaltered. We examine the role of frequency during learning and compare models under differing input frequency conditions. We train the model on a corpus of Spanish with a realistic distribution of regular and irregular verbs to compare it with models trained on input with augmented distributions of (ir)regular words. We explore how the neural models learn this L-shaped pattern using post-hoc analyses. Our experiments show that, across frequency conditions, the models are surprisingly capable of learning the irregular pattern. Furthermore, our post-hoc analyses reveal the possible sources of errors. All code and data are available at \url{//anonymous.4open.science/r/modeling_spanish_acl-7567/} under MIT license.

We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.

北京阿比特科技有限公司