Surrogate-assisted evolutionary algorithms (SAEAs) use computationally efficient models to approximate the fitness function in evolutionary computation systems. This area of research has been active for over two decades and has received significant attention from the specialised research community in different areas, for example, single- and many-objective optimisation or dynamic and stationary optimisation problems. An emergent and exciting area that has received little attention from the SAEA community is neuroevolution. This refers to the use of evolutionary algorithms in the automatic configuration of artificial neural network (ANN) architectures, hyper-parameters and/or the training of ANNs. However, ANNs suffer from two major issues: (a) the intensive computation required to train them correctly, and (b) the highly specialised human expertise required to configure them so that they perform well. This work aims to fill this important research gap in SAEAs in neuroevolution by addressing these two issues. We demonstrate how one can use a Kriging Partial Least Squares method that allows efficient computation of good approximate surrogate models compared to the well-known Kriging method, which normally cannot be used in neuroevolution due to the high dimensionality of the data.

Related content

Variational Autoencoders and their many variants have displayed an impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance. Many current methods, however, struggle to learn good representations in High Dimensional, Low Sample Size (HDLSS) tasks, which is an inherently challenging setting. We address this challenge by using an ensemble of lightweight VAEs to learn posteriors over subsets of the feature-space, which get aggregated into a joint posterior in a novel divide-and-conquer approach. Specifically, we present an alternative factorisation of the joint posterior that induces a form of implicit data augmentation that yields greater sample efficiency. Through a series of experiments on eight real-world datasets, we show that our method learns better latent representations in HDLSS settings, which leads to higher accuracy in a downstream classification task. Furthermore, we verify that our approach has a positive effect on disentanglement and achieves a lower estimated Total Correlation on learnt representations. Finally, we show that our approach is robust to partial features at inference, exhibiting little performance degradation even with most features missing.

We present a finite element scheme for fractional diffusion problems with varying diffusivity and fractional order. We consider a symmetric integral form of these nonlocal equations defined on general geometries and in arbitrary bounded domains. A number of challenges are encountered when discretizing these equations. The first comes from the heterogeneous kernel singularity in the fractional integral operator. The second comes from the dense discrete operator with its quadratic growth in memory footprint and arithmetic operations. An additional challenge comes from the need to handle volume conditions, the generalization of classical local boundary conditions to the nonlocal setting. Satisfying these conditions requires that the effect of the whole domain, including both the interior and exterior regions, can be computed on every interior point in the discretization. Performed directly, this would result in quadratic complexity. To address these challenges, we propose a strategy that decomposes the stiffness matrix into three components. The first is a sparse matrix that handles the singular near-field separately and is computed by adapting singular quadrature techniques available for the homogeneous case to the case of spatially variable order. The second component handles the remaining smooth part of the near-field as well as the far field and is approximated by a hierarchical $\mathcal{H}^{2}$ matrix that maintains linear complexity in storage and operations. The third component handles the effect of the global mesh at every node and is written as a weighted mass matrix whose density is computed by a fast-multipole type method. The resulting algorithm therefore has overall linear space and time complexity. Analysis of the consistency of the stiffness matrix is provided and numerical experiments are conducted to illustrate the convergence and performance of the proposed algorithm.

Deep neural networks (DNNs) have made significant progress, but often suffer from fairness issues, as deep models typically show distinct accuracy differences among certain subgroups (e.g., males and females). Existing research addresses this critical issue by employing fairness-aware loss functions to constrain the last-layer outputs and directly regularize DNNs. Although the fairness of DNNs is improved, it is unclear how the trained network makes a fair prediction, which limits future fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score to characterize the fair decision process of networks by analyzing neuron influence in various subgroups. Extensive empirical studies show that unfairness can arise from the unaligned decision rationales of subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they only constrain last-layer outputs while ignoring intermediate neuron alignment. To address the issue, we formulate fairness as a new task, i.e., decision rationale alignment, which requires DNNs' neurons to have consistent responses on subgroups at both intermediate processes and the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method can significantly enhance fairness while sustaining a high level of accuracy and outperforming other approaches by a wide margin.

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and resilience to imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence to a neighborhood compared to the stochastic proximal point algorithm (SPPA) with a better contraction factor, under proper hyperparameter tuning. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step size and momentum that lead to convergence.
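
To make the idea of a stochastic proximal point step with momentum concrete, here is a minimal sketch on a one-dimensional least-squares problem. It assumes one common form of the update (extrapolate with heavy-ball momentum, then take an exact proximal step on a randomly sampled component), which may differ in detail from the algorithm analysed in the abstract. The function name, the step size `eta`, the momentum `beta`, and the toy data are all hypothetical.

```python
import random


def sppam_least_squares(a, b, eta=1.0, beta=0.3, steps=200, seed=0):
    """Proximal point with momentum on f_i(x) = 0.5 * (a_i * x - b_i)**2.

    Assumed update (illustrative, not taken from the source):
        y       = x_t + beta * (x_t - x_{t-1})
        x_{t+1} = argmin_z  f_i(z) + (z - y)**2 / (2 * eta)
    """
    rng = random.Random(seed)
    x_prev = x = 0.0
    for _ in range(steps):
        i = rng.randrange(len(a))
        # Momentum: extrapolate from the last two iterates.
        y = x + beta * (x - x_prev)
        # Proximal step, available in closed form for a scalar quadratic.
        x_next = (y + eta * a[i] * b[i]) / (1.0 + eta * a[i] ** 2)
        x_prev, x = x, x_next
    return x


x_star = 3.0
a = [1.0, 1.5, 2.0, 1.2]
b = [a_i * x_star for a_i in a]  # consistent data: every f_i is minimised at x_star
x_hat = sppam_least_squares(a, b)
print(x_hat)
```

Because the toy data are noise-free, all components share the same minimiser and the iterates converge to it; with noisy data one would only expect convergence to a neighborhood, as the abstract describes.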

The hierarchical prior used in Latent Gaussian models (LGMs) induces a posterior geometry prone to frustrate inference algorithms. Marginalizing out the latent Gaussian variable using an integrated Laplace approximation removes the offending geometry, allowing us to do efficient inference on the hyperparameters. To use gradient-based inference we need to compute the approximate marginal likelihood and its gradient. The adjoint-differentiated Laplace approximation differentiates the marginal likelihood and scales well with the dimension of the hyperparameters. While this method can be applied to LGMs with any prior covariance, it only works for likelihoods with a diagonal Hessian. Furthermore, the algorithm requires methods which compute the first three derivatives of the likelihood with current implementations relying on analytical derivatives. I propose a generalization which is applicable to a broader class of likelihoods and does not require analytical derivatives of the likelihood. Numerical experiments suggest the added flexibility comes at no computational cost: on a standard LGM, the new method is in fact slightly faster than the existing adjoint-differentiated Laplace approximation. I also apply the general method to an LGM with an unconventional likelihood. This example highlights the algorithm's potential, as well as persistent challenges.

As an efficient alternative to conventional full finetuning, parameter-efficient finetuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. Specifically, we first define addition and negation operators for the module, and then further compose these two basic operators to perform flexible arithmetic. Our approach requires no additional training and enables highly flexible module composition. We apply different arithmetic operations to compose the parameter-efficient modules for (1) distribution generalization, (2) multi-tasking, (3) unlearning, and (4) domain transfer. Additionally, we extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA. Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings.
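
As a rough illustration of composing modules by arithmetic in weight space, the sketch below treats each module as a plain dictionary of weight vectors. The operator definitions used here (a convex combination for addition, a sign flip for negation), the names `add_modules` and `negate_module`, and the toy weights are assumptions made for illustration; the abstract does not specify how the operators are defined for a particular module type.

```python
def add_modules(m1, m2, lam=0.5):
    """Weighted elementwise sum of two modules with identical keys and shapes."""
    return {
        name: [lam * u + (1.0 - lam) * v for u, v in zip(m1[name], m2[name])]
        for name in m1
    }


def negate_module(m):
    """Elementwise sign flip, used here as a stand-in for 'removing' a skill."""
    return {name: [-u for u in weights] for name, weights in m.items()}


# Hypothetical modules learned on two different datasets.
skill_a = {"layer0": [0.2, -0.1], "layer1": [1.0]}
skill_b = {"layer0": [0.4, 0.3], "layer1": [0.0]}

merged = add_modules(skill_a, skill_b, lam=0.5)  # combine both skills
forgotten = negate_module(skill_a)               # invert skill A
print(merged)
print(forgotten)
```

Neither operation involves any gradient step, which is the sense in which such composition needs no additional training.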

The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But this requirement does not hold for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties in the nonconvex setting without finite-sum structure. Experimental results on quadratic inverse problems attest to the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.
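
The Bregman step can be sketched on a scalar quartic objective whose gradient is not Lipschitz. This is a simplified, deterministic illustration: it uses the exact gradient and no nonsmooth term, so it reduces to a plain Bregman (mirror) gradient step and is not the SBPG or MSBPG method itself. The kernel h(x) = x^4/4 + x^2/2, the objective f(x) = (x^2 - b)^2/4, the step size, and all function names are assumptions chosen for illustration.

```python
def grad_h(x):
    """Gradient of the polynomial kernel h(x) = x**4 / 4 + x**2 / 2."""
    return x ** 3 + x


def inv_grad_h(p, iters=60):
    """Solve t**3 + t = p with Newton's method (grad_h is strictly increasing)."""
    sign = -1.0 if p < 0 else 1.0
    p = abs(p)
    t = p  # starts at or to the right of the root, so Newton decreases monotonically
    for _ in range(iters):
        t -= (t ** 3 + t - p) / (3.0 * t * t + 1.0)
    return sign * t


def bregman_gradient_descent(x0, b=4.0, eta=0.25, steps=200):
    """Minimise f(x) = (x**2 - b)**2 / 4 with mirror steps in the geometry of h."""
    x = x0
    for _ in range(steps):
        g = x * (x * x - b)  # gradient of f, grows cubically in x
        # Bregman step: grad_h(x_next) = grad_h(x) - eta * g
        x = inv_grad_h(grad_h(x) - eta * g)
    return x


x_hat = bregman_gradient_descent(1.0)
print(x_hat)
```

For this toy problem, L*h - f and L*h + f are both convex with L = 4 when b = 4, so the step size 1/L = 0.25 is a safe choice even though no global Lipschitz constant for the gradient of f exists.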

In this article, we present a method for increasing the adaptivity of an existing robust estimation algorithm by learning two parameters to better fit the residual distribution. The analyzed method uses these two parameters to calculate weights for Iterative Re-weighted Least Squares. This adaptive nature of the weights can be helpful in situations where the noise level varies in the measurements. We test our algorithm first on the point cloud registration problem with synthetic data sets and then on LiDAR odometry with open source real-world data sets. We show that the existing approach needs additional manual tuning of a residual scale parameter, which our method learns directly from data while achieving similar or better performance. We further present the idea of decoupling scale and shape parameters to improve the performance of the algorithm. We give a detailed analysis of our algorithm along with a comparison with similar well-known algorithms from the literature to show the benefits of the proposed approach.
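
For readers unfamiliar with Iterative Re-weighted Least Squares, the following sketch estimates a robust one-dimensional location from data containing an outlier. It uses a fixed Cauchy-style weight with a hand-set `scale`, which is a swapped-in, simpler choice: the method in the abstract learns its scale and shape parameters from the residuals, and that learning step is not shown here. The function name and the data are hypothetical.

```python
def irls_location(data, scale=1.0, iters=20):
    """Robust location estimate via IRLS with Cauchy weights (fixed scale)."""
    est = sum(data) / len(data)  # start from the ordinary least-squares solution
    for _ in range(iters):
        # Large residuals get small weights, so outliers lose influence.
        weights = [1.0 / (1.0 + ((d - est) / scale) ** 2) for d in data]
        # Weighted least squares for a constant model is a weighted mean.
        est = sum(w * d for w, d in zip(weights, data)) / sum(weights)
    return est


data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # five inliers near 1.0 and one outlier
plain_mean = sum(data) / len(data)
robust = irls_location(data)
print(plain_mean, robust)
```

The `scale` argument plays the role of the residual scale parameter that, according to the abstract, otherwise has to be tuned by hand.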

Moving horizon estimation (MHE) is a widely studied state estimation approach used in several practical applications. In the MHE problem, the state estimates are obtained via the solution of an approximated nonlinear optimization problem. However, this optimization step is known to be computationally complex. Given this limitation, this paper investigates the idea of iteratively preconditioned gradient-descent (IPG) to solve the MHE problem, with the aim of improving performance over existing solution techniques. To our knowledge, the preconditioning technique is used for the first time in this paper to reduce the computational cost and accelerate the crucial optimization step for MHE. The convergence guarantee of the proposed iterative approach for a class of MHE problems is presented. Additionally, sufficient conditions for the MHE problem to be convex are also derived. Finally, the proposed method is implemented on a unicycle localization example. The simulation results demonstrate that the proposed approach can achieve better accuracy with reduced computational costs.

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
