
The multi-armed bandit (MAB) model is one of the most classical models for studying decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, and the chosen arm returns a random reward drawn from an unknown distribution specific to that arm. The goal of the player is to collect as much reward as possible during the process. Despite its simplicity, the MAB model offers an excellent playground for studying the trade-off between exploration and exploitation and for designing effective algorithms for sequential decision-making under uncertainty. Although many asymptotically optimal algorithms have been established, the finite-time behavior of the stochastic dynamics of the MAB model is much more challenging to analyze, due to the interplay between the decision-making and the rewards being collected. In this paper, we employ techniques from statistical physics to analyze the MAB model, which facilitates the characterization of the distribution of cumulative regret at short finite times, the central quantity of interest for an MAB algorithm, as well as the intricate dynamical behaviors of the model. Our analytical results, in good agreement with simulations, point to the emergence of an interesting multimodal regret distribution, with large regrets resulting from excess exploitation of sub-optimal arms following an initially unlucky output from the optimal one.
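
As a rough illustration of the finite-time regret statistics discussed above, the sketch below simulates a Gaussian two-armed bandit under a simple epsilon-greedy policy (not the specific algorithm analyzed in the paper) and records the cumulative pseudo-regret of many independent runs; all parameter values are illustrative.

```python
import numpy as np

def simulate_regret(means=(1.0, 0.5), horizon=200, n_runs=10000, eps=0.1, seed=0):
    """Simulate an epsilon-greedy player on a Gaussian K-armed bandit and
    return the cumulative pseudo-regret of each independent run."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    best = means.max()
    regrets = np.empty(n_runs)
    for run in range(n_runs):
        counts = np.zeros(len(means))
        estimates = np.zeros(len(means))
        regret = 0.0
        for t in range(horizon):
            if t < len(means) or rng.random() < eps:
                arm = int(rng.integers(len(means)))   # explore (forced for the first K steps)
            else:
                arm = int(np.argmax(estimates))       # exploit the current estimate
            reward = rng.normal(means[arm], 1.0)
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            regret += best - means[arm]               # pseudo-regret increment
        regrets[run] = regret
    return regrets

# Histogramming `regrets` at a short horizon can reveal the multimodal shape
# discussed above: runs whose first pulls of the optimal arm are unlucky tend
# to lock onto the sub-optimal arm and accumulate large regret.
```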

Related content

In order to fully harness the potential of dielectric elastomer actuators (DEAs) in soft robots, advanced control methods are needed. An important groundwork for this is the development of a control-oriented model that can adequately describe the underlying dynamics of a DEA. A common feature of existing models is that they were developed for custom-made DEAs. This makes the modelling process easier, as all specifications and the structure of the actuator are well known. In the case of a commercial actuator, however, only the information from the manufacturer is available and must be checked or completed during the modelling process. The aim of this paper is to explore how a commercial stacked silicone-based DEA can be modelled and how complex the model should be to properly replicate the features of the actuator. The static description demonstrates the suitability of Hooke's law. For the dynamic description, it is shown that no viscoelastic model is needed for control-oriented modelling. However, if all features of the DEA are to be captured, the generalized Kelvin-Maxwell model with three Maxwell elements shows good results, stability and computational efficiency.
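
For concreteness, the following is a minimal numerical sketch of a generalized Kelvin-Maxwell force response with a Hookean spring in parallel with three Maxwell branches, integrated with forward Euler; the stiffness and relaxation-time values are placeholders, not the identified parameters of the commercial DEA.

```python
import numpy as np

def kelvin_maxwell_force(x, dt, k0=1.0, k=(0.5, 0.3, 0.2), tau=(0.01, 0.1, 1.0)):
    """Force of a generalized Kelvin-Maxwell model: a Hookean spring (k0) in
    parallel with three Maxwell branches (spring k_i in series with a dashpot
    of relaxation time tau_i).  x is the sampled displacement, dt the sample
    time; parameter values here are illustrative only."""
    x = np.asarray(x, dtype=float)
    branch = np.zeros(len(k))            # internal stress of each Maxwell branch
    force = np.empty_like(x)
    force[0] = k0 * x[0]
    for n in range(1, len(x)):
        dx = x[n] - x[n - 1]
        for i in range(len(k)):
            # d(sigma_i)/dt = k_i * dx/dt - sigma_i / tau_i  (forward Euler step)
            branch[i] += k[i] * dx - branch[i] * dt / tau[i]
        force[n] = k0 * x[n] + branch.sum()
    return force
```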

Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. One regularization term guarantees the diversity of the embeddings, while the other decorrelates the variables of each embedding. The effectiveness of various data augmentation techniques is explored in both the time and frequency domains. A range of experiments conducted on the VoxCeleb datasets demonstrate the superiority of the regularized DINO framework in speaker verification. Our method achieves state-of-the-art speaker verification performance under a single-stage self-supervised setting on VoxCeleb. Code has been made publicly available at //github.com/alibaba-damo-academy/3D-Speaker.
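
The two regularization terms are described above only at a high level; below is a hedged sketch of one plausible instantiation (a variance-hinge term for diversity and an off-diagonal covariance penalty for decorrelation). The exact form and weighting are assumptions, not the paper's definitions.

```python
import torch

def diversity_and_decorrelation_loss(emb, gamma=1.0, eps=1e-4):
    """Two regularizers on a batch of embeddings of shape [N, D]:
    a diversity term that keeps each embedding dimension from collapsing,
    and a decorrelation term that penalises correlations between dimensions."""
    emb = emb - emb.mean(dim=0)
    # diversity: hinge on the per-dimension standard deviation
    std = torch.sqrt(emb.var(dim=0) + eps)
    diversity = torch.relu(gamma - std).mean()
    # decorrelation: penalise off-diagonal entries of the covariance matrix
    n, d = emb.shape
    cov = (emb.T @ emb) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    decorrelation = (off_diag ** 2).sum() / d
    return diversity, decorrelation
```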

Penalties that induce smoothness are common in nonparametric regression. In many settings, the amount of smoothness in the data generating function will not be known. Simon and Shojaie (2021) derived convergence rates for nonparametric estimators under misspecified smoothness. We show that their theoretical convergence rates can be improved by working with convenient approximating functions. Properties of convolutions and higher-order kernels allow these approximation functions to match the true functions more closely than those used in Simon and Shojaie (2021). As a result, we obtain tighter convergence rates.

Navigating automated driving systems (ADSs) through complex driving environments is difficult. Predicting the driving behavior of surrounding human-driven vehicles (HDVs) is a critical component of an ADS. This paper proposes an enhanced motion-planning approach for an ADS in a highway-merging scenario. The proposed approach combines two predictions about the surrounding HDVs: their driving behavior and their long-term trajectories. These are coupled in a hierarchical model that is then used for the motion planning of the ADS to improve driving safety.

We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel method for simulation-based inference in models where the evaluation of the likelihood function is not tractable and only a simulator that can generate synthetic data is available. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function which allows for conventional Bayesian inference using either Markov chain Monte Carlo methods or variational inference. By embedding the data in a low-dimensional space, SSNL solves several issues previous likelihood-based methods had when applied to high-dimensional data sets that, for instance, contain non-informative data dimensions or lie along a lower-dimensional manifold. We evaluate SSNL on a wide variety of experiments and show that it generally outperforms contemporary methods used in simulation-based inference, for instance, on a challenging real-world example from astrophysics which models the magnetic field strength of the sun using a solar dynamo model.
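
To make the "surrogate likelihood plus conventional Bayesian inference" workflow concrete, here is a minimal random-walk Metropolis skeleton that treats the fitted surrogate as a black-box log-likelihood; in SSNL that callable would be the trained surjective normalizing flow, which is not implemented in this sketch.

```python
import numpy as np

def metropolis_with_surrogate(log_surrogate_lik, log_prior, theta0,
                              n_steps=5000, step=0.1, seed=0):
    """Random-walk Metropolis over parameters theta, using a fitted surrogate
    log-likelihood in place of the intractable true likelihood.  Both
    log_surrogate_lik and log_prior are callables taking a parameter vector."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    logp = log_surrogate_lik(theta) + log_prior(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        logp_prop = log_surrogate_lik(prop) + log_prior(prop)
        if np.log(rng.random()) < logp_prop - logp:   # accept/reject step
            theta, logp = prop, logp_prop
        samples.append(theta.copy())
    return np.asarray(samples)
```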

Network control theory can be used to model how one should steer the brain between different states by driving a specific region with an input. The energy needed to control a network is often used to quantify its controllability, and controlling brain networks requires different amounts of energy depending on the selected input region. We use the theory of how input node placement affects the longest control chain (LCC) to study the role of the architecture of white matter fibers in the energy required to control brain networks. We show that the energy needed to control human brain networks is related to the LCC, i.e., the longest distance between the input region and the other regions in the network. Regions that control brain networks with lower energy have small LCCs; these regions align with areas that can steer the brain around the state space smoothly. By contrast, regions that need higher energy to move the brain toward different target states have larger LCCs. We also investigate the role of the number of paths between regions in the control energy. Our results show that the more paths there are between regions, the lower the cost of controlling brain networks. We evaluate the number of paths by counting specific motifs in brain networks, since determining all paths in a graph is a difficult problem.
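
A standard way to quantify the control energy referred to above is via the finite-horizon controllability Gramian of a linear network model; the sketch below computes the minimum energy needed to steer the state from x0 to xf for a given input placement B. This is the textbook formulation, shown only to illustrate how the choice of input region enters the energy, not the paper's exact pipeline.

```python
import numpy as np
from scipy.linalg import expm

def minimum_control_energy(A, B, x0, xf, T=1.0, n_steps=200):
    """Minimum energy to steer dx/dt = A x + B u from x0 to xf in time T,
    using the finite-horizon controllability Gramian
    W_T = int_0^T e^{At} B B^T e^{A^T t} dt (midpoint-rule quadrature)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    x0, xf = np.asarray(x0, float), np.asarray(xf, float)
    dt = T / n_steps
    W = np.zeros_like(A)
    for k in range(n_steps):
        E = expm(A * (k + 0.5) * dt)
        W += E @ B @ B.T @ E.T * dt
    drift = expm(A * T) @ x0                      # free evolution of x0
    v = np.linalg.solve(W, xf - drift)
    return float((xf - drift) @ v)                # (xf - e^{AT}x0)^T W^{-1} (xf - e^{AT}x0)
```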

We develop a novel randomised block coordinate primal-dual algorithm for a class of non-smooth ill-posed convex programs. Lying midway between the celebrated Chambolle-Pock primal-dual algorithm and Tseng's accelerated proximal gradient method, we establish global convergence of the last iterate as well as optimal $O(1/k)$ and $O(1/k^{2})$ complexity rates in the convex and strongly convex case, respectively, $k$ being the iteration count. Motivated by the increased complexity in the control of distribution-level electric power systems, we test the performance of our method on a second-order cone relaxation of an AC-OPF problem. Distributed control is achieved via the distributed locational marginal prices (DLMPs), which are obtained as dual variables in our optimisation framework.
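
For reference, the deterministic Chambolle-Pock iteration that the proposed randomised block-coordinate scheme builds on can be sketched as follows for a problem min_x g(x) + f(Kx); the proximal maps are passed as callables, and the step sizes tau and sigma are assumed to satisfy the usual tau * sigma * ||K||^2 <= 1 condition. This is only the classical baseline, not the paper's algorithm.

```python
import numpy as np

def chambolle_pock(K, prox_g, prox_f_conj, x0, tau, sigma, n_iter=500):
    """Classical Chambolle-Pock primal-dual iteration for min_x g(x) + f(Kx).
    prox_g(z, tau) and prox_f_conj(z, sigma) are the proximal maps of g and
    of the convex conjugate f*, respectively."""
    x = x0.copy()
    x_bar = x0.copy()
    y = np.zeros(K.shape[0])
    for _ in range(n_iter):
        y = prox_f_conj(y + sigma * K @ x_bar, sigma)   # dual ascent step
        x_new = prox_g(x - tau * K.T @ y, tau)          # primal descent step
        x_bar = 2 * x_new - x                           # extrapolation
        x = x_new
    return x
```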

With the urgent demand for generalized deep models, many pre-trained big models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (such as computer vision and natural language processing), multi-modal pre-trained big models have drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper provides new insights and helps new researchers track the most cutting-edge work. Specifically, we first introduce the background of multi-modal pre-training by reviewing conventional deep learning and pre-training work in natural language processing, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-trained models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training. After that, we introduce the downstream tasks used for the validation of large-scale MM-PTMs, including generative, classification, and regression tasks. We also give a visualization and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future work. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: //github.com/wangxiao5791509/MultiModal_BigModels_Survey

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communication and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. Their combination enables each agent to improve its task allocation strategy through reinforcement learning, while adjusting how much it explores the system in response to how optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than approaches without knowledge retention when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
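
As an illustrative reduction of the idea of tying exploration to confidence in the current strategy, the sketch below keeps per-peer running reward estimates and shrinks the exploration rate as more allocations are observed; it is a hypothetical simplification, not one of the paper's four algorithms.

```python
import numpy as np

class AdaptiveAllocator:
    """One agent learning whom to delegate subtasks to: running reward
    estimates per peer, with an exploration rate that shrinks as the
    agent's estimates are based on more observations."""
    def __init__(self, n_peers, seed=0):
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_peers)
        self.estimates = np.zeros(n_peers)

    def exploration_rate(self):
        # explore more while the estimates rest on few observations
        return 1.0 / (1.0 + self.counts.sum() / max(len(self.counts), 1))

    def choose_peer(self):
        if self.rng.random() < self.exploration_rate():
            return int(self.rng.integers(len(self.counts)))   # explore
        return int(np.argmax(self.estimates))                 # exploit

    def update(self, peer, reward):
        self.counts[peer] += 1
        self.estimates[peer] += (reward - self.estimates[peer]) / self.counts[peer]
```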

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
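
The block structure described above can be sketched directly; the following PyTorch module alternates a cross-patch linear layer with a per-patch two-layer MLP, using LayerNorm in place of the paper's Affine normalisation (a simplifying assumption).

```python
import torch
import torch.nn as nn

class ResMLPBlock(nn.Module):
    """One residual block in the spirit of the description above: (i) a linear
    layer mixing image patches, applied identically across channels, and
    (ii) a two-layer feed-forward network mixing channels within each patch."""
    def __init__(self, n_patches, dim, hidden_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.patch_mix = nn.Linear(n_patches, n_patches)     # cross-patch linear layer
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden_mult * dim), nn.GELU(), nn.Linear(hidden_mult * dim, dim)
        )

    def forward(self, x):                                    # x: (batch, n_patches, dim)
        y = self.patch_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y                                            # residual patch interaction
        x = x + self.channel_mlp(self.norm2(x))              # residual per-patch MLP
        return x
```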
