
Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD is still a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that asynchronous SGD (ASGD) always converges faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which requires neither partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ on the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the maximum and average staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
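To make the setting concrete, below is a minimal runnable sketch of the kind of training loop the analysis covers: each node takes fixed-stepsize SGD steps on its local model and merges whatever (possibly stale) copies its peers last published, with no parameter server and no synchronization barrier. The buffer class, the uniform averaging rule, and the toy least-squares objective are our own illustrative assumptions, not the paper's algorithm.

```python
import threading
import numpy as np

class PeerBuffer:
    """Latest model published by a node; readers may observe a stale copy."""
    def __init__(self, model):
        self._model, self._lock = model.copy(), threading.Lock()
    def write(self, model):
        with self._lock:
            self._model = model.copy()
    def read(self):
        with self._lock:
            return self._model.copy()

def dasgd_worker(rank, buffers, grad_fn, data, lr=0.01, steps=500):
    model = buffers[rank].read()
    rng = np.random.default_rng(rank)
    for _ in range(steps):
        x, y = data[rng.integers(len(data))]
        model -= lr * grad_fn(model, x, y)      # fixed-stepsize local SGD step
        # Merge possibly stale peer models: no barrier, no parameter server.
        peers = [buffers[j].read() for j in range(len(buffers)) if j != rank]
        model = (model + sum(peers)) / (1 + len(peers))
        buffers[rank].write(model)              # non-blocking publish

# Toy L-smooth objective: per-sample gradient of (x.w - y)^2.
grad = lambda w, x, y: 2 * x * (x @ w - y)
data = [(np.random.randn(3), 0.0) for _ in range(100)]
buffers = [PeerBuffer(np.zeros(3)) for _ in range(4)]
threads = [threading.Thread(target=dasgd_worker, args=(r, buffers, grad, data))
           for r in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

In this picture, the staleness $S$ in the bounds measures how many local steps a published peer copy may lag behind when it is read.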

Related content

Most of the recent work on leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes in the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yields vastly different results. We show that perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the zero-shot translation performance of GPT-3, even making it competitive with few-shot prompted translations.
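To illustrate the studied perturbations, here is a small self-contained sketch (our own; the demonstration pairs and the English-French prompt format are illustrative assumptions) that shuffles either the source or the target side of the few-shot demonstrations before prompting:

```python
import random

def build_prompt(demos, query, perturb=None, seed=0):
    """Few-shot MT prompt; optionally misalign one side of the demonstrations."""
    srcs = [s for s, _ in demos]
    tgts = [t for _, t in demos]
    rng = random.Random(seed)
    if perturb == "source":      # shuffled sources, intact targets
        rng.shuffle(srcs)
    elif perturb == "target":    # intact sources, shuffled targets
        rng.shuffle(tgts)
    shots = [f"English: {s}\nFrench: {t}" for s, t in zip(srcs, tgts)]
    return "\n\n".join(shots) + f"\n\nEnglish: {query}\nFrench:"

demos = [("Good morning.", "Bonjour."),
         ("Thank you very much.", "Merci beaucoup."),
         ("Where is the station?", "Où est la gare ?")]
print(build_prompt(demos, "The weather is nice.", perturb="target"))
```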

The Integer Multicommodity Flow problem has been studied extensively in the literature. However, from a parameterised perspective, mostly special cases, such as the Disjoint Paths problem, have been considered. Therefore, we investigate the parameterised complexity of the general Integer Multicommodity Flow problem. We show that the decision version of this problem on directed graphs for a constant number of commodities, when the capacities are given in unary, is XNLP-complete with pathwidth as parameter and XALP-complete with treewidth as parameter. When the capacities are given in binary, the problem is NP-complete even for graphs of pathwidth at most 13. We give related results for undirected graphs. These results imply that the problem is unlikely to be fixed-parameter tractable by these parameters. In contrast, we show that the problem does become fixed-parameter tractable when weighted tree partition width (a variant of tree partition width for edge weighted graphs) is used as parameter.
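For readers less familiar with the problem: given $k$ commodities, each with a source, a sink, and a demand, the decision version asks whether integral flows can route all demands simultaneously without exceeding the shared edge capacities. Below is a minimal PuLP sketch of this ILP on a toy directed instance (our own illustration, unrelated to the paper's hardness constructions):

```python
import pulp

# Toy instance: directed edges with capacities, two (source, sink, demand) commodities.
edges = {("s", "a"): 2, ("a", "t"): 2, ("s", "b"): 1, ("b", "t"): 1}
commodities = [("s", "t", 2), ("s", "t", 1)]
nodes = {v for e in edges for v in e}

prob = pulp.LpProblem("integer_multicommodity_flow", pulp.LpMinimize)
flow = {(i, e): pulp.LpVariable(f"f{i}_{e[0]}_{e[1]}", lowBound=0, cat="Integer")
        for i in range(len(commodities)) for e in edges}
prob += pulp.lpSum(flow.values())   # arbitrary objective: feasibility is the question

for e, cap in edges.items():        # commodities share each edge's capacity
    prob += pulp.lpSum(flow[i, e] for i in range(len(commodities))) <= cap
for i, (s, t, d) in enumerate(commodities):   # per-commodity flow conservation
    for v in nodes:
        excess = (pulp.lpSum(flow[i, e] for e in edges if e[0] == v)
                  - pulp.lpSum(flow[i, e] for e in edges if e[1] == v))
        prob += excess == (d if v == s else -d if v == t else 0)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status])   # "Optimal" iff all demands are routable
```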

The RoboCup competitions comprise various leagues, and the Soccer Simulation 2D League is a major one among them. A Soccer Simulation 2D (SS2D) match involves two teams, each consisting of 11 players and a coach, competing against each other. The players can only communicate with the Soccer Simulation Server during the game. Several code bases have been released publicly to simplify team development, so researchers can focus on decision-making and on implementing machine learning methods. SS2D actions and behaviors are only partially accurate due to different challenges, such as noise and partial observation. One strategy is therefore to implement denoising methods that tackle observation inaccuracy. Our idea is to predict opponent positions while they have not been seen for a finite number of cycles, using machine learning methods, in order to take more accurate actions such as passing. We explain our position-prediction idea powered by Long Short-Term Memory (LSTM) models and Deep Neural Networks (DNNs). The results show that the LSTM and DNN predict the opponents' positions more accurately than standard algorithms such as the last-seen method.
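As an illustrative sketch of the LSTM variant (our own minimal example, not the released team code; all hyperparameters are assumptions), such a predictor can consume a short history of observed (x, y) positions and regress the opponent's position several cycles ahead:

```python
import torch
import torch.nn as nn

class OpponentLSTM(nn.Module):
    """Predict an opponent's (x, y) position from the last T observed cycles."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, traj):                 # traj: (batch, T, 2)
        out, _ = self.lstm(traj)
        return self.head(out[:, -1])         # position k cycles ahead: (batch, 2)

model = OpponentLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch: 32 trajectories of 10 cycles; targets are the future positions.
traj, target = torch.randn(32, 10, 2), torch.randn(32, 2)
for _ in range(5):                           # illustrative training steps
    opt.zero_grad()
    loss = loss_fn(model(traj), target)
    loss.backward()
    opt.step()
```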

The Elo rating system was proposed by Arpad Elo, a Hungarian-American physics professor. Originally intended for ranking chess players, it was soon adapted to many other zero-sum sports such as football, baseball, and basketball. Nowadays, besides traditional sports, computer/video games also play an important role in social life, especially among teenagers. In most online competitive games, a player's performance is scored and recorded by the game's ranking system. Meanwhile, a ranking system such as the ladder in Dota is not the only metric by which players evaluate their gaming strength; an Elo rating score based on players' in-game performance is also a decisive factor in matchmaking. That is, the matching system refers to players' scores in both the ranking system and the performance scoring system to ensure that the matched players will undergo a balanced game, without one team dramatically overwhelming the other. The Elo scheme and its variants in modern online competitive games aim to ensure that the expected winning rate for each team approaches 50\%. However, Elo rating also causes complaints among players. In this research, I dig into the advantages and drawbacks of leveraging the Elo ranking system in online games and explain why it is still employed by game developers despite being disliked by most players. In addition, a new effort-based rating scheme is proposed and compared with the Elo scheme in a simulation environment.
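For reference, the standard Elo update that such systems build on computes an expected score $E_A = 1/(1 + 10^{(R_B - R_A)/400})$ and then sets $R_A' = R_A + K(S_A - E_A)$; a textbook sketch (not the proposed effort-based variant):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo: score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# A 1600-rated player upsets a 2000-rated player:
print(elo_update(1600, 2000, 1.0))   # ~ (1629.1, 1970.9)
```

Because the rating transfer shrinks as the favorite's expected score approaches 1, repeated matchmaking on these ratings pushes each team's expected winning rate toward 50\%, which is the property described above.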

Multimodal Large Language Models (MLLMs) have recently become a rising research hotspot; they use powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the formulation of the MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. Given that the era of MLLMs has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub repository collecting the latest papers is available at //github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.

The concept of causality plays an important role in human cognition. In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, deep models have been increasingly used for causal inference on counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective functions to estimate counterfactual outcomes unbiasedly, based on different optimization methods. This paper surveys deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we give a comprehensive overview of deep causal models from both the temporal-development and method-classification perspectives; 3) we provide a detailed and comprehensive classification and analysis of relevant datasets and source code.
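To make the described pattern concrete, here is a minimal sketch, loosely in the spirit of two-head architectures such as TARNet, of a model that maps covariates to a shared representation and attaches one outcome head per treatment arm; it is our own illustration, not any specific surveyed model:

```python
import torch
import torch.nn as nn

class TwoHeadCausalNet(nn.Module):
    """Shared representation Phi(x) with one outcome head per treatment arm."""
    def __init__(self, d_in, d_rep=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())
        self.head_t0 = nn.Linear(d_rep, 1)   # potential outcome under control
        self.head_t1 = nn.Linear(d_rep, 1)   # potential outcome under treatment

    def forward(self, x, t):
        rep = self.phi(x)
        y0, y1 = self.head_t0(rep), self.head_t1(rep)
        factual = torch.where(t.bool().unsqueeze(1), y1, y0)
        return factual, y1 - y0              # factual prediction, estimated CATE

x = torch.randn(16, 10)                      # covariates
t = torch.randint(0, 2, (16,))               # binary treatment assignment
y_factual, cate = TwoHeadCausalNet(10)(x, t)
```

Training would regress the factual prediction onto observed outcomes; the second output is the model's counterfactual contrast.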

Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features - which occur regularly in real-world input domains and within the hidden layers of GNNs - and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degree-scalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from real-world domains, all of which demonstrate the strength of our model. With this work, we hope to steer some of the GNN research towards new aggregation methods which we believe are essential in the search for powerful and robust models.
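A minimal sketch of the aggregation idea (our own simplification of the paper's formulation): combine several aggregators (mean, standard deviation, max, min) with degree scalers $S(d, \alpha) = (\log(d+1)/\delta)^{\alpha}$ for $\alpha \in \{-1, 0, 1\}$, where $\delta$ is the average log-degree over the training graphs:

```python
import torch

def pna_aggregate(msgs, delta):
    """msgs: (d, f) messages from a node's d neighbours; delta: avg log-degree.
    Returns the concatenation of 4 aggregators x 3 degree scalers -> (12*f,)."""
    d = msgs.shape[0]
    aggs = [msgs.mean(0), msgs.std(0, unbiased=False),
            msgs.max(0).values, msgs.min(0).values]
    scale = torch.log(torch.tensor(d + 1.0)) / delta
    scalers = [scale ** a for a in (-1.0, 0.0, 1.0)]  # attenuate / identity / amplify
    return torch.cat([s * a for s in scalers for a in aggs])

msgs = torch.randn(5, 8)              # 5 neighbours, 8 features each
out = pna_aggregate(msgs, delta=1.7)  # shape: (96,); fed to the next MLP layer
```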

Interest in the field of Explainable Artificial Intelligence has been growing for decades and has accelerated recently. As Artificial Intelligence models have become more complex, and often more opaque, with the incorporation of complex machine learning techniques, explainability has become more critical. Recently, researchers have been investigating and tackling explainability with a user-centric focus, looking for explanations to consider trustworthiness, comprehensibility, explicit provenance, and context-awareness. In this chapter, we leverage our survey of explanation literature in Artificial Intelligence and closely related fields and use these past efforts to generate a set of explanation types that we feel reflect the expanded needs of explanation for today's artificial intelligence applications. We define each type and provide an example question that would motivate the need for this style of explanation. We believe this set of explanation types will help future system designers in their generation and prioritization of requirements and further help generate explanations that are better aligned to users' and situational needs.

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and the restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
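A canonical instance of the block-as-unit idea is the residual block: a small stack of convolutions wrapped with an identity shortcut, so that the block rather than the individual layer becomes the repeated structural element. A minimal PyTorch sketch (illustrative, not tied to any single surveyed architecture):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Block as the structural unit: two convs plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # the shortcut skips the whole block

# Architectures are then assembled by stacking blocks, not individual layers:
net = nn.Sequential(*[ResidualBlock(64) for _ in range(4)])
out = net(torch.randn(1, 64, 32, 32))
```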

We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GANs). First, we propose a new generator objective that is better suited to tackling mode collapse, and we apply an independent Autoencoder (AE) to constrain the generator, treating its reconstructed samples as "real" samples in order to slow down the convergence of the discriminator; this reduces the vanishing-gradient problem and stabilizes the model. Second, using the mappings between latent and data spaces provided by the AE, we further regularize the AE via the relative distance between latent and data samples, to explicitly prevent the generator from falling into mode collapse. This idea arose when we found a new way to visualize mode collapse on the MNIST dataset. To the best of our knowledge, our method is the first to propose and successfully apply the relative distance between latent and data samples for stabilizing GANs. Third, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and suffers from neither vanishing gradients nor mode collapse, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA, and CIFAR-10 datasets. Experimental results show that our method can approximate multi-modal distributions well and achieves better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published at //github.com/tntrung/gaan
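As a rough illustration of how a relative-distance regularizer between latent and data samples might be written, here is a hypothetical sketch based on our reading of the abstract; for the authors' actual formulation, see the repository above.

```python
import torch

def relative_distance_penalty(z1, z2, x1, x2, eps=1e-8):
    """Penalize mismatch between latent-space and data-space pair distances.
    When the generator collapses, distinct latent codes map to near-identical
    samples, so the data-space distance shrinks while the latent one does not."""
    dz = torch.norm(z1 - z2, dim=1)
    dx = torch.norm((x1 - x2).flatten(1), dim=1)
    return ((dx / (dz + eps)) - 1.0).pow(2).mean()

z1, z2 = torch.randn(32, 100), torch.randn(32, 100)              # latent pairs
x1, x2 = torch.randn(32, 3, 64, 64), torch.randn(32, 3, 64, 64)  # generated pairs
loss = relative_distance_penalty(z1, z2, x1, x2)
```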
