
In this paper, we concentrate on decentralized optimization problems with nonconvex and nonsmooth objective functions, especially the decentralized training of nonsmooth neural networks. We introduce a unified framework, named DSM, for analyzing the global convergence of decentralized stochastic subgradient methods. We prove the global convergence of our proposed framework under mild conditions, by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. Furthermore, we establish that our proposed framework encompasses a wide range of existing efficient decentralized subgradient methods, including decentralized stochastic subgradient descent (DSGD), DSGD with the gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGDm). In addition, we introduce a SignSGD-style variant that employs the sign map to regularize the update directions in DSGDm, and show that it is also covered by our proposed framework. Consequently, our convergence results establish, for the first time, the global convergence of these methods when applied to nonsmooth nonconvex objectives. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized subgradient methods with convergence guarantees in the training of nonsmooth neural networks.
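As a concrete illustration of the kind of update such a framework covers, below is a minimal NumPy sketch of plain DSGD with a doubly stochastic mixing matrix; the toy nonsmooth problem (an $\ell_1$-regularized least-squares loss), the step-size schedule, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def dsgd(local_grads, mixing_matrix, x0, step_size, num_iters):
    """Plain DSGD: each agent mixes its neighbours' iterates, then takes a local
    stochastic subgradient step. `local_grads[i](x)` returns a (sub)gradient
    sample of agent i's loss at x."""
    n_agents = len(local_grads)
    X = np.tile(x0, (n_agents, 1))                # one row of variables per agent
    for t in range(num_iters):
        X = mixing_matrix @ X                     # consensus (mixing) step
        for i in range(n_agents):
            X[i] -= step_size(t) * local_grads[i](X[i])   # local subgradient step
    return X.mean(axis=0)                         # average of local iterates

# Toy nonsmooth example: 3 agents, each with an l1-regularized least-squares loss.
rng = np.random.default_rng(0)
A = [rng.standard_normal((10, 5)) for _ in range(3)]
b = [rng.standard_normal(10) for _ in range(3)]
grads = [lambda x, A=A[i], b=b[i]: A.T @ (A @ x - b) + 0.1 * np.sign(x) for i in range(3)]
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])  # doubly stochastic
x_hat = dsgd(grads, W, np.zeros(5), lambda t: 0.5 / (t + 50), 500)
```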

Related content

In this paper, we propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS). Assuming that each SOS is an observed sample of a normal distribution whose unknown expectation is the MOS, MOS inference is formulated as a maximum likelihood estimation problem in which the perceptual correlation of pairwise images is considered when modeling the likelihood of the SOS. More specifically, by means of the quality-aware representations learned from a self-supervised backbone, we introduce a learnable relative quality measure to predict the MOS difference between two images. The maximum likelihood estimate of the current image's MOS is then represented as the sum of a reference image's estimated MOS and their relative quality. Ideally, no matter which image is selected as the reference, the MOS of the current image should remain unchanged, which we term perceptual constancy constrained calibration (PC3). Finally, we alternately optimize the relative quality measure's parameters and the current image's estimated MOS via backpropagation and Newton's method, respectively. Experiments show that the proposed method is efficient in calibrating biased SOSs and significantly improves IQA model learning when only SOSs are available.
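A minimal sketch of the alternating idea (illustrative names, not the paper's code): each image's MOS estimate is pulled both toward its own SOS and toward every reference image's estimate plus the predicted relative quality; with a Gaussian likelihood the per-image Newton step reduces to averaging these estimates.

```python
import numpy as np

def calibrate_mos(sos, rel_quality, num_rounds=20):
    """Refine mos[i] so that it agrees both with its own single opinion score and
    with mos[j] + rel_quality[j, i] for every reference image j. For a quadratic
    (Gaussian) objective the per-image Newton step is just the mean of these
    estimates."""
    n = len(sos)
    mos = np.asarray(sos, dtype=float).copy()
    for _ in range(num_rounds):
        for i in range(n):
            refs = [mos[j] + rel_quality[j, i] for j in range(n) if j != i]
            mos[i] = (sos[i] + sum(refs)) / (1 + len(refs))
    return mos

# Toy example: true qualities 1..4, noisy single opinions, exact relative qualities.
true_q = np.array([1.0, 2.0, 3.0, 4.0])
sos = true_q + np.random.default_rng(1).normal(scale=0.5, size=4)
rel = true_q[None, :] - true_q[:, None]          # rel[j, i] = q_i - q_j
print(calibrate_mos(sos, rel))
```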

This paper introduces a new class of explanation structures, called robust counterfactual witnesses (RCWs), to provide robust, both counterfactual and factual, explanations for graph neural networks. Given a graph neural network M, a robust counterfactual witness is a fraction of a graph G that is both a counterfactual and factual explanation of the result of M over G, and remains so for any "disturbed" G obtained by flipping up to k of its node pairs. We establish hardness results, ranging from tractable cases to co-NP-hardness, for verifying and generating robust counterfactual witnesses. We study such structures for GNN-based node classification and present efficient algorithms to verify and generate RCWs. We also provide a parallel algorithm to verify and generate RCWs for large graphs with scalability guarantees. We experimentally verify our explanation generation process on benchmark datasets and showcase its applications.
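To make the robustness requirement concrete, here is a brute-force sketch (not the paper's verification or generation algorithms) of checking one candidate witness; `model`, the toy degree-based classifier, and the restriction that flips avoid the witness are illustrative assumptions under one simplified reading of the definition.

```python
from itertools import combinations

def flipped(edges, pairs):
    """Graph obtained from `edges` by flipping every node pair in `pairs`."""
    e = set(edges)
    for p in pairs:
        e.symmetric_difference_update({p})
    return frozenset(e)

def is_robust_witness(model, node, edges, witness, flippable_pairs, k):
    """Check that `witness` (a subset of G's edges) explains model's prediction for
    `node` factually (the witness alone yields the same label) and counterfactually
    (removing it changes the label), with the counterfactual property surviving any
    flip of up to k pairs outside the witness."""
    edges, witness = frozenset(edges), frozenset(witness)
    base = model(edges, node)
    if model(witness, node) != base:                 # factual check
        return False
    for r in range(k + 1):
        for pairs in combinations(flippable_pairs, r):
            if model(flipped(edges, pairs) - witness, node) == base:  # must still flip
                return False
    return True

# Toy "GNN": node 0 is labelled 1 iff it has degree >= 2 in the graph.
model = lambda e, v: int(sum(v in p for p in e) >= 2)
edges = {frozenset({0, 1}), frozenset({0, 2}), frozenset({3, 4})}
witness = {frozenset({0, 1}), frozenset({0, 2})}
print(is_robust_witness(model, 0, edges, witness, [frozenset({1, 2})], k=1))  # True
```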

Loss functions and sample mining strategies are essential components of deep metric learning algorithms. However, existing loss functions and mining strategies often require additional hyperparameters, notably a threshold that defines whether a sample pair is informative. The threshold provides a stable numerical standard for deciding which pairs to retain and is a vital parameter for reducing the number of redundant sample pairs participating in training. Nonetheless, finding the optimal threshold can be time-consuming, often requiring extensive grid searches; because the threshold cannot be adjusted dynamically during training, many repeated experiments are needed to determine it. Therefore, we introduce a novel approach for adjusting the thresholds associated with both the loss function and the sample mining strategy. We design a static Asymmetric Sample Mining Strategy (ASMS) and its dynamic version, Adaptive Tolerance ASMS (AT-ASMS), tailored for sample mining methods. ASMS uses differentiated thresholds to address the problems (too few positive pairs and too many redundant negative pairs) caused by applying only a single threshold to filter samples. AT-ASMS can adaptively regulate the ratio of positive and negative pairs during training according to the ratio of the currently mined positive and negative pairs. A meta-learning-based threshold generation algorithm uses a single gradient-descent step to obtain new thresholds. We combine these two threshold adjustment algorithms to form the Dual Dynamic Threshold Adjustment Strategy (DDTAS). Experimental results show that our algorithm achieves competitive performance on the CUB200, Cars196, and SOP datasets.
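The following is a small PyTorch sketch of the ingredients as we read them: asymmetric mining thresholds, a ratio-driven adjustment of those thresholds, and a single gradient-descent step on a learnable loss threshold. All names, update rules, and constants are illustrative assumptions rather than the DDTAS implementation.

```python
import torch

def mine_pairs(sim, labels, pos_thresh, neg_thresh):
    """Asymmetric mining on an (N, N) cosine-similarity matrix: keep hard positive
    pairs (same label, similarity below pos_thresh) and hard negative pairs
    (different label, similarity above neg_thresh)."""
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = same & ~eye & (sim < pos_thresh)
    neg_mask = ~same & (sim > neg_thresh)
    return pos_mask, neg_mask

def adapt_thresholds(pos_thresh, neg_thresh, n_pos, n_neg, target_ratio=1.0, lr=0.01):
    """Adaptive-tolerance step: nudge both thresholds so the ratio of mined
    positives to mined negatives drifts toward target_ratio."""
    ratio = (n_pos + 1) / (n_neg + 1)
    return pos_thresh + lr * (target_ratio - ratio), neg_thresh + lr * (ratio - target_ratio)

# Single gradient-descent step on a learnable loss threshold tau (meta-learning flavour).
torch.manual_seed(0)
sim = torch.rand(8, 8)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
pos_mask, neg_mask = mine_pairs(sim, labels, pos_thresh=0.9, neg_thresh=0.0)
tau = torch.tensor(0.5, requires_grad=True)
loss = torch.relu(sim[neg_mask] - tau).sum() + torch.relu(tau - sim[pos_mask]).sum()
loss.backward()
with torch.no_grad():
    tau -= 0.1 * tau.grad       # new threshold used in the next iteration
```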

In this paper, we first propose a simple and unified approach to the stability of the phaseless operator, for both amplitude and intensity measurements and in both the complex and real cases, on arbitrary geometric sets, thus characterizing the robust performance of phase retrieval via the empirical minimization method. The unified analysis involves the random embedding of the concave lifting operator on the tangent space. Similarly, we investigate the structured matrix recovery problem through the robust injectivity of the linear rank-one measurement operator on arbitrary matrix sets. The core of our analysis lies in bounding the empirical chaos process. We introduce Talagrand's $\gamma_{\alpha}$ functionals to characterize the relationship between the required number of measurements and the geometric constraints. Additionally, adversarial noise is constructed to illustrate that the recovery bounds are sharp in the above settings.
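For orientation, the measurement models and the empirical minimization estimator referred to here are commonly written as follows (a standard formulation stated for context, not quoted from the paper):
\[
y_i = \lvert \langle a_i, x \rangle \rvert \ \ (\text{amplitude}) \quad \text{or} \quad y_i = \lvert \langle a_i, x \rangle \rvert^{2} \ \ (\text{intensity}), \qquad i = 1, \dots, m,
\]
\[
\hat{x} \in \operatorname*{arg\,min}_{z \in K} \; \sum_{i=1}^{m} \bigl( \lvert \langle a_i, z \rangle \rvert^{p} - y_i \bigr)^{2}, \qquad p \in \{1, 2\} \text{ matching the measurement model},
\]
where $K$ is the (arbitrary) geometric constraint set. Stability then asks that the distance between $\hat{x}$ and $x$, up to a global phase, be controlled by the noise level once the number of measurements $m$ exceeds a bound expressed through Talagrand's $\gamma_{\alpha}$ functionals of $K$.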

In this paper, we set the mathematical foundations of the Dynamical Low-Rank Approximation (DLRA) method for stochastic differential equations (SDEs). DLRA aims at approximating the solution as a linear combination of a small number of basis vectors with random coefficients (low-rank format), with the peculiarity that both the basis vectors and the random coefficients vary in time. While the formulation and properties of DLRA are now well understood for random/parametric equations, the same cannot be said for SDEs, and this work aims to fill that gap. We start by rigorously formulating a Dynamically Orthogonal (DO) approximation (an instance of DLRA successfully used in applications) for SDEs, which we then generalize to define a parametrization-independent DLRA for SDEs. We show local well-posedness of the DO equations and their equivalence with the DLRA formulation. We also characterize the explosion time of the DO solution by a loss of linear independence of the random coefficients defining the solution expansion, and give sufficient conditions for global existence.
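For context, the DO ansatz underlying DLRA is typically written as follows (a standard form stated for orientation, not quoted from the paper): the $\mathbb{R}^d$-valued solution $X_t$ is approximated by
\[
X_t \;\approx\; \sum_{i=1}^{R} Y_t^{i}\, U_i(t),
\]
where the deterministic, time-dependent basis vectors $U_1(t), \dots, U_R(t) \in \mathbb{R}^d$ are orthonormal, the scalar coefficients $Y_t^{i}$ are stochastic processes, and the dynamically orthogonal condition $\langle \partial_t U_i(t), U_j(t) \rangle = 0$ for all $i, j$ removes the non-uniqueness of this parametrization.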

In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images from the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that correlate with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable to or stronger than those for Monte Carlo Dropout and Deep Ensemble methods; (ii) the median error varied with the uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning than for these two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors; and (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach for endowing deep-learning models for radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how such a model can be used to construct confidence intervals for the predicted Dose-Volume Histograms.
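The work's contribution includes reformulating the evidential loss for stability; as background, a minimal NumPy sketch of how a Normal-Inverse-Gamma evidential head separates aleatoric from epistemic uncertainty in standard deep evidential regression is shown below (parameter names and values are illustrative).

```python
import numpy as np

def evidential_uncertainties(gamma, nu, alpha, beta):
    """Given per-voxel Normal-Inverse-Gamma parameters predicted by the network
    (gamma = predicted dose; nu, alpha, beta = evidence parameters, alpha > 1),
    return the prediction plus aleatoric and epistemic variance estimates."""
    aleatoric = beta / (alpha - 1.0)             # expected data noise, E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))      # variance of the mean, Var[mu]
    return gamma, aleatoric, epistemic

# Thresholding on epistemic uncertainty, e.g. to keep only confident voxels.
gamma = np.array([60.2, 12.4]); nu = np.array([3.0, 0.2])
alpha = np.array([2.5, 1.3]); beta = np.array([1.2, 0.9])
pred, alea, epis = evidential_uncertainties(gamma, nu, alpha, beta)
confident = epis < 1.0
```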

In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The proposed framework uses a Graph Convolutional Network (GCN) to generate node embeddings that predict, from the entire node set, which nodes are likely to belong to the solution set. These embeddings enable an efficient training process for learning the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
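A minimal sketch of the greedy construction loop that such a framework learns to guide; the scoring function below is a hypothetical stand-in for the trained GCN embeddings and Q-network.

```python
def greedy_construct(candidates, q_value, budget):
    """Incrementally build a solution set: at each step add the candidate node with
    the highest predicted marginal Q-value given the current partial solution."""
    solution = []
    remaining = set(candidates)
    for _ in range(budget):
        best = max(remaining, key=lambda v: q_value(v, solution))
        solution.append(best)
        remaining.remove(best)
    return solution

# Toy usage with a placeholder marginal-gain estimate standing in for the learned Q-network.
scores = {0: 0.9, 1: 0.4, 2: 0.7, 3: 0.1}
q_value = lambda v, sol: scores[v] - 0.2 * len(sol)
print(greedy_construct(range(4), q_value, budget=2))   # -> [0, 2]
```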

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
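As a rough illustration of the reattention idea (a generic sketch, not the paper's exact formulation), one can bias the current attention logits with an attention map memorized from an earlier alignment round:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reattend(curr_logits, past_attention, gamma=0.5):
    """Refine the current attention by adding a scaled contribution from the past
    attention map, so positions already attended in earlier rounds are explicitly
    accounted for instead of being re-discovered (or over-attended) from scratch."""
    return softmax(curr_logits + gamma * past_attention)

# Toy example: a 2-query x 3-key attention refined by the previous round's attention map.
curr = np.array([[1.0, 0.2, 0.1], [0.3, 0.8, 0.1]])
past = softmax(np.array([[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]]))
print(reattend(curr, past))
```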

In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
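A minimal PyTorch-style sketch of one task-specific soft-attention module applied to a shared feature map; the module structure and names are illustrative assumptions, not the MTAN reference implementation.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Task-specific soft attention over a shared feature map: a small conv head
    produces a per-channel, per-pixel mask in [0, 1] that selects which shared
    features this task uses."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, shared_features):
        return self.mask(shared_features) * shared_features   # element-wise gating

# One shared backbone feature map, two tasks with their own attention modules.
shared = torch.randn(2, 64, 32, 32)
seg_att, depth_att = TaskAttention(64), TaskAttention(64)
seg_feat, depth_feat = seg_att(shared), depth_att(shared)
```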

In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
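The additive-margin formulation is compact enough to sketch directly; below is a minimal PyTorch version of an AM-Softmax loss on normalized features and class weights (the scale s and margin m values are illustrative).

```python
import torch
import torch.nn.functional as F

def am_softmax_loss(features, weight, labels, s=30.0, m=0.35):
    """AM-Softmax: cosine logits with an additive margin m subtracted from the
    target-class cosine, then scaled by s and fed to cross-entropy."""
    cos = F.normalize(features, dim=1) @ F.normalize(weight, dim=1).t()   # (N, C) cosines
    onehot = F.one_hot(labels, num_classes=weight.size(0)).to(cos.dtype)
    logits = s * (cos - m * onehot)            # subtract the margin only for the true class
    return F.cross_entropy(logits, labels)

# Toy usage: 4 embeddings of dimension 128, 10 identities.
feats = torch.randn(4, 128, requires_grad=True)
W = torch.randn(10, 128, requires_grad=True)
labels = torch.tensor([0, 3, 3, 7])
loss = am_softmax_loss(feats, W, labels)
loss.backward()
```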
