
In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has attracted interest because it performs well in practice but lacks theoretical understanding. For Lipschitz and convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all the existing works are limited to compact domains or require almost surely bounded noise. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate without these two restrictive assumptions. Besides this important question, many theoretical problems remain unanswered. For example, compared with the last-iterate convergence of SGD for non-smooth problems, only a few results for smooth optimization have been developed. Additionally, the existing results are all limited to a non-composite objective and the standard Euclidean norm. It remains unclear whether last-iterate convergence can be provably extended to wider composite optimization and non-Euclidean norms. In this work, to address the issues mentioned above, we revisit the last-iterate convergence of stochastic gradient methods and provide the first unified way to prove the convergence rates both in expectation and in high probability to accommodate general domains, composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness and (strong) convexity simultaneously. Additionally, we extend our analysis to obtain the last-iterate convergence under heavy-tailed noise.
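
To make the setting concrete, here is a minimal sketch of last-iterate SGD on an unbounded domain with unbounded noise. It is an illustration only, not the method analyzed in the paper: the objective $f(x) = |x|$, the Gaussian noise, the step size $1/\sqrt{t}$, and the function name `sgd_abs` are all assumptions chosen for simplicity.

```python
import math
import random


def sgd_abs(x0=5.0, steps=10_000, noise_std=1.0, seed=0):
    """Run SGD on the illustrative objective f(x) = |x| over the whole real line.

    Returns (last_iterate, average_iterate).
    """
    rng = random.Random(seed)
    x = x0
    running_sum = 0.0
    for t in range(1, steps + 1):
        # Subgradient of |x| (taking 0 at the kink) plus unbounded Gaussian noise.
        subgrad = (x > 0) - (x < 0)
        noisy_grad = subgrad + rng.gauss(0.0, noise_std)
        # Assumed step size 1 / sqrt(t); no projection, since the domain is all of R.
        x -= noisy_grad / math.sqrt(t)
        running_sum += x
    return x, running_sum / steps


last, avg = sgd_abs()
print(f"f(last iterate) = {abs(last):.4f}, f(averaged iterate) = {abs(avg):.4f}")
```

The function returns both the final iterate and the running average, so the two can be compared directly. The rates quoted above concern the first of these.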

Related content

Whenever inspected by humans, reconstructed signals should not be distinguished from real ones. Typically, such a high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein-$1$ distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.
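
One plausible way to see why linear programming appears here is sketched below; it may differ from the paper's own formulation. The notation is assumed for illustration: a source pmf $p_X$, a fixed observation channel $p_{Y|X}$, a decoder $q(\hat{x}\mid y)$ to be optimized, a distortion matrix $d$, a ground metric $\rho$, and an auxiliary coupling $\pi$ that certifies the Wasserstein-$1$ constraint at perception level $P$.

```latex
\begin{aligned}
D(P) \;=\; \min_{q \ge 0,\; \pi \ge 0} \quad
  & \sum_{x, y, \hat{x}} p_X(x)\, p_{Y|X}(y \mid x)\, q(\hat{x} \mid y)\, d(x, \hat{x}) \\
\text{s.t.} \quad
  & \sum_{\hat{x}} q(\hat{x} \mid y) = 1 \quad \text{for all } y, \\
  & \sum_{b} \pi(a, b) = p_X(a) \quad \text{for all } a, \\
  & \sum_{a} \pi(a, b) = \sum_{x, y} p_X(x)\, p_{Y|X}(y \mid x)\, q(b \mid y) \quad \text{for all } b, \\
  & \sum_{a, b} \pi(a, b)\, \rho(a, b) \le P .
\end{aligned}
```

The objective and every constraint are linear in $(q, \pi)$, so for each fixed $P$ this is a linear program. The last three constraints state that some coupling of $p_X$ with the reconstruction distribution has transport cost at most $P$, which is equivalent to the Wasserstein-$1$ distance being at most $P$.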

Runge-Kutta (RK) methods may exhibit order reduction when applied to stiff problems. For linear problems with time-independent operators, order reduction can be avoided if the method satisfies certain weak stage order (WSO) conditions, which are less restrictive than traditional stage order conditions. This paper outlines the first algebraic theory of WSO, and establishes general order barriers that relate the WSO of an RK scheme to its order and number of stages for both fully-implicit and DIRK schemes. It is shown in several scenarios that the constructed bounds are sharp. The theory characterizes WSO in terms of orthogonal invariant subspaces and associated minimal polynomials. The resulting necessary conditions on the structure of RK methods with WSO are then shown to be of practical use for the construction of such schemes.

Neural Style Transfer (NST) refers to a class of algorithms able to manipulate an element, most often images, to adopt the appearance or style of another one. Each element is defined as a combination of Content and Style: the Content can be conceptually defined as the what and the Style as the how of said element. In this context, we propose a custom NST framework for transferring a set of styles to the motion of a robotic manipulator, e.g., the same robotic task can be carried out in an angry, happy, calm, or sad way. An autoencoder architecture extracts and defines the Content and the Style of the target robot motions. A Twin Delayed Deep Deterministic Policy Gradient (TD3) network generates the robot control policy using the loss defined by the autoencoder. The proposed Neural Policy Style Transfer TD3 (NPST3) alters the robot motion by introducing the trained style. Such an approach can be implemented either offline, for carrying out autonomous robot motions in dynamic environments, or online, for adapting at runtime the style of a teleoperated robot. The considered styles can be learned online from human demonstrations. We carried out an evaluation with human subjects enrolling 73 volunteers, asking them to recognize the style behind some representative robotic motions. Results show a good recognition rate, proving that it is possible to convey different styles to a robot using this approach.

We present a simple argument using Promise Theory and dimensional analysis for the Dunbar scaling hierarchy, supported by recent data from group formation in Wikipedia editing. We show how the assumption of a common priority seeds group alignment until the costs associated with attending to the group outweigh the benefits in a detailed balance scenario. Subject to partial efficiency of implementing promised intentions, we can reproduce a series of compatible rates that balance growth with entropy.

Combining the strengths of many existing predictors to obtain a Mixture of Experts which is superior to its individual components is an effective way to improve the performance without having to develop new architectures or train a model from scratch. However, surprisingly, we find that naïvely combining expert object detectors in a similar way to Deep Ensembles can often lead to degraded performance. We identify that the primary cause of this issue is that the predictions of the experts do not match their performance, a term referred to as miscalibration. Consequently, the most confident detector dominates the final predictions, preventing the mixture from leveraging all the predictions from the experts appropriately. To address this, when constructing the Mixture of Experts, we propose to combine their predictions in a manner which reflects the individual performance of the experts; an objective we achieve by first calibrating the predictions before filtering and refining them. We term this approach the Mixture of Calibrated Experts and demonstrate its effectiveness through extensive experiments on 5 different detection tasks using a variety of detectors, showing that it: (i) improves object detectors on COCO and instance segmentation methods on LVIS by up to $\sim 2.5$ AP; (ii) reaches state-of-the-art on COCO test-dev with $65.1$ AP and on DOTA with $82.62$ $\mathrm{AP_{50}}$; (iii) outperforms single models consistently on recent detection tasks such as Open Vocabulary Object Detection.

The increased utilization of Artificial Intelligence (AI) solutions brings with it inherent risks, such as misclassification and sub-optimal execution time performance, due to errors introduced in their deployment infrastructure because of problematic configuration and software faults. On top of that, AI methods such as Deep Neural Networks (DNNs) are utilized to perform demanding, resource-intensive and even safety-critical tasks, and in order to effectively increase the performance of the DNN models deployed, a variety of Machine Learning (ML) compilers have been developed, allowing compatibility of DNNs with a variety of hardware acceleration devices, such as GPUs and TPUs. Furthermore, the correctness of the compilation process should be verified. In order to allow developers and researchers to explore the robustness of DNN models deployed on different hardware accelerators via ML compilers, in this paper we propose MutateNN, a tool that provides mutation testing and model analysis features in the context of deployment on different hardware accelerators. To demonstrate the capabilities of MutateNN, we focus on the image recognition domain by applying mutation testing to 7 well-established models utilized for image classification. We instruct 21 mutations of 6 different categories, and deploy our mutants on 4 different hardware acceleration devices of varying capabilities. Our results indicate that models prove robust to changes related to layer modifications and arithmetic operators, while presenting discrepancies of up to 90.3% in mutants related to conditional operators. We also observed unexpectedly severe performance degradation on mutations related to arithmetic types of variables, leading the mutants to produce the same classifications for all dataset inputs.

We investigate the constant-depth circuit complexity of the Isomorphism Problem, Minimum Generating Set Problem (MGS), and Sub(quasi)group Membership Problem (Membership) for groups and quasigroups (=Latin squares), given as input in terms of their multiplication (Cayley) tables. Despite decades of research on these problems, lower bounds for these problems even against depth-$2$ AC circuits remain unknown. Perhaps surprisingly, Chattopadhyay, Torán, and Wagner (FSTTCS 2010; ACM Trans. Comput. Theory, 2013) showed that Quasigroup Isomorphism could be solved by AC circuits of depth $O(\log \log n)$ using $O(\log^2 n)$ nondeterministic bits, a class we denote $\exists^{\log^2(n)}FOLL$. We narrow this gap by improving the upper bound for many of these problems to $quasiAC^0$, thus decreasing the depth to constant. In particular, we show:
- MGS for quasigroups is in $\exists^{\log^2(n)}\forall^{\log n}NTIME(\mathrm{polylog}(n))\subseteq quasiAC^0$. Papadimitriou and Yannakakis (J. Comput. Syst. Sci., 1996) conjectured that this problem was $\exists^{\log^2(n)}P$-complete; our results refute a version of that conjecture for completeness under $quasiAC^0$ reductions unconditionally, and under polylog-space reductions assuming EXP $\neq$ PSPACE.
- MGS for groups is in $AC^{1}(L)$, improving on the previous upper bound of P (Lucchini & Thakkar, J. Algebra, 2024).
- Quasigroup Isomorphism belongs to $\exists^{\log^2(n)}AC^0(DTISP(\mathrm{polylog},\log))\subseteq quasiAC^0$, improving on the previous bound of $\exists^{\log^2(n)}L\cap\exists^{\log^2(n)}FOLL\subseteq quasiFOLL$ (Chattopadhyay, Torán, & Wagner, ibid.; Levet, Australas. J. Combin., 2023).
Our results suggest that understanding the constant-depth circuit complexity may be key to resolving the complexity of problems concerning (quasi)groups in the multiplication table model.

Explainable Artificial Intelligence (XAI) is transforming the field of Artificial Intelligence (AI) by enhancing the trust of end-users in machines. As the number of connected devices keeps growing, the Internet of Things (IoT) market needs to be trustworthy for end-users. However, the existing literature still lacks a systematic and comprehensive survey of the use of XAI for IoT. To bridge this gap, in this paper, we address the XAI frameworks with a focus on their characteristics and support for IoT. We illustrate the widely-used XAI services for IoT applications, such as security enhancement, Internet of Medical Things (IoMT), Industrial IoT (IIoT), and Internet of City Things (IoCT). We also suggest the implementation choice of XAI models over IoT systems in these applications with appropriate examples and summarize the key inferences for future works. Moreover, we present the cutting-edge development in edge XAI structures and the support of sixth-generation (6G) communication services for IoT applications, along with key inferences. In a nutshell, this paper constitutes the first holistic compilation on the development of XAI-based frameworks tailored for the demands of future IoT use cases.

Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Deep Convolutional Neural Networks have pushed the state of the art for semantic segmentation, provided that a large number of images together with pixel-wise annotations is available. Data collection is expensive, and one way to alleviate this cost is to use transfer learning. This reduces the amount of annotated data required for network training, but it does not get rid of this heavy processing step. We propose a method of transfer learning without annotations on the target task for datasets with redundant content and distinct pixel distributions. Our method takes advantage of the approximate content alignment of the images between two datasets when the approximation error prevents the reuse of annotation from one dataset to another. Given the annotations for only one dataset, we train a first network in a supervised manner. This network autonomously learns to generate deep data representations relevant to the semantic segmentation. Then, given the images in the new dataset, we train a new network to generate a deep data representation that matches the one from the first network on the previous dataset. The training consists of a regression between feature maps and does not require any annotations on the new dataset. We show that this method reaches performance similar to classic transfer learning on the PASCAL VOC dataset with synthetic transformations.
