亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Hindsight Experience Replay (HER) is a technique used in reinforcement learning (RL) that has proven to be very efficient for training off-policy RL-based agents to solve goal-based robotic manipulation tasks using sparse rewards. Even though HER improves the sample efficiency of RL-based agents by learning from mistakes made in past experiences, it does not provide any guidance while exploring the environment. This leads to very large training times due to the volume of experience required to train an agent using this replay strategy. In this paper, we propose a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks. This guidance, however, is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed by the previously-learned primitive policies. We evaluate our method by comparing its performance against HER and other more efficient variations of this algorithm in several block manipulation tasks. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time. Code is available at //github.com/franroldans/qmp-her.

相關內容

In the past decades, model averaging (MA) has attracted much attention as it has emerged as an alternative tool to the model selection (MS) statistical approach. Hansen [\emph{Econometrica} \textbf{75} (2007) 1175--1189] introduced a Mallows model averaging (MMA) method with model weights selected by minimizing a Mallows' $C_p$ criterion. The main theoretical justification for MMA is an asymptotic optimality (AOP), which states that the risk/loss of the resulting MA estimator is asymptotically equivalent to that of the best but infeasible averaged model. MMA's AOP is proved in the literature by either constraining weights in a special discrete weight set or limiting the number of candidate models. In this work, it is first shown that under these restrictions, however, the optimal risk of MA becomes an unreachable target, and MMA may converge more slowly than MS. In this background, a foundational issue that has not been addressed is: When a suitably large set of candidate models is considered, and the model weights are not harmfully constrained, can the MMA estimator perform asymptotically as well as the optimal convex combination of the candidate models? We answer this question in a nested model setting commonly adopted in the area of MA. We provide finite sample inequalities for the risk of MMA and show that without unnatural restrictions on the candidate models, MMA's AOP holds in a general continuous weight set under certain mild conditions. Several specific methods for constructing the candidate model sets are proposed. Implications on minimax adaptivity are given as well. The results from simulations back up our theoretical findings.

Cloth-changing Person Re-Identification (CC-ReID) is a challenging task that aims to retrieve the target person across multiple surveillance cameras when clothing changes might happen. Despite recent progress in CC-ReID, existing approaches are still hindered by the interference of clothing variations since they lack effective constraints to keep the model consistently focused on clothing-irrelevant regions. To address this issue, we present a Semantic-aware Consistency Network (SCNet) to learn identity-related semantic features by proposing effective consistency constraints. Specifically, we generate the black-clothing image by erasing pixels in the clothing area, which explicitly mitigates the interference from clothing variations. In addition, to fully exploit the fine-grained identity information, a head-enhanced attention module is introduced, which learns soft attention maps by utilizing the proposed part-based matching loss to highlight head information. We further design a semantic consistency loss to facilitate the learning of high-level identity-related semantic features, forcing the model to focus on semantically consistent cloth-irrelevant regions. By using the consistency constraint, our model does not require any extra auxiliary segmentation module to generate the black-clothing image or locate the head region during the inference stage. Extensive experiments on four cloth-changing person Re-ID datasets (LTCC, PRCC, Vc-Clothes, and DeepChange) demonstrate that our proposed SCNet makes significant improvements over prior state-of-the-art approaches. Our code is available at: //github.com/Gpn-star/SCNet.

As Large Language Models (LLMs) are becoming prevalent in various fields, there is an urgent need for improved NLP benchmarks that encompass all the necessary knowledge of individual discipline. Many contemporary benchmarks for foundational models emphasize a broad range of subjects but often fall short in presenting all the critical subjects and encompassing necessary professional knowledge of them. This shortfall has led to skewed results, given that LLMs exhibit varying performance across different subjects and knowledge areas. To address this issue, we present psybench, the first comprehensive Chinese evaluation suite that covers all the necessary knowledge required for graduate entrance exams. psybench offers a deep evaluation of a model's strengths and weaknesses in psychology through multiple-choice questions. Our findings show significant differences in performance across different sections of a subject, highlighting the risk of skewed results when the knowledge in test sets is not balanced. Notably, only the ChatGPT model reaches an average accuracy above $70\%$, indicating that there is still plenty of room for improvement. We expect that psybench will help to conduct thorough evaluations of base models' strengths and weaknesses and assist in practical application in the field of psychology.

Split Federated Learning (SFL) has recently emerged as a promising distributed learning technology, leveraging the strengths of both federated learning and split learning. It emphasizes the advantages of rapid convergence while addressing privacy concerns. As a result, this innovation has received significant attention from both industry and academia. However, since the model is split at a specific layer, known as a cut layer, into both client-side and server-side models for the SFL, the choice of the cut layer in SFL can have a substantial impact on the energy consumption of clients and their privacy, as it influences the training burden and the output of the client-side models. Moreover, the design challenge of determining the cut layer is highly intricate, primarily due to the inherent heterogeneity in the computing and networking capabilities of clients. In this article, we provide a comprehensive overview of the SFL process and conduct a thorough analysis of energy consumption and privacy. This analysis takes into account the influence of various system parameters on the cut layer selection strategy. Additionally, we provide an illustrative example of the cut layer selection, aiming to minimize the risk of clients from reconstructing the raw data at the server while sustaining energy consumption within the required energy budget, which involve trade-offs. Finally, we address open challenges in this field including their applications to 6G technology. These directions represent promising avenues for future research and development.

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Generalized linear models (GLMs) are popular for data-analysis in almost all quantitative sciences, but the choice of likelihood family and link function is often difficult. This motivates the search for likelihoods and links that minimize the impact of potential misspecification. We perform a large-scale simulation study on double-bounded and lower-bounded response data where we systematically vary both true and assumed likelihoods and links. In contrast to previous studies, we also study posterior calibration and uncertainty metrics in addition to point-estimate accuracy. Our results indicate that certain likelihoods and links can be remarkably robust to misspecification, performing almost on par with their respective true counterparts. Additionally, normal likelihood models with identity link (i.e., linear regression) often achieve calibration comparable to the more structurally faithful alternatives, at least in the studied scenarios. On the basis of our findings, we provide practical suggestions for robust likelihood and link choices in GLMs.

Stochastic PDEs of Fluctuating Hydrodynamics are a powerful tool for the description of fluctuations in many-particle systems. In this paper, we develop and analyze a Multilevel Monte Carlo (MLMC) scheme for the Dean-Kawasaki equation, a pivotal representative of this class of SPDEs. We prove analytically and demonstrate numerically that our MLMC scheme provides a significant speed-up (with respect to a standard Monte Carlo method) in the simulation of the Dean-Kawasaki equation. Specifically, we quantify how the speed-up factor increases as the average particle density increases, and show that sizeable speed-ups can be obtained even in regimes of low particle density. Numerical simulations are provided in the two-dimensional case, confirming our theoretical predictions. Our results are formulated entirely in terms of the law of distributions rather than in terms of strong spatial norms: this crucially allows for MLMC speed-ups altogether despite the Dean-Kawasaki equation being highly singular.

Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects the downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key of better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object detection is a more complex task, and designing specific KD methods for object detection is non-trivial. In this work, we elaborately study the behaviour difference between the teacher and student detection models, and obtain two intriguing observations: First, the teacher and student rank their detected candidate boxes quite differently, which results in their precision discrepancy. Second, there is a considerable gap between the feature response differences and prediction differences between teacher and student, indicating that equally imitating all the feature maps of the teacher is the sub-optimal choice for improving the student's accuracy. Based on the two observations, we propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors, respectively. RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill, which consistently outperforms the traditional soft label distillation. PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy. On MS COCO and PASCAL VOC benchmarks, extensive experiments are conducted on various detectors with different backbones to validate the effectiveness of our method. Specifically, RetinaNet with ResNet50 achieves 40.4% mAP in MS COCO, which is 3.5% higher than its baseline, and also outperforms previous KD methods.

Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.

北京阿比特科技有限公司