亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper addresses the challenge of cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors e.g., DE-ViT~\cite{zhang2023detect} have excelled in both open-vocabulary object detection and traditional few-shot object detection, detecting categories beyond those seen during training, we thus naturally raise two key questions: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If no, how to enhance the results of open-set methods when faced with significant domain gaps? To address the first question, we introduce several metrics to quantify domain variances and establish a new CD-FSOD benchmark with diverse domain metric values. Some State-Of-The-Art (SOTA) open-set object detection methods are evaluated on this benchmark, with evident performance degradation observed across out-of-domain datasets. This indicates the failure of adopting open-set detectors directly for CD-FSOD. Sequentially, to overcome the performance degradation issue and also to answer the second proposed question, we endeavor to enhance the vanilla DE-ViT. With several novel components including finetuning, a learnable prototype module, and a lightweight attention module, we present an improved Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO). Experiments show that our CD-ViTO achieves impressive results on both out-of-domain and in-domain target datasets, establishing new SOTAs for both CD-FSOD and FSOD. All the datasets, codes, and models will be released to the community.

相關內容

小樣本學習(Few-Shot Learning,以下簡稱 FSL )用于解決當可用的數據量比較少時,如何提升神經網絡的性能。在 FSL 中,經常用到的一類方法被稱為 Meta-learning。和普通的神經網絡的訓練方法一樣,Meta-learning 也包含訓練過程和測試過程,但是它的訓練過程被稱作 Meta-training 和 Meta-testing。

Multi-robot rendezvous and exploration are fundamental challenges in the domain of mobile robotic systems. This paper addresses multi-robot rendezvous within an initially unknown environment where communication is only possible after the rendezvous. Traditionally, exploration has been focused on rapidly mapping the environment, often leading to suboptimal rendezvous performance in later stages. We adapt a standard frontier-based exploration technique to integrate exploration and rendezvous into a unified strategy, with a mechanism that allows robots to re-visit previously explored regions thus enhancing rendezvous opportunities. We validate our approach in 3D realistic simulations using ROS, showcasing its effectiveness in achieving faster rendezvous times compared to exploration strategies.

We present a novel approach to automatically synthesize "wayfinding instructions" for an embodied robot agent. In contrast to prior approaches that are heavily reliant on human-annotated datasets designed exclusively for specific simulation platforms, our algorithm uses in-context learning to condition an LLM to generate instructions using just a few references. Using an LLM-based Visual Question Answering strategy, we gather detailed information about the environment which is used by the LLM for instruction synthesis. We implement our approach on multiple simulation platforms including Matterport3D, AI Habitat and ThreeDWorld, thereby demonstrating its platform-agnostic nature. We subjectively evaluate our approach via a user study and observe that 83.3% of users find the synthesized instructions accurately capture the details of the environment and show characteristics similar to those of human-generated instructions. Further, we conduct zero-shot navigation with multiple approaches on the REVERIE dataset using the generated instructions, and observe very close correlation with the baseline on standard success metrics (< 1% change in SR), quantifying the viability of generated instructions in replacing human-annotated data. To the best of our knowledge, ours is the first LLM-driven approach capable of generating "human-like" instructions in a platform-agnostic manner, without requiring any form of training.

Utilizing unmanned aerial vehicles (UAVs) with edge server to assist terrestrial mobile edge computing (MEC) has attracted tremendous attention. Nevertheless, state-of-the-art schemes based on deterministic optimizations or single-objective reinforcement learning (RL) cannot reduce the backlog of task bits and simultaneously improve energy efficiency in highly dynamic network environments, where the design problem amounts to a sequential decision-making problem. In order to address the aforementioned problems, as well as the curses of dimensionality introduced by the growing number of terrestrial terrestrial users, this paper proposes a distributed multi-objective (MO) dynamic trajectory planning and offloading scheduling scheme, integrated with MORL and the kernel method. The design of n-step return is also applied to average fluctuations in the backlog. Numerical results reveal that the n-step return can benefit the proposed kernel-based approach, achieving significant improvement in the long-term average backlog performance, compared to the conventional 1-step return design. Due to such design and the kernel-based neural network, to which decision-making features can be continuously added, the kernel-based approach can outperform the approach based on fully-connected deep neural network, yielding improvement in energy consumption and the backlog performance, as well as a significant reduction in decision-making and online learning time.

We introduce a new class of hardware trojans called interrupt-resilient trojans (IRTs). Our work is motivated by the observation that hardware trojan attacks on CPUs, even under favorable attack scenarios (e.g., an attacker with local system access), are affected by unpredictability due to non-deterministic context switching events. As we confirm experimentally, these events can lead to race conditions between trigger signals and the CPU events targeted by the trojan payloads (e.g., a CPU memory access), thus affecting the reliability of the attacks. Our work shows that interrupt-resilient trojans can successfully address the problem of non-deterministic triggering in CPUs, thereby providing high reliability guarantees in the implementation of sophisticated hardware trojan attacks. Specifically, we successfully utilize IRTs in different attack scenarios against a Linux-capable CPU design and showcase its resilience against context-switching events. More importantly, we show that our design allows for seamless integration during fabrication stage attacks.We evaluate different strategies for the implementation of our attacks on a tape-out ready high-speed RISC-V microarchitecture in a 28nm commercial technology process and successfully implement them with an average overhead delay of only 20 picoseconds, while leaving the sign-off characteristics of the layout intact. In doing so, we challenge the common wisdom regarding the low flexibility of late supply chain stages (e.g., fabrication) for the insertion of powerful trojans. To promote further research on microprocessor trojans, we open-source our designs and provide the accompanying supporting software logic.

This paper presents a novel framework for continual feature selection (CFS) in data preprocessing, particularly in the context of an open and dynamic environment where unknown classes may emerge. CFS encounters two primary challenges: the discovery of unknown knowledge and the transfer of known knowledge. To this end, the proposed CFS method combines the strengths of continual learning (CL) with granular-ball computing (GBC), which focuses on constructing a granular-ball knowledge base to detect unknown classes and facilitate the transfer of previously learned knowledge for further feature selection. CFS consists of two stages: initial learning and open learning. The former aims to establish an initial knowledge base through multi-granularity representation using granular-balls. The latter utilizes prior granular-ball knowledge to identify unknowns, updates the knowledge base for granular-ball knowledge transfer, reinforces old knowledge, and integrates new knowledge. Subsequently, we devise an optimal feature subset mechanism that incorporates minimal new features into the existing optimal subset, often yielding superior results during each period. Extensive experimental results on public benchmark datasets demonstrate our method's superiority in terms of both effectiveness and efficiency compared to state-of-the-art feature selection methods.

In recent research, significant attention has been devoted to the open-vocabulary object detection task, aiming to generalize beyond the limited number of classes labeled during training and detect objects described by arbitrary category names at inference. Compared with conventional object detection, open vocabulary object detection largely extends the object detection categories. However, it relies on calculating the similarity between image regions and a set of arbitrary category names with a pretrained vision-and-language model. This implies that, despite its open-set nature, the task still needs the predefined object categories during the inference stage. This raises the question: What if we do not have exact knowledge of object categories during inference? In this paper, we call such a new setting as generative open-ended object detection, which is a more general and practical problem. To address it, we formulate object detection as a generative problem and propose a simple framework named GenerateU, which can detect dense objects and generate their names in a free-form way. Particularly, we employ Deformable DETR as a region proposal generator with a language model translating visual regions to object names. To assess the free-form object detection task, we introduce an evaluation method designed to quantitatively measure the performance of generative outcomes. Extensive experiments demonstrate strong zero-shot detection performance of our GenerateU. For example, on the LVIS dataset, our GenerateU achieves comparable results to the open-vocabulary object detection method GLIP, even though the category names are not seen by GenerateU during inference. Code is available at: // github.com/FoundationVision/GenerateU .

Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%.Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to "think step by step", our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at //github.com/NKU-HLT/Role-Play-Prompting.

Source-free domain adaptation (SFDA) aims to adapt a source-trained model to an unlabeled target domain without access to the source data. SFDA has attracted growing attention in recent years, where existing approaches focus on self-training that usually includes pseudo-labeling techniques. In this paper, we introduce a novel noise-learning approach tailored to address noise distribution in domain adaptation settings and learn to de-confuse the pseudo-labels. More specifically, we learn a noise transition matrix of the pseudo-labels to capture the label corruption of each class and learn the underlying true label distribution. Estimating the noise transition matrix enables a better true class-posterior estimation, resulting in better prediction accuracy. We demonstrate the effectiveness of our approach when combined with several SFDA methods: SHOT, SHOT++, and AaD. We obtain state-of-the-art results on three domain adaptation datasets: VisDA, DomainNet, and OfficeHome.

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[//github.com/phecy/SSL-FEW-SHOT.]{//github.com/phecy/ssl-few-shot.}

Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines.

北京阿比特科技有限公司