Robust and accurate localization in challenging environments is becoming crucial for SLAM. In this paper, we propose a unique sensor configuration for precise and robust odometry by integrating chip radar and a legged robot. Specifically, we introduce a tightly coupled radar-leg odometry algorithm for complementary drift correction. Applying 4-DoF optimization and decoupled RANSAC to mmWave chip radar significantly improves radar odometry over the existing method, especially in the z-direction, even when using a single radar. For the leg odometry, we employ rolling contact modeling-aided forward kinematics, accommodating scenarios with possible contact drift and radar failure. We evaluate our method by comparing it with other chip radar odometry algorithms on real-world datasets from diverse environments; the datasets will be released for the robotics community. //github.com/SangwooJung98/Co-RaL-Dataset
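As a rough illustration of how RANSAC can be applied to chip radar returns, the sketch below estimates the sensor velocity from per-point Doppler readings. It is plain three-point RANSAC, not the decoupled variant or the 4-DoF optimization described in the abstract. The measurement model (`doppler = -d . v` for static points), the function name `ransac_ego_velocity`, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def ransac_ego_velocity(directions, dopplers, iters=200, threshold=0.05, seed=0):
    """Estimate sensor velocity from Doppler readings with plain RANSAC.

    Assumed model (not taken from the paper): a static point seen along the
    unit direction d returns radial speed doppler = -d . v, where v is the
    sensor velocity expressed in the sensor frame.
    """
    rng = np.random.default_rng(seed)
    n = len(dopplers)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Three points give a minimal 3x3 linear system for v.
        idx = rng.choice(n, size=3, replace=False)
        try:
            v = np.linalg.solve(-directions[idx], dopplers[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate sample, draw again
        inliers = np.abs(directions @ v + dopplers) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consistent velocity hypothesis found")
    # Refit on the consensus set with linear least squares.
    v, *_ = np.linalg.lstsq(
        -directions[best_inliers], dopplers[best_inliers], rcond=None
    )
    return v, best_inliers
```

Points on moving objects or multipath returns violate the static-point model and are rejected as outliers, which is why a consensus step is useful before the final least-squares fit.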
Despite the considerable advancements achieved by deep neural networks, their performance tends to degenerate when the test environment diverges from the training ones. Domain generalization (DG) solves this issue by learning representations independent of domain-related information, thus facilitating extrapolation to unseen environments. Existing approaches typically focus on formulating tailored training objectives to extract shared features from the source data. However, the disjointed training and testing procedures may compromise robustness, particularly in the face of unforeseen variations during deployment. In this paper, we propose a novel and holistic framework based on causality, named InPer, designed to enhance model generalization by incorporating causal intervention during training and causal perturbation during testing. Specifically, during the training phase, we employ entropy-based causal intervention (EnIn) to refine the selection of causal variables. To identify samples with anti-interference causal variables from the target domain, we propose a novel metric, homeostatic score, through causal perturbation (HoPer) to construct a prototype classifier in test time. Experimental results across multiple cross-domain tasks confirm the efficacy of InPer.
In this letter, we analyze the performance of a mixed coherent and non-coherent transmission approach, which can improve the performance of cell-free massive multiple-input multiple-output orthogonal frequency division multiplexing (CF mMIMO-OFDM) systems under asynchronous reception. To this end, we first derive the achievable downlink sum-rate for mixed coherent and non-coherent transmission, and then provide a closed-form expression for the case of maximum ratio precoding. Subsequently, we propose an efficient clustering algorithm that groups access points into clusters, with the same quantized phase shift within each cluster. Numerical results demonstrate that mixed coherent and non-coherent transmission can effectively improve the sum-rate of CF mMIMO-OFDM systems under asynchronous reception.
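To make the clustering idea concrete, the sketch below groups access points whose phase shifts fall on the same quantization level. This is only an illustration of "same quantized phase shift in each cluster", not the letter's algorithm. The function name `cluster_by_quantized_phase`, the uniform quantizer, and the choice of rounding to the nearest level are all assumptions.

```python
import math
from collections import defaultdict


def cluster_by_quantized_phase(phases, num_levels):
    """Group access points (APs) by uniformly quantized phase shift.

    phases: per-AP phase shift in radians (hypothetical input, e.g. caused
            by a propagation delay difference).
    num_levels: number of uniform quantization levels on [0, 2*pi).
    Returns a dict mapping quantization level -> list of AP indices.
    """
    step = 2 * math.pi / num_levels
    clusters = defaultdict(list)
    for ap, phase in enumerate(phases):
        wrapped = phase % (2 * math.pi)  # map to [0, 2*pi)
        # Round to the nearest level; the top level wraps back to level 0.
        level = int(round(wrapped / step)) % num_levels
        clusters[level].append(ap)
    return dict(clusters)


clusters = cluster_by_quantized_phase([0.1, 6.2, 1.6, 1.5, 3.1], num_levels=4)
```

Under these assumptions, APs inside one cluster could transmit coherently, while signals from different clusters would be combined non-coherently.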
In this paper, we propose a new distillation method for extracting knowledge from Large Foundation Models (LFMs) into lightweight models, introducing a novel supervision mode that does not require manually annotated data. While LFMs exhibit exceptional zero-shot classification abilities across datasets, relying solely on LFM-generated embeddings for distillation poses two main challenges: the LFM's task-irrelevant knowledge and the high density of features. The transfer of task-irrelevant knowledge could compromise the student model's discriminative capabilities, and the high density of features within target domains obstructs the extraction of discriminative knowledge essential for the task. To address these issues, we introduce the Proxy Relational Graph (PRG) method. We first extract task-relevant knowledge from LFMs by calculating a weighted average of logits obtained through text prompt embeddings. We then construct sample-class proxy graphs for the LFM and the student model, respectively, to model the correlation between samples and class proxies. Finally, we distill selective knowledge by aligning the relational graphs produced by the LFM and the student model. Specifically, the distillation from the LFM to the student model is achieved through two types of alignment: 1) aligning the sample nodes produced by the student model with those produced by the LFM, and 2) aligning the edge relationships in the student model's graph with those in the LFM's graph. Our experimental results validate the effectiveness of PRG, demonstrating its ability to leverage the extensive knowledge base of LFMs while circumventing their inherent limitations in focused learning scenarios. Notably, in our annotation-free framework, PRG achieves an accuracy of 76.23\% (T: 77.9\%) on CIFAR-100 and 72.44\% (T: 75.3\%) on ImageNet-1K.
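The two alignment terms can be sketched as follows. This is a minimal illustration under stated assumptions, not the PRG implementation: edges are taken to be a softmax over cosine similarities between samples and class proxies, node alignment is a cosine distance, edge alignment is a KL divergence, and teacher and student embeddings are assumed to share one dimension. All function names and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def edge_weights(samples, proxies, temperature=0.1):
    """Sample-to-class-proxy edges: row-wise softmax of cosine similarity."""
    sims = l2_normalize(samples) @ l2_normalize(proxies).T / temperature
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(sims)
    return exp / exp.sum(axis=1, keepdims=True)


def node_alignment_loss(student_nodes, teacher_nodes):
    """Mean cosine distance between matching student and teacher samples."""
    cos = np.sum(l2_normalize(student_nodes) * l2_normalize(teacher_nodes), axis=1)
    return float(np.mean(1.0 - cos))


def edge_alignment_loss(student_edges, teacher_edges, eps=1e-12):
    """Mean KL(teacher || student) over per-sample edge distributions."""
    kl = np.sum(
        teacher_edges * (np.log(teacher_edges + eps) - np.log(student_edges + eps)),
        axis=1,
    )
    return float(np.mean(kl))


rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))   # stand-in for LFM sample embeddings
student = rng.normal(size=(8, 16))   # stand-in for student sample embeddings
proxies = rng.normal(size=(5, 16))   # stand-in for class proxies
loss = node_alignment_loss(student, teacher) + edge_alignment_loss(
    edge_weights(student, proxies), edge_weights(teacher, proxies)
)
```

In a training loop the summed loss would be minimized with respect to the student parameters; no labels appear anywhere in the computation, which matches the annotation-free setting.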
In this paper, we employ a Bayesian approach to assess the reliability of a critical component in the Mars Sample Return program, focusing on the Earth Entry System's risk of containment not assured upon reentry. Our study uses Gaussian Process modeling under a Bayesian regime to analyze the Earth Entry System's resilience against operational stress. This Bayesian framework allows for a detailed probabilistic evaluation of the risk of containment not assured, indicating the feasibility of meeting the mission's stringent safety goal of 0.999999 probability of success. The findings underscore the effectiveness of Bayesian methods for complex uncertainty quantification analyses of computer simulations, providing valuable insights for computational reliability analysis in a risk-averse setting.
In this paper, we introduce the problem of Online Matching with Delays and Size-based Costs (OMDSC). The OMDSC problem involves $m$ requests arriving online. At any time, a group can be formed by matching any number of these requests that have been received but are still unmatched. The cost associated with each group is determined by the waiting time for each request within the group and a size-dependent cost. Our goal is to partition all incoming requests into multiple groups while minimizing the total associated cost. The problem extends the TCP acknowledgment problem proposed by Dooly et al. (JACM 2001). It generalizes the cost model for sending acknowledgments. This paper reveals the competitive ratios for a fundamental case where the range of the penalty function is limited to $0$ and $1$. We classify such penalty functions into three distinct cases: (i) a fixed penalty of $1$ regardless of group size, (ii) a penalty of $0$ if and only if the group size is a multiple of a specific integer $k$, and (iii) other situations. The problem of case (i) is equivalent to the TCP acknowledgment problem, for which Dooly et al. proposed a $2$-competitive algorithm. For case (ii), we first show that natural algorithms that match all the remaining requests are $\Omega(\sqrt{k})$-competitive. We then propose an $O(\log k / \log \log k)$-competitive deterministic algorithm by carefully managing match size and timing, and we also prove its optimality. For case (iii), we demonstrate the non-existence of a competitive online algorithm. Additionally, we discuss competitive ratios for other typical penalty functions.
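The cost model can be illustrated with a small calculator for a given grouping. The sketch assumes that a group's waiting cost is the sum over its requests of (match time minus arrival time) and that the size-dependent cost is added once per group; the representation of a group as a `(match_time, arrivals)` pair and both function names are hypothetical. The penalty shown is case (ii), with cost 0 if and only if the group size is a multiple of k.

```python
from typing import Callable, List, Sequence, Tuple

Group = Tuple[float, Sequence[float]]  # (match time, arrival times in the group)


def penalty_multiple_of_k(k: int) -> Callable[[int], int]:
    """Case (ii): penalty 0 iff the group size is a multiple of k, else 1."""
    def penalty(size: int) -> int:
        return 0 if size % k == 0 else 1
    return penalty


def total_cost(groups: List[Group], penalty: Callable[[int], int]) -> float:
    """Sum of waiting times plus the size-based penalty of every group."""
    cost = 0.0
    for match_time, arrivals in groups:
        if not arrivals:
            raise ValueError("a group must contain at least one request")
        if match_time < max(arrivals):
            raise ValueError("a group cannot be matched before its last arrival")
        cost += sum(match_time - a for a in arrivals)  # waiting cost
        cost += penalty(len(arrivals))                 # size-based cost
    return cost


penalty = penalty_multiple_of_k(3)
# Requests arrive at times 0.0, 0.5, 1.0 and 4.0.
wait_for_all = total_cost([(4.0, [0.0, 0.5, 1.0, 4.0])], penalty)
split_early = total_cost([(1.0, [0.0, 0.5, 1.0]), (4.0, [4.0])], penalty)
```

Here `wait_for_all` is 11.5 (waiting 10.5 plus penalty 1) and `split_early` is 2.5 (waiting 1.5, penalty 0 for the group of three, penalty 1 for the singleton), which shows why an online algorithm has to balance match size against timing.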
Novel View Synthesis (NVS) and 3D generation have recently achieved prominent improvements. However, these works mainly focus on confined categories or synthetic 3D assets, so they generalize poorly to challenging in-the-wild scenes and cannot be employed with 2D synthesis directly. Moreover, these methods depend heavily on camera poses, limiting their real-world applications. To overcome these issues, we propose MVInpainter, which reformulates 3D editing as a multi-view 2D inpainting task. Specifically, MVInpainter partially inpaints multi-view images with reference guidance rather than intractably generating an entirely novel view from scratch, which greatly reduces the difficulty of in-the-wild NVS and leverages unmasked clues instead of explicit pose conditions. To ensure cross-view consistency, MVInpainter is enhanced by video priors from motion components and appearance guidance from concatenated reference key and value attention. Furthermore, MVInpainter incorporates slot attention to aggregate high-level optical flow features from unmasked regions to control the camera movement with pose-free training and inference. Extensive scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter across diverse tasks, such as multi-view object removal, synthesis, insertion, and replacement. The project page is //ewrfcas.github.io/MVInpainter/.
In this paper, we propose two mixed precision algorithms for the Block-Jacobi preconditioner (BJAC): a fixed low precision strategy and an adaptive precision strategy. We evaluate the performance improvement of the proposed mixed precision BJAC preconditioners combined with the preconditioned conjugate gradient (PCG) algorithm on problems including diffusion equations and radiation hydrodynamics equations. Numerical results show that, compared to the uniform high precision PCG algorithm, the mixed precision preconditioners achieve speedups of 1.3x to 1.8x without sacrificing accuracy. Furthermore, we observe convergence delay in some test cases for the mixed precision preconditioners, and further analyze the matrix features associated with this behavior.
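A fixed low precision strategy can be sketched as follows: the inverted diagonal blocks are stored and applied in a low precision, while the conjugate gradient iteration itself stays in double precision. This is a minimal dense NumPy illustration under those assumptions, not the paper's implementation; the function names, the block size, and the 1D diffusion test matrix are hypothetical, and the adaptive strategy is not shown.

```python
import numpy as np


def block_jacobi_inverses(A, block_size, dtype):
    """Invert each diagonal block in float64, then store it in `dtype`."""
    n = A.shape[0]
    inverses = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        inverses.append(np.linalg.inv(A[start:stop, start:stop]).astype(dtype))
    return inverses


def apply_preconditioner(inverses, r):
    """z = M^{-1} r, computed blockwise in the storage precision."""
    z = np.empty_like(r)  # float64 result
    start = 0
    for inv in inverses:
        stop = start + inv.shape[0]
        # The residual slice is cast down; the product is cast back up.
        z[start:stop] = inv @ r[start:stop].astype(inv.dtype)
        start = stop
    return z


def pcg(A, b, inverses, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient with a float64 outer iteration."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(inverses, r)
    p = z.copy()
    rz = r @ z
    for iteration in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, iteration
        z = apply_preconditioner(inverses, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


n = 32
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D diffusion stencil
b = np.ones(n)
x_low, iters_low = pcg(A, b, block_jacobi_inverses(A, 4, np.float32))
x_high, iters_high = pcg(A, b, block_jacobi_inverses(A, 4, np.float64))
```

Comparing `iters_low` with `iters_high` on a given matrix is one simple way to look for the convergence delay mentioned in the abstract, since the low precision preconditioner may need additional iterations to reach the same residual tolerance.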
In this paper, we consider deploying multiple Unmanned Aerial Vehicles (UAVs) to enhance the computation service of Mobile Edge Computing (MEC) through collaborative computation among UAVs. In particular, tasks with different types and service requirements in the MEC network are offloaded from one UAV to another. To pursue the goal of low-carbon edge computing, we study the problem of minimizing system energy consumption by jointly optimizing computation resource allocation, task scheduling, service placement, and UAV trajectories. This problem is challenging because task generation is inherently unpredictable and wireless fading channels are dynamic. To overcome this issue, we reformulate the complicated non-convex problem as a Markov decision process and propose a soft actor-critic-based trajectory optimization and resource allocation algorithm to implement a flexible learning strategy. Numerical results illustrate that, within a multi-UAV-enabled MEC network, the proposed algorithm effectively reduces the system energy consumption in scenarios with heterogeneous tasks and services compared to other baseline solutions.
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning algorithm (MAML). We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications.
In this paper, we consider an interesting problem: salient instance segmentation. In addition to producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single-stage salient instance segmentation framework with a novel segmentation branch. Our new branch considers not only the local context inside each detection window but also its surrounding context, enabling us to distinguish instances in the same scope even under occlusion. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320x320). We evaluate our approach on a publicly available benchmark and show that it outperforms alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the function of each part of our network. The source code can be found at \url{//github.com/RuochenFan/S4Net}.