In this paper, we present formulations and an exact method to solve the Time-Dependent Traveling Salesman Problem with Time Windows (TD-TSPTW) under a generic travel cost function where waiting is allowed. A particular case in which the travel cost is a non-decreasing function has been addressed recently. Under that assumption, because of both the First-In-First-Out property of the travel time function and the non-decreasing property of the travel cost function, we can ignore the possibility of waiting. However, for generic travel cost functions, waiting after visiting some locations can be part of optimal solutions. To handle the general case, we introduce new lower-bound formulations that allow us to ensure the existence of optimal solutions. We adapt the existing algorithm for the TD-TSPTW with non-decreasing travel costs to solve the TD-TSPTW with generic travel costs. In the experiments, we evaluate the strength of the proposed lower-bound formulations and algorithm by applying them to the TD-TSPTW with the total travel time objective. The results indicate that the proposed algorithm is competitive with, and even outperforms, the state-of-the-art solver on various benchmark instances.
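The role of waiting described in the TD-TSPTW abstract above can be illustrated with a small sketch. It is not the paper's formulation or algorithm: the function `best_departure` and both cost functions are hypothetical, and time is discretized to integers purely for illustration. With a non-decreasing cost, leaving as early as possible is optimal; with a generic cost, delaying the departure can be cheaper.

```python
def best_departure(earliest, latest, cost):
    """Return (departure_time, cost) with minimal cost over integer
    departure times in [earliest, latest]. Ties go to the earliest time."""
    candidates = ((t, cost(t)) for t in range(earliest, latest + 1))
    return min(candidates, key=lambda pair: pair[1])


def non_decreasing_cost(t):
    # Hypothetical arc cost that never decreases with the departure time.
    return 3 + t


def rush_hour_cost(t):
    # Hypothetical generic arc cost: expensive before t = 5, cheaper afterwards.
    return 10 if t < 5 else 4


# The vehicle is ready at time 2 and may depart no later than time 8.
print(best_departure(2, 8, non_decreasing_cost))  # departs immediately
print(best_departure(2, 8, rush_hour_cost))       # waits until the cost drops
```

In the second call the cheapest option is to wait three time units before departing, which is the behaviour a model restricted to immediate departures cannot represent.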
In this paper, we conducted a comparative evaluation of three RGB-D SLAM (Simultaneous Localization and Mapping) algorithms, RTAB-Map, ORB-SLAM3, and OpenVSLAM, for localization and mapping on the SURENA-V humanoid robot. In our test, the robot followed a full circular path with an Intel RealSense D435 RGB-D camera mounted on its head. In terms of localization accuracy, ORB-SLAM3 outperformed the others with an ATE of 0.1073, followed by RTAB-Map at 0.1641 and OpenVSLAM at 0.1847. However, both ORB-SLAM3 and OpenVSLAM had difficulty maintaining accurate odometry when the robot faced a wall with few feature points. Nevertheless, OpenVSLAM was able to detect loop closures and successfully relocalize itself within the map when the robot approached its initial location. The investigation also covered mapping capabilities, where RTAB-Map excelled by offering diverse mapping outputs, including dense, OctoMap, and occupancy grid maps. In contrast, both ORB-SLAM3 and OpenVSLAM provided only sparse maps.
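As a reading aid for the ATE values quoted in the SLAM comparison above, the following is a minimal sketch of how an absolute trajectory error is commonly computed as a root-mean-square translational error. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame; the alignment step used in practice is omitted, and the abstract does not state which exact procedure was used. The function name `ate_rmse` is hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two trajectories.

    Both arguments are equally long sequences of (x, y, z) positions that
    are assumed to be time-associated and already aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two poses: the first matches exactly, the second is off by 2 along z.
estimate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(ate_rmse(estimate, reference))
```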
This paper presents the design and implementation of a Right Invariant Extended Kalman Filter (RIEKF) for estimating the states of the kinematic base of the Surena V humanoid robot. The state of the robot is defined on the Lie group $SE_4(3)$, encompassing the position, velocity, and orientation of the base, as well as the positions of the left and right feet. In addition, we incorporate the IMU biases as concatenated states within the filter. The prediction step of the RIEKF uses the IMU equations, while the update step incorporates forward kinematics. To evaluate the performance of the RIEKF, we conducted experiments using the Choreonoid dynamic simulation framework and compared it against a Quaternion-based Extended Kalman Filter (QEKF). The results demonstrate that the RIEKF exhibits reduced drift in localization and converges in a shorter time than the QEKF. These findings highlight the effectiveness of the proposed RIEKF for accurate state estimation of the kinematic base in humanoid robotics.
In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations. In high-frequency finance, we often observe that noise dominates the signal of the underlying true process. Thus, we cannot apply the usual regression procedures to analyze noisy high-frequency observations. To handle this issue, we first employ a smoothing method for the observed variables. However, the smoothed variables still contain non-negligible noise. To manage this non-negligible noise and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator using debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call the resulting estimator the Thresholded dEbiased Nonconvex LASSO (TEN-LASSO) estimator. Furthermore, this paper establishes the concentration properties of the TEN-LASSO estimator and discusses a nonconvex optimization algorithm.
In this paper, we introduce $\textbf{GS-SLAM}$, which is the first to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians or deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct the whole scene rather than to synthesize a static object, as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations for optimizing the camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released soon.
In this paper, we provide a novel enumeration algorithm for the set of all walks of a given length within a directed graph. Our algorithm has worst-case constant delay between outputting succinct representations of such walks, after a preprocessing step that takes time linear in the size of the graph. We apply these results to the problem of enumerating succinct representations of the strings of a given length from a prefix-closed regular language (a language accepted by a finite automaton in which every state is final).
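To make the enumeration problem in the abstract above concrete, here is a naive depth-first baseline that lists every walk with a given number of edges. It is only a sketch of the problem statement: it does not achieve the constant delay or the succinct output representation claimed for the paper's algorithm, and the adjacency-list format and the name `walks` are assumptions made for illustration.

```python
def walks(adjacency, length):
    """Yield every walk with exactly `length` edges as a tuple of vertices.

    `adjacency` maps each vertex to a list of its out-neighbours. This is a
    plain depth-first enumeration; it may spend time on dead ends, so it
    gives no constant-delay guarantee.
    """
    def extend(walk, remaining):
        if remaining == 0:
            yield tuple(walk)
            return
        for successor in adjacency.get(walk[-1], ()):
            walk.append(successor)
            yield from extend(walk, remaining - 1)
            walk.pop()

    for start in adjacency:
        yield from extend([start], length)


graph = {"a": ["b"], "b": ["a", "c"], "c": []}
for walk in walks(graph, 2):
    print(walk)
```

Reading each edge label of an automaton instead of each vertex would turn the same traversal into an enumeration of the strings of a fixed length in a prefix-closed language, which is the application the abstract mentions.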
RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution variations. Recent methods directly rebuild color mapping and spatial structure via a shared deep representation, which limits performance. Inspired by the Image Signal Processing (ISP) pipeline, which distinguishes image restoration from enhancement, we present a novel Neural ISP framework, named FourierISP. This approach breaks the image down into style and structure within the frequency domain, allowing for independent optimization. FourierISP comprises three subnetworks: Phase Enhance Subnet for structural refinement, Amplitude Refine Subnet for color learning, and Color Adaptation Subnet for blending them in a smooth manner. This approach sharpens both color and structure, and extensive evaluations across varied datasets confirm that our approach realizes state-of-the-art results. Code will be available at ~\url{//github.com/alexhe101/FourierISP}.
In Part II of this two-part paper, we prove the convergence of the simplified information geometry approach (SIGA) proposed in Part I. For a general Bayesian inference problem, we first show that the iteration of the common second-order natural parameter (SONP) is separated from that of the common first-order natural parameter (FONP). Hence, the convergence of the common SONP can be checked independently. We show that when the initialization lies within a specific but large range, the common SONP converges regardless of the value of the damping factor. For the common FONP, we establish a sufficient condition for its convergence and prove that the convergence of the common FONP depends on the spectral radius of a particular matrix related to the damping factor. We give the range of the damping factor that guarantees convergence in the worst case. Further, we determine the range of the damping factor for massive MIMO-OFDM channel estimation by using the specific properties of the measurement matrices. Simulation results are provided to confirm the theoretical results.
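The dependence of convergence on the spectral radius of a damping-related matrix, mentioned in the abstract above, can be illustrated with a generic damped linear iteration $x_{k+1} = (1-d)\,x_k + d\,(A x_k + b)$, whose iteration matrix is $(1-d)I + dA$. This is a textbook sketch, not the SIGA update or the matrix analysed in the paper; the matrix `A`, the helper names, and the restriction to the 2x2 case are all assumptions made for illustration.

```python
import cmath


def damped_matrix(a, damping):
    """Return (1 - d) * I + d * A for a 2x2 matrix A given as nested lists."""
    return [
        [(1.0 - damping) * (1.0 if i == j else 0.0) + damping * a[i][j]
         for j in range(2)]
        for i in range(2)
    ]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 matrix via its characteristic polynomial."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


# Hypothetical update matrix with eigenvalues +/- 1.5i (spectral radius 1.5).
A = [[0.0, -1.5], [1.5, 0.0]]

for d in (1.0, 0.5):
    rho = spectral_radius_2x2(damped_matrix(A, d))
    print(d, rho, "converges" if rho < 1.0 else "diverges")
```

Without damping ($d = 1$) the spectral radius exceeds one and the linear iteration diverges, whereas $d = 0.5$ brings it below one, so the damped iteration converges for this particular matrix.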
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
In this paper, we proposed applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation to unseen target languages, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens new research directions in applying meta-learning to more speech-related applications.
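The idea of meta-learning an initialization, as described in the MetaASR abstract above, can be sketched on a toy problem. The example below is not the MetaASR system: each "task" is a hypothetical scalar quadratic loss $L_c(\theta) = (\theta - c)^2$ standing in for one pretraining language, and the step sizes and names are assumptions. It follows the MAML pattern of one inner gradient step per task and an outer update that differentiates through that step, which can be done analytically here.

```python
def maml_scalar(task_centers, inner_lr=0.1, outer_lr=0.1, steps=200, theta=0.0):
    """Meta-learn a scalar initialization for tasks L_c(theta) = (theta - c)^2.

    Inner step:  theta_c = theta - inner_lr * 2 * (theta - c)
    Outer loss:  L_c(theta_c) = ((1 - 2 * inner_lr) * (theta - c)) ** 2
    The outer gradient below is the exact derivative of that loss with
    respect to theta, i.e. it backpropagates through the inner step.
    """
    shrink = 1.0 - 2.0 * inner_lr
    for _ in range(steps):
        meta_gradient = sum(
            2.0 * shrink * shrink * (theta - c) for c in task_centers
        ) / len(task_centers)
        theta -= outer_lr * meta_gradient
    return theta


def adapt(theta, center, inner_lr=0.1):
    """One inner gradient step from the initialization on a single task."""
    return theta - inner_lr * 2.0 * (theta - center)


centers = [1.0, 3.0, 5.0]            # hypothetical pretraining tasks
initialization = maml_scalar(centers)
print(initialization)                 # meta-learned starting point
print(adapt(initialization, 4.0))     # one-step adaptation to an unseen task
```

For these symmetric quadratic tasks the learned initialization settles at the mean of the task optima, from which a single gradient step moves toward any new task's optimum.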
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.