This paper addresses the critical task of gait cycle segmentation using short sequences from ear-worn IMUs, a practical and non-invasive approach for home-based monitoring and rehabilitation of patients with impaired motor function. While previous studies have focused on IMUs positioned on the lower limbs, ear-worn IMUs offer a unique advantage in capturing gait dynamics with minimal intrusion. To address the challenges of gait cycle segmentation using short sequences, we introduce the Gait Characteristic Curve Regression and Restoration (GCCRR) method, a novel two-stage approach designed for fine-grained gait phase segmentation. The first stage transforms the segmentation task into a regression task on the Gait Characteristic Curve (GCC), which is a one-dimensional feature sequence incorporating periodic information. The second stage restores the gait cycle using peak detection techniques. Our method employs Bi-LSTM-based deep learning algorithms for regression to ensure reliable segmentation for short gait sequences. Evaluation on the HamlynGait dataset demonstrates that GCCRR achieves over 80\% Accuracy, with a Timestamp Error below one sampling interval. Despite its promising results, the performance lags behind methods using more extensive sensor systems, highlighting the need for larger, more diverse datasets. Future work will focus on data augmentation using motion capture systems and improving algorithmic generalizability.
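As a rough illustration of the second (restoration) stage, the sketch below recovers gait-cycle boundaries from an already-regressed GCC via peak detection. The first-stage Bi-LSTM regressor is not shown, and the sampling rate, synthetic curve, and `min_cycle_gap` threshold are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of GCC restoration by peak detection (stage two of GCCRR).
# Assumes the stage-one Bi-LSTM has already produced the 1-D curve `gcc`.
import numpy as np
from scipy.signal import find_peaks

def restore_gait_cycles(gcc: np.ndarray, fs: float, min_cycle_gap: float = 0.6):
    """Return sample indices of gait-cycle boundaries found as peaks of the GCC.

    gcc: regressed gait characteristic curve (one value per IMU sample).
    fs: sampling rate in Hz; min_cycle_gap: assumed minimum cycle duration (s).
    """
    distance = int(min_cycle_gap * fs)           # forbid implausibly short cycles
    peaks, _ = find_peaks(gcc, distance=distance)
    return peaks

# Example with a synthetic periodic GCC sampled at 100 Hz (~1 cycle per second).
fs = 100.0
t = np.arange(0, 5, 1.0 / fs)
gcc = 0.5 * (1 + np.sin(2 * np.pi * 1.0 * t))
print(restore_gait_cycles(gcc, fs))              # roughly one boundary per cycle
```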
We present Tachis, a higher-order separation logic to reason about the expected cost of probabilistic programs. Inspired by the uses of time credits for reasoning about the running time of deterministic programs, we introduce a novel notion of probabilistic cost credit. Probabilistic cost credits are a separation logic resource that can be used to pay for the cost of operations in programs, and that can be distributed across all possible branches of sampling instructions according to their weight, thus enabling us to reason about expected cost. The representation of cost credits as separation logic resources gives Tachis a great deal of flexibility and expressivity. In particular, it permits reasoning about amortized expected cost by storing excess credits as potential into data structures to pay for future operations. Tachis further supports a range of cost models, including running time and entropy usage. We showcase the versatility of this approach by applying our techniques to prove upper bounds on the expected cost of a variety of probabilistic algorithms and data structures, including randomized quicksort, hash tables, and meldable heaps. All of our results have been mechanized using Coq, Iris, and the Coquelicot real analysis library.
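To convey the flavor of probabilistic cost credits, the rule sketch below (our paraphrase of the idea, not Tachis's exact syntax or proof rules) shows how the credits required before a sampling instruction are the probability-weighted average of the credits required in each branch. Here $\$(r)$ asserts ownership of $r$ cost credits, $\mu$ is the sampling distribution, and $e(v)$ stands for the continuation after sampling the value $v$ (the syntax is illustrative).

\[
\frac{\forall v \in \operatorname{supp}(\mu).\;\; \{\$(x(v)) \ast P\}\; e(v)\; \{Q\}}
     {\big\{\$\big(\textstyle\sum_{v} \mu(v)\, x(v)\big) \ast P\big\}\; \mathsf{sample}\ \mu\ \mathsf{in}\ e\; \{Q\}}
\]

That is, owning $\sum_{v} \mu(v)\, x(v)$ credits up front suffices because the credits can be split across the branches of the sample according to their weights, which is exactly what makes the bound an expected cost.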
3D geometric shape completion hinges on representation learning and a deep understanding of the three-dimensional structure of the data. Our work addresses the challenge of 3D shape completion given partial observations by proposing a transformer operating on a latent space representing Signed Distance Fields (SDFs). Instead of a monolithic volume, the SDF of an object is partitioned into smaller high-resolution patches, leading to a sequence of latent codes. The approach relies on a smooth latent-space encoding learned via a variational autoencoder (VAE), trained on millions of 3D patches. We employ an efficient masked autoencoder transformer to complete partial sequences into comprehensive shapes in latent space. Our approach is extensively evaluated on partial observations from ShapeNet and the ABC dataset, where only fractions of the objects are given. The proposed POC-SLT architecture compares favorably with several state-of-the-art baseline methods, demonstrating a significant improvement in 3D shape completion, both qualitatively and quantitatively.
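A minimal sketch of the latent-sequence completion idea follows, assuming a generic transformer encoder over per-patch VAE codes. The `LatentPatchCompleter` class, latent dimension, patch count, and depth are hypothetical placeholders, not the POC-SLT architecture itself.

```python
# Sketch: complete a sequence of SDF-patch latent codes with a masked transformer.
# The VAE encoder/decoder are assumed to exist outside this snippet.
import torch
import torch.nn as nn

class LatentPatchCompleter(nn.Module):
    def __init__(self, latent_dim=64, num_patches=512, depth=4, heads=8):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, latent_dim))
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, latent_dim))
        layer = nn.TransformerEncoderLayer(latent_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(latent_dim, latent_dim)   # predict completed codes

    def forward(self, latents, observed_mask):
        # latents: (B, N, D) patch codes from the VAE (arbitrary where unobserved)
        # observed_mask: (B, N) bool, True where the patch was actually observed
        x = torch.where(observed_mask.unsqueeze(-1), latents,
                        self.mask_token.expand_as(latents))
        x = self.encoder(x + self.pos_emb)
        return self.head(x)   # completed latent codes, to be decoded by the VAE

codes = torch.randn(2, 512, 64)
mask = torch.rand(2, 512) > 0.5            # pretend half the patches are missing
print(LatentPatchCompleter()(codes, mask).shape)   # torch.Size([2, 512, 64])
```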
The research aims to expand tactile feedback beyond vibration to multiple modes of stimuli, such as indentation and other arbitrary force outputs. By incorporating soft materials into the design of a novel tactile actuator, we can achieve multi-modality and enhance the device's wearability, which encompasses compliance, safety, and portability. The proposed tactile device can elevate presence and immersion in VR by enabling diverse haptic feedback such as force indentation, vibration, and other arbitrary force outputs. This approach enables the rendering of haptic interactions with virtual objects, such as grasping a 3D virtual object to feel its stiffness, an action that was difficult to achieve using widely adopted vibrotactile motors.
Integrating textual data with imaging in liver tumor segmentation is essential for enhancing diagnostic accuracy. However, current multi-modal medical datasets offer only general text annotations, lacking lesion-specific details critical for extracting nuanced features, especially for fine-grained segmentation of tumor boundaries and small lesions. To address these limitations, we developed datasets with lesion-specific text annotations for liver tumors and introduced the TexLiverNet model. TexLiverNet employs an agent-based cross-attention module that integrates text features efficiently with visual features, significantly reducing computational costs. Additionally, enhanced spatial and adaptive frequency domain perception is proposed to precisely delineate lesion boundaries, reduce background interference, and recover fine details in small lesions. Comprehensive evaluations on public and private datasets demonstrate that TexLiverNet achieves superior performance compared to current state-of-the-art methods.
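Below is a hedged sketch of agent-style cross-attention between visual and text tokens: a small set of learned agent tokens first summarizes the text, and the visual tokens then attend to the agents, so the attention cost scales with the number of agents rather than with text length times image size. The class name `AgentCrossAttention`, the dimensions, and the residual fusion are assumptions, not TexLiverNet's exact module.

```python
# Sketch of agent-mediated cross-attention for fusing report text with image tokens.
import torch
import torch.nn as nn

class AgentCrossAttention(nn.Module):
    def __init__(self, dim=256, num_agents=16, heads=8):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(1, num_agents, dim) * 0.02)
        self.agg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.broadcast = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual, text):
        # visual: (B, N_v, D) image tokens; text: (B, N_t, D) report tokens
        agents = self.agents.expand(visual.size(0), -1, -1)
        summary, _ = self.agg(agents, text, text)            # agents gather text info
        fused, _ = self.broadcast(visual, summary, summary)  # visual queries agents
        return visual + fused                                # residual fusion

vis, txt = torch.randn(2, 1024, 256), torch.randn(2, 40, 256)
print(AgentCrossAttention()(vis, txt).shape)                 # torch.Size([2, 1024, 256])
```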
CNNs exhibit inherent equivariance to image translation, leading to efficient parameter and data usage, faster learning, and improved robustness. The concept of translation-equivariant networks has been successfully extended to rotations, using group convolution for discrete rotation groups and harmonic functions for the continuous rotation group encompassing $360^\circ$. We explore the compatibility of the self-attention mechanism with full rotation equivariance, in contrast to previous studies that focused on discrete rotations. We introduce the Harmformer, a harmonic transformer with a convolutional stem that achieves equivariance to both translation and continuous rotation. Accompanied by an end-to-end equivariance proof, the Harmformer not only outperforms previous equivariant transformers, but also demonstrates inherent stability under any continuous rotation, even without seeing rotated samples during training.
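For intuition, the generic circular-harmonic filter below has the property that rotating the input only multiplies the filter response by a phase, which is the standard route to continuous rotation equivariance in harmonic networks. It is a simplified, generic illustration, not the Harmformer's convolutional stem.

```python
# Sketch: a complex circular-harmonic filter R(r) * exp(i * m * phi).
# Rotating the input by theta changes the response only by a phase exp(-i*m*theta).
import numpy as np

def circular_harmonic_filter(size=9, m=1, sigma=2.0):
    """Build a size x size complex filter of rotation order m (radial profile assumed Gaussian)."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    r = np.hypot(x, y)
    phi = np.arctan2(y, x)
    radial = np.exp(-(r ** 2) / (2 * sigma ** 2))   # learnable in practice
    return radial * np.exp(1j * m * phi)

w = circular_harmonic_filter(m=1)
print(w.shape, w.dtype)   # (9, 9) complex128
```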
Surgical instrument segmentation (SIS) is pivotal for robotic-assisted minimally invasive surgery, assisting surgeons by identifying surgical instruments in endoscopic video frames. Recent unsupervised surgical instrument segmentation (USIS) methods primarily rely on pseudo-labels derived from low-level features such as color and optical flow, but these methods show limited effectiveness and generalizability in complex and unseen endoscopic scenarios. In this work, we propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter). Different from previous USIS works, our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels. The framework adaptively determines which affinities from which levels should be prioritized. Therefore, the low- and high-level features and their affinities are effectively integrated to train a label-free unsupervised model, showing superior effectiveness and generalization ability. We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model. Our code is released at //github.com/MingyuShengSMY/AMNCutter.
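A minimal sketch of a soft normalized-cut style loss driven by patch affinities, the kind of label-free graph-cutting supervision described above; the multi-view, level-adaptive weighting of m-NCutter is not reproduced here, and `soft_ncut_loss` is an illustrative name.

```python
# Sketch: soft normalized-cut loss from an affinity matrix and soft assignments.
import torch

def soft_ncut_loss(affinity, assign):
    """affinity: (N, N) non-negative patch affinities; assign: (N, K) soft cluster
    probabilities (rows sum to 1). Returns K minus the sum of normalized associations."""
    degree = affinity.sum(dim=1)                               # (N,)
    assoc = torch.einsum('nk,nm,mk->k', assign, affinity, assign)
    denom = torch.einsum('nk,n->k', assign, degree).clamp_min(1e-8)
    return assign.shape[1] - (assoc / denom).sum()

N, K = 64, 2
W = torch.rand(N, N); W = (W + W.t()) / 2                      # symmetric affinities
S = torch.softmax(torch.randn(N, K), dim=1)
print(soft_ncut_loss(W, S))
```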
Graph similarity computation (GSC) aims to quantify the similarity score between two graphs. Although recent GSC methods based on graph neural networks (GNNs) take advantage of intra-graph structures in message passing, few of them fully utilize the structures presented by edges to boost the representation of their connected nodes. Moreover, previous cross-graph node embedding matching lacks perception of the overall structure of the graph pair, because the node representations from GNNs are confined to intra-graph structure, which leads to unreasonable similarity scores. Intuitively, the cross-graph structure represented in the assignment graph is helpful for rectifying such inappropriate matching. Therefore, we propose a structure-enhanced graph matching network (SEGMN). Equipped with a dual embedding learning module and a structure perception matching module, SEGMN achieves structure enhancement in both embedding learning and cross-graph matching. The dual embedding learning module incorporates adjacent edge representations into each node to obtain a structure-enhanced representation. The structure perception matching module achieves cross-graph structure enhancement through assignment graph convolution: the similarity score of each cross-graph node pair can be rectified by aggregating messages from structurally relevant node pairs. Experimental results on benchmark datasets demonstrate that SEGMN outperforms state-of-the-art GSC methods on the GED regression task, and that the structure perception matching module is plug-and-play, further improving the performance of the baselines by up to 25%.
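A minimal sketch, under assumed layer sizes, of folding adjacent edge representations into node embeddings in the spirit of the dual embedding learning module; `EdgeEnhancedNodeLayer` and its sum aggregation are illustrative choices, not SEGMN's exact design.

```python
# Sketch: one message-passing layer that mixes edge features into node embeddings.
import torch
import torch.nn as nn

class EdgeEnhancedNodeLayer(nn.Module):
    def __init__(self, node_dim=64, edge_dim=16):
        super().__init__()
        self.msg = nn.Linear(node_dim + edge_dim, node_dim)
        self.upd = nn.Linear(2 * node_dim, node_dim)

    def forward(self, x, edge_index, edge_attr):
        # x: (N, node_dim) node features; edge_index: (2, E) src/dst; edge_attr: (E, edge_dim)
        src, dst = edge_index
        m = torch.relu(self.msg(torch.cat([x[src], edge_attr], dim=-1)))  # per-edge message
        agg = torch.zeros_like(x).index_add_(0, dst, m)                   # sum messages per node
        return torch.relu(self.upd(torch.cat([x, agg], dim=-1)))

x = torch.randn(5, 64)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
edge_attr = torch.randn(4, 16)
print(EdgeEnhancedNodeLayer()(x, edge_index, edge_attr).shape)  # torch.Size([5, 64])
```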
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.
This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been a key solution to sequential decision-making problems. With the fast advances of RL in various domains, including robotics and game-playing, transfer learning has arisen as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques are applicable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.
We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities, as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs. We validate the utility of MMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.