Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis, but it remains challenging for two reasons: (1) the similar visual appearance of the intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods do not explore nuclei structure, resulting in fragmented instance predictions. To address these problems, this paper proposes a structure encoding and interaction network, termed SEINE, which develops a structure modeling scheme for nuclei and exploits the structural similarity between nuclei to improve the integrity of each segmented instance. Concretely, SEINE introduces a contour-based structure encoding (SE) that captures the correlation between nuclei structure and semantics, yielding a reasonable representation of nuclei structure. Building on this encoding, we propose a structure-guided attention (SGA) module that takes clear nuclei as prototypes to enhance structure learning for fuzzy nuclei. To strengthen structural learning, a semantic feature fusion (SFF) module is presented to boost the semantic consistency of the semantic and structure branches. Furthermore, a position enhancement (PE) method is applied to suppress incorrect nuclei boundary predictions. Extensive experiments demonstrate the superiority of our approach, and SEINE achieves state-of-the-art (SOTA) performance on four datasets. The code is available at //github.com/zhangye-zoe/SEINE.
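To make the prototype idea concrete, here is a minimal, hypothetical sketch of prototype-guided attention in which clear-nuclei features serve as keys and values for fuzzy-nuclei queries; the shapes, names, and residual form are our assumptions, not the authors' implementation:

```python
import torch

def structure_guided_attention(fuzzy_feats, clear_feats):
    """fuzzy_feats: (N_f, d) features of fuzzy nuclei (queries).
    clear_feats: (N_c, d) features of clear nuclei used as prototypes."""
    d = clear_feats.shape[-1]
    # Attend from each fuzzy nucleus to the clear-nuclei prototypes.
    attn = torch.softmax(fuzzy_feats @ clear_feats.T / d ** 0.5, dim=-1)  # (N_f, N_c)
    refined = attn @ clear_feats  # aggregate structural information from prototypes
    return fuzzy_feats + refined  # residual refinement of the fuzzy features
```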
Bipolar disorder (BD) and schizophrenia (SZ) are severe mental disorders with profound societal impact. Identifying risk markers early is crucial for understanding disease progression and enabling preventive measures. The Danish High Risk and Resilience Study (VIA) focuses on understanding early disease processes, particularly in children at familial high risk (FHR). Understanding structural brain changes associated with these diseases during early stages is essential for effective interventions. The central sulcus (CS) is a prominent brain landmark related to brain regions involved in motor and sensory processing. Analyzing CS morphology can provide valuable insights into neurodevelopmental abnormalities in the FHR group. However, segmenting the CS is challenging due to its variability, especially in adolescents. This study introduces two novel approaches to improve CS segmentation: synthetic data generation to model CS variability, and self-supervised pre-training with multi-task learning to adapt models to new cohorts. These methods aim to enhance segmentation performance across diverse populations while eliminating the need for extensive preprocessing.
A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available, easy-to-use teleoperation interfaces. In this work, we present TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces, including RGB and depth cameras, virtual reality controllers, keyboards, and joysticks, in any combination. In its most accessible version, TeleMoMa works using vision alone (e.g., an RGB-D camera), lowering the barrier to entry for humans providing mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world, and we demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we show that TeleMoMa's teleoperation channel supports both on-site teleoperation, with the operator watching the robot directly, and remote teleoperation, with commands and observations sent through a computer network; we also perform user studies to evaluate how easily novice users learn to collect demonstrations with different combinations of the human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see //robin-lab.cs.utexas.edu/telemoma-web.
Colonoscopy reconstruction is pivotal for diagnosing colorectal cancer. However, accurate long-sequence colonoscopy reconstruction faces three major challenges: (1) dissimilarity among segments of the colon due to its meandering and convoluted shape; (2) the co-existence of simple and intricately folded geometric structures; and (3) sparse viewpoints due to constrained camera trajectories. To tackle these challenges, we introduce a new reconstruction framework based on neural radiance fields (NeRF), named ColonNeRF, which leverages neural rendering for novel view synthesis of long-sequence colonoscopy. Specifically, to reconstruct the entire colon in a piecewise manner, ColonNeRF introduces a region division and integration module, effectively reducing shape dissimilarity and ensuring geometric consistency in each segment. To learn both the simple and the complex geometry in a unified framework, ColonNeRF incorporates a multi-level fusion module that progressively models the colon regions from easy to hard. Additionally, to overcome the challenges of sparse views, we devise a DensiNet module that densifies camera poses under the guidance of semantic consistency. We conduct extensive experiments on both synthetic and real-world datasets to evaluate ColonNeRF. Quantitatively, ColonNeRF exhibits a 67%-85% improvement in LPIPS-ALEX scores. Qualitatively, our reconstruction visualizations show much clearer textures and more accurate geometric details. These results demonstrate superior performance over state-of-the-art methods.
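As one illustration of the piecewise idea, the region division step could be approximated by splitting the camera trajectory into overlapping windows. This is a hypothetical sketch under our own assumptions about segment length and overlap, not the module's actual logic:

```python
def divide_trajectory(poses, segment_len=50, overlap=10):
    """Split a long colonoscopy camera trajectory into overlapping segments.

    Each segment would be reconstructed by its own NeRF block; the overlap
    leaves shared views for the integration step that fuses segments back
    together with consistent geometry."""
    step = segment_len - overlap
    return [poses[i:i + segment_len]
            for i in range(0, max(len(poses) - overlap, 1), step)]
```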
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by varied social cognitive challenges and repetitive behavioral patterns. Identifying reliable brain imaging-based biomarkers for ASD has been a persistent challenge due to the spectrum's diverse symptomatology. Existing baselines in the field have made significant strides in this direction, yet there remains room for improvement in both performance and interpretability. We propose \emph{HyperGALE}, which builds upon the hypergraph framework by incorporating learned hyperedges and gated attention mechanisms. This approach substantially improves the model's ability to interpret complex brain graph data, offering deeper insights into ASD biomarker characterization. Evaluated on the extensive ABIDE II dataset, \emph{HyperGALE} not only improves interpretability but also demonstrates statistically significant gains in key performance metrics compared to both previous baselines and the foundational hypergraph model. The advancement \emph{HyperGALE} brings to ASD research highlights the potential of sophisticated graph-based techniques in neurodevelopmental studies. The source code and implementation instructions are available at //github.com/mehular0ra/HyperGALE.
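A minimal sketch of what a hypergraph layer with learned hyperedges and gating might look like, assuming a soft incidence matrix learned as a parameter; the layer below is our illustration, not HyperGALE's exact architecture:

```python
import torch
import torch.nn as nn

class LearnedHyperedgeConv(nn.Module):
    def __init__(self, n_nodes, n_hyperedges, dim):
        super().__init__()
        # Learned (soft) incidence matrix: which nodes join which hyperedges.
        self.incidence = nn.Parameter(torch.randn(n_nodes, n_hyperedges))
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                         # x: (n_nodes, dim) node features
        H = torch.softmax(self.incidence, dim=0)  # soft node-to-hyperedge assignment
        edge_feats = H.T @ x                      # aggregate nodes into hyperedges
        msg = self.proj(H @ edge_feats)           # propagate hyperedge features back
        return torch.sigmoid(self.gate(x)) * msg + x  # gated residual update
```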
Studies in circadian biology often use trigonometric regression to model phenomena over time. Ideally, protocols in these studies would collect samples at equally spaced time points over a 24-hour period. This sample collection protocol is known as an equispaced design, which is considered the optimal experimental design for trigonometric regression under multiple statistical criteria. However, implementing equispaced designs in studies involving individuals is logistically challenging, and failure to employ an equispaced design can cause a loss of statistical power when performing hypothesis tests with an estimated model. Motivated by this potential loss of power, this paper considers weighted trigonometric regression as a remedy. Specifically, the weights for this regression are the normalized reciprocals of density estimates, derived from a kernel density estimator of the sample collection times, which inflates the weight of samples collected at underrepresented time points. A search procedure is also introduced to identify the concentration hyperparameter for kernel density estimation that maximizes the determinant of the Hessian of the weighted squared loss, which relates both to maximizing the $D$-optimality criterion from the experimental design literature and to minimizing the generalized variance. Simulation studies consistently demonstrate that this weighted regression mitigates variability in inferences produced by an estimated model. Illustrations with three real circadian biology data sets further indicate that this weighted regression consistently yields larger test statistics than its unweighted counterpart for first-order trigonometric regression, or cosinor regression, which is prevalent in circadian biology studies.
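A minimal sketch of this weighting scheme, assuming a von Mises kernel for the circular density of collection times (a natural reading of the concentration hyperparameter mentioned above); function and variable names are ours:

```python
import numpy as np
from scipy.special import i0  # modified Bessel function, von Mises normalizer

def weighted_cosinor(t, y, kappa=2.0, period=24.0):
    """Weighted first-order cosinor fit with KDE-reciprocal weights (sketch).

    t: sample collection times (hours); y: responses; kappa: concentration."""
    theta = 2 * np.pi * np.asarray(t) / period
    # Circular kernel density estimate of the collection times.
    dens = (np.exp(kappa * np.cos(theta[:, None] - theta[None, :])).mean(axis=1)
            / (2 * np.pi * i0(kappa)))
    w = 1.0 / dens
    w *= len(w) / w.sum()     # normalized reciprocal weights
    # Design matrix for y ~ b0 + b1*cos(theta) + b2*sin(theta).
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * np.asarray(y)))
    return beta               # (MESOR, cosine, sine) coefficients
```

Samples collected at crowded time points fall in high-density regions and thus receive small weights, while samples at underrepresented time points are upweighted, mimicking an equispaced design.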
Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions in an interviewee's utterances can be an essential part of psychotherapy, especially cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification with two additional modules: (1) extracting the parts of an utterance related to cognitive distortion, and (2) debating the reasoning steps with multiple agents. Our experimental results on a public dataset show that ERD improves both the multi-class F1 score and the binary specificity score. Regarding the latter, our method effectively debiases the baseline, which has a high false positive rate, especially when the summary of the multi-agent debate is provided to the LLMs.
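A high-level sketch of the two modules, with a placeholder `llm` callable standing in for any chat-completion API; the prompts and the aggregation step are our assumptions, not the paper's exact pipeline:

```python
def erd_classify(utterance, llm, n_debaters=2):
    """llm: callable prompt -> response, e.g., a wrapper around a chat API."""
    # (1) Extraction: isolate the parts of the utterance tied to distortion.
    extracted = llm("Extract the spans of this utterance that may reflect "
                    f"a cognitive distortion:\n{utterance}")
    # (2) Debate: multiple agents assess the reasoning steps independently.
    arguments = [llm(f"Debater {i}: argue whether the following reflects a "
                     f"cognitive distortion, and which type:\n{extracted}")
                 for i in range(n_debaters)]
    summary = llm("Summarize this debate:\n" + "\n".join(arguments))
    # Final classification conditioned on the debate summary (the step the
    # abstract credits with reducing false positives).
    return llm("Given this summary, classify the cognitive distortion type "
               f"(or 'none'):\n{summary}")
```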
In the field of healthcare, electronic health records (EHR) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced with respect to sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform significantly worse in deployment for individuals from minority classes than for those from majority classes, which may lead to inequitable healthcare outcomes for minority groups. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a novel approach to augmenting imbalanced datasets with samples produced by a deep generative model. The MCRAGE process involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes, which can be used to train less biased downstream models. We measure the performance of MCRAGE against alternative approaches using the accuracy, F1 score, and AUROC of these downstream models. We also provide theoretical justification for our method in terms of recent convergence results for DDPMs.
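The augmentation step itself is straightforward once the generator is trained. Here is a sketch, assuming a trained conditional sampler `sample(cls, n)` (e.g., the CDDPM) that returns `n` synthetic EHR rows for class `cls`; the function name and interface are illustrative:

```python
import numpy as np

def rebalance(X, y, sample):
    """Augment each minority class up to the majority-class count.

    X: (n, d) feature matrix; y: (n,) class labels;
    sample(cls, n): trained conditional generator returning n synthetic rows."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        deficit = int(target - n)
        if deficit > 0:                        # only minority classes are augmented
            X_parts.append(sample(cls, deficit))
            y_parts.append(np.full(deficit, cls))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```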
The soft Dice loss (SDL) has taken a pivotal role in numerous automated segmentation pipelines in the medical imaging community. In recent years, some of the reasons behind its superior performance have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct use in scenarios involving soft labels. Hence, a synergy between SDL and research leveraging soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in the standard setting with hard labels, but (ii) can be employed in settings with soft labels. Our experiments on the public QUBIQ, LiTS, and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g., averaging, label smoothing, and knowledge distillation) over hard labels (e.g., majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. The code is available at //github.com/zifuwanggg/JDTLosses.
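One way to satisfy both requirements — a sketch reconstructed from the abstract's two properties, not necessarily the repository's exact formulation — relaxes the set operations with the L1 norm; when `target` is binary, this reduces algebraically to the familiar soft Dice loss:

```python
import torch

def dice_semimetric_loss(pred, target, eps=1e-6):
    """pred, target: probability tensors in [0, 1] with the same shape."""
    diff = (pred - target).abs().sum()   # L1 relaxation of the symmetric difference
    total = pred.sum() + target.sum()
    # With a binary target, diff = total - 2*<pred, target>, so this equals
    # the standard soft Dice loss 1 - 2*<pred, target> / total; with a soft
    # target it remains a well-defined, symmetric distance.
    return diff / (total + eps)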
Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce \textbf{AgentOhana} as a comprehensive solution to these challenges. \textit{AgentOhana} aggregates agent trajectories from distinct environments spanning a wide array of scenarios. It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Leveraging this data unification, our training pipeline maintains equilibrium across different data sources and preserves independent randomness across devices during dataset partitioning and model training. Additionally, we present \textbf{xLAM-v0.1}, a large action model tailored for AI agents, which demonstrates exceptional performance across various benchmarks. Begin the exploration at \url{//github.com/SalesforceAIResearch/xLAM}.
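To make the idea of a "consistent format" concrete, a hypothetical unified trajectory record might look as follows; all field names are our assumptions, not AgentOhana's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str       # "user", "assistant", or "tool"
    content: str

@dataclass
class Trajectory:
    source_env: str                             # originating environment/benchmark
    task: str                                   # task description or goal
    turns: list = field(default_factory=list)   # ordered multi-turn interaction
    reward: float = 0.0                         # outcome signal, if available
```

A generic data loader can then iterate over `Trajectory` records uniformly, regardless of which environment produced them.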
Graph Neural Networks (GNNs) have shown promising performance on various graph learning tasks, but at the cost of resource-intensive computations. The primary overhead of a GNN update stems from graph propagation and weight transformation, both involving operations on graph-scale matrices. Previous studies attempt to reduce the computational budget by leveraging graph-level or network-level sparsification techniques, yielding a downsized graph or downsized weights. In this work, we propose Unifews, which unifies the two operations in an entry-wise manner, considering individual matrix elements, and conducts joint edge-weight sparsification to enhance learning efficiency. The entry-wise design of Unifews enables adaptive compression across GNN layers with progressively increased sparsity, and is applicable to a variety of architectural designs with on-the-fly operation simplification. Theoretically, we establish a novel framework that characterizes sparsified GNN learning as a graph optimization process, and prove that Unifews effectively approximates the learning objective with bounded error and reduced computational load. We conduct extensive experiments to evaluate the performance of our method in diverse settings. Unifews jointly removes more than 90% of edge and weight entries with accuracy comparable to or better than baseline models. The sparsification offers remarkable efficiency improvements, including a 10-20x reduction in matrix operations and up to 100x acceleration in graph propagation time for the largest graph at the billion-edge scale.
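A conceptual sketch of entry-wise joint sparsification: prune individual adjacency and weight entries below a per-layer threshold, with sparsity increasing in deeper layers. The threshold schedule below is illustrative, not the paper's:

```python
import torch

def sparsify_layer(adj, weight, layer_idx, base_tau=1e-3, growth=2.0):
    """Entry-wise pruning of adjacency and weight matrices for one GNN layer.

    Unlike graph-level or network-level pruning, both matrices are thinned
    at the granularity of individual entries."""
    tau = base_tau * growth ** layer_idx   # progressively stronger pruning per layer
    adj_s = adj * (adj.abs() >= tau)       # drop individual edge entries
    w_s = weight * (weight.abs() >= tau)   # drop individual weight entries
    return adj_s, w_s
```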