Optimizing doses for multiple indications is challenging. The pooled approach of finding a single optimal biological dose (OBD) for all indications ignores the fact that dose-response or dose-toxicity curves may differ between indications, resulting in different OBDs. Conversely, indication-specific dose optimization often requires a large sample size. To address this challenge, we propose a Randomized two-stage basket trial design that Optimizes doses in Multiple Indications (ROMI). In stage 1, for each indication, response and toxicity are evaluated for a high dose, which may be a previously obtained maximum tolerated dose (MTD), with a rule that stops accrual to indications where the high dose is unsafe or ineffective. Indications not terminated proceed to stage 2, where patients are randomized between the high dose and a specified lower dose. A latent-cluster Bayesian hierarchical model is employed to borrow information between indications while accounting for potential heterogeneity of the OBD across indications. Indication-specific utilities are used to quantify response-toxicity trade-offs. At the end of stage 2, for each indication with at least one acceptable dose, the dose with the highest posterior mean utility is selected as optimal. Two versions of ROMI are presented: one uses only stage 2 data for dose optimization, while the other uses data from both stages. Simulations show that both versions have desirable operating characteristics compared to designs that either ignore indications or optimize the dose independently for each indication.
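To make the selection rule concrete, here is a minimal Python sketch of picking the dose with the highest posterior mean utility under independent Beta-Binomial models; the trade-off weights, priors, and data below are illustrative placeholders, and ROMI's latent-cluster hierarchical borrowing and dose-acceptability screening are not reproduced.

```python
import numpy as np

# Hypothetical sketch of utility-based dose selection for one indication.
# Assumed utility: U = u_eff * p(response) - u_tox * p(toxicity), with
# independent Beta(1, 1) priors. ROMI's latent-cluster hierarchical model
# and acceptability rules are more involved than this.
rng = np.random.default_rng(0)
u_eff, u_tox = 100.0, 60.0          # illustrative trade-off weights

def posterior_mean_utility(n, resp, tox, draws=10_000):
    """Posterior mean of U under Beta-Binomial updating."""
    p_resp = rng.beta(1 + resp, 1 + n - resp, draws)
    p_tox = rng.beta(1 + tox, 1 + n - tox, draws)
    return np.mean(u_eff * p_resp - u_tox * p_tox)

# Illustrative stage-2 data for one indication: (n, responses, toxicities).
doses = {"low": (30, 12, 3), "high": (30, 18, 9)}
utilities = {d: posterior_mean_utility(*data) for d, data in doses.items()}
obd = max(utilities, key=utilities.get)   # highest posterior mean utility
print(utilities, "->", obd)
```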
Virtual content placement in physical scenes is a crucial aspect of augmented reality (AR). This task is particularly challenging when the virtual elements must adapt to multiple target physical environments that are unknown during development. AR authors use strategies such as manual placement performed by end-users, automated placement powered by author-defined constraints, and procedural content generation to adapt virtual content to physical spaces. Although effective, these options require human effort or annotated virtual assets. As an alternative, we present ARfy, a pipeline to support the adaptive placement of virtual content from pre-existing 3D scenes in arbitrary physical spaces. ARfy requires neither intervention by end-users nor asset annotation by AR authors. We demonstrate the pipeline's capabilities using simulations on a publicly available indoor space dataset. ARfy automatically makes any generic 3D scene AR-ready and provides evaluation tools to facilitate future research on adaptive virtual content placement.
This study introduces a novel haptic device for enhancing dexterous manipulation in virtual reality. By stimulating mechanoreceptors on both sides of the fingernail, our lightweight system simulates tangential force sensations. We employ mechanical stimulation for more natural tactile feedback. A preliminary "balancing grasp challenge" experiment shows that users make more frequent micro-adjustments with our device, indicating improved precision. This research aims to advance haptic feedback in VR, potentially leading to more immersive and realistic virtual interactions.
We describe QGLAB, a new MATLAB package for analyzing partial differential equations on quantum graphs. The software is built on MATLAB's existing object-oriented directed-graph class, inheriting its structure and adding easy-to-use features. The package allows one to construct a quantum graph and accurately compute the spectrum of elliptic operators, solutions to Poisson problems, the linear and nonlinear time evolution of a variety of PDEs, the continuation of branches of steady states (including locating and switching branches at bifurcations), and more. It overcomes the major challenge of discretizing quantum graphs -- the enforcement of vertex conditions -- using non-square differentiation matrices. It uses a unified framework to implement finite-difference and Chebyshev discretizations of differential operators on a quantum graph. For simplicity, the package overloads many built-in MATLAB functions to work on the class.
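For readers unfamiliar with why vertex conditions complicate discretization, the following standalone Python sketch (not QGLAB code, which is MATLAB and object-oriented) computes Laplacian eigenvalues on an equilateral three-edge star graph; continuity at the center is enforced by sharing one unknown, and the Neumann-Kirchhoff flux condition is appended as a constraint row, making the differentiation block effectively rectangular:

```python
import numpy as np
from scipy.linalg import eig

# Standalone sketch (not QGLAB's API): -u'' = lam * u on a 3-edge star
# graph with unit-length edges, u = 0 at the outer vertices, continuity
# and a Neumann-Kirchhoff flux condition at the shared center vertex.
E, N = 3, 200                        # edges, interior grid points per edge
h = 1.0 / (N + 1)

n = E * N + 1                        # interior values plus center value v
A = np.zeros((n, n))
T = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
for j in range(E):
    s = j * N
    A[s:s + N, s:s + N] = T
    A[s + N - 1, -1] = -1.0 / h**2   # last interior node couples to v

# Kirchhoff constraint row: the one-sided derivatives into the center sum
# to zero (first-order accurate; QGLAB's constructions are higher order).
for j in range(E):
    A[-1, j * N + N - 1] = -1.0 / h
A[-1, -1] = E / h

# The constraint row is algebraic: it gets a zero row in the mass matrix.
B = np.eye(n)
B[-1, -1] = 0.0
lam = np.sort(eig(A, B)[0].real)
lam = lam[np.isfinite(lam)]
print(lam[:4])  # ~ (pi/2)^2, pi^2 (twice), (3pi/2)^2 for this graph
```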
Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs focus on general science and lack domain-specific knowledge, such as knowledge of chemical molecules and amino acid sequences. To bridge this gap, we introduce SciDFM, a mixture-of-experts LLM that is trained from scratch and is able to conduct college-level scientific reasoning and understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines as well as data from domain-specific databases. We further fine-tune the pre-trained model on a large amount of instruction data to improve performance on downstream benchmarks. Experimental results show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and that it reaches state-of-the-art (SOTA) performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the results of expert selection vary with data from different disciplines. To benefit the broader research community, we open-source SciDFM at //huggingface.co/OpenDFM/SciDFM-MoE-A5.6B-v1.0.
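For context on the expert-selection analysis mentioned above, a generic top-2 mixture-of-experts routing layer looks roughly like the following sketch (illustrative PyTorch, not SciDFM's actual implementation); the router's per-token expert choices are exactly what such an analysis inspects:

```python
import torch
import torch.nn.functional as F

# Generic top-2 MoE routing sketch (not SciDFM's code). All sizes are
# illustrative; `chosen` records which experts fire for each token.
d_model, n_experts, top_k = 512, 8, 2
router = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)

def moe_layer(x):                                   # x: (tokens, d_model)
    logits = router(x)
    weights, chosen = logits.topk(top_k, dim=-1)    # per-token experts
    weights = F.softmax(weights, dim=-1)
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(n_experts):
            mask = chosen[:, slot] == e
            if mask.any():
                w = weights[mask, slot].unsqueeze(-1)
                out[mask] += w * experts[e](x[mask])
    return out, chosen

tokens = torch.randn(16, d_model)
y, routing = moe_layer(tokens)
print(routing)   # routing patterns, e.g. compared across disciplines
```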
We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iteratively denoising randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion (EGD) and Evaluator-based Sampling Refinement (ESR). The experimental results demonstrate that DexDiffuser consistently outperforms the state-of-the-art multi-finger grasp generation method FFHNet, with grasp success rates that are, on average, 9.12% and 19.44% higher in simulation and real robot experiments, respectively. Supplementary materials are available at //yulihn.github.io/DexDiffuser_page/
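Evaluator-guided refinement is in the spirit of classifier guidance; a toy sketch of one guided step might look as follows, where `sampler` and `evaluator` are stand-ins for DexSampler and DexEvaluator, and all shapes, schedules, and scales are our own illustrative assumptions, not the paper's EGD/ESR procedures:

```python
import torch

# Toy sketch of an evaluator-guided denoising step (classifier-guidance
# style). The networks below are stand-ins, not the paper's models.
def evaluator_guided_step(sampler, evaluator, g, c, t, alpha_bar, scale=0.1):
    eps = sampler(g, c, t)                                 # predicted noise
    g0 = (g - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    g0 = g0.detach().requires_grad_(True)
    score = evaluator(g0, c).sum()                         # success score
    grad, = torch.autograd.grad(score, g0)
    return (g0 + scale * grad).detach()      # nudge toward better grasps

# Toy usage: grasps as flat vectors, an evaluator preferring the origin.
sampler = lambda g, c, t: 0.1 * g
evaluator = lambda g, c: -(g ** 2).sum(dim=-1)
alpha_bar = torch.linspace(0.99, 0.01, 50)
g = torch.randn(8, 22)                       # batch of 8 grasp vectors
g = evaluator_guided_step(sampler, evaluator, g, None, 0, alpha_bar)
```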
Surgical instrument segmentation (SIS) is pivotal for robotic-assisted minimally invasive surgery, assisting surgeons by identifying surgical instruments in endoscopic video frames. Recent unsupervised surgical instrument segmentation (USIS) methods primarily rely on pseudo-labels derived from low-level features such as color and optical flow, but these methods show limited effectiveness and generalizability in complex and unseen endoscopic scenarios. In this work, we propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter). Unlike previous USIS works, our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels. The framework adaptively determines which affinities, from which levels, should be prioritized, so that low- and high-level features and their affinities are effectively integrated to train a label-free unsupervised model with superior effectiveness and generalization ability. We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model. Our code is released at //github.com/MingyuShengSMY/AMNCutter.
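For intuition about training on patch affinities without pseudo-labels, a generic soft normalized-cut loss (a standard formulation, not the exact m-NCutter objective with its multi-view weighting) can be written as:

```python
import torch

# Generic soft normalized-cut loss over patch affinities, in the spirit
# of graph-cut training objectives; not the exact m-NCutter loss.
def soft_ncut_loss(W, P, eps=1e-8):
    """W: (n, n) nonnegative patch affinities; P: (n, k) soft assignments."""
    d = W.sum(dim=1)                            # node degrees
    assoc_within = (P * (W @ P)).sum(dim=0)     # p_k^T W p_k per cluster
    assoc_total = (P * d[:, None]).sum(dim=0)   # p_k^T W 1 per cluster
    return P.shape[1] - (assoc_within / (assoc_total + eps)).sum()

# Toy usage: affinities from random patch features, 2-way soft assignment.
feats = torch.randn(64, 16)
W = torch.exp(-torch.cdist(feats, feats))       # similarity kernel
logits = torch.randn(64, 2, requires_grad=True)
loss = soft_ncut_loss(W, logits.softmax(dim=-1))
loss.backward()                                  # trainable without labels
```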
Graph similarity computation (GSC) aims to quantify the similarity score between two graphs. Although recent GSC methods based on graph neural networks (GNNs) take advantage of intra-graph structures in message passing, few of them fully utilize the structures represented by edges to boost the representations of their connected nodes. Moreover, previous cross-graph node embedding matching lacks perception of the overall structure of the graph pair, because the node representations from GNNs are confined to the intra-graph structure, leading to unreasonable similarity scores. Intuitively, the cross-graph structure represented in the assignment graph is helpful for rectifying inappropriate matching. Therefore, we propose a structure-enhanced graph matching network (SEGMN). Equipped with a dual embedding learning module and a structure perception matching module, SEGMN achieves structure enhancement in both embedding learning and cross-graph matching. The dual embedding learning module incorporates adjacent edge representations into each node to achieve structure-enhanced representations. The structure perception matching module achieves cross-graph structure enhancement through assignment graph convolution: the similarity score of each cross-graph node pair can be rectified by aggregating messages from structurally relevant node pairs. Experimental results on benchmark datasets demonstrate that SEGMN outperforms state-of-the-art GSC methods on the GED regression task, and that the structure perception matching module is plug-and-play and can further improve the performance of the baselines by up to 25%.
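A minimal sketch of the dual-embedding idea, folding adjacent edge representations into node updates (illustrative message passing in PyTorch, not SEGMN's exact layers):

```python
import torch

# Illustrative edge-aware message passing: each node aggregates neighbor
# features concatenated with the connecting edge's features, so edge
# structure enriches the node embeddings. Not SEGMN's exact layers.
def edge_aware_update(x, edge_index, edge_attr, lin):
    """x: (n, d) nodes; edge_index: (2, m); edge_attr: (m, d_e)."""
    src, dst = edge_index
    msg = lin(torch.cat([x[src], edge_attr], dim=-1))   # per-edge message
    out = torch.zeros_like(x)
    out.index_add_(0, dst, msg)                         # sum into targets
    return torch.relu(out)

n, m, d, d_e = 5, 8, 16, 4
x = torch.randn(n, d)
edge_index = torch.randint(0, n, (2, m))
edge_attr = torch.randn(m, d_e)
lin = torch.nn.Linear(d + d_e, d)
x_new = edge_aware_update(x, edge_index, edge_attr, lin)
```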
As the demand for efficient data processing escalates, reconfigurable analog hardware, which implements novel analog compute paradigms, is promising for energy-efficient computing at the sensing and actuation boundaries. These analog computing platforms embed information in physical properties and then use the physics of materials, devices, and circuits to perform computation. They are more sensitive to nonidealities, such as noise and fabrication variations, than their digital counterparts and accrue high resource costs when programmable elements are introduced. Identifying resource-efficient analog system designs that mitigate these nonidealities is done manually today. While design optimization frameworks have been enormously successful in other fields, such as photonics, they typically either target linear dynamical systems that have closed-form solutions or target a specific differential equation system and then derive the solution through hand analysis; in both cases, time-domain simulation is no longer needed to predict hardware behavior. In contrast, the analog hardware platforms described above have nonlinear time-evolving dynamics that vary substantially from design to design, lack closed-form solutions, and require the optimizer to consider time explicitly. We present Shem, an optimization framework for analog systems. Shem leverages differentiation methods recently popularized for training neural ODEs to enable the optimization of analog systems that exhibit nonlinear dynamics, noise and mismatch, and discrete behavior. We evaluate Shem on oscillator-based pattern recognition, CNN edge detection, and transmission-line security primitive design case studies and demonstrate that it can improve designs. To our knowledge, the latter two design problems have not been optimized with automated methods before.
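To illustrate the differentiate-through-time idea in miniature (a toy PyTorch sketch under our own simplifying assumptions, far from Shem's scope), one can backpropagate through an explicit-Euler simulation of a driven damped oscillator to fit its damping so that the steady-state amplitude hits a target:

```python
import torch

# Toy gradient-based analog design: backpropagate through a time-domain
# simulation of a driven, damped oscillator x'' + c x' + x = cos(t) to
# tune the damping c. At resonance the steady-state amplitude is 1/c, so
# a target of 2.0 should drive c toward 0.5. Shem handles far richer
# nonlinear dynamics, noise, mismatch, and discrete behavior.
c = torch.tensor(1.0, requires_grad=True)      # damping (design parameter)
opt = torch.optim.Adam([c], lr=0.05)
dt, steps, target = 0.01, 2000, 2.0

for it in range(60):
    x = torch.tensor(0.0)
    v = torch.tensor(0.0)
    peak = torch.tensor(0.0)
    for k in range(steps):                     # explicit Euler in time
        a = -x - c * v + torch.cos(torch.tensor(k * dt))
        x, v = x + dt * v, v + dt * a
        if k > steps // 2:                     # measure after transients
            peak = torch.maximum(peak, x.abs())
    loss = (peak - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
print(float(c))                                # -> about 0.5
```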
The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ ``Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM's parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.
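Schematically, the loop might look like the following, where `llm` is a hypothetical stand-in for any chat-completion client and the prompt wording is our own illustration, not the paper's prompts:

```python
# Schematic introspection loop; `llm` is a hypothetical placeholder for
# any chat-completion call, and the prompt wording is illustrative only.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def introspect(trajectories: list[str], tips: list[str]) -> str:
    """Ask the model to distill past trajectories into one concise tip."""
    prompt = (
        "You control an agent in a text game. Past trajectories:\n"
        + "\n".join(trajectories)
        + "\nExisting tips:\n" + "\n".join(tips)
        + "\nWrite ONE new, concise tip that would improve the policy."
    )
    return llm(prompt)

def act(observation: str, tips: list[str]) -> str:
    """Condition the next action on accumulated tips; no fine-tuning."""
    prompt = ("Tips:\n" + "\n".join(tips)
              + f"\nObservation: {observation}\nNext action:")
    return llm(prompt)
```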
Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge, and text knowledge, overlooking the heterogeneous semantic gaps between the cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, despite its limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at fine granularity, and that retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results.
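As a rough illustration of graph-based bridging versus plain concatenation (our own minimal sketch, not KBGN's architecture), text-entity nodes can adaptively retrieve visual information across bridge edges via attention:

```python
import torch

# Illustrative cross-modal bridging (not KBGN's architecture): text
# entity embeddings attend over vision entity embeddings across "bridge"
# edges, replacing plain concatenation with adaptive retrieval.
def bridge_attention(text, vision, Wq, Wk, Wv):
    """text: (n_t, d); vision: (n_v, d); W*: torch.nn.Linear layers."""
    q, k, v = Wq(text), Wk(vision), Wv(vision)
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return text + attn @ v          # text nodes enriched with visual info

d = 32
text, vision = torch.randn(6, d), torch.randn(10, d)
Wq, Wk, Wv = (torch.nn.Linear(d, d) for _ in range(3))
fused = bridge_attention(text, vision, Wq, Wk, Wv)
```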