Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of anatomy. Robotic surgery systems have been proposed to improve placement accuracy, however, state-of-the-art systems suffer from the limitations of open-loop approaches, as they follow traditional concepts of preoperative planning and intraoperative registration, without real-time recalculation of the surgical plan. In this paper, we propose an intraoperative planning approach for robotic spine surgery that leverages real-time observation for drill path planning based on Safe Deep Reinforcement Learning (DRL). The main contributions of our method are (1) the capability to guarantee safe actions by introducing an uncertainty-aware distance-based safety filter; and (2) the ability to compensate for incomplete intraoperative anatomical information, by encoding a-priori knowledge about anatomical structures with a network pre-trained on high-fidelity anatomical models. Planning quality was assessed by quantitative comparison with the gold standard (GS) drill planning. In experiments with 5 models derived from real magnetic resonance imaging (MRI) data, our approach was capable of achieving 90% bone penetration with respect to the GS while satisfying safety requirements, even under observation and motion uncertainty. To the best of our knowledge, our approach is the first safe DRL approach focusing on orthopedic surgeries.
One core assumption typically adopted for valid causal inference is that of no interference between experimental units, i.e., the outcome of an experimental unit is unaffected by the treatments assigned to other experimental units. This assumption can be violated in real-life experiments, which significantly complicates the task of causal inference as one must disentangle direct treatment effects from ``spillover'' effects. Current methodologies are lacking, as they cannot handle arbitrary, unknown interference structures to permit inference on causal estimands. We present a general framework to address the limitations of existing approaches. Our framework is based on the new concept of the ``degree of interference'' (DoI). The DoI is a unit-level latent variable that captures the latent structure of interference. We also develop a data augmentation algorithm that adopts a blocked Gibbs sampler and Bayesian nonparametric methodology to perform inferences on the estimands under our framework. We illustrate the DoI concept and properties of our Bayesian methodology via extensive simulation studies and an analysis of a randomized experiment investigating the impact of a cash transfer program for which interference is a critical concern. Ultimately, our framework enables us to infer causal effects without strong structural assumptions on interference.
Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.
Placement is a critical step in modern chip design, aiming to determine the positions of circuit modules on the chip canvas. Recent works have shown that reinforcement learning (RL) can improve human performance in chip placement. However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data. ChiPFormer has several advantages that prior arts do not have. First, ChiPFormer can exploit offline placement designs to learn transferable policies more efficiently in a multi-task setting. Second, ChiPFormer can promote effective finetuning for unseen chip circuits, reducing the placement runtime from hours to minutes. Third, extensive experiments on 32 chip circuits demonstrate that ChiPFormer achieves significantly better placement quality while reducing the runtime by 10x compared to recent state-of-the-art approaches in both public benchmarks and realistic industrial tasks. The deliverables are released at //sites.google.com/view/chipformer/home.
The logical method proposed by Goubault, Ledent, and Rajsbaum provides a novel way to show the unsolvability of distributed tasks by means of a logical obstruction, which is an epistemic logic formula describing the reason of unsolvability. In this paper, we introduce the notion of partial product update, which refines that of product update in the original logical method, to encompass distributed tasks and protocols modeled by impure simplicial complexes. With this extended notion of partial product update, the original logical method is generalized so that it allows the application of logical obstruction to show unsolvability results in a distributed environment where the failure of agents is detectable. We demonstrate the use of the logical method by giving a concrete logical obstruction and showing that the consensus task is unsolvable by the single-round synchronous message-passing protocol.
Multi-agent Reinforcement Learning (MARL) based traffic signal control becomes a popular research topic in recent years. Most existing MARL approaches tend to learn the optimum control strategies in a decentralised manner by considering communication among neighbouring intersections. However, the non-stationary property in MARL may lead to extremely slow or even failure of convergence, especially when the number of intersections becomes large. One of the existing methods is to partition the whole network into several regions, each of which utilizes a centralized RL framework to speed up the convergence rate. However, there are two challenges for this strategy: the first one is how to get a flexible partition and the second one is how to search for the optimal joint actions for a region of intersections. In this paper, we propose a novel training framework where our region partitioning rule is based on the adjacency between the intersections and propose Dynamic Branching Dueling Q-Network (DBDQ) to search for optimal joint action efficiently and to maximize the regional reward. The experimental results with both real datasets and synthetic datasets demonstrate the superiority of our framework over other existing frameworks.
In everyday life, we often find that we can maintain an object's equilibrium on a tray by adjusting its orientation. Building upon this observation and extending the method we previously proposed to suppress sloshing in a moving vessel, this paper presents a feedforward control approach for transporting objects with a robot that are not firmly grasped but simply placed on a tray. The proposed approach combines smoothing actions and end-effector re-orientation to prevent object sliding. It can be integrated into existing robotic systems as a plug-in element between the reference trajectory generator and the robot control. To demonstrate the effectiveness of the proposed methods, particularly when dealing with unknown reference signals, we embed them in a direct teleoperation scheme. In this scheme, the user commands the robot carrying the tray by simply moving their hand in free space, with the hand's 3D position detected by a motion capture system. Furthermore, in the case of point-to-point motions, the same feedforward control, when fed with step inputs representing the desired goal position, dynamically generates the minimum-time reference trajectory that complies with velocity and acceleration constraints, thus avoiding sloshing and slipping. More information and accompanying videos can be found at //sites.google.com/view/robotwaiter/
Indirect simultaneous positioning (ISP), where internal tissue points are placed at desired locations indirectly through the manipulation of boundary points, is a type of subtask frequently performed in robotic surgeries. Although challenging due to complex tissue dynamics, automating the task can potentially reduce the workload of surgeons. This paper presents a sim-to-real framework for learning to automate the task without interacting with a real environment, and for planning preoperatively to find the grasping points that minimize local tissue deformation. A control policy is learned using deep reinforcement learning (DRL) in the FEM-based simulation environment and transferred to real-world situation. Grasping points are planned in the simulator by utilizing the trained policy using Bayesian optimization (BO). Inconsistent simulation performance is overcome by formulating the problem as a state augmented Markov decision process (MDP). Experimental results show that the learned policy places the internal tissue points accurately, and that the planned grasping points yield small tissue deformation among the trials. The proposed learning and planning scheme is able to automate internal tissue point manipulation in surgeries and has the potential to be generalized to complex surgical scenarios.
Given harsh operating conditions and physical constraints in reactors, nuclear applications cannot afford to equip the physical asset with a large array of sensors. Therefore, it is crucial to carefully determine the placement of sensors within the given spatial limitations, enabling the reconstruction of reactor flow fields and the creation of nuclear digital twins. Various design considerations are imposed, such as predetermined sensor locations, restricted areas within the reactor, a fixed number of sensors allocated to a specific region, or sensors positioned at a designated distance from one another. We develop a data-driven technique that integrates constraints into an optimization procedure for sensor placement, aiming to minimize reconstruction errors. Our approach employs a greedy algorithm that can optimize sensor locations on a grid, adhering to user-defined constraints. We demonstrate the near optimality of our algorithm by computing all possible configurations for selecting a certain number of sensors for a randomly generated state space system. In this work, the algorithm is demonstrated on the Out-of-Pile Testing and Instrumentation Transient Water Irradiation System (OPTI-TWIST) prototype vessel, which is electrically heated to mimic the neutronics effect of the Transient Reactor Test facility (TREAT) at Idaho National Laboratory (INL). The resulting sensor-based reconstruction of temperature within the OPTI-TWIST minimizes error, provides probabilistic bounds for noise-induced uncertainty and will finally be used for communication between the digital twin and experimental facility.
Background. Coping with the rapid growing complexity in contemporary software architecture, tracing has become an increasingly critical practice and been adopted widely by software engineers. By adopting tracing tools, practitioners are able to monitor, debug, and optimize distributed software architectures easily. However, with excessive number of valid candidates, researchers and practitioners have a hard time finding and selecting the suitable tracing tools by systematically considering their features and advantages.Objective. To such a purpose, this paper aims to provide an overview of popular Open tracing tools via comparison. Method. Herein, we first identified \ra{30} tools in an objective, systematic, and reproducible manner adopting the Systematic Multivocal Literature Review protocol. Then, we characterized each tool looking at the 1) measured features, 2) popularity both in peer-reviewed literature and online media, and 3) benefits and issues. We used topic modeling and sentiment analysis to extract and summarize the benefits and issues. Specially, we adopted ChatGPT to support the topic interpretation. Results. As a result, this paper presents a systematic comparison amongst the selected tracing tools in terms of their features, popularity, benefits and issues. Conclusion. The result mainly shows that each tracing tool provides a unique combination of features with also different pros and cons. The contribution of this paper is to provide the practitioners better understanding of the tracing tools facilitating their adoption.
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.