Force perception on medical instruments is critical for understanding the mechanism between surgical tools and tissues for feeding back quantized force information, which is essential for guidance and supervision in robotic autonomous surgery. Especially for continuous curvilinear capsulorhexis (CCC), it always lacks a force measuring method, providing a sensitive, accurate, and multi-dimensional measurement to track the intraoperative force. Furthermore, the decoupling matrix obtained from the calibration can decorrelate signals with acceptable accuracy, however, this calculating method is not a strong way for thoroughly decoupling under some sensitive measuring situations such as the CCC. In this paper, a three-dimensional force perception method on capsulorhexis forceps by installing Fiber Bragg Grating sensors (FBGs) on prongs and a signal decoupling method combined with FASTICA is first proposed to solve these problems. According to experimental results, the measuring range is up to 1 N (depending on the range of wavelength shifts of sensors) and the resolution on x, y, and z axial force is 0.5, 0.5, and 2 mN separately. To minimize the coupling effects among sensors on measuring multi-axial forces, by unitizing the particular parameter and scaling the corresponding vector in the mixing matrix and recovered signals from FastICA, the signals from sensors can be decorrelated and recovered with the errors on axial forces decreasing up to 50% least. The calibration and calculation can also be simplified with half the parameters involved in the calculation. Experiments on thin sheets and in vitro porcine eyes were performed, and it was found that the tearing forces were stable and the time sequence of tearing forceps was stationary or first-order difference stationary during roughly circular crack propagating.
Recently, brain-inspired spiking neural networks (SNNs) have demonstrated promising capabilities in solving pattern recognition tasks. However, these SNNs are grounded on homogeneous neurons that utilize a uniform neural coding for information representation. Given that each neural coding scheme possesses its own merits and drawbacks, these SNNs encounter challenges in achieving optimal performance such as accuracy, response time, efficiency, and robustness, all of which are crucial for practical applications. In this study, we argue that SNN architectures should be holistically designed to incorporate heterogeneous coding schemes. As an initial exploration in this direction, we propose a hybrid neural coding and learning framework, which encompasses a neural coding zoo with diverse neural coding schemes discovered in neuroscience. Additionally, it incorporates a flexible neural coding assignment strategy to accommodate task-specific requirements, along with novel layer-wise learning methods to effectively implement hybrid coding SNNs. We demonstrate the superiority of the proposed framework on image classification and sound localization tasks. Specifically, the proposed hybrid coding SNNs achieve comparable accuracy to state-of-the-art SNNs, while exhibiting significantly reduced inference latency and energy consumption, as well as high noise robustness. This study yields valuable insights into hybrid neural coding designs, paving the way for developing high-performance neuromorphic systems.
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.
In the era of precision medicine, more and more clinical trials are now driven or guided by biomarkers, which are patient characteristics objectively measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacologic responses to therapeutic interventions. With the overarching objective to optimize and personalize disease management, biomarker-guided clinical trials increase the efficiency by appropriately utilizing prognostic or predictive biomarkers in the design. However, the efficiency gain is often not quantitatively compared to the traditional all-comers design, in which a faster enrollment rate is expected (e.g. due to no restriction to biomarker positive patients) potentially leading to a shorter duration. To accurately predict biomarker-guided trial duration, we propose a general framework using mixture distributions accounting for heterogeneous population. Extensive simulations are performed to evaluate the impact of heterogeneous population and the dynamics of biomarker characteristics and disease on the study duration. Several influential parameters including median survival time, enrollment rate, biomarker prevalence and effect size are identitied. Re-assessments of two publicly available trials are conducted to empirically validate the prediction accuracy and to demonstrate the practical utility. The R package \emph{detest} is developed to implement the proposed method and is publicly available on CRAN.
Numerous statistical methods have been developed to explore genomic imprinting and maternal effects, which are causes of parent-of-origin patterns in complex human diseases. However, most of them either only model one of these two confounded epigenetic effects, or make strong yet unrealistic assumptions about the population to avoid over-parameterization. A recent partial likelihood method (LIME) can identify both epigenetic effects based on case-control family data without those assumptions. Theoretical and empirical studies have shown its validity and robustness. However, because LIME obtains parameter estimation by maximizing partial likelihood, it is interesting to compare its efficiency with full likelihood maximizer. To overcome the difficulty in over-parameterization when using full likelihood, in this study we propose a Monte Carlo Expectation Maximization (MCEM) method to detect imprinting and maternal effects jointly. Those unknown mating type probabilities, the nuisance parameters, can be considered as latent variables in EM algorithm. Monte Carlo samples are used to numerically approximate the expectation function that cannot be solved algebraically. Our simulation results show that though this MCEM algorithm takes longer computational time, and can give higher bias in some simulations compared to LIME, it can generally detect both epigenetic effects with higher power and smaller standard error which demonstrates that it can be a good complement of LIME method.
Understanding causality helps to structure interventions to achieve specific goals and enables predictions under interventions. With the growing importance of learning causal relationships, causal discovery tasks have transitioned from using traditional methods to infer potential causal structures from observational data to the field of pattern recognition involved in deep learning. The rapid accumulation of massive data promotes the emergence of causal search methods with brilliant scalability. Existing summaries of causal discovery methods mainly focus on traditional methods based on constraints, scores and FCMs, there is a lack of perfect sorting and elaboration for deep learning-based methods, also lacking some considers and exploration of causal discovery methods from the perspective of variable paradigms. Therefore, we divide the possible causal discovery tasks into three types according to the variable paradigm and give the definitions of the three tasks respectively, define and instantiate the relevant datasets for each task and the final causal model constructed at the same time, then reviews the main existing causal discovery methods for different tasks. Finally, we propose some roadmaps from different perspectives for the current research gaps in the field of causal discovery and point out future research directions.
Decision-making algorithms are being used in important decisions, such as who should be enrolled in health care programs and be hired. Even though these systems are currently deployed in high-stakes scenarios, many of them cannot explain their decisions. This limitation has prompted the Explainable Artificial Intelligence (XAI) initiative, which aims to make algorithms explainable to comply with legal requirements, promote trust, and maintain accountability. This paper questions whether and to what extent explainability can help solve the responsibility issues posed by autonomous AI systems. We suggest that XAI systems that provide post-hoc explanations could be seen as blameworthy agents, obscuring the responsibility of developers in the decision-making process. Furthermore, we argue that XAI could result in incorrect attributions of responsibility to vulnerable stakeholders, such as those who are subjected to algorithmic decisions (i.e., patients), due to a misguided perception that they have control over explainable algorithms. This conflict between explainability and accountability can be exacerbated if designers choose to use algorithms and patients as moral and legal scapegoats. We conclude with a set of recommendations for how to approach this tension in the socio-technical process of algorithmic decision-making and a defense of hard regulation to prevent designers from escaping responsibility.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
Transformer is a type of deep neural network mainly based on self-attention mechanism which is originally applied in natural language processing field. Inspired by the strong representation ability of transformer, researchers propose to extend transformer for computer vision tasks. Transformer-based models show competitive and even better performance on various visual benchmarks compared to other network types such as convolutional networks and recurrent networks. In this paper we provide a literature review of these visual transformer models by categorizing them in different tasks and analyze the advantages and disadvantages of these methods. In particular, the main categories include the basic image classification, high-level vision, low-level vision and video processing. Self-attention in computer vision is also briefly revisited as self-attention is the base component in transformer. Efficient transformer methods are included for pushing transformer into real applications. Finally, we give a discussion about the further research directions for visual transformer.
Human doctors with well-structured medical knowledge can diagnose a disease merely via a few conversations with patients about symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn as they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, i.e., low-resource medical dialogue generation, which can transfer the diagnostic experience from source diseases to target ones with a handful of data for adaptation. It is capitalized on a commonsense knowledge graph to characterize the prior disease-symptom relations. Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the needs of a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also well addresses the real-world challenges that the disease-symptom correlations of each disease may vary or evolve along with more diagnostic cases. Extensive experiment results on the CMDD dataset and our newly-collected Chunyu dataset testify the superiority of our approach over state-of-the-art approaches. Besides, our GEML can generate an enriched dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graph.
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progresses in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics in human languages - distinctiveness for each caption and diversity for different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.