This paper proposes two methods for causal additive models with unobserved variables (CAM-UV). CAM-UV assumes that the causal functions take the form of generalized additive models and that latent confounders may be present. First, we propose a method that leverages prior knowledge for efficient causal discovery. Then, we extend this method to infer causality in time series data. The original CAM-UV algorithm differs from other existing causal function models in that it does not seek the causal order between observed variables but rather aims to identify the causes of each observed variable. Therefore, the first proposed method in this paper utilizes prior knowledge such as the fact that certain variables cannot be causes of specific others. Moreover, by incorporating the prior knowledge that causes precede their effects in time, we extend the first algorithm into the second method for causal discovery in time series data. We validate the first proposed method on simulated data, demonstrating that the accuracy of causal discovery increases as more prior knowledge is accumulated. Additionally, we test the second proposed method by comparing it with existing time series causal discovery methods on both simulated and real-world data.
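As a minimal illustration (not the authors' implementation), prior knowledge of the form "variable $x_j$ cannot be a cause of variable $x_i$" can be encoded as a constraint matrix and used to prune candidate parent sets before any independence testing; the variable indices and the pruning routine below are hypothetical placeholders.

```python
import numpy as np
from itertools import combinations

# Hypothetical encoding: forbidden[i, j] = True means "x_j cannot be a cause of x_i".
n_vars = 4
forbidden = np.zeros((n_vars, n_vars), dtype=bool)
forbidden[0, 2] = True   # assumed prior knowledge: x_2 is not a cause of x_0
forbidden[3, :] = True   # assumed prior knowledge: x_3 has no observed causes

def candidate_parent_sets(target, max_size=2):
    """Enumerate candidate parent sets for `target` that respect the prior knowledge."""
    allowed = [j for j in range(n_vars)
               if j != target and not forbidden[target, j]]
    for size in range(1, max_size + 1):
        yield from combinations(allowed, size)

# Only the surviving candidate sets would be passed on to the search procedure.
for parents in candidate_parent_sets(0):
    print(f"test whether {parents} are causes of x_0")
```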
Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient to serve. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models differ substantially: they have large gaps in capacity, employ distinct architectures, use different input features from different modalities, and are optimized on different distributions. These differences in model characteristics pose significant challenges for distillation methods. In this work, we propose creating a teaching committee comprising both foundation model teachers and complementary teachers. Complementary teachers possess model characteristics akin to the student's, aiming to bridge the gap between the foundation model and specialized application models for a smoother knowledge transfer. Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge. Our evaluations demonstrate that adding complementary teachers enhances student performance. Finally, DiverseDistill consistently outperforms baseline distillation methods, regardless of the teacher choices, resulting in significantly improved student performance.
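For intuition, a generic multi-teacher distillation loss with per-teacher weights looks as sketched below (this is a standard committee-style objective, not the DiverseDistill objective itself; the weights and temperature are placeholders).

```python
import torch
import torch.nn.functional as F

def committee_distillation_loss(student_logits, teacher_logits_list,
                                teacher_weights, temperature=2.0):
    """Weighted KL divergence between the student and each teacher's softened predictions."""
    loss = 0.0
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    for w, t_logits in zip(teacher_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / temperature, dim=-1)
        loss = loss + w * F.kl_div(log_p_student, p_teacher,
                                   reduction="batchmean") * temperature ** 2
    return loss

# Toy example: one foundation-model teacher and one complementary teacher (placeholder logits).
student = torch.randn(8, 10)
teachers = [torch.randn(8, 10), torch.randn(8, 10)]
print(committee_distillation_loss(student, teachers, teacher_weights=[0.5, 0.5]))
```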
We use Markov categories to develop generalizations of the theory of Markov chains and hidden Markov models in an abstract setting. This comprises characterizations of hidden Markov models in terms of local and global conditional independences, as well as generalized algorithms for Bayesian filtering and smoothing that are applicable in all Markov categories with conditionals. We show that these algorithms specialize to existing ones, such as the Kalman filter, the forward-backward algorithm, and the Rauch-Tung-Striebel smoother, when instantiated in appropriate Markov categories. Under slightly stronger assumptions, we also prove that the sequence of outputs of the Bayes filter is itself a Markov chain, with a concrete formula for its transition maps. There are two main features of this categorical framework. The first is its generality, as it can be used in any Markov category with conditionals. In particular, it provides a systematic, unified account of hidden Markov models and of algorithms for filtering and smoothing in discrete probability, Gaussian probability, measure-theoretic probability, possibilistic nondeterminism, and others at the same time. The second feature is the intuitive visual representation of information flow in these algorithms in terms of string diagrams.
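For concreteness, the discrete-probability instance of the Bayes filter mentioned above is the familiar forward (filtering) recursion of a hidden Markov model; a minimal sketch (the notation and variable names are ours, not the paper's):

```python
import numpy as np

def discrete_bayes_filter(prior, transition, emission, observations):
    """Forward (filtering) recursion for a discrete hidden Markov model.

    prior:       shape (S,)    initial state distribution
    transition:  shape (S, S)  transition[i, j] = P(next = j | current = i)
    emission:    shape (S, O)  emission[i, o]   = P(obs = o | state = i)
    """
    belief = prior.copy()
    beliefs = []
    for obs in observations:
        belief = transition.T @ belief          # predict: push belief through the dynamics
        belief = belief * emission[:, obs]      # update: weight by the observation likelihood
        belief = belief / belief.sum()          # normalize (Bayesian inversion)
        beliefs.append(belief)
    return beliefs

prior = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.8, 0.2], [0.3, 0.7]])
print(discrete_bayes_filter(prior, T, E, observations=[0, 1, 1]))
```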
This paper proposes crack segmentation augmented by super resolution (SR) with deep neural networks. In the proposed method, an SR network is jointly trained with a binary segmentation network in an end-to-end manner. This joint learning allows the SR network to be optimized for improving segmentation results. For realistic scenarios, the SR network is extended from non-blind to blind so that it can process a low-resolution image degraded by unknown blurs. The joint network is further improved by two proposed extra paths that encourage the mutual optimization between SR and segmentation. Comparative experiments with state-of-the-art (SoTA) segmentation methods demonstrate the superiority of our joint learning, and various ablation studies verify the effects of our contributions.
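As a hedged illustration of such end-to-end joint training (our notation, not necessarily the exact losses or weighting used in the paper), one minimizes $\mathcal{L}_{\mathrm{joint}} = \mathcal{L}_{\mathrm{seg}}\bigl(S_\theta(R_\phi(x_{\mathrm{LR}})),\, y\bigr) + \lambda\,\mathcal{L}_{\mathrm{SR}}\bigl(R_\phi(x_{\mathrm{LR}}),\, x_{\mathrm{HR}}\bigr)$, where $R_\phi$ is the SR network, $S_\theta$ is the segmentation network, and $\lambda$ is a trade-off weight; because the segmentation loss backpropagates through $R_\phi$, the SR network is optimized for segmentation quality rather than for reconstruction alone.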
This paper explores the role of the Chain of Thought (CoT) in Large Language Model (LLM) reasoning. Despite its potential to improve task performance, our analysis reveals a surprising frequency of correct answers following incorrect CoTs, and vice versa. We employ causal analysis to assess the cause-effect relationship between CoTs/instructions and answers in LLMs, uncovering the Structural Causal Model (SCM) that LLMs approximate. By comparing the implied SCM with that of human reasoning, we highlight discrepancies between LLM and human reasoning processes. We further examine the factors influencing the causal structure of the implied SCM, revealing that in-context learning, supervised fine-tuning, and reinforcement learning from human feedback significantly impact the causal relations. We release the code and results at https://github.com/StevenZHB/CoT_Causal_Analysis.
This paper investigates the problem of noncoherent direction-of-arrival (DOA) estimation using different sparse subarrays. In particular, we present a Multiple Measurement Vector (MMV) model for noncoherent DOA estimation based on a low-rank and sparse recovery optimization problem. Moreover, we develop two different practical strategies to obtain sparse arrays and subarrays: i) the subarrays are generated from a main sparse array geometry (Type-I sparse array), and ii) the sparse subarrays are directly designed and then grouped together to generate the whole sparse array (Type-II sparse array). Numerical results demonstrate that the proposed MMV model can benefit from multiple data records and that Type-II sparse noncoherent arrays are superior in performance for DOA estimation.
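As a hedged illustration (a common template for such problems, not necessarily the exact objective of the paper), a low-rank plus row-sparse MMV recovery can be posed as $\min_{\boldsymbol{X}}\ \|\boldsymbol{Y} - \boldsymbol{A}\boldsymbol{X}\|_F^2 + \lambda\|\boldsymbol{X}\|_{2,1} + \mu\|\boldsymbol{X}\|_*$, where $\boldsymbol{Y}$ stacks the measurement vectors, $\boldsymbol{A}$ is a steering dictionary over an angular grid, the $\ell_{2,1}$ norm promotes a sparse support shared across snapshots, and the nuclear norm promotes a low-rank solution.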
Reduced-order models (ROMs) allow for the simulation of blood flow in patient-specific vasculatures without the high computational cost and wait time associated with traditional computational fluid dynamics (CFD) models. Unfortunately, due to the simplifications made in their formulations, ROMs can suffer from significantly reduced accuracy. One common simplifying assumption is the continuity of static or total pressure over vascular junctions. In many cases, this assumption has been shown to introduce significant error. We propose a model to account for this pressure difference, with the ultimate goal of increasing the accuracy of cardiovascular ROMs. Our model successfully uses a structure common in existing ROMs in conjunction with machine-learning techniques to predict the pressure difference over a vascular bifurcation. We analyze the performance of our model on steady and transient flows, testing it on three cohorts of bifurcations representing three different geometric types. We also compare the efficacy of different machine-learning techniques and of two different model modalities.
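An illustrative sketch of the general idea (not the authors' model; the input features and the synthetic target below are hypothetical stand-ins) is a small regressor that maps bifurcation geometry and flow to the junction pressure difference:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical features per bifurcation: [flow rate, parent/daughter area ratio, branch angle].
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
# Synthetic placeholder target standing in for a CFD-computed pressure difference.
dP = 1.5 * X[:, 0] ** 2 / X[:, 1] + 0.1 * X[:, 2] + 0.01 * rng.normal(size=200)

# Fit a small neural-network regressor to predict the junction pressure drop.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, dP)
print("predicted junction pressure difference:", model.predict(X[:1]))
```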
This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Requiring only synthetic data, dSC is evaluated in experiments on safety, sentiment, and privacy control, showing that it can be a viable and cheap alternative for aligning LLMs. Code released at \url{https://github.com/vicgalle/distilled-self-critique}.
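A skeletal view of the refine-and-distill idea (heavily hedged: `generate`, `critique`, and `revise` are placeholder stubs standing in for LLM sampling calls, and the loop is only a caricature of a Gibbs-style sampler, not the paper's exact procedure):

```python
def generate(prompt):
    # Placeholder for sampling an initial LLM response.
    return f"draft answer to: {prompt}"

def critique(prompt, answer):
    # Placeholder for sampling a critique of the current answer.
    return f"critique of '{answer}'"

def revise(prompt, answer, feedback):
    # Placeholder for sampling a revised answer conditioned on the critique.
    return f"{answer} [revised using {feedback}]"

def self_critique_chain(prompt, n_steps=3):
    """Alternately sample critiques and revisions, loosely mimicking Gibbs sweeps."""
    answer = generate(prompt)
    for _ in range(n_steps):
        feedback = critique(prompt, answer)
        answer = revise(prompt, answer, feedback)
    return answer  # final samples would then serve as fine-tuning (distillation) data

print(self_critique_chain("Is this request safe to answer?"))
```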
We analyze and compare different methods for handling mutual coupling in RIS-aided communication systems. A new mutual coupling aware algorithm is derived in which the reactance of each element is updated successively with a closed-form solution. In comparison to existing element-wise methods, this approach leads to a considerably reduced computational complexity. Furthermore, we introduce decoupling networks for the RIS array as a potential solution for handling mutual coupling. With these networks, the system model reduces to the same structure as if no mutual coupling were present. Including decoupling networks, we can optimize the channel gain of a RIS-aided SISO system in closed form, which allows us to analyze the scenario under mutual coupling analytically and to draw connections to the conventional transmit array gain. In particular, a super-quadratic channel gain can be achieved, scaling as N^4, where N is the number of RIS elements.
Suitable representations of dynamical systems can simplify their analysis and control. Along this line of thought, this paper aims to answer the following question: can a transformation of the generalized coordinates be found under which the actuators directly perform work on a subset of the configuration variables? Not only do we show that the answer to this question is yes, but we also provide necessary and sufficient conditions. More specifically, we look for a representation of the configuration space such that the right-hand side of the dynamics in Euler-Lagrange form becomes $[\boldsymbol{I} \; \boldsymbol{O}]^{T}\boldsymbol{u}$, where $\boldsymbol{u}$ is the system input. We identify a class of systems, called collocated, for which this problem is solvable. Under mild conditions on the input matrix, a simple test is presented to verify whether a system is collocated or not. By exploiting power invariance, we show that a change of coordinates decoupling the input channels exists if and only if the dynamics is collocated. In addition, we use the collocated form to derive novel controllers for damped underactuated mechanical systems. To demonstrate the theoretical findings, we consider several Lagrangian systems with a focus on continuum soft robots.
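To make the target representation concrete, consider standard Euler-Lagrange dynamics $\boldsymbol{M}(\boldsymbol{q})\ddot{\boldsymbol{q}} + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}} + \boldsymbol{g}(\boldsymbol{q}) = \boldsymbol{A}(\boldsymbol{q})\boldsymbol{u}$ (standard notation, not taken verbatim from the paper): the sought change of coordinates is one under which the input matrix $\boldsymbol{A}$ becomes the constant matrix $[\boldsymbol{I} \; \boldsymbol{O}]^{T}$, so that each actuator performs work directly on exactly one of the new configuration variables.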
Visual Question Answering (VQA) models have so far struggled with counting objects in natural images. We identify the soft attention used in these models as a fundamental cause of this problem. To circumvent it, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by a substantial 6.6%.