Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
Open parity games are proposed as a compositional extension of parity games with algebraic operations, forming string diagrams of parity games. A potential application of string diagrams of parity games is to describe a large parity game with a given compositional structure and solve it efficiently as a divide-and-conquer algorithm by exploiting its compositional structure. Building on our recent progress in open Markov decision processes, we introduce Pareto fronts of open parity games, offering a framework for multi-objective solutions. We establish the positional determinacy of open parity games with respect to their Pareto fronts through a novel translation method. Our translation converts an open parity game into a parity game tailored to a given single-objective. Furthermore, we present a simple algorithm for solving open parity games, derived from this translation that allows the application of existing efficient algorithms for parity games. Expanding on this foundation, we develop a compositional algorithm for string diagrams of parity games.
We give a construction in a column of a one-dimensional cellular automaton of the Minkowski sum of two sets which can themselves occur in columns of cellular automata. It enables us to obtain another construction of the set of integers that are sums of three squares, answering a question by the same author.
Crossed random effects structures arise in many scientific contexts. They raise severe computational problems with likelihood computations scaling like $N^{3/2}$ or worse for $N$ data points. In this paper we develop a new composite likelihood approach for crossed random effects probit models. For data arranged in R rows and C columns, the likelihood function includes a very difficult R + C dimensional integral. The composite likelihood we develop uses the marginal distribution of the response along with two hierarchical models. The cost is reduced to $\mathcal{O}(N)$ and it can be computed with $R + C$ one dimensional integrals. We find that the commonly used Laplace approximation has a cost that grows superlinearly. We get consistent estimates of the probit slope and variance components from our composite likelihood algorithm. We also show how to estimate the covariance of the estimated regression coefficients. The algorithm scales readily to a data set of five million observations from Stitch Fix with $R + C > 700{,}000$.
This work presents a procedure to solve the Euler equations by explicitly updating, in a conservative manner, a generic thermodynamic variable such as temperature, pressure or entropy instead of the total energy. The presented procedure is valid for any equation of state and spatial discretization. When using complex equations of state such as Span-Wagner, choosing the temperature as the generic thermodynamic variable yields great reductions in the computational costs associated to thermodynamic evaluations. Results computed with a state of the art thermodynamic model are presented, and computational times are analyzed. Particular attention is dedicated to the conservation of total energy, the propagation speed of shock waves and jump conditions. The procedure is thoroughly tested using the Span-Wagner equation of state through the CoolProp thermodynamic library and the Van der Waals equation of state, both in the ideal and non-ideal compressible fluid-dynamics regimes, by comparing it to the standard total energy update and analytical solutions where available.
We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised `at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the $\textit{horizon function}$ of the mirror potential and characterises its shape `at infinity'. When the potential is separable, a simple formula allows to compute this function. We analyse several examples of potentials and provide numerical experiments highlighting our results.
Tensor network contractions are widely used in statistical physics, quantum computing, and computer science. We introduce a method to efficiently approximate tensor network contractions using low-rank approximations, where each intermediate tensor generated during the contractions is approximated as a low-rank binary tree tensor network. The proposed algorithm has the flexibility to incorporate a large portion of the environment when performing low-rank approximations, which can lead to high accuracy for a given rank. Here, the environment refers to the remaining set of tensors in the network, and low-rank approximations with larger environments can generally provide higher accuracy. For contracting tensor networks defined on lattices, the proposed algorithm can be viewed as a generalization of the standard boundary-based algorithms. In addition, the algorithm includes a cost-efficient density matrix algorithm for approximating a tensor network with a general graph structure into a tree structure, whose computational cost is asymptotically upper-bounded by that of the standard algorithm that uses canonicalization. Experimental results indicate that the proposed technique outperforms previously proposed approximate tensor network contraction algorithms for multiple problems in terms of both accuracy and efficiency.
Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural networks, there have been efforts to establish diffractive processors that are jointly optimized with digital neural networks serving as their back-end. These jointly optimized hybrid (optical+digital) processors establish a new "diffractive language" between input electromagnetic waves that carry analog information and neural networks that process the digitized information at the back-end, providing the best of both worlds. Such hybrid designs can process spatially and temporally coherent, partially coherent, or incoherent input waves, providing universal coverage for any spatially varying set of point spread functions that can be optimized for a given task, executed in collaboration with digital neural networks. In this article, we highlight the utility of this exciting collaboration between engineered and programmed diffraction and digital neural networks for a diverse range of applications. We survey some of the major innovations enabled by the push-pull relationship between analog wave processing and digital neural networks, also covering the significant benefits that could be reaped through the synergy between these two complementary paradigms.
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
The concept of causality plays an important role in human cognition . In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference against counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper focuses on the survey of the deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we assist a detailed and comprehensive classification and analysis of relevant datasets and source code.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.