Multi-task partially annotated data, where each data point is annotated for only a single task, are potentially helpful for addressing data scarcity if a network can leverage the inter-task relationship. In this paper, we study the joint learning of object detection and semantic segmentation, the two most popular vision problems, from multi-task data with partial annotations. Extensive experiments are performed to evaluate the performance on each task and to explore the tasks' complementarity when a multi-task network cannot optimize both tasks simultaneously. We propose employing knowledge distillation to leverage joint-task optimization. The experiments show favorable results for multi-task learning and knowledge distillation over single-task learning and even over the fully supervised scenario. All code and data splits are available at //github.com/lhoangan/multas
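To make the partial-annotation setting concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each sample carries a label for only one task and that a teacher prediction stands in for the missing label. The function names, the use of mean squared error in place of real detection and segmentation losses, and the `distill_weight` parameter are all illustrative assumptions.

```python
from typing import Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def partial_multitask_loss(
    det_pred: Sequence[float],
    seg_pred: Sequence[float],
    det_label: Optional[Sequence[float]] = None,
    seg_label: Optional[Sequence[float]] = None,
    det_teacher: Optional[Sequence[float]] = None,
    seg_teacher: Optional[Sequence[float]] = None,
    distill_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Supervised loss on the annotated task, distillation loss on the other."""
    loss = 0.0
    tasks = (
        (det_pred, det_label, det_teacher),
        (seg_pred, seg_label, seg_teacher),
    )
    for pred, label, teacher in tasks:
        if label is not None:
            # Ground truth is available: use the ordinary supervised loss.
            loss += mse(pred, label)
        elif teacher is not None:
            # No annotation: match the teacher's output instead.
            loss += distill_weight * mse(pred, teacher)
    return loss


# A sample annotated for detection only; segmentation is guided by a teacher.
example_loss = partial_multitask_loss(
    det_pred=[1.0, 2.0],
    seg_pred=[0.0, 0.0],
    det_label=[1.0, 4.0],
    seg_teacher=[0.0, 2.0],
)
```

In this toy call the detection term contributes the supervised error and the segmentation term contributes the down-weighted distillation error, so both task heads receive a training signal from a sample that is annotated for only one task.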
Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node within a Gaussian graphical model, optimized using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs, encompassing general, sparse covariance structures. To support our theoretical findings, we conduct an empirical investigation of this inconsistency by contrasting our outcomes with other commonly used information criteria through an extensive simulation study. Given that many algorithms designed to learn the structure of graphical models require hyperparameter selection, the precise calibration of this hyperparameter is paramount for accurately estimating the inherent structure. Consequently, our observations shed light on this widely recognized practical challenge.
In recent years, the Adaptive Antoulas-Anderson (AAA) algorithm has established itself as the method of choice for solving rational approximation problems. Data-driven Model Order Reduction (MOR) of large-scale Linear Time-Invariant (LTI) systems represents one of the many applications in which this algorithm has proven to be successful, since it typically generates reduced-order models (ROMs) efficiently and in an automated way. Despite its effectiveness and numerical reliability, the classical AAA algorithm is not guaranteed to return a ROM that retains the structural features of the underlying dynamical system, such as the stability of the dynamics. In this paper, we propose a novel algebraic characterization for the stability of ROMs with transfer function obeying the AAA barycentric structure. We use this characterization to formulate a set of convex constraints on the free coefficients of the AAA model that, whenever verified, guarantee by construction the asymptotic stability of the resulting ROM. We suggest how to embed such constraints within the AAA optimization routine, and we validate experimentally the effectiveness of the resulting algorithm, named stabAAA, over a set of relevant MOR applications.
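For readers unfamiliar with the barycentric structure mentioned above, the following minimal sketch evaluates a rational function in barycentric form, r(z) = (sum_j w_j f_j / (z - z_j)) / (sum_j w_j / (z - z_j)). It illustrates the model structure only, not the stabAAA algorithm: the function name and the sample data are assumptions, and the weights are picked arbitrarily rather than fitted.

```python
def barycentric_eval(z, support, values, weights):
    """Evaluate a barycentric rational function at the point z.

    support: support points z_j
    values:  data values f_j attached to the support points
    weights: barycentric weights w_j (assumed nonzero)
    """
    numerator = 0j
    denominator = 0j
    for zj, fj, wj in zip(support, values, weights):
        if z == zj:
            # The barycentric form interpolates the data at support points.
            return complex(fj)
        term = wj / (z - zj)
        numerator += term * fj
        denominator += term
    return numerator / denominator


# Illustrative data only: three support points with arbitrary weights.
support_points = [0.0, 1.0, 2.0]
sample_values = [1.0, 0.5, 0.25]
sample_weights = [1.0, -2.0, 1.0]
r_at_support = barycentric_eval(1.0, support_points, sample_values, sample_weights)
r_elsewhere = barycentric_eval(0.5j, support_points, sample_values, sample_weights)
```

The poles of such a model are the zeros of the denominator sum, so they depend on the weights. For a continuous-time LTI system, asymptotic stability requires all poles to lie in the open left half-plane, which is why constraints on the free coefficients can be used to enforce it.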
This paper addresses the estimation of the second-order structure of a manifold cross-time random field (RF) displaying spatially varying Long Range Dependence (LRD), adopting the functional time series framework introduced in Ruiz-Medina (2022). Conditions for the asymptotic unbiasedness of the integrated periodogram operator in the Hilbert-Schmidt operator norm are derived beyond structural assumptions. Weak-consistent estimation of the long-memory operator is achieved under a semiparametric functional spectral framework in the Gaussian context. The case where the projected manifold process can display Short Range Dependence (SRD) and LRD at different manifold scales is also analyzed. The performance of both estimation procedures is illustrated in the simulation study, in the context of multifractionally integrated spherical functional autoregressive-moving average (SPHARMA(p,q)) processes.
The statistical analysis of group studies in neuroscience is particularly challenging due to the complex spatio-temporal nature of the data, its multiple levels, and the inter-individual variability in brain responses. In this respect, traditional ANOVA-based studies and linear mixed effects models typically provide only limited exploration of the dynamics of the group brain activity and the variability of the individual responses, potentially leading to overly simplistic conclusions and/or missing more intricate patterns. In this study, we propose a novel method based on functional Principal Components Analysis and Bayesian model-based clustering to simultaneously assess group effects and individual deviations over the most important temporal features in the data. This method provides a thorough exploration of group differences and individual deviations in neuroscientific group studies without compromising on the spatio-temporal nature of the data. By means of a simulation study, we demonstrate that the proposed model returns correct classification in different clustering scenarios under low and high noise levels in the data. Finally, we consider a case study using Electroencephalogram data recorded during an object recognition task, where our approach provides new insights into the underlying brain mechanisms generating the data and their variability.
An a posteriori error estimator based on an equilibrated flux reconstruction is proposed for defeaturing problems in the context of finite element discretizations. Defeaturing consists in the simplification of a geometry by removing features that are considered not relevant for the approximation of the solution of a given PDE. In this work, the focus is on the Poisson equation with Neumann boundary conditions on the feature boundary. The estimator accounts both for the so-called defeaturing error and for the numerical error committed by approximating the solution on the defeatured domain. Unlike other estimators that were previously proposed for defeaturing problems, the use of the equilibrated flux reconstruction makes it possible to obtain a sharp bound for the numerical component of the error. Furthermore, it does not require the evaluation of the normal trace of the numerical flux on the feature boundary: this makes the estimator well-suited for finite element discretizations, in which the normal trace of the numerical flux is typically discontinuous across elements. The reliability of the estimator is proven and verified on several numerical examples. Its capability to identify the most relevant features is also shown, in anticipation of a future application to an adaptive strategy.
This paper introduces a new theoretical and computational framework for a data-driven Koopman mode analysis of nonlinear dynamics. To alleviate the potential problem of ill-conditioned eigenvectors in the existing implementations of the Dynamic Mode Decomposition (DMD) and the Extended Dynamic Mode Decomposition (EDMD), the new method introduces a Koopman-Schur decomposition that is entirely based on unitary transformations. The analysis in terms of the eigenvectors as modes of a Koopman operator compression is replaced with a modal decomposition in terms of a flag of invariant subspaces that correspond to selected eigenvalues. The main computational tool from numerical linear algebra is the partial ordered Schur decomposition, which provides convenient orthonormal bases for these subspaces. In the case of real data, a real Schur form is used and the computation is based on real orthogonal transformations. The new computational scheme is presented in the framework of the Extended DMD, and the kernel trick is used.
This work develops two fast randomized algorithms for computing the generalized tensor singular value decomposition (GTSVD) based on the tubal product (t-product). The random projection method is utilized to compute the important actions of the underlying data tensors and use them to get small sketches of the original data tensors, which are easier to handle. Due to the small size of the sketch tensors, deterministic approaches are applied to them to compute their GTSVDs. Then, from the GTSVD of the small sketch tensors, the GTSVD of the original large-scale data tensors is recovered. Some experiments are conducted to show the effectiveness of the proposed approach.
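The sketch-then-recover idea can be illustrated in the simpler matrix setting. The following minimal sketch is a matrix analogue under stated assumptions, not the proposed t-product GTSVD algorithms: it uses a Gaussian random projection to capture the range of a matrix, orthonormalizes the result, and forms a small factor that a deterministic method could then decompose. All function names and the test matrix are illustrative.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y):
    """Modified Gram-Schmidt on the columns of Y (assumed full column rank)."""
    basis = []
    for col in transpose(Y):
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return transpose(basis)


def randomized_sketch(A, rank, seed=0):
    """Return Q (m x rank) and the small factor B = Q^T A (rank x n)."""
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian test matrix: A @ omega samples the "action" of A.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n)]
    Q = orthonormal_columns(matmul(A, omega))
    B = matmul(transpose(Q), A)
    return Q, B


# Illustrative rank-2 matrix of size 6 x 5.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [1.0, 2.0], [-1.0, 1.0]]
V = [[1.0, 2.0, 0.0, 1.0, -1.0], [0.0, 1.0, 1.0, -2.0, 3.0]]
A = matmul(U, V)
Q, B = randomized_sketch(A, rank=2)
approx = matmul(Q, B)
max_error = max(abs(a - b) for ra, rb in zip(A, approx) for a, b in zip(ra, rb))
```

Because the example matrix has exact rank 2, the product of `Q` and `B` reproduces it up to rounding error. Any deterministic factorization applied to the small factor `B` can then be lifted back to the original size by multiplying with `Q`, which mirrors the recovery step described above.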
Many complex tasks and environments can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to expedite adaptation and enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. While most of these systems are monolithic, modularity promises to allow capturing the compositional nature of many tasks. However, it is unclear under which circumstances modular systems discover this hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. We show theoretically that identification up to linear transformation purely from demonstrations is possible in hypernetworks without having to learn an exponential number of module combinations. While our theory assumes the infinite data limit, in an extensive empirical study we demonstrate how meta-learning from finite data can discover modular solutions that generalize compositionally in modular but not monolithic architectures. We further show that our insights translate outside the teacher-student setting and demonstrate that in tasks with compositional preferences and tasks with compositional goals hypernetworks can discover modular policies that compositionally generalize.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enables each agent to improve its task allocation strategy through reinforcement learning, while changing how much it explores the system in response to how optimal it believes its current strategy is, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum in the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
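The idea of tying exploration to an agent's belief about its own strategy can be sketched with a single-agent, bandit-style stand-in. This is not one of the four algorithms presented in the paper. It assumes deterministic worker qualities, an epsilon-greedy rule, and a hypothetical confidence measure, namely average reward divided by the best current estimate. All names and parameters are illustrative.

```python
import random


def allocate_with_adaptive_exploration(
    quality, rounds=300, eps_min=0.05, eps_max=0.5, seed=0
):
    """Learn which worker to allocate to, exploring less as confidence grows."""
    rng = random.Random(seed)
    n = len(quality)
    estimate = [1.0] * n  # optimistic start so untried workers look attractive
    counts = [0] * n
    avg_reward = 0.0
    epsilon = eps_max
    for t in range(1, rounds + 1):
        if rng.random() < epsilon:
            worker = rng.randrange(n)  # explore: pick any worker
        else:
            worker = max(range(n), key=lambda i: estimate[i])  # exploit
        reward = quality[worker]  # deterministic stand-in for task outcome
        counts[worker] += 1
        # Incremental mean of the rewards observed for this worker.
        estimate[worker] += (reward - estimate[worker]) / counts[worker]
        avg_reward += (reward - avg_reward) / t
        # Hypothetical belief in current strategy: realised vs best estimate.
        confidence = avg_reward / max(estimate)
        epsilon = max(eps_min, eps_max * (1.0 - confidence))
    best = max(range(n), key=lambda i: estimate[i])
    return best, estimate, epsilon


worker_quality = [0.2, 0.9, 0.5, 0.7]  # illustrative capabilities in (0, 1)
best_worker, estimates, final_epsilon = allocate_with_adaptive_exploration(worker_quality)
```

As the realised average reward approaches the best estimated reward, the exploration rate shrinks towards `eps_min`. If rewards later drop, for example after a connectivity change, the same rule would raise exploration again while the learned estimates are retained.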
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
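The implicit regularization mentioned above has a simple linear instance: for an underdetermined least-squares problem, gradient descent started from zero converges to the interpolating solution of minimum Euclidean norm. The following minimal sketch illustrates this on a two-sample, three-parameter problem. The data, step size, and function name are illustrative assumptions and are not taken from the survey.

```python
def min_norm_by_gradient_descent(X, y, lr=0.3, steps=200):
    """Gradient descent on 0.5 * ||Xw - y||^2, starting from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residuals X w - y, one per training sample.
        residual = [
            sum(xi * wi for xi, wi in zip(row, w)) - yi for row, yi in zip(X, y)
        ]
        # Gradient X^T (X w - y); it always lies in the row space of X.
        grad = [sum(r * row[j] for r, row in zip(residual, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Two samples, three parameters: infinitely many weight vectors fit the data.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [3.0, 3.0]
w = min_norm_by_gradient_descent(X, y)
```

Both (3, 3, 0) and (1, 1, 2) fit these data exactly, but the iterates never leave the row space of `X`, so gradient descent converges to (1, 1, 2), the interpolant with the smaller norm. No explicit penalty term is involved: the bias comes entirely from the optimization method and its initialization.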