Inverse problems arise anywhere we have indirect measurement. As, in general they are ill-posed, to obtain satisfactory solutions for them needs prior knowledge. Classically, different regularization methods and Bayesian inference based methods have been proposed. As these methods need a great number of forward and backward computations, they become costly in computation, in particular, when the forward or generative models are complex and the evaluation of the likelihood becomes very costly. Using Deep Neural Network surrogate models and approximate computation can become very helpful. However, accounting for the uncertainties, we need first understand the Bayesian Deep Learning and then, we can see how we can use them for inverse problems. In this work, we focus on NN, DL and more specifically the Bayesian DL particularly adapted for inverse problems. We first give details of Bayesian DL approximate computations with exponential families, then we will see how we can use them for inverse problems. We consider two cases: First the case where the forward operator is known and used as physics constraint, the second more general data driven DL methods. keyword: Neural Network, Variational Bayesian inference, Bayesian Deep Learning (DL), Inverse problems, Physics based DL.
Single neurons in neural networks are often ``interpretable'' in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to two recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems.
Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot's local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in a simulation environment to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset.
As voice assistants cement their place in our technologically advanced society, there remains a need to cater to the diverse linguistic landscape, including colloquial forms of low-resource languages. Our study introduces the first-ever comprehensive dataset for intent detection and slot filling in formal Bangla, colloquial Bangla, and Sylheti languages, totaling 984 samples across 10 unique intents. Our analysis reveals the robustness of large language models for tackling downstream tasks with inadequate data. The GPT-3.5 model achieves an impressive F1 score of 0.94 in intent detection and 0.51 in slot filling for colloquial Bangla.
The maximum likelihood estimation is widely used for statistical inferences. In this study, we reformulate the h-likelihood proposed by Lee and Nelder in 1996, whose maximization yields maximum likelihood estimators for fixed parameters and asymptotically best unbiased predictors for random parameters. We establish the statistical foundations for h-likelihood theories, which extend classical likelihood theories to embrace broad classes of statistical models with random parameters. The maximum h-likelihood estimators asymptotically achieve the generalized Cramer-Rao lower bound. Furthermore, we explore asymptotic theory when the consistency of either fixed parameter estimation or random parameter prediction is violated. The introduction of this new h-likelihood framework enables likelihood theories to cover inferences for a much broader class of models, while also providing computationally efficient fitting algorithms to give asymptotically optimal estimators for fixed parameters and predictors for random parameters.
There is no doubt that the Moon has become the center of interest for commercial and international actors. Over the past decade, the number of planned long-term missions has increased dramatically. This makes the establishment of cislunar space networks (CSNs) crucial to orchestrate uninterrupted communications between the Moon and Earth. However, there are numerous challenges, unknowns, and uncertainties associated with cislunar communications that may pose various risks to lunar missions. In this study, we aim to address these challenges for cislunar communications by proposing a machine learning-based cislunar space domain awareness (SDA) capability that enables robust and secure communications. To this end, we first propose a detailed channel model for selected cislunar scenarios. Secondly, we propose two types of interference that could model anomalies that occur in cislunar space and are so far known only to a limited extent. Finally, we discuss our cislunar SDA to work in conjunction with the spacecraft communication system. Our proposed cislunar SDA, involving heuristic learning capabilities with machine learning algorithms, detects interference models with over 96% accuracy. The results demonstrate the promising performance of our cislunar SDA approach for secure and robust cislunar communication.
Since the cyberspace consolidated as fifth warfare dimension, the different actors of the defense sector began an arms race toward achieving cyber superiority, on which research, academic and industrial stakeholders contribute from a dual vision, mostly linked to a large and heterogeneous heritage of developments and adoption of civilian cybersecurity capabilities. In this context, augmenting the conscious of the context and warfare environment, risks and impacts of cyber threats on kinetic actuations became a critical rule-changer that military decision-makers are considering. A major challenge on acquiring mission-centric Cyber Situational Awareness (CSA) is the dynamic inference and assessment of the vertical propagations from situations that occurred at the mission supportive Information and Communications Technologies (ICT), up to their relevance at military tactical, operational and strategical views. In order to contribute on acquiring CSA, this paper addresses a major gap in the cyber defence state-of-the-art: the dynamic identification of Key Cyber Terrains (KCT) on a mission-centric context. Accordingly, the proposed KCT identification approach explores the dependency degrees among tasks and assets defined by commanders as part of the assessment criteria. These are correlated with the discoveries on the operational network and the asset vulnerabilities identified thorough the supported mission development. The proposal is presented as a reference model that reveals key aspects for mission-centric KCT analysis and supports its enforcement and further enforcement by including an illustrative application case.
Defensive deception is a promising approach for cyberdefense. Although defensive deception is increasingly popular in the research community, there has not been a systematic investigation of its key components, the underlying principles, and its tradeoffs in various problem settings. This survey paper focuses on defensive deception research centered on game theory and machine learning, since these are prominent families of artificial intelligence approaches that are widely employed in defensive deception. This paper brings forth insights, lessons, and limitations from prior work. It closes with an outline of some research directions to tackle major gaps in current defensive deception research.
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.