The degree to which subjects differ from each other with respect to certain properties, as measured by a set of variables, plays an important role in many statistical methods. For example, classification, clustering, and data visualization methods all require a quantification of differences in the observed values. We refer to the quantification of such differences as distance. An appropriate definition of a distance depends on the nature of the data and the problem at hand. For distances between numerical variables, there exist many definitions that depend on the size of the observed differences. For categorical data, the definition of a distance is more complex, as there is no straightforward quantification of the size of the observed differences. Consequently, many proposals exist for measuring differences based on categorical variables. In this paper, we introduce a general framework that allows for an efficient and transparent implementation of distances between observations on categorical variables. We show that several existing distances can be incorporated into the framework. Moreover, our framework quite naturally leads to the introduction of new distance formulations and allows for the implementation of flexible, case- and data-specific distance definitions. Furthermore, in a supervised classification setting, the framework can be used to construct distances that incorporate the association between the response and predictor variables and hence improve the performance of distance-based classifiers.
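As a concrete illustration of what a distance between observations on categorical variables can look like, the sketch below computes the familiar simple matching distance, i.e. the fraction of variables on which two observations disagree. It is only one well-known example and is not claimed to be the formulation used in the framework above; the function name and the example records are hypothetical.

```python
def simple_matching_distance(x, y):
    """Fraction of categorical variables on which observations x and y differ.

    x, y: equal-length sequences of category labels (hypothetical inputs).
    """
    if len(x) != len(y):
        raise ValueError("observations must have the same number of variables")
    if len(x) == 0:
        raise ValueError("observations must contain at least one variable")
    # Each mismatching variable contributes 1; matches contribute 0.
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)


# Two hypothetical observations on three categorical variables.
obs_a = ("red", "small", "round")
obs_b = ("red", "large", "round")
d = simple_matching_distance(obs_a, obs_b)
```

Every mismatch counts equally here, regardless of which categories are involved. That uniform treatment is the kind of simplification that richer, data-specific definitions aim to relax.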
The multi-agent setting is intricate and unpredictable since the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms that incorporate uncertainty via distributional output have been integrated with multi-agent reinforcement learning (MARL) methods, achieving state-of-the-art performance. However, distributional MARL algorithms still rely on traditional $\epsilon$-greedy exploration, which does not take cooperative strategy into account. In this paper, we present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of the distribution. Initially, we take expectations over the upper quantiles of state-action values for exploration, which yields optimistic actions, and gradually shift the sampling region of quantiles to the full distribution for exploitation. By ensuring that each agent is exposed to the same level of risk, we can force the agents to take cooperatively optimistic actions. Built on quantile regression with an appropriately controlled level of risk, our method shows remarkable performance in multi-agent settings that require cooperative exploration.
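The shifting sampling region described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear annealing schedule, the starting level of 0.75, the midpoint quantile fractions, and all function names are hypothetical choices made for the example.

```python
import numpy as np


def risk_level(step, anneal_steps, start=0.75, end=0.0):
    """Lower edge of the sampled quantile region, annealed linearly (assumed schedule)."""
    frac = min(step / max(anneal_steps, 1), 1.0)
    return start + frac * (end - start)


def optimistic_values(quantiles, lower):
    """Mean of the quantile values whose fraction tau lies in [lower, 1].

    quantiles: array of shape (n_actions, n_quantiles), ascending along the last axis.
    """
    n = quantiles.shape[-1]
    taus = (np.arange(n) + 0.5) / n      # midpoint quantile fractions (assumption)
    mask = taus >= lower
    if not mask.any():                   # always keep at least the top quantile
        mask[-1] = True
    return quantiles[..., mask].mean(axis=-1)


def select_actions(per_agent_quantiles, step, anneal_steps):
    """Greedy action per agent, with one shared risk level for all agents."""
    lower = risk_level(step, anneal_steps)
    return [int(np.argmax(optimistic_values(q, lower))) for q in per_agent_quantiles]


# Action 0 has a heavy upper tail; action 1 has the higher mean.
q = np.array([[0.0, 0.0, 0.0, 10.0],
              [3.0, 3.0, 3.0, 3.0]])
early = select_actions([q, q], step=0, anneal_steps=100)    # upper quantiles only
late = select_actions([q, q], step=100, anneal_steps=100)   # full distribution
```

In this toy case both agents pick the heavy-tailed action early in training and the higher-mean action once the region covers the full distribution. Because the risk level is computed once and shared, all agents are optimistic to the same degree at every step.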
We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish an appropriate sampling strategy in the near-boundary region. Overly dense sampling can impede training convergence if the local gradient behaviors are too complex to be adequately modelled by PINNs. On the other hand, if the samples are too sparse, existing PINNs tend to overfit the near-boundary region, leading to an incorrect solution. To prevent such issues, we propose a new Boundary Connectivity (BCXN) loss function which provides a linear local structure approximation (LSA) to the gradient behaviors at the boundary for PINNs. Our BCXN-loss implicitly imposes local structure during training, thus facilitating fast physics-informed learning across entire problem domains with an order of magnitude sparser training samples. This LSA-PINN method shows errors a few orders of magnitude smaller than existing methods in terms of the standard L2-norm metric, while using dramatically fewer training samples and iterations. Our proposed LSA-PINN imposes no requirement on the differentiability of the networks, and we demonstrate its benefits and ease of implementation on both the multi-layer perceptron and convolutional neural network versions commonly used in the current PINN literature.
In this paper, we give a simple polynomial-time reduction of L(p)-Labeling on graphs with a small diameter to Metric (Path) TSP, which enables us to use numerous results on (Metric) TSP. On the practical side, we can utilize various high-performance heuristics for TSP, such as Concorde and LKH, to solve our problem. On the theoretical side, among other consequences, the problem for any p under this framework is 1.5-approximable and can be solved by the Held-Karp algorithm in O(2^n n^2) time, where n is the number of vertices.
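To make the O(2^n n^2) bound concrete, the sketch below runs the standard Held-Karp dynamic program for the path variant of TSP, i.e. a minimum-cost Hamiltonian path with free endpoints. It assumes a distance matrix `dist` has already been produced. The reduction from L(p)-Labeling is not reproduced here, and the function name and example matrix are hypothetical.

```python
def held_karp_path(dist):
    """Minimum cost of a Hamiltonian path (free endpoints) via Held-Karp.

    dist: square matrix (list of lists) of pairwise distances.
    Runs in O(2^n * n^2) time and O(2^n * n) memory.
    """
    n = len(dist)
    if n == 0:
        return 0
    inf = float("inf")
    # best[mask][j]: cheapest path visiting exactly the vertices in mask, ending at j.
    best = [[inf] * n for _ in range(1 << n)]
    for j in range(n):
        best[1 << j][j] = 0
    # Adding a vertex only increases the mask, so increasing order is a valid schedule.
    for mask in range(1, 1 << n):
        for j in range(n):
            cur = best[mask][j]
            if cur == inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # vertex k is already on the path
                nxt = mask | (1 << k)
                cand = cur + dist[j][k]
                if cand < best[nxt][k]:
                    best[nxt][k] = cand
    return min(best[(1 << n) - 1])


# Three points on a line at positions 0, 1 and 3 (hypothetical metric instance).
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
cost = held_karp_path(dist)
```

For this toy instance the cheapest path visits the points in order, at cost 3. The table has 2^n rows of n entries, so the exact approach is practical only for small n, which is where heuristic solvers become attractive.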
Scalability is one of the essential issues to address before multi-agent reinforcement learning (MARL) algorithms can be applied to real-world problems, which typically involve a very large number of agents. For this purpose, parameter sharing across multiple agents has been widely used, since it reduces the training time by decreasing the number of parameters and increasing the sample efficiency. However, using the same parameters across agents limits the representational capacity of the joint policy, and consequently performance can degrade in multi-agent tasks that require different behaviors for different agents. In this paper, we propose a simple method that adopts structured pruning for a deep neural network to increase the representational capacity of the joint policy without introducing additional parameters. We evaluate the proposed method on several benchmark tasks, and numerical results show that the proposed method significantly outperforms other parameter-sharing methods.
Marginal likelihood, also known as model evidence, is a fundamental quantity in Bayesian statistics. It is used for model selection using Bayes factors or for empirical Bayes tuning of prior hyper-parameters. Yet, the calculation of evidence has remained a longstanding open problem in Gaussian graphical models. Currently, the only feasible solutions that exist are for special cases such as the Wishart or G-Wishart, in moderate dimensions. We develop an approach based on a novel telescoping block decomposition of the precision matrix that allows the estimation of evidence by application of Chib's technique for a very broad class of priors under mild requirements. Specifically, the requirements are: (a) the priors on the diagonal terms of the precision matrix can be written as gamma or scale mixtures of gamma random variables and (b) those on the off-diagonal terms can be represented as normal or scale mixtures of normal. This includes structured priors such as the Wishart or G-Wishart, and more recently introduced element-wise priors, such as the Bayesian graphical lasso and the graphical horseshoe. Among these, the true marginal is known in an analytically closed form for the Wishart, providing a useful validation of our approach. For the general setting of the other three, and several more priors satisfying conditions (a) and (b) above, the calculation of evidence has remained an open question that this article resolves under a unifying framework.
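For orientation, Chib's technique rests on the basic marginal likelihood identity, which rearranges Bayes' theorem and holds at any fixed parameter value. The notation below is assumed for illustration and is not taken from the abstract: $y$ denotes the data and $\theta^{*}$ a chosen point in the parameter space, typically one of high posterior density.

```latex
\log p(y) = \log p(y \mid \theta^{*}) + \log p(\theta^{*}) - \log p(\theta^{*} \mid y)
```

The likelihood and prior terms are usually available directly. The posterior ordinate $p(\theta^{*} \mid y)$ is the term that has to be estimated, typically from sampler output.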
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space, and a label for each data point given by an oracle indicating what subsets of points should be clustered together. This setting captures situations where we have access to some auxiliary information about the data set relevant for our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that each predicted cluster contains at most an $\alpha$ fraction of false positives and false negatives, where $\alpha \in (0,c)$ for some $c<1$, and that in their absence the labels would attain the optimal clustering cost $\mathrm{OPT}$. For a dataset of size $m$, we propose a deterministic $k$-means algorithm that produces centers with an improved bound on the clustering cost compared to the previous randomized algorithm while preserving the $O(d m \log m)$ runtime. Furthermore, our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, an improvement over $\alpha$ being at most $1/7$ in the previous work. For the $k$-medians problem we improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter $\alpha$ to get a cost of $(1+O(\alpha))\mathrm{OPT}$, while requiring essentially just $O(md \log^3 m/\alpha)$ runtime.
The information-theoretic framework promises to explain the predictive power of neural networks. In particular, the information plane analysis, which measures mutual information (MI) between input and representation as well as representation and output, should give rich insights into the training process. This approach, however, was shown to strongly depend on the choice of estimator of the MI. The problem is amplified for deterministic networks if the MI between input and representation is infinite. Thus, the estimated values are defined by the different approaches for estimation, but do not adequately represent the training process from an information-theoretic perspective. In this work, we show that dropout with continuously distributed noise ensures that MI is finite. We demonstrate in a range of experiments that this enables a meaningful information plane analysis for a class of dropout neural networks that is widely used in practice.
Researchers have proposed various methods for visually interpreting Convolutional Neural Networks (CNNs) via saliency maps, among which Class-Activation-Map (CAM) based approaches are a leading family. However, in terms of their internal design logic, existing CAM-based approaches often overlook the causal perspective that answers the core "why" question to help humans understand the explanation. Additionally, current CNN explanations do not consider both necessity and sufficiency, two complementary sides of a desirable explanation. This paper presents a causality-driven framework, SUNY, designed to rationalize the explanations toward better human understanding. Using the CNN model's input features or internal filters as hypothetical causes, SUNY generates explanations by bi-directional quantifications from both the necessary and sufficient perspectives. Extensive evaluations demonstrate that SUNY not only produces more informative and convincing explanations from the angles of necessity and sufficiency, but also achieves performance competitive with other approaches across different CNN architectures over large-scale datasets, including ILSVRC2012 and CUB-200-2011.
Understanding causality helps to structure interventions to achieve specific goals and enables predictions under interventions. With the growing importance of learning causal relationships, causal discovery has moved from traditional methods that infer potential causal structures from observational data toward the pattern recognition methods of deep learning. The rapid accumulation of massive data has promoted the emergence of causal search methods with excellent scalability. Existing surveys of causal discovery methods mainly focus on traditional methods based on constraints, scores, and FCMs; they lack a thorough organization and elaboration of deep learning-based methods, and they give little consideration to causal discovery methods from the perspective of variable paradigms. Therefore, we divide the possible causal discovery tasks into three types according to the variable paradigm and define each of the three tasks, define and instantiate the relevant datasets for each task together with the final causal model constructed, and then review the main existing causal discovery methods for the different tasks. Finally, we propose roadmaps from different perspectives for the current research gaps in the field of causal discovery and point out future research directions.
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be large. We therefore propose an alternative approach to constructing the estimators that greatly reduces the error. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and the proposed estimators in their power to estimate the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is substantial when the causal effects are accounted for correctly.