This article examines whether a multivariate distribution differs from a specified distribution and also tests the equality of two multivariate distributions. In the course of this study, a graphical toolkit based on the well-known half-space depth is proposed; it is a two-dimensional plot regardless of the dimension of the data, which makes it useful even for comparing high-dimensional distributions. The simple interpretability of the proposed graphical toolkit motivates us to formulate test statistics for the corresponding hypothesis testing problems. It is established that the proposed tests are consistent; moreover, the asymptotic distributions of the test statistics under contiguous alternatives are derived, which enables us to compute the asymptotic power of these tests. Furthermore, the computations associated with the proposed tests are light. These tests also perform better than many competing tests in the literature when data are generated from various distributions, such as heavy-tailed distributions, which indicates that the proposed methodology is robust as well. Finally, the usefulness of the proposed graphical toolkit and tests is demonstrated on two benchmark real data sets.
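The abstract leaves the plot's construction unspecified, but a depth-versus-depth display built on half-space depth is a natural reading: each observation is placed at its depth with respect to the two samples, so equal distributions put the point cloud near the diagonal. The sketch below is a minimal illustration under that assumption; the random-direction approximation of Tukey half-space depth and the names `halfspace_depth` and `dd_plot_coords` are ours, not the authors'.

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=500, seed=0):
    """Approximate Tukey half-space depth of point x w.r.t. a sample:
    the minimum, over (random) directions u, of the fraction of sample
    points in the closed half-space {z : u.(z - x) >= 0}."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dirs, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return ((data - x) @ u.T >= 0).mean(axis=0).min()

def dd_plot_coords(X, Y, n_dirs=500):
    """Depth of every pooled observation w.r.t. each sample; scattering
    dX against dY gives the two-dimensional plot."""
    pts = np.vstack([X, Y])
    dX = np.array([halfspace_depth(z, X, n_dirs) for z in pts])
    dY = np.array([halfspace_depth(z, Y, n_dirs) for z in pts])
    return dX, dY

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Y = rng.normal(size=(100, 5)) + 0.5          # shifted alternative
dX, dY = dd_plot_coords(X, Y)
# Points far from the 45-degree line suggest the distributions differ.
```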
Correlated proportions appear in many real-world applications and present a unique challenge for probabilistic modeling due to their constrained nature. The bivariate beta is a natural extension of the well-known beta distribution to the space of correlated quantities on $[0, 1]^2$. Its construction is not unique, however. Over the years, many bivariate beta distributions have been proposed, ranging from three to eight or more parameters and varying in the mathematical tractability of their joint densities and moments. In this paper, we investigate the construction proposed by Olkin & Trikalinos (2015), which strikes a balance between parameter richness and tractability. We provide classical (frequentist) and Bayesian approaches to estimation in the form of, respectively, the method of moments and latent-variable/data augmentation coupled with Hamiltonian Monte Carlo. The elicitation of the bivariate beta as a prior distribution is also discussed. The development of diagnostics for checking model fit and adequacy is explored in depth with the aid of Monte Carlo experiments under both well-specified and misspecified data-generating settings.
Keywords: Bayesian estimation; bivariate beta; correlated proportions; diagnostics; method of moments.
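A minimal sampler for the construction as we understand it from Olkin & Trikalinos (2015): a latent Dirichlet vector whose component sums yield correlated beta marginals. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

def rbivbeta(n, alpha, seed=None):
    """Sample the 4-parameter bivariate beta: with
    (U1, U2, U3, U4) ~ Dirichlet(alpha), set X = U1 + U2 and Y = U1 + U3,
    so marginally X ~ Beta(a1 + a2, a3 + a4), Y ~ Beta(a1 + a3, a2 + a4)."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(alpha, size=n)
    return u[:, 0] + u[:, 1], u[:, 0] + u[:, 2]

# Empirical moments like these are what a method-of-moments fit would
# match against their closed-form counterparts.
x, y = rbivbeta(100_000, alpha=[2.0, 1.0, 1.0, 3.0], seed=42)
print(x.mean(), y.mean(), np.corrcoef(x, y)[0, 1])
```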
ChatGPT is a recent chatbot service released by OpenAI that has received increasing attention over the past few months. While various aspects of ChatGPT have been evaluated, its robustness, i.e., its performance on unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspectives. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. Results show that ChatGPT has consistent advantages on most adversarial and OOD classification and translation tasks. However, its absolute performance is far from perfect, which suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts, and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions.
The chain graph model admits both undirected and directed edges in one graph, where symmetric conditional dependencies are encoded via undirected edges and asymmetric causal relations are encoded via directed edges. Though frequently encountered in practice, the chain graph model has been largely underinvestigated in the literature, possibly due to the lack of identifiability conditions between undirected and directed edges. In this paper, we first establish a set of novel identifiability conditions for the Gaussian chain graph model, exploiting a low-rank-plus-sparse decomposition of the precision matrix. Further, an efficient learning algorithm is built upon these identifiability conditions to fully recover the chain graph structure. A theoretical analysis of the proposed method is conducted, establishing its asymptotic consistency in recovering the exact chain graph structure. The advantage of the proposed method is also supported by numerical experiments on both simulated examples and a real application to Standard & Poor's 500 index data.
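The abstract does not give the decomposition algorithm; as a rough illustration of the low-rank-plus-sparse idea, the heuristic below alternately soft-thresholds (sparse part) and eigen-truncates (low-rank part) a given precision matrix. It is a generic splitting sketch, not the paper's method, and the function name, penalty `lam`, and rank choice are ours.

```python
import numpy as np

def split_sparse_lowrank(theta, lam=0.05, rank=1, n_iter=100):
    """Heuristic sparse-plus-low-rank split theta ~= S + L: alternate
    entrywise soft-thresholding (sparse part S) with a truncated
    eigendecomposition (low-rank part L)."""
    L = np.zeros_like(theta)
    for _ in range(n_iter):
        R = theta - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)  # soft-threshold
        w, V = np.linalg.eigh(theta - S)
        idx = np.argsort(np.abs(w))[::-1][:rank]           # top-|eigenvalue|
        L = (V[:, idx] * w[idx]) @ V[:, idx].T
    return S, L

# Demo: split a synthetic matrix that is sparse plus rank-1 by design.
rng = np.random.default_rng(3)
v = rng.normal(size=(8, 1))
theta = 2.0 * np.eye(8) + v @ v.T
S, L = split_sparse_lowrank(theta)
print(np.abs(theta - S - L).max())   # reconstruction error is small
```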
The coordination of actions and the allocation of profit in supply chains under decentralized control play an important role in improving the profits of the retailers and suppliers in the chain. We focus on supply chains under decentralized control in which noncompeting retailers can order from multiple suppliers to replenish their stocks, and suppliers' production capacities are bounded. The goal of each firm in the chain is to maximize its individual profit. As the outcome under decentralized control is inefficient, coordination of actions between cooperating agents can improve individual profits. We use cooperative game theory to analyze this cooperation. We define multi-retailer-supplier games and show that the agents can always jointly achieve the optimal total profit, so they have incentives to cooperate and form the grand coalition. Moreover, we show that there always exist stable allocations of the total profit among the firms upon which no coalition can improve. We then propose and characterize a stable allocation of the total surplus induced by cooperation.
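For readers less familiar with the stability notion used here: an allocation upon which no coalition can improve is exactly a core allocation. The toy check below makes this concrete; the three-player characteristic function `v` is invented for illustration and is not taken from the paper's games.

```python
from itertools import combinations

# Toy 3-player profit game (values invented for illustration): v[S] is the
# profit that coalition S can secure on its own.
v = {(): 0, (1,): 10, (2,): 12, (3,): 8,
     (1, 2): 30, (1, 3): 25, (2, 3): 26, (1, 2, 3): 45}

def in_core(x, v, players=(1, 2, 3)):
    """x is a stable (core) allocation iff it distributes v(N) exactly and
    no coalition S can improve: sum_{i in S} x[i] >= v(S) for all S."""
    if abs(sum(x.values()) - v[tuple(players)]) > 1e-9:
        return False                         # not efficient
    return all(sum(x[i] for i in S) >= v[S] - 1e-9
               for r in range(1, len(players))
               for S in combinations(players, r))

print(in_core({1: 16, 2: 17, 3: 12}, v))   # True: no coalition can improve
print(in_core({1: 25, 2: 12, 3: 8}, v))    # False: coalition {2, 3} objects
```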
The over-smoothing problem is an obstacle to developing deep graph neural networks (GNNs). Although many approaches for mitigating over-smoothing have been proposed, a comprehensive understanding of the problem is still lacking. In this work, we analyze over-smoothing from the Markov chain perspective. We focus on the message passing of GNNs and first establish a connection between GNNs and Markov chains on the graph. We divide GNNs into two classes, operator-consistent and operator-inconsistent, based on whether the corresponding Markov chains are time-homogeneous. We then attribute the over-smoothing problem to the convergence of an arbitrary initial distribution to a stationary distribution. Based on this, we prove that although previously proposed methods can alleviate over-smoothing, they cannot avoid it. In addition, we characterize over-smoothing for the two types of GNNs in the Markovian sense. On the one hand, operator-consistent GNNs cannot avoid over-smoothing, which occurs at an exponential rate. On the other hand, operator-inconsistent GNNs do not always over-smooth. Further, we investigate the existence of the limiting distribution of the time-inhomogeneous Markov chain, from which we derive a sufficient condition for operator-inconsistent GNNs to avoid over-smoothing. Finally, we design experiments to verify our findings. The results show that our sufficient condition can effectively alleviate over-smoothing in operator-inconsistent GNNs and enhance model performance.
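The time-homogeneous case is easy to see numerically: a fixed row-stochastic propagation operator $P = D^{-1}A$ plays the role of a time-homogeneous Markov transition matrix, and $P^k$ converges to a rank-one matrix, so all node representations collapse. A minimal demonstration (the toy graph and feature dimension are arbitrary):

```python
import numpy as np

# Toy graph: connected, with self-loops added so the chain is aperiodic.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float) + np.eye(4)
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic: a time-homogeneous
                                       # Markov transition matrix
H = np.random.default_rng(0).normal(size=(4, 3))  # initial node features
for _ in range(200):
    H = P @ H                          # operator-consistent message passing
print(H.std(axis=0))                   # spread across nodes: ~0, i.e. all
                                       # node representations have collapsed
```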
Machine learning (ML) has been widely used for modeling and predicting physical systems. These techniques offer high expressive power and good generalizability for interpolation within observed data sets. However, the disadvantage of black-box models is that they underperform under blind conditions, since no physical knowledge is incorporated. Physics-based ML aims to address this problem by retaining the mathematical flexibility of ML techniques while incorporating physics. Accordingly, this paper proposes to embed mechanics-based models into the mean function of a Gaussian process (GP) model and to characterize potential discrepancies through kernel machines. A specific class of kernel functions is promoted, which is connected to the gradient of the physics-based model with respect to the input and parameters and shares similarities with the exact autocovariance function of linear dynamical systems. The spectral properties of the kernel function make it possible to account for dominant periodic processes originating from physics misspecification. However, the stationarity of the kernel function is a difficult hurdle in the sequential processing of long data sets, which we resolve through hierarchical Bayesian techniques. This implementation also helps mitigate computational costs, improving the scalability of GPs when dealing with sequential data. Using numerical and experimental examples, potential applications of the proposed method to structural dynamics inverse problems are demonstrated.
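The core idea of a mechanics-based mean function can be sketched in a few lines: the GP only has to model the discrepancy between the data and the physics model. The kernel below is a plain RBF stand-in rather than the gradient-based class the paper promotes, and all names and toy models are illustrative.

```python
import numpy as np

def rbf(x1, x2, ell=0.5, sf=1.0):
    d = x1[:, None] - x2[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior_mean(x_tr, y_tr, x_te, mean_fn, noise=1e-2):
    """GP regression with a physics-based prior mean: the GP models only
    the discrepancy y - mean_fn(x)."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    resid = y_tr - mean_fn(x_tr)
    return mean_fn(x_te) + rbf(x_te, x_tr) @ np.linalg.solve(K, resid)

physics = lambda x: np.sin(2 * np.pi * x)          # simplified mechanics model
truth = lambda x: np.sin(2 * np.pi * x) + 0.3 * x  # reality with unmodeled drift
rng = np.random.default_rng(1)
xs = np.linspace(0, 1, 30)
ys = truth(xs) + 0.05 * rng.normal(size=xs.size)
xt = np.linspace(0, 1, 5)
print(gp_posterior_mean(xs, ys, xt, physics) - truth(xt))  # small residuals
```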
With apparently all research on estimation-of-distribution algorithms (EDAs) concentrated on pseudo-Boolean optimization and permutation problems, we undertake the first steps towards using EDAs for problems in which the decision variables can take more than two values but which are not permutation problems. To this end, we propose a natural way to extend the known univariate EDAs to such variables. Unlike a naive reduction to the binary case, it avoids additional constraints. Since understanding genetic drift is crucial for an optimal parameter choice, we extend the known quantitative analysis of genetic drift to EDAs for multi-valued variables. Roughly speaking, when the variables take $r$ different values, the time until genetic drift becomes significant is $r$ times shorter than in the binary case. Consequently, the update strength of the probabilistic model now has to be chosen a factor of $r$ lower. To investigate how desired model updates take place in this framework, we undertake a mathematical runtime analysis of the $r$-valued LeadingOnes problem. We prove that with the right parameters, the multi-valued UMDA solves this problem efficiently in $O(r\log(r)^2 n^2 \log(n))$ function evaluations. Overall, our work shows that EDAs can be adjusted to multi-valued problems, and it gives advice on how to set the main parameters.
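For concreteness, here is a small sketch of a multi-valued UMDA on an $r$-valued LeadingOnes variant (fitness is the number of leading positions holding a fixed target value). The frequency margins, the population sizes, and the exact LeadingOnes generalization are our own illustrative choices and need not match the paper's setup.

```python
import numpy as np

def leading_target(x, target=0):
    """Number of leading entries equal to `target` (one natural r-valued
    generalization of LeadingOnes; the paper's definition may differ)."""
    k = 0
    for v in x:
        if v != target:
            break
        k += 1
    return k

def umda_multivalued(n=30, r=4, lam=200, mu=50, max_gens=2000, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full((n, r), 1.0 / r)             # one frequency vector per position
    lo, hi = 1.0 / (r * n), 1.0 - 1.0 / n    # margins (illustrative choice)
    for gen in range(max_gens):
        pop = np.stack([rng.choice(r, size=lam, p=p[i]) for i in range(n)],
                       axis=1)               # lam individuals, n positions each
        fit = np.array([leading_target(x) for x in pop])
        if fit.max() == n:
            return (gen + 1) * lam           # function evaluations spent
        best = pop[np.argsort(fit)[::-1][:mu]]
        for i in range(n):                   # frequencies among the mu selected
            freq = np.bincount(best[:, i], minlength=r) / mu
            p[i] = np.clip(freq, lo, hi)
            p[i] /= p[i].sum()               # renormalize after clamping
    return None

print(umda_multivalued())
```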
Graph neural networks (GNNs) are a type of deep learning model that learns over graphs and has been successfully applied in many domains. Despite their effectiveness, it is still challenging for GNNs to scale efficiently to large graphs. As a remedy, distributed computing has become a promising solution for training large-scale GNNs, since it can provide abundant computing resources. However, the dependencies imposed by the graph structure make it difficult to achieve high-efficiency distributed GNN training, which suffers from massive communication and workload imbalance. In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed. Yet, there is a lack of systematic review of the optimization techniques spanning graph processing to distributed execution. In this survey, we analyze three major challenges in distributed GNN training: massive feature communication, loss of model accuracy, and workload imbalance. We then introduce a new taxonomy of the optimization techniques in distributed GNN training that address these challenges. The taxonomy classifies existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol. We carefully discuss the techniques in each category. Finally, we summarize existing distributed GNN systems for multi-GPU, GPU-cluster, and CPU-cluster settings, respectively, and discuss future directions for scalable GNNs.
Modeling multivariate time series has long attracted researchers from a diverse range of fields, including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another, but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation, which means they cannot be applied directly to multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge such as variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets that provide extra structural information.
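A plausible reading of the uni-directed graph learning module, sketched in numpy: an antisymmetric combination of two node-embedding projections guarantees that an edge appears in at most one direction, and a per-row top-k step sparsifies the result. Layer sizes, saturation functions, and names below are our interpretation of the published description, not the released implementation.

```python
import numpy as np

def learn_graph(E1, E2, W1, W2, alpha=3.0, k=4):
    """Antisymmetric graph learning: A[i, j] > 0 implies A[j, i] == 0,
    i.e. every learned relation is uni-directed; then keep only the k
    strongest out-edges per node."""
    M1 = np.tanh(alpha * E1 @ W1)
    M2 = np.tanh(alpha * E2 @ W2)
    A = np.maximum(np.tanh(alpha * (M1 @ M2.T - M2 @ M1.T)), 0.0)
    mask = np.zeros_like(A)
    idx = np.argsort(A, axis=1)[:, -k:]      # indices of k largest per row
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return A * mask

rng = np.random.default_rng(0)
n, d = 10, 8                                 # 10 variables, embedding size 8
A = learn_graph(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print((A > 0).sum(axis=1))                   # at most k out-edges per variable
```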
Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph-related tasks. However, existing GNN models mainly focus on designing graph convolution operations. Graph pooling (or downsampling) operations, which play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of the graph's topological information, we further introduce a structure learning mechanism that learns a refined graph structure for the pooled graph at each layer. By combining the HGP-SL operator with graph neural networks, we perform graph-level representation learning with a focus on the graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.
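To make the pooling step concrete, here is a minimal sketch of adaptive node selection followed by induced-subgraph extraction. The reconstruction-based node score is our reading of HGP-SL's information criterion, the structure-learning refinement is deliberately omitted, and the names and pooling ratio are illustrative.

```python
import numpy as np

def hgp_pool(A, H, ratio=0.5):
    """One pooling step: score each node by how poorly its features are
    reconstructed from its neighbors (harder to reconstruct = more
    informative), keep the top `ratio` fraction, return the induced
    subgraph and its features."""
    Ahat = A + np.eye(len(A))                # add self-loops
    P = Ahat / Ahat.sum(axis=1, keepdims=True)
    score = np.abs(H - P @ H).sum(axis=1)    # neighborhood reconstruction
                                             # residual per node (L1 norm)
    k = max(1, int(round(ratio * len(A))))
    keep = np.sort(np.argsort(score)[::-1][:k])
    return A[np.ix_(keep, keep)], H[keep], keep

rng = np.random.default_rng(0)
A = (rng.random((8, 8)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # random symmetric adjacency
H = rng.normal(size=(8, 16))
A2, H2, kept = hgp_pool(A, H)
print(kept, A2.shape, H2.shape)              # 4 nodes retained
```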