Random forests are a popular class of algorithms used for regression and classification. The algorithm introduced by Breiman in 2001 and many of its variants are ensembles of randomized decision trees built from axis-aligned partitions of the feature space. One such variant, called Mondrian forests, was proposed to handle the online setting and is the first class of random forests for which minimax rates were obtained in arbitrary dimension. However, the restriction to axis-aligned splits fails to capture dependencies between features, and random forests that use oblique splits have shown improved empirical performance for many tasks. In this work, we show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension. This class includes STIT forests, a generalization of Mondrian forests to arbitrary split directions, as well as random forests derived from Poisson hyperplane tessellations. These are the first results showing that random forest variants with oblique splits can obtain minimax optimality in arbitrary dimension. Our proof technique relies on the novel application of the theory of stationary random tessellations in stochastic geometry to statistical learning theory.
Modeling the trajectories of animals is challenging due to the complexity of their behaviors, the influence of unpredictable environmental factors, individual variability, and the lack of detailed data on their movements. Additionally, factors such as migration, hunting, reproduction, and social interactions add additional layers of complexity when attempting to accurately forecast their movements. In the literature, various models exits that aim to study animal telemetry, by modeling the velocity of the telemetry, the telemetry itself or both processes jointly through a Markovian process. In this work, we propose to model the velocity of each coordinate axis for animal telemetry data as a fractional Ornstein-Uhlenbeck (fOU) process. Then, the integral fOU process models position data in animal telemetry. Compared to traditional methods, the proposed model is flexible in modeling long-range memory. The Hurst parameter $H \in (0,1)$ is a crucial parameter in integral fOU process, as it determines the degree of dependence or long-range memory. The integral fOU process is nonstationary process. In addition, a higher Hurst parameter ($H > 0.5$) indicates a stronger memory, leading to trajectories with transient trends, while a lower Hurst parameter ($H < 0.5$) implies a weaker memory, resulting in trajectories with recurring trends. When H = 0.5, the process reduces to a standard integral Ornstein-Uhlenbeck process. We develop a fast simulation algorithm of telemetry trajectories using an approach via finite-dimensional distributions. We also develop a maximum likelihood method for parameter estimation and its performance is examined by simulation studies. Finally, we present a telemetry application of Fin Whales that disperse over the Gulf of Mexico.
Stochastic programs where the uncertainty distribution must be inferred from noisy data samples are considered. The stochastic programs are approximated with distributionally-robust optimizations that minimize the worst-case expected cost over ambiguity sets, i.e., sets of distributions that are sufficiently compatible with the observed data. In this paper, the ambiguity sets capture the set of probability distributions whose convolution with the noise distribution remains within a ball centered at the empirical noisy distribution of data samples parameterized by the total variation distance. Using the prescribed ambiguity set, the solutions of the distributionally-robust optimizations converge to the solutions of the original stochastic programs when the numbers of the data samples grow to infinity. Therefore, the proposed distributionally-robust optimization problems are asymptotically consistent. This is proved under the assumption that the distribution of the noise is uniformly diagonally dominant. More importantly, the distributionally-robust optimization problems can be cast as tractable convex optimization problems and are therefore amenable to large-scale stochastic problems.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many studies have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. Towards this end, in the presented survey, we broadly review recent progress of GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.
Interest in the field of Explainable Artificial Intelligence has been growing for decades and has accelerated recently. As Artificial Intelligence models have become more complex, and often more opaque, with the incorporation of complex machine learning techniques, explainability has become more critical. Recently, researchers have been investigating and tackling explainability with a user-centric focus, looking for explanations to consider trustworthiness, comprehensibility, explicit provenance, and context-awareness. In this chapter, we leverage our survey of explanation literature in Artificial Intelligence and closely related fields and use these past efforts to generate a set of explanation types that we feel reflect the expanded needs of explanation for today's artificial intelligence applications. We define each type and provide an example question that would motivate the need for this style of explanation. We believe this set of explanation types will help future system designers in their generation and prioritization of requirements and further help generate explanations that are better aligned to users' and situational needs.
Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which would cause limited performance due to the structure sparseness of KGs. Some recent attempts consider paths information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding as well as the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, the rules of length 2 are applied to compose paths accurately while the rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Besides, the confidence level of each rule is also considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on KG completion task, which also demonstrate the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.
In recent years, Graph Neural Networks (GNNs), which can naturally integrate node information and topological structure, have been demonstrated to be powerful in learning on graph data. These advantages of GNNs provide great potential to advance social recommendation since data in social recommender systems can be represented as user-user social graph and user-item graph; and learning latent factors of users and items is the key. However, building social recommender systems based on GNNs faces challenges. For example, the user-item graph encodes both interactions and their associated opinions; social relations have heterogeneous strengths; users involve in two graphs (e.g., the user-user social graph and the user-item graph). To address the three aforementioned challenges simultaneously, in this paper, we present a novel graph neural network framework (GraphRec) for social recommendations. In particular, we provide a principled approach to jointly capture interactions and opinions in the user-item graph and propose the framework GraphRec, which coherently models two graphs and heterogeneous strengths. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed framework GraphRec. Our code is available at \url{//github.com/wenqifan03/GraphRec-WWW19}
Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle to recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine the scenario-specific learning with a model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely Scenario-specific Sequential Meta learner (or s^2 meta). By doing so, our meta-learner produces a generic initial model through aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state-of-the-arts for cold-start problems in online recommendation. Deployment is at the Guess You Like session, the front page of the Mobile Taobao.
Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns about text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on three oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.
Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. Recent years, end-to-end SRL with recurrent neural networks (RNN) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves F$_1=83.4$ on the CoNLL-2005 shared task dataset and F$_1=82.7$ on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by $1.8$ and $1.0$ F$_1$ score respectively. Besides, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU.