Spherical polygons used in practice are nice, but the spherical point-in-polygon problem (SPiP) has long eluded solutions based on the winding number (wn). That a punctured sphere is simply connected is to blame. As a workaround, we prove that requiring the boundary of a spherical polygon to never intersect its antipode is sufficient to reduce its SPiP problem to the planar, point-in-polygon (PiP) problem, whose state-of-the-art solution uses wn and does not utilize known interior points (KIP). We refer to such spherical polygons as boundary antipode-excluding (BAE) and show that all spherical polygons fully contained within an open hemisphere is BAE. We document two successful reduction methods, one based on rotation and the other on shearing, and address a common concern. Both reduction algorithms, when combined with a wn-PiP algorithm, solve SPiP correctly and efficiently for BAE spherical polygons. The MATLAB code provided demonstrates scenarios that are problematic for previous work.
Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
We propose a model of treatment interference where the response of a unit depends only on its treatment status and the statuses of units within its K-neighborhood. Current methods for detecting interference include carefully designed randomized experiments and conditional randomization tests on a set of focal units. We give guidance on how to choose focal units under this model of interference. We then conduct a simulation study to evaluate the efficacy of existing methods for detecting network interference. We show that this choice of focal units leads to powerful tests of treatment interference which outperform current experimental methods.
Deception, which includes leading cyber-attackers astray with false information, has shown to be an effective method of thwarting cyber-attacks. There has been little investigation of the effect of probing action costs on adversarial decision-making, despite earlier studies on deception in cybersecurity focusing primarily on variables like network size and the percentage of honeypots utilized in games. Understanding human decision-making when prompted with choices of various costs is essential in many areas such as in cyber security. In this paper, we will use a deception game (DG) to examine different costs of probing on adversarial decisions. To achieve this we utilized an IBLT model and a delayed feedback mechanism to mimic knowledge of human actions. Our results were taken from an even split of deception and no deception to compare each influence. It was concluded that probing was slightly taken less as the cost of probing increased. The proportion of attacks stayed relatively the same as the cost of probing increased. Although a constant cost led to a slight decrease in attacks. Overall, our results concluded that the different probing costs do not have an impact on the proportion of attacks whereas it had a slightly noticeable impact on the proportion of probing.
Fourth order accurate compact schemes for variable coefficient convection-diffusion equations are considered. A sufficient condition for stability of the schemes have been derived using a difference equation based approach. The constant coefficient problems are considered as a special case, and the unconditional stability of compact schemes for such case is proved theoretically. The condition number of the amplification matrix is also analysed, and an estimate for the same is derived. In order to verify the derived conditions numerically, MATLAB codes are provided in Appendix of the manuscript. An example is provided to support the assumption taken to assure stability.
While the accuracy-fairness trade-off has been frequently observed in the literature of fair machine learning, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work seeks to develop a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by a set of all optimal Pareto classifiers that no other classifiers can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to corresponding trade-offs. Experimental results on synthetic data suggest insightful findings of the proposed framework: (1) When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous. (2) Accuracy can suffer a \textit{sharp} decline when over-pursuing fairness. (3) Eliminate the trade-off via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine-learning research to a new frontier.
Kleene Algebra (KA) is a useful tool for proving that two programs are equivalent by reasoning equationally. Because it abstracts from the meaning of primitive programs, KA's equational theory is decidable, so it integrates well with interactive theorem provers. This raises the question: which equations can we (not) prove using the laws of KA? Moreover, which models of KA are complete, in the sense that they satisfy exactly the provable equations? Kozen (1994) answered these questions by characterizing KA in terms of its language model. Concretely, equivalences provable in KA are exactly those that hold for regular expressions. Pratt (1980) observed that KA is complete w.r.t. relational models, i.e., that its provable equations are those that hold for any relational interpretation. A less known result due to Palka (2005) says that finite models are complete for KA, i.e., that provable equivalences coincide with equations satisfied by all finite KAs. Phrased contrapositively, the latter is a finite model property (FMP): any unprovable equation is falsified by a finite KA. These results can be argued using Kozen's theorem, but the implication is mutual: given that KA is complete w.r.t. finite (resp. relational) models, Palka's (resp. Pratt's) arguments show that it is complete w.r.t. the language model. We embark on a study of the different complete models of KA, and the connections between them. This yields a fourth result subsuming those of Palka and Pratt, namely that KA is complete w.r.t. finite relational models. Next, we put an algebraic spin on Palka's techniques, which yield an elementary proof of the finite model property, and by extension, of Kozen's and Pratt's theorems. In contrast with earlier approaches, this proof relies not on minimality or bisimilarity of automata, but rather on representing the regular expressions involved in terms of transformation automata.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.