The ability to manipulate logical-mathematical symbols (LMS), encompassing tasks such as calculation, reasoning, and programming, is a cognitive skill arguably unique to humans. Considering the relatively recent emergence of this ability in human evolutionary history, it has been suggested that LMS processing may build upon more fundamental cognitive systems, possibly through neuronal recycling. Previous studies have pinpointed two primary candidates: natural language processing and spatial cognition. Existing comparisons between these domains have largely relied on task-level contrasts, which may be confounded by task idiosyncrasy. The present study instead compared the neural correlates at the domain level, using both automated meta-analysis and synthesized maps based on three representative LMS tasks: reasoning, calculation, and mental programming. Our results revealed a more substantial cortical overlap between LMS processing and spatial cognition than between LMS processing and language processing. Furthermore, in regions activated by both spatial and language processing, the activation pattern for LMS processing exhibited greater multivariate similarity to spatial cognition than to language processing. A hierarchical clustering analysis further indicated that typical LMS tasks were indistinguishable from spatial cognition tasks at the neural level, suggesting an inherent connection between these two cognitive processes. Taken together, our findings support the hypothesis that spatial cognition is likely the basis of LMS processing, which may shed light on the limitations of large language models in logical reasoning, particularly those trained exclusively on textual data without explicit emphasis on spatial content.
Modern methods of graph theory describe a graph only up to isomorphism, which makes it difficult to create mathematical models for visualizing graph drawings on a plane. The topological drawing of the planar part of a graph allows the planarization process to be represented by algebraic methods, without making any geometric constructions on the plane. Constructing a rotation of the graph's vertices solves two of the most important problems of graph theory simultaneously: testing a graph for planarity and constructing a topological drawing of a planar graph. It is shown that the problem of constructing a drawing of a non-planar graph can be reduced to that of constructing a drawing of a planar graph by introducing additional vertices that represent edge crossings. Naturally, the development of such a mathematical structure will make it possible to solve the following important problems of graph theory: testing the planarity of a graph, identifying the largest planar subgraph of a graph, determining the thickness of a graph, obtaining a drawing with the minimum number of edge crossings, etc.
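The vertex-rotation view of a planar embedding can be illustrated with a short sketch: networkx's check_planarity routine returns a combinatorial embedding, i.e. a cyclic ordering of the edges around each vertex. This is only an illustration of the rotation concept, not the algebraic construction described above.

```python
# Rotation-system view of planarity (illustrative only): networkx's
# check_planarity returns a combinatorial embedding, i.e. a cyclic order
# of neighbours (a "rotation") around every vertex of a planar graph.
import networkx as nx

G = nx.complete_graph(4)                  # K4 is planar
is_planar, embedding = nx.check_planarity(G)
print("planar:", is_planar)
for v in G.nodes:
    # clockwise order of the edges incident to v in the embedding
    print(v, list(embedding.neighbors_cw_order(v)))

# A non-planar example: K5 fails the test, so any drawing would need
# additional vertices representing edge crossings, as discussed above.
print("K5 planar:", nx.check_planarity(nx.complete_graph(5))[0])
```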
Choice constructs are an important part of the language of logic programming, yet the study of their semantics has been a challenging task. So far, only two-valued semantics have been studied, and the different proposals for such semantics have not been compared in a principled way. In this paper, we propose an operator-based framework that allows different semantics to be defined and compared in a principled manner.
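As a concrete illustration of what a choice construct looks like (the paper is not tied to any particular solver), the toy program below uses a cardinality choice rule in answer-set-programming syntax, run through the clingo Python API; the program and predicate names are hypothetical examples.

```python
# Toy illustration of a choice construct in answer set programming, run
# through the clingo Python API. This is only an implementation choice for
# the sketch; the paper studies the semantics of such constructs in general.
import clingo

program = """
item(a; b; c).
% choice rule: pick between 1 and 2 of the items
1 { chosen(X) : item(X) } 2.
"""

ctl = clingo.Control(["0"])          # "0" = enumerate all answer sets
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("answer set:", m))
```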
Receiver operating characteristic (ROC) analysis is a tool to evaluate the capacity of a numeric measure to distinguish between groups, often employed in the evaluation of diagnostic tests. Overall classification ability is sometimes crudely summarized by a single numeric measure such as the area under the empirical ROC curve. However, it may also be of interest to estimate the full ROC curve while leveraging assumptions regarding the nature of the data (parametric) or about the ROC curve directly (semiparametric). Although there has been recent interest in methods to conduct comparisons by way of stochastic ordering, nuances surrounding ROC geometry and estimation are not widely known in the broader scientific and statistical community. The overarching goals of this manuscript are to (1) provide an overview of existing frameworks for ROC curve estimation with examples, (2) offer intuition for and considerations regarding methodological trade-offs, and (3) supply sample R code to guide implementation. We utilize simulations to demonstrate the bias-variance trade-off across various methods. As an illustrative example, we analyze data from a recent cohort study in order to compare responses to SARS-CoV-2 vaccination between solid organ transplant recipients and healthy controls.
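To make the contrast between empirical and parametric estimation concrete, the sketch below computes an empirical ROC curve and a binormal fit on simulated data. It is a Python analogue rather than the manuscript's R code, and the simulated groups and variable names are hypothetical.

```python
# Empirical vs. binormal (parametric) ROC curve estimation on simulated
# data -- a Python sketch, not the sample R code supplied with the paper.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=200)   # non-diseased group
cases = rng.normal(1.0, 1.5, size=200)      # diseased group

# Empirical ROC: sweep thresholds over the pooled sample (descending).
thresholds = np.sort(np.concatenate([controls, cases]))[::-1]
fpr = [(controls > t).mean() for t in thresholds]
tpr = [(cases > t).mean() for t in thresholds]

# Binormal ROC: ROC(t) = Phi(a + b * Phi^{-1}(t)) with
# a = (mu1 - mu0) / sigma1 and b = sigma0 / sigma1.
mu0, s0 = controls.mean(), controls.std(ddof=1)
mu1, s1 = cases.mean(), cases.std(ddof=1)
a, b = (mu1 - mu0) / s1, s0 / s1
t_grid = np.linspace(1e-3, 1 - 1e-3, 200)
roc_binormal = norm.cdf(a + b * norm.ppf(t_grid))

# AUC under each estimate.
auc_empirical = np.trapz(tpr, fpr)
auc_binormal = norm.cdf(a / np.sqrt(1 + b**2))
print(f"empirical AUC ~ {auc_empirical:.3f}, binormal AUC ~ {auc_binormal:.3f}")
```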
We propose and implement a protocol for a scalable, cost-effective, information-theoretically secure key distribution and management system. The system, called Distributed Symmetric Key Establishment (DSKE), relies on pre-shared random numbers between DSKE clients and a group of Security Hubs. Any group of DSKE clients can use the DSKE protocol to distill a secret key from the pre-shared numbers. The clients are protected from Security Hub compromise via a secret sharing scheme that allows the final key to be created without the need to trust individual Security Hubs. Precisely, as long as the number of compromised Security Hubs does not exceed a certain threshold, both confidentiality and robustness against denial-of-service (DoS) attacks are guaranteed to DSKE clients. The DSKE system can be used for quantum-secure communication, can be easily integrated into existing network infrastructures, and can support arbitrary groups of communication parties that have access to a key. We discuss the high-level protocol and analyze its security, including its robustness against disruption. A proof-of-principle demonstration of secure communication between two distant clients with a DSKE-based VPN, using Security Hubs on Amazon Web Services (AWS) nodes thousands of kilometres away from them, demonstrated the feasibility of DSKE-enabled secret-sharing one-time-pad encryption with a data rate above 50 Mbit/s and a latency below 70 ms.
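To illustrate the threshold protection against hub compromise, here is a minimal Shamir-style (k, n) secret-sharing sketch over a prime field. It is a generic illustration of the threshold idea only, not the DSKE protocol's actual construction, field, or parameters.

```python
# Minimal (k, n) Shamir secret sharing over a prime field -- a generic
# illustration of the threshold idea protecting clients from the compromise
# of individual Security Hubs, not the actual DSKE construction.
import random

P = 2**127 - 1  # a Mersenne prime, large enough for a toy 16-byte secret

def split(secret: int, k: int, n: int):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)                 # the key material to distribute
shares = split(key, k=3, n=5)             # e.g. 5 Security Hubs, threshold 3
assert reconstruct(shares[:3]) == key     # any 3 shares suffice
assert reconstruct(shares[1:4]) == key
```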
Applying differential privacy (DP) by means of the DP-SGD algorithm to protect individual data points during training is becoming increasingly popular in NLP. However, the choice of granularity at which DP is applied is often neglected. For example, neural machine translation (NMT) typically operates at the sentence level. From the perspective of DP, this setup assumes that each sentence belongs to a single person and that any two sentences in the training dataset are independent. This assumption is, however, violated in many real-world NMT datasets, e.g., those including dialogues. For a proper application of DP we thus must shift from sentences to entire documents. In this paper, we investigate NMT at both the sentence and document levels, analyzing the privacy/utility trade-off for both scenarios and evaluating the risks of not using the appropriate privacy granularity in terms of leaking personally identifiable information (PII). Our findings indicate that the document-level NMT system is more resistant to membership inference attacks, emphasizing the significance of using the appropriate granularity when working with DP.
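A minimal sketch of the DP-SGD step at the heart of this setup (plain PyTorch, not a specific DP library) makes the granularity point concrete: the unit whose gradient is clipped and noised is exactly the privacy unit, so a "record" below would be one sentence for sentence-level DP or the whole document for document-level DP. Model, loss, and hyperparameters are hypothetical.

```python
# Sketch of one DP-SGD step (plain PyTorch; hyperparameters hypothetical).
# The records being clipped and noised define the privacy granularity:
# for sentence-level DP each record is one sentence pair, for document-level
# DP each record bundles all sentence pairs of one document.
import torch

def dp_sgd_step(model, loss_fn, records, clip_norm=1.0, noise_mult=1.0, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in records:                       # one record = one privacy unit
        model.zero_grad()
        loss_fn(model(x), y).backward()
        # per-record clipping bounds each unit's influence on the update
        total = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = torch.clamp(clip_norm / (total + 1e-12), max=1.0)
        for s, p in zip(summed, params):
            s += p.grad * scale

    with torch.no_grad():
        for s, p in zip(summed, params):
            noise = torch.randn_like(s) * noise_mult * clip_norm
            p -= lr * (s + noise) / len(records)
```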
Areas of computational mechanics such as uncertainty quantification and optimization usually involve repeated evaluation of numerical models that represent the behavior of engineering systems. In the case of complex nonlinear systems, however, these models tend to be expensive to evaluate, making surrogate models quite valuable. Artificial neural networks approximate such systems very well by taking advantage of the information inherent in their training data. In this context, this paper investigates improving the training process by including sensitivity information, i.e., partial derivatives with respect to the inputs, as outlined by Sobolev training. In computational mechanics, sensitivities can be incorporated into neural network training by expanding the training loss function with additional loss terms, thereby improving training convergence and resulting in a lower generalisation error. This improvement is shown in two examples of linear and non-linear material behavior. More specifically, the Sobolev loss function is expanded with residual weights that adjust the effect of each loss term on the training step. Residual weighting assigns a scaling to the different types of training data, which in this case are the response and the sensitivities. These residual weights are optimized by an adaptive scheme, in which several objective functions are explored, some of which show improvements in the accuracy and precision of the overall training convergence.
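A minimal PyTorch sketch of such a Sobolev-style loss with residual weights is given below; the network, data, and fixed weights are illustrative only (the paper additionally adapts the weights during training).

```python
# Sobolev-style training loss: the data loss on the response is augmented
# with a loss on the sensitivities (partial derivatives w.r.t. the inputs),
# each scaled by a residual weight. Network, data and weights are toy choices.
import torch

def sobolev_loss(model, x, y_true, dy_dx_true, w_response=1.0, w_sensitivity=1.0):
    x = x.requires_grad_(True)
    y_pred = model(x)
    # predicted sensitivities via automatic differentiation
    dy_dx_pred = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
    loss_y = torch.mean((y_pred - y_true) ** 2)
    loss_s = torch.mean((dy_dx_pred - dy_dx_true) ** 2)
    return w_response * loss_y + w_sensitivity * loss_s

# usage sketch with a toy network and random data
model = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x = torch.randn(64, 3)
y = torch.randn(64, 1)
dy = torch.randn(64, 3)                 # reference sensitivities (e.g. from a solver)
loss = sobolev_loss(model, x, y, dy)
loss.backward()
```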
In the statistical literature, as well as in artificial intelligence and machine learning, measures of discrepancy between two probability distributions are widely used to develop measures of goodness of fit. We concentrate on quadratic distances, which depend on a non-negative definite kernel. We propose a unified framework for the study of two-sample and k-sample goodness-of-fit tests based on the concept of matrix distance. We provide a succinct review of the goodness-of-fit literature related to the use of distance measures, and specifically to quadratic distances. We show that the quadratic-distance kernel-based two-sample test has the same functional form as the maximum mean discrepancy test. We develop tests for the $k$-sample scenario, of which the two-sample problem is a special case. We derive their asymptotic distribution under the null hypothesis and discuss computational aspects of the test procedures. We assess their performance, in terms of level and power, via extensive simulations and a real data example. The proposed framework is implemented in the QuadratiK package, available in both R and Python environments.
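To make the connection to the maximum mean discrepancy concrete, the sketch below computes a kernel-based two-sample statistic with a Gaussian kernel and calibrates it with a permutation null. It is a generic numpy illustration, not the QuadratiK package API; the bandwidth and simulated data are arbitrary choices.

```python
# Kernel two-sample statistic with a Gaussian kernel and a permutation null
# -- a generic numpy sketch of the quadratic-distance / MMD connection, not
# the QuadratiK package API.
import numpy as np

def gaussian_kernel(a, b, h=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

def mmd2(x, y, h=1.0):
    """Biased estimate of the squared kernel distance between two samples."""
    kxx = gaussian_kernel(x, x, h).mean()
    kyy = gaussian_kernel(y, y, h).mean()
    kxy = gaussian_kernel(x, y, h).mean()
    return kxx + kyy - 2 * kxy

def permutation_pvalue(x, y, h=1.0, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, h)
    pooled = np.vstack([x, y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        xs, ys = pooled[idx[: len(x)]], pooled[idx[len(x):]]
        count += mmd2(xs, ys, h) >= observed
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(0.5, 1.0, size=(100, 2))    # shifted mean, should be detected
print("p-value:", permutation_pvalue(x, y))
```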
Sharpness is an almost generic assumption in continuous optimization that bounds the distance from minima by objective function suboptimality. It facilitates the acceleration of first-order methods through restarts. However, sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates. Moreover, these schemes are challenging to apply in the presence of noise or with approximate model classes (e.g., in compressive imaging or learning problems), and they generally assume that the first-order method used produces feasible iterates. We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error. This constant offers greater robustness (e.g., with respect to noise or relaxation of model classes) for finding approximate minimizers. By employing a new type of search over the unknown constants, we design a restart scheme that applies to general first-order methods and does not require the first-order method to produce feasible iterates. Our scheme maintains the same convergence rate as when the constants are known. The convergence rates we achieve for various first-order methods match the optimal rates or improve on previously established rates for a wide range of problems. We showcase our restart scheme in several examples and highlight potential future applications and developments of our framework and theory.
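The general flavour of restarting a first-order method while searching over unknown constants can be sketched in a few lines; the toy below restarts plain gradient descent on a quadratic and grid-searches one unknown constant on a logarithmic grid. This is only a schematic illustration, not the paper's scheme, schedule, or guarantees, and every choice in it is hypothetical.

```python
# Schematic sketch: restart a first-order method with warm starts while
# searching over an unknown constant on a logarithmic grid. The paper's
# restart scheme and its optimal rates are more sophisticated than this toy.
import numpy as np

def gradient_descent(grad, x0, step, n_iter):
    x = x0.copy()
    for _ in range(n_iter):
        x -= step * grad(x)
    return x

def restarted_solver(f, grad, x0, n_restarts=5, inner_iters=100):
    best_x, best_val = x0, f(x0)
    for alpha in np.logspace(-4, -2, 8):     # grid over an unknown constant
        x = x0.copy()
        for _ in range(n_restarts):          # warm-start each restart
            x = gradient_descent(grad, x, step=alpha, n_iter=inner_iters)
        if f(x) < best_val:
            best_x, best_val = x, f(x)
    return best_x, best_val

# quadratic toy problem with ill-conditioning
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_best, val = restarted_solver(f, grad, np.ones(3))
print(f"final objective ~ {val:.2e}")
```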
We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-accuracy trade-off for diffusion models, a trade-off relationship between the speed and accuracy of data generation. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is given by the conservative force in stochastic thermodynamics and by the geodesic of the space equipped with the 2-Wasserstein distance from optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for diffusion models with different noise schedules, such as the cosine schedule, the conditional optimal transport, and the optimal transport.
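As a concrete reference point for the noise schedules mentioned above, here is a minimal numpy sketch of the forward (noising) process of a diffusion model under a cosine schedule; the schedule formula is the widely used cosine parameterization, and the data, dimensions, and number of steps are toy choices.

```python
# Forward (noising) process of a diffusion model with a cosine noise
# schedule -- a toy numpy sketch. The schedule uses the common
# parameterization alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2);
# data, dimensions and T are arbitrary illustrative choices.
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    f = lambda u: np.cos(((u / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f(t) / f(0.0)

rng = np.random.default_rng(0)
T = 1000
x0 = rng.normal(size=(16, 2))            # toy "data" points

for t in (0, 250, 500, 1000):
    ab = cosine_alpha_bar(t, T)
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps   # forward marginal
    print(f"t={t:4d}  alpha_bar={ab:.3f}  var(x_t) ~ {xt.var():.2f}")
```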
New differential-recurrence relations for B-spline basis functions are given. Using these relations, a recursive method for finding the Bernstein-B\'{e}zier coefficients of B-spline basis functions over a single knot span is proposed. The algorithm works for any knot sequence and has an asymptotically optimal computational complexity. Numerical experiments show that the new method gives results which preserve a high number of digits when compared to an approach which uses the well-known de Boor-Cox formula.
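For reference, the de Boor-Cox recurrence that the new method is compared against can be written down directly; the short Python sketch below evaluates the basis functions this way, with a hypothetical knot sequence and degree chosen only for illustration.

```python
# The classical de Boor-Cox recurrence for B-spline basis functions, shown
# only as the reference approach the new recursive method is compared with.
# Knot sequence and degree are hypothetical examples.
def bspline_basis(i, p, x, knots):
    """Value of the degree-p B-spline basis function N_{i,p} at x."""
    if p == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = (x - knots[i]) / (knots[i + p] - knots[i]) \
               * bspline_basis(i, p - 1, x, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = (knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, p - 1, x, knots)
    return left + right

knots = [0, 0, 0, 1, 2, 3, 3, 3]        # clamped quadratic example (p = 2)
p = 2
x = 1.5
values = [bspline_basis(i, p, x, knots) for i in range(len(knots) - p - 1)]
print(values, "sum =", sum(values))     # partition of unity inside the span
```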