We show that completeness at higher levels of the theory of the reals is a robust notion (under changing the signature and bounding the domain of the quantifiers). This mends recognized gaps in the hierarchy, and leads to stronger completeness results for various computational problems. We exhibit several families of complete problems which can be used for future completeness results in the real hierarchy. As an application we answer an open question by Jungeblut, Kleist, and Miltzow on the complexity of deciding whether two semialgebraic sets have Hausdorff distance $0$, and sharpen some results by Bürgisser and Cucker.
The use of counterfactual explanations (CFXs) is an increasingly popular explanation strategy for machine learning models. However, recent studies have shown that these explanations may not be robust to changes in the underlying model (e.g., following retraining), which raises questions about their reliability in real-world applications. Existing attempts to solve this problem are heuristic, and the robustness to model changes of the resulting CFXs is evaluated with only a small number of retrained models, failing to provide exhaustive guarantees. To remedy this, we propose $\Delta$-robustness, the first notion to formally and deterministically assess the robustness (to model changes) of CFXs for neural networks. We introduce an abstraction framework based on interval neural networks to verify the $\Delta$-robustness of CFXs against a possibly infinite set of changes to the model parameters, i.e., weights and biases. We then demonstrate the utility of this approach in two distinct ways. First, we analyse the $\Delta$-robustness of a number of CFX generation methods from the literature and show that they all exhibit significant deficiencies in this regard. Second, we demonstrate how embedding $\Delta$-robustness within existing methods can provide CFXs which are provably robust.
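The interval abstraction can be illustrated with a minimal sketch (our construction, not the paper's implementation): propagate a counterfactual input through a small ReLU network whose weights may each vary by at most `delta`, and check that the classification provably cannot flip. The one-hidden-layer architecture and all names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): interval bound
# propagation through a one-hidden-layer ReLU network whose weights may
# each be perturbed by at most delta, to check whether a counterfactual
# input x keeps a positive classification under every such perturbation.

def interval_affine(lo, hi, W, b, delta):
    """Propagate the box [lo, hi] through (W +/- delta) @ x + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    W_r = np.full_like(W, delta)                    # elementwise weight radius
    mid = W @ center + b
    rad = np.abs(W) @ radius + W_r @ (np.abs(center) + radius)
    return mid - rad, mid + rad

def delta_robust(x, W1, b1, W2, b2, delta):
    lo, hi = interval_affine(x, x, W1, b1, delta)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # ReLU is monotone
    lo, hi = interval_affine(lo, hi, W2, b2, delta)
    return lo[0] > 0    # output provably positive for all models in the interval
```

A `True` result certifies the CFX against the entire weight interval; `False` is inconclusive, since interval propagation over-approximates the reachable outputs.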
The behavior of predictive algorithms built on data generated by a prejudiced human decision-maker is a prominent concern in the sphere of algorithmic bias. We consider the setting of a statistical and taste-based discriminator screening members of a disadvantaged group. We suppose one of two algorithms is used to score individuals: algorithm $s_1$ favors disadvantaged individuals, while algorithm $s_2$ exemplifies the group-based prejudice in the training data set. Abstracting away from the estimation problem, we instead evaluate which of the two algorithms the discriminator prefers, using a version of the regret loss generated by each algorithm. We define the notions of regular and irregular environments and give theoretical guarantees on the firm's preferences in each case. Our main result shows that in a regular environment, greater levels of prejudice lead firms to prefer $s_2$ over $s_1$ on average. In particular, we prove the almost sure existence of a unique prejudice level $\tau$ such that a firm prefers $s_2$ over $s_1$ at any greater level of prejudice. Conversely, in irregular environments, the firm prefers $s_2$ for all $\tau$ almost surely.
Real-world machine learning applications often involve deploying neural networks to domains not seen at training time. Hence, we need to understand the extrapolation of nonlinear models: under what conditions on the distributions and function class can models be guaranteed to extrapolate to new test distributions? The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift. This paper makes some initial steps toward analyzing the extrapolation of nonlinear models under structured domain shift. We primarily consider settings where the marginal distribution of each coordinate of the data (or of each subset of coordinates) does not shift significantly between the training and test distributions, but the joint distribution may shift much more. We prove that the family of nonlinear models of the form $f(x)=\sum_i f_i(x_i)$, where $f_i$ is an arbitrary function on the subset of features $x_i$, can extrapolate to unseen distributions if the covariance of the features is well-conditioned. To the best of our knowledge, this is the first result that goes beyond linear models and the bounded density ratio assumption, even though the assumptions on the distribution shift and function class are stylized.
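The additive function class and the kind of shift considered can be illustrated with a toy numerical sketch (our construction, not the paper's experiment): fit $f(x)=\sum_i f_i(x_i)$ with per-coordinate cubic features on data whose coordinates are independent, then evaluate on a test distribution with the same marginals but a strongly correlated joint.

```python
import numpy as np

# Toy illustration (not from the paper): an additive model f(x) = sum_i f_i(x_i)
# with each f_i a cubic polynomial, fit jointly by least squares. Training data
# has independent coordinates; test data has the same marginals but a
# perfectly correlated joint distribution.

def additive_features(X, degree=3):
    # Per-coordinate polynomial features x_i, x_i^2, x_i^3 for each column.
    return np.hstack([X ** d for d in range(1, degree + 1)])

rng = np.random.default_rng(0)
Xtr = rng.uniform(-1, 1, size=(2000, 2))          # independent coordinates
ytr = np.sin(Xtr[:, 0]) + Xtr[:, 1] ** 2          # an additive ground truth
theta, *_ = np.linalg.lstsq(additive_features(Xtr), ytr, rcond=None)

# Test distribution: same uniform marginals, but the coordinates coincide.
u = rng.uniform(-1, 1, size=(2000, 1))
Xte = np.hstack([u, u])
yte = np.sin(Xte[:, 0]) + Xte[:, 1] ** 2
err = np.mean((additive_features(Xte) @ theta - yte) ** 2)
```

Because the prediction decomposes coordinatewise, the fit transfers to the shifted joint distribution even though the test points lie on a measure-zero diagonal never seen in training.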
This paper makes three contributions. First, it generalizes the Lindeberg\textendash Feller and Lyapunov Central Limit Theorems to Hilbert spaces by way of $L^2$. Second, it generalizes these results to spaces in which sample failure and missingness can occur. Finally, it shows that satisfaction of the Lindeberg\textendash Feller and Lyapunov conditions in such spaces implies the satisfaction of the conditions in the completely observed space, and how this guarantees the consistency of inferences drawn from the partial functional data. These latter two results are especially important given the increasing attention to statistical inference with partially observed functional data. This paper goes beyond previous research by providing simple boundedness conditions which guarantee that \textit{all} inferences, as opposed to some proper subset of them, will be consistently estimated. This is shown primarily by aggregating conditional expectations with respect to the space of missingness patterns. This paper appears to be the first to apply this technique.
The influence of an individual's social relationships on the individual's opinions (about a topic, a product, or anything else) is a well-known and widely studied phenomenon. This paper considers a network of positive (i.e., trusting) or negative (distrusting) social relationships in which every individual has an initial positive or negative opinion that changes over time, at discrete time-steps, due to the influence the individual receives from its neighbors. Here, the influence of a trusted neighbor is consistent with the neighbor's opinion, while the influence of an untrusted neighbor is opposite to the neighbor's opinion. This extended abstract introduces local threshold-based opinion dynamics. After establishing the computational complexity of some natural reachability problems arising in this setting when individuals change their opinions according to the opinions of the majority of their neighbors, it proves an upper bound, polynomial in the size of the network, on the number of opinion configurations met by a symmetric, positive-only relationship network evolving according to any such model. This generalizes a result in [Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Isma\"el Jecker, and Jakub Svoboda, "Simplified Game of Life: Algorithms and Complexity", 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)].
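As a concrete illustration, here is a hypothetical sketch of synchronous majority dynamics on a signed network: opinions are $\pm 1$, edge signs encode trust ($+1$) or distrust ($-1$), and each individual adopts the sign of the summed signed influence of its neighbors. The synchronous update and the tie rule (keep the current opinion) are our assumptions, not necessarily the abstract's exact model.

```python
import numpy as np

# Hypothetical sketch of signed majority opinion dynamics; the tie rule
# (keep the current opinion on a tie) and the synchronous update are
# our assumptions.

def step(A, x):
    """One synchronous update: A is the signed adjacency matrix, x in {-1,+1}^n."""
    s = A @ x                                    # summed signed influence
    return np.where(s > 0, 1, np.where(s < 0, -1, x))

def trajectory(A, x):
    """Configurations met before the dynamics first repeats one."""
    seen, traj = set(), []
    while tuple(x) not in seen:
        seen.add(tuple(x))
        traj.append(x)
        x = step(A, x)
    return traj
```

For a symmetric positive-only network, the abstract's result bounds the length of `trajectory` by a polynomial in the network size.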
In sparse estimation, in which the sum of a loss function and a regularization term is minimized, methods such as the proximal gradient method and the proximal Newton method are applied. The former is slow to converge to a solution, while the latter converges quickly but is inefficient for problems such as group lasso. In this paper, we examine how to find a solution efficiently by identifying the convergence destination of the proximal gradient method. To date, the case in which the Lipschitz constant of the derivative of the loss function is unknown has not been studied theoretically, and a Newton method has been proposed only for the case in which the Lipschitz constant is known. We show that the Newton method also converges when the Lipschitz constant is unknown, extending the theory. Furthermore, we propose a new quasi-Newton method that avoids Hessian computations and improves efficiency, and we prove that it converges quickly, providing a theoretical guarantee. Finally, numerical experiments show that the proposed method can significantly improve efficiency.
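For reference, a minimal sketch of the baseline proximal gradient (ISTA) iteration for the lasso, with a backtracking line search so that no Lipschitz constant needs to be known in advance. This is the slow baseline whose convergence destination the paper proposes to find, not the paper's own accelerated method.

```python
import numpy as np

# Minimal sketch of the proximal gradient (ISTA) iteration for the lasso,
# with backtracking so the Lipschitz constant of the gradient need not be known.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad_lasso(X, y, lam, iters=500):
    beta, eta = np.zeros(X.shape[1]), 1.0
    f = lambda b: 0.5 * np.sum((X @ b - y) ** 2)   # smooth part of the objective
    for _ in range(iters):
        g = X.T @ (X @ beta - y)
        while True:   # backtracking: shrink eta until sufficient decrease holds
            cand = soft_threshold(beta - eta * g, eta * lam)
            d = cand - beta
            if f(cand) <= f(beta) + g @ d + d @ d / (2 * eta):
                break
            eta *= 0.5
        beta = cand
    return beta
```

For the group lasso the soft-thresholding step would be replaced by blockwise shrinkage of each group's Euclidean norm; the backtracking logic is unchanged.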
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policy at each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that explicit policy parameterization is not needed at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish linear convergence to global optimality or sublinear convergence to stationarity for these methods applied to different classes of RL problems under exact policy evaluation. We then define proper notions of approximation error for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite or continuous action spaces. To the best of our knowledge, both these algorithmic frameworks and their convergence analysis appear to be new in the literature.
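In the tabular special case, policy mirror descent with a KL mirror map reduces to a multiplicative (softmax) update, $\pi(a|s) \propto \pi(a|s)\exp(\eta Q^\pi(s,a))$. The sketch below illustrates this on a toy two-state MDP with exact policy evaluation; the MDP, step size, and function names are our illustrative choices, not the paper's general-state algorithm.

```python
import numpy as np

# Tabular policy mirror descent with the KL mirror map: the update is
# pi(a|s) ∝ pi(a|s) * exp(eta * Q(s, a)), using exact policy evaluation.
# The toy MDP and the step size eta are illustrative choices.

def evaluate(pi, P, r, gamma):
    """Exact policy evaluation: returns Q^pi for transition tensor P[s, a, s']."""
    P_pi = np.einsum('sa,sap->sp', pi, P)         # state transition under pi
    r_pi = np.sum(pi * r, axis=1)                 # expected reward under pi
    V = np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum('sap,p->sa', P, V)

def policy_mirror_descent(P, r, gamma, eta=1.0, iters=50):
    pi = np.full(r.shape, 1.0 / r.shape[1])       # uniform initial policy
    for _ in range(iters):
        pi = pi * np.exp(eta * evaluate(pi, P, r, gamma))
        pi /= pi.sum(axis=1, keepdims=True)       # renormalize per state
    return pi
```

The paper's contribution is precisely to go beyond this tabular loop, replacing the per-state table with function approximation over general state and action spaces.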
Two combined numerical methods for solving time-varying semilinear differential-algebraic equations (DAEs) are obtained. These equations are also called degenerate DEs, descriptor systems, operator-differential equations, and DEs on manifolds. The convergence and correctness of the methods are proved. In constructing the methods we use, in particular, time-varying spectral projectors which can be found numerically. This makes it possible to solve and analyze the considered DAE numerically in its original form, without additional analytical transformations. To improve the accuracy of the second method, a recalculation (``predictor-corrector'') scheme is used. Note that the developed methods are applicable to DAEs whose continuous nonlinear part may not be continuously differentiable in $t$, and that restrictions of the global-Lipschitz type, including the global contractivity condition, are not used in the theorems on the global solvability of the DAEs or on the convergence of the numerical methods. This makes the developed methods applicable to the numerical solution of more general classes of mathematical models. For example, the functions of currents and voltages in electric circuits may not be differentiable, or may be approximated by nondifferentiable functions. The presented conditions for the global solvability of the DAEs ensure the existence of a unique exact global solution of the corresponding initial value problem, which makes it possible to compute approximate solutions on any given time interval (provided that the conditions of the theorems or remarks on the convergence of the methods are fulfilled). In the paper, a numerical analysis of the mathematical model of a certain electrical circuit, which demonstrates the application of the presented theorems and numerical methods, is carried out.
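The recalculation (``predictor-corrector'') idea can be illustrated, in hedged form, by the classical Heun scheme for an ODE $x'=f(t,x)$: an explicit Euler predictor followed by a trapezoidal corrector. This is only a generic illustration; the paper's methods additionally handle the algebraic constraints of a DAE via spectral projectors, which we do not reproduce.

```python
import numpy as np

# Generic predictor-corrector (Heun) sketch: explicit Euler predictor,
# trapezoidal-rule corrector ("recalculation"). Illustrative only; the
# paper's methods treat semilinear DAEs, not plain ODEs.

def heun(f, t0, x0, h, steps):
    t, x = t0, np.asarray(x0, dtype=float)
    for _ in range(steps):
        pred = x + h * f(t, x)                       # predictor step
        x = x + h / 2 * (f(t, x) + f(t + h, pred))   # corrector recalculation
        t += h
    return x
```

The corrector recalculation lifts the order of accuracy from $O(h)$ for the Euler predictor alone to $O(h^2)$.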
Standard solvers exist for tackling discrete optimisation problems. In practice, however, they are rarely applied directly to the large input spaces typical of this class of problems. Rather, the input is preprocessed to look for simplifications and to extract the core subset of the problem space, called the kernel. This preprocessing procedure is known in the context of parameterized complexity theory as kernelization. In this thesis, I implement parallel versions of some kernelization algorithms and evaluate their performance. The performance of a kernelization algorithm is measured either by the size of the output kernel or by the time it takes to compute the kernel. Sometimes the kernel is the same as the original input, so it is desirable to know this as soon as possible. The problem scope is limited to a particular discrete optimisation problem: a version of the k-clique problem in which the nodes of the given graph are legally pre-coloured using k colours. The final evaluation shows that my parallel implementations achieve over a 50% improvement in efficiency for at least one of these algorithms, not only in speed but also in producing a smaller kernel.
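A hypothetical example of a reduction rule for the pre-coloured k-clique problem (not necessarily one of the algorithms implemented in the thesis): a node with no neighbour in some other colour class can never take part in a k-clique, so it may be deleted, and the rule is applied until a fixed point; the surviving nodes form the kernel.

```python
# Hypothetical kernelization rule for pre-coloured k-clique: repeatedly
# delete any node lacking a neighbour in some other colour class.
# adj maps node -> set of neighbours; colour maps node -> colour in 0..k-1.

def kernelize(adj, colour, k):
    verts = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(verts):
            nb_colours = {colour[u] for u in adj[v] if u in verts}
            if any(c != colour[v] and c not in nb_colours for c in range(k)):
                verts.discard(v)
                changed = True
    return verts   # surviving nodes form the kernel
```

Each deletion can enable further deletions, which is why the rule iterates to a fixed point; a parallel version would apply the per-node check to all surviving nodes concurrently in each round.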
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
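The layer-to-layer iteration and the criticality condition can be illustrated numerically. Under the book's recursion for the preactivation variance, $K^{(\ell+1)} = C_b + C_W\,\mathbb{E}_{z\sim\mathcal{N}(0,K^{(\ell)})}[\sigma(z)^2]$, tuning the initialization hyperparameters $(C_b, C_W)$ to criticality keeps signals from exploding or vanishing exponentially with depth; for $\sigma=\tanh$ the critical point is $(C_b, C_W)=(0,1)$, where the variance decays only slowly (as a power law in depth). The Monte Carlo estimate below is our illustration, not the book's code.

```python
import numpy as np

# Monte Carlo iteration of the variance "RG flow" for a tanh network at
# criticality, (C_b, C_W) = (0, 1): K decays slowly with depth rather than
# exponentially. Constants and sample size are illustrative.

def variance_map(K, C_W, C_b, n=200_000, seed=0):
    z = np.random.default_rng(seed).normal(0.0, np.sqrt(K), n)
    return C_W * np.mean(np.tanh(z) ** 2) + C_b

K = 1.0
for _ in range(30):                  # thirty layers of flow
    K = variance_map(K, C_W=1.0, C_b=0.0)
```

Away from criticality the same iteration converges (or diverges) exponentially fast, which is the exploding/vanishing problem the tuning avoids.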