Hybrid systems theorem proving provides strong correctness guarantees about the interacting discrete and continuous dynamics of cyber-physical systems. The trustworthiness of proofs rests on the soundness of the proof calculus and its correct implementation in a theorem prover. Correctness is easier to achieve with a soundness-critical core that is stripped to the bare minimum, but, as a consequence, proof convenience has to be regained outside the soundness-critical core with proof management techniques. We present modeling and proof management techniques that are built on top of the soundness-critical core of KeYmaera X to enable expanding definitions, parametric proofs, lemmas, and other useful proof techniques in hybrid systems proofs. Our techniques steer the uniform substitution implementation of the differential dynamic logic proof calculus in KeYmaera X to let users choose when and how, during a proof, abstract formulas, terms, or programs are expanded to their concrete definitions, and when and how lemmas and sub-proofs are combined into a full proof. The same techniques are exploited in implicit sub-proofs (without making such sub-proofs explicit to the user) to provide proof features, such as temporarily hiding formulas, which are notoriously difficult to get right when implemented in the prover core, but become trustworthy as proof management techniques outside the core. We illustrate our approach with several useful proof techniques and discuss their presentation on the KeYmaera X user interface.
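As a small, hedged illustration of the kind of definition expansion meant here (our example, not KeYmaera X's concrete input syntax): a proof can proceed against an abstract predicate symbol $p(\cdot)$ and only later instantiate it with its concrete definition via a uniform substitution $\sigma$,
\[
  x \geq 0 \;\rightarrow\; [\,x := x + 1\,]\; p(x),
  \qquad
  \sigma = \{\, p(\cdot) \mapsto \cdot \geq 0 \,\},
\]
\[
  \sigma\big(x \geq 0 \rightarrow [\,x := x + 1\,]\, p(x)\big)
  \;=\;
  x \geq 0 \;\rightarrow\; [\,x := x + 1\,]\; x \geq 0,
\]
so proof steps carried out on the abstract formula remain valid for the expanded formula by the soundness of the uniform substitution rule.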
The implicit bias of neural networks has been extensively studied in recent years. Lyu and Li [2019] showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in the parameter space. However, that leaves open the question of whether this point will generally be an actual optimum of the max margin problem. In this paper, we study this question in detail, for several neural network architectures involving linear and ReLU activations. Perhaps surprisingly, we show that in many cases, the KKT point is not even a local optimum of the max margin problem. On the flip side, we identify multiple settings where a local or global optimum can be guaranteed. Finally, we answer a question posed in Lyu and Li [2019] by showing that for non-homogeneous networks, the normalized margin may strictly decrease over time.
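For concreteness, the max margin problem in parameter space referred to above is commonly stated as follows for a binary classification dataset $\{(x_i, y_i)\}_{i=1}^n$ with $y_i \in \{\pm 1\}$ and network output $\Phi(\theta; x)$ (a standard formulation, not a verbatim statement from the paper; in the nonsmooth ReLU case the gradient below is replaced by a Clarke subdifferential):
\[
  \min_{\theta}\ \tfrac{1}{2}\|\theta\|^2
  \quad \text{s.t.} \quad y_i\, \Phi(\theta; x_i) \geq 1 \quad \text{for all } i = 1, \dots, n,
\]
whose KKT points are the $\theta$ admitting multipliers $\lambda_i \geq 0$ with
\[
  \theta = \sum_{i=1}^n \lambda_i\, \nabla_\theta \big( y_i\, \Phi(\theta; x_i) \big)
  \qquad \text{and} \qquad
  \lambda_i \big( y_i\, \Phi(\theta; x_i) - 1 \big) = 0 \ \text{ for all } i.
\]
The question studied here is when such a KKT point is additionally a local or global optimum of this problem.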
In this position paper, we would like to offer and defend a new template to study equivalences between programs -- in the particular framework of process algebras for concurrent computation. We believe that our layered model of development will clarify the distinction that is too often left implicit between the tasks and duties of the programmer and of the tester. It will also shed light on pre-existing issues that run across process algebras as diverse as the calculus of communicating systems, the $\pi$-calculus -- also in its distributed version -- or mobile ambients. Our distinction starts by subdividing the notion of process itself into three conceptually separate entities, which we call \emph{Processes}, \emph{Systems} and \emph{Tests}. While the role of what can be observed and the subtleties in the definitions of congruences have been intensively studied, the facts that \emph{not every process can be tested} and that \emph{the tester should have access to a different set of tools than the programmer} are curiously left out, or at least not often formally discussed. We argue that this blind spot comes from the under-specification of contexts -- the environments in which comparisons take place -- which play multiple distinct roles but supposedly always \enquote{stay the same}. We illustrate our statement with a simple Java example and the \enquote{usual} concurrent languages, and we back it up with the $\lambda$-calculus and existing implementations of concurrent languages.
A ubiquitous learning problem in today's digital market is, during repeated interactions between a seller and a buyer, how a seller can gradually learn optimal pricing decisions based on the buyer's past purchase responses. A fundamental challenge of learning in such a strategic setup is that the buyer will naturally have incentives to manipulate his responses in order to induce more favorable learning outcomes for him. To understand the limits of the seller's learning when facing such a strategic and possibly manipulative buyer, we study a natural yet powerful buyer manipulation strategy. That is, before the pricing game starts, the buyer simply commits to "imitate" a different value function by pretending to always react optimally according to this imitative value function. We fully characterize the optimal imitative value function that the buyer should imitate, as well as the resultant seller revenue and buyer surplus under this optimal buyer manipulation. Our characterizations reveal many useful insights about what happens at equilibrium. For example, a seller with concave production cost will obtain essentially 0 revenue at equilibrium, whereas the revenue for a seller with convex production cost is the Bregman divergence of her cost function between no production and certain production. Finally, and importantly, we show that a more powerful class of pricing schemes does not necessarily increase, and may in fact be harmful to, the seller's revenue. Our results not only lead to an effective prescriptive way for buyers to manipulate learning algorithms but also shed light on the limits of what a seller can really achieve when pricing in the dark.
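For reference, the Bregman divergence invoked above is, for a differentiable convex cost function $c$, the standard quantity
\[
  D_c(q \,\|\, q') \;=\; c(q) - c(q') - \nabla c(q')^{\top} (q - q'),
\]
with the equilibrium revenue in the convex-cost case being this divergence evaluated between the no-production point and a certain production level (in the order made precise in the paper).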
Circular (or cyclic) proofs have received increasing attention in recent years, and have been proposed as an alternative setting for studying (co)inductive reasoning. In particular, several type systems based on circular reasoning have now been proposed. However, little is known about the complexity-theoretic aspects of circular proofs, which exhibit sophisticated loop structures atypical of more common 'recursion schemes'. This paper attempts to bridge the gap between circular proofs and implicit computational complexity. Namely, we introduce a circular proof system based on Bellantoni and Cook's famous safe-normal function algebra, and we identify suitable proof-theoretic constraints to characterise the polynomial-time and elementary computable functions.
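To recall the function algebra being referenced: Bellantoni and Cook separate \emph{normal} and \emph{safe} arguments, written $f(\vec{x}; \vec{a})$, and permit safe recursion on notation only in the form (the standard scheme, independent of the circular system introduced here)
\[
  f(0, \vec{x}; \vec{a}) = g(\vec{x}; \vec{a}),
  \qquad
  f(s_i(y), \vec{x}; \vec{a}) = h_i\big(y, \vec{x}; \vec{a}, f(y, \vec{x}; \vec{a})\big), \quad i \in \{0, 1\},
\]
where the recursive value may only be used in a safe position, which is what confines the definable functions to polynomial time.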
In this paper, we use the first-order virtual element method (VEM) to investigate the effect of shape quality of polyhedra in the estimation of the critical time step for explicit three-dimensional elastodynamic finite element simulations. Low-quality finite elements are common when meshing realistic complex components, and while tetrahedral meshing technology is generally robust, meshing algorithms cannot guarantee high-quality meshes for arbitrary geometries or for non-water-tight computer-aided design models. For reliable simulations on such meshes, we consider finite element meshes with tetrahedral and prismatic elements that contain badly shaped elements (tetrahedra with dihedral angles close to $0^\circ$ and $180^\circ$, and slender prisms with triangular faces that have short edges), and agglomerate such `bad' elements with neighboring elements to form a larger polyhedral virtual element. On each element of the mesh, the element-eigenvalue inequality is used to obtain an estimate of the critical time step. For a suite of illustrative finite element meshes with $\epsilon$ being a mesh-coordinate parameter that leads to poor mesh quality, we show that adopting VEM on the agglomerated polyhedra yields critical time step estimates that are insensitive to $\epsilon$ as $\epsilon \rightarrow 0$. This study shows the promise of virtual element technology for reliable explicit finite element elastodynamic simulations on low-quality three-dimensional meshes.
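As context for the estimate mentioned above, the element-eigenvalue inequality bounds the maximum natural frequency $\omega_{\max}$ of the assembled system by the largest element frequency, which for undamped central-difference time integration gives a conservative, element-level bound on the critical time step (a standard result, stated here without damping):
\[
  \omega_{\max} \;\leq\; \max_{e}\, \omega_{\max}^{e}
  \qquad \Longrightarrow \qquad
  \Delta t_{\mathrm{crit}} \;=\; \frac{2}{\omega_{\max}} \;\geq\; \frac{2}{\max_{e}\, \omega_{\max}^{e}},
\]
so the agglomeration into polyhedral virtual elements aims to keep $\max_e \omega_{\max}^{e}$, and hence the estimated time step, well behaved as $\epsilon \rightarrow 0$.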
Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data. Our theoretical results imply that many of these algorithms cannot guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we use our framework to derive the minimal distributions required to design a fair algorithm. Our proposed algorithm decentralizes the decision-making process and still achieves performance similar to that of the optimal algorithm, which requires centralization and non-recoverable distributions.
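As a minimal example of the recoverability conditions at stake (our illustration, not the paper's full characterization): if $S$ indicates selection into the training data (e.g., $S = 1$ for applicants who were given a loan), then a distribution such as $P(Y \mid X)$ used by a fairness algorithm is recoverable from the training data only under a conditional independence of the form
\[
  Y \perp\!\!\!\perp S \mid X
  \quad \Longrightarrow \quad
  P(Y \mid X) = P(Y \mid X, S = 1),
\]
and when the causal graph of the missingness mechanism does not license such an independence, an algorithm trained on $P(Y \mid X, S = 1)$ cannot be guaranteed to be fair on the deployment population.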
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
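As a rough sketch of the implicit meta-gradient computation described above (our toy illustration with a quadratic inner loss, not the paper's implementation; names such as inner_solution and implicit_meta_grad are ours): with a proximally regularized inner problem, the meta-gradient depends only on the inner solution, and the required matrix inverse can be applied with conjugate gradient using only Hessian-vector products.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy quadratic inner loss: L_train(phi) = 0.5 * phi^T A phi - b^T phi.
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
A = A @ A.T + np.eye(d)        # symmetric positive definite Hessian
b = rng.standard_normal(d)
lam = 1.0                      # strength of the proximal regularizer

def inner_solution(theta):
    # argmin_phi L_train(phi) + (lam/2) * ||phi - theta||^2 (closed form for this toy loss)
    return np.linalg.solve(A + lam * np.eye(d), b + lam * theta)

def hvp(v):
    # Hessian-vector product of L_train at the inner solution (constant for a quadratic)
    return A @ v

def implicit_meta_grad(grad_test):
    # Implicit meta-gradient: solve (I + H / lam) g = grad_test with conjugate gradient,
    # so only the inner solution and Hessian-vector products are needed,
    # never the unrolled inner optimization trajectory.
    op = LinearOperator((d, d), matvec=lambda v: v + hvp(v) / lam)
    g, _ = cg(op, grad_test)
    return g

theta = rng.standard_normal(d)
phi_star = inner_solution(theta)
grad_test = phi_star - np.ones(d)   # gradient of L_test(phi) = 0.5 * ||phi - 1||^2 at phi_star
print(implicit_meta_grad(grad_test))

The design point this illustrates is the one stated in the abstract: the meta-gradient is obtained from the inner solution alone, so memory does not grow with the number of inner gradient steps.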
Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of recent years. We believe the identified problems deserve more attention, and we highlight their implications for both the properties of GANs and the trajectory of research on probabilistic models. We recently proposed an alternative method that sidesteps these problems.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure, which often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
Recommender systems (RSs) provide an effective way of alleviating the information overload problem by selecting personalized items for different users. Latent-factor-based collaborative filtering (CF) has become a popular approach for RSs due to its accuracy and scalability. Recently, online social networks and user-generated content have provided diverse sources for recommendation beyond ratings. Although {\em social matrix factorization} (Social MF) and {\em topic matrix factorization} (Topic MF) successfully exploit social relations and item reviews, respectively, both of them ignore some useful information. In this paper, we investigate effective data fusion by combining the aforementioned approaches. First, we propose a novel model {\em \mbox{MR3}} to jointly model three sources of information (i.e., ratings, item reviews, and social relations) effectively for rating prediction by aligning the latent factors and hidden topics. Second, we incorporate the implicit feedback from ratings into the proposed model to enhance its capability and to demonstrate its flexibility. We achieve more accurate rating prediction on real-life datasets than various state-of-the-art methods. Furthermore, we measure the contribution from each of the three data sources and the impact of implicit feedback from ratings, followed by a sensitivity analysis of hyperparameters. Empirical studies demonstrate the effectiveness of our proposed model and its extension.
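As a schematic only (our generic sketch of how such a fusion is commonly organized, not the exact MR3 objective; the weights $\alpha, \beta, \gamma$ and terms below are our notation), the three sources can be combined by sharing latent factors across a rating term, a review-topic alignment term, and a social regularization term:
\[
  \min_{U, V, \Theta}\;
  \sum_{(u,i) \in \mathcal{R}} \big(r_{ui} - U_u^{\top} V_i\big)^2
  \;+\; \alpha\, \mathcal{L}_{\mathrm{topic}}(V, \Theta)
  \;+\; \beta \sum_{u} \Big\| U_u - \sum_{v \in \mathcal{N}(u)} s_{uv}\, U_v \Big\|^2
  \;+\; \gamma\, \Omega(U, V, \Theta),
\]
where $\mathcal{L}_{\mathrm{topic}}$ aligns item factors $V_i$ with the hidden topics $\Theta$ of their reviews, the third term pulls each user's factors $U_u$ toward those of social neighbors $\mathcal{N}(u)$ weighted by trust strengths $s_{uv}$, and $\Omega$ collects norm regularizers; implicit feedback from ratings can then be incorporated by augmenting $U_u$ with factors of the items the user has rated, in the spirit of SVD++.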