We consider the additional entropy production (EP) incurred by a fixed quantum or classical process on some initial state $\rho$, above the minimum EP incurred by the same process on any initial state. We show that this additional EP, which we term the "mismatch cost of $\rho$", has a universal information-theoretic form: it is given by the contraction of the relative entropy between $\rho$ and the least-dissipative initial state $\varphi$ over time. We derive versions of this result for integrated EP incurred over the course of a process, for trajectory-level fluctuating EP, and for instantaneous EP rate. We also show that mismatch cost for fluctuating EP obeys an integral fluctuation theorem. Our results demonstrate a fundamental relationship between "thermodynamic irreversibility" (generation of EP) and "logical irreversibility" (inability to know the initial state corresponding to a given final state). We use this relationship to derive quantitative bounds on the thermodynamics of quantum error correction and to propose a thermodynamically-operationalized measure of the logical irreversibility of a quantum channel. Our results hold for both finite and infinite dimensional systems, and generalize beyond EP to many other thermodynamic costs, including nonadiabatic EP, free energy loss, and entropy gain.
Due to the beyond-classical capability of quantum computing, quantum machine learning is applied independently or embedded in classical models for decision making, especially in the field of finance. Fairness and other ethical issues are often one of the main concerns in decision making. In this work, we define a formal framework for the fairness verification and analysis of quantum machine learning decision models, where we adopt one of the most popular notions of fairness in the literature based on the intuition -- any two similar individuals must be treated similarly and are thus unbiased. We show that quantum noise can improve fairness and develop an algorithm to check whether a (noisy) quantum machine learning model is fair. In particular, this algorithm can find bias kernels of quantum data (encoding individuals) during checking. These bias kernels generate infinitely many bias pairs for investigating the unfairness of the model. Our algorithm is designed based on a highly efficient data structure -- Tensor Networks -- and implemented on Google's TensorFlow Quantum. The utility and effectiveness of our algorithm are confirmed by the experimental results, including income prediction and credit scoring on real-world data, for a class of random (noisy) quantum decision models with 27 qubits ($2^{27}$-dimensional state space) tripling ($2^{18}$ times more than) that of the state-of-the-art algorithms for verifying quantum machine learning models.
Subset-Sum is an NP-complete problem where one must decide if a multiset of $n$ integers contains a subset whose elements sum to a target value $m$. The best-known classical and quantum algorithms run in time $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$, respectively, based on the well-known meet-in-the-middle technique. Here we introduce a novel classical dynamic-programming-based data structure with applications to Subset-Sum and a number of variants, including Equal-Sums (where one seeks two disjoint subsets with the same sum), 2-Subset-Sum (a relaxed version of Subset-Sum where each item in the input set can be used twice in the summation), and Shifted-Sums, a generalization of both of these variants, where one seeks two disjoint subsets whose sums differ by some specified value. Given any modulus $p$, our data structure can be constructed in time $O(n^2p)$, after which queries can be made in time $O(n^2)$ to the lists of subsets summing to any value modulo $p$. We use this data structure in combination with variable-time amplitude amplification and a new quantum pair finding algorithm, extending the quantum claw finding algorithm to the multiple solutions case, to give an $O(2^{0.504n})$ quantum algorithm for Shifted-Sums, an improvement on the best-known $O(2^{0.773n})$ classical running time. Incidentally, we obtain new $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$ classical and quantum algorithms for Subset-Sum, not based on the seminal meet-in-the-middle method. We also study Pigeonhole Equal-Sums and Pigeonhole Modular Equal-Sums, where the existence of a solution is guaranteed by the pigeonhole principle. For the former problem, we give faster classical and quantum algorithms with running time $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{2n/5})$, respectively. For the more general modular problem, we give a classical algorithm that also runs in time $\tilde{O}(2^{n/2})$.
We present a sampling-based control approach that can generate smooth actions for general nonlinear systems without external smoothing algorithms. Model Predictive Path Integral (MPPI) control has been utilized in numerous robotic applications due to its appealing characteristics to solve non-convex optimization problems. However, the stochastic nature of sampling-based methods can cause significant chattering in the resulting commands. Chattering becomes more prominent in cases where the environment changes rapidly, possibly even causing the MPPI to diverge. To address this issue, we propose a method that seamlessly combines MPPI with an input-lifting strategy. In addition, we introduce a new action cost to smooth control sequence during trajectory rollouts while preserving the information theoretic interpretation of MPPI, which was derived from non-affine dynamics. We validate our method in two nonlinear control tasks with neural network dynamics: a pendulum swing-up task and a challenging autonomous driving task. The experimental results demonstrate that our method outperforms the MPPI baselines with additionally applied smoothing algorithms.
Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure.In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at //www.saxelab.org/gated-dln .
In cell line perturbation experiments, a collection of cells is perturbed with external agents (e.g. drugs) and responses such as protein expression measured. Due to cost constraints, only a small fraction of all possible perturbations can be tested in vitro. This has led to the development of computational (in silico) models which can predict cellular responses to perturbations. Perturbations with clinically interesting predicted responses can be prioritized for in vitro testing. In this work, we compare causal and non-causal regression models for perturbation response prediction in a Melanoma cancer cell line. The current best performing method on this data set is Cellbox which models how proteins causally effect each other using a system of ordinary differential equations (ODEs). We derive a closed form solution to the Cellbox system of ODEs in the linear case. These analytic results facilitate comparison of Cellbox to regression approaches. We show that causal models such as Cellbox, while requiring more assumptions, enable extrapolation in ways that non-causal regression models cannot. For example, causal models can predict responses for never before tested drugs. We illustrate these strengths and weaknesses in simulations. In an application to the Melanoma cell line data, we find that regression models outperform the Cellbox causal model.
Importance sampling (IS) is valuable in reducing the variance of Monte Carlo sampling for many areas, including finance, rare event simulation, and Bayesian inference. It is natural and obvious to combine quasi-Monte Carlo (QMC) methods with IS to achieve a faster rate of convergence. However, a naive replacement of Monte Carlo with QMC may not work well. This paper investigates the convergence rates of randomized QMC-based IS for estimating integrals with respect to a Gaussian measure, in which the IS measure is a Gaussian or $t$ distribution. We prove that if the target function satisfies the so-called boundary growth condition and the covariance matrix of the IS density has eigenvalues no smaller than 1, then randomized QMC with the Gaussian proposal has a root mean squared error of $O(N^{-1+\epsilon})$ for arbitrarily small $\epsilon>0$. Similar results of $t$ distribution as the proposal are also established. These sufficient conditions help to assess the effectiveness of IS in QMC. For some particular applications, we find that the Laplace IS, a very general approach to approximate the target function by a quadratic Taylor approximation around its mode, has eigenvalues smaller than 1, making the resulting integrand less favorable for QMC. From this point of view, when using Gaussian distributions as the IS proposal, a change of measure via Laplace IS may transform a favorable integrand into unfavorable one for QMC although the variance of Monte Carlo sampling is reduced. We also give some examples to verify our propositions and warn against naive replacement of MC with QMC under IS proposals. Numerical results suggest that using Laplace IS with $t$ distributions is more robust than that with Gaussian distributions.
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries, truncating, or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this only introduces a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.
In our time cybersecurity has grown to be a topic of massive proportion at the national and enterprise levels. Our thesis is that the economic perspective and investment decision-making are vital factors in determining the outcome of the struggle. To build our economic framework, we borrow from the pioneering work of Gordon and Loeb in which the Defender optimally trades-off investments for lower likelihood of its system breach. Our two-sided model additionally has an Attacker, assumed to be rational and also guided by economic considerations in its decision-making, to which the Defender responds. Our model is a simplified adaptation of a model proposed during the Cold War for weapons deployment in the US. Our model may also be viewed as a Stackelberg game and, from an analytic perspective, as a Max-Min problem, the analysis of which is known to have to contend with discontinuous behavior. The complexity of our simple model is rooted in its inherent nonlinearity and, more consequentially, non-convexity of the objective function in the optimization. The possibilities of the Attacker's actions add substantially to the risk to the Defender, and the Defender's rational, risk-neutral optimal investments in general substantially exceed the optimal investments predicted by the one-sided Gordon-Loeb model. We obtain a succinct set of three decision types that categorize all of the Defender's optimal investment decisions. Also, the Defender's optimal decisions exhibit discontinuous behavior as the initial vulnerability of its system is varied. The analysis is supplemented by extensive numerical illustrations. The results from our model open several major avenues for future work.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.
Recent years have witnessed the enormous success of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. Currently, however, it is not yet well-understood how ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a framework based on convex regions, which can faithfully incorporate ontological knowledge into the vector space embedding. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding approaches are not capable of modelling even very simple types of rules. Second, we show that our framework can represent ontologies that are expressed using so-called quasi-chained existential rules in an exact way, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology.