Finding extended hydrodynamics equations valid from the dense gas region to the rarefied gas region remains a great challenge. The key to success is to obtain accurate constitutive relations for stress and heat flux. Recent data-driven models offer a new phenomenological approach to learning constitutive relations from data. Such models enable complex constitutive relations that extend Newton's law of viscosity and Fourier's law of heat conduction, by regression on higher derivatives. However, choices of derivatives in these models are ad-hoc without a clear physical explanation. We investigated data-driven models theoretically on a linear system. We argue that these models are equivalent to non-linear length scale scaling laws of transport coefficients. The equivalence to scaling laws justified the physical plausibility and revealed the limitation of data-driven models. Our argument also points out modeling the scaling law explicitly could avoid practical difficulties in data-driven models like derivative estimation and variable selection on noisy data. We further proposed a constitutive relation model based on scaling law and tested it on the calculation of Rayleigh scattering spectra. The result shows data-driven model has a clear advantage over the Chapman-Enskog expansion and moment methods for the first time.
The recent framework of performative prediction is aimed at capturing settings where predictions influence the target/outcome they want to predict. In this paper, we introduce a natural multi-agent version of this framework, where multiple decision makers try to predict the same outcome. We showcase that such competition can result in interesting phenomena by proving the possibility of phase transitions from stability to instability and eventually chaos. Specifically, we present settings of multi-agent performative prediction where under sufficient conditions their dynamics lead to global stability and optimality. In the opposite direction, when the agents are not sufficiently cautious in their learning/updates rates, we show that instability and in fact formal chaos is possible. We complement our theoretical predictions with simulations showcasing the predictive power of our results.
We develop a novel procedure for estimating the optimizer of general convex stochastic optimization problems of the form $\min_{x\in\mathcal{X}} \mathbb{E}[F(x,\xi)]$, when the given data is a finite independent sample selected according to $\xi$. The procedure is based on a median-of-means tournament, and is the first procedure that exhibits the optimal statistical performance in heavy tailed situations: we recover the asymptotic rates dictated by the central limit theorem in a non-asymptotic manner once the sample size exceeds some explicitly computable threshold. Additionally, our results apply in the high-dimensional setup, as the threshold sample size exhibits the optimal dependence on the dimension (up to a logarithmic factor). The general setting allows us to recover recent results on multivariate mean estimation and linear regression in heavy-tailed situations and to prove the first sharp, non-asymptotic results for the portfolio optimization problem.
Algebraic effects offer a versatile framework that covers a wide variety of effects. However, the family of operations that delimit scopes are not algebraic and are usually modelled as handlers, thus preventing them from being used freely in conjunction with algebraic operations. Although proposals for scoped operations exist, they are either ad-hoc and unprincipled, or too inconvenient for practical programming. This paper provides the best of both worlds: a theoretically-founded model of scoped effects that is convenient for implementation and reasoning. Our new model is based on an adjunction between a locally finitely presentable category and a category of functorial algebras. Using comparison functors between adjunctions, we show that our new model, an existing indexed model, and a third approach that simulates scoped operations in terms of algebraic ones have equal expressivity for handling scoped operations. We consider our new model to be the sweet spot between ease of implementation and structuredness. Additionally, our approach automatically induces fusion laws of handlers of scoped effects, which are useful for reasoning and optimisation.
We consider the problem of coded distributed computing using polar codes. The average execution time of a coded computing system is related to the error probability for transmission over the binary erasure channel in recent work by Soleymani, Jamali and Mahdavifar, where the performance of binary linear codes is investigated. In this paper, we focus on polar codes and unveil a connection between the average execution time and the scaling exponent $\mu$ of the family of codes. The scaling exponent has emerged as a central object in the finite-length characterization of polar codes, and it captures the speed of convergence to capacity. In particular, we show that (i) the gap between the normalized average execution time of polar codes and that of optimal MDS codes is $O(n^{-1/\mu})$, and (ii) this upper bound can be improved to roughly $O(n^{-1/2})$ by considering polar codes with large kernels. We conjecture that these bounds could be improved to $O(n^{-2/\mu})$ and $O(n^{-1})$, respectively, and provide a heuristic argument as well as numerical evidence supporting this view.
The paper provides three results for SVARs under the assumption that the primitive shocks are mutually independent. First, a framework is proposed to accommodate a disaster-type variable with infinite variance into a VAR. We show that the least squares estimates of the VAR are consistent but have non-standard properties. Second, the disaster shock is identified as the component with the largest kurtosis and whose impact effect is negative. An estimator that is robust to infinite variance is used to recover the mutually independent components. Third, an independence test on the residuals pre-whitened by Choleski decomposition is proposed to test the restrictions imposed on a SVAR. The test can be applied whether the data have fat or thin tails, and to over as well as exactly identified models. Three applications are considered. In the first, the independence test is used to shed light on the conflicting evidence regarding the role of uncertainty in economic fluctuations. In the second, disaster shocks are shown to have short term economic impact arising mostly from feedback dynamics. The third application uses the framework to study the dynamic effects of economic shocks post-covid.
Approaches based on Functional Causal Models (FCMs) have been proposed to determine causal direction between two variables, by properly restricting model classes; however, their performance is sensitive to the model assumptions, which makes it difficult for practitioners to use. In this paper, we provide a novel dynamical-system view of FCMs and propose a new framework for identifying causal direction in the bivariate case. We first show the connection between FCMs and optimal transport, and then study optimal transport under the constraints of FCMs. Furthermore, by exploiting the dynamical interpretation of optimal transport under the FCM constraints, we determine the corresponding underlying dynamical process of the static cause-effect pair data under the least action principle. It provides a new dimension for describing static causal discovery tasks, while enjoying more freedom for modeling the quantitative causal influences. In particular, we show that Additive Noise Models (ANMs) correspond to volume-preserving pressureless flows. Consequently, based on their velocity field divergence, we introduce a criterion to determine causal direction. With this criterion, we propose a novel optimal transport-based algorithm for ANMs which is robust to the choice of models and extend it to post-noninear models. Our method demonstrated state-of-the-art results on both synthetic and causal discovery benchmark datasets.
We use a numerical-analytic technique to construct a sequence of successive approximations to the solution of a system of fractional differential equations, subject to Dirichlet boundary conditions. We prove the uniform convergence of the sequence of approximations to a limit function, which is the unique solution to the boundary value problem under consideration, and give necessary and sufficient conditions for the existence of solutions. The obtained theoretical results are confirmed by a model example.
In this paper, a numerical scheme for a nonlinear McKendrick-von Foerster equation with diffusion in age (MV-D) with the Dirichlet boundary condition is proposed. The main idea to derive the scheme is to use the discretization based on the method of characteristics to the convection part, and the finite difference method to the rest of the terms. The nonlocal terms are dealt with the quadrature methods. As a result, an implicit scheme is obtained for the boundary value problem under consideration. The consistency and the convergence of the proposed numerical scheme is established. Moreover, numerical simulations are presented to validate the theoretical results.
Accurately determining a change in protein binding affinity upon mutations is important for the discovery and design of novel therapeutics and to assist mutagenesis studies. Determination of change in binding affinity upon mutations requires sophisticated, expensive, and time-consuming wet-lab experiments that can be aided with computational methods. Most of the computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore the sequence-based prediction of change in protein binding affinity upon mutation. We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the change in protein binding affinity upon mutation. Our proposed sequence-based novel change in protein binding affinity predictor called PANDA gives better accuracy than existing methods over the same validation set as well as on an external independent test dataset. On an external test dataset, our proposed method gives a maximum Pearson correlation coefficient of 0.52 in comparison to the state-of-the-art existing protein structure-based method called MutaBind which gives a maximum Pearson correlation coefficient of 0.59. Our proposed protein sequence-based method, to predict a change in binding affinity upon mutations, has wide applicability and comparable performance in comparison to existing protein structure-based methods. A cloud-based webserver implementation of PANDA and its python code is available at //sites.google.com/view/wajidarshad/software and //github.com/wajidarshad/panda.
Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a method to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.