This paper deals with the problem of global parameter estimation for affine diffusions in $\mathbb{R}_+ \times \mathbb{R}^n$, denoted by $AD(1, n)$, where $n$ is a positive integer; this class is a subclass of the affine diffusions introduced by Duffie et al. in [14]. The $AD(1, n)$ model can be applied to the pricing of bond and stock options, as we illustrate for the Vasicek, Cox-Ingersoll-Ross, and Heston models. Our first result classifies $AD(1, n)$ processes into subcritical, critical, and supercritical cases. We then establish stationarity and ergodicity theorems for this model and derive asymptotic properties of the maximum likelihood estimator in both the subcritical case and a special supercritical case.
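For orientation, the simplest member of this class is the positive component on its own, the Cox-Ingersoll-Ross short-rate model; the sign convention below is the standard one and is our illustrative choice, not necessarily the paper's notation:

\[
dX_t = (a - b X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \qquad X_0 = x \ge 0,\ a \ge 0,\ \sigma > 0,
\]

with the subcritical, critical, and supercritical regimes corresponding to $b > 0$, $b = 0$, and $b < 0$, respectively.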
This paper explores the capacity of additive Vertically-Drifted First Arrival Position (VDFAP) noise channels, which are emerging as a new paradigm for diffusive molecular communication. Analogous to the capacity of parallel Gaussian channels, the capacity of VDFAP noise channels is defined as the supremum of the mutual information between the input and output signals subject to an overall second-moment constraint on input distributions. Upper and lower bounds for this capacity are derived for the case of three spatial dimensions, based on an analysis of the characteristic function of the VDFAP distribution and an investigation of its stability properties. The results of this study contribute to the ongoing effort to understand the fundamental limits of molecular communication systems.
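In standard notation (an illustrative restatement of the definition given above, not necessarily the paper's exact formulation), the capacity under a second-moment budget $P$ is

\[
C(P) = \sup_{\mathbb{E}\,\|X\|^2 \le P} I(X;\, X + N),
\]

where $N$ denotes the additive VDFAP noise, independent of the input $X$, and the supremum runs over input distributions satisfying the constraint.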
Recently, $(\beta,\gamma)$-Chebyshev functions, together with their zeros, were introduced as a generalization of the classical Chebyshev polynomials of the first kind and their roots. They form a family of orthogonal functions on a subset of $[-1,1]$ and satisfy a three-term recurrence formula. In this paper we present further properties, which are shown to be consistent with various results on classical orthogonal polynomials. In addition, we prove a conjecture concerning the behavior of the Lebesgue constant associated with the roots of $(\beta,\gamma)$-Chebyshev functions in the corresponding orthogonality interval.
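For reference, the classical objects being generalized are the first-kind Chebyshev polynomials and their roots; the following standard facts are included for context only:

\[
T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x), \qquad T_0(x) = 1,\ T_1(x) = x,
\]

whose zeros on $[-1,1]$ are $x_j = \cos\big((2j+1)\pi/(2k)\big)$ for $j = 0, \dots, k-1$.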
Language is constantly changing and evolving, so language models quickly become outdated. Consequently, we should continuously update our models with new data to expose them to new events and facts. However, doing so requires additional computation, and therefore new carbon emissions. Do any measurable benefits justify this cost? This paper looks for empirical evidence to support continuous training. We reproduce existing benchmarks and extend them to include additional time periods, models, and tasks. Our results show that the downstream task performance of temporally adapted English models for social media data does not improve over time. Pretrained models without temporal adaptation are in fact significantly more effective and efficient. However, we also note a lack of suitable temporal benchmarks. Our findings invite a critical reflection on when and how to temporally adapt language models, taking sustainability into account.
In this paper we discuss potentially practical ways to produce expander graphs with good spectral properties and a compact description. We focus on several classes of uniform and bipartite expander graphs defined as random Schreier graphs of the general linear group over the finite field of size two. We perform numerical experiments and show that such constructions produce spectral expanders that can be useful in practical applications. To find a theoretical explanation of the observed experimental results, we use the method of moments to prove upper bounds on the expected second-largest eigenvalue of the random Schreier graphs used in our constructions. We focus on bounds whose asymptotic behaviour is difficult to study but which yield non-trivial conclusions for relatively small graphs with parameters from our numerical experiments (e.g., with fewer than $2^{200}$ vertices and degree at least logarithmic in the number of vertices).
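As a small-scale illustration of the construction (the parameter choices and function names here are ours, not the paper's), the sketch below builds a random Schreier graph of $GL(n, \mathbb{F}_2)$ acting on the nonzero vectors of $\mathbb{F}_2^n$ and computes its normalized second-largest adjacency eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_invertible_gl2(n):
    """Sample a uniformly random invertible n x n matrix over F_2
    by rejection (an invertible matrix appears with prob. ~0.29)."""
    while True:
        m = rng.integers(0, 2, size=(n, n))
        a, rank = m.copy(), 0
        for col in range(n):                 # Gaussian elimination mod 2
            rows = np.nonzero(a[rank:, col])[0]
            if rows.size == 0:
                break                        # no pivot: singular
            a[[rank, rank + rows[0]]] = a[[rank + rows[0], rank]]
            for r in range(n):
                if r != rank and a[r, col]:
                    a[r] ^= a[rank]
            rank += 1
        if rank == n:
            return m

def schreier_second_eigenvalue(n=8, k=4):
    """Normalized second-largest eigenvalue of the symmetrized Schreier
    graph of k random generators of GL(n, F_2) acting on F_2^n \\ {0}."""
    N = 2 ** n - 1
    vecs = (np.arange(1, N + 1)[:, None] >> np.arange(n)) & 1  # bit rows
    A = np.zeros((N, N))
    for _ in range(k):
        g = random_invertible_gl2(n)
        img = (vecs @ g.T) % 2               # action x -> g x
        idx = img @ (1 << np.arange(n))      # back to integer labels
        for x, y in enumerate(idx):
            A[x, y - 1] += 1
            A[y - 1, x] += 1                 # add reverse edges (g^{-1})
    eig = np.linalg.eigvalsh(A)              # ascending order
    return eig[-2] / eig[-1]

print(schreier_second_eigenvalue())
```

With these toy parameters ($n = 8$, $255$ vertices, degree $8$) the computation is immediate; the regimes studied in the paper are of course far larger.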
Subset selection tasks arise in recommendation systems and search engines and ask us to select a subset of items that maximizes the value for the user. The values of subsets often display diminishing returns, and hence submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, the inputs have been observed to carry social biases that reduce the utility of the output subset, so interventions to improve the utility are desired. Prior work focuses on maximizing linear functions -- a special case of submodular functions -- and shows that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that captures those arising in the aforementioned applications. Our first result is that, unlike for linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization that, under mild assumptions, provably outputs subsets with near-optimal utility for this family while proportionally representing items from each group. In empirical evaluations on both synthetic and real-world data, we observe that this algorithm improves the utility of the output subset over baselines for this family of submodular functions.
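The abstract does not spell out the algorithm, so the following is only a generic sketch of greedy submodular maximization under per-group representation bounds; the helper names, the feasibility rule, and the toy utility are our assumptions, not the paper's method:

```python
def fair_greedy(items, group, value, lower, upper, k):
    """Greedy sketch: pick k items maximizing the marginal gain of a
    monotone submodular set function `value`, keeping each group's
    count within [lower[g], upper[g]]. Heuristic illustration only."""
    S = []
    counts = {g: 0 for g in set(group.values())}
    for _ in range(k):
        slots_left = k - len(S)
        # seats still owed to groups below their lower bounds
        owed = sum(max(lower[g] - counts[g], 0) for g in counts)
        best, best_gain = None, float("-inf")
        for i in items:
            if i in S:
                continue
            g = group[i]
            if counts[g] >= upper[g]:
                continue                      # group already at its cap
            still_owed = owed - (1 if counts[g] < lower[g] else 0)
            if slots_left - 1 < still_owed:
                continue                      # would strand a lower bound
            gain = value(S + [i]) - value(S)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        S.append(best)
        counts[group[best]] += 1
    return S

# toy usage: utility = number of distinct topics covered (monotone submodular)
topics = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}, 4: {"b", "c"}}
grp = {1: "x", 2: "x", 3: "y", 4: "y"}
cover = lambda S: len(set().union(*(topics[i] for i in S))) if S else 0
print(fair_greedy(list(topics), grp, cover, {"x": 1, "y": 1}, {"x": 1, "y": 1}, 2))
```

The feasibility filter simply reserves enough remaining slots for groups still below their lower bounds; the paper's actual algorithm and its guarantees may differ.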
SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress on this open problem by establishing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region we project into, provided that the magnitude of the reward is not too large. Existing works on the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA in a new regime.
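For concreteness, projected SARSA with linear features $\phi$ typically takes the form below; the projection set $B$ and step sizes are generic choices on our part, not necessarily those of the paper:

\[
w_{t+1} = \Pi_B\big(w_t + \alpha_t\,\delta_t\,\phi(s_t, a_t)\big), \qquad
\delta_t = r_{t+1} + \gamma\, w_t^\top \phi(s_{t+1}, a_{t+1}) - w_t^\top \phi(s_t, a_t),
\]

where $\Pi_B$ projects onto a bounded set $B$ and $a_{t+1}$ is drawn from the current policy (e.g., softmax) induced by $w_t$, whose sensitivity to $w_t$ is measured by the Lipschitz constant in question.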
An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e., some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
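A minimal numerical illustration of the linear case, using scikit-learn's FastICA (our choice of tooling; uniform sources are just one convenient non-Gaussian example):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# two independent non-Gaussian (uniform) sources, linearly mixed
S = rng.uniform(-1, 1, size=(5000, 2))
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # mixing matrix
X = S @ A.T

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)

# recovery holds up to permutation and scaling: each row of the
# cross-correlation matrix has one entry near +/-1
corr = np.corrcoef(S.T, S_hat.T)[:2, 2:]
print(np.round(corr, 2))
# With Gaussian sources, any rotation of the factors fits equally
# well, so no such alignment could be expected.
```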
This paper investigates the mathematical properties of a stochastic version of the balanced 2D thermal quasigeostrophic (TQG) model of potential vorticity dynamics. This stochastic TQG model is intended as a basis for parametrising the dynamical creation of unresolved degrees of freedom in computational simulations of upper ocean dynamics when horizontal buoyancy gradients and bathymetry affect the dynamics, particularly at the submesoscale (250 m--10 km). Specifically, we have chosen the SALT (Stochastic Advection by Lie Transport) algorithm introduced in [1] and applied in [2,3] as our modelling approach. The SALT approach preserves the Kelvin circulation theorem and an infinite family of integral conservation laws for TQG. The goal of the SALT algorithm is to quantify the uncertainty in the process of up-scaling, or coarse-graining, of either observed or synthetic data at fine scales, for use in computational simulations at coarser scales. The present work provides a rigorous mathematical analysis of the solution properties of the TQG equations with SALT [4,5].
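In the SALT framework of [1], the Lagrangian transport velocity is decomposed, roughly, into a drift plus stochastic terms; the following is a schematic form, not the precise stochastic TQG system of this paper:

\[
dx_t = u(x_t, t)\,dt + \sum_i \xi_i(x_t) \circ dW_t^i,
\]

where $u$ is the resolved velocity, the $\xi_i$ are prescribed spatial correlation modes (calibrated from fine-scale data), and $\circ$ denotes Stratonovich integration; it is this Stratonovich transport structure that preserves the Kelvin circulation theorem.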
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
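Schematically, the link in question identifies the residual update with an Euler step; the depth-dependent step $\delta_L$ below is our shorthand:

\[
x_{k+1} = x_k + \delta_L\, f(x_k, \theta_k), \qquad k = 0, \dots, L-1,
\]

which recovers the neural ODE $\dot{x}(t) = f(x(t), \theta(t))$ as $L \to \infty$ only if $\delta_L \sim 1/L$ and the trained weights $\theta_k$ approach a smooth limit $\theta(k/L)$; under other scalings (e.g., $\delta_L \sim 1/\sqrt{L}$ with rough weights), a diffusive (SDE) limit may arise instead.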
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing estimators so that this error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and the proposed estimators in estimating the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets exhibiting varying levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are correctly accounted for.
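In standard potential-outcome notation (background context, not the paper's specific estimator), the quantity of interest is a contrast such as

\[
\tau = \mathbb{E}\big[Y(d') - Y(d)\big],
\]

the expected change in repayment $Y$ when the credit decision moves from $d$ to $d'$; a naive comparison $\mathbb{E}[Y \mid D = d'] - \mathbb{E}[Y \mid D = d]$ conflates this effect with confounding, since riskier borrowers receive different decisions to begin with.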