The present study supposes a single unit and investigates cumulative damage and catastrophic failure models for the unit, in situations where the interarrival times between the shocks, and the magnitudes of the shocks, involve two different stochastic processes. In order to consider two essentially different stochastic processes, integer gamma and Weibull distributions are treated as distributions with two parameters and extensions of exponential distributions. With respect to the cumulative damage models, under the assumption that the interarrival times between shocks follow exponential distributions, the case in which the magnitudes of the shocks follow integer gamma distributions is analyzed. With respect to the catastrophic failure models, the respective cases in which the interarrival times between shocks follow integer gamma and Weibull distributions are discussed. Finally, the study provides some characteristic values for reliability in such models.
We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit, we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of the control parameters shedding light on how it navigates the loss landscape.
We derive the form of the variance-covariance matrix for any affine equivariant matrix-valued statistics when sampling from complex elliptical distributions. We then use this result to derive the variance-covariance matrix of the sample covariance matrix (SCM) as well as its theoretical mean squared error (MSE) when finite fourth-order moments exist. Finally, illustrative examples of the formulas are presented.
We examine the behaviors of various models of $k$-limited automata, which naturally extend Hibbard's [Inf. Control, vol. 11, pp. 196--238, 1967] scan limited automata, each of which is a single-tape linear-bounded automaton satisfying the $k$-limitedness requirement that the content of each tape cell should be modified only during the first $k$ visits of a tape head. One central computation model is a probabilistic $k$-limited automaton (abbreviated as a $k$-lpa), which accepts an input exactly when its accepting states are reachable from its initial state with probability more than 1/2 within expected polynomial time. We also study the behaviors of one-sided-error and bounded-error variants of such $k$-lpa's as well as the deterministic, nondeterministic, and unambiguous models of $k$-limited automata, which can be viewed as natural restrictions of $k$-lpa's. We discuss fundamental properties of these machine models and obtain inclusions and separations among language families induced by them. In due course, we study special features -- the blank skipping property and the closure under reversal -- which are keys to the robustness of $k$-lpa's.
We introduce a new test for conditional independence which is based on what we call the weighted generalised covariance measure (WGCM). It is an extension of the recently introduced generalised covariance measure (GCM). To test the null hypothesis of X and Y being conditionally independent given Z, our test statistic is a weighted form of the sample covariance between the residuals of nonlinearly regressing X and Y on Z. We propose different variants of the test for both univariate and multivariate X and Y. We give conditions under which the tests yield the correct type I error rate. Finally, we compare our novel tests to the original GCM using simulation and on real data sets. Typically, our tests have power against a wider class of alternatives compared to the GCM. This comes at the cost of having less power against alternatives for which the GCM already works well.
This paper introduces a novel Ito diffusion process to model high-frequency financial data, which can accommodate low-frequency volatility dynamics by embedding the discrete-time non-linear exponential GARCH structure with log-integrated volatility in a continuous instantaneous volatility process. The key feature of the proposed model is that, unlike existing GARCH-Ito models, the instantaneous volatility process has a non-linear structure, which ensures that the log-integrated volatilities have the realized GARCH structure. We call this the exponential realized GARCH-Ito (ERGI) model. Given the auto-regressive structure of the log-integrated volatility, we propose a quasi-likelihood estimation procedure for parameter estimation and establish its asymptotic properties. We conduct a simulation study to check the finite sample performance of the proposed model and an empirical study with 50 assets among the S\&P 500 compositions. The numerical studies show the advantages of the new proposed model.
Despite their many appealing properties, kernel methods are heavily affected by the curse of dimensionality. For instance, in the case of inner product kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is often very large for functions that depend strongly on a small subset of directions (ridge functions). Correspondingly, such functions are difficult to learn using kernel methods. This observation has motivated the study of generalizations of kernel methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$ norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to as $\mathcal{F}_p$ norm. Unfortunately, tractability of these approaches is unclear. The kernel trick is not available and minimizing these norms requires to solve an infinite-dimensional convex problem. We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size. Hence, learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a proof technique based on uniform concentration in the dual, which can be of broader interest in the study of overparametrized models. For $p= 1$, our guarantees for the random features approximation break down. We prove instead that learning with the $\mathcal{F}_1$ norm is $\mathsf{NP}$-hard under a randomized reduction based on the problem of learning halfspaces with noise.
We consider nonparametric prediction with multiple covariates, in particular categorical or functional predictors, or a mixture of both. The method proposed bases on an extension of the Nadaraya-Watson estimator where a kernel function is applied on a linear combination of distance measures each calculated on single covariates, with weights being estimated from the training data. The dependent variable can be categorical (binary or multi-class) or continuous, thus we consider both classification and regression problems. The methodology presented is illustrated and evaluated on artificial and real world data. Particularly it is observed that prediction accuracy can be increased, and irrelevant, noise variables can be identified/removed by "downgrading" the corresponding distance measures in a completely data-driven way.
Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL), however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration in collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over-time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive as well as negative (and potentially unbounded) effects to system performance.
We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activiation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumption on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centering around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on understanding the optimization of deep learning, and pave the way to study the optimization dynamics of training modern deep neural networks.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.