The paper studies non-stationary high-dimensional vector autoregressions of order $k$, VAR($k$). Additional deterministic terms such as trend or seasonality are allowed. The number of time periods, $T$, and the number of coordinates, $N$, are assumed to be large and of the same order. In this regime, the first-order asymptotics of the Johansen likelihood ratio (LR), Pillai-Bartlett, and Hotelling-Lawley tests for cointegration are derived: the test statistics converge to non-random integrals. For a more refined analysis, the paper proposes and analyzes a modification of the Johansen test. The new test for the absence of cointegration converges to the partial sum of the Airy$_1$ point process. Supporting Monte Carlo simulations indicate that the same limiting behavior persists in many settings beyond those covered by our theorems. The paper presents an empirical implementation of the approach to the analysis of stocks in S$\&$P$100$ and of cryptocurrencies. The latter example shows a strong presence of multiple cointegrating relationships, while the former is consistent with the null of no cointegration.
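For orientation, the following minimal sketch runs the classical Johansen trace test on simulated independent random walks (the null of no cointegration), assuming the coint_johansen routine from statsmodels; it illustrates the standard test whose high-dimensional behavior is studied above, not the modified test proposed in the paper.
\begin{verbatim}
# Classical Johansen trace test on simulated data with no cointegration
# (independent random walks). Illustrative only; this is the standard test,
# not the modified high-dimensional test proposed in the paper.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
T, N = 500, 5                                        # sample size, number of series
X = np.cumsum(rng.standard_normal((T, N)), axis=0)   # independent random walks

res = coint_johansen(X, det_order=0, k_ar_diff=1)    # constant term, VAR(2) in levels
print("trace statistics:", res.lr1)                  # one statistic per rank r = 0..N-1
print("95% critical values:", res.cvt[:, 1])
\end{verbatim}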
The method of instrumental variables provides a fundamental and practical tool for causal inference in many empirical studies in which unmeasured confounding between the treatments and the outcome is present. Modern data arising in these studies, such as genetical genomics data, are often high-dimensional. High-dimensional linear instrumental-variables regression has been considered in the literature because of its simplicity, although the true relationship may be nonlinear. We propose a more data-driven approach that models the relationship between the instruments and the treatments with nonparametric additive models while keeping a linear model between the treatments and the outcome, so that the coefficients therein directly bear a causal interpretation. We provide a two-stage framework for estimation and inference under this more general setup. Group lasso regularization is first employed to select optimal instruments from the high-dimensional additive models, and the outcome variable is then regressed on the fitted values from the additive models to identify and estimate important treatment effects. We provide a non-asymptotic analysis of the estimation error of the proposed estimator. A debiasing procedure is further employed to yield valid inference. Extensive numerical experiments show that our method can rival or outperform existing approaches in the literature. We finally analyze the mouse obesity data and discuss new findings from our method.
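A simplified sketch of the two-stage pipeline described above follows, using scikit-learn. For brevity, a plain lasso on spline expansions stands in for the paper's group lasso, and the simulated placeholder data omit the confounding that motivates instrumental variables; the code only illustrates the structure of stage 1 (additive fit of treatments on instruments) and stage 2 (regression of the outcome on fitted treatments).
\begin{verbatim}
# Simplified two-stage sketch: (1) additive (spline) first stage from
# instruments Z to each treatment column of D, with an l1 penalty standing in
# for the group lasso; (2) OLS of the outcome y on the first-stage fitted values.
# All data here are simulated placeholders.
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n, p_z, p_d = 400, 30, 3
Z = rng.standard_normal((n, p_z))                              # instruments
D = np.sin(Z[:, :p_d]) + 0.5 * rng.standard_normal((n, p_d))   # treatments
y = D @ np.array([1.0, -0.5, 0.0]) + rng.standard_normal(n)    # outcome

# Stage 1: additive spline model per treatment (lasso instead of group lasso)
D_hat = np.column_stack([
    make_pipeline(SplineTransformer(n_knots=6, degree=3),
                  Lasso(alpha=0.05, max_iter=5000))
    .fit(Z, D[:, j]).predict(Z)
    for j in range(p_d)
])

# Stage 2: regress the outcome on the fitted treatments
stage2 = LinearRegression().fit(D_hat, y)
print("estimated treatment effects:", stage2.coef_)
\end{verbatim}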
The 4th Industrial Revolution is the culmination of the digital age. Nowadays, technologies such as robotics, nanotechnology, genetics, and artificial intelligence promise to transform our world and the way we live. Artificial Intelligence Ethics and Safety is an emerging research field that has been gaining popularity in recent years. Several private, public, and non-governmental organizations have published guidelines proposing ethical principles for regulating the use and development of autonomous intelligent systems. Meta-analyses of the AI Ethics research field point to convergence on certain principles that supposedly govern the AI industry. However, little is known about the effectiveness of this form of ethics. In this paper, we conduct a critical analysis of the current state of AI Ethics and argue that this form of governance, based on principled ethical guidelines, is not sufficient to regulate the AI industry and its developers. We believe that drastic changes are necessary, both in the training of professionals who develop software and intelligent systems and in the increased regulation of these professionals and their industry. To this end, we suggest that the law should draw on recent contributions from bioethics to make the contributions of AI ethics to governance explicit in legal terms.
This paper is concerned with developing an efficient numerical algorithm for fast implementation of the sparse grid method for computing the $d$-dimensional integral of a given function. The new algorithm, called the MDI-SG ({\em multilevel dimension iteration sparse grid}) method, implements the sparse grid method based on a dimension iteration/reduction procedure. It does not need to store the integration points, nor does it compute the function values independently at each integration point; instead, it reuses the computation for function evaluations as much as possible by performing the function evaluations at all integration points in a cluster and iteratively along coordinate directions. It is shown numerically that the computational complexity (in terms of CPU time) of the proposed MDI-SG method is of polynomial order $O(Nd^3)$ or better, compared to the exponential order $O(N(\log N)^{d-1})$ of the standard sparse grid method, where $N$ denotes the maximum number of integration points in each coordinate direction. As a result, the proposed MDI-SG method effectively circumvents the curse of dimensionality suffered by the standard sparse grid method for high-dimensional numerical integration.
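For context, the sketch below implements a standard Smolyak sparse-grid quadrature via the classical combination technique with Gauss-Legendre rules; this is the baseline whose function-evaluation cost the MDI-SG algorithm is designed to reduce, not the MDI-SG algorithm itself, and the level/point counts are illustrative choices.
\begin{verbatim}
# Baseline Smolyak sparse-grid quadrature on [0,1]^d via the combination
# technique, with an l-point Gauss-Legendre rule at level l. Not the MDI-SG
# algorithm; it evaluates the integrand point by point.
import itertools
import numpy as np
from math import comb

def gauss_legendre_01(m):
    """m-point Gauss-Legendre rule mapped to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(m)
    return 0.5 * (x + 1.0), 0.5 * w

def sparse_grid_integrate(f, d, q):
    """Smolyak quadrature of level q >= d via the combination technique."""
    total = 0.0
    for levels in itertools.product(range(1, q + 1), repeat=d):
        s = sum(levels)
        if not (max(d, q - d + 1) <= s <= q):
            continue
        coeff = (-1) ** (q - s) * comb(d - 1, q - s)
        rules = [gauss_legendre_01(l) for l in levels]
        nodes = itertools.product(*(r[0] for r in rules))
        weights = itertools.product(*(r[1] for r in rules))
        total += coeff * sum(np.prod(w) * f(np.array(x))
                             for x, w in zip(nodes, weights))
    return total

# Example: integrate exp(sum x_i) over [0,1]^4 and compare with (e - 1)^4.
d = 4
approx = sparse_grid_integrate(lambda x: np.exp(x.sum()), d, q=d + 4)
print(approx, (np.e - 1.0) ** d)
\end{verbatim}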
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, their application to clustering has some limitations. Miller and Harrison (2014) proved posterior inconsistency in the number of clusters when the true number of clusters is finite for Dirichlet process and Pitman--Yor process mixture models. In this work, we extend this result to additional Bayesian nonparametric priors, such as Gibbs-type processes and their finite-dimensional representations. The latter include the Dirichlet multinomial process and the recently proposed Pitman--Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced by Guha et al. (2021) for the Dirichlet process extends to more general models and provides a consistent method to estimate the number of components.
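As a small hands-on illustration of the quantity at stake (the number of clusters reported by a nonparametric mixture fit when the truth has finitely many components), the sketch below fits scikit-learn's variational Dirichlet-process-style Gaussian mixture to data from a 3-component mixture and inspects the occupied clusters and their weights; this variational approximation is only a convenient stand-in, not the models or the post-processing algorithm analyzed in the paper.
\begin{verbatim}
# Fit a truncated Dirichlet-process-style Gaussian mixture (variational
# approximation) to data from a 3-component mixture and inspect how many
# clusters it actually uses. Illustrative stand-in only.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
X = np.concatenate([
    rng.normal(-4.0, 1.0, size=(300, 1)),
    rng.normal(0.0, 1.0, size=(300, 1)),
    rng.normal(4.0, 1.0, size=(300, 1)),
])

dpgmm = BayesianGaussianMixture(
    n_components=20,                                   # truncation level
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print("occupied clusters:", len(np.unique(labels)))
print("largest weights:", np.round(np.sort(dpgmm.weights_)[::-1][:6], 3))
\end{verbatim}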
In this paper, we propose and study a fast multilevel dimension iteration (MDI) algorithm for computing arbitrary $d$-dimensional integrals based on tensor product approximations. It reduces the computational complexity (in terms of CPU time) of a tensor product method from the exponential order $O(N^d)$ to the polynomial order $O(d^3N^2)$ or better, where $N$ stands for the number of quadrature points in each coordinate direction. As a result, the proposed MDI algorithm effectively circumvents the curse of dimensionality of tensor product methods for high-dimensional numerical integration. The main idea of the proposed MDI algorithm is to compute the function evaluations at all integration points in clusters and iteratively along each coordinate direction, so that much of the computation for function evaluations can be reused in each iteration. This idea is also applicable to any quadrature rule whose integration points have a lattice-like structure.
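The following minimal sketch shows the baseline full tensor-product Gauss-Legendre quadrature, which touches all $N^d$ nodes; this exponential cost is what the MDI reordering of the evaluations is designed to avoid, and the MDI algorithm itself is not reproduced here.
\begin{verbatim}
# Baseline d-dimensional tensor-product Gauss-Legendre quadrature on [0,1]^d.
# It evaluates the integrand at all N**d nodes, the exponential cost that the
# MDI algorithm targets; the MDI evaluation reuse is not shown here.
import itertools
import numpy as np

def tensor_product_integrate(f, d, N):
    """Full tensor-product Gauss-Legendre rule with N points per direction."""
    x, w = np.polynomial.legendre.leggauss(N)
    x, w = 0.5 * (x + 1.0), 0.5 * w                    # map [-1, 1] -> [0, 1]
    total = 0.0
    for idx in itertools.product(range(N), repeat=d):  # N**d index tuples
        point = x[list(idx)]
        weight = np.prod(w[list(idx)])
        total += weight * f(point)
    return total

# Example: integrate cos(sum x_i) over [0,1]^5 with 6 points per direction.
val = tensor_product_integrate(lambda p: np.cos(p.sum()), d=5, N=6)
print(val)
\end{verbatim}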
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower-dimensional space and base the classification on the resulting lower-dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide the choice of projection. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on arbitrary projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and remains valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real-data examples.
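A generic version of the two-step procedure described above is easy to sketch: project onto leading principal components and classify in the projected space, with the number of retained PCs chosen by cross-validation. The scikit-learn pipeline below illustrates this generic pipeline on simulated data; it does not reproduce the paper's specific data-driven selection rule or its theory.
\begin{verbatim}
# Generic two-step classifier: PCA projection followed by a linear classifier,
# with the number of retained PCs chosen by cross-validation. Illustrative of
# the general pipeline only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=500, n_informative=10,
                           random_state=0)

pipe = Pipeline([("pca", PCA()), ("lda", LinearDiscriminantAnalysis())])
search = GridSearchCV(pipe, {"pca__n_components": [2, 5, 10, 20, 50]}, cv=5)
search.fit(X, y)
print("retained PCs:", search.best_params_["pca__n_components"])
print("cross-validated accuracy:", round(search.best_score_, 3))
\end{verbatim}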
We propose the Terminating-Random Experiments (T-Rex) selector, a fast variable selection method for high-dimensional data. The T-Rex selector controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables. This is achieved by fusing the solutions of multiple early terminated random experiments. The experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors. A finite-sample proof of the FDR control property, based on martingale theory, is provided. Numerical simulations confirm that the FDR is controlled at the target level while allowing for high power. We prove under mild conditions that the dummies can be sampled from any univariate probability distribution with finite expectation and variance. The computational complexity of the proposed method is linear in the number of variables. The T-Rex selector outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its sequential computation time is more than two orders of magnitude lower than that of the strongest benchmark methods. The open-source R package TRexSelector containing the implementation of the T-Rex selector is available on CRAN.
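As a rough conceptual sketch of one ingredient described above (forward-selection runs on the original predictors augmented with random dummies, truncated once a number of dummies has entered, then fused by a vote), consider the following Python snippet. The termination point T and the 50% vote threshold are arbitrary choices for illustration; the actual T-Rex calibration that delivers the FDR guarantee is not reproduced here, and the TRexSelector R package should be used for the real method.
\begin{verbatim}
# Conceptual sketch only: LARS runs on [X, dummies], truncated after T dummies
# have entered, fused by a simple vote. Not the calibrated T-Rex procedure.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(3)
n, p, K, T = 150, 100, 20, 1
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.ones(5) + rng.standard_normal(n)   # first 5 variables active

votes = np.zeros(p)
for _ in range(K):
    dummies = rng.standard_normal((n, p))             # one set of dummy predictors
    _, order, _ = lars_path(np.hstack([X, dummies]), y, method="lar")
    selected, n_dummies = [], 0
    for j in order:                                    # variables in order of entry
        if j >= p:
            n_dummies += 1
            if n_dummies >= T:                         # truncate the experiment
                break
        else:
            selected.append(j)
    votes[selected] += 1

print("selected variables:", np.where(votes / K > 0.5)[0])
\end{verbatim}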
We prove a new generalization bound showing that, for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ control the test error under all Moreau envelopes of the loss $\ell$. We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021) for linear regression with the square loss, which is known to be tight for minimal-$\ell_2$-norm interpolation, but we also handle more general settings where the label is generated by a potentially misspecified multi-index model. The same argument can analyze noisy interpolation of max-margin classifiers through the squared hinge loss and establishes consistency results in spiked-covariance settings. More generally, when the loss is only assumed to be Lipschitz, our bound effectively improves Talagrand's well-known contraction lemma by a factor of two, and we prove uniform convergence of interpolators (Koehler et al. 2021) for all smooth, non-negative losses. Finally, we show that applying our generalization bound with localized Gaussian width is generally sharp for empirical risk minimizers, establishing a non-asymptotic Moreau envelope theory for generalization that applies outside of proportional scaling regimes, handles model misspecification, and complements existing asymptotic Moreau envelope theories for M-estimation.
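For reference, the Moreau envelope referred to above is the standard infimal-convolution smoothing of the loss in its prediction argument; in the notation chosen here (which need not match the paper's), with smoothing parameter $\lambda > 0$:
\[
  \ell_{\lambda}(y, \hat y)
  \;=\;
  \inf_{u \in \mathbb{R}}
  \Big\{ \ell(y, u) + \tfrac{1}{2\lambda}\,(u - \hat y)^2 \Big\}.
\]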
In this paper, we propose new geometrically unfitted space-time finite element methods of higher-order accuracy in space and time for partial differential equations posed on moving domains. As a model problem, the convection-diffusion problem on a moving domain is studied. For geometrically higher-order accuracy, we apply a parametric mapping on a background space-time tensor-product mesh. Concerning discretisation in time, we consider discontinuous Galerkin as well as related continuous (Petrov-)Galerkin and Galerkin collocation methods. For stabilisation with respect to bad cut configurations, and as an extension mechanism required for the latter two schemes, a ghost penalty stabilisation is employed. The article puts an emphasis on the techniques that allow us to achieve robust, higher-order geometry handling for smooth domains. We investigate the computational properties of the respective methods in a series of numerical experiments. These include studies in different dimensions for different polynomial degrees in space and time, validating the higher-order accuracy in both variables.
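For orientation, a commonly used derivative-jump form of the ghost-penalty stabilisation mentioned above is shown below, with facet set $\mathcal{F}_h^{gp}$ of cut-element facets, polynomial degree $k_s$, and parameter $\gamma_s > 0$; the concrete variant and scalings used in the paper may differ.
\[
  j_h(u, v)
  \;=\;
  \sum_{F \in \mathcal{F}_h^{gp}} \; \sum_{j=1}^{k_s}
  \gamma_s\, h^{2j-1}
  \int_F \big[\!\big[ \partial_n^{\,j} u \big]\!\big]\,
         \big[\!\big[ \partial_n^{\,j} v \big]\!\big] \, \mathrm{d}s .
\]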
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that this error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the performance of the classical and the proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural-network-based models, under simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial when the causal effects are accounted for correctly.
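The following small simulation illustrates the confounding point made above in a generic way: a risk score drives both the credit decision and repayment, so the naive difference in mean repayment between decisions is biased, while a regression adjusted for the score recovers the true effect. This is a generic textbook illustration, not the estimators developed in the paper.
\begin{verbatim}
# Generic confounding illustration: naive difference in means vs. a
# regression-adjusted estimate of the effect of the credit decision on
# repayment. Not the paper's estimator.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 20000
risk = rng.standard_normal(n)                                  # confounder
approve = (risk + rng.standard_normal(n) > 0).astype(float)    # credit decision
repay = 1.0 * approve - 2.0 * risk + rng.standard_normal(n)    # true effect = 1.0

naive = repay[approve == 1].mean() - repay[approve == 0].mean()
adjusted = LinearRegression().fit(np.column_stack([approve, risk]), repay).coef_[0]
print("naive estimate:", round(naive, 2))                      # biased away from 1.0
print("adjusted estimate:", round(adjusted, 2))                # close to 1.0
\end{verbatim}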