We investigate the extreme value theory of a class of random sequences defined by the all-time suprema of aggregated self-similar Gaussian processes with trend. This study is motivated by potential applications in various areas as well as by its theoretical interest. We consider both stationary sequences and non-stationary sequences, obtained according to whether the trend functions are identical or not. We show that a sequence of suitably normalised $k$th order statistics converges in distribution to a limiting random variable, which can be a negative-log-transformed Erlang random variable, a normal random variable, or a mixture of the two, depending on which of three conditions on the model parameters holds. Remarkably, this phenomenon resembles the behaviour of the stationary normal sequence. We also show that various moments of the normalised $k$th order statistics converge to the corresponding moments of the limiting random variable. The obtained results enable us to analyze various properties of these random sequences and reveal the particularities of this class of sequences within extreme value theory.
Often the rows (cases, objects) of a dataset have weights. For instance, the weight of a case may reflect the number of times it has been observed, or its reliability. For analyzing such data many rowwise weighted techniques are available, the best known being the weighted average. But there are also situations where the individual cells (entries) of the data matrix have weights assigned to them. The purpose of this note is to provide an approach to analyzing such data. We define a cellwise weighted likelihood function, which corresponds to a transformation of the dataset that we call unpacking. Using this weighted likelihood one can carry out multivariate statistical methods such as maximum likelihood estimation and likelihood ratio tests. We pay particular attention to the estimation of covariance matrices, because these are the building blocks of much of multivariate statistics. An R implementation of the cellwise maximum likelihood estimator is provided, which employs a version of the EM algorithm. We also propose a faster approximate method that is asymptotically equivalent to it.
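A minimal Python sketch of the simplest special case of this idea, assuming each cell weight is either 0 or 1 so that weight-0 cells can be treated as missing; it uses the standard EM algorithm for the multivariate normal and is not the authors' R implementation (general nonnegative weights and the unpacking step are not handled here, and the function name is hypothetical):

```python
# EM for the Gaussian mean/covariance MLE with binary cell weights:
# weight-0 cells are treated as missing at random.
import numpy as np

def cellwise_em(X, W, n_iter=100):
    """X: n x p data matrix, W: n x p matrix of 0/1 cell weights."""
    n, p = X.shape
    obs = W > 0
    mu = np.nanmean(np.where(obs, X, np.nan), axis=0)
    Sigma = np.diag(np.nanvar(np.where(obs, X, np.nan), axis=0))
    for _ in range(n_iter):
        S = np.zeros((p, p))
        M = np.zeros(p)
        for i in range(n):
            o, m = obs[i], ~obs[i]
            x_hat = np.where(o, X[i], 0.0)
            C = np.zeros((p, p))
            if m.any():
                # E-step: conditional mean and covariance of the missing cells.
                Soo = Sigma[np.ix_(o, o)]
                Smo = Sigma[np.ix_(m, o)]
                B = Smo @ np.linalg.inv(Soo)
                x_hat[m] = mu[m] + B @ (X[i, o] - mu[o])
                C[np.ix_(m, m)] = Sigma[np.ix_(m, m)] - B @ Smo.T
            M += x_hat
            S += np.outer(x_hat, x_hat) + C
        # M-step: update mean and covariance from the completed sufficient statistics.
        mu = M / n
        Sigma = S / n - np.outer(mu, mu)
    return mu, Sigma
```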
This paper studies inference in linear models with a high-dimensional parameter matrix that can be well-approximated by a ``spiked low-rank matrix.'' A spiked low-rank matrix has rank that grows slowly compared to its dimensions and nonzero singular values that diverge to infinity. We show that this framework covers a broad class of latent-variable models, accommodating matrix completion problems, factor models, varying coefficient models, and heterogeneous treatment effects. For inference, we apply a procedure that relies on an initial nuclear-norm penalized estimation step followed by two ordinary least squares regressions. We work in the framework of estimating incoherent eigenvectors and use a rotation argument to show that the eigenspace estimate is asymptotically unbiased. Within this framework we show that our procedure provides asymptotically normal inference and achieves the semiparametric efficiency bound. We illustrate the framework by providing low-level conditions for its application in a treatment effects context where treatment assignment might be strongly dependent.
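A simplified illustration of the two-stage idea, under the assumption of a fully observed matrix-denoising model (noisy observation of a spiked low-rank signal); this is not the paper's exact procedure or debiasing construction, and the function names are hypothetical:

```python
# Stage 1: nuclear-norm penalized estimation via singular-value soft-thresholding,
# which also selects the rank. Stage 2: least-squares refits on the estimated
# row/column singular subspaces, combined into a projection-based low-rank fit.
import numpy as np

def two_stage(Y, lam):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)           # prox of the nuclear norm
    r = int((s_thr > 0).sum())                 # estimated rank
    Uh, Vh = U[:, :r], Vt[:r, :].T             # estimated singular subspaces
    P_row = Uh @ Uh.T                          # projection onto span(Uh)
    P_col = Vh @ Vh.T                          # projection onto span(Vh)
    # Y @ P_col stacks the rowwise OLS fits on Vh; P_row @ Y the columnwise fits on Uh.
    return Y @ P_col + P_row @ Y - P_row @ Y @ P_col
```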
Supplier selection and order allocation (SSOA) are key strategic decisions in supply chain management which greatly impact the performance of the supply chain. Although the SSOA problem has been studied extensively, the limited attention paid to scalability presents a significant gap that prevents the adoption of SSOA algorithms by industrial practitioners. This paper presents a novel multi-item, multi-supplier double order allocation problem with dual-sourcing and penalty constraints across two tiers of a supply chain, which fosters cooperation and allows suppliers to express, through bidding, their preferences for working with other suppliers. We propose Mixed-Integer Programming models for allocation at the individual tiers as well as an integrated allocation. An application to a real-time, large-scale case study of a manufacturing company is presented, which is, in terms of supply chain size and number of variables, the largest-scale case studied in the literature so far. The use case allows us to highlight how problem formulation and implementation can help reduce computational complexity using Mathematical Programming (MP) and Genetic Algorithm (GA) approaches. The results show that, interestingly, MP outperforms GA in solving SSOA. Sensitivity analysis is presented for the sourcing strategy, penalty threshold and penalty factor. The developed model was successfully deployed in a large international sourcing conference with multiple bidding rounds, which helped the manufacturing company achieve procurement cost reductions of more than 10%.
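A toy sketch of a single-tier order allocation MIP with a dual-sourcing constraint, written with the PuLP modelling library; the data, item and supplier names, and constraints below are illustrative only, and the paper's two-tier model with penalty constraints and bidding is far richer:

```python
# Minimise total purchase cost subject to demand, capacity, and dual sourcing
# (each item must be allocated to at least two suppliers).
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

items = ["i1", "i2"]
suppliers = ["s1", "s2", "s3"]
demand = {"i1": 100, "i2": 80}
cost = {("i1", "s1"): 5, ("i1", "s2"): 6, ("i1", "s3"): 7,
        ("i2", "s1"): 4, ("i2", "s2"): 3, ("i2", "s3"): 5}
cap = {("i1", "s1"): 70, ("i1", "s2"): 60, ("i1", "s3"): 80,
       ("i2", "s1"): 50, ("i2", "s2"): 60, ("i2", "s3"): 40}

m = LpProblem("order_allocation", LpMinimize)
q = {(i, s): LpVariable(f"q_{i}_{s}", lowBound=0) for i in items for s in suppliers}
y = {(i, s): LpVariable(f"y_{i}_{s}", cat=LpBinary) for i in items for s in suppliers}

m += lpSum(cost[i, s] * q[i, s] for i in items for s in suppliers)   # total cost
for i in items:
    m += lpSum(q[i, s] for s in suppliers) == demand[i]              # meet demand
    m += lpSum(y[i, s] for s in suppliers) >= 2                      # dual sourcing
    for s in suppliers:
        m += q[i, s] <= cap[i, s] * y[i, s]                          # capacity, link q to y
m.solve()
print({k: v.value() for k, v in q.items()})
```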
With the attention mechanism, transformers achieve significant empirical successes. Despite the intuitive understanding that transformers perform relational inference over long sequences to produce desirable representations, we lack a rigorous theory on how the attention mechanism achieves it. In particular, several intriguing questions remain open: (a) What makes a desirable representation? (b) How does the attention mechanism infer the desirable representation within the forward pass? (c) How does a pretraining procedure learn to infer the desirable representation through the backward pass? We observe that, as is the case in BERT and ViT, input tokens are often exchangeable since they already include positional encodings. The notion of exchangeability induces a latent variable model that is invariant to input sizes, which enables our theoretical analysis.
- To answer (a) on representation, we establish the existence of a sufficient and minimal representation of input tokens. In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks.
- To answer (b) on inference, we prove that attention with the desired parameter infers the latent posterior up to an approximation error, which is decreasing in input sizes. In detail, we quantify how attention approximates the conditional mean of the value given the key, which characterizes how it performs relational inference over long sequences (see the sketch after this list).
- To answer (c) on learning, we prove that both supervised and self-supervised objectives allow empirical risk minimization to learn the desired parameter up to a generalization error, which is independent of input sizes. Particularly, in the self-supervised setting, we identify a condition number that is pivotal to solving downstream tasks.
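Regarding point (b), the correspondence between softmax attention and a kernel-smoothed conditional mean of the values given the keys can be made concrete in a few lines of numpy; the sketch below illustrates that identity only and is not the paper's formal result:

```python
# Scaled dot-product attention coincides with a Nadaraya-Watson style estimate
# of E[value | key = query] under an exponential kernel with bandwidth sqrt(d).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Standard scaled dot-product attention over a sequence of tokens."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def nadaraya_watson(q, K, V, bandwidth):
    """Kernel-regression estimate of the conditional mean of the values given key q."""
    w = np.exp(K @ q / bandwidth)
    return (w[:, None] * V).sum(0) / w.sum()

rng = np.random.default_rng(0)
d = 8
K = rng.normal(size=(32, d)); V = rng.normal(size=(32, 4)); q = rng.normal(size=d)
print(np.allclose(attention(q[None, :], K, V)[0],
                  nadaraya_watson(q, K, V, bandwidth=np.sqrt(d))))  # True
```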
General log-linear models specified by non-negative integer design matrices have a potentially wide range of applications, although the use of models without a genuine overall effect, that is, models which cannot be reparameterized to include a normalizing constant, is still rare. Log-linear models without the overall effect arise naturally in practice and can be handled in a similar manner to models with the overall effect. A novel iterative scaling procedure for MLE computation under such models is proposed, and its convergence is proved. The results are illustrated using data from a recent clinical study.
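For background, a classical generalized-iterative-scaling (MM-type) update for the Poisson log-linear MLE with a non-negative design matrix can be written in a few lines; this is the textbook scheme, not the novel procedure proposed above, and the example data and function name are illustrative:

```python
# Fit mu = exp(A theta) by matching sufficient statistics A^T mu = A^T y,
# using the multiplicative update theta_j += log(t_j / e_j) / C with
# C = max row sum of A (an MM/Jensen bound on the Poisson log-likelihood).
import numpy as np

def gis_poisson(A, y, n_iter=500, tol=1e-10):
    """A: n x p non-negative design matrix, y: observed counts (length n)."""
    A = np.asarray(A, float); y = np.asarray(y, float)
    C = A.sum(axis=1).max()
    t = A.T @ y                          # observed sufficient statistics
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = np.exp(A @ theta)
        e = A.T @ mu                     # fitted sufficient statistics
        step = np.log(t / e) / C
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Example: a 2x2 table with row and column main effects only.
A = np.array([[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]])
y = np.array([10, 20, 30, 60])
print(np.exp(A @ gis_poisson(A, y)))     # fitted cell means matching the margins
```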
In many investigations, the primary outcome of interest is difficult or expensive to collect. Examples include long-term health effects of medical interventions, measurements requiring expensive testing or follow-up, and outcomes only measurable on small panels as in marketing. This reduces effective sample sizes for estimating the average treatment effect (ATE). However, there is often an abundance of observations on surrogate outcomes not of primary interest, such as short-term health effects or online-ad click-through. We study the role of such surrogate observations in the efficient estimation of treatment effects. To quantify their value, we derive the semiparametric efficiency bounds on ATE estimation with and without the presence of surrogates and several intermediary settings. The difference between these characterizes the efficiency gains from optimally leveraging surrogates. We study two regimes: when the number of surrogate observations is comparable to primary-outcome observations and when the former dominates the latter. We take an agnostic missing-data approach circumventing strong surrogate conditions previously assumed. To leverage surrogates' efficiency gains, we develop efficient ATE estimation and inference based on flexible machine-learning estimates of nuisance functions appearing in the influence functions we derive. We empirically demonstrate the gains by studying the long-term earnings effect of job training.
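As a point of reference for the influence-function-based estimation mentioned above, the sketch below shows a standard cross-fitted AIPW estimator of the ATE without surrogates; the paper's estimator additionally leverages surrogate observations, which is not reproduced here, and the nuisance models (logistic and linear regression from scikit-learn) are simple stand-ins for the flexible machine-learning estimates discussed:

```python
# Cross-fitted augmented inverse-propensity weighting (AIPW) for the ATE,
# with a plug-in standard error from the estimated influence function.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import KFold

def aipw_ate(X, T, Y, n_splits=2, seed=0):
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        e = LogisticRegression(max_iter=1000).fit(X[train], T[train])
        m1 = LinearRegression().fit(X[train][T[train] == 1], Y[train][T[train] == 1])
        m0 = LinearRegression().fit(X[train][T[train] == 0], Y[train][T[train] == 0])
        p = np.clip(e.predict_proba(X[test])[:, 1], 1e-3, 1 - 1e-3)
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        psi[test] = (mu1 - mu0
                     + T[test] * (Y[test] - mu1) / p
                     - (1 - T[test]) * (Y[test] - mu0) / (1 - p))
    ate = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return ate, se
```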
We study a 1D geometry of a plasma confined between two conducting floating walls, with applications to laboratory plasmas. These plasmas are characterized by a quasi-neutral bulk that is joined to the wall by a thin, positively charged boundary layer called the sheath. Although analytical solutions are available in the sheath and the pre-sheath, joining the two regions by a single analytical solution is still an open problem, which requires the numerical resolution of the fluid equations coupled to the Poisson equation. Current numerical schemes use high-order discretizations to correctly capture the electron current in the sheath, yet they produce unsatisfactory results in the boundary layer and are not adapted to all the possible collisional regimes. In this work, we identify the main numerical challenges that arise when attempting to simulate such a configuration and we propose explanations for the observed phenomena via numerical analysis. We propose a numerical scheme with controlled diffusion as well as new discrete boundary conditions that address the identified issues.
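For orientation only, the sketch below shows the most elementary building block involved in such simulations, a 1D finite-difference solve of the Poisson equation between two walls held at fixed potentials; it is not the scheme proposed above, and the charge profile is a crude caricature of a sheath:

```python
# Solve -phi'' = rho/eps0 on [0, L] with Dirichlet (wall-potential) boundary conditions
# using the centered second-difference stencil on a uniform grid of interior points.
import numpy as np

def poisson_1d(rho_over_eps0, L, phi_left, phi_right):
    n = len(rho_over_eps0)
    h = L / (n + 1)
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1))
    b = h ** 2 * np.asarray(rho_over_eps0, float)
    b[0] += phi_left
    b[-1] += phi_right
    return np.linalg.solve(A, b)

# Example: net positive charge concentrated near each wall.
x = np.linspace(0, 1, 202)[1:-1]
rho = np.where((x < 0.05) | (x > 0.95), 1.0, 0.0)
phi = poisson_1d(rho, L=1.0, phi_left=0.0, phi_right=0.0)
```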
We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation (TV): the sum of weighted absolute differences of parameters $\theta_i,\theta_j$ (which estimate the function values $f_0(x_i),f_0(x_j)$) at all neighboring cells $i,j$ in the Voronoi diagram. This is seen to be equivalent to a variational optimization problem that regularizes according to the usual continuum (measure-theoretic) notion of TV, once we restrict the domain to functions that are piecewise constant over the Voronoi diagram. The regression estimator under consideration hence performs (shrunken) local averaging over adaptively formed unions of Voronoi cells, and we refer to it as the Voronoigram, following the ideas in Koenker (2005), and drawing inspiration from Tukey's regressogram (Tukey, 1961). Our contributions in this paper span both the conceptual and theoretical frontiers: we discuss some of the unique properties of the Voronoigram in comparison to TV-regularized estimators that use other graph-based discretizations; we derive the asymptotic limit of the Voronoi TV functional; and we prove that the Voronoigram is minimax rate optimal (up to log factors) for estimating BV functions that are essentially bounded.
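A small sketch of the Voronoigram idea in 2D, building the Voronoi adjacency of the design points with scipy and solving a graph-TV-penalized least squares problem with cvxpy; for simplicity the edge weights and the tuning parameter are set to constants here, whereas the estimator described above weighs each neighboring pair by the size of the shared Voronoi face:

```python
# Graph-TV-regularized regression over the Voronoi adjacency of random design points.
import numpy as np
import cvxpy as cp
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 2))                      # random design points in [0,1]^2
f0 = (x[:, 0] + x[:, 1] > 1).astype(float)          # a piecewise-constant BV function
y = f0 + 0.3 * rng.normal(size=len(x))              # noisy observations

edges = Voronoi(x).ridge_points                     # pairs of neighboring Voronoi cells
theta = cp.Variable(len(x))
tv = cp.sum(cp.abs(theta[edges[:, 0]] - theta[edges[:, 1]]))
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - theta) + 0.5 * tv))
prob.solve()
theta_hat = theta.value                             # fitted values at the design points
print(np.mean((theta_hat - f0) ** 2))
```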
Sequential testing, always-valid $p$-values, and confidence sequences promise flexible statistical inference and on-the-fly decision making. However, unlike fixed-$n$ inference based on asymptotic normality, existing sequential tests either make parametric assumptions and end up under-covering/over-rejecting when these fail or use non-parametric but conservative concentration inequalities and end up over-covering/under-rejecting. To circumvent these issues, we sidestep exact at-least-$\alpha$ coverage and focus on asymptotically exact coverage and asymptotic optimality. That is, we seek sequential tests whose probability of ever rejecting a true hypothesis asymptotically approaches $\alpha$ and whose expected time to reject a false hypothesis approaches a lower bound on all tests with asymptotic coverage at least $\alpha$, both under an appropriate asymptotic regime. We permit observations to be both non-parametric and dependent and focus on testing whether the observations form a martingale difference sequence. We propose the universal sequential probability ratio test (uSPRT), a slight modification to the normal-mixture sequential probability ratio test, where we add a burn-in period and adjust thresholds accordingly. We show that even in this very general setting, the uSPRT is asymptotically optimal under mild generic conditions. We apply the results to stabilized estimating equations to test means, treatment effects, etc. Our results also provide corresponding guarantees for the implied confidence sequences. Numerical simulations verify our guarantees and the benefits of the uSPRT over alternatives.
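A sketch of the normal-mixture SPRT statistic (Robbins' mixture martingale) for testing a mean-zero null, with the running sum of squares standing in for the variance process; the uSPRT described above adds a burn-in period and a corresponding threshold adjustment whose exact form is not reproduced here, so the code below simply suppresses rejections during the burn-in:

```python
# Reject H0 the first time the mixture statistic
#   M_t = sqrt(rho / (rho + V_t)) * exp(S_t^2 / (2 (rho + V_t)))
# crosses 1/alpha, where S_t is the running sum and V_t a running variance proxy.
import numpy as np

def usprt_like(xs, alpha=0.05, rho=1.0, burn_in=50):
    """Return the first rejection time, or None if H0 is never rejected."""
    S, V = 0.0, 0.0
    for t, x in enumerate(xs, start=1):
        S += x
        V += x ** 2
        m = np.sqrt(rho / (rho + V)) * np.exp(S ** 2 / (2.0 * (rho + V)))
        if t > burn_in and m >= 1.0 / alpha:
            return t
    return None

rng = np.random.default_rng(1)
print(usprt_like(rng.normal(0.0, 1.0, 5000)))   # under H0: usually None
print(usprt_like(rng.normal(0.2, 1.0, 5000)))   # under a small effect: rejects
```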
For several classes of mathematical models that yield linear systems, the splitting of the matrix into its Hermitian and skew-Hermitian parts is naturally related to properties of the underlying model. This is particularly so for discretizations of dissipative Hamiltonian ODEs, DAEs and port-Hamiltonian systems, where, in addition, the Hermitian part is positive definite or semidefinite. It is then possible to develop short-recurrence optimal Krylov subspace methods in which the Hermitian part is used as a preconditioner. In this paper we develop new right-preconditioned variants of this approach whose crucial new feature is that the systems with the Hermitian part need to be solved only approximately in each iteration, while the short recurrences are kept. This new class of methods is particularly efficient because it allows one, for example, to use a few steps of a multigrid solver or of a (preconditioned) CG method for the Hermitian part in each iteration. We illustrate this with several numerical experiments for large-scale systems.
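An illustrative sketch of the underlying idea using scipy's preconditioned GMRES with an inexact inner solve, rather than the right-preconditioned short-recurrence methods developed in the paper; the test matrix is synthetic and the inner iteration count is arbitrary:

```python
# Split A into Hermitian part H and skew-Hermitian part S, and precondition a
# Krylov solve for A with an approximate (few-step CG) application of H^{-1}.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres, cg

rng = np.random.default_rng(0)
n = 200
H = rng.normal(size=(n, n)); H = H @ H.T + n * np.eye(n)   # Hermitian positive definite part
S = rng.normal(size=(n, n)); S = (S - S.T) / 2             # skew-Hermitian part
A = H + S
b = rng.normal(size=n)

def apply_H_inverse_approx(r):
    # Inexact application of H^{-1}: only a few CG iterations.
    z, _ = cg(H, r, maxiter=10)
    return z

M = LinearOperator((n, n), matvec=apply_H_inverse_approx)
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```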