This paper presents a new method to estimate systematic errors in the maximum-likelihood regression of count data. The method is applicable in particular to X-ray spectra in situations where the Poisson log-likelihood, or the Cash goodness-of-fit statistic, indicates a poor fit that is attributable to overdispersion of the data. Overdispersion in Poisson data is treated as an intrinsic model variance that can be estimated from the best-fit model using the maximum-likelihood $C_{\mathrm{min}}$ statistic. The paper also studies the effects of such systematic errors on the $\Delta C$ likelihood-ratio statistic, which can be used to test for the presence of a nested model component in the regression of Poisson count data. The paper introduces an overdispersed chi-square distribution that results from the convolution of a chi-square distribution, which models the usual $\Delta C$ statistic, with a zero-mean Gaussian that models the overdispersion in the data. This is proposed as the distribution of choice for the $\Delta C$ statistic in the presence of systematic errors. The methods presented in this paper are applied to XMM-Newton data of the quasar 1ES 1553+113 that were used to detect absorption lines from an intervening warm-hot intergalactic medium (WHIM). This case study illustrates how systematic errors can be estimated from the data, and how they affect the detection of a nested component, such as an absorption line, with the $\Delta C$ statistic.
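As an illustration of the proposed null distribution, the following is a minimal sketch, under simplifying assumptions, of how a tail probability for the $\Delta C$ statistic could be computed when the reference distribution is the convolution of a chi-square with a zero-mean Gaussian. The function name, the degrees of freedom, and the systematic-error width (sigma_sys) are illustrative choices, not values taken from the paper.

```python
import numpy as np

def overdispersed_chi2_pvalue(delta_c, dof, sigma_sys, n_samples=1_000_000, seed=0):
    """Monte Carlo tail probability P(X >= delta_c), where X is the sum of a
    chi-square variable with `dof` degrees of freedom (the usual Delta C null
    distribution) and a zero-mean Gaussian of width `sigma_sys` that models the
    estimated overdispersion (systematic error)."""
    rng = np.random.default_rng(seed)
    x = rng.chisquare(dof, n_samples) + rng.normal(0.0, sigma_sys, n_samples)
    return float(np.mean(x >= delta_c))

# Illustrative numbers only: Delta C = 12.0 for a one-parameter nested component.
print(overdispersed_chi2_pvalue(delta_c=12.0, dof=1, sigma_sys=3.0))
```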
Reliable and prompt identification of active users is critical for enabling random access in massive machine-to-machine type networks, which typically operate under stringent access-delay and energy constraints. In this paper, an energy-efficient active user identification protocol is envisioned in which the active users simultaneously transmit On-Off Keying (OOK) modulated preambles while the base station uses non-coherent detection to avoid channel estimation overheads. The minimum number of channel-uses required for active user identification is characterized in the asymptotic regime of the total number of users $\ell$ when the number of active devices $k$ scales as $k = \Theta(1)$, along with an achievability scheme relying on the equivalence of activity detection to a group testing problem. A practical scheme for active user identification based on a belief propagation strategy is also proposed and its performance is compared against the theoretical bounds.
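To illustrate the equivalence to group testing, here is a minimal sketch of a noiseless group-testing decoder for OOK activity detection (not the paper's belief-propagation scheme): each channel-use acts as a test whose non-coherent output is the OR of the active users' signature bits. The signature density and dimensions are arbitrary choices for the example.

```python
import numpy as np

def comp_decoder(signatures, observations):
    """Combinatorial Orthogonal Matching Pursuit (COMP): a user is declared
    inactive if it participates in any test whose observed output is 0.
    signatures: (num_users, num_tests) binary; observations: (num_tests,) binary."""
    inactive = (signatures @ (1 - observations)) > 0
    return np.where(~inactive)[0]

rng = np.random.default_rng(1)
num_users, num_tests, num_active = 200, 40, 3
signatures = rng.binomial(1, 0.2, size=(num_users, num_tests))   # OOK preamble patterns
active = rng.choice(num_users, num_active, replace=False)
observations = (signatures[active].sum(axis=0) > 0).astype(int)  # noiseless OR (energy) channel
print(sorted(active), comp_decoder(signatures, observations))
```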
Quantum entanglement distribution between remote nodes is key to many promising quantum applications. Existing mechanisms have mainly focused on improving throughput and fidelity via entanglement routing or single-node scheduling. This paper considers entanglement scheduling and distribution among many source-destination pairs with different requests over an entire quantum network topology. Two practical scenarios are considered. When requests do not have deadlines, we seek to minimize the average completion time of the communication requests. If deadlines are specified, we seek to maximize the number of requests whose deadlines are met. Inspired by optimal scheduling disciplines in conventional single-queue scenarios, we design a general optimization framework for entanglement scheduling and distribution called ESDI, and develop a probabilistic protocol to implement the optimized solutions in a general buffered quantum network. We develop a discrete-time quantum network simulator for evaluation. Results show the superior performance of ESDI compared to existing solutions.
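The classical single-queue disciplines that motivate the two objectives can be sketched as follows; this is illustrative only, since ESDI itself schedules entanglement distribution over a full network topology, which this toy ordering does not capture.

```python
# Shortest Remaining Processing Time (SRPT) minimizes average completion time in a
# single queue; Earliest Deadline First (EDF) is a natural heuristic when the goal
# is to meet as many deadlines as possible.
def srpt_order(requests):
    # requests: list of (request_id, remaining_entanglement_demand)
    return sorted(requests, key=lambda r: r[1])

def edf_order(requests):
    # requests: list of (request_id, deadline)
    return sorted(requests, key=lambda r: r[1])

print(srpt_order([("A", 5), ("B", 2), ("C", 9)]))
print(edf_order([("A", 40), ("B", 15), ("C", 25)]))
```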
PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate operators between infinite-dimensional function spaces. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction: First, a novel universal approximation result is derived, under minimal assumptions on the underlying operator and the data-generating distribution. Then, two potential obstacles to efficient operator learning with PCA-Net are identified and made precise through lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues, while the second relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived. A suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Furthermore, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and the Navier-Stokes equations.
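For readers unfamiliar with the architecture, a minimal sketch of the PCA-Net idea is given below, under simplifying assumptions (a fixed discretization grid, plain SVD-based PCA, and an arbitrary fully connected network); the class and function names are illustrative, not taken from any reference implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def pca_basis(snapshots, k):
    """snapshots: (n_samples, n_grid) array of discretized functions.
    Returns the empirical mean and the top-k principal directions."""
    mean = snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:k]

def encode(u, mean, basis):
    return (u - mean) @ basis.T        # PCA coefficients of a function

def decode(coeffs, mean, basis):
    return coeffs @ basis + mean       # reconstruct a function from coefficients

class PCANet(nn.Module):
    """Maps input PCA coefficients to output PCA coefficients."""
    def __init__(self, k_in, k_out, width=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k_in, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, k_out),
        )

    def forward(self, coeffs):
        return self.net(coeffs)
```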
In the context of simulation-based methods, multiple challenges arise, two of which are considered in this work. The first challenge is that problems involving time-dependent phenomena with complex domain deformations, potentially even with changes in the domain topology, need to be tackled appropriately. The second challenge arises when computational resources and the time for evaluating the model become critical in so-called many-query scenarios for parametric problems. Such problems occur, for example, in optimization, uncertainty quantification (UQ), or automatic control, where using highly resolved full-order models (FOMs) may become impractical. To address both types of complexity, we present a novel projection-based model order reduction (MOR) approach for deforming domain problems that takes advantage of the time-continuous space-time formulation. We apply it to two examples that are relevant for engineering or biomedical applications and conduct an error and performance analysis. In both cases, we are able to drastically reduce the computational expense of a model evaluation while maintaining an adequate level of accuracy. All in all, this work indicates the effectiveness of the presented MOR approach for deforming domain problems in a time-continuous space-time setting.
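As generic context for projection-based MOR (not the paper's space-time formulation for deforming domains), a minimal proper orthogonal decomposition (POD) sketch is shown below; the energy threshold and the linear full-order operator are illustrative assumptions.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9999):
    """snapshots: (n_dof, n_snapshots) full-order solution samples.
    Returns the leading left singular vectors capturing the given energy fraction."""
    u, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return u[:, :k]

def reduced_operator(A_full, basis):
    """Galerkin projection of a (linear, for illustration) full-order operator."""
    return basis.T @ A_full @ basis
```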
Over the past few years, there has been a significant amount of research focused on the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function. We are given a set of data points with labels $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$, where $n$ denotes the number of data points. Here $F(W(t),x)$ can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r(t), x \rangle)$, where $m$ represents the number of neurons and $w_r(t)$ are the weights at time $t$. As is standard in the literature, the weights $a_r$ are fixed and never changed during training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$ with random Gaussians, such that $w_r(0) \sim \mathcal{N}(0, I_d)$, and initialize each $a_r$ from a random sign distribution for each $r \in [m]$. Using the gradient descent algorithm, we can find weights $W(T)$ such that $\| F(W(T), X) - y \|_2 \leq \epsilon$ holds with probability $1-\delta$, where $\epsilon \in (0,0.1)$ and $m = \Omega(n^{2+o(1)}\log(n/\delta))$. To optimize the over-parameterization bound $m$, we employ several tight analysis techniques from previous studies [Song and Yang, arXiv 2019; Munteanu, Omlor, Song and Woodruff, ICML 2022].
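A minimal sketch of this training setup is shown below; the toy dimensions are far below the stated over-parameterization bound, and the $1/\sqrt{m}$ scaling of $a_r$, the data normalization, and the step size are ad hoc choices for numerical stability rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lr, steps = 20, 5, 2000, 1e-3, 2000

X = rng.normal(size=(n, d)) / np.sqrt(d)            # data points x_1, ..., x_n
y = rng.normal(size=n)                              # labels y_1, ..., y_n
W = rng.normal(size=(d, m))                         # columns w_r(0) ~ N(0, I_d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed random signs (1/sqrt(m) scaling is ad hoc)

for _ in range(steps):
    act = np.exp(X @ W)                             # exp(<w_r(t), x_i>), shape (n, m)
    resid = act @ a - y                             # F(W(t), x_i) - y_i
    grad = X.T @ (resid[:, None] * act * a)         # gradient of 0.5 * ||F(W, X) - y||_2^2 w.r.t. W
    W -= lr * grad                                  # gradient descent step on W only; a stays fixed

print(np.linalg.norm(np.exp(X @ W) @ a - y))        # training residual ||F(W(T), X) - y||_2
```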
Factor models have been widely used in economics and finance. However, the heavy-tailed nature of macroeconomic and financial data is often neglected in the existing literature. To address this issue and achieve robustness, we propose an approach to estimate factor loadings and scores by minimizing the Huber loss function, which is motivated by the equivalence of conventional Principal Component Analysis (PCA) and the constrained least squares method in the factor model. We provide two algorithms that use different penalty forms. The first algorithm, which we refer to as Huber PCA, minimizes the $\ell_2$-norm-type Huber loss and performs PCA on the weighted sample covariance matrix. The second algorithm involves an element-wise type Huber loss minimization, which can be solved by an iterative Huber regression algorithm. Our study examines the theoretical minimizer of the element-wise Huber loss function and demonstrates that it has the same convergence rate as conventional PCA when the idiosyncratic errors have bounded second moments. We also derive their asymptotic distributions under mild conditions. Moreover, we suggest a consistent model selection criterion that relies on rank minimization to estimate the number of factors robustly. We showcase the benefits of Huber PCA through extensive numerical experiments and a real financial portfolio selection example. An R package named ``HDRFA'' has been developed to implement the proposed robust factor analysis.
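As a rough illustration of the element-wise variant (a sketch, not the packaged HDRFA implementation), one could alternate Huber regressions between factor scores and loadings; the alternating scheme and scikit-learn's HuberRegressor defaults are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def huber_factor_model(X, r, n_iter=20, seed=0):
    """X: (T, N) data matrix, r: number of factors. Returns (F, L) with X ~ F @ L.T."""
    T, N = X.shape
    rng = np.random.default_rng(seed)
    L = rng.normal(size=(N, r))          # initial loadings
    F = np.zeros((T, r))                 # factor scores
    for _ in range(n_iter):
        for t in range(T):               # scores given loadings: row-wise Huber regression
            F[t] = HuberRegressor(fit_intercept=False).fit(L, X[t]).coef_
        for i in range(N):               # loadings given scores: column-wise Huber regression
            L[i] = HuberRegressor(fit_intercept=False).fit(F, X[:, i]).coef_
    return F, L
```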
This paper revisits the classical works of Rauch (1963) and Rauch et al. (1965) and develops a novel method for maximum likelihood (ML) smoothing estimation from incomplete information/data of stochastic state-space systems. Score functions and conditional observed information matrices of incomplete data are introduced and their distributional identities are established. Using these identities, the ML smoother $\widehat{x}_{k\vert n}^s =\argmax_{x_k} \log f(x_k,\widehat{x}_{k+1\vert n}^s, y_{0:n}\vert\theta)$, $k\leq n-1$, is presented. The result shows that the ML smoother gives an estimate of the state $x_k$ with higher log-likelihood and smaller standard errors than the ML state estimator $\widehat{x}_k=\argmax_{x_k} \log f(x_k,y_{0:k}\vert\theta)$, with $\widehat{x}_{n\vert n}^s=\widehat{x}_n$. Recursive estimation is given in terms of an EM-gradient-particle algorithm, which extends the work of \cite{Lange} to ML smoothing estimation. The algorithm has an explicit iteration update, which the EM algorithm for smoothing of \cite{Ramadan} lacks. A sequential Monte Carlo method is developed for the evaluation of the score function and observed information matrices. A recursive equation for the covariance matrix of the estimation error is developed to calculate the standard errors. In the case of linear systems, the method shows that the Rauch-Tung-Striebel (RTS) smoother is a fully efficient smoothing state-estimator whose covariance matrix coincides with the Cram\'er-Rao lower bound, the inverse of the expected information matrix. Furthermore, the RTS smoother coincides with the Kalman filter while having a smaller covariance matrix. Numerical studies are performed, confirming the accuracy of the main results.
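For reference, the linear-Gaussian RTS smoother discussed in the last part can be sketched as a backward pass over stored Kalman-filter quantities; this is a generic textbook implementation, not the paper's EM-gradient-particle algorithm, and the variable names are illustrative.

```python
import numpy as np

def rts_smoother(A, xf, Pf, xp, Pp):
    """RTS backward pass for x_{k+1} = A x_k + w_k, y_k = H x_k + v_k.
    xf, Pf: filtered means/covariances from the Kalman filter;
    xp, Pp: one-step predicted means/covariances (xp[k], Pp[k] predict x_k from y_{0:k-1})."""
    n = len(xf)
    xs, Ps = [None] * n, [None] * n
    xs[-1], Ps[-1] = xf[-1], Pf[-1]                          # smoother starts from the last filtered estimate
    for k in range(n - 2, -1, -1):
        G = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])           # smoother gain
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k + 1])          # smoothed state estimate
        Ps[k] = Pf[k] + G @ (Ps[k + 1] - Pp[k + 1]) @ G.T    # smoothed covariance (never larger than Pf[k])
    return xs, Ps
```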
To improve the precision of estimation and the power of hypothesis testing for an unconditional treatment effect in randomized clinical trials with binary outcomes, researchers and regulatory agencies recommend using g-computation as a reliable method of covariate adjustment. However, the practical application of g-computation is hindered by the lack of an explicit robust variance formula that can be used for different unconditional treatment effects of interest. To fill this gap, we provide explicit and robust variance estimators for g-computation estimators and demonstrate through simulations that these variance estimators can be reliably applied in practice.
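A minimal sketch of the g-computation point estimate for a risk difference is shown below; it is illustrative only, the logistic working model is an assumption of the example, and the paper's actual contribution, the explicit robust variance estimators, is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

def g_computation_risk_difference(y, a, X):
    """y: binary outcome, a: treatment indicator, X: (n, p) baseline covariates."""
    n = len(y)
    design = np.column_stack([np.ones(n), a, X])             # intercept, treatment, covariates
    fit = sm.GLM(y, design, family=sm.families.Binomial()).fit()
    d1 = np.column_stack([np.ones(n), np.ones(n), X])        # counterfactual: everyone treated
    d0 = np.column_stack([np.ones(n), np.zeros(n), X])       # counterfactual: everyone control
    return fit.predict(d1).mean() - fit.predict(d0).mean()   # marginal (unconditional) risk difference
```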
Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
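A toy sketch of the kind of summary-level combination involved is given below (inverse-variance weighting of site-level estimates); this is a simplified illustration, not the paper's federated doubly-robust estimators or their adjustments for model misspecification and heterogeneity across data sets.

```python
import numpy as np

def inverse_variance_pool(estimates, variances):
    """Combine site-level point estimates and variances reported as summaries."""
    w = 1.0 / np.asarray(variances)
    est = np.sum(w * np.asarray(estimates)) / np.sum(w)
    var = 1.0 / np.sum(w)
    return est, var

# Example with three sites' (hypothetical) treatment-effect estimates and variances.
print(inverse_variance_pool([0.12, 0.08, 0.15], [0.004, 0.006, 0.010]))
```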
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
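A minimal sketch of the resulting per-class weights follows; normalizing the weights so that they sum to the number of classes is a common convention, assumed here for illustration.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to the inverse effective number of samples,
    E_n = (1 - beta^n) / (1 - beta)."""
    samples_per_class = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(samples_per_class)

# Long-tailed example: rare classes receive larger loss weights.
print(class_balanced_weights([5000, 500, 50, 5]))
```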