We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behaviors of the RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree only depends on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
We propose several algorithms for learning unitary operators from quantum statistical queries (QSQs) with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our methods hinge on a novel technique for estimating the Fourier mass of a unitary on a subset of Pauli strings with a single quantum statistical query, generalizing a previous result for uniform quantum examples. Exploiting this insight, we show that the quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. Moreover, we prove that $\mathcal{O}(\log n)$-juntas and quantum Boolean functions with constant total influence are efficiently learnable in our model, and constant-depth circuits are learnable sample-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks require direct access to the Choi-Jamiolkowski state or oracle access to the unitary. In addition, our upper bounds imply that the actions of those classes of unitaries on locally scrambled ensembles can be efficiently learned. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger sample complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels, adapting to our setting previous arguments for quantum states. Finally, we propose a new definition of average-case surrogate models, showing a potential application of our results to hybrid quantum machine learning.
Two CNF formulas are called ucp-equivalent, if they behave in the same way with respect to the unit clause propagation (UCP). A formula is called ucp-irredundant, if removing any clause leads to a formula which is not ucp-equivalent to the original one. As a consequence of known results, the ratio of the size of a ucp-irredundant formula and the size of a smallest ucp-equivalent formula is at most $n^2$, where $n$ is the number of the variables. We demonstrate an example of a ucp-irredundant formula for a symmetric definite Horn function which is larger than a smallest ucp-equivalent formula by a factor $\Omega(n/\ln n)$ and, hence, a general upper bound on the above ratio cannot be smaller than this.
We report evidence of an undocumented method to manipulate citation counts involving 'sneaked' references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study, focusing on three journals from a given publisher, we identified at least 9% sneaked references (5,978/65,836) mainly benefiting two authors. Despite not existing in the articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered 'lost' references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references listed in the HTML version of the publications. The extent of the sneaked and lost references in the global literature remains unknown and requires further investigations. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.
Recent works in medical image registration have proposed the use of Implicit Neural Representations, demonstrating performance that rivals state-of-the-art learning-based methods. However, these implicit representations need to be optimized for each new image pair, which is a stochastic process that may fail to converge to a global minimum. To improve robustness, we propose a deformable registration method using pairs of cycle-consistent Implicit Neural Representations: each implicit representation is linked to a second implicit representation that estimates the opposite transformation, causing each network to act as a regularizer for its paired opposite. During inference, we generate multiple deformation estimates by numerically inverting the paired backward transformation and evaluating the consensus of the optimized pair. This consensus improves registration accuracy over using a single representation and results in a robust uncertainty metric that can be used for automatic quality control. We evaluate our method with a 4D lung CT dataset. The proposed cycle-consistent optimization method reduces the optimization failure rate from 2.4% to 0.0% compared to the current state-of-the-art. The proposed inference method improves landmark accuracy by 4.5% and the proposed uncertainty metric detects all instances where the registration method fails to converge to a correct solution. We verify the generalizability of these results to other data using a centerline propagation task in abdominal 4D MRI, where our method achieves a 46% improvement in propagation consistency compared with single-INR registration and demonstrates a strong correlation between the proposed uncertainty metric and registration accuracy.
Sewer pipe network systems are an important part of civil infrastructure, and in order to find a good trade-off between maintenance costs and system performance, reliable sewer pipe degradation models are essential. In this paper, we present a large-scale case study in the city of Breda in the Netherlands. Our dataset has information on sewer pipes built since the 1920s and contains information on different covariates. We also have several types of damage, but we focus our attention on infiltrations, surface damage, and cracks. Each damage has an associated severity index ranging from 1 to 5. To account for the characteristics of sewer pipes, we defined 6 cohorts of interest. Two types of discrete-time Markov chains (DTMC), which we called Chain `Multi' and `Single' (where Chain `Multi' contains additional transitions compared to Chain `Single'), are commonly used to model sewer pipe degradation at the pipeline level, and we want to evaluate which suits better our case study. To calibrate the DTMCs, we define an optimization process using Sequential Least-Squares Programming to find the DTMC parameter that best minimizes the root mean weighted square error. Our results show that for our case study, there is no substantial difference between Chain `Multi' and `Single', but the latter has fewer parameters and can be easily trained. Our DTMCs are useful to compare the cohorts via the expected values, e.g., concrete pipes carrying mixed and waste content reach severe levels of surface damage more quickly compared to concrete pipes carrying rainwater, which is a phenomenon typically identified in practice.
Sparse linear regression methods for high-dimensional data often assume that residuals have constant variance. When this assumption is violated, it can lead to bias in estimated coefficients, prediction intervals (PI) with improper length, and increased type I errors. We propose a heteroscedastic high-dimensional linear regression model through a partitioned empirical Bayes Expectation Conditional Maximization (H-PROBE) algorithm. H-PROBE is a computationally efficient maximum a posteriori estimation approach based on a Parameter-Expanded Expectation-Conditional-Maximization algorithm. It requires minimal prior assumptions on the regression parameters through plug-in empirical Bayes estimates of hyperparameters. The variance model uses a multivariate log-Gamma prior on coefficients that can incorporate covariates hypothesized to impact heterogeneity. The motivation of our approach is a study relating Aphasia Quotient (AQ) to high-resolution T2 neuroimages of brain damage in stroke patients. AQ is a vital measure of language impairment and informs treatment decisions, but it is challenging to measure and subject to heteroscedastic errors. It is, therefore, of clinical importance -- and the goal of this paper -- to use high-dimensional neuroimages to predict and provide PIs for AQ that accurately reflect the heterogeneity in residual variance. Our analysis demonstrates that H-PROBE can use markers of heterogeneity to provide narrower PI widths than standard methods without sacrificing coverage. Through extensive simulation studies, we exhibit that H-PROBE results in superior prediction, variable selection, and predictive inference than competing methods.
This paper concerns about the limiting distributions of change point estimators, in a high-dimensional linear regression time series context, where a regression object $(y_t, X_t) \in \mathbb{R} \times \mathbb{R}^p$ is observed at every time point $t \in \{1, \ldots, n\}$. At unknown time points, called change points, the regression coefficients change, with the jump sizes measured in $\ell_2$-norm. We provide limiting distributions of the change point estimators in the regimes where the minimal jump size vanishes and where it remains a constant. We allow for both the covariate and noise sequences to be temporally dependent, in the functional dependence framework, which is the first time seen in the change point inference literature. We show that a block-type long-run variance estimator is consistent under the functional dependence, which facilitates the practical implementation of our derived limiting distributions. We also present a few important byproducts of our analysis, which are of their own interest. These include a novel variant of the dynamic programming algorithm to boost the computational efficiency, consistent change point localisation rates under temporal dependence and a new Bernstein inequality for data possessing functional dependence. Extensive numerical results are provided to support our theoretical results. The proposed methods are implemented in the R package \texttt{changepoints} \citep{changepoints_R}.
We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach is to use the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities using heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4.8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.
Reservoir Computing (RC) is a type of recursive neural network (RNN), and there can be no doubt that the RC will be more and more widely used for building future prediction models for time-series data, with low training cost, high speed and high computational power. However, research into the mathematical structure of RC neural networks has only recently begun. Bollt (2021) clarified the necessity of the autoregressive (AR) model for gaining the insight into the mathematical structure of RC neural networks, and indicated that the Wold decomposition theorem is the milestone for understanding of these. Keeping this celebrated result in mind, in this paper, we clarify hidden structures of input and recurrent weight matrices in RC neural networks, and show that such structures attain perfect prediction for the AR type of time series data.
In order to cope with the task allocation in national economic mobilization, a task allocation planning method based on Hierarchical Task Network (HTN) for national economic mobilization is proposed. An HTN planning algorithm is designed to solve and optimize task allocation, and a method is explored to deal with the resource shortage. Finally, based on a real task allocation case in national economic mobilization, an experimental study verifies the effectiveness of the proposed method.