In this paper, we prove that functional sliced inverse regression (FSIR) achieves the optimal (minimax) rate for estimating the central space in functional sufficient dimension reduction problems. First, we provide a concentration inequality for the FSIR estimator of the covariance of the conditional mean, i.e., $\var(\E[\boldsymbol{X}\mid Y])$. Based on this inequality, we establish the root-$n$ consistency of the FSIR estimator of the image of $\var(\E[\boldsymbol{X}\mid Y])$. Second, we apply the most widely used truncated scheme to estimate the inverse of the covariance operator and identify the truncation parameter which ensures that FSIR can achieve the optimal minimax convergence rate for estimating the central space. Finally, we conduct simulations to demonstrate the optimal choice of truncation parameter and the estimation efficiency of FSIR. To the best of our knowledge, this is the first paper to rigorously prove the minimax optimality of FSIR in estimating the central space for multiple-index models and general $Y$ (not necessarily discrete).
The quantum alternating operator ansatz (QAOA) is a heuristic hybrid quantum-classical algorithm for finding high-quality approximate solutions to combinatorial optimization problems, such as Maximum Satisfiability. While QAOA is well-studied, theoretical results as to its runtime or approximation ratio guarantees are still relatively sparse. We provide some of the first lower bounds for the number of rounds (the dominant component of QAOA runtimes) required for QAOA. For our main result, (i) we leverage a connection between quantum annealing times and the angles of QAOA to derive a lower bound on the number of rounds of QAOA with respect to the guaranteed approximation ratio. We apply and calculate this bound with Grover-style mixing unitaries and (ii) show that this type of QAOA requires at least a polynomial number of rounds to guarantee any constant approximation ratios for most problems. We also (iii) show that the bound depends only on the statistical values of the objective functions, and when the problem can be modeled as a $k$-local Hamiltonian, can be easily estimated from the coefficients of the Hamiltonians. For the conventional transverse field mixer, (iv) our framework gives a trivial lower bound to all bounded occurrence local cost problems and all strictly $k$-local cost Hamiltonians matching known results that constant approximation ratio is obtainable with constant round QAOA for a few optimization problems from these classes. Using our novel proof framework, (v) we recover the Grover lower bound for unstructured search and -- with small modification -- show that our bound applies to any QAOA-style search protocol that starts in the ground state of the mixing unitaries.
Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of Fake News produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification. In this paper, instead, we address the generation of justifications (textual explanation of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines. In particular, we focus on summarization approaches over unstructured knowledge (i.e. news articles) and we experiment with several extractive and abstractive strategies. We employed two datasets with different styles and structures, in order to assess the generalizability of our findings. Results show that in justification production summarization benefits from the claim information, and, in particular, that a claim-driven extractive step improves abstractive summarization performances. Finally, we show that although cross-dataset experiments suffer from performance degradation, a unique model trained on a combination of the two datasets is able to retain style information in an efficient manner.
The computation of approximating e^tA B, where A is a large sparse matrix and B is a rectangular matrix, serves as a crucial element in numerous scientific and engineering calculations. A powerful way to consider this problem is to use Krylov subspace methods. The purpose of this work is to approximate the matrix exponential and some Cauchy-Stieltjes functions on a block vectors B of R^n*p using a rational block Lanczos algorithm. We also derive some error estimates and error bound for the convergence of the rational approximation and finally numerical results attest to the computational efficiency of the proposed method.
It is shown that the psychometric test reliability, based on any true-score model with randomly sampled items and conditionally independent errors, converges to 1 as the test length goes to infinity, assuming some fairly general regularity conditions. The asymptotic rate of convergence is given by the Spearman-Brown formula, and for this it is not needed that the items are parallel, or latent unidimensional, or even finite dimensional. Simulations with the 2-parameter logistic item response theory model reveal that there can be a positive bias in the reliability of short multidimensional tests, meaning that applying the Spearman-Brown formula in these cases would lead to overprediction of the reliability that will result from lengthening the tests. For short unidimensional tests under the 2-parameter logistic model the reliabilities are almost unbiased, meaning that application of the Spearman-Brown formula in these cases leads to predictions that are approximately unbiased.
We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable for multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate up to a logarithmic factor for estimating derivatives of functions in H\"older and Sobolev classes. Interestingly, the proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.
We prove the first unconditional consistency result for superpolynomial circuit lower bounds with a relatively strong theory of bounded arithmetic. Namely, we show that the theory V$^0_2$ is consistent with the conjecture that NEXP $\not\subseteq$ P/poly, i.e., some problem that is solvable in non-deterministic exponential time does not have polynomial size circuits. We suggest this is the best currently available evidence for the truth of the conjecture. The same techniques establish the same results with NEXP replaced by the class of problems that are decidable in non-deterministic barely superpolynomial time such as NTIME$(n^{O(\log\log\log n)})$. Additionally, we establish a magnification result on the hardness of proving circuit lower bounds.
We provide methods which recover planar scene geometry by utilizing the transient histograms captured by a class of close-range time-of-flight (ToF) distance sensor. A transient histogram is a one dimensional temporal waveform which encodes the arrival time of photons incident on the ToF sensor. Typically, a sensor processes the transient histogram using a proprietary algorithm to produce distance estimates, which are commonly used in several robotics applications. Our methods utilize the transient histogram directly to enable recovery of planar geometry more accurately than is possible using only proprietary distance estimates, and consistent recovery of the albedo of the planar surface, which is not possible with proprietary distance estimates alone. This is accomplished via a differentiable rendering pipeline, which simulates the transient imaging process, allowing direct optimization of scene geometry to match observations. To validate our methods, we capture 3,800 measurements of eight planar surfaces from a wide range of viewpoints, and show that our method outperforms the proprietary-distance-estimate baseline by an order of magnitude in most scenarios. We demonstrate a simple robotics application which uses our method to sense the distance to and slope of a planar surface from a sensor mounted on the end effector of a robot arm.
Kuiper's statistic is a good measure for the difference of ideal distribution and empirical distribution in the goodness-of-fit test. However, it is a challenging problem to solve the critical value and upper tail quantile, or simply Kuiper pair, of Kuiper's statistics due to the difficulties of solving the nonlinear equation and reasonable approximation of infinite series. The pioneering work by Kuiper just provided the key ideas and few numerical tables created from the upper tail probability $\alpha$ and sample capacity $n$, which limited its propagation and possible applications in various fields since there are infinite configurations for the parameters $\alpha$ and $n$. In this work, the contributions lie in two perspectives: firstly, the second order approximation for the infinite series of the cumulative distribution of the critical value is used to achieve higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented with details. The algorithms are verified and validated by comparing with the table provided by Kuiper. The methods and algorithms proposed are enlightening and worthy of introducing to the college students, computer programmers, engineers, experimental psychologists and so on.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.