SCoTLASS is the first sparse principal component analysis (SPCA) model that imposes extra l1 norm constraints on the measured variables to obtain sparse loadings. Due to the difficulty of finding projections onto the intersection of an l1 ball/sphere and an l2 ball/sphere, early approaches to solving the SCoTLASS problems focused on penalty function methods or conditional gradient methods. In this paper, we re-examine the SCoTLASS problems, denoted by SPCA-P1, SPCA-P2, or SPCA-P3 when the constraint set is the intersection of an l1 ball and an l2 ball, an l1 sphere and an l2 sphere, or an l1 ball and an l2 sphere, respectively. We prove that the solutions to SPCA-P1 and SPCA-P3 are equivalent, and that the solutions to SPCA-P2 and SPCA-P3 are the same in most cases. Then, by employing the projection method onto the intersection of an l1 ball/sphere and an l2 ball/sphere, we design a gradient projection method (GPSPCA for short) and an approximate Newton algorithm (ANSPCA for short) for the SPCA-P1, SPCA-P2, and SPCA-P3 problems, and prove the global convergence of the proposed GPSPCA and ANSPCA algorithms. Finally, we conduct several numerical experiments in the MATLAB environment to evaluate the performance of the proposed GPSPCA and ANSPCA algorithms. Simulation results confirm that the solutions to SPCA-P1 and SPCA-P3 are the same and that the solutions to SPCA-P2 and SPCA-P3 are the same in most cases, and show that ANSPCA is faster than GPSPCA for large-scale data. Furthermore, GPSPCA and ANSPCA perform well overall compared with typical SPCA methods (the l0-constrained GPBB algorithm, the l1-constrained BCD-SPCAl1 algorithm, and the l1-penalized ConGradU and Gpowerl1 algorithms) and can be used for large-scale computation.
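For orientation, the three constraint sets named above can be written out as follows. This is only a sketch under stated assumptions: the symbols $\Sigma$ (sample covariance matrix), $x$ (loading vector), and $t$ (l1 radius), the unit l2 radius, and the variance-maximization form of the objective are notation chosen here for illustration and may differ from the paper's.

```latex
\begin{align*}
\text{SPCA-P1:}\quad & \max_{x}\; x^{\top}\Sigma x \quad \text{s.t. } \|x\|_1 \le t,\ \|x\|_2 \le 1,\\
\text{SPCA-P2:}\quad & \max_{x}\; x^{\top}\Sigma x \quad \text{s.t. } \|x\|_1 = t,\ \|x\|_2 = 1,\\
\text{SPCA-P3:}\quad & \max_{x}\; x^{\top}\Sigma x \quad \text{s.t. } \|x\|_1 \le t,\ \|x\|_2 = 1.
\end{align*}
```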
We study various aspects of the first-order transduction quasi-order, which provides a way of measuring the relative complexity of classes of structures based on whether one can encode the other using a formula of first-order (FO) logic. In contrast with the conjectured simplicity of the transduction quasi-order for monadic second-order logic, the FO-transduction quasi-order is very complex; in particular, we prove that the quotient partial order is not a lattice, although it is a bounded distributive join-semilattice, as is the subposet of additive classes. Many standard properties from structural graph theory and model theory naturally appear in this quasi-order. For example, we characterize transductions of paths, cubic graphs, and cubic trees in terms of bandwidth, bounded degree, and treewidth. We establish that the classes of all graphs with pathwidth at most~$k$, for $k\geq 1$, form a strict hierarchy in the FO-transduction quasi-order and leave open whether the same is true for treewidth. This leads to considering whether properties admit maximum or minimum classes in this quasi-order. We prove that many properties do not admit a maximum class, and that star forests are the minimum class that is not a transduction of a class with bounded degree, which can be seen as an instance of transduction duality. We close with a notion of dense analogues of sparse classes, and discuss several related conjectures. As a ubiquitous tool in our results, we prove a normal form for FO-transductions that manifests the locality of FO logic. This is among several other technical results about FO-transductions which we anticipate being broadly useful.
Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex constrained optimization that sequentially minimizes majorizing surrogates of the objective function in each block coordinate while the other coordinates are held fixed. BMM entails a large class of optimization algorithms such as block coordinate descent and its proximal-point variant, expectation-minimization, and block projected gradient descent. We establish that for general constrained nonconvex optimization, BMM with strongly convex surrogates can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations and asymptotically converges to the set of stationary points. Furthermore, we propose a trust-region variant of BMM that can handle surrogates that are only convex and still obtain the same iteration complexity and asymptotic stationarity. These results hold robustly even when the convex sub-problems are inexactly solved as long as the optimality gaps are summable. As an application, we show that a regularized version of the celebrated multiplicative update algorithm for nonnegative matrix factorization by Lee and Seung has iteration complexity of $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$. The same result holds for a wide class of regularized nonnegative tensor decomposition algorithms as well as the classical block projected gradient descent algorithm. These theoretical results are validated through various numerical experiments.
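As a rough illustration of the block structure described above, the following sketch implements the classical (unregularized) Lee and Seung multiplicative updates for nonnegative matrix factorization, alternating between the two factor blocks. It is not the regularized variant analyzed in the paper; the function name `nmf_multiplicative`, the small constant `eps` in the denominators, and the random initialization are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Classical multiplicative updates for min 0.5 * ||X - W H||_F^2, W, H >= 0.

    Hypothetical helper for illustration; `eps` only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + 0.1  # strictly positive initial factors
    H = rng.random((rank, n)) + 0.1
    losses = []
    for _ in range(n_iter):
        # Block 1: update H with W held fixed (minimizes a majorizing surrogate in H).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Block 2: update W with H held fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(0.5 * np.linalg.norm(X - W @ H) ** 2)
    return W, H, losses

# Usage on a small random nonnegative matrix.
X_demo = np.random.default_rng(1).random((20, 15))
W_demo, H_demo, losses_demo = nmf_multiplicative(X_demo, rank=3, n_iter=50)
```

Each multiplicative step keeps the factors nonnegative without an explicit projection, which is one reason this update is a convenient example of a block surrogate-minimization scheme.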
This paper aims to reconstruct the initial condition of a hyperbolic equation with an unknown damping coefficient. Our approach involves approximating the hyperbolic equation's solution by its truncated Fourier expansion in the time domain, using a polynomial-exponential basis. This truncation eliminates the time variable, thereby yielding a system of quasi-linear elliptic equations. To globally solve the system without needing an accurate initial guess, we employ the Carleman contraction principle. We provide several numerical examples to illustrate the efficacy of our method. The method not only delivers precise solutions but also showcases remarkable computational efficiency.
In many scientific applications the aim is to infer a function which is smooth in some areas, but rough or even discontinuous in other areas of its domain. Such spatially inhomogeneous functions can be modelled in Besov spaces with suitable integrability parameters. In this work we study adaptive Bayesian inference over Besov spaces, in the white noise model from the point of view of rates of contraction, using $p$-exponential priors, which range between Laplace and Gaussian and possess regularity and scaling hyper-parameters. To achieve adaptation, we employ empirical and hierarchical Bayes approaches for tuning these hyper-parameters. Our results show that, while it is known that Gaussian priors can attain the minimax rate only in Besov spaces of spatially homogeneous functions, Laplace priors attain the minimax or nearly the minimax rate in both Besov spaces of spatially homogeneous functions and Besov spaces permitting spatial inhomogeneities.
The design of automatic speech pronunciation assessment can be categorized into closed and open response scenarios, each with strengths and limitations. A system with the ability to function in both scenarios can cater to diverse learning needs and provide a more precise and holistic assessment of pronunciation skills. In this study, we propose a Multi-task Pronunciation Assessment model called MultiPA. MultiPA provides an alternative to Kaldi-based systems in that it has simpler format requirements and better compatibility with other neural network models. Compared with previous open response systems, MultiPA provides a wider range of evaluations, encompassing assessments at both the sentence and word levels. Our experimental results show that MultiPA achieves comparable performance when working in closed response scenarios and maintains more robust performance when directly used for open responses.
We study the power of randomness in the Number-on-Forehead (NOF) model in communication complexity. We construct an explicit 3-player function $f:[N]^3 \to \{0,1\}$, such that: (i) there exists a randomized NOF protocol computing it that sends a constant number of bits; but (ii) any deterministic or nondeterministic NOF protocol computing it requires sending about $(\log N)^{1/3}$ many bits. This exponentially improves upon the previously best-known such separation. At the core of our proof is an extension of a recent result of the first and third authors on sets of integers without 3-term arithmetic progressions into a non-arithmetic setting.
Accurately estimating parameters in complex nonlinear systems is crucial across scientific and engineering fields. We present a novel approach for parameter estimation using a neural network with the Huber loss function. This method taps into deep learning's abilities to uncover parameters governing intricate behaviors in nonlinear equations. We validate our approach using synthetic data and predefined functions that model system dynamics. Trained on noisy time series data, the neural network minimizes the Huber loss to converge to accurate parameters. We apply our method to damped oscillators, Van der Pol oscillators, Lotka-Volterra systems, and Lorenz systems under multiplicative noise. The trained neural network accurately estimates parameters, as is evident from the closely matching latent dynamics. Comparing true and estimated trajectories visually reinforces our method's precision and robustness. Our study underscores the Huber loss-guided neural network as a versatile tool for parameter estimation, effectively uncovering complex relationships in nonlinear systems. The method navigates noise and uncertainty adeptly, showcasing its adaptability to real-world challenges.
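To make the role of the Huber loss concrete, here is a minimal sketch that recovers the damping rate of a damped oscillator from data with multiplicative noise. It deliberately replaces the neural network with a plain grid search, purely to illustrate the loss itself; the model form $e^{-\gamma t}\cos(\omega t)$, the parameter values, the noise level, and all function names are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quadratic, linear)))

def damped_oscillator(t, gamma, omega=2.0):
    """Assumed toy model: x(t) = exp(-gamma * t) * cos(omega * t)."""
    return np.exp(-gamma * t) * np.cos(omega * t)

def estimate_gamma(t, observed, grid, delta=0.1):
    """Grid search (a stand-in for network training) over the damping rate."""
    losses = [huber_loss(observed - damped_oscillator(t, g), delta) for g in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic data with 5% multiplicative Gaussian noise (assumed noise model).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
true_gamma = 0.3
observed = damped_oscillator(t, true_gamma) * (1.0 + 0.05 * rng.standard_normal(t.size))
grid = np.linspace(0.05, 1.0, 96)  # candidate damping rates, spacing 0.01
gamma_hat = estimate_gamma(t, observed, grid)
```

Because the loss grows only linearly for residuals larger than `delta`, occasional large deviations pull the estimate less than they would under a squared-error loss.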
We consider the solution of large stiff systems of ordinary differential equations with explicit exponential Runge--Kutta integrators. These problems arise from semi-discretized semi-linear parabolic partial differential equations on continuous domains or on inherently discrete graph domains. A series of results reduces the requirement of computing linear combinations of $\varphi$-functions in exponential integrators to the approximation of the action of a smaller number of matrix exponentials on certain vectors. State-of-the-art computational methods use polynomial Krylov subspaces of adaptive size for this task. They have the drawback that the number of Krylov subspace iterations required to reach a desired tolerance increases drastically with the spectral radius of the discrete linear differential operator, e.g., with the problem size. We present an approach that leverages rational Krylov subspace methods, which promise superior approximation qualities. We prove a novel a-posteriori error estimate of rational Krylov approximations to the action of the matrix exponential on vectors for single time points, which allows for an adaptive approach similar to existing polynomial Krylov techniques. We discuss pole selection and the efficient solution of the arising sequences of shifted linear systems by direct and preconditioned iterative solvers. Numerical experiments show that our method outperforms the state of the art for sufficiently large spectral radii of the discrete linear differential operators. The key to this is the approximately constant number of rational Krylov iterations, which enables a near-linear scaling of the runtime with respect to the problem size.
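The polynomial Krylov baseline mentioned above can be sketched as follows: the action $\exp(A)v$ is approximated by $\beta V_m \exp(H_m) e_1$, where $V_m$ and $H_m$ come from an Arnoldi process. This is a fixed-size illustration for a symmetric matrix only, not the adaptive or rational method of the paper; the function names, the 1D Laplacian test matrix, and the subspace sizes are assumptions for this example.

```python
import numpy as np

def expm_sym(T):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(T)
    return (Q * np.exp(w)) @ Q.T

def krylov_expv(A, v, m):
    """Approximate exp(A) @ v in an m-dimensional polynomial Krylov subspace.

    Assumes A is symmetric, so the projected matrix can be symmetrized safely.
    """
    n = v.size
    beta = np.linalg.norm(v)
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        if j + 1 < m:
            h = np.linalg.norm(w)
            if h < 1e-14:  # breakdown: the subspace is invariant, so truncate
                V, H = V[:, : j + 1], H[: j + 1, : j + 1]
                break
            H[j + 1, j] = h
            V[:, j + 1] = w / h
    T = 0.5 * (H + H.T)  # remove rounding asymmetry before using eigh
    return beta * (V @ expm_sym(T)[:, 0])

# Usage: 1D discrete Laplacian (symmetric, eigenvalues in (-4, 0)).
n = 50
A = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
v = np.random.default_rng(0).standard_normal(n)
exact = expm_sym(A) @ v
err_small = np.linalg.norm(krylov_expv(A, v, 5) - exact)
err_large = np.linalg.norm(krylov_expv(A, v, 20) - exact)
```

Scaling `A` by a larger factor (a larger spectral radius) makes the subspace size needed for a given accuracy grow, which is the drawback of the polynomial approach that the abstract points to.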
Many economic panel and dynamic models, such as rational behavior and Euler equations, imply that the parameters of interest are identified by conditional moment restrictions with high dimensional conditioning instruments. We develop a novel inference method for the parameters identified by conditional moment restrictions, where the dimension of the conditioning instruments is high and there is no prior information about which conditioning instruments are weak or irrelevant. Building on Bierens (1990), we propose penalized maximum statistics and combine bootstrap inference with model selection. Our method optimizes the asymptotic power against a set of $n^{-1/2}$-local alternatives of interest by solving a data-dependent max-min problem for tuning parameter selection. We demonstrate the efficacy of our method by two empirical examples: the elasticity of intertemporal substitution and rational unbiased reporting of ability status. Extensive Monte Carlo experiments based on the first empirical example show that our inference procedure is superior to those available in the literature in realistic settings.
In recent decades, a growing number of discoveries in fields of mathematics have been assisted by computer algorithms, primarily for exploring large parameter spaces that humans would take too long to investigate. As computers and algorithms become more powerful, an intriguing possibility arises: the interplay between human intuition and computer algorithms can lead to discoveries of novel mathematical concepts that would otherwise remain elusive. To realize this perspective, we have developed a massively parallel computer algorithm that discovers an unprecedented number of continued fraction formulas for fundamental mathematical constants. The sheer number of formulas discovered by the algorithm unveils a novel mathematical structure that we call the conservative matrix field. Such matrix fields (1) unify thousands of existing formulas, (2) generate infinitely many new formulas, and most importantly, (3) lead to unexpected relations between different mathematical constants, including multiple integer values of the Riemann zeta function. Conservative matrix fields also enable new mathematical proofs of irrationality. In particular, we can use them to generalize the celebrated proof by Ap\'ery for the irrationality of $\zeta(3)$. Utilizing thousands of personal computers worldwide, our computer-supported research strategy demonstrates the power of experimental mathematics, highlighting the prospects of large-scale computational approaches to tackle longstanding open problems and discover unexpected connections across diverse fields of science.
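As a small illustration of what a continued fraction formula for a constant looks like numerically, the sketch below evaluates a continued fraction with polynomial partial numerators and denominators. The formula used, $4/\pi = 1 + \cfrac{1^2}{3 + \cfrac{2^2}{5 + \cfrac{3^2}{7 + \cdots}}}$, is a classical identity obtained from the continued fraction of $\arctan$; it is not one of the formulas discovered by the algorithm described above, and the function names are assumptions for this example.

```python
import math

def eval_continued_fraction(a, b, depth):
    """Evaluate a(0) + b(1)/(a(1) + b(2)/(a(2) + ...)), truncated at `depth`."""
    value = float(a(depth))
    # Work from the innermost term outwards.
    for n in range(depth, 0, -1):
        value = a(n - 1) + b(n) / value
    return value

def pi_from_continued_fraction(depth=30):
    """Approximate pi using a(n) = 2n + 1 and b(n) = n^2 (classical formula)."""
    four_over_pi = eval_continued_fraction(lambda n: 2 * n + 1, lambda n: n * n, depth)
    return 4.0 / four_over_pi

approx_pi = pi_from_continued_fraction(30)
```

A search over many such polynomial pairs `a`, `b`, comparing the truncated value against known constants, is the kind of large parameter space exploration the abstract refers to, though the actual algorithm is not specified here.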