This paper proposes a new multilinear projection method for dimension reduction in modeling high-dimensional matrix-variate time series. It assumes that a $p_1\times p_2$ matrix-variate time series consists of a dynamically dependent, lower-dimensional matrix-variate factor process and a $p_1\times p_2$ matrix white noise series. The covariance matrix of the vectorized white noise is assumed to have a Kronecker structure such that the row and column covariances of the noise all have diverging/spiked eigenvalues, to accommodate the low signal-to-noise ratios often encountered in applications such as finance and economics. We use an iterative projection procedure to reduce the dimensions and noise effects in estimating the front and back loading matrices and to obtain faster convergence rates than those of the traditional methods available in the literature. Furthermore, we introduce a two-way projected Principal Component Analysis to mitigate the diverging noise effects, and implement a high-dimensional white-noise testing procedure to estimate the dimension of the factor matrix. Asymptotic properties of the proposed method are established as the dimensions and sample size go to infinity. Simulated and real examples are used to assess the performance of the proposed method. We also compare the proposed method with some existing ones in the literature in terms of the forecasting ability of the identified factors and find that the proposed approach fares well in out-of-sample forecasting.
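As a reading aid, the model described above can be written in one common notation for matrix factor models. The symbols $Y_t$, $F_t$, $L$, $R$, $E_t$, $\Sigma_1$, $\Sigma_2$, $r_1$, and $r_2$ are assumptions introduced here for illustration and may differ from the paper's own notation:

$$
Y_t = L\,F_t\,R^{\top} + E_t, \qquad \mathrm{Cov}\big(\mathrm{vec}(E_t)\big) = \Sigma_2 \otimes \Sigma_1,
$$

where $Y_t \in \mathbb{R}^{p_1\times p_2}$ is the observed series, $F_t \in \mathbb{R}^{r_1\times r_2}$ is the lower-dimensional factor process, $L \in \mathbb{R}^{p_1\times r_1}$ and $R \in \mathbb{R}^{p_2\times r_2}$ are the front and back loading matrices, and $\Sigma_1$ and $\Sigma_2$ are the row and column covariances of the white noise $E_t$.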
Due to the limited availability of data, existing few-shot learning methods trained from scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models such as CLIP demonstrate remarkable few-shot and zero-shot capabilities. To enhance the performance of pre-trained models on downstream tasks, fine-tuning the model on downstream data is frequently necessary. However, fine-tuning the pre-trained model decreases its generalizability in the presence of distribution shift, while the limited number of samples in few-shot learning makes the model highly susceptible to overfitting. Consequently, existing fine-tuning methods for few-shot learning primarily focus on fine-tuning the model's classification head or introducing additional structure. In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align). Our method aims to bolster the model's generalizability by preserving the consistency of spurious features across the fine-tuning process. Extensive experimental results validate the efficacy of our approach for both in-distribution (ID) and out-of-distribution (OOD) tasks. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements. Our code can be found at //github.com/skingorz/FD-Align.
This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations that arise from applying variable elimination to the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the cost of computing the trajectory derivatives scales linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products, and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on both a synthetic benchmark and four challenging learning-from-demonstration benchmarks, including a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing.
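For orientation, the generic form of the implicit function theorem that underlies this kind of differentiation can be stated as follows. The symbols $z$, $\theta$, and $G$ are assumptions introduced here: $z$ stacks the trajectory and the Lagrange multipliers, $\theta$ denotes the problem parameters, and $G(z,\theta)=0$ collects the KKT conditions. If $G(z^\star(\theta),\theta)=0$ and $\partial G/\partial z$ is nonsingular at the solution, then

$$
\frac{\mathrm{d} z^\star}{\mathrm{d}\theta} = -\left(\frac{\partial G}{\partial z}\right)^{-1}\frac{\partial G}{\partial \theta}.
$$

This is only the textbook statement; how the block structure of the resulting linear system is exploited is the subject of the paper itself.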
This study analyzes the nonasymptotic convergence behavior of the quasi-Monte Carlo (QMC) method with applications to linear elliptic partial differential equations (PDEs) with lognormal coefficients. Building upon the error analysis presented in (Owen, 2006), we derive a nonasymptotic convergence estimate depending on the specific integrands, the input dimensionality, and the finite number of samples used in the QMC quadrature. We discuss the effects of the variance and dimensionality of the input random variable. Then, we apply the QMC method with importance sampling (IS) to approximate deterministic, real-valued, bounded linear functionals that depend on the solution of a linear elliptic PDE with a lognormal diffusivity coefficient in bounded domains of $\mathbb{R}^d$, where the random coefficient is modeled as a stationary Gaussian random field parameterized by the trigonometric and wavelet-type basis. We propose two types of IS distributions, analyze their effects on the QMC convergence rate, and observe the improvements.
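To make the QMC quadrature idea concrete, here is a minimal one-dimensional sketch, assuming a toy integrand rather than the PDE setting of the paper. It estimates $\mathbb{E}[\exp(\sigma Z)]$ for standard normal $Z$, whose exact value is $\exp(\sigma^2/2)$, using the base-2 van der Corput sequence mapped through the inverse normal CDF. The function names `van_der_corput` and `qmc_lognormal_mean` are hypothetical, and no importance sampling is applied.

```python
import math
from statistics import NormalDist


def van_der_corput(i, base=2):
    """Return the i-th point of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        q += rem / denom
    return q


def qmc_lognormal_mean(sigma, n):
    """QMC estimate of E[exp(sigma * Z)], Z ~ N(0, 1), with n points."""
    inv_cdf = NormalDist().inv_cdf
    total = 0.0
    # Start at i = 1: the point u = 0 would map to -infinity.
    for i in range(1, n + 1):
        u = van_der_corput(i)
        total += math.exp(sigma * inv_cdf(u))
    return total / n


if __name__ == "__main__":
    sigma = 0.5
    exact = math.exp(sigma ** 2 / 2)
    for n in (63, 255, 1023):
        est = qmc_lognormal_mean(sigma, n)
        print(n, est, abs(est - exact))
```

Increasing `sigma` makes the integrand grow faster near the boundary of the unit interval, which is one simple way to see why the variance of the input affects the error at a fixed, finite number of samples.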
This paper explores a fine-grained version of the Watrous conjecture, covering randomized and quantum algorithms with success probabilities arbitrarily close to $1/2$. Our contributions include the following: i) An analysis of the optimal success probability of quantum and randomized query algorithms for two fundamental partial symmetric Boolean functions given a fixed number of queries. We prove that for any quantum algorithm computing these two functions using $T$ queries, there exist randomized algorithms using $\mathsf{poly}(T)$ queries that achieve the same success probability as the quantum algorithm, even if the success probability is arbitrarily close to $1/2$. ii) We establish that for any total symmetric Boolean function $f$, if a quantum algorithm uses $T$ queries to compute $f$ with success probability $1/2+\beta$, then there exists a randomized algorithm using $O(T^2)$ queries to compute $f$ with success probability $1/2+\Omega(\delta\beta^2)$ on a $1-\delta$ fraction of inputs, where $\beta,\delta$ can be arbitrarily small positive values. As a corollary, we prove a randomized version of the Aaronson-Ambainis Conjecture for total symmetric Boolean functions in the regime where the success probability of algorithms can be arbitrarily close to $1/2$. iii) We present polynomial equivalences for several fundamental complexity measures of partial symmetric Boolean functions. Specifically, we first prove that for certain partial symmetric Boolean functions, quantum query complexity is at most quadratic in approximate degree for any error arbitrarily close to $1/2$. Next, we show that exact quantum query complexity is at most quadratic in degree. Additionally, we give tight bounds for several complexity measures, indicating their polynomial equivalence.
We introduce fluctuating hydrodynamics approaches on surfaces for capturing the drift-diffusion dynamics of particles and microstructures immersed within curved fluid interfaces of spherical shape. We take into account the interfacial hydrodynamic coupling, traction coupling with the surrounding bulk fluid, and thermal fluctuations. For fluid-structure interactions, we introduce Immersed Boundary Methods (IBM) and related Stochastic Eulerian-Lagrangian Methods (SELM) for curved surfaces. We use these approaches to investigate the statistics of surface fluctuating hydrodynamics and microstructures. For velocity autocorrelations, we find that characteristic power-law scalings $\tau^{-1}$ and $\tau^{-2}$, as well as plateaus, can emerge, depending on the physical regime associated with the geometry, surface viscosity, and bulk viscosity. This differs from the characteristic $\tau^{-3/2}$ scaling for bulk three-dimensional fluids. We develop theory explaining these observed power laws in terms of the time-scales for dissipation within the fluid interface and coupling to the surrounding fluid. We then use our methods to investigate a few example systems and the roles of hydrodynamic coupling and thermal fluctuations, including the kinetics of passive particles and active microswimmers in curved fluid interfaces.
Spectral clustering and its extensions usually consist of two steps: (1) constructing a graph and computing the relaxed solution; (2) discretizing the relaxed solution. Although the former has been extensively investigated, the discretization techniques are mainly heuristic methods, e.g., k-means and spectral rotation. Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective. In other words, their primary drawback is the neglect of the original objective when computing the discrete solution. Inspired by first-order optimization algorithms, we develop a first-order term that bridges the original problem and the discretization algorithm; to the best of our knowledge, this is the first non-heuristic discretization method. Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable and achieves a better loss value. We also show theoretically that the continuous optimum is beneficial to discretization algorithms, although simply finding its closest discrete solution is an existing heuristic that is also unreliable. Extensive experiments demonstrate the superiority of our method.
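The two-step pipeline described above can be sketched on a toy graph. This is a minimal illustration, assuming two triangles joined by a single edge, and it uses sign thresholding of the Fiedler vector, which is one of the standard heuristic discretizations and not the non-heuristic method the paper proposes. The function names `fiedler_vector` and `discretize` are hypothetical.

```python
def fiedler_vector(adj, iters=200):
    """Step 1: relaxed solution of a two-way cut.

    Approximates the eigenvector of the graph Laplacian L = D - A with the
    second-smallest eigenvalue, using power iteration on (shift*I - L) while
    projecting out the constant vector.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2 * max(deg)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant eigenvector
        # y = (shift*I - D + A) x
        y = [
            (shift - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def discretize(relaxed):
    """Step 2 (heuristic): assign clusters by the sign of the relaxed solution."""
    return [1 if v > 0 else 0 for v in relaxed]


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, and a bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

labels = discretize(fiedler_vector(adj))
print(labels)
```

On this graph the two triangles end up in different clusters. The thresholding step never consults the cut objective, which is the gap the abstract points at.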
In this paper, a new method is given for counting cycles in the Tanner graph of a (Type-I) quasi-cyclic (QC) low-density parity-check (LDPC) code, whose complexity depends mainly on the base matrix and is independent of the CPM size of the constructed code. For large CPM sizes, in comparison with the existing methods, this algorithm is the first approach that efficiently counts the cycles in the Tanner graphs of QC-LDPC codes. The algorithm recursively counts the cycles in the parity-check matrix column by column by finding all non-isomorphic tailless backtrackless closed (TBC) walks in the base graph and theoretically enumerating their corresponding cycles in the same equivalence class. Moreover, this approach can be modified in a few steps to find the cycle distributions of a class of LDPC codes based on affine permutation matrices (APM-LDPC codes). Unlike the existing methods, which count the cycles up to length $2g-2$, where $g$ is the girth, the proposed algorithm can be used to enumerate the cycles of arbitrary length in the Tanner graph. Finally, the proposed cycle searching algorithm improves upon various previously known methods in terms of computational complexity and memory requirements.
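To illustrate why cycle counting can depend on the base (exponent) matrix rather than on the expanded graph, the following sketch counts only 4-cycles using the classical shift-sum condition for circulant permutation matrices. It is not the TBC-walk algorithm of the paper. It assumes a fully populated exponent matrix `P` (no zero blocks) with CPM size `N`. The function names are hypothetical, and a brute-force count on the expanded parity-check matrix is included as a cross-check.

```python
from itertools import combinations


def count_4cycles_from_exponents(P, N):
    """Count 4-cycles using only the exponent matrix.

    Block rows i < j and block columns a < b close a 4-cycle exactly when
    P[i][a] - P[i][b] + P[j][b] - P[j][a] == 0 (mod N); each such pattern
    contributes N cycles in the expanded Tanner graph.
    """
    count = 0
    for i, j in combinations(range(len(P)), 2):
        for a, b in combinations(range(len(P[0])), 2):
            if (P[i][a] - P[i][b] + P[j][b] - P[j][a]) % N == 0:
                count += N
    return count


def expand(P, N):
    """Expand P into the full parity-check matrix, one column set per row.

    Row r of the CPM with shift p has its single 1 in column (r + p) mod N.
    """
    rows = []
    for i in range(len(P)):
        for r in range(N):
            rows.append({a * N + (r + P[i][a]) % N for a in range(len(P[0]))})
    return rows


def count_4cycles_brute_force(rows):
    """Count 4-cycles directly: two rows sharing k columns give C(k, 2) cycles."""
    total = 0
    for r1, r2 in combinations(rows, 2):
        k = len(r1 & r2)
        total += k * (k - 1) // 2
    return total


if __name__ == "__main__":
    N = 5
    for P in ([[0, 1, 2], [0, 2, 4]], [[0, 1, 2], [0, 1, 4]]):
        fast = count_4cycles_from_exponents(P, N)
        slow = count_4cycles_brute_force(expand(P, N))
        print(P, fast, slow)
```

The exponent-matrix count loops over pairs of block rows and block columns only, so its running time does not grow with `N`, whereas the brute-force check loops over all pairs of rows of the expanded matrix.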
Most multilingual vision-and-language (V&L) research aims to achieve multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered this development. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingual language model (mLM), in turn, takes the caption as the alt text and performs cross-lingual language understanding. The burden of multilingual processing is lifted off the V&L model and placed on the mLM. Since multilingual text data is relatively more abundant and of higher quality, ICU can help V&L models overcome language barriers. In experiments on two tasks across 9 languages in the IGLUE benchmark, we show that ICU achieves new state-of-the-art results for five languages and comparable results for the rest.
We propose a dependence-aware predictive modeling framework for multivariate risks stemming from an insurance contract with bundling features, an important type of policy increasingly offered by major insurance companies. The bundling feature naturally leads to longitudinal measurements of multiple insurance risks, and correct pricing and management of such risks is of fundamental interest to the financial stability of the macroeconomy. We build a novel predictive model that fully captures the dependence among the multivariate repeated risk measurements. Specifically, the longitudinal measurement of each individual risk is first modeled using a pair copula construction with a D-vine structure, and the multiple D-vines are then integrated by a flexible copula. The proposed model provides a unified modeling framework for multivariate longitudinal data that can accommodate different scales of measurement, including continuous, discrete, and mixed observations, and thus can be potentially useful for various economic studies. A computationally efficient sequential method is proposed for model estimation and inference, and its performance is investigated both theoretically and via simulation studies. In the application, we examine multivariate bundled risks in multi-peril property insurance using proprietary data from a commercial property insurance provider. The proposed model is found to improve decision making for several key insurance operations. For underwriting, we show that the experience rate priced by the proposed model leads to a 9% lift in the insurer's net revenue. For reinsurance, we show that the insurer underestimates the risk of the retained insurance portfolio by 10% when ignoring the dependence among bundled insurance risks.
We introduce a multi-task setup for identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.