We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems of the form $\min_\textbf{x} \max_{\textbf{y} \in Y} f(\textbf{x}, \textbf{y})$, where the objective function $f(\textbf{x}, \textbf{y})$ is nonconvex in $\textbf{x}$ and concave in $\textbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale gradient descent ascent (GDA) algorithm is widely used in applications and has been shown to have strong convergence guarantees. In more general settings, however, it can fail to converge. Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting, efficiently finding a stationary point of the function $\Phi(\cdot) := \max_{\textbf{y} \in Y} f(\cdot, \textbf{y})$. We also establish theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To the best of our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in other real-world applications.
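A minimal sketch of the two-timescale idea, assuming a Euclidean projection onto a bounded convex $Y$ (here a ball, purely for illustration) and a slow stepsize for $\textbf{x}$ paired with a faster stepsize for $\textbf{y}$; the function and parameter names are illustrative, not taken from the paper:

```python
import numpy as np

def project_onto_ball(y, radius=1.0):
    """Euclidean projection onto a ball of given radius (an illustrative convex, bounded Y)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def ttgda(grad_x, grad_y, x0, y0, eta_x=1e-3, eta_y=1e-1, num_iters=1000):
    """Two-timescale GDA sketch: gradient descent on x with a small stepsize,
    projected gradient ascent on y with a larger stepsize."""
    x, y = x0.copy(), y0.copy()
    for _ in range(num_iters):
        gx = grad_x(x, y)                       # gradient of f w.r.t. x at (x, y)
        gy = grad_y(x, y)                       # gradient of f w.r.t. y at (x, y)
        x = x - eta_x * gx                      # slow timescale: descent step on x
        y = project_onto_ball(y + eta_y * gy)   # fast timescale: projected ascent step on y
    return x, y
```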
This work solves an open problem regarding the rate of time-bounded Kolmogorov complexity and polynomial-time dimension, conditioned on a hardness assumption. Hitchcock and Vinodchandran (CCC 2004) showed that the polynomial-time dimension of infinite sequences (denoted ${\mathrm{cdim}}_\mathrm{P}$), defined using betting algorithms called gales, is lower bounded by the asymptotic lower rate of polynomial-time Kolmogorov complexity (denoted $\mathcal{K}_\text{poly}$). Hitchcock and Vinodchandran and Stull asked whether the converse relationship also holds. This question has thus far resisted resolution. The corresponding unbounded notions, namely the constructive dimension and the asymptotic lower rate of unbounded Kolmogorov complexity, are equal for every sequence, and analogous notions coincide even at the finite-state level. In view of these results, it is reasonable to conjecture that the polynomial-time quantities are identical for every sequence and every set of sequences. However, under a plausible assumption that underlies modern cryptography, namely the existence of one-way functions, we refute this conjecture, thereby giving a negative answer to the open question posed by Hitchcock and Vinodchandran and by Stull. We show the following, conditioned on the existence of one-way functions: there are sets $\mathcal{F}$ of infinite sequences whose polynomial-time dimension strictly exceeds $\mathcal{K}_\text{poly}(\mathcal{F})$, that is, ${\mathrm{cdim}}_\mathrm{P}(\mathcal{F}) > \mathcal{K}_\text{poly}(\mathcal{F})$. We establish a stronger version of this result: there are individual sequences $X$ whose polynomial-time dimension strictly exceeds $\mathcal{K}_\text{poly}(X)$, that is, ${\mathrm{cdim}}_\mathrm{P}(X) > \mathcal{K}_\text{poly}(X)$. Further, we show that the gap between these quantities can be made arbitrarily close to 1. We also establish similar bounds for the strong polynomial-time dimension.
This paper proposes a novel collocation-type numerical stochastic homogenization method for prototypical stochastic homogenization problems with random coefficient fields of small correlation lengths. The presented method is based on a recently introduced localization technique that enforces a super-exponential decay of the basis functions relative to the underlying coarse mesh, resulting in considerable computational savings during the sampling phase. More generally, the collocation-type structure offers a particularly simple and computationally efficient construction in the stochastic setting with minimized communication between the patches where the basis functions of the method are computed. An error analysis that bridges numerical homogenization and the quantitative theory of stochastic homogenization is performed. In a series of numerical experiments, we study the effect of the correlation length and the discretization parameters on the approximation quality of the method.
Endoscopic surgery relies on two-dimensional views, posing challenges for surgeons in depth perception and instrument manipulation. While Monocular Visual Simultaneous Localization and Mapping (MVSLAM) has emerged as a promising solution, its implementation in endoscopic procedures faces significant challenges due to hardware limitations, such as the use of a monocular camera and the absence of odometry sensors. This study presents BodySLAM, a robust deep learning-based MVSLAM approach that addresses these challenges through three key components: CycleVO, a novel unsupervised monocular pose estimation module; the integration of the state-of-the-art Zoe architecture for monocular depth estimation; and a 3D reconstruction module that creates a coherent surgical map. The approach is rigorously evaluated on three publicly available datasets (Hamlyn, EndoSLAM, and SCARED) spanning laparoscopy, gastroscopy, and colonoscopy scenarios, and benchmarked against four state-of-the-art methods. Results demonstrate that CycleVO achieves competitive performance with the lowest inference time among pose estimation methods while maintaining robust generalization, and that Zoe significantly outperforms existing algorithms for depth estimation in endoscopy. BodySLAM's strong performance across diverse endoscopic scenarios demonstrates its potential as a viable MVSLAM solution for endoscopic applications.
We present a novel algorithm for real-time planar semantic mapping tailored for humanoid robots navigating complex terrains such as staircases. Our method is adaptable to any odometry input and leverages GPU-accelerated processes for planar extraction, enabling the rapid generation of globally consistent semantic maps. We apply an anisotropic diffusion filter to depth images to effectively suppress noise from gradient jumps while preserving essential edge details, improving the accuracy and smoothness of the normal vector images. Both the anisotropic diffusion and the RANSAC-based plane extraction are optimized for parallel processing on GPUs, significantly enhancing computational efficiency. Our approach achieves real-time performance, processing single frames at rates exceeding $30~\mathrm{Hz}$, enabling swift and efficient plane extraction and map management. Extensive testing underscores the algorithm's capabilities in real-time scenarios and demonstrates its practical application in humanoid robot gait planning, significantly improving the robot's ability to navigate dynamic environments.
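As a rough illustration of the RANSAC-based plane extraction step, here is a serial NumPy sketch of fitting a single plane to a point cloud by random sampling; the paper's version runs in parallel on the GPU and operates on depth and normal-vector images, so the structure and names below are assumptions for exposition only:

```python
import numpy as np

def ransac_plane(points, num_iters=200, inlier_thresh=0.01, rng=None):
    """Fit one plane to an (N, 3) point cloud via RANSAC.
    Returns the plane (normal, d) with normal . x + d = 0 and the inlier mask."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers, best_plane = None, None
    for _ in range(num_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample, try again
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```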
Under certain initial and boundary conditions, the rapid reaction-thermal diffusion process taking place during frontal polymerization (FP) destabilizes the planar mode of front propagation, leading to spatially varying, complex hierarchical patterns in thermoset polymeric materials. Although modern reaction-diffusion models can predict the patterns resulting from unstable FP, the inverse design of patterns, which aims to retrieve the process conditions that produce a desired pattern, remains an open challenge due to the non-unique and non-intuitive mapping between process conditions and manufactured patterns. In this work, we propose a probabilistic generative model named the univariate conditional variational autoencoder (UcVAE) for the inverse design of hierarchical patterns in FP-based manufacturing. Unlike the cVAE, which encodes both the design space and the design target, the UcVAE encodes only the design space. The UcVAE encoder therefore has significantly fewer trainable parameters than that of the cVAE, resulting in shorter training times while maintaining comparable performance. Given desired pattern images, the trained UcVAE can generate multiple process-condition solutions that produce high-fidelity hierarchical patterns.
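A minimal PyTorch-style sketch of the structural difference described above, with purely illustrative layer sizes and names (not the paper's architecture): the cVAE encoder consumes the design variables concatenated with the target pattern, whereas the UcVAE encoder consumes only the design variables, which is where the parameter savings come from.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Maps its input to the mean and log-variance of a latent Gaussian."""
    def __init__(self, in_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

design_dim, target_dim = 4, 256   # illustrative sizes only

# cVAE: the encoder sees the design variables concatenated with the (flattened) target pattern.
cvae_encoder = Encoder(in_dim=design_dim + target_dim)

# UcVAE: the encoder sees only the design variables, hence far fewer trainable parameters;
# conditioning on the target pattern is left to the decoder in both cases.
ucvae_encoder = Encoder(in_dim=design_dim)
```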
We demonstrate the inter-translatability of proofs between the most prominent sequent-based formalisms for G\"odel-L\"ob provability logic. In particular, we consider Sambin and Valentini's sequent system GLseq, Shamkanov's non-wellfounded and cyclic sequent systems GL$\infty$ and GLcirc, Poggiolesi's tree-hypersequent system CSGL, and Negri's labeled sequent system G3GL. Shamkanov showed how to transform proofs between GLseq, GL$\infty$, and GLcirc, and Gor\'e and Ramanayake showed how to transform proofs between CSGL and G3GL; however, the exact nature of proof transformations between the former three systems and the latter two systems has remained an open problem. We solve this open problem by showing how to restructure tree-hypersequent proofs into an end-active form and introduce a novel linearization technique that transforms such proofs into linear nested sequent proofs. As a result, we obtain a new proof-theoretic tool for extracting linear nested sequent systems from tree-hypersequent systems, which yields the first cut-free linear nested sequent calculus LNGL for G\"odel-L\"ob provability logic. We show how to transform proofs in LNGL into a certain normal form, in which proofs proceed in repeating stages of modal and local rule applications; such normal-form proofs are translatable into GLseq and G3GL proofs. These new syntactic transformations, together with those mentioned above, establish full proof-theoretic correspondences between GLseq, GL$\infty$, GLcirc, CSGL, G3GL, and LNGL while also giving (to the best of the author's knowledge) the first constructive proof mappings between structural (viz. labeled, tree-hypersequent, and linear nested sequent) systems and a cyclic sequent system.
This paper introduces a novel staggered discontinuous Galerkin (SDG) method tailored for solving elliptic equations on polytopal meshes. Our approach utilizes a primal-dual grid framework to ensure local conservation of fluxes, significantly improving stability and accuracy. The method is hybridizable and reduces the degrees of freedom compared to existing approaches. It also draws connections to other numerical methods on polytopal meshes. Numerical experiments validate the method's optimal convergence rates and computational efficiency.
This paper proposes a new multi-linear projection method for denoising and estimation of high-dimensional matrix-variate factor time series. It assumes that a $p_1\times p_2$ matrix-variate time series consists of a dynamically dependent, lower-dimensional matrix-variate factor process and a $p_1\times p_2$ matrix idiosyncratic series. In addition, the latter series is assumed to have a matrix-variate factor structure such that its row and column covariances may have diverging/spiked eigenvalues, accommodating the case of low signal-to-noise ratio often encountered in applications. We use an iterative projection procedure to reduce the dimensions and noise effects in estimating the front and back loading matrices and to obtain faster convergence rates than those of traditional methods in the literature. We further introduce a two-way projected Principal Component Analysis to mitigate the diverging noise effects, and implement a high-dimensional white-noise testing procedure to estimate the dimension of the matrix factor process. Asymptotic properties of the proposed method are established as the dimensions and sample size go to infinity. We also use simulations and real examples to assess the finite-sample performance of the proposed method and to compare its forecasting ability with that of some existing methods in the literature. The proposed method fares well in out-of-sample forecasting. In a supplement, we demonstrate the efficacy of the proposed approach even when the idiosyncratic terms exhibit serial correlations with or without a diverging white noise effect.
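For orientation, the matrix-variate factor structure described above is commonly written in the following form; the notation below is a standard convention and not taken verbatim from the paper:
\[
\mathbf{X}_t \;=\; \mathbf{R}\,\mathbf{F}_t\,\mathbf{C}^{\top} \;+\; \mathbf{E}_t, \qquad t = 1, \dots, T,
\]
where $\mathbf{X}_t \in \mathbb{R}^{p_1 \times p_2}$ is the observed matrix series, $\mathbf{F}_t \in \mathbb{R}^{k_1 \times k_2}$ (with $k_1 \ll p_1$ and $k_2 \ll p_2$) is the latent matrix factor process, $\mathbf{R}$ and $\mathbf{C}$ are the front and back loading matrices, and $\mathbf{E}_t$ is the idiosyncratic matrix series, itself allowed a factor-type structure whose row and column covariances may have diverging eigenvalues.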
Accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (also referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We analyze the dynamic graphs constructed from time-series data and show that changes in the graph structure (e.g., edges connecting certain state nodes) can inform the occurrence of events (i.e., time-series fluctuations). Inspired by this, we propose a novel graph neural network model, the Evolutionary State Graph Network (EvoNet), that encodes the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, EvoNet models both node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures node-graph (state-to-segment) interactions over time. Experimental results on five real-world datasets show that our approach not only achieves clear improvements over 11 baselines but also provides more insight into explaining the results of event predictions.
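A rough sketch of how such an evolutionary state graph can be built from a time series, assuming states have already been recognized per timestep (state recognition itself, e.g. by clustering, is outside this sketch); segment length and names are illustrative, not the paper's:

```python
from collections import Counter

def build_evolutionary_state_graphs(state_sequence, segment_length):
    """Split a per-timestep state sequence into segments and, for each segment,
    count transitions between consecutive states.  Each Counter is the weighted
    edge set of that segment's state graph; comparing Counters across segments
    shows how the relations among states evolve over time."""
    graphs = []
    for start in range(0, len(state_sequence) - 1, segment_length):
        seg = state_sequence[start:start + segment_length + 1]
        edges = Counter(zip(seg[:-1], seg[1:]))   # (from_state, to_state) -> count
        graphs.append(edges)
    return graphs

# Illustrative usage with a toy state sequence over three states.
states = [0, 0, 1, 1, 2, 2, 2, 1, 0, 0, 1, 2]
for t, g in enumerate(build_evolutionary_state_graphs(states, segment_length=4)):
    print(f"segment {t}: {dict(g)}")
```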
Multi-relation Question Answering is a challenging task due to the need for elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. At each hop, the model dynamically decides which part of the input question should be analyzed; predicts a relation that corresponds to the currently parsed results; uses the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model offers traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.
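The hop-by-hop loop described above can be summarized in the following Python sketch; the function names, arguments, and stopping criterion are illustrative assumptions, not the paper's implementation:

```python
def hop_by_hop_reasoning(question_repr, initial_entity, kb_follow,
                         analyze, predict_relation, update_question, max_hops=3):
    """Conceptual sketch of an interpretable reasoning loop: at each hop, attend to
    part of the question, predict a relation, follow it in the knowledge base, and
    update both the question representation and the reasoning state."""
    state, q, trace = initial_entity, question_repr, []
    for hop in range(max_hops):
        focus = analyze(q)                     # which part of the question to analyze at this hop
        relation = predict_relation(focus, state)
        state = kb_follow(state, relation)     # move to the next entity in the knowledge base
        q = update_question(q, relation)       # fold the consumed relation back into the question
        trace.append((hop, relation, state))   # traceable intermediate predictions
    return state, trace
```

Exposing `trace` is what makes the intermediate predictions inspectable, and replacing an entry in it by hand corresponds to the kind of manual manipulation of the final answer mentioned above.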