A matchstick graph is a plane graph with edges drawn as unit-distance line segments. Harborth introduced these graphs in 1981 and conjectured that the maximum number of edges for a matchstick graph on $n$ vertices is $\lfloor 3n-\sqrt{12n-3} \rfloor$. In this paper we prove this conjecture for all $n\geq 1$. The main geometric ingredient of the proof is an isoperimetric inequality related to L'Huilier's inequality.
We present a novel unsupervised machine learning shock capturing algorithm based on Gaussian Mixture Models (GMMs). The proposed GMM sensor demonstrates remarkable accuracy in detecting shocks and is robust across diverse test cases without the need for parameter tuning. We compare the GMM-based sensor with state-of-the-art alternatives. All methods are integrated into a high-order compressible discontinuous Galerkin solver where artificial viscosity can be modulated to capture shocks. Supersonic test cases, including high Reynolds numbers, showcase the sensor's performance, demonstrating the same effectiveness as fine-tuned state-of-the-art sensors. %The nodal DG aproach allows for potential applications in sub-cell flux-differencing formulations, supersonic feature detection, and mesh refinement. The adaptive nature and ability to function without extensive training datasets make this GMM-based sensor suitable for complex geometries and varied flow configurations. Our study reveals the potential of unsupervised machine learning methods, exemplified by the GMM sensor, to improve the robustness and efficiency of advanced CFD codes.
We propose a numerical algorithm for computing approximately optimal solutions of the matching for teams problem. Our algorithm is efficient for problems involving a large number of agent categories and allows for the measures describing the agent types to be non-discrete. Specifically, we parametrize the so-called transfer functions and develop a parametric version of the dual formulation. Our algorithm tackles this parametric formulation and produces feasible and approximately optimal solutions for the primal and dual formulations of the matching for teams problem. These solutions also yield upper and lower bounds for the optimal value, and the difference between the upper and lower bounds provides a direct sub-optimality estimate of the computed solutions. Moreover, we are able to control a theoretical upper bound on the sub-optimality to be arbitrarily close to 0 under mild conditions. We subsequently prove that the approximate primal and dual solutions converge when the sub-optimality goes to 0 and their limits constitute a true matching equilibrium. Thus, the outputs of our algorithm are regarded as an approximate matching equilibrium. We also analyze the theoretical computational complexity of our parametric formulation as well as the sparsity of the resulting approximate matching equilibrium. Through numerical experiments, we showcase that the proposed algorithm can produce high-quality approximate matching equilibria and is applicable to versatile settings, including a high-dimensional setting involving 100 agent categories.
We develop two new sets of stable, rank-adaptive Dynamically Orthogonal Runge-Kutta (DORK) schemes that capture the high-order curvature of the nonlinear low-rank manifold. The DORK schemes asymptotically approximate the truncated singular value decomposition at a greatly reduced cost while preserving mode continuity using newly derived retractions. We show that arbitrarily high-order optimal perturbative retractions can be obtained, and we prove that these new retractions are stable. In addition, we demonstrate that repeatedly applying retractions yields a gradient-descent algorithm on the low-rank manifold that converges superlinearly when approximating a low-rank matrix. When approximating a higher-rank matrix, iterations converge linearly to the best low-rank approximation. We then develop a rank-adaptive retraction that is robust to overapproximation. Building off of these retractions, we derive two rank-adaptive integration schemes that dynamically update the subspace upon which the system dynamics are projected within each time step: the stable, optimal Dynamically Orthogonal Runge-Kutta (so-DORK) and gradient-descent Dynamically Orthogonal Runge-Kutta (gd-DORK) schemes. These integration schemes are numerically evaluated and compared on an ill-conditioned matrix differential equation, an advection-diffusion partial differential equation, and a nonlinear, stochastic reaction-diffusion partial differential equation. Results show a reduced error accumulation rate with the new stable, optimal and gradient-descent integrators. In addition, we find that rank adaptation allows for highly accurate solutions while preserving computational efficiency.
In this article we design a finite volume semi-implicit IMEX scheme for the incompressible Navier-Stokes equations on evolving Chimera meshes. We employ a time discretization technique that separates explicit and implicit terms which encompass both slow and fast scales. The finite volume approach for both explicit and implicit terms allows to encode into the nonlinear flux the velocity of displacement of the Chimera mesh via integration on moving cells. To attain second-order time accuracy, we employ semi-implicit IMEX Runge-Kutta schemes. These novel schemes are combined with a fractional-step method, thus the governing equations are eventually solved using a projection method to satisfy the divergence-free constraint of the velocity field. The implicit discretization of the viscous terms allows the CFL-type stability condition for the maximum admissible time step to be only defined by the relative fluid velocity referred to the movement of the frame and not depending also on the viscous eigenvalues. Communication between different grid blocks is enabled through compact exchange of information from the fringe cells of one mesh block to the field cells of the other block. Taking advantage of the continuity of the solution and the definition of a minimal compact stencil, the numerical solution of any system of differential equations is characterized by continuous data extrapolation. Free-stream preservation property, i.e. compliance with the Geometric Conservation Law (GCL), is respected. The accuracy and capabilities of the new numerical schemes is proved through an extensive range of test cases, demonstrating ability to solve relevant benchmarks in the field of incompressible fluids.
A lattice of integers is the collection of all linear combinations of a set of vectors for which all entries of the vectors are integers and all coefficients in the linear combinations are also integers. Lattice reduction refers to the problem of finding a set of vectors in a given lattice such that the collection of all integer linear combinations of this subset is still the entire original lattice and so that the Euclidean norms of the subset are reduced. The present paper proposes simple, efficient iterations for lattice reduction which are guaranteed to reduce the Euclidean norms of the basis vectors (the vectors in the subset) monotonically during every iteration. Each iteration selects the basis vector for which projecting off (with integer coefficients) the components of the other basis vectors along the selected vector minimizes the Euclidean norms of the reduced basis vectors. Each iteration projects off the components along the selected basis vector and efficiently updates all information required for the next iteration to select its best basis vector and perform the associated projections.
In this paper, we propose a differentiable version of the short-time Fourier transform (STFT) that allows for gradient-based optimization of the hop length or the frame temporal position by making these parameters continuous. Our approach provides improved control over the temporal positioning of frames, as the continuous nature of the hop length allows for a more finely-tuned optimization. Furthermore, our contribution enables the use of optimization methods such as gradient descent, which are more computationally efficient than conventional discrete optimization methods. Our differentiable STFT can also be easily integrated into existing algorithms and neural networks. We present a simulated illustration to demonstrate the efficacy of our approach and to garner interest from the research community.
This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: ($i$) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; ($ii$) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and ($iii$) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies. In addition, for each category, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey on VLP. We hope that this survey can shed light on future research in the VLP field.
Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.
Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image regions. In this paper we adopt a different approach and carefully design each component of a simple deep architecture and, critically, the strategy for training it effectively for person re-identification. We extensively evaluate each design choice, leading to a list of good practices for person re-identification. By following these practices, our approach outperforms the state of the art, including more complex methods with auxiliary components, by large margins on four benchmark datasets. We also provide a qualitative analysis of our trained representation which indicates that, while compact, it is able to capture information from localized and discriminative regions, in a manner akin to an implicit attention mechanism.