We revisit the well-known Curve Shortening Flow (CSF) for immersed curves in $d$-dimensional Euclidean space. We exploit a fundamental structure of the problem to derive a new global construction of a solution, that is, a construction that is valid for all times and is insensitive to singularities. The construction is characterized by a discretization in time, and the approximant, while still exhibiting the possible formation of finitely many singularities at a finite set of singular times, exists globally, is well behaved, and is simpler to analyze. A solution of the CSF is obtained in the limit. Estimates for a natural (geometric) norm involving length and total absolute curvature allow passage to the limit. Many classical qualitative results about the flow can be recovered by exploiting the simplicity of the approximant, and new ones can be proved. The construction also suggests a numerical procedure for computing the flow that proves very effective, as demonstrated by a series of numerical experiments scattered throughout the paper.
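As a rough illustration of the kind of numerical procedure such a construction suggests (a minimal explicit-Euler sketch, not the paper's actual scheme), one can march a closed polygonal curve in time by moving each vertex along a discrete curvature vector:

```python
import numpy as np

def csf_step(pts, dt):
    """One explicit Euler step of a discretized curve shortening flow.

    pts: (m, d) array of vertices of a closed polygonal curve.
    Moves each vertex by dt times a discrete curvature vector,
    approximated by second differences normalized by chord lengths.
    Illustrative only; not the construction from the paper.
    """
    fwd = np.roll(pts, -1, axis=0) - pts           # edge to next vertex
    bwd = pts - np.roll(pts, 1, axis=0)            # edge from previous vertex
    h_f = np.linalg.norm(fwd, axis=1, keepdims=True)
    h_b = np.linalg.norm(bwd, axis=1, keepdims=True)
    # discrete curvature vector ~ d^2 x / ds^2 on a non-uniform grid
    kappa = 2.0 * (fwd / h_f - bwd / h_b) / (h_f + h_b)
    return pts + dt * kappa

# shrink a unit circle in the plane (d = 2); it should stay round
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
for _ in range(100):
    pts = csf_step(pts, dt=1e-4)
```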
Transformers have achieved promising results on a variety of tasks. However, the quadratic complexity of self-attention computation has limited their application, especially in low-resource settings and on mobile or edge devices. Existing works have proposed to exploit hand-crafted attention patterns to reduce computational complexity. However, such hand-crafted patterns are data-agnostic and may not be optimal: relevant keys or values may be discarded, while less important ones are preserved. Based on this key insight, we propose a novel deformable audio Transformer for audio recognition, named DATAR, in which a learnable deformable attention mechanism is combined with a pyramid transformer backbone. Such an architecture has been proven effective in prediction tasks,~\textit{e.g.}, event classification. Moreover, we identify that the deformable attention map computation may over-simplify the input feature, which can be further enhanced. Hence, we introduce a learnable input adaptor to alleviate this issue, with which DATAR achieves state-of-the-art performance.
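For readers unfamiliar with deformable attention, the following is a generic single-head sketch in the style of the Deformable Attention Transformer. The module layout and names here are assumptions for illustration; the abstract does not specify DATAR's internals. The key idea is that keys and values are gathered at learned offsets from a uniform reference grid, so the attention pattern adapts to the data instead of being hand-crafted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttention(nn.Module):
    """Single-head deformable attention sketch (hypothetical, DAT-style):
    keys/values are sampled at learned offsets from a uniform reference
    grid via bilinear interpolation."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.offset = nn.Conv2d(dim, 2, 1)   # per-location (dx, dy) offsets
        self.scale = dim ** -0.5

    def forward(self, x):                    # x: (B, C, H, W) audio features
        B, C, H, W = x.shape
        q = self.q(x)
        # uniform reference grid in [-1, 1], shifted by predicted offsets
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device), indexing="ij")
        ref = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        off = self.offset(q).permute(0, 2, 3, 1).tanh()    # bounded offsets
        sampled = F.grid_sample(x, (ref + off).clamp(-1, 1),
                                align_corners=True)        # (B, C, H, W)
        q = q.flatten(2).transpose(1, 2)                   # (B, HW, C)
        kv = sampled.flatten(2).transpose(1, 2)            # (B, HW, C)
        attn = (q @ self.k(kv).transpose(1, 2)) * self.scale
        return attn.softmax(-1) @ self.v(kv)               # (B, HW, C)
```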
The Kaczmarz algorithm is an iterative method that solves linear systems of equations. It stands out among iterative algorithms when dealing with large systems for two reasons. First, at each iteration, the Kaczmarz algorithm uses a single equation, resulting in minimal computational work per iteration. Second, solving the entire system may only require the use of a small subset of the equations. These characteristics have attracted significant attention to the Kaczmarz algorithm. Researchers have observed that randomly choosing equations can improve the convergence rate of the algorithm. This insight led to the development of the Randomized Kaczmarz algorithm and, subsequently, several other variations. In this paper, we extensively analyze the original Kaczmarz algorithm and many of its variations, using large-scale dense random systems as benchmarks. Through our investigation, we verify that, for consistent systems, various row sampling schemes can outperform both the original and the Randomized Kaczmarz methods; in particular, sampling without replacement and using quasirandom numbers are the fastest techniques. For inconsistent systems, however, the Conjugate Gradient method for Least-Squares problems outperforms all variations of the Kaczmarz method.
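As a concrete reference point, here is a minimal sketch of the Randomized Kaczmarz projection step with row sampling proportional to squared row norms (Strohmer–Vershynin). It is illustrative only and is not the paper's benchmarking code:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=10_000, seed=None):
    """Randomized Kaczmarz: at each step, project the current iterate
    onto the hyperplane of one equation, chosen with probability
    proportional to the squared row norm."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.einsum('ij,ij->i', A, A)           # squared row norms
    probs = row_sq / row_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_sq[i] * A[i]  # projection step
    return x

# a consistent dense random system, as in the paper's benchmark setting
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 100))
x_true = rng.standard_normal(100)
b = A @ x_true
x = randomized_kaczmarz(A, b, iters=20_000, seed=1)
print(np.linalg.norm(x - x_true))                  # should be small
```

The sampling variants studied in the paper, such as sampling without replacement or with quasirandom numbers, would replace the `rng.choice` line.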
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at //decodingtrust.github.io/; our dataset can be previewed at //huggingface.co/datasets/AI-Secure/DecodingTrust; a concise version of this work is at //openreview.net/pdf?id=kaHpo8OZw2.
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-$\Sigma$) have demonstrated extraordinary performance in key technological areas, such as natural language processing and visual recognition, and have become the mainstream trend of artificial general intelligence. This has led more and more major technology giants to dedicate significant human and financial resources to actively developing their foundation model systems, driving continuous growth in these models' parameter counts. As a result, the training and serving of these models pose significant challenges, including substantial demands on computing power, memory, and bandwidth. Therefore, employing efficient training and serving strategies becomes particularly crucial. Many researchers have actively explored and proposed effective methods, so a comprehensive survey of them is essential for system developers and researchers. This paper extensively explores the methods employed in training and serving foundation models from various perspectives. It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage. Additionally, the paper summarizes the challenges and presents a perspective on the future development direction of foundation model systems. Through comprehensive discussion and analysis, we hope to provide a solid theoretical basis and practical guidance for future research and applications, promoting continuous innovation and development in foundation model systems.
Car detection is an important task that serves as a crucial prerequisite for many automated driving functions. Large variations in lighting/weather conditions and vehicle densities pose significant challenges for existing car detection algorithms in meeting the highly accurate perception demanded for safety, because unstable/limited color information impedes the extraction of meaningful/discriminative features of cars. In this work, we present a novel learning-based car detection method that leverages trichromatic linear polarization as an additional cue to disambiguate such challenging cases. A key observation is that polarization, a characteristic of the light wave, can robustly describe intrinsic physical properties of the scene objects in various imaging conditions and is strongly linked to the nature of materials on cars (e.g., metal and glass) and in their surrounding environment (e.g., soil and trees), thereby providing reliable and discriminative features for robust car detection in challenging scenes. To exploit polarization cues, we first construct a pixel-aligned RGB-Polarization car detection dataset, which we subsequently employ to train a novel multimodal fusion network. Our car detection network dynamically integrates RGB and polarization features in a request-and-complement manner and can explore the intrinsic material properties of cars across all learning samples. We extensively validate our method and demonstrate that it outperforms state-of-the-art detection methods. Experimental results show that polarization is a powerful cue for car detection.
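The abstract does not detail the request-and-complement mechanism; as a loose, hypothetical stand-in, a gated per-pixel fusion of the two modalities might look like the following generic sketch (this is not the paper's module):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated cross-modal fusion sketch (hypothetical, NOT the
    paper's request-and-complement module): a gate predicted from the
    concatenated features decides, per pixel, how much each modality
    contributes."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * dim, dim, 1), nn.Sigmoid())

    def forward(self, rgb, pol):            # both: (B, C, H, W)
        g = self.gate(torch.cat([rgb, pol], dim=1))
        return g * rgb + (1 - g) * pol      # convex per-pixel blend
```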
Very recently, the first mathematical runtime analyses of the multi-objective evolutionary optimizer NSGA-II have been conducted. We continue this line of research with a first runtime analysis of this algorithm on a benchmark problem consisting of two multimodal objectives. We prove that if the population size $N$ is at least four times the size of the Pareto front, then the NSGA-II with four different ways to select parents and bit-wise mutation optimizes the OneJumpZeroJump benchmark with jump size~$2 \le k \le n/4$ in time $O(N n^k)$. When using fast mutation, a recently proposed heavy-tailed mutation operator, this guarantee improves by a factor of $k^{\Omega(k)}$. Overall, this work shows that the NSGA-II copes with the local optima of the OneJumpZeroJump problem at least as well as the global SEMO algorithm.
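For concreteness, the OneJumpZeroJump benchmark can be sketched as follows, using the definition common in this line of runtime analyses (both objectives are to be maximized; the Pareto-optimal points are the bit strings with between $k$ and $n-k$ ones, together with the all-ones and all-zeros strings):

```python
def one_jump_zero_jump(x, k):
    """OneJumpZeroJump_k: the Jump_k function applied to the number of
    ones and, symmetrically, to the number of zeros. x is a 0/1 list of
    length n; returns the pair of objective values (maximization)."""
    n = len(x)
    ones = sum(x)
    zeros = n - ones
    f1 = k + ones if ones <= n - k or ones == n else n - ones
    f2 = k + zeros if zeros <= n - k or zeros == n else n - zeros
    return f1, f2

# example: n = 8, k = 2; the all-ones string attains (n + k, k)
print(one_jump_zero_jump([1] * 8, k=2))   # (10, 2)
```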
We introduce the Density Formula for (topological) drawings of graphs in the plane or on the sphere, which relates the number of edges, vertices, crossings, and sizes of cells in the drawing. We demonstrate its capability by providing several applications: we prove tight upper bounds on the edge density of various beyond-planar graph classes, including so-called $k$-planar graphs with $k=1,2$, fan-crossing / fan-planar graphs, $k$-bend RAC-graphs with $k=0,1,2$, and quasiplanar graphs. In some cases ($1$-bend and $2$-bend RAC-graphs and fan-crossing / fan-planar graphs), we thereby obtain the first tight upper bounds on the edge density of the respective graph classes. In other cases, we give new streamlined and significantly shorter proofs for bounds that were already known in the literature. Thanks to the Density Formula, all of our proofs consist mostly of elementary counting and largely circumvent the intricate case analysis typical of earlier proofs. Further, in some cases (simple and non-homotopic quasiplanar graphs), our alternative proofs using the Density Formula lead to the first tight lower-bound examples.
We address the choice of penalty parameter in the Smoothness-Penalized Deconvolution (SPeD) method of estimating a probability density under additive measurement error. Cross-validation gives an unbiased estimate of the risk at the present sample size $n$ for a given penalty parameter, and this estimate can be minimized over the penalty parameter. Least-squares cross-validation, which has been proposed for the similar Deconvoluting Kernel Density Estimator (DKDE), performs quite poorly for SPeD. We instead estimate the risk function at a smaller sample size $n_1 < n$ for a given penalty parameter, use it to choose the penalty parameter for sample size $n_1$, and then use the asymptotics of the optimal penalty parameter to choose one for sample size $n$. In a simulation study, we find that this has dramatically better performance than cross-validation, improves on a SURE-type method previously proposed for this estimator, and compares favorably to the classic DKDE with its recommended plug-in method. We prove that the maximum error in estimating the risk function is of smaller order than its optimal rate of convergence.
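The subsample-then-rescale rule described above can be summarized in a short sketch. The exponent `a` governing the asymptotic decay of the optimal penalty is an assumed placeholder here; the paper derives the actual asymptotics for SPeD:

```python
import numpy as np

def choose_penalty(risk_hat_n1, lambdas, n1, n, a):
    """Sketch of the selection rule: (1) minimize an estimated risk
    curve computed at the smaller sample size n1 over a grid of penalty
    parameters, then (2) rescale the minimizer to the full sample size
    n, assuming the optimal penalty decays like lambda*(n) ~ n**(-a)
    (the exponent a is an assumption, not given in the abstract)."""
    lam_n1 = lambdas[np.argmin(risk_hat_n1)]   # best penalty at size n1
    return lam_n1 * (n / n1) ** (-a)           # extrapolate to size n
```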
We study the lift-and-project rank of the stable set polytopes of graphs with respect to the Lov\'{a}sz--Schrijver SDP operator $\text{LS}_+$, with a particular focus on finding and characterizing the smallest graphs with a given $\text{LS}_+$-rank (the least number of iterations of the $\text{LS}_+$ operator on the fractional stable set polytope to compute the stable set polytope). We introduce a generalized vertex-stretching operation that appears to be promising in generating $\text{LS}_+$-minimal graphs and study its properties. We also provide several new $\text{LS}_+$-minimal graphs, most notably the first known instances of $12$-vertex graphs with $\text{LS}_+$-rank $4$, which provides the first advance in this direction since Escalante, Montelar, and Nasini's discovery of a $9$-vertex graph with $\text{LS}_+$-rank $3$ in 2006.
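For background, the $\text{LS}_+$ operator underlying this rank is commonly defined as follows; this is a standard statement of the Lovász–Schrijver construction (up to normalization conventions), included here only for reference:

```latex
% Given a convex set K \subseteq [0,1]^n, let
%   cone(K) := \{ \lambda (1, x) : \lambda \ge 0,\ x \in K \} \subseteq \mathbb{R}^{n+1}.
\[
\text{LS}_+(K) := \left\{ x \in \mathbb{R}^n :
  \begin{aligned}
  &\exists\, Y \in \mathbb{S}^{n+1}_+ \text{ with } Y e_0 = \operatorname{diag}(Y) = (1, x)^\top,\\
  &Y e_i \in \operatorname{cone}(K),\ Y(e_0 - e_i) \in \operatorname{cone}(K)\ \ \forall i \in [n]
  \end{aligned}
\right\}.
\]
% The LS_+-rank of a graph G is then the least r with
% LS_+^r(\mathrm{FRAC}(G)) = \mathrm{STAB}(G), where FRAC(G) is the
% fractional stable set polytope and STAB(G) the stable set polytope.
```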
We study the lift-and-project rank of the stable set polytopes of graphs with respect to the Lov\'{a}sz--Schrijver SDP operator $\text{LS}_+$, with a particular focus on searching for relatively small graphs with high $\text{LS}_+$-rank (the least number of iterations of the $\text{LS}_+$ operator on the fractional stable set polytope needed to compute the stable set polytope). We provide families of graphs whose $\text{LS}_+$-rank is asymptotically a linear function of their number of vertices, which is the best possible up to improvements in the constant factor (the previous best result in this direction, from 1999, yielded graphs whose $\text{LS}_+$-rank grew only with the square root of the number of vertices).