We consider a graph coloring algorithm that processes the vertices in an order chosen uniformly at random and assigns colors using the First-Fit strategy. We show that, in expectation, this algorithm uses at most $(1 + o(1))\cdot \ln n \,/\, \ln\ln n$ different colors to color any forest with $n$ vertices. We also construct a family of forests showing that this bound is best possible.
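As an illustration, here is a minimal Python sketch of the procedure described above; the adjacency-list representation and function names are ours, not taken from the paper.

```python
import random

def random_order_first_fit(adj):
    """First-Fit coloring over a uniformly random vertex order.

    adj: dict mapping each vertex to an iterable of its neighbors.
    Returns a dict vertex -> color (colors are 0, 1, 2, ...).
    """
    order = list(adj)
    random.shuffle(order)          # uniformly random processing order
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:           # smallest color unused by already-colored neighbors
            c += 1
        color[v] = c
    return color

# Example: a path on 4 vertices (a forest).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(max(random_order_first_fit(path).values()) + 1)  # number of colors used
```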
Real-Time DEVS (RT-DEVS) can model systems with quantitative temporal requirements. Ensuring that such models satisfy temporal properties requires techniques beyond simulation. In this work we use the Uppaal model checker to verify a class of recurrent quantitative temporal properties appearing in RT-DEVS models. In addition, by introducing mutations into quantitative temporal properties we are able to find errors in RT-DEVS models and their implementations. A case study from the railway domain is presented.
Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to make the policy more robust to diverse environments, such comprehensiveness potentially detracts from the policy's performance in any specific environment, in line with the No Free Lunch theorem, leading to a suboptimal solution once deployed in the real world. To address this issue, we propose a lifelong policy adaptation framework named LoopSR, which utilizes a transformer-based encoder to project real-world trajectories into a latent space and accordingly reconstructs the real-world environments back in simulation for further improvement. An autoencoder architecture and contrastive learning are adopted to better extract the characteristics of real-world dynamics. The simulation parameters for continual training are derived by combining parameters predicted by the decoder with parameters retrieved from a dataset of simulated trajectories. By leveraging continual training, LoopSR achieves superior data efficiency compared with strong baselines, requiring only a limited amount of real-world data to attain strong performance in both sim-to-sim and sim-to-real experiments.
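Since the abstract gives no code, the following is a minimal, hypothetical PyTorch sketch of the idea as described: a transformer encoder pools a trajectory into a latent vector, a decoder head predicts simulation parameters, and these are blended with parameters retrieved by nearest-neighbor search in latent space. All module sizes, names, and the blending weight `alpha` are invented for illustration.

```python
import torch
import torch.nn as nn

class TrajEncoder(nn.Module):
    """Hypothetical transformer encoder mapping a trajectory (T, obs_dim) to a latent vector."""
    def __init__(self, obs_dim=32, d_model=64, n_params=8):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode_params = nn.Linear(d_model, n_params)  # decoder head predicting sim parameters

    def forward(self, traj):                               # traj: (B, T, obs_dim)
        z = self.encoder(self.embed(traj)).mean(dim=1)     # pooled latent (B, d_model)
        return z, self.decode_params(z)

def blend_parameters(pred, latent, sim_latents, sim_params, alpha=0.5):
    """Combine decoder-predicted parameters with parameters retrieved by
    nearest-neighbor lookup in latent space (the weighting alpha is a guess)."""
    idx = torch.cdist(latent, sim_latents).argmin(dim=1)   # nearest simulated trajectory
    return alpha * pred + (1 - alpha) * sim_params[idx]

# Toy usage with random tensors standing in for real and simulated trajectories.
enc = TrajEncoder()
z_real, p_pred = enc(torch.randn(2, 50, 32))
z_sim, p_sim = torch.randn(100, 64), torch.randn(100, 8)
print(blend_parameters(p_pred, z_real, z_sim, p_sim).shape)  # torch.Size([2, 8])
```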
MAX NAE-SAT is a natural optimization problem, closely related to its better-known relative MAX SAT. The approximability status of MAX NAE-SAT is almost completely understood if all clauses have the same size $k$, for some $k\ge 2$. We refer to this problem as MAX NAE-$\{k\}$-SAT. For $k=2$, it is essentially the celebrated MAX CUT problem. For $k=3$, it is related to the MAX CUT problem in graphs that can be fractionally covered by triangles. For $k\ge 4$, it is known that an approximation ratio of $1-\frac{1}{2^{k-1}}$, obtained by choosing a random assignment, is optimal, assuming $P\ne NP$. For every $k\ge 2$, an approximation ratio of at least $\frac{7}{8}$ can be obtained for MAX NAE-$\{k\}$-SAT. There was some hope, therefore, that there is also a $\frac{7}{8}$-approximation algorithm for MAX NAE-SAT, where clauses of all sizes are allowed simultaneously. Our main result is that there is no $\frac{7}{8}$-approximation algorithm for MAX NAE-SAT, assuming the unique games conjecture (UGC). In fact, even for almost satisfiable instances of MAX NAE-$\{3,5\}$-SAT (i.e., MAX NAE-SAT where all clauses have size $3$ or $5$), the best approximation ratio that can be achieved, assuming UGC, is at most $\frac{3(\sqrt{21}-4)}{2}\approx 0.8739$. Using calculus of variations, we extend the analysis of O'Donnell and Wu for MAX CUT to MAX NAE-$\{3\}$-SAT. We obtain an optimal algorithm, assuming UGC, for MAX NAE-$\{3\}$-SAT, slightly improving on previous algorithms. The approximation ratio of the new algorithm is $\approx 0.9089$. We complement our theoretical results with some experimental results. We describe an approximation algorithm for almost satisfiable instances of MAX NAE-$\{3,5\}$-SAT with a conjectured approximation ratio of 0.8728, and an approximation algorithm for almost satisfiable instances of MAX NAE-SAT with a conjectured approximation ratio of 0.8698.
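For context, the random-assignment ratio quoted above follows from a one-line calculation: a uniformly random assignment violates a NAE clause of size $k$ only when its $k$ literals are all true or all false, so
\[
\Pr[\text{clause of size } k \text{ is NAE-satisfied}] \;=\; 1-\frac{2}{2^{k}} \;=\; 1-\frac{1}{2^{k-1}},
\]
which equals $\frac{3}{4}$ for $k=3$ and $\frac{7}{8}$ for $k=4$.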
We develop and investigate a general theory of representations of second-order functionals, based on a notion of a right comodule for a monad on the category of containers. We show how the notion of comodule representability naturally subsumes classic representations of continuous functionals with well-founded trees. We find other kinds of representations by varying the monad, the comodule, and in some cases the underlying category of containers. Examples include uniformly continuous or finitely supported functionals, functionals querying their arguments precisely once, or at most once, functionals interacting with an ambient environment through computational effects, as well as functionals trivially representing themselves. Many of these rely on our construction of a monad on containers from a monad on shapes and a weak Mendler-style monad algebra on the universe for positions. We show that comodule representability on the category of propositional containers, which have positions valued in a universe of propositions, is closely related to instance reducibility in constructive mathematics, and through it to Weihrauch reducibility in computability theory.
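For readers unfamiliar with the terminology, recall the standard notion of a container (not specific to this paper): a type of shapes together with a family of positions, whose extension is a polynomial functor,
\[
S \lhd P \quad\text{with}\quad S : \mathsf{Type},\; P : S \to \mathsf{Type},
\qquad
\llbracket S \lhd P \rrbracket\, X \;=\; \sum_{s : S} \bigl(P(s) \to X\bigr).
\]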
Theoretical and empirical evidence suggests that joint graph embedding algorithms induce correlation across the networks in the embedding space. In the Omnibus joint graph embedding framework, previous results explicitly delineated the dual effects of the algorithm-induced and model-inherent correlations on the correlation across the embedded networks. Accounting for and mitigating the algorithm-induced correlation is key to subsequent inference, as sub-optimal Omnibus matrix constructions have been shown to lead to a loss of inference fidelity. This work presents the first efforts to automate the Omnibus construction in order to address two key questions in this joint embedding framework: the correlation-to-OMNI problem and the flat correlation problem. In the flat correlation problem, we seek to understand the minimum algorithm-induced flat correlation (i.e., the same across all graph pairs) produced by a generalized Omnibus embedding. Working in a subspace of the fully general Omnibus matrices, we prove both a lower bound for this flat correlation and that the classical Omnibus construction induces the maximal flat correlation. In the correlation-to-OMNI problem, we present an algorithm -- named corr2Omni -- that, from a given matrix of estimated pairwise graph correlations, estimates the matrix of generalized Omnibus weights that induces optimal correlation in the embedding space. Moreover, in both simulated and real data settings, we demonstrate the increased effectiveness of our corr2Omni algorithm relative to the classical Omnibus construction.
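For reference, below is a minimal NumPy sketch of the classical Omnibus construction (pairwise-averaged blocks) and its adjacency spectral embedding; the equal averaging weights are exactly what the generalized construction and corr2Omni replace, and the function names and embedding dimension are our own.

```python
import numpy as np

def classical_omnibus(adjacencies):
    """Stack m n-by-n adjacency matrices into the mn-by-mn omnibus matrix
    with blocks M[i][j] = (A_i + A_j) / 2 (so the diagonal blocks are A_i)."""
    m, n = len(adjacencies), adjacencies[0].shape[0]
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            M[i*n:(i+1)*n, j*n:(j+1)*n] = (adjacencies[i] + adjacencies[j]) / 2
    return M

def adjacency_spectral_embedding(M, d):
    """Embed via the top-d eigenpairs: rows of U * sqrt(|lambda|)."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Toy usage: two nearly identical 5-node graphs embedded jointly in 2 dimensions.
rng = np.random.default_rng(0)
A = np.triu((rng.random((5, 5)) < 0.4).astype(float), 1); A = A + A.T
B = A.copy(); B[0, 1] = B[1, 0] = 1 - B[0, 1]
X = adjacency_spectral_embedding(classical_omnibus([A, B]), d=2)
print(X.shape)  # (10, 2): one row per vertex per graph
```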
We propose a multi-tier paradigm to preserve various components of Morse-Smale complexes in lossy compressed scalar fields, including extrema, saddles, separatrices, and persistence diagrams. Existing error-bounded lossy compressors rarely consider preserving topological structures such as discrete Morse-Smale complexes, leading to significant inaccuracies in data interpretation and potentially to incorrect scientific conclusions. This paper focuses mainly on preserving Morse-Smale complexes in 2D or 3D discrete scalar fields by precisely preserving critical simplices and the separatrices that connect them. Our approach generates a series of edits at compression time, which are applied to the decompressed data to accurately reconstruct the complexes while keeping the error within prescribed bounds. We design a workflow that iteratively fixes critical simplices and separatrices in alternating steps until convergence, within finitely many iterations. Our approach addresses diverse application needs by offering users flexible options to balance compression efficiency and feature preservation. To enable effective integration with lossy compressors, we use GPU parallelism to enhance the performance of each workflow component. We conduct experiments on various datasets to demonstrate the effectiveness of our method in accurately preserving Morse-Smale complexes.
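A schematic rendering of that alternating loop, in Python, is given below; the repair and comparison routines are placeholders standing in for the paper's actual operators on critical simplices and separatrices.

```python
def alternating_fix(decompressed, reference, fix_critical, fix_separatrices,
                    complexes_match, max_iters=100):
    """Skeleton of the alternating repair loop described above (our rendering,
    not the authors' code): repeatedly apply edits that restore critical
    simplices, then edits that restore separatrices, until the Morse-Smale
    complexes of the edited data and the reference agree.

    fix_critical / fix_separatrices: callables returning (new_data, num_edits).
    complexes_match: callable comparing the two complexes.
    """
    data = decompressed
    for _ in range(max_iters):
        data, e1 = fix_critical(data, reference)       # step 1: repair critical simplices
        data, e2 = fix_separatrices(data, reference)   # step 2: repair separatrices
        if e1 + e2 == 0 or complexes_match(data, reference):
            break
    return data

# Toy usage with trivial stand-ins (real versions would edit scalar values
# within the prescribed error bound).
identity = lambda d, r: (d, 0)
print(alternating_fix([1.0, 2.0], [1.0, 2.0], identity, identity,
                      lambda d, r: d == r))
```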
Due to the diversity of scene text in font, color, shape, and size, accurately and efficiently detecting text remains a formidable challenge. Among the various detection approaches, segmentation-based methods have emerged as prominent contenders owing to their flexible pixel-level predictions. However, these methods typically model text instances in a bottom-up manner, which is highly susceptible to noise. In addition, pixels are predicted in isolation, without pixel-feature interaction, which also limits detection performance. To alleviate these problems, we propose a multi-information-level arbitrary-shaped text detector consisting of a focus entirety module (FEM) and a perceive environment module (PEM). The former extracts instance-level features and adopts a top-down scheme to model texts, reducing the influence of noise. Specifically, it assigns consistent entirety information to pixels within the same instance to improve their cohesion. In addition, it emphasizes scale information, enabling the model to distinguish texts of varying scales effectively. The latter extracts region-level information and encourages the model to focus on the distribution of positive samples in the vicinity of a pixel, thereby perceiving environment information. It treats the kernel pixels as positive samples and helps the model differentiate text and kernel features. Extensive experiments demonstrate the FEM's ability to efficiently support the model in handling texts of different scales, and confirm that the PEM can help perceive pixels more accurately by focusing on their vicinities. Comparisons show the proposed model outperforms existing state-of-the-art approaches on four public datasets.
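As a toy illustration of assigning a shared instance-level representation to all pixels of one text instance (not the authors' FEM, just a masked-pooling sketch in PyTorch with invented shapes):

```python
import torch

def broadcast_instance_features(pixel_feats, instance_ids):
    """Give every pixel the mean feature of its text instance, so pixels in
    one instance share a consistent instance-level ("entirety") representation.

    pixel_feats: (N, C) per-pixel features; instance_ids: (N,) integer labels.
    """
    out = pixel_feats.clone()
    for k in instance_ids.unique():
        mask = instance_ids == k
        out[mask] = pixel_feats[mask].mean(dim=0)  # pooled instance feature
    return out

feats = torch.randn(6, 4)
ids = torch.tensor([0, 0, 1, 1, 1, 2])
print(broadcast_instance_features(feats, ids).shape)  # torch.Size([6, 4])
```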
Current emotional text-to-speech (TTS) systems face challenges in mimicking a broad spectrum of human emotions due to the inherent complexity of emotions and limitations in emotional speech datasets and models. This paper proposes a TTS framework that facilitates control over pleasure, arousal, and dominance, and can synthesize a diversity of emotional styles without requiring any emotional speech data during TTS training. We train an emotional attribute predictor using only categorical labels from speech data, aligning with psychological research and incorporating anchored dimensionality reduction on self-supervised learning (SSL) features. The TTS framework converts text inputs into phonetic tokens via an autoregressive language model and uses pseudo-emotional dimensions to guide the parallel prediction of fine-grained acoustic details. Experiments conducted on the LibriTTS dataset demonstrate that our framework can synthesize speech with enhanced naturalness and a variety of emotional styles by effectively controlling emotional dimensions, even without the inclusion of any emotional speech during TTS training.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models introduce new architectures or tweak existing ones, refine training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
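As a concrete illustration of such a deep-feature distance, here is a minimal PyTorch sketch using an ImageNet-pretrained VGG-16 from torchvision; the layer selection, normalization, and equal layer weighting are our own choices for illustration, not the paper's calibrated metric.

```python
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
layers = {3, 8, 15, 22, 29}  # outputs of selected ReLU stages (our choice)

def deep_feature_distance(x, y):
    """Average, over selected layers, the mean squared difference of
    channel-normalized VGG activations for two image batches (B, 3, H, W).
    ImageNet input normalization is omitted here for brevity."""
    dist = 0.0
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x, y = layer(x), layer(y)
            if i in layers:
                xn = x / (x.norm(dim=1, keepdim=True) + 1e-8)  # unit-normalize channel vectors
                yn = y / (y.norm(dim=1, keepdim=True) + 1e-8)
                dist = dist + ((xn - yn) ** 2).mean()
    return dist / len(layers)

a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(float(deep_feature_distance(a, b)))
```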