Certain statistical models are capable of interpreting input strings as instructions, or prompts, and carrying out tasks based on them. Many approaches to prompting and pre-training these models involve the automated generation of these prompts. We call these approaches meta-prompting, or prompting to obtain prompts. We propose a theoretical framework based on category theory to generalize and describe them. This framework is flexible enough to account for LLM stochasticity, and it allows us to obtain formal results on the task agnosticity and equivalence of various meta-prompting approaches. We experiment with meta-prompting in two active areas of model research: creativity and ideation. We find that user preference favors (p < 0.01) the prompts generated under meta-prompting, as well as their corresponding outputs, over a series of hardcoded baseline prompts that include the original task prompt. Using our framework, we argue that meta-prompting is more effective than basic prompting at generating desirable outputs.
Subsumption resolution is an expensive but highly effective simplifying inference for first-order saturation theorem provers. We present a new SAT-based reasoning technique for subsumption resolution, without requiring radical changes to the underlying saturation algorithm. We implemented our work in the theorem prover Vampire, and show that it is noticeably faster than the state of the art.
Independent parallel q-ary symmetric channels are a suitable transmission model for several applications. The proposed weighted-Hamming metric is tailored to this setting and enables optimal decoding performance. We show that some weighted-Hamming-metric codes exhibit the unusual property that all errors beyond half the minimum distance can be corrected. Nevertheless, a tight relation between the error-correction capability of a code and its minimum distance can be established. Generalizing their Hamming-metric counterparts, upper and lower bounds on the cardinality of a code with a given weighted-Hamming distance are obtained. Finally, we propose a simple code construction with optimal minimum distance for specific parameters.
Conformal prediction (CP) is a method for constructing a prediction interval around the output of a fitted model, whose validity does not rely on the model being correct: the CP interval offers a coverage guarantee that is distribution-free, but relies on the training data being drawn from the same distribution as the test data. A recent variant, weighted conformal prediction (WCP), reweights the method to allow for covariate shift between the training and test distributions. However, WCP requires knowledge of the nature of the covariate shift, specifically the likelihood ratio between the test and training covariate distributions. In practice, since this likelihood ratio is estimated rather than known exactly, the coverage guarantee may degrade due to the estimation error. In this paper, we consider a special scenario where observations belong to a finite number of groups, and these groups determine the covariate shift between the training and test distributions; for instance, this may arise if the training set is collected via stratified sampling. Our results demonstrate that in this special case, the predictive coverage guarantees of WCP can be drastically improved beyond the bounds given by existing estimation error bounds.
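To make the group-based setting concrete, the following is a minimal sketch of the standard weighted conformal threshold when the likelihood ratio depends only on group membership. It is not the refined analysis the abstract describes. The function names (`group_weights`, `weighted_conformal_threshold`), the nonconformity scores, and the assumption that the test-time group proportions are supplied by the user are all hypothetical choices made for illustration.

```python
import math


def group_weights(train_groups, test_group_probs):
    """Estimate w(g) = P_test(g) / P_train(g) for each group seen in training.

    P_train is estimated by empirical frequencies; P_test is assumed to be
    given (an assumption of this sketch, not something stated in the text).
    """
    n = len(train_groups)
    counts = {}
    for g in train_groups:
        counts[g] = counts.get(g, 0) + 1
    return {g: test_group_probs.get(g, 0.0) / (c / n) for g, c in counts.items()}


def weighted_conformal_threshold(scores, groups, test_group, weights, alpha):
    """Weighted (1 - alpha) quantile of the calibration scores.

    The test point contributes its own weight as a point mass at +infinity,
    which is why the function can return math.inf.
    """
    w_cal = [weights[g] for g in groups]
    w_test = weights[test_group]
    total = sum(w_cal) + w_test
    cum = 0.0
    for score, w in sorted(zip(scores, w_cal)):
        cum += w
        if cum >= (1 - alpha) * total:
            return score
    return math.inf


# Hypothetical calibration data: group "b" is under-sampled and has larger scores.
cal_groups = ["a"] * 6 + ["b"] * 3
cal_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# No shift: test proportions equal the training frequencies, so all weights are 1.
w_same = group_weights(cal_groups, {"a": 6 / 9, "b": 3 / 9})
t_same = weighted_conformal_threshold(cal_scores, cal_groups, "b", w_same, alpha=0.25)

# Shift toward group "b": its calibration scores count for more.
w_shift = group_weights(cal_groups, {"a": 0.5, "b": 0.5})
t_shift = weighted_conformal_threshold(cal_scores, cal_groups, "b", w_shift, alpha=0.25)
```

With all weights equal, the threshold reduces to the usual split-conformal quantile; shifting test mass toward the group with larger scores raises the threshold, which widens the resulting interval.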
Despite advances in generative methods, accurately modeling the distribution of graphs remains a challenging task, primarily because graphs lack a predefined or inherently unique representation. Two main strategies have emerged to tackle this issue: 1) restricting the number of possible representations by sorting the nodes, or 2) using permutation-invariant/equivariant functions, specifically Graph Neural Networks (GNNs). In this paper, we introduce a new framework named Discrete Graph Auto-Encoder (DGAE), which leverages the strengths of both strategies and mitigates their respective limitations. In essence, we propose a two-step strategy. We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations, each node being represented by a sequence of quantized vectors. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model based on the Transformer architecture. Through multiple experimental evaluations, we demonstrate the competitive performance of our model in comparison to the existing state-of-the-art across various datasets. Various ablation studies support the design choices of our method.
The training datasets used in long-tailed recognition are extremely unbalanced, resulting in significant variation in per-class accuracy across categories. Prior works mostly used average accuracy to evaluate their algorithms, which easily ignores the worst-performing categories. In this paper, we aim to enhance the accuracy of the worst-performing categories and utilize the harmonic mean and geometric mean to assess the model's performance. We revive the balanced undersampling idea to achieve this goal. The balanced subsets are few-shot in size and will surely under-fit, hence this idea is not used in modern long-tailed learning. However, we find that it produces a more equitable distribution of accuracy across categories, with much higher harmonic and geometric mean accuracy but lower average accuracy. Moreover, we devise a straightforward model ensemble strategy, which does not result in any additional overhead and achieves improved harmonic and geometric mean while keeping the average accuracy almost intact when compared to state-of-the-art long-tailed learning methods. We validate the effectiveness of our approach on widely utilized benchmark datasets for long-tailed learning. Our code is at \href{//github.com/yuhao318/BTM/}{//github.com/yuhao318/BTM/}.
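The difference between the three summary metrics can be illustrated with a small sketch. The per-class accuracies below are invented for illustration and are not results from the paper; the helper name `summarize_per_class_accuracy` is likewise hypothetical.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def summarize_per_class_accuracy(per_class_acc):
    """Return arithmetic, geometric, and harmonic means of per-class accuracy.

    Assumes every accuracy is strictly positive, since the geometric mean
    is not defined otherwise in this implementation.
    """
    return {
        "arithmetic": fmean(per_class_acc),
        "geometric": geometric_mean(per_class_acc),
        "harmonic": harmonic_mean(per_class_acc),
    }


# Hypothetical models: one is even across classes, one fails on a single class.
even_model = summarize_per_class_accuracy([0.70, 0.70, 0.70, 0.70])
skewed_model = summarize_per_class_accuracy([0.95, 0.95, 0.95, 0.05])

for name, summary in [("even", even_model), ("skewed", skewed_model)]:
    print(name, {k: round(v, 3) for k, v in summary.items()})
```

In this made-up example the skewed model has the higher average accuracy, yet its harmonic and geometric means are far lower, because both are dominated by the single poorly recognized class. This is the behavior that makes them suitable for tracking worst-performing categories.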
For turbulent problems of industrial scale, computational cost may become prohibitive due to the stability constraints associated with explicit time discretization of the underlying conservation laws. On the other hand, implicit methods allow for larger time-step sizes but require exorbitant computational resources. Implicit-explicit (IMEX) formulations combine both temporal approaches, using an explicit method in nonstiff portions of the domain and an implicit method in stiff portions. While these methods can be shown to be orders of magnitude faster than typical explicit discretizations, they are still limited by their implicit discretization in terms of cost. Hybridization reduces the scaling of these systems to an effective lower dimension, which allows the system to be solved at significant speedup factors compared to standard implicit methods. This work proposes an IMEX scheme that combines hybridized and standard flux reconstruction (FR) methods to tackle geometry-induced stiffness. By using the so-called transmission conditions, an overall conservative formulation can be obtained after combining both explicit FR and hybridized implicit FR methods. We verify and apply our approach to a series of numerical examples, including a multi-element airfoil at Reynolds number 1.7 million. Results demonstrate speedup factors of four against standard IMEX formulations and at least 15 against standard explicit formulations for the same problem.
Battery-constrained power consumption, compute limitations, and high frame rate requirements in head-mounted displays present unique challenges in the drive to present increasingly immersive and comfortable imagery in virtual reality. However, humans are not equally sensitive to all regions of the visual field, and perceptually-optimized rendering techniques are increasingly utilized to address these bottlenecks. Many of these techniques are gaze-contingent and often render reduced detail away from a user's fixation. Such techniques are dependent on spatio-temporally-accurate gaze tracking and can result in obvious visual artifacts when eye tracking is inaccurate. In this work we present a gaze-contingent rendering technique which only requires saccade detection, bypassing the need for highly-accurate eye tracking. In our first experiment, we show that visual acuity is reduced for several hundred milliseconds after a saccade. In our second experiment, we use these results to reduce the rendered image resolution after saccades in a controlled psychophysical setup, and find that observers cannot discriminate between saccade-contingent reduced-resolution rendering and full-resolution rendering. Finally, in our third experiment, we introduce a 90 pixels per degree headset and validate our saccade-contingent rendering method under typical VR viewing conditions.
The transformer architecture has shown remarkable success in various domains, such as natural language processing and computer vision. When it comes to graph learning, transformers are required not only to capture the interactions between pairs of nodes but also to preserve graph structures connoting the underlying relations and proximity between them, showing the expressive power to capture different graph structures. Accordingly, various structure-preserving graph transformers have been proposed and widely used for various tasks, such as graph-level tasks in bioinformatics and chemoinformatics. However, strategies related to graph structure preservation have not been well organized and systematized in the literature. In this paper, we provide a comprehensive overview of structure-preserving graph transformers and generalize these methods from the perspective of their design objective. First, we divide strategies into four main groups: node feature modulation, context node sampling, graph rewriting, and transformer architecture improvements. We then further divide the strategies according to the coverage and goals of graph structure preservation. Furthermore, we also discuss challenges and future directions for graph transformer models to preserve the graph structure and understand the nature of graphs.
Federated learning enables multiple parties to collaboratively train a machine learning model without communicating their local data. A key challenge in federated learning is to handle the heterogeneity of local data distribution across parties. Although many methods have been proposed to address this challenge, we find that they fail to achieve high performance in image datasets with deep learning models. In this paper, we propose MOON: model-contrastive federated learning. MOON is a simple and effective federated learning framework. The key idea of MOON is to utilize the similarity between model representations to correct the local training of individual parties, i.e., conducting contrastive learning at the model level. Our extensive experiments show that MOON significantly outperforms the other state-of-the-art federated learning algorithms on various image classification tasks.
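One way a model-level contrastive term could look is sketched below. The abstract does not spell out the loss, so the specifics here are assumptions made for illustration: the representation from the global model is treated as the positive, the representation from the party's previous local model as the negative, similarity is cosine similarity, and `temperature` is a hypothetical hyperparameter. The function names are also hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Contrastive loss over representations of the same input from three models.

    Assumed roles (not stated in the abstract): z_global is the positive,
    z_prev (previous local model) is the negative.
    """
    pos = math.exp(cosine_similarity(z_local, z_global) / temperature)
    neg = math.exp(cosine_similarity(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D representations of one input.
aligned = model_contrastive_loss([1.0, 0.0], z_global=[1.0, 0.0], z_prev=[-1.0, 0.0])
drifted = model_contrastive_loss([1.0, 0.0], z_global=[-1.0, 0.0], z_prev=[1.0, 0.0])
neutral = model_contrastive_loss([1.0, 0.0], z_global=[0.0, 1.0], z_prev=[0.0, 1.0])
```

Under these assumptions the loss is small when the local representation agrees with the global one and large when it stays close to the previous local model, so adding it to the supervised loss would nudge local training toward the global representation.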
The essence of multivariate sequential learning lies in extracting dependencies in data. These data sets, such as hourly medical records in intensive care units and multi-frequency phonetic time series, often exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.