In this paper we present a quantifier elimination method for conjunctions of linear real arithmetic constraints. Our algorithm is based on the Fourier-Motzkin variable elimination procedure, but by case splitting we are able to reduce the worst-case complexity from doubly to singly exponential. The adaption of the procedure for SMT solving has strong correspondence to the simplex algorithm, therefore we name it FMplex. Besides the theoretical foundations, we provide an experimental evaluation in the context of SMT solving. This is an extended version of the authors' work previously published at the fourteenth International Symposium on Games, Automata, Logics, and Formal Verification (GandALF 2023).
Fair and trustworthy AI is becoming ever more important in both machine learning and legal domains. One important consequence is that decision makers must seek to guarantee a 'fair', i.e., non-discriminatory, algorithmic decision procedure. However, there are several competing notions of algorithmic fairness that have been shown to be mutually incompatible under realistic factual assumptions. This concerns, for example, the widely used fairness measures of 'calibration within groups' and 'balance for the positive/negative class'. In this paper, we present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between these three fairness criteria. Thus, an initially unfair prediction can be remedied to, at least partially, meet a desired, weighted combination of the respective fairness conditions. We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector. Finally, we discuss to what extent FAIM can be harnessed to comply with conflicting legal obligations. The analysis suggests that it may operationalize duties in traditional legal fields, such as credit scoring and criminal justice proceedings, but also for the latest AI regulations put forth in the EU, like the Digital Markets Act and the recently enacted AI Act.
In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or rely on subtitles due to language barriers. For instance, when a presenter reads Euler's Formula, current Automatic Speech Recognition (ASR) models often produce a verbose and error-prone textual description (e.g., e to the power of i x equals cosine of x plus i $\textit{side}$ of x), instead of the concise $\LaTeX{}$ format (i.e., $ e^{ix} = \cos(x) + i\sin(x) $), which hampers clear understanding and communication. To address this issue, we introduce MathSpeech, a novel pipeline that integrates ASR models with small Language Models (sLMs) to correct errors in mathematical expressions and accurately convert spoken expressions into structured $\LaTeX{}$ representations. Evaluated on a new dataset derived from lecture recordings, MathSpeech demonstrates $\LaTeX{}$ generation capabilities comparable to leading commercial Large Language Models (LLMs), while leveraging fine-tuned small language models of only 120M parameters. Specifically, in terms of CER, BLEU, and ROUGE scores for $\LaTeX{}$ translation, MathSpeech demonstrated significantly superior capabilities compared to GPT-4o. We observed a decrease in CER from 0.390 to 0.298, and higher ROUGE/BLEU scores compared to GPT-4o.
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
In this paper, we introduce AceMath, a suite of frontier math models that excel in solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o and Claude-3.5 Sonnet. To develop math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. After that, we present a systematic approach to build our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, when combining AceMath-72B-Instruct with AceMath-72B-RM, we achieve the highest average rm@8 score across the math reasoning benchmarks. We will release model weights, training data, and evaluation benchmarks at: //research.nvidia.com/labs/adlr/acemath
Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that whatever the technology is, a bit requires a minimum volume and communication travels at a bounded speed. These imply that multiplying $n \times n$ matrices takes $\Omega(n^{2/3})$ time, and we show that this can be achieved by a fine-grained 3-d mesh of $n^2$ processors. While the constants are impractically large, this is asymptotically faster than parallel implementations of Strassen's algorithm, while the lower bound shows that some claims about parallelizing faster serial algorithms are impossible in 3-space. If the matrices are not over a ring then multiplication can be done in $\Theta(n^{3/4})$ time by expanding to a mesh larger than the input. In 2-d (such as the surface of a chip) this approach is useless and $\Theta(n)$ systolic algorithms are optimal even when the matrices are over a ring. Similarly, for path and maze problems there are approaches useful in 3-d but not 2-d.
Given an input video of a person and a new garment, the objective of this paper is to synthesize a new video where the person is wearing the specified garment while maintaining spatiotemporal consistency. Although significant advances have been made in image-based virtual try-on, extending these successes to video often leads to frame-to-frame inconsistencies. Some approaches have attempted to address this by increasing the overlap of frames across multiple video chunks, but this comes at a steep computational cost due to the repeated processing of the same frames, especially for long video sequences. To tackle these challenges, we reconceptualize video virtual try-on as a conditional video inpainting task, with garments serving as input conditions. Specifically, our approach enhances image diffusion models by incorporating temporal attention layers to improve temporal coherence. To reduce computational overhead, we propose ShiftCaching, a novel technique that maintains temporal consistency while minimizing redundant computations. Furthermore, we introduce the TikTokDress dataset, a new video try-on dataset featuring more complex backgrounds, challenging movements, and higher resolution compared to existing public datasets. Extensive experiments demonstrate that our approach outperforms current baselines, particularly in terms of video consistency and inference speed. The project page is available at //swift-try.github.io/.
In order to make the foundation model more efficient and effective, our idea is combining sequence transformation and state transformation. First, we prove the availability of rotary position embedding in the state space duality algorithm, which reduces the perplexity of the hybrid quadratic causal self-attention and state space duality by more than 4%, to ensure that the combining sequence transformation unifies position encoding. Second, we propose dynamic mask attention, which maintains 100% accuracy in the more challenging multi-query associative recall task, improving by more than 150% compared to quadratic causal self-attention and state space duality, to ensure that the combining sequence transformation selectively filters relevant information. Third, we design cross domain mixture of experts, which makes the computational speed of expert retrieval with more than 1024 experts 8 to 10 times faster than the mixture of experts, to ensure that the combining state transformation quickly retrieval mixture. Finally, we summarize these matrix algorithms that can form the foundation model: Wonderful Matrices, which can be a competitor to popular model architectures.
Invariants withstand transformations and, therefore, represent the essence of objects or phenomena. In mathematics, transformations often constitute a group action. Since the 19th century, studying the structure of various types of invariants and designing methods and algorithms to compute them remains an active area of ongoing research with an abundance of applications. In this incredibly vast topic, we focus on two particular themes displaying a fruitful interplay between the differential and algebraic invariant theories. First, we show how an algebraic adaptation of the moving frame method from differential geometry leads to a practical algorithm for computing a generating set of rational invariants. Then we discuss the notion of differential invariant signature, its role in solving equivalence problems in geometry and algebra, and some successes and challenges in designing algorithms based on this notion.
Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.
Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.