The problem PosSLP is the problem of determining whether a given straight-line program (SLP) computes a positive integer. PosSLP was introduced by Allender et al. (2009) to study the complexity of numerical analysis. PosSLP can also be reformulated as the problem of deciding whether the integer computed by a given SLP can be expressed as the sum of squares of four integers, based on Lagrange's well-known theorem of 1770 that every natural number can be represented as the sum of four non-negative integer squares. In this paper, we explore several natural extensions of this problem by investigating whether the positive integer computed by a given SLP can be written as the sum of squares of two or three integers. We study the complexity of these variants and relate it to the complexity of the original PosSLP problem. Additionally, we introduce an intriguing new problem called Div2SLP and illustrate how Div2SLP is connected to DegSLP and to the problem of deciding whether an SLP computes an integer expressible as the sum of three squares. By clarifying the connections between these problems, our results offer a deeper understanding of decision problems associated with SLPs and open avenues for further exciting research.
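To make the setting concrete, here is a minimal Python sketch (not from the paper) of an SLP over the integers together with a brute-force positivity check; the instruction encoding is an illustrative assumption. It also shows why naive evaluation is infeasible in general: n instructions of repeated squaring already produce values around 2^(2^n).

```python
# A minimal sketch of a straight-line program (SLP) over the integers and a
# brute-force positivity check. An SLP is a sequence of instructions, each
# producing a new value from the constant 1 or from earlier values via +, -, *.

from typing import List, Tuple

# Each instruction is (op, i, j): apply op to the values at indices i and j.
# Index -1 denotes the constant 1 (an encoding chosen here for illustration).
Instr = Tuple[str, int, int]

def eval_slp(program: List[Instr]) -> int:
    """Evaluate an SLP with exact integer arithmetic."""
    vals: List[int] = []

    def operand(k: int) -> int:
        return 1 if k == -1 else vals[k]

    for op, i, j in program:
        a, b = operand(i), operand(j)
        if op == "+":
            vals.append(a + b)
        elif op == "-":
            vals.append(a - b)
        elif op == "*":
            vals.append(a * b)
        else:
            raise ValueError(f"unknown op {op!r}")
    return vals[-1]

def pos_slp(program: List[Instr]) -> bool:
    """Brute-force PosSLP: exponential time in the worst case, since n
    instructions of repeated squaring yield values around 2**(2**n)."""
    return eval_slp(program) > 0

# Example: compute ((1+1)^2)^2 - 1 = 15, which is positive.
prog = [("+", -1, -1), ("*", 0, 0), ("*", 1, 1), ("-", 2, -1)]
assert pos_slp(prog)
```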
Large Language Models have become an integral part of new intelligent and interactive writing assistants. Many are offered commercially with a chatbot-like UI, such as ChatGPT, and provide little information about their inner workings. This makes this new, widespread type of system a potential target for deceptive design patterns. For example, such assistants might exploit hidden costs by providing guidance up until a certain point before asking for a fee to see the rest. As another example, they might sneak unwanted content or edits into longer generated or revised text pieces (e.g., to influence the expressed opinion). With these and other examples, we conceptually transfer several deceptive patterns from the literature to the new context of AI writing assistants. Our goal is to raise awareness and encourage future research into how the UI and interaction design of such systems can impact people and their writing.
Conformal prediction is a generic methodology for finite-sample valid distribution-free prediction. This technique has garnered a lot of attention in the literature partly because it can be applied with any machine learning algorithm that provides point predictions to yield valid prediction regions. Of course, the efficiency (width/volume) of the resulting prediction region depends on the performance of the machine learning algorithm. In the context of point prediction, several techniques (such as cross-validation) exist to select one of many machine learning algorithms for better performance. In contrast, such selection techniques are seldom discussed in the context of set prediction (or prediction regions). In this paper, we consider the problem of obtaining the smallest conformal prediction region given a family of machine learning algorithms. We provide two general-purpose selection algorithms and consider coverage as well as width properties of the final prediction region. The first selection method yields the smallest width prediction region among the family of conformal prediction regions for all sample sizes but only has an approximate coverage guarantee. The second selection method has a finite-sample coverage guarantee but attains a width only close to the smallest. The approximately optimal width of the second method is quantified via an oracle inequality. As an illustration, we consider the use of aggregation of non-parametric regression estimators in the split conformal method with the absolute residual conformal score.
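As a concrete illustration, the following minimal Python sketch implements the split conformal method with the absolute residual score for a single, fixed predictor; the synthetic data and the least-squares model are illustrative assumptions, and the paper's selection algorithms over a family of predictors are not shown.

```python
# A minimal sketch of split conformal prediction with absolute residual scores.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 2x + noise.
n = 200
x = rng.uniform(-3, 3, size=n)
y = 2 * x + rng.normal(scale=0.5, size=n)

# Split into a training half (fit the point predictor) and a
# calibration half (compute conformal scores).
idx = rng.permutation(n)
train, calib = idx[: n // 2], idx[n // 2 :]

# Point predictor: least-squares line fit on the training split.
slope, intercept = np.polyfit(x[train], y[train], deg=1)
predict = lambda t: slope * t + intercept

# Absolute residual conformal scores on the calibration split.
scores = np.abs(y[calib] - predict(x[calib]))

# Half-width of the region at miscoverage level alpha: the
# ceil((m+1)(1-alpha))-th smallest score gives >= 1 - alpha marginal coverage.
alpha = 0.1
m = len(scores)
q = np.sort(scores)[int(np.ceil((m + 1) * (1 - alpha))) - 1]

# Prediction interval at a new point x0: predict(x0) +/- q.
x0 = 1.5
print(f"interval at {x0}: [{predict(x0) - q:.2f}, {predict(x0) + q:.2f}]")
```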
This work analyzes the unobservable directions of the nonlinear models of the Vision-aided Inertial Navigation System (VINS) and the Lidar-aided Inertial Navigation System (LINS). Under the assumption that the camera observes two features without occlusion, the unobservable directions of VINS are uniformly the global translations and the global rotation about the gravity vector. The unobservable directions of LINS are the same as those of VINS, while only one feature needs to be observed. In addition, a constraint used in Observability-Constrained VINS (OC-VINS) is proved.
A key task in the context of consistent query answering is to count the number of repairs that entail the query, with the ultimate goal being a precise data complexity classification. This has been achieved in the case of primary keys and self-join-free conjunctive queries (CQs) via an FP/#P-complete dichotomy. We lift this result to the more general case of functional dependencies (FDs). Another important task in this context is, whenever the counting problem in question is intractable, to classify it as approximable, i.e., the target value can be efficiently approximated with error guarantees via a fully polynomial-time randomized approximation scheme (FPRAS), or as inapproximable. Although for primary keys and CQs (even with self-joins) the problem is always approximable, we prove that this is not the case for FDs. We show, however, that the class of FDs with a left-hand side chain forms an island of approximability. We see these results, apart from being interesting in their own right, as crucial steps towards a complete classification of approximate counting of repairs in the case of FDs and self-join-free CQs.
Face morphing is a problem in computer graphics with numerous artistic and forensic applications. It is challenging due to variations in pose, lighting, gender, and ethnicity. The task consists of a warping for feature alignment and a blending for a seamless transition between the warped images. We propose to leverage coordinate-based neural networks to represent such warpings and blendings of face images. During training, we exploit the smoothness and flexibility of such networks by combining energy functionals employed in classical approaches without discretizations. Additionally, our method is time-dependent, allowing a continuous warping/blending of the images. During morphing inference, we need both the direct and the inverse transformations of the time-dependent warping. The first (second) is responsible for warping the target (source) image into the source (target) image. Our neural warping stores both maps in a single network, dismissing the need to invert them. The results of our experiments indicate that our method is competitive with both classical and generative models under the lens of image quality and face-morphing detectors. Aesthetically, the resulting images present a seamless blending of diverse faces, which is not yet usual in the literature.
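For context, here is a minimal Python sketch of the classical warp-and-blend formulation that the abstract builds on, with synthetic images and a hand-crafted displacement field standing in for the paper's learned, time-dependent neural warping.

```python
# A minimal sketch of classical warp-then-blend face morphing (not the
# paper's neural representation). Images and warp field are synthetic.

import numpy as np
from scipy.ndimage import map_coordinates

H = W = 64
src = np.linspace(0, 1, W)[None, :].repeat(H, 0)  # horizontal ramp "image"
tgt = np.linspace(0, 1, H)[:, None].repeat(W, 1)  # vertical ramp "image"

yy, xx = np.mgrid[0:H, 0:W].astype(float)

# A toy displacement field aligning "features" of src with tgt; in practice
# it comes from landmark correspondences or, in the paper, a network.
dx = 4 * np.sin(2 * np.pi * yy / H)
dy = np.zeros_like(dx)

def warp(img, frac):
    """Backward-warp img by a fraction `frac` of the displacement field."""
    coords = np.stack([yy + frac * dy, xx + frac * dx])
    return map_coordinates(img, coords, order=1, mode="nearest")

def morph(t):
    """Morph at time t in [0, 1]: warp both images toward the intermediate
    configuration, then cross-dissolve."""
    warped_src = warp(src, t)        # source moves forward with t
    warped_tgt = warp(tgt, t - 1.0)  # target moves backward with 1 - t
    return (1 - t) * warped_src + t * warped_tgt

frame = morph(0.5)
print(frame.shape, frame.min(), frame.max())
```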
The Zak-OTFS input/output (I/O) relation is predictable and non-fading when the delay and Doppler periods are greater than the effective channel delay and Doppler spreads, a condition which we refer to as the crystallization condition. The filter taps can simply be read off from the response to a single Zak-OTFS point (impulse) pulsone waveform, and the I/O relation can be reconstructed for a sampled system that operates under finite duration and bandwidth constraints. Predictability opens up the possibility of a model-free mode of operation. The time-domain realization of a Zak-OTFS point pulsone is a pulse train modulated by a tone, hence the name pulsone. The Peak-to-Average Power Ratio (PAPR) of a pulsone is about $15$ dB, and we describe a general method for constructing a spread pulsone for which the time-domain realization has a PAPR of about $6$ dB. We construct the spread pulsone by applying a type of discrete spreading filter to a Zak-OTFS point pulsone. The self-ambiguity function of the point pulsone is supported on the period lattice ${\Lambda}_{p}$, and by applying a discrete chirp filter, we obtain a spread pulsone with a self-ambiguity function that is supported on a rotated lattice ${\Lambda^*}$. We show that if the channel satisfies the crystallization conditions with respect to ${\Lambda^*}$ then the effective DD domain filter taps can simply be read off from the cross-ambiguity between the channel response to the spread pulsone and the transmitted spread pulsone. If, in addition, the channel satisfies the crystallization conditions with respect to the period lattice ${\Lambda}_{p}$, then in an OTFS frame consisting of a spread pilot pulsone and point data pulsones, after cancelling the received signal corresponding to the spread pulsone, we can recover the channel response to any data pulsone.
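The following toy Python sketch (an idealized discrete model, not the paper's Zak-OTFS construction) shows the PAPR mechanism: an unshaped impulse train modulated by a tone has a high PAPR, and circular convolution with a discrete chirp flattens its envelope. Finite-bandwidth pulse shaping, which this sketch omits, is what produces the roughly $15$ dB and $6$ dB figures reported above.

```python
# A toy numeric sketch of how chirp spreading flattens a pulsone's
# time-domain envelope. A point pulsone is modeled as a pulse train
# modulated by a tone; the spread version is its circular convolution
# with a discrete chirp.

import numpy as np

def papr_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

N, M = 31, 16            # pulses per frame, samples between pulses
L = N * M
n = np.arange(L)

# Point pulsone: impulse train with period M, modulated by a tone.
pulsone = np.zeros(L, dtype=complex)
pulsone[::M] = np.exp(2j * np.pi * 3 * n[::M] / L)

# Discrete chirp filter (L-periodic since L is even); circular
# convolution spreads each pulse across the whole frame.
chirp = np.exp(1j * np.pi * n**2 / L)
spread = np.fft.ifft(np.fft.fft(pulsone) * np.fft.fft(chirp))

print(f"point  pulsone PAPR: {papr_db(pulsone):.1f} dB")  # ~12 dB here
print(f"spread pulsone PAPR: {papr_db(spread):.1f} dB")   # ~0 dB, idealized
```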
Revolutionizing the field of deep learning, Transformer-based models have achieved remarkable performance in many tasks. Recent research has recognized that these models are robust to shuffling, but such analyses are limited to inter-token permutation in the forward propagation. In this work, we propose our definition of permutation equivariance, a broader concept covering both inter- and intra-token permutation in the forward and backward propagation of neural networks. We rigorously prove that this permutation equivariance property can be satisfied by most vanilla Transformer-based models with almost no adaptation. We examine the property over a range of state-of-the-art models, including ViT, Bert, GPT, and others, with experimental validation. Further, as a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property, which suggests wider, intriguing application scenarios.
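As a small concrete check, the following Python sketch verifies inter-token permutation equivariance for a single self-attention layer without positional encodings; it is a toy stand-in for the broader forward-and-backward property defined in the paper.

```python
# Checking inter-token permutation equivariance of one self-attention layer:
# permuting the input tokens permutes the output tokens identically.

import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension
T = 5          # number of tokens

# Random attention weights (shared across token positions).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X: (T, d) token embeddings -> (T, d) outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V

X = rng.normal(size=(T, d))
perm = rng.permutation(T)

# f(X)[perm] == f(X[perm]) since A becomes P A P^T and V becomes P V.
assert np.allclose(self_attention(X)[perm], self_attention(X[perm]))
print("inter-token permutation equivariance holds")
```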
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) work on improving model performance through data processing, (2) work on improving model performance through interventional model training, and (3) the design of independent human-in-the-loop systems. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we briefly classify and discuss their use in natural language processing, computer vision, and other areas. Besides, we present some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop research and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. It is therefore natural that they have attracted much interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review on these variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.