We show, using three empirical applications, that linear regression estimates that rely on the assumption of sparsity are fragile in two ways. First, we document that different choices of the regressor matrix that do not affect ordinary least squares (OLS) estimates, such as the choice of baseline category with categorical controls, can move sparsity-based estimates by two standard errors or more. Second, we develop two tests of the sparsity assumption based on comparing sparsity-based estimators with OLS. The tests tend to reject the sparsity assumption in all three applications. Unless the number of regressors is comparable to or exceeds the sample size, OLS yields more robust results at little efficiency cost.
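A minimal sketch of the first fragility (the simulated data, variable names, and penalty level are ours, not the paper's): with an intercept, the OLS coefficient on a treatment regressor d is invariant to which category of a categorical control is chosen as the baseline, because the dummies span the same column space under either coding; a Lasso-type sparsity-based estimate generally is not, because re-coding the dummies changes the penalized parametrization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 4, size=n)          # categorical control with 4 levels
d = rng.normal(size=n) + 0.5 * group        # treatment, correlated with the control
y = 1.0 * d + 2.0 * (group == 3) + rng.normal(size=n)

def dummies(g, baseline):
    """One-hot encode g, omitting the baseline category."""
    levels = [l for l in np.unique(g) if l != baseline]
    return np.column_stack([(g == l).astype(float) for l in levels])

for baseline in (0, 3):                      # two equivalent codings of the same control
    X = np.column_stack([d, dummies(group, baseline)])
    ols = LinearRegression().fit(X, y)       # intercept included by default
    lasso = Lasso(alpha=0.1).fit(X, y)
    print(f"baseline={baseline}: OLS coef on d = {ols.coef_[0]:.4f}, "
          f"Lasso coef on d = {lasso.coef_[0]:.4f}")
# The OLS coefficient on d is identical across baselines; the Lasso coefficient generally moves.
```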
Euler diagrams are a tool for the graphical representation of set relations. Because they visualize set membership through geometric containment, they are easily readable even by inexperienced readers. Euler diagrams in which the sets are visualized as aligned rectangles are of special interest. In this work, we link the existence of such rectangular Euler diagrams to the order dimension of an associated order relation. To this end, we consider Euler diagrams in one and two dimensions. In the one-dimensional case, this correspondence yields a polynomial-time algorithm to compute the Euler diagrams, while the two-dimensional case results in an exponential-time algorithm.
We present an optimal transport framework for performing regression when both the covariate and the response are probability distributions on a compact Euclidean subset $\Omega\subset\mathbb{R}^d$, where $d>1$. Beyond compactly supported distributions, the method also applies when both the predictor and the response are Gaussian distributions on $\mathbb{R}^d$. Our approach generalizes an existing transportation-based regression model to higher dimensions. This model postulates that the conditional Fr\'echet mean of the response distribution is linked to the covariate distribution via an optimal transport map. We establish an upper bound on the rate of convergence of a plug-in estimator and propose an iterative algorithm for computing the estimator based on DC (Difference of Convex Functions) programming. In the Gaussian case, the estimator achieves a parametric rate of convergence, and its computation reduces to a finite-dimensional optimization over positive definite matrices, allowing for an efficient solution. The performance of the estimator is demonstrated in a simulation study.
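A schematic statement of the postulated model (our notation, for orientation only; the paper's precise formulation may differ): writing $\mathbb{E}_\oplus[\,Y \mid X\,]$ for the conditional Fr\'echet mean of the response distribution $Y$ under the 2-Wasserstein metric $W_2$, the model links it to the covariate distribution $X$ through the push-forward of an optimal transport map $T_0$,
\[
\mathbb{E}_\oplus[\,Y \mid X\,] \;:=\; \arg\min_{\mu}\; \mathbb{E}\!\left[\, W_2^2(Y,\mu) \mid X \,\right] \;=\; (T_0)_{\#} X,
\]
where $(T_0)_{\#} X$ denotes the push-forward of $X$ by $T_0$ and, by Brenier's theorem, $T_0$ can be taken to be the gradient of a convex function on $\Omega$; a plug-in estimator replaces the population quantities with empirical measures.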
Version Control Systems, such as Git and Mercurial, manage the history of a project as a Directed Acyclic Graph encoding the various divergences and synchronizations happening in its life cycle. A popular workflow in industry, called the feature branch workflow, constrains these graphs to be of a particular shape: a unique main branch and non-interfering feature branches. Here we focus on the uniform random generation of those graphs with n vertices, of which k lie on the main branch, and we provide three algorithms for three different use cases. The first, based on rejection, is efficient when aiming for small values of k (more precisely, whenever $k = O(\sqrt{n})$). The second takes as input any number k of commits on the main branch, but requires costly precomputation. The last one is a Boltzmann generator and enables us to generate very large graphs while targeting a constant k/n ratio. All three algorithms are linear in the size of their output.
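A generic skeleton of the rejection approach (for orientation only; both helper functions are hypothetical placeholders and do not reflect the paper's combinatorial constructions): draw from an easy, unconstrained ensemble and accept only those graphs whose main branch has exactly k commits, which remains efficient as long as the acceptance probability is not too small.

```python
def sample_unconstrained(n):
    """Hypothetical placeholder: draw a uniform feature-branch graph with n vertices
    from an ensemble that is easy to sample but ignores the main-branch constraint."""
    raise NotImplementedError

def main_branch_length(graph):
    """Hypothetical placeholder: number of commits on the main branch of `graph`."""
    raise NotImplementedError

def sample_by_rejection(n, k, max_tries=1_000_000):
    """Uniform sample among graphs with n vertices and k main-branch commits,
    obtained by rejecting draws from the unconstrained ensemble."""
    for _ in range(max_tries):
        g = sample_unconstrained(n)
        if main_branch_length(g) == k:   # accept only graphs meeting the constraint
            return g
    raise RuntimeError("acceptance rate too low for this (n, k)")
```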
Face morphing is a problem in computer graphics with numerous artistic and forensic applications. It is challenging due to variations in pose, lighting, gender, and ethnicity. The task consists of a warping for feature alignment and a blending for a seamless transition between the warped images. We propose to leverage coordinate-based neural networks to represent such warpings and blendings of face images. During training, we exploit the smoothness and flexibility of these networks by combining energy functionals employed in classical approaches, without discretization. Additionally, our method is time-dependent, allowing a continuous warping/blending of the images. At inference time, morphing requires both the direct and the inverse transformation of the time-dependent warping: the first warps the target image toward the source image, and the second warps the source toward the target. Our neural warping stores both maps in a single network, eliminating the need to invert them. Our experiments indicate that the method is competitive with both classical and generative models in terms of image quality and face-morphing detectors. Aesthetically, the resulting images present a seamless blending of diverse faces that is not yet common in the literature.
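A minimal sketch of a coordinate-based, time-dependent warping network (an illustrative stand-in, not the paper's architecture; the layer sizes and activation are assumptions): the network maps an image coordinate and a time to a warped coordinate, so a pixel grid can be deformed continuously in t.

```python
import torch
import torch.nn as nn

class TimeWarp(nn.Module):
    """Coordinate-based network: (x, y, t) -> warped coordinate (x', y')."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, coords, t):
        # coords: (N, 2) in [-1, 1]^2, t: (N, 1) in [0, 1]; residual displacement
        return coords + self.net(torch.cat([coords, t], dim=-1))

# Warp a pixel grid at an intermediate time t = 0.5.
H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
warp = TimeWarp()
warped = warp(grid, torch.full((grid.shape[0], 1), 0.5))   # (H*W, 2) warped coordinates
```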
The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and computational complexity constraints. However, the efficiency of these methods depends on the preconditioner used, and developing a preconditioner normally requires some insight into the sparse linear system and the desired trade-off between the cost of generating the preconditioner and the reduction in the number of iterations. Incomplete factorization methods tend to be black-box methods for generating these preconditioners, but they may fail for a number of reasons, including numerical issues that require searching for adequate scaling, shifting, and fill-in, all while relying on an algorithm that is difficult to parallelize. With the move toward heterogeneous computing, many sparse applications leave GPUs, which are optimized for dense tensor workloads such as neural network training, underutilized. In this work, we demonstrate that a simple artificial neural network, trained either at compile time or in parallel to the running application on a GPU, can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner. The generated preconditioner reduces the iteration count as much as or more than preconditioners obtained with standard techniques such as scaling and shifting. Moreover, the approach is robust: it never fails to produce a preconditioner that reduces the iteration count.
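A minimal sketch of how such an incomplete Cholesky factor would be used inside PCG (textbook PCG with dense triangular solves, not the paper's GPU pipeline; the factor L stands in for whatever the network produces): the preconditioner is applied by one forward and one backward triangular solve per iteration.

```python
import numpy as np
from scipy.linalg import solve_triangular

def pcg(A, b, apply_Minv, rtol=1e-8, maxiter=500):
    """Preconditioned conjugate gradient for SPD A; apply_Minv(r) approximates M^{-1} r."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            return x, k                      # converged after k iterations
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

# With an (incomplete) Cholesky factor L such that A ~ L @ L.T, the preconditioner solve
# M^{-1} r = L^{-T} (L^{-1} r) is a forward substitution followed by a backward one.
def ichol_preconditioner(L):
    return lambda r: solve_triangular(L.T, solve_triangular(L, r, lower=True), lower=False)
```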
The governance of online communities has been a critical issue since the first USENET groups, and a number of serious constitutions -- declarations of goals, values, and rights -- have emerged since the mid-1990s. More recently, decentralized autonomous organizations (DAOs) have begun to publish their own constitutions, manifestos, and other governance documents. There are two unique aspects to these documents: they (1) often govern significantly more resources than previously observed online communities, and (2) are used in conjunction with smart contracts that can secure certain community rights and processes through code. In this article, we analyze 25 DAO constitutions, observe a number of common patterns, and provide a template and a set of recommendations to support the crafting and dissemination of future DAO constitutions. We conclude with a report on how our template and recommendations were then used within the actual constitutional drafting process of a major blockchain.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models introduce new architectures, tweak existing architectures with refined training strategies, increase context length, use higher-quality training data, and extend training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. The paper also covers the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted considerable interest from academic and industrial researchers. To date, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
Residual networks (ResNets) have displayed impressive results in pattern recognition and have recently garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of the network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
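For orientation, the correspondence in question can be sketched as follows (a standard schematic from the neural ODE literature, not a result of this paper). A depth-$L$ residual network updates its hidden state as
\[
x_{k+1} \;=\; x_k + \tfrac{1}{L}\, f(x_k, \theta_k), \qquad k = 0, \dots, L-1,
\]
and if the trained weights $\theta_k$ converge to a smooth function $\theta(t)$ evaluated at $t = k/L$ as $L \to \infty$, this iteration is the forward Euler discretization of the ODE $\dot{x}(t) = f(x(t), \theta(t))$. The scaling regimes documented in the paper are precisely those in which this premise on the weights fails.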
The attention model has become an important concept in neural networks and has been researched across diverse application domains. This survey provides a structured and comprehensive overview of developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated and show how attention improves the interpretability of neural models. Finally, we discuss some applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners in developing approaches for their applications.