We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. In particular, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single A100 GPU. From a practical perspective, these results suggest that layer pruning methods can complement other PEFT strategies to further reduce computational resources of finetuning on the one hand, and can improve the memory and latency of inference on the other hand. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from an analysis of this breach offers support to the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. This means maintaining a balanced approach while simultaneously simplifying user interactions, making users aware of biases, and discouraging risky practices are essential for preventing cyber incidents.
Multi-camera systems have been shown to improve the accuracy and robustness of SLAM estimates, yet state-of-the-art SLAM systems predominantly support monocular or stereo setups. This paper presents a generic sparse visual SLAM framework capable of running on any number of cameras and in any arrangement. Our SLAM system uses the generalized camera model, which allows us to represent an arbitrary multi-camera system as a single imaging device. Additionally, it takes advantage of the overlapping fields of view (FoV) by extracting cross-matched features across cameras in the rig. This limits the linear rise in the number of features with the number of cameras and keeps the computational load in check while enabling an accurate representation of the scene. We evaluate our method in terms of accuracy, robustness, and run time on indoor and outdoor datasets that include challenging real-world scenarios such as narrow corridors, featureless spaces, and dynamic objects. We show that our system can adapt to different camera configurations and allows real-time execution for typical robotic applications. Finally, we benchmark the impact of the critical design parameters - the number of cameras and the overlap between their FoV that define the camera configuration for SLAM. All our software and datasets are freely available for further research.
We present a computational modelling approach which targets capturing the specifics on how to virtually augment a Metaverse user's available social time capacity via using an independent and autonomous version of her digital representation in the Metaverse. We motivate why this is a fundamental building block to model large-scale social networks in the Metaverse, and emerging properties herein. We envision a Metaverse-focused extension of the traditional avatar concept: An avatar can be as well programmed to operate independently when its user is not controlling it directly, thus turning it into an agent-based digital human representation. This way, we highlight how such an independent avatar could help its user to better navigate their social relationships and optimize their socializing time in the Metaverse by (partly) offloading some interactions to the avatar. We model the setting and identify the characteristic variables by using selected concepts from social sciences: ego networks, social presence, and social cues. Then, we formulate the problem of maximizing the user's non-avatar-mediated spare time as a linear optimization. Finally, we analyze the feasible region of the problem and we present some initial insights on the spare time that can be achieved for different parameter values of the avatar-mediated interactions.
To characterize the computational complexity of satisfiability problems for probabilistic and causal reasoning within the Pearl's Causal Hierarchy, arXiv:2305.09508 [cs.AI] introduce a new natural class, named succ-$\exists$R. This class can be viewed as a succinct variant of the well-studied class $\exists$R based on the Existential Theory of the Reals (ETR). Analogously to $\exists$R, succ-$\exists$R is an intermediate class between NEXP and EXPSPACE, the exponential versions of NP and PSPACE. The main contributions of this work are threefold. Firstly, we characterize the class succ-$\exists$R in terms of nondeterministic real RAM machines and develop structural complexity theoretic results for real RAMs, including translation and hierarchy theorems. Notably, we demonstrate the separation of $\exists$R and succ-$\exists$R. Secondly, we examine the complexity of model checking and satisfiability of fragments of existential second-order logic and probabilistic independence logic. We show succ-$\exists$R- completeness of several of these problems, for which the best-known complexity lower and upper bounds were previously NEXP-hardness and EXPSPACE, respectively. Thirdly, while succ-$\exists$R is characterized in terms of ordinary (non-succinct) ETR instances enriched by exponential sums and a mechanism to index exponentially many variables, in this paper, we prove that when only exponential sums are added, the corresponding class $\exists$R^{\Sigma} is contained in PSPACE. We conjecture that this inclusion is strict, as this class is equivalent to adding a VNP-oracle to a polynomial time nondeterministic real RAM. Conversely, the addition of exponential products to ETR, yields PSPACE. Additionally, we study the satisfiability problem for probabilistic reasoning, with the additional requirement of a small model and prove that this problem is complete for $\exists$R^{\Sigma}.
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems where the statistical model is a Bayesian network consisting of observations drawn from a mixture of a Gaussian distribution embedded in unrelated clutter, known as the clutter problem. The method employs the reparameterization trick to move the gradient operator inside the expectation and relies on the assumption that, because the likelihood factorizes over the observed data, the variational distribution is generally more compactly supported than the Gaussian distribution in the likelihood factors. This allows efficient local approximation of the individual likelihood factors, which leads to an analytical solution for the integral defining the gradient expectation. We integrate the proposed gradient approximation as the expectation step in an EM (Expectation Maximization) algorithm for maximizing ELBO and test against classical deterministic approaches in Bayesian inference, such as the Laplace approximation, Expectation Propagation and Mean-Field Variational Inference. The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
We consider self-stabilizing algorithms to compute a Maximal Independent Set (MIS) in the extremely weak beeping communication model. The model consists of an anonymous network with synchronous rounds. In each round, each vertex can optionally transmit a signal to all its neighbors (beep). After the transmission of a signal, each vertex can only differentiate between no signal received, or at least one signal received. We assume that vertices have some knowledge about the topology of the network. We revisit the not self-stabilizing algorithm proposed by Jeavons, Scott, and Xu (2013), which computes an MIS in the beeping model. We enhance this algorithm to be self-stabilizing, and explore two different variants, which differ in the knowledge about the topology available to the vertices. In the first variant, every vertex knows an upper bound on the maximum degree $\Delta$ of the graph. For this case, we prove that the proposed self-stabilizing version maintains the same run-time as the original algorithm, i.e. it stabilizes after $O(\log n)$ rounds w.h.p. on any $n$-vertex graph. In the second variant, each vertex only knows an upper bound on its own degree. For this case, we prove that the algorithm stabilizes after $O(\log n\cdot \log \log n)$ rounds on any $n$-vertex graph, w.h.p.
We present counter-intuitive examples of a viscous regularizations of a two-dimensional strictly hyperbolic system of conservation laws. The regularizations are obtained using two different viscosity matrices. While for both of the constructed ``viscous'' systems waves propagating in either $x$- or $y$-directions are stable, oblique waves may be linearly unstable. Numerical simulations fully corroborate these analytical results. To the best of our knowledge, this is the first nontrivial result related to the multidimensional Gelfand problem. Our conjectures provide direct answer to Gelfand's problem both in one- and multi-dimensional cases.
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different than the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.