Social behavior, defined as the process by which individuals act and react in response to others, is crucial for the function of societies and holds profound implications for mental health. To fully grasp the intricacies of social behavior and identify potential therapeutic targets for addressing social deficits, it is essential to understand its core principles. Although machine learning algorithms have made it easier to study specific aspects of complex behavior, current methodologies tend to focus primarily on single-animal behavior. In this study, we introduce LISBET (seLf-supervIsed Social BEhavioral Transformer), a model designed to detect and segment social interactions. Our model eliminates the need for feature selection and extensive human annotation by using self-supervised learning to detect and quantify social behaviors from dynamic body part tracking data. LISBET can be used in hypothesis-driven mode to automate behavior classification using supervised fine-tuning, and in discovery-driven mode to segment social behavior motifs using unsupervised learning. We found that motifs recognized using the discovery-driven approach not only closely match the human annotations but also correlate with the electrophysiological activity of dopaminergic neurons in the Ventral Tegmental Area (VTA). We hope LISBET will help the community improve our understanding of social behaviors and their neural underpinnings.
There exist multiple regression applications in engineering and industry where the outcomes are not conditionally independent given the covariates, but where instead the covariates follow a sequential experimental design in which the next measurement depends on the previous outcomes, introducing dependence. Such designs are commonly employed, for example, for choosing test values when estimating the sensitivity of a material under physical stimulus. Apart from the extensive study of the Robbins--Monro procedure, virtually no attention has been given to verifying asymptotic normality of the maximum likelihood estimator in the general sequential setting, despite the wide use of such designs in industry since at least the 1940s. This is a considerable gap in the literature, since these properties underlie the construction of confidence intervals and hypothesis tests. In this paper we close this gap by establishing a large-sample theory for sequential experimental designs other than the Robbins--Monro procedure. First, we use martingale theory to prove a general result for when such asymptotic normality may be asserted. Second, we consider the special case where the covariate process forms a Markov chain. In doing so, we verify asymptotic normality for the widely applied Bruceton design and a proposed Markovian version of the Langlie design.
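To make the sequential dependence concrete, the sketch below simulates a Bruceton (up-and-down) experiment under a latent probit sensitivity model; the probit form, starting level, and step size are illustrative assumptions rather than prescriptions from the paper. Each stimulus level is chosen from the previous outcome, which is exactly the dependence structure discussed above.

```python
import numpy as np
from math import erf

def simulate_bruceton(mu, sigma, x0, step, n_trials, seed=None):
    """Simulate a Bruceton (up-and-down) sensitivity test.

    The item responds at level x with probability Phi((x - mu) / sigma)
    (an assumed probit model). After a response the next level is lowered
    by one step; after a non-response it is raised by one step, so each
    covariate depends on the previous outcome.
    """
    rng = np.random.default_rng(seed)
    levels, outcomes = [], []
    x = x0
    for _ in range(n_trials):
        p = 0.5 * (1.0 + erf((x - mu) / (sigma * 2.0 ** 0.5)))
        y = int(rng.random() < p)         # 1 = response ("go"), 0 = no response
        levels.append(x)
        outcomes.append(y)
        x = x - step if y else x + step   # up-and-down rule
    return np.array(levels), np.array(outcomes)

# Example: 50 trials around an assumed threshold mu = 10 with spread sigma = 1.
levels, outcomes = simulate_bruceton(mu=10.0, sigma=1.0, x0=9.0, step=1.0,
                                     n_trials=50, seed=0)
```

The maximum likelihood estimator of (mu, sigma) is then computed from this dependent sequence, which is the setting in which the asymptotic normality results above apply.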
Nominal terms extend first-order terms with binding. They lack some properties of first- and higher-order terms: Terms must be reasoned about in a context of 'freshness assumptions'; it is not always possible to 'choose a fresh variable symbol' for a nominal term; it is not always possible to 'alpha-convert a bound variable symbol' or to 'quotient by alpha-equivalence'; the notion of unifier is not based just on substitution. Permissive nominal terms closely resemble nominal terms but they recover these properties, and in particular the 'always fresh' and 'always rename' properties. In the permissive world, freshness contexts are elided, equality is fixed, and the notion of unifier is based on substitution alone rather than on nominal terms' notion of unification based on substitution plus extra freshness conditions. We prove that no expressivity is lost in moving to the permissive case, and we provide an injection of nominal terms unification problems and their solutions into permissive nominal terms problems and their solutions. We investigate the relation between permissive nominal unification and higher-order pattern unification. We show how to translate permissive nominal unification problems and solutions in a sound, complete, and optimal manner, in suitable senses that we make formal.
It is often claimed that the theory of function levels proposed by Frege in Grundgesetze der Arithmetik anticipates the hierarchy of types that underlies Church's simple theory of types. This claim roughly states that Frege presupposes a type of functions in the sense of simple type theory in the expository language of Grundgesetze. However, this view makes it hard to accommodate function names of two arguments and to view functions as incomplete entities. I propose and defend an alternative interpretation of first-level function names in Grundgesetze into simple type-theoretic open terms rather than into closed terms of a function type. This interpretation offers a still unhistorical but more faithful type-theoretic approximation of Frege's theory of levels and can be naturally extended to accommodate second-level functions. It is made possible by two key observations: that Frege's Roman markers behave essentially like open terms, and that Frege lacks a clear criterion for distinguishing between Roman markers and function names.
We consider the numerical approximation of variational problems with orthotropic growth, that is, those where the integrand depends strongly on the coordinate directions, with possibly different growth in each direction. Under realistic regularity assumptions we derive optimal error estimates. These estimates depend on the existence of an orthotropically stable interpolation operator. Over certain meshes we construct an orthotropically stable interpolant that is also a projection. Numerical experiments illustrate and explore the limits of our theory.
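A model example of such an integrand (an illustrative choice, not necessarily the exact class treated here) is the orthotropic functional
$$\mathcal{F}(u) = \int_\Omega \sum_{i=1}^{d} \frac{1}{p_i}\,|\partial_i u|^{p_i}\,\mathrm{d}x, \qquad 1 < p_1 \le p_2 \le \dots \le p_d,$$
where the partial derivative in the $i$-th coordinate direction enters with its own growth exponent $p_i$.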
The Galerkin method is often employed for numerical integration of evolutionary equations, such as the Navier-Stokes equation or the magnetic induction equation. Application of the method requires solving an equation of the form $P(Av-f)=0$ at each time step, where $v$ is an element of a finite-dimensional space $V$ with a basis satisfying boundary conditions, $P$ is the orthogonal projection onto this space and $A$ is a linear operator. Usually the coefficients of $v$ expanded in the basis are found by calculating the matrix of $PA$ acting on $V$ and solving the respective system of linear equations. For physically realistic boundary conditions (such as the no-slip boundary conditions for the velocity, or for a dielectric outside the fluid volume for the magnetic field) the basis is often not orthogonal and solving the problem can be computationally demanding. We propose an algorithm that can reduce the computational cost of such a problem. Suppose there exists a space $W$ that contains $V$, the difference between the dimensions of $W$ and $V$ is small relative to the dimension of $V$, and solving the problem $P(Aw-f)=0$, where $w$ is an element of $W$, requires fewer operations than solving the original problem. The equation $P(Av-f)=0$ is then solved in two steps: we solve the problem $P(Aw-f)=0$ in $W$, find a correction $h=v-w$ that belongs to a complement to $V$ in $W$, and obtain the solution $w+h$. When the dimension of the complement is small, the proposed algorithm is more efficient than the traditional one.
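To make the conventional approach concrete: in a basis $\{\phi_i\}$ of $V$, the equation $P(Av-f)=0$ reduces to the linear system $\sum_j \langle\phi_i, A\phi_j\rangle c_j = \langle\phi_i, f\rangle$ for the coefficients $c_j$ of $v$. The sketch below assembles and solves this system given user-supplied (assumed) routines for applying $A$ and evaluating the inner product; it illustrates the baseline whose cost the two-step correction is designed to reduce, not the correction algorithm itself.

```python
import numpy as np

def galerkin_solve(apply_A, inner, basis, f):
    """Conventional Galerkin solve of P(Av - f) = 0 in span(basis).

    apply_A : callable applying the linear operator A to a field
    inner   : callable returning the inner product <u, v> defining P
    basis   : list of basis fields phi_1, ..., phi_n of V
    f       : right-hand-side field
    Returns the coefficient vector c of v = sum_j c_j phi_j.
    """
    n = len(basis)
    G = np.empty((n, n))
    b = np.empty(n)
    for i, phi_i in enumerate(basis):
        b[i] = inner(phi_i, f)
        for j, phi_j in enumerate(basis):
            G[i, j] = inner(phi_i, apply_A(phi_j))  # matrix of PA acting on V
    return np.linalg.solve(G, b)                    # dense solve; costly for non-orthogonal bases
```

When the basis is not orthogonal, as for no-slip or dielectric boundary conditions, $G$ is full and this dense solve dominates the cost; the proposal above replaces it by a cheaper solve in the larger space $W$ plus a small correction associated with the complement of $V$ in $W$.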
We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model parameters and output predictions. Experiments illustrate how parameter uncertainty affects the control objective and evaluate the proposed controller for a pendulum swing-up task.
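As a schematic of how such an objective can trade off goal-seeking against information-seeking (the Gaussian single-step form below is an assumption made for illustration, not the paper's exact expected free energy functional):

```python
import numpy as np

def efe_objective(pred_mean, pred_var, noise_var, goal_mean, goal_var, beta=1.0):
    """Schematic single-step expected-free-energy-style control cost.

    pred_mean, pred_var : mean and total variance of the predicted output,
                          with parameter uncertainty folded into pred_var
    noise_var           : irreducible output-noise variance alone
    goal_mean, goal_var : Gaussian goal prior over the output
    beta                : weight on the information-seeking term (assumed)
    """
    # Risk: expected negative log-density of the goal prior under the
    # predictive distribution (drives the output toward the goal).
    risk = 0.5 * (np.log(2.0 * np.pi * goal_var)
                  + (pred_var + (pred_mean - goal_mean) ** 2) / goal_var)
    # Information gain: entropy reduction from total predictive uncertainty
    # down to output noise (larger when parameter uncertainty is larger).
    info_gain = 0.5 * np.log(pred_var / noise_var)
    return risk - beta * info_gain   # minimised over candidate controls
```

Summing such terms over the prediction horizon and minimising over candidate control sequences yields a controller that, under this schematic form, visits informative regions while parameter uncertainty is high and reverts to pure goal-seeking as it shrinks.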
Scientific claims gain credibility through replicability, especially if replication under different circumstances and varying designs yields equivalent results. Aggregating results over multiple studies is, however, not straightforward, and when the heterogeneity between studies increases, conventional methods such as (Bayesian) meta-analysis and Bayesian sequential updating become infeasible. *Bayesian Evidence Synthesis*, built upon the foundations of the Bayes factor, makes it possible to aggregate support for conceptually similar hypotheses over studies, regardless of methodological differences. We assess the performance of Bayesian Evidence Synthesis across multiple effect sizes and sample sizes, with a broad set of (inequality-constrained) hypotheses, using Monte Carlo simulations, focusing explicitly on the complexity of the hypotheses under consideration. The simulations show that this method can evaluate complex (informative) hypotheses regardless of methodological differences between studies, and performs adequately if the set of studies considered has sufficient statistical power. Additionally, we pinpoint challenging conditions that can lead to unsatisfactory results, and provide suggestions on handling these situations. Ultimately, we show that Bayesian Evidence Synthesis is a promising tool that can be used when traditional research synthesis methods are not applicable due to insurmountable between-study heterogeneity.
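A minimal sketch of the Bayes-factor aggregation underlying this kind of synthesis; whether the paper combines evidence in exactly this way is an assumption, and the sketch only shows the mechanics of turning study-specific Bayes factors into an overall posterior probability for a shared hypothesis.

```python
import numpy as np

def synthesize_bayes_factors(bfs, prior_prob=0.5):
    """Combine per-study Bayes factors BF_{H vs. complement}.

    Assuming the studies provide independent evidence, the combined Bayes
    factor is the product of the per-study factors, and the posterior
    probability of H follows from the prior odds.
    """
    combined_bf = float(np.prod(np.asarray(bfs, dtype=float)))
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = combined_bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)   # P(H | all studies)

# Three conceptually similar studies, each giving modest support for H:
print(synthesize_bayes_factors([2.5, 1.8, 3.1]))     # about 0.93
```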
Many complex tasks and environments can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to expedite adaptation and enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. While most of these systems are monolithic, modularity promises to capture the compositional nature of many tasks. However, it is unclear under which circumstances modular systems discover this hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground-truth modules. This allows us to relate the problem of compositional generalization to that of identifying the underlying modules. We show theoretically that identification up to linear transformation purely from demonstrations is possible in hypernetworks without having to learn an exponential number of module combinations. While our theory assumes the infinite-data limit, in an extensive empirical study we demonstrate how meta-learning from finite data can discover modular solutions that generalize compositionally in modular but not monolithic architectures. We further show that our insights translate beyond the teacher-student setting, and demonstrate that in tasks with compositional preferences and tasks with compositional goals, hypernetworks can discover modular policies that generalize compositionally.
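A minimal sketch of the kind of modular (linear) hypernetwork teacher such a setting can use; the linear parameterisation and the dimensions below are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# K ground-truth modules, each a parameter bank for a small linear map.
K, d_in, d_out = 4, 8, 3
module_banks = rng.normal(size=(K, d_out, d_in))        # Theta_1, ..., Theta_K

def hypernet_forward(task_code, x):
    """Teacher forward pass: the task code z selects a module combination.

    The hypernetwork generates task-specific weights as the linear
    combination W(z) = sum_k z_k * Theta_k, applied to the input x.
    Compositional tasks correspond to sparse, structured codes z.
    """
    W = np.tensordot(task_code, module_banks, axes=1)   # shape (d_out, d_in)
    return W @ x

# A task composed of modules 0 and 2 only:
z = np.array([1.0, 0.0, 1.0, 0.0])
x = rng.normal(size=d_in)
y = hypernet_forward(z, x)   # one demonstration a student would learn from
```

The identification question is then whether a student hypernetwork trained only on such demonstrations recovers the module banks and task codes up to a linear transformation.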
Dependence is undoubtedly a central concept in statistics. Yet it proves difficult to locate in the literature a formal definition that goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term 'dependence' and its derivatives to be used vaguely and indiscriminately for qualifying a variety of disparate notions, leading to numerous incongruities. For example, the classical Pearson's, Spearman's or Kendall's correlations are widely regarded as 'dependence measures' of major interest, in spite of returning 0 in some cases of deterministic relationships between the variables at play, and thus evidently not measuring dependence at all. Arguing that research on such a fundamental topic would benefit from a slightly more rigid framework, this paper suggests a general definition of the dependence between two random variables defined on the same probability space. Natural enough to align with intuition, the definition is still sufficiently precise to allow unequivocal identification of a 'universal' representation of the dependence structure of any bivariate distribution. Links between this representation and familiar concepts are highlighted, and ultimately, the idea of a dependence measure based on that universal representation is explored and shown to satisfy Rényi's postulates.
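As a concrete instance of the claim above: take X symmetric around zero and Y = X², so that Y is completely determined by X, yet Pearson's correlation is 0 because E[X³] = 0. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x ** 2                       # a deterministic function of x

# The sample Pearson correlation is close to 0 even though knowing x
# determines y exactly, illustrating that correlation is not dependence.
print(np.corrcoef(x, y)[0, 1])
```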
Text normalization is a crucial technology for low-resource languages that lack rigid spelling conventions or have undergone multiple spelling reforms. Low-resource text normalization has so far relied upon hand-crafted rules, which are perceived to be more data efficient than neural methods. In this paper we examine the case of text normalization for Ligurian, an endangered Romance language. We collect 4,394 Ligurian sentences paired with their normalized versions, as well as the first open source monolingual corpus for Ligurian. We show that, in spite of the small amount of data available, a compact transformer-based model can be trained to achieve very low error rates through the use of backtranslation and appropriate tokenization.
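A minimal sketch of the backtranslation loop such a setup can rely on; the model objects and function names are placeholders (assumptions about the pipeline, not the paper's code), and it assumes the monolingual data is on the normalized side.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                 # (raw text, normalized text)
Seq2Seq = Callable[[str], str]

def backtranslation_augment(
    gold_pairs: List[Pair],
    monolingual_normalized: List[str],
    train_model: Callable[[List[Tuple[str, str]]], Seq2Seq],
) -> Seq2Seq:
    """Augment a text normalizer with backtranslated synthetic pairs.

    train_model is a placeholder for any seq2seq training routine
    (e.g., a compact transformer); this function only wires the loop.
    """
    # 1. Train a reverse model mapping normalized text to raw spellings.
    reverse = train_model([(norm, raw) for raw, norm in gold_pairs])
    # 2. Generate synthetic (raw, normalized) pairs from monolingual text.
    synthetic = [(reverse(sent), sent) for sent in monolingual_normalized]
    # 3. Train the forward normalizer on gold plus synthetic data.
    return train_model(gold_pairs + synthetic)
```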