Cooperative coevolutionary algorithms (CCEAs) divide a given problem into a number of subproblems and use an evolutionary algorithm to solve each subproblem. This short paper is concerned with the scenario in which only a single, global fitness measure exists. By removing the typically used subproblem partnering mechanism, it is suggested that such CCEAs can be viewed as making use of a generalised version of the global crossover operator introduced in early Evolution Strategies. Using the well-known NK model of fitness landscapes, the effects of varying aspects of global crossover with respect to the ruggedness of the underlying fitness landscape are explored. Results suggest improvements over the most widely used form of CCEAs, something further demonstrated using other well-known test functions.
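The NK model referred to above can be sketched in a few lines. This is a generic illustration of Kauffman's NK landscape (per-locus lookup tables over a (k+1)-bit neighbourhood, with k tuning ruggedness), not the paper's experimental code:

```python
import itertools
import random

def make_nk_landscape(n, k, seed=0):
    """Build an NK fitness landscape: each of the n loci contributes a
    fitness component that depends on its own bit and the bits of its k
    right-hand neighbours (with wrap-around). Larger k increases
    epistasis and hence the ruggedness of the landscape."""
    rng = random.Random(seed)
    # One random lookup table per locus, indexed by its (k+1)-bit context.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | bits[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness

# With k = 0 the loci are independent and there is a single optimum;
# as k grows, local optima proliferate.
f = make_nk_landscape(n=8, k=2)
best = max(itertools.product([0, 1], repeat=8), key=f)
```

Exhaustive enumeration is only feasible here because n is tiny; the point of the model is that the single parameter k lets experiments dial the landscape from smooth to maximally rugged.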
The capabilities of large language models (LLMs) have sparked debate over whether such systems just learn an enormous collection of superficial statistics or a coherent model of the data generating process -- a world model. We find evidence for the latter by analyzing the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models. We discover that LLMs learn linear representations of space and time across multiple scales. These representations are robust to prompting variations and unified across different entity types (e.g. cities and landmarks). In addition, we identify individual ``space neurons'' and ``time neurons'' that reliably encode spatial and temporal coordinates. Our analysis demonstrates that modern LLMs acquire structured knowledge about fundamental dimensions such as space and time, supporting the view that they learn not merely superficial statistics, but literal world models.
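The probing methodology behind such findings can be illustrated with a minimal sketch: fit a linear map from hidden activations to coordinates and measure how well it generalises. The synthetic activation matrix and planted linear directions below are stand-ins for actual Llama-2 activations and real place coordinates, which this toy does not use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: n "entities" with d-dimensional activations, and 2-D targets
# (think latitude/longitude) that are a noisy linear function of them.
n, d = 500, 64
true_dirs = rng.normal(size=(d, 2))            # planted "space" directions
acts = rng.normal(size=(n, d))
coords = acts @ true_dirs + 0.1 * rng.normal(size=(n, 2))

# Closed-form ridge-regression probe: W = (X^T X + lam I)^-1 X^T Y
lam = 1e-3
W = np.linalg.solve(acts.T @ acts + lam * np.eye(d), acts.T @ coords)
pred = acts @ W

# R^2 of the probe: a high value means the coordinates are linearly
# decodable from the representation.
ss_res = ((coords - pred) ** 2).sum()
ss_tot = ((coords - coords.mean(axis=0)) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

The linearity claim in the abstract corresponds to such a probe achieving high held-out R^2 without any nonlinear feature map.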
Belief propagation (BP) is a useful probabilistic inference algorithm for efficiently computing approximate marginal probability densities of random variables. However, in its standard form, BP is only applicable to vector-type random variables with a fixed and known number of elements, whereas certain applications rely on random finite sets (RFSs) with an unknown number of elements. In this paper, we develop BP rules for factor graphs defined on sequences of RFSs, where each RFS has an unknown number of elements, with the intention of deriving novel inference methods for RFSs. Furthermore, we show that vector-type BP is a special case of set-type BP in which each RFS follows a Bernoulli process. To demonstrate the validity of the developed set-type BP, we apply it to the Poisson multi-Bernoulli (PMB) filter for simultaneous localization and mapping (SLAM), which naturally leads to new set-type BP filters for mapping, SLAM, multi-target tracking, and simultaneous localization and tracking. Finally, we explore the relationship between the vector-type BP and the proposed set-type BP PMB-SLAM implementations, and show a performance gain of the proposed set-type BP PMB-SLAM filter over the vector-type BP-SLAM filter.
Probabilistic numerical solvers for ordinary differential equations (ODEs) treat the numerical simulation of dynamical systems as problems of Bayesian state estimation. Aside from producing posterior distributions over ODE solutions and thereby quantifying the numerical approximation error of the method itself, one less-often noted advantage of this formalism is the algorithmic flexibility gained by formulating numerical simulation in the framework of Bayesian filtering and smoothing. In this paper, we leverage this flexibility and build on the time-parallel formulation of iterated extended Kalman smoothers to formulate a parallel-in-time probabilistic numerical ODE solver. Instead of simulating the dynamical system sequentially in time, as done by current probabilistic solvers, the proposed method processes all time steps in parallel and thereby reduces the span cost from linear to logarithmic in the number of time steps. We demonstrate the effectiveness of our approach on a variety of ODEs and compare it to a range of both classic and probabilistic numerical ODE solvers.
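The span-cost reduction can be illustrated independently of the Kalman-smoother details: what matters is that the per-step update can be written as an associative operation, after which a parallel prefix scan computes all partial results in logarithmic depth. The sketch below uses affine maps x -> Ax + b as a stand-in for the actual smoother recursions:

```python
import numpy as np

def combine(f, g):
    """Compose two affine maps: applying f then g, where (A, b) denotes
    x -> A @ x + b. This operation is associative, which is what makes a
    parallel prefix scan applicable."""
    (Af, bf), (Ag, bg) = f, g
    return (Ag @ Af, Ag @ bf + bg)

def parallel_scan(elems):
    """Hillis-Steele inclusive prefix scan. Written here as a loop, but
    each list comprehension is fully parallel across i, giving O(log T)
    span on parallel hardware instead of the O(T) span of a sequential
    left fold."""
    elems = list(elems)
    n, step = len(elems), 1
    while step < n:
        elems = [elems[i] if i < step else combine(elems[i - step], elems[i])
                 for i in range(n)]
        step *= 2
    return elems

rng = np.random.default_rng(0)
T, d = 16, 2
steps = [(0.5 * rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(T)]

# prefixes[i] is the single affine map equivalent to applying
# steps[0], ..., steps[i] in order.
prefixes = parallel_scan(steps)
```

The filtering and smoothing passes of a probabilistic ODE solver admit analogous associative formulations, which is what the time-parallel iterated extended Kalman smoother exploits.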
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective used to create the "algorithmic beliefs," which we term the "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematic approach to making Bayesian-type algorithms prior-free and applicable to adversarial settings in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. Finally, we illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
We demonstrate the possibility of consensus under the model and conditions used by Fischer, Lynch, and Paterson (FLP) to prove the impossibility of binary consensus: complete asynchrony and up to one unannounced process crash failure. We also show that: i) the assembly by every process of a dataset containing the initial values of the individual processes is an inevitable phase of binary consensus; and ii) agreeing on this dataset is sufficient for a quasi-binary consensus. Key findings: there is no direct causal relationship between complete asynchrony and the impossibility of solving consensus; the impossibility is caused only and entirely by the dependence of agreement on the content of the initial values.
We propose an efficient $\epsilon$-differentially private algorithm that, given a simple {\em weighted} $n$-vertex, $m$-edge graph $G$ with a \emph{maximum unweighted} degree $\Delta(G) \leq n-1$, outputs a synthetic graph which approximates the spectrum with a $\widetilde{O}(\min\{\Delta(G), \sqrt{n}\})$ bound on the purely additive error. To the best of our knowledge, this is the first $\epsilon$-differentially private algorithm with a non-trivial additive error for approximating the spectrum of a graph. One of the subroutines of our algorithm also precisely simulates the exponential mechanism over a non-convex set, which could be of independent interest given the recent interest in sampling from a {\em log-concave distribution} defined over a convex set. Spectral approximation also allows us to approximate all possible $(S,T)$-cuts, but it incurs an error that depends on the maximum degree, $\Delta(G)$. We further show that using our sampler, we can also output a synthetic graph that approximates the sizes of all $(S,T)$-cuts of an $n$-vertex, $m$-edge weighted graph $G$ while preserving $(\epsilon,\delta)$-differential privacy with an additive error of $\widetilde{O}(\sqrt{mn}/\epsilon)$. We also give a matching lower bound (with respect to all the parameters) on the private cut approximation for weighted graphs. This removes the gap of $\sqrt{W_{\mathsf{avg}}}$ between the upper and lower bounds of Eli{\'a}{\v{s}}, Kapralov, Kulkarni, and Lee (SODA 2020), where $W_{\mathsf{avg}}$ is the average edge weight.
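For context, the exponential mechanism that the sampling subroutine above generalises can be sketched over a finite candidate set. The paper's contribution is simulating it precisely over a non-convex continuous set, which this discrete toy version does not attempt:

```python
import numpy as np

def exponential_mechanism(candidates, utilities, eps, sensitivity, rng):
    """Sample candidate i with probability proportional to
    exp(eps * u_i / (2 * sensitivity)). The choice is eps-differentially
    private when each utility changes by at most `sensitivity` between
    neighbouring datasets."""
    logits = eps * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

rng = np.random.default_rng(0)
# Privately pick the most frequent item; counts have sensitivity 1
# (one person changes one count by at most 1).
items, counts = ["a", "b", "c"], [1, 5, 40]
choice = exponential_mechanism(items, counts, eps=2.0, sensitivity=1, rng=rng)
```

With a clear utility gap and moderate eps, the mechanism returns the true maximiser with overwhelming probability while still protecting any individual contribution.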
Contrastive learning often relies on comparing positive anchor samples with multiple negative samples to perform Self-Supervised Learning (SSL). However, non-contrastive approaches like BYOL, SimSiam, and Barlow Twins achieve SSL without explicit negative samples. In this paper, we introduce a unified matrix information-theoretic framework that explains many contrastive and non-contrastive learning methods. We then propose a novel method, Matrix-SSL, based on matrix information theory. Experimental results reveal that Matrix-SSL significantly outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, with 100 epochs of pre-training, our method outperforms SimCLR by 4.6%, and on MS-COCO transfer learning tasks, our method outperforms previous SOTA methods such as MoCo v2 and BYOL by up to 3.3% with only 400 epochs of pre-training, compared to 800 epochs. Code available at //github.com/yifanzhang-pro/Matrix-SSL.
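For contrast with the non-contrastive methods discussed above, here is the standard InfoNCE objective that contrastive SSL methods such as SimCLR optimise. This is a generic NumPy sketch of InfoNCE, not Matrix-SSL itself:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE / NT-Xent loss between two batches of embeddings.
    z1[i] and z2[i] are two augmented views of the same image (the
    positive pair); every other pairing in the batch acts as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce(z, z)   # perfectly aligned views -> low loss
```

The explicit dependence on in-batch negatives (the off-diagonal logits) is exactly what BYOL, SimSiam, and Barlow Twins dispense with, which is the tension a unified framework has to explain.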
This paper develops the exact linear relationship between the leading eigenvector of the unnormalized modularity matrix and the eigenvectors of the adjacency matrix. We propose a method for approximating the leading eigenvector of the modularity matrix, and we derive the error of the approximation. We also give a complete proof of the equivalence between normalized adjacency clustering and normalized modularity clustering. Numerical experiments show that normalized adjacency clustering can be twice as efficient as normalized modularity clustering.
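The objects involved can be made concrete in a few lines: the unnormalized modularity matrix is $B = A - dd^{\top}/(2m)$, and its leading eigenvector recovers community structure. This is a generic illustration on a toy graph, not the paper's approximation method:

```python
import numpy as np

# Small undirected graph: two triangles joined by a single bridge edge.
A = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[2, 3] = A[3, 2] = 1.0

d = A.sum(axis=1)                     # degree vector
m = d.sum() / 2.0                     # number of edges
B = A - np.outer(d, d) / (2.0 * m)    # unnormalized modularity matrix

# The leading eigenvector of B (largest eigenvalue) splits the graph
# into its two communities via the signs of its entries.
w, V = np.linalg.eigh(B)
v = V[:, np.argmax(w)]
labels = v > 0
```

Since both $B$ and $A$ differ only by the rank-one term $dd^{\top}/(2m)$, relating their eigenvectors exactly, as the paper does, is a natural question.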
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that such a strategy improves fusion performance while reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
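The bottleneck idea can be sketched with single-head, unbatched attention in NumPy. The update rule used here for the bottleneck tokens (averaging the refreshes computed within each modality) is an illustrative simplification, not the paper's exact layer:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Plain scaled dot-product attention (single head, no projections)."""
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d)) @ values

def bottleneck_fusion_layer(audio, video, bottleneck):
    """One fusion layer: each modality attends only to its own tokens
    plus the shared bottleneck tokens. All cross-modal information must
    therefore pass through the (few) bottleneck latents, instead of the
    quadratic cost of full pairwise cross-attention."""
    a_ctx = np.concatenate([audio, bottleneck])
    v_ctx = np.concatenate([video, bottleneck])
    new_audio = attend(audio, a_ctx, a_ctx)
    new_video = attend(video, v_ctx, v_ctx)
    # Refresh the bottleneck from each modality, then merge.
    b_from_a = attend(bottleneck, a_ctx, a_ctx)
    b_from_v = attend(bottleneck, v_ctx, v_ctx)
    new_bottleneck = 0.5 * (b_from_a + b_from_v)
    return new_audio, new_video, new_bottleneck
```

With B bottleneck tokens and N tokens per modality, cross-modal exchange costs O(N * B) per layer rather than the O(N^2) of full pairwise attention, which is the source of the reported savings.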
Cold-start problems are a long-standing challenge for practical recommendation. Most existing recommendation algorithms rely on extensive observed data and are brittle in recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta-learning. Our approach is based on the insight that good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely the Scenario-specific Sequential Meta learner (or s^2 meta). By doing so, our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks, while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model achieves significant gains over the state of the art on cold-start problems in online recommendation. The model is deployed in the Guess You Like session on the front page of Mobile Taobao.
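The two ingredients named above, a generic initialization plus a fast scenario-specific adaptation strategy, can be sketched with a first-order meta-learning loop on synthetic linear "scenarios". This is a generic Reptile-style illustration of the initialization/adaptation split, not the s^2 meta architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Mean squared error and its gradient for the linear model y ~ X @ w."""
    err = X @ w - y
    return (err ** 2).mean(), 2 * X.T @ err / len(y)

def adapt(w, X, y, inner_lr=0.1, steps=3):
    """Scenario-specific adaptation: a few gradient steps from the shared init."""
    for _ in range(steps):
        _, g = loss_grad(w, X, y)
        w = w - inner_lr * g
    return w

# Meta-training: learn an initialization w0 that adapts quickly to new
# "scenarios" (tasks) that share structure but differ in detail.
d = 5
w_shared = rng.normal(size=d)          # structure common to all scenarios
w0 = np.zeros(d)
for _ in range(200):
    w_task = w_shared + 0.1 * rng.normal(size=d)   # a fresh scenario
    X = rng.normal(size=(20, d))
    y = X @ w_task
    w0 = w0 + 0.1 * (adapt(w0, X, y) - w0)  # move init toward adapted weights
```

After meta-training, a cold-start scenario begins from w0, which already encodes the shared structure, so a handful of interactions suffice to specialise it.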