We give a short proof of a stronger form of the Johansson-Molloy theorem that relies only on the first moment method. The proof adapts a clever counting argument developed by Rosenfeld in the context of non-repetitive colourings. We then extend that result to graphs where each neighbourhood has bounded density, improving on a recent result of Davies et al. By tightening the number of colours, we obtain the best known upper bound on the chromatic number of triangle-free graphs of maximum degree $\Delta \ge 224$.
We investigate the relationship between various isomorphism invariants for finite groups. Specifically, we use the Weisfeiler-Leman (WL) dimension to characterize, compare and quantify the effectiveness and complexity of invariants for group isomorphism. It turns out that a surprising number of invariants and characteristic subgroups that are classic to group theory can be detected and identified by a low-dimensional Weisfeiler-Leman algorithm. These include the center, the inner automorphism group, the commutator subgroup and the derived series, the abelian radical, the solvable radical, the Fitting subgroup and $\pi$-radicals. A low-dimensional WL algorithm additionally determines the isomorphism type of the socle as well as the factors in the derived series and the upper and lower central series. We also analyze the behavior of the WL algorithm for group extensions and prove that a low-dimensional WL algorithm determines the isomorphism types of the composition factors of a group. Finally, we develop a new tool to define a canonical maximal central decomposition for groups. This allows us to show that the Weisfeiler-Leman dimension of a group is at most one larger than the dimensions of its directly indecomposable factors. In other words, the Weisfeiler-Leman dimension increases by at most 1 when taking direct products.
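As a generic illustration of the refinement mechanism underlying these results, below is a minimal sketch of classical 1-dimensional Weisfeiler-Leman (colour refinement) on a graph. The group versions used in the paper instead refine tuples of group elements via the multiplication structure; the adjacency-list input and integer colours here are illustrative choices, not the paper's algorithm.

```python
from collections import Counter

def colour_refinement(adj):
    """Classical 1-dimensional Weisfeiler-Leman (colour refinement):
    repeatedly recolour each vertex by its current colour together with
    the multiset of its neighbours' colours, until stable."""
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v],
                   tuple(sorted(Counter(colour[u] for u in adj[v]).items())))
               for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: relabel[sig[v]] for v in adj}
        if new == colour:          # partition is stable
            return colour
        colour = new

# Toy example: on the path 0-1-2, the endpoints get one colour, the middle another.
print(colour_refinement({0: [1], 1: [0, 2], 2: [1]}))
```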
The $(1 + (\lambda,\lambda))$ genetic algorithm is a young evolutionary algorithm that tries to profit also from inferior solutions. Rigorous runtime analyses on unimodal fitness functions showed that it can indeed be faster than classical evolutionary algorithms, though on these simple problems the gains were only moderate. In this work, we conduct the first runtime analysis of this algorithm on a multimodal problem class, the jump functions benchmark. We show that with the right parameters, the $(1 + (\lambda,\lambda))$ GA optimizes any jump function with jump size $2 \le k \le n/4$ in expected time $O(n^{(k+1)/2} e^{O(k)} k^{-k/2})$, which, already for constant~$k$, significantly outperforms standard mutation-based algorithms with their $\Theta(n^k)$ runtime and standard crossover-based algorithms with their $\tilde{O}(n^{k-1})$ runtime guarantee. For the isolated problem of leaving the local optimum of jump functions, we determine provably optimal parameters that lead to a runtime of $(n/k)^{k/2} e^{\Theta(k)}$. This suggests general advice on how to set the parameters of the $(1 + (\lambda,\lambda))$ GA, which might ease the further use of this algorithm.
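To make the algorithm concrete, here is a minimal sketch of the $(1 + (\lambda,\lambda))$ GA on a jump function, assuming the standard parameter coupling $p = \lambda/n$ (mutation rate) and $c = 1/\lambda$ (crossover bias); the provably optimal parameters for jump functions discussed above differ from this coupling.

```python
import random

def jump(x, k):
    """Jump_k fitness: a gap of width k just below the all-ones optimum."""
    n, ones = len(x), sum(x)
    return k + ones if (ones <= n - k or ones == n) else n - ones

def one_plus_ll_ga(n, k, lam, max_evals=10**6):
    """Minimal sketch of the (1+(lambda,lambda)) GA on Jump_k."""
    p, c = lam / n, 1.0 / lam
    x = [random.randint(0, 1) for _ in range(n)]
    fx, evals = jump(x, k), 0
    while fx < n + k and evals < max_evals:
        # Mutation phase: flip the same random number ell of bits in each offspring.
        ell = sum(random.random() < p for _ in range(n))
        best_mut, best_mut_f = None, float("-inf")
        for _ in range(lam):
            y = x[:]
            for i in random.sample(range(n), ell):
                y[i] ^= 1
            fy = jump(y, k); evals += 1
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy
        # Crossover phase: take each bit from the mutation winner with probability c,
        # then keep the best crossover offspring if it is at least as good as x.
        best_cross, best_cross_f = x, fx
        for _ in range(lam):
            z = [best_mut[i] if random.random() < c else x[i] for i in range(n)]
            fz = jump(z, k); evals += 1
            if fz >= best_cross_f:
                best_cross, best_cross_f = z, fz
        x, fx = best_cross, best_cross_f
    return evals

print(one_plus_ll_ga(n=40, k=2, lam=8))   # evaluations until the optimum is hit
```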
We study the modular Hamiltonian associated with a Gaussian state on the Weyl algebra. We obtain necessary/sufficient criteria for the local equivalence of Gaussian states, independently of the classical results by Araki and Yamagami, Van Daele, and Holevo. We also present a criterion for a Bogoliubov automorphism to be weakly inner in the GNS representation. Our main application is the description of the vacuum modular Hamiltonian associated with a time-zero interval in the scalar, massive, free QFT in two spacetime dimensions, thus complementing recent results in higher space dimensions. In particular, we obtain a formula for the local entropy of a one-dimensional Klein-Gordon wave packet and Araki's vacuum relative entropy of a coherent state on a double cone von Neumann algebra. We also derive the type $\mathrm{III}_1$ factor property. Incidentally, we run across certain positive selfadjoint extensions of the Laplacian, with outer boundary conditions, seemingly not considered so far.
Structural causal models (SCMs), also known as (nonparametric) structural equation models (SEMs), are widely used for causal modeling purposes. In particular, acyclic SCMs, also known as recursive SEMs, form a well-studied subclass of SCMs that generalize causal Bayesian networks to allow for latent confounders. In this paper, we investigate SCMs in a more general setting, allowing for the presence of both latent confounders and cycles. We show that in the presence of cycles, many of the convenient properties of acyclic SCMs do not hold in general: they do not always have a solution; they do not always induce unique observational, interventional and counterfactual distributions; a marginalization does not always exist, and if it exists the marginal model does not always respect the latent projection; they do not always satisfy a Markov property; and their graphs are not always consistent with their causal semantics. We prove that each of these properties does hold for general SCMs under certain solvability conditions. Our work generalizes results for SCMs with cycles that were previously known only for certain special cases. We introduce the class of simple SCMs that extends the class of acyclic SCMs to the cyclic setting, while preserving many of the convenient properties of acyclic SCMs. With this paper we aim to provide the foundations for a general theory of statistical causal modeling with SCMs.
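For a minimal illustration of the first two failure modes (a toy example, not taken from the paper): the cyclic SCM with structural equations $X = Y + 1$ and $Y = X$ has no solution, since the equations imply $X = X + 1$; the cyclic SCM with $X = Y$ and $Y = X$ has the solution $(x, x)$ for every value $x$, and hence does not induce a unique observational distribution.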
The graphical balls-into-bins process is a generalization of the classical 2-choice balls-into-bins process, where the bins correspond to vertices of an arbitrary underlying graph $G$. At each time step an edge of $G$ is chosen uniformly at random, and a ball must be assigned to either of the two endpoints of this edge. The standard 2-choice process corresponds to the case of $G=K_n$. For any $k(n)$-edge-connected, $d(n)$-regular graph on $n$ vertices, and any number of balls, we give an allocation strategy that, with high probability, ensures a gap of $O((d/k) \log^4 n \log \log n)$ between the loads of any two bins. In particular, this implies polylogarithmic bounds for natural graphs such as cycles and tori, for which the classical greedy allocation strategy is conjectured to have a polynomial gap between the bin loads. For every graph $G$, we also show an $\Omega((d/k) + \log n)$ lower bound on the gap achievable by any allocation strategy. This implies that our strategy achieves the optimal gap, up to polylogarithmic factors, for every graph $G$. Our allocation algorithm is simple to implement and requires only $O(\log n)$ time per allocation. It can be viewed as a more global version of the greedy strategy that compares the average load on certain fixed sets of vertices, rather than on individual vertices. A key idea is to relate the problem of designing a good allocation strategy to that of finding suitable multi-commodity flows. To this end, we consider R\"{a}cke's cut-based decomposition tree and define certain orthogonal flows on it.
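For intuition, the following sketch runs the graphical process with the classical greedy rule, i.e. the conjecturally weak baseline mentioned above, not the flow-based strategy of the paper:

```python
import random

def graphical_greedy_gap(edges, n, num_balls):
    """Graphical balls-into-bins with the classical greedy rule: each
    ball goes to the lighter endpoint of a uniformly random edge,
    breaking ties at random.  Returns the gap max load - min load."""
    load = [0] * n
    for _ in range(num_balls):
        u, v = random.choice(edges)
        if load[u] < load[v] or (load[u] == load[v] and random.random() < 0.5):
            load[u] += 1
        else:
            load[v] += 1
    return max(load) - min(load)

# Example: the cycle C_n, one of the graphs for which greedy is
# conjectured to leave a polynomial gap.
n = 100
cycle = [(i, (i + 1) % n) for i in range(n)]
print(graphical_greedy_gap(cycle, n, num_balls=100 * n))
```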
The equation $x^m = 0$ defines a fat point on a line. The algebra of regular functions on the arc space of this scheme is the quotient of $k[x, x', x^{(2)}, \ldots]$ by all differential consequences of $x^m = 0$. This infinite-dimensional algebra admits a natural filtration by finite-dimensional algebras corresponding to the truncations of arcs. We show that the generating series for their dimensions equals $\frac{m}{1 - mt}$. We also determine the lexicographic initial ideal of the defining ideal of the arc space. These results are motivated by a nonreduced version of the geometric motivic Poincar\'e series, multiplicities in differential algebra, and connections between arc spaces and the Rogers-Ramanujan identities. We also prove a recent conjecture put forth by Afsharijoo in the latter context.
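For concreteness, expanding the stated series gives $\frac{m}{1 - mt} = \sum_{i \ge 0} m^{i+1} t^i$; reading the coefficient of $t^i$ as the dimension of the $i$-th truncated algebra, the dimension sequence for $m = 2$ is $2, 4, 8, 16, \ldots$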
We study the best-arm identification problem with fixed confidence when contextual (covariate) information is available in stochastic bandits. Although we can use contextual information in each round, we are interested in the marginalized mean reward over the contextual distribution. Our goal is to identify the best arm with a minimal number of samplings under a given value of the error rate. We derive instance-specific sample complexity lower bounds for the problem. Then, we propose a context-aware version of the "Track-and-Stop" strategy, wherein the proportion of arm draws tracks the set of optimal allocations, and prove that the expected number of arm draws asymptotically matches the lower bound. We demonstrate that contextual information can be used to improve the efficiency of identifying the best marginalized mean reward, compared with the results of Garivier & Kaufmann (2016). We experimentally confirm that contextual information contributes to faster best-arm identification.
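As a sketch of the tracking idea, here is one sampling step of the plain D-tracking rule of Garivier & Kaufmann (2016), not the context-aware variant proposed in the paper; `w_star` denotes the current estimate of the optimal allocation:

```python
import numpy as np

def d_tracking_pull(counts, w_star):
    """One D-tracking step: force exploration of arms sampled fewer than
    sqrt(t) - K/2 times, otherwise pull the arm whose empirical draw
    count lags furthest behind the target allocation t * w_star."""
    t, K = counts.sum(), len(counts)
    threshold = max(np.sqrt(t) - K / 2.0, 0.0)
    under = np.where(counts < threshold)[0]
    if len(under) > 0:
        return int(under[np.argmin(counts[under])])   # forced exploration
    return int(np.argmax(t * w_star - counts))        # track w_star

# Example: after many steps, the draw proportions approach w_star.
counts, w_star = np.zeros(3), np.array([0.5, 0.3, 0.2])
for _ in range(1000):
    counts[d_tracking_pull(counts, w_star)] += 1
print(counts / counts.sum())
```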
In this paper, for an odd prime $p$, several classes of two-weight linear codes over the finite field $\mathbb{F}_p$ are constructed from defining sets, and their complete weight distributions are determined by employing character sums. These codes are suitable for applications in secret sharing schemes. Furthermore, two new classes of projective two-weight codes are obtained, which in turn yield two new classes of strongly regular graphs.
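To illustrate what a complete weight distribution records (the number of occurrences of each field symbol in each codeword, not just the Hamming weight), here is a brute-force toy computation for a small $\mathbb{F}_p$-linear code given by a generator matrix; the matrix below is an arbitrary example, not one of the codes constructed in the paper.

```python
from itertools import product
from collections import Counter

def complete_weight_distribution(G, p):
    """Enumerate all p^k codewords of the F_p-linear code generated by
    the rows of G and tally the composition of each codeword, i.e. how
    many coordinates equal 0, 1, ..., p-1 (feasible only for tiny codes)."""
    k, n = len(G), len(G[0])
    dist = Counter()
    for msg in product(range(p), repeat=k):
        cw = [sum(m * row[j] for m, row in zip(msg, G)) % p for j in range(n)]
        dist[tuple(cw.count(s) for s in range(p))] += 1
    return dist

# Toy example: a [4,2] code over F_3.
G = [[1, 0, 1, 2],
     [0, 1, 2, 2]]
for composition, multiplicity in sorted(complete_weight_distribution(G, 3).items()):
    print(composition, multiplicity)
```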
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
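As a small numerical illustration of the layer-to-layer iteration and criticality tuning described above, consider the standard infinite-width recursion for the diagonal of the kernel of a ReLU network; the notation ($C_W$, $C_b$ for the weight and bias variances) is a common convention assumed here, not necessarily the book's.

```python
def relu_kernel_recursion(K0, C_W, C_b, depth):
    """Iterate the infinite-width layer-to-layer kernel recursion for a
    ReLU MLP: K^(l+1) = C_b + C_W * E[relu(z)^2] with z ~ N(0, K^(l)),
    using E[relu(z)^2] = K/2 for a centered Gaussian."""
    K, trace = K0, [K0]
    for _ in range(depth):
        K = C_b + C_W * K / 2.0
        trace.append(K)
    return trace

# C_W = 2, C_b = 0 is the ReLU critical point: signals neither explode nor vanish.
print(relu_kernel_recursion(1.0, C_W=2.0, C_b=0.0, depth=5))  # stays at 1.0
print(relu_kernel_recursion(1.0, C_W=1.0, C_b=0.0, depth=5))  # vanishing signal
print(relu_kernel_recursion(1.0, C_W=3.0, C_b=0.0, depth=5))  # exploding signal
```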
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the original full problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared with existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.