In their recent work, C. Doerr and Krejca (Transactions on Evolutionary Computation, 2023) proved upper bounds on the expected runtime of the randomized local search heuristic on generalized Needle functions. From these upper bounds, they deduce, in a not fully rigorous manner, a drastic influence of the needle radius $k$ on the runtime. In this short article, we add the missing lower bound necessary to determine the influence of the parameter $k$ on the runtime. To this end, we derive an exact description of the expected runtime, which also significantly improves the upper bound given by C. Doerr and Krejca. We also give asymptotic estimates of the expected runtime.
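For readers who want to experiment, below is a minimal Python sketch of randomized local search on a generalized Needle function. The concrete definition used here (the needle is the Hamming ball of radius $k$ around the all-ones string) and all helper names are illustrative assumptions and need not match the exact formulation of C. Doerr and Krejca.

```python
import random

def needle(x, k):
    """Illustrative generalized Needle: fitness 1 iff x lies within
    Hamming distance k of the all-ones string, and 0 otherwise."""
    return 1 if (len(x) - sum(x)) <= k else 0

def rls_hitting_time(n, k, rng=random):
    """Randomized local search: flip one uniformly random bit per step and
    accept the move if the fitness does not decrease. Outside the needle the
    landscape is flat, so every single-bit flip is accepted and the search
    performs a random walk on the hypercube. Returns the number of fitness
    evaluations until the needle is hit."""
    x = [rng.randint(0, 1) for _ in range(n)]
    evals = 1
    while needle(x, k) == 0:
        i = rng.randrange(n)
        x[i] ^= 1  # accepted: the new fitness is never worse than 0
        evals += 1
    return evals

if __name__ == "__main__":
    n = 12
    for k in (0, 2, 4):
        runs = [rls_hitting_time(n, k) for _ in range(30)]
        print(f"k={k}: mean hitting time over 30 runs ~ {sum(runs)/len(runs):.0f}")
```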
In recent years, the shortcomings of Bayes posteriors as inferential devices have received increased attention. A popular strategy for fixing them has been to instead target a Gibbs measure based on losses that connect a parameter of interest to observed data. While existing theory for such inference procedures relies on these losses being analytically available, in many situations these losses must be stochastically estimated using pseudo-observations. This paper fills that research gap and derives the first asymptotic theory for Gibbs measures based on estimated losses. Our findings reveal that the number of pseudo-observations required to accurately approximate the exact Gibbs measure depends on the rates at which the bias and variance of the estimated loss converge to zero. These results are particularly consequential for the emerging field of generalised Bayesian inference, for estimated intractable likelihoods, and for biased pseudo-marginal approaches. We apply our results to three Gibbs measures that have been proposed to deal with intractable likelihoods and model misspecification.
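For readers unfamiliar with the construction, a Gibbs measure of the kind discussed here typically has the form shown below; the notation ($\omega$ for the learning rate, $\ell_n$ for the loss on $n$ observations, $\hat{\ell}_{n,m}$ for its estimate based on $m$ pseudo-observations) is a generic placeholder rather than the paper's exact notation.
\[
\pi_\omega(\theta \mid x_{1:n}) \;\propto\; \exp\{-\omega\, \ell_n(\theta)\}\, \pi(\theta),
\qquad
\hat{\pi}_\omega(\theta \mid x_{1:n}) \;\propto\; \exp\{-\omega\, \hat{\ell}_{n,m}(\theta)\}\, \pi(\theta),
\]
where $\pi$ is the prior and $\hat{\ell}_{n,m}$ replaces the analytically unavailable loss $\ell_n$; the question addressed above is how fast $m$ must grow for $\hat{\pi}_\omega$ to behave like $\pi_\omega$.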
We investigate the constant-depth circuit complexity of the Isomorphism Problem, Minimum Generating Set Problem (MGS), and Sub(quasi)group Membership Problem (Membership) for groups and quasigroups (=Latin squares), given as input in terms of their multiplication (Cayley) tables. Despite decades of research on these problems, lower bounds for these problems even against depth-$2$ AC circuits remain unknown. Perhaps surprisingly, Chattopadhyay, Tor\'an, and Wagner (FSTTCS 2010; ACM Trans. Comput. Theory, 2013) showed that Quasigroup Isomorphism could be solved by AC circuits of depth $O(\log \log n)$ using $O(\log^2 n)$ nondeterministic bits, a class we denote $\exists^{\log^2(n)}FOLL$. We narrow this gap by improving the upper bound for many of these problems to $quasiAC^0$, thus decreasing the depth to constant. In particular, we show: - MGS for quasigroups is in $\exists^{\log^2(n)}\forall^{\log n}NTIME(\mathrm{polylog}(n))\subseteq quasiAC^0$. Papadimitriou and Yannakakis (J. Comput. Syst. Sci., 1996) conjectured that this problem was $\exists^{\log^2(n)}P$-complete; our results refute a version of that conjecture for completeness under $quasiAC^0$ reductions unconditionally, and under polylog-space reductions assuming EXP $\neq$ PSPACE. - MGS for groups is in $AC^{1}(L)$, improving on the previous upper bound of $P$ (Lucchini & Thakkar, J. Algebra, 2024). - Quasigroup Isomorphism belongs to $\exists^{\log^2(n)}AC^0(DTISP(\mathrm{polylog},\log))\subseteq quasiAC^0$, improving on the previous bound of $\exists^{\log^2(n)}L\cap\exists^{\log^2(n)}FOLL\subseteq quasiFOLL$ (Chattopadhyay, Tor\'an, & Wagner, ibid.; Levet, Australas. J. Combin., 2023). Our results suggest that understanding the constant-depth circuit complexity may be key to resolving the complexity of problems concerning (quasi)groups in the multiplication table model.
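To make the input model concrete, here is a small Python sketch (not a constant-depth circuit) of the Membership and MGS problems when a finite (quasi)group is given by its Cayley table; the brute-force search is purely illustrative and has no bearing on the circuit upper bounds above.

```python
from itertools import combinations

def closure(table, gens):
    """Sub(quasi)group generated by `gens`: close the set under the binary
    operation table[a][b], where elements are indices 0..n-1."""
    S = set(gens)
    frontier = list(S)
    while frontier:
        a = frontier.pop()
        for b in list(S):
            for c in (table[a][b], table[b][a]):
                if c not in S:
                    S.add(c)
                    frontier.append(c)
    return S

def is_member(table, gens, x):
    """Membership: does x belong to the sub(quasi)group generated by gens?"""
    return x in closure(table, gens)

def minimum_generating_set(table):
    """MGS by brute force: a smallest set of elements generating everything."""
    n = len(table)
    for size in range(1, n + 1):
        for gens in combinations(range(n), size):
            if len(closure(table, gens)) == n:
                return list(gens)

if __name__ == "__main__":
    n = 6
    table = [[(a + b) % n for b in range(n)] for a in range(n)]  # Z_6 as a toy table
    print(is_member(table, [2], 4))       # True: 4 = 2 + 2 (mod 6)
    print(minimum_generating_set(table))  # [1]
```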
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. At a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a ``locally small'' cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
According to the World Health Organization, the involvement of Vulnerable Road Users (VRUs) in traffic accidents remains a significant concern, with VRUs accounting for over half of traffic fatalities. The increasing automation and connectivity of vehicles still has an uncertain impact on VRU safety. By deploying the Collective Perception Service (CPS), vehicles can include information about VRUs in Vehicle-to-Everything (V2X) messages, thus improving the overall perception of the environment. Although increased awareness is considered positive, one could argue that the awareness ratio, the metric used to measure perception, is only implicitly connected to VRU safety. This paper introduces a tailored metric, the Risk Factor (RF), to measure the risk level of interactions between Connected Automated Vehicles (CAVs) and VRUs. By evaluating the RF, we assess the impact of V2X communication on VRU risk mitigation. Our results show that high V2X penetration rates can reduce the mean risk, quantified by our proposed metric, by up to 44%. Although the median risk value shows a significant decrease, suggesting a reduction in overall risk, the distribution of risk values reveals that the CPS's mitigation effectiveness is overestimated, as indicated by the divergence between the RF and the awareness ratio. Additionally, by analyzing a real-world traffic dataset, we pinpoint high-risk locations within a scenario, identifying areas near intersections and behind parked cars as especially dangerous. Our methodology can be ported to other scenarios to identify high-risk areas. We consider the proposed RF an insightful metric for quantifying VRU safety in a highly automated and connected environment.
We study the output length of one-way state generators (OWSGs), their weaker variants, and EFIs. - Standard OWSGs. Recently, Cavalar et al. (arXiv:2312.08363) gave OWSGs with $m$-qubit outputs for any $m=\omega(\log \lambda)$, where $\lambda$ is the security parameter, and conjectured that there do not exist OWSGs with $O(\log \log \lambda)$-qubit outputs. We prove their conjecture in a stronger manner by showing that there do not exist OWSGs with $O(\log \lambda)$-qubit outputs. This means that their construction is optimal in terms of output length. - Inverse-polynomial-advantage OWSGs. Let $\epsilon$-OWSGs be a parameterized variant of OWSGs where a quantum polynomial-time adversary's advantage is at most $\epsilon$. For any constant $c\in \mathbb{N}$, we construct $\lambda^{-c}$-OWSGs with $((c+1)\log \lambda+O(1))$-qubit outputs assuming the existence of OWFs. We show that this is almost tight by proving that there do not exist $\lambda^{-c}$-OWSGs with at most $(c\log \lambda-2)$-qubit outputs. - Constant-advantage OWSGs. For any constant $\epsilon>0$, we construct $\epsilon$-OWSGs with $O(\log \log \lambda)$-qubit outputs assuming the existence of subexponentially secure OWFs. We show that this is almost tight by proving that there do not exist $O(1)$-OWSGs with $((\log \log \lambda)/2+O(1))$-qubit outputs. - Weak OWSGs. We refer to $(1-1/\mathsf{poly}(\lambda))$-OWSGs as weak OWSGs. We construct weak OWSGs with $m$-qubit outputs for any $m=\omega(1)$ assuming the existence of exponentially secure OWFs with linear expansion. We show that this is tight by proving that there do not exist weak OWSGs with $O(1)$-qubit outputs. - EFIs. We show that there do not exist $O(\log \lambda)$-qubit EFIs. We show that this is tight by proving that there exist $\omega(\log \lambda)$-qubit EFIs assuming the existence of exponentially secure PRGs.
This article examines the implicit regularization effect of Stochastic Gradient Descent (SGD). We consider the case of SGD without replacement, the variant typically used to optimize large-scale neural networks. We analyze this algorithm in a more realistic regime than is typically considered in theoretical works on SGD: for example, we allow the product of the learning rate and the Hessian to be $O(1)$, and we do not specify any model architecture, learning task, or loss (objective) function. Our core theoretical result is that optimizing with SGD without replacement is locally equivalent to making an additional step on a novel regularizer. This implies that the expected trajectories of SGD without replacement can be decoupled into (i) following SGD with replacement (in which batches are sampled i.i.d.) along the directions of high curvature, and (ii) regularizing the trace of the noise covariance along the flat ones. As a consequence, SGD without replacement traverses flat areas and may escape saddles significantly faster than SGD with replacement. On several vision tasks, the novel regularizer penalizes a weighted trace of the Fisher matrix, thus encouraging sparsity in the spectrum of the Hessian of the loss, in line with empirical observations from prior work. We also propose an explanation for why SGD does not train at the edge of stability (as opposed to GD).
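As a point of reference for the two sampling schemes compared above, the sketch below runs one epoch of SGD without replacement against the same number of steps of SGD with replacement on a toy least-squares objective; the toy loss and all helper names are illustrative assumptions, and the code only illustrates the sampling difference, not the regularizer derived in the paper.

```python
import numpy as np

def grad(w, Xb, yb):
    """Gradient of the toy loss 0.5 * ||Xb @ w - yb||^2 / batch_size."""
    return Xb.T @ (Xb @ w - yb) / len(Xb)

def epoch_without_replacement(w, X, y, lr, batch_size, rng):
    """SGD without replacement: shuffle once, then sweep disjoint mini-batches
    (the variant typically used in practice)."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w = w - lr * grad(w, X[b], y[b])
    return w

def steps_with_replacement(w, X, y, lr, batch_size, n_steps, rng):
    """SGD with replacement: every mini-batch is drawn i.i.d. from the data."""
    for _ in range(n_steps):
        b = rng.integers(0, len(X), size=batch_size)
        w = w - lr * grad(w, X[b], y[b])
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 10))
    y = X @ rng.normal(size=10)
    w0 = np.zeros(10)
    w_wo = epoch_without_replacement(w0, X, y, lr=0.1, batch_size=32, rng=rng)
    w_w = steps_with_replacement(w0, X, y, lr=0.1, batch_size=32,
                                 n_steps=len(X) // 32, rng=rng)
    print(np.linalg.norm(w_wo - w_w))  # the two update rules differ in general
```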
The Densest Subgraph Problem asks to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature since the early 1970s, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest in this problem, with several important contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides a thorough overview of the fundamental results and exhaustive coverage of the many variants proposed in the literature, with special attention to the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic.
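As a concrete reference point for the most common density measure in this literature, the average degree $|E(S)|/|S|$, the following Python sketch implements the classic greedy peeling heuristic, which yields a 1/2-approximation for that measure; it is included purely as an illustration and is not tied to any specific algorithm from the survey.

```python
from collections import defaultdict

def densest_subgraph_peeling(edges):
    """Greedy peeling for average-degree density |E(S)| / |S|: repeatedly
    delete a minimum-degree vertex and remember the densest intermediate
    subgraph (a classic 1/2-approximation)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    vertices = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    best_density, best_set = 0.0, set(vertices)
    while vertices:
        density = m / len(vertices)
        if density > best_density:
            best_density, best_set = density, set(vertices)
        v = min(vertices, key=lambda x: len(adj[x]))  # minimum-degree vertex
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        vertices.remove(v)
        del adj[v]
    return best_set, best_density

if __name__ == "__main__":
    # A 4-clique with a pendant path: the clique is the densest subgraph.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    print(densest_subgraph_peeling(edges))  # ({0, 1, 2, 3}, 1.5)
```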
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recently developed advanced techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, followed by the other techniques. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks. We then go through a few very recent successful methods, such as dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude this paper and discuss remaining challenges and possible directions on this topic.
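As a small illustration of the first scheme (parameter pruning and sharing), the sketch below applies plain magnitude-based weight pruning to a weight matrix; it is a generic textbook example rather than any specific method covered in the survey.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute
    value; return the pruned matrix and the binary mask of kept weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    W_pruned, mask = magnitude_prune(W, sparsity=0.5)
    print(f"kept {mask.mean():.0%} of the weights")
```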
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and consequently classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on current challenges and applications of CNNs.
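To illustrate the idea of a block, rather than a single layer, as the structural unit, here is a minimal dependency-free sketch of a residual-style block with an identity skip connection, one of the best-known instances of this design principle; the dense stand-in for a convolution and all shapes are illustrative.

```python
import numpy as np

def layer(x, w):
    """Stand-in for a convolutional layer: a dense map followed by ReLU,
    kept dependency-free for the sketch (x: (n, d), w: (d, d))."""
    return np.maximum(x @ w, 0.0)

def residual_block(x, w1, w2):
    """The block as the structural unit: two stacked layers plus an identity
    skip connection, so the block outputs x + F(x) and learns the residual F."""
    return x + layer(layer(x, w1), w2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    w1 = 0.1 * rng.normal(size=(16, 16))
    w2 = 0.1 * rng.normal(size=(16, 16))
    print(residual_block(x, w1, w2).shape)  # (8, 16): skip preserves the shape
```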