Construction of error-correcting codes achieving a designated minimum distance parameter is a central problem in coding theory. In this work, we study a very simple construction of binary linear codes that correct a given number of errors $K$. Moreover, we design a simple, nearly optimal syndrome decoder for the code as well. The running time of the decoder is only logarithmic in the block length of the code, and nearly linear in the number of errors $K$. This decoder can be applied to exact for-all sparse recovery over any field, improving upon previous results with the same number of measurements. Furthermore, computation of the syndrome from a received word can be done in nearly linear time in the block length. We also demonstrate an application of these techniques in non-adaptive group testing, and construct simple explicit measurement schemes with $O(K^2 \log^2 N)$ tests and $O(K^3 \log^2 N)$ recovery time for identifying up to $K$ defectives in a population of size $N$.
Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses the performance of full finetuning when training data is scarce. Therefore, SubTuning allows deploying new tasks at minimal computational cost, while enjoying the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, where different tasks do not interfere with one another, and yet share most of the resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
Event-based cameras are raising interest within the computer vision community. These sensors operate with asynchronous pixels, emitting events, or "spikes", when the luminance change at a given pixel since the last event surpasses a certain threshold. Thanks to their inherent qualities, such as their low power consumption, low latency and high dynamic range, they seem particularly tailored to applications with challenging temporal constraints and safety requirements. Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs), since the coupling of an asynchronous sensor with neuromorphic hardware can yield real-time systems with minimal power requirements. In this work, we seek to develop one such system, using both event sensor data from the DSEC dataset and spiking neural networks to estimate optical flow for driving scenarios. We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations. To do so, we encourage both minimal norm for the error vector and minimal angle between ground-truth and predicted flow, training our model with back-propagation using a surrogate gradient. In addition, the use of 3d convolutions allows us to capture the dynamic nature of the data by increasing the temporal receptive fields. Upsampling after each decoding stage ensures that each decoder's output contributes to the final estimation. Thanks to separable convolutions, we have been able to develop a light model (when compared to competitors) that can nonetheless yield reasonably accurate optical flow estimates.
We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to H\"older smoothness. This measure of the ``effective smoothness'' of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic ``worst-case'' H\"older constant. We prove nearly tight upper and lower risk bounds in terms of the average H\"older smoothness, establishing the minimax rate in the realizable regression setting up to log factors; this was not previously known even in the special case of average Lipschitz smoothness. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown sampling distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide a learning algorithm that achieves the (nearly) optimal learning rate. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.
Neuroevolution is a promising area of research that combines evolutionary algorithms with neural networks. A popular subclass of neuroevolutionary methods, called evolution strategies, relies on dense noise perturbations to mutate networks, which can be sample inefficient and challenging for large models with millions of parameters. We introduce an approach to alleviating this problem by decomposing dense mutations into low-dimensional subspaces. Restricting mutations in this way can significantly reduce variance as networks can handle stronger perturbations while maintaining performance, which enables a more controlled and targeted evolution of deep networks. This approach is uniquely effective for the task of fine tuning pre-trained models, which is an increasingly valuable area of research as networks continue to scale in size and open source models become more widely available. Furthermore, we show how this work naturally connects to ensemble learning where sparse mutations encourage diversity among children such that their combined predictions can reliably improve performance. We conduct the first large scale exploration of neuroevolutionary fine tuning and ensembling on the notoriously difficult ImageNet dataset, where we see small generalization improvements with only a single evolutionary generation using nearly a dozen different deep neural network architectures.
We introduce a high-order spline geometric approach for the initial boundary value problem for Maxwell's equations. The method is geometric in the sense that it discretizes in structure preserving fashion the two de Rham sequences of differential forms involved in the formulation of the continuous system. Both the Ampere--Maxwell and the Faraday equations are required to hold strongly, while to make the system solvable two discrete Hodge star operators are used. By exploiting the properties of the chosen spline spaces and concepts from exterior calculus, a non-standard explicit in time formulation is introduced, based on the solution of linear systems with matrices presenting Kronecker product structure, rather than mass matrices as in the standard literature. These matrices arise from the application of the exterior (wedge) product in the discrete setting, and they present Kronecker product structure independently of the geometry of the domain or the material parameters. The resulting scheme preserves the desirable energy conservation properties of the known approaches. The computational advantages of the newly proposed scheme are studied both through a complexity analysis and through numerical experiments in three dimensions.
We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.
We consider the differentially private estimation of multiple quantiles (MQ) of a distribution from a dataset, a key building block in modern data analysis. We apply the recent non-smoothed Inverse Sensitivity (IS) mechanism to this specific problem. We establish that the resulting method is closely related to the recently published ad hoc algorithm JointExp. In particular, they share the same computational complexity and a similar efficiency. We prove the statistical consistency of these two algorithms for continuous distributions. Furthermore, we demonstrate both theoretically and empirically that this method suffers from an important lack of performance in the case of peaked distributions, which can degrade up to a potentially catastrophic impact in the presence of atoms. Its smoothed version (i.e. by applying a max kernel to its output density) would solve this problem, but remains an open challenge to implement. As a proxy, we propose a simple and numerically efficient method called Heuristically Smoothed JointExp (HSJointExp), which is endowed with performance guarantees for a broad class of distributions and achieves results that are orders of magnitude better on problematic datasets.
Object detection with transformers (DETR) reaches competitive performance with Faster R-CNN via a transformer encoder-decoder architecture. Inspired by the great success of pre-training transformers in natural language processing, we propose a pretext task named random query patch detection to unsupervisedly pre-train DETR (UP-DETR) for object detection. Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches from the original image. During the pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade-off multi-task learning of classification and localization in the pretext task, we freeze the CNN backbone and propose a patch feature reconstruction branch which is jointly optimized with patch detection. (2) To perform multi-query localization, we introduce UP-DETR from single-query patch and extend it to multi-query patches with object query shuffle and attention mask. In our experiments, UP-DETR significantly boosts the performance of DETR with faster convergence and higher precision on PASCAL VOC and COCO datasets. The code will be available soon.
Detection and recognition of text in natural images are two main problems in the field of computer vision that have a wide variety of applications in analysis of sports videos, autonomous driving, industrial automation, to name a few. They face common challenging problems that are factors in how text is represented and affected by several environmental conditions. The current state-of-the-art scene text detection and/or recognition methods have exploited the witnessed advancement in deep learning architectures and reported a superior accuracy on benchmark datasets when tackling multi-resolution and multi-oriented text. However, there are still several remaining challenges affecting text in the wild images that cause existing methods to underperform due to there models are not able to generalize to unseen data and the insufficient labeled data. Thus, unlike previous surveys in this field, the objectives of this survey are as follows: first, offering the reader not only a review on the recent advancement in scene text detection and recognition, but also presenting the results of conducting extensive experiments using a unified evaluation framework that assesses pre-trained models of the selected methods on challenging cases, and applies the same evaluation criteria on these techniques. Second, identifying several existing challenges for detecting or recognizing text in the wild images, namely, in-plane-rotation, multi-oriented and multi-resolution text, perspective distortion, illumination reflection, partial occlusion, complex fonts, and special characters. Finally, the paper also presents insight into the potential research directions in this field to address some of the mentioned challenges that are still encountering scene text detection and recognition techniques.
We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact. Adopting a straightforward decomposition of the problem into entity detection, entity linking, relation prediction, and evidence combination, we explore simple yet strong baselines. On the popular SimpleQuestions dataset, we find that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach the state of the art, and techniques that do not use neural networks also perform reasonably well. These results show that gains from sophisticated deep learning techniques proposed in the literature are quite modest and that some previous models exhibit unnecessary complexity.