In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial examples. Our results hold even in cases where the network has many more parameters than training examples. Despite the potential for harmful overfitting in such overparameterized settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial $\ell_2$-perturbations), even though robust networks that fit the data exist.
Motivated by the importance of floating-point computations, we study the problem of securely and accurately summing many floating-point numbers. Prior work has focused on security absent accuracy or accuracy absent security, whereas our approach achieves both of them. Specifically, we show how to implement floating-point superaccumulators using secure multi-party computation techniques, so that a number of participants holding secret shares of floating-point numbers can accurately compute their sum while keeping the individual values private.
In this paper, we study the problem of efficiently and effectively embedding the high-dimensional spatio-spectral information of hyperspectral (HS) images, guided by feature diversity. Specifically, based on the theoretical formulation that feature diversity is correlated with the rank of the unfolded kernel matrix, we rectify 3D convolution by modifying its topology to enhance the rank upper-bound. This modification yields a rank-enhanced spatial-spectral symmetrical convolution set (ReS$^3$-ConvSet), which not only learns diverse and powerful feature representations but also saves network parameters. Additionally, we also propose a novel diversity-aware regularization (DA-Reg) term that directly acts on the feature maps to maximize independence among elements. To demonstrate the superiority of the proposed ReS$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks, including denoising, spatial super-resolution, and classification. Extensive experiments show that the proposed approaches outperform state-of-the-art methods both quantitatively and qualitatively to a significant extent. The code is publicly available at //github.com/jinnh/ReSSS-ConvSet.
In this paper, we introduce a new approach for constructing robust well-balanced numerical methods for the one-dimensional Saint-Venant system with and without the Manning friction term. Following the idea presented in [R. Abgrall, Commun. Appl. Math. Comput. 5(2023), pp. 370-402], we first combine the conservative and non-conservative (primitive) formulations of the studied conservative hyperbolic system in a natural way. The solution is globally continuous and described by a combination of point values and average values. The point values and average values will then be evolved by two different forms of PDEs: a conservative version of the cell averages and a possibly non-conservative one for the points. We show how to deal with both the conservative and non-conservative forms of PDEs in a well-balanced manner. The developed schemes are capable of exactly preserving both the still-water and moving-water equilibria. Compared with existing well-balanced methods, this new class of scheme is nonlinear-equations-solver-free. This makes the developed schemes less computationally costly and easier to extend to other models. We demonstrate the behavior of the proposed new scheme on several challenging examples.
In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks. Our investigation reveals that while GPT-4V exhibits proficiency in discerning the position and interrelations of 2D entities through current visual prompting techniques, its abilities in handling 3D spatial tasks have yet to be explored. In our approach, we create a 3D coordinate system tailored to 3D imagery, complete with annotated scale information. By presenting images infused with the 3DAP visual prompt as inputs, we empower GPT-4V to ascertain the spatial positioning information of the given 3D target image with a high degree of precision. Through experiments, We identified three tasks that could be stably completed using the 3DAP method, namely, 2D to 3D Point Reconstruction, 2D to 3D point matching, and 3D Object Detection. We perform experiments on our proposed dataset 3DAP-Data, the results from these experiments validate the efficacy of 3DAP-enhanced GPT-4V inputs, marking a significant stride in 3D spatial task execution.
In this work, we address the problem of directing the text generation of a language model (LM) towards a desired behavior, aligning the generated text with the preferences of the human operator. We propose using another, instruction-tuned language model as a critic reward model in a zero-shot way thanks to the prompt of a Yes-No question that represents the user preferences, without requiring further labeled data. This zero-shot reward model provides the learning signal to further fine-tune the base LM using Reinforcement Learning from AI Feedback (RLAIF); yet our approach is also compatible in other contexts such as quality-diversity search. Extensive evidence of the capabilities of the proposed ZYN framework is provided through experiments in different domains related to text generation, including detoxification; optimizing sentiment of movie reviews, or any other attribute; steering the opinion about a particular topic the model may have; and personalizing prompt generators for text-to-image tasks. Code available at \url{//github.com/vicgalle/zero-shot-reward-models/}.
In our contemporary academic inquiry, we present "Diffusion-C," a foundational methodology to analyze the generative restrictions of Diffusion Models, particularly those akin to GANs, DDPM, and DDIM. By employing input visual data that has been subjected to a myriad of corruption modalities and intensities, we elucidate the performance characteristics of those Diffusion Models. The noise component takes center stage in our analysis, hypothesized to be a pivotal element influencing the mechanics of deep learning systems. In our rigorous expedition utilizing Diffusion-C, we have discerned the following critical observations: (I) Within the milieu of generative models under the Diffusion taxonomy, DDPM emerges as a paragon, consistently exhibiting superior performance metrics. (II) Within the vast spectrum of corruption frameworks, the fog and fractal corruptions notably undermine the functional robustness of both DDPM and DDIM. (III) The vulnerability of Diffusion Models to these particular corruptions is significantly influenced by topological and statistical similarities, particularly concerning the alignment between mean and variance. This scholarly work highlights Diffusion-C's core understandings regarding the impacts of various corruptions, setting the stage for future research endeavors in the realm of generative models.
In the past decade, we have witnessed the rise of deep learning to dominate the field of artificial intelligence. Advances in artificial neural networks alongside corresponding advances in hardware accelerators with large memory capacity, together with the availability of large datasets enabled researchers and practitioners alike to train and deploy sophisticated neural network models that achieve state-of-the-art performance on tasks across several fields spanning computer vision, natural language processing, and reinforcement learning. However, as these neural networks become bigger, more complex, and more widely used, fundamental problems with current deep learning models become more apparent. State-of-the-art deep learning models are known to suffer from issues that range from poor robustness, inability to adapt to novel task settings, to requiring rigid and inflexible configuration assumptions. Ideas from collective intelligence, in particular concepts from complex systems such as self-organization, emergent behavior, swarm optimization, and cellular systems tend to produce solutions that are robust, adaptable, and have less rigid assumptions about the environment configuration. It is therefore natural to see these ideas incorporated into newer deep learning methods. In this review, we will provide a historical context of neural network research's involvement with complex systems, and highlight several active areas in modern deep learning research that incorporate the principles of collective intelligence to advance its current capabilities. To facilitate a bi-directional flow of ideas, we also discuss work that utilize modern deep learning models to help advance complex systems research. We hope this review can serve as a bridge between complex systems and deep learning communities to facilitate the cross pollination of ideas and foster new collaborations across disciplines.
In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
Machine learning techniques have deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic with both industrial and academic interest. In this paper, we provide an up to date survey on AutoML. First, we introduce and define the AutoML problem, with inspiration from both realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but also can guide the design for new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underneath their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.