Moving from the mathematical theory of (abstract) syntax, we develop a general relational theory of symbolic manipulation parametric with respect to, and accounting for, general notions of syntax. We model syntax relying on categorical notions, such as free algebras and monads, and show that a general theory of symbolic manipulation in the style of rewriting systems can be obtained by extending such notions to an allegorical setting. This way, we obtain an augmented calculus of relations accounting for syntax-based rewriting. We witness the effectiveness of the relational approach by generalising and unifying milestones results in rewriting, such as the parallel moves and the Tait-Martin-L\"of techniques.
Two numerical schemes are proposed and investigated for the Yang--Mills equations, which can be seen as a nonlinear generalisation of the Maxwell equations set on Lie algebra-valued functions, with similarities to certain formulations of General Relativity. Both schemes are built on the Discrete de Rham (DDR) method, and inherit from its main features: an arbitrary order of accuracy, and applicability to generic polyhedral meshes. They make use of the complex property of the DDR, together with a Lagrange-multiplier approach, to preserve, at the discrete level, a nonlinear constraint associated with the Yang--Mills equations. We also show that the schemes satisfy a discrete energy dissipation (the dissipation coming solely from the implicit time stepping). Issues around the practical implementations of the schemes are discussed; in particular, the assembly of the local contributions in a way that minimises the price we pay in dealing with nonlinear terms, in conjunction with the tensorisation coming from the Lie algebra. Numerical tests are provided using a manufactured solution, and show that both schemes display a convergence in $L^2$-norm of the potential and electrical fields in $\mathcal O(h^{k+1})$ (provided that the time step is of that order), where $k$ is the polynomial degree chosen for the DDR complex. We also numerically demonstrate the preservation of the constraint.
This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively. We analyze these two challenges through several strategic measures, including the improvement of gradient flow and the imposition of constraints on a network's Lipschitz constant. To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization. Explicit optimization methods involve direct manipulation of optimizer parameters, including weight, gradient, learning rate, and weight decay. Implicit optimization methods, by contrast, focus on improving the overall landscape of a network by enhancing its modules, such as residual shortcuts, normalization methods, attention mechanisms, and activations. In this article, we provide an in-depth analysis of these two optimization classes and undertake a thorough examination of the Jacobian matrices and the Lipschitz constants of many widely used deep learning modules, highlighting existing issues as well as potential improvements. Moreover, we also conduct a series of analytical experiments to substantiate our theoretical discussions. This article does not aim to propose a new optimizer or network. Rather, our intention is to present a comprehensive understanding of optimization in deep learning. We hope that this article will assist readers in gaining a deeper insight in this field and encourages the development of more robust, efficient, and high-performing models.
Recently, Gilmer proved the first constant lower bound for the union-closed sets conjecture via an information-theoretic argument. The heart of the argument is an entropic inequality involving the OR function of two i.i.d.\ binary vectors, and the best constant obtainable through the i.i.d.\ coupling is $\frac{3-\sqrt{5}}{2}\approx0.38197$. Sawin demonstrated that the bound can be strictly improved by considering a convex combination of the i.i.d.\ coupling and the max-entropy coupling, and the best constant obtainable through this approach is around 0.38234, as evaluated by Yu and Cambie. In this work we show analytically that the bound can be further strictly improved by considering another class of coupling under which the two binary sequences are i.i.d.\ conditioned on an auxiliary random variable. We also provide a new class of bounds in terms of finite-dimensional optimization. For a basic instance from this class, analysis assisted with numerically solved 9-dimensional optimization suggests that the optimizer assumes a certain structure. Under numerically verified hypotheses, the lower bound for the union-closed sets conjecture can be improved to approximately 0.38271, a number that can be defined as the solution to an analytic equation.
Poetry generation is a typical and popular task in natural language generation. While prior works have shown success in controlling either semantic or metrical aspects of poetry generation, there are still challenges in addressing both perspectives simultaneously. In this paper, we employ the Diffusion model to generate poetry in Sonnet and SongCi in Chinese for the first time to tackle such challenges. Different from autoregressive generation, our PoetryDiffusion model, based on Diffusion model, generates the complete sentence or poetry by taking into account the whole sentence information, resulting in improved semantic expression. Additionally, we incorporate a novel metrical controller to manipulate and evaluate metrics (format and rhythm). The denoising process in PoetryDiffusion allows for gradual enhancement of semantics and flexible integration of the metrical controller. Experimental results on two datasets demonstrate that our model outperforms existing models in terms of semantic, metrical and overall performance.
We formalise the modal operators from the concurrent dynamic logics of Peleg, Nerode and Wijesekera in a multirelational algebraic language based on relation algebra and power allegories, using relational approximation operators on multirelations developed in a companion article. We relate Nerode and Wijesekera's box operator with a relational approximation operator for multirelations and two related operators that approximate multirelations by different kinds of deterministic multirelations. We provide an algebraic soundness proof of Goldblatt's axioms for concurrent dynamic logic as an application.
Mobile applications, particularly those from social media platforms such as WeChat and TikTok, are evolving into "super apps" that offer a wide range of services such as instant messaging and media sharing, e-commerce, e-learning, and e-government. These super apps often provide APIs for developers to create "miniapps" that run within the super app. These APIs should have been thoroughly scrutinized for security. Unfortunately, we find that many of them are undocumented and unsecured, potentially allowing miniapps to bypass restrictions and gain higher privileged access. To systematically identify these hidden APIs before they are exploited by attackers, we developed a tool APIScope with both static analysis and dynamic analysis, where static analysis is used to recognize hidden undocumented APIs, and dynamic analysis is used to confirm whether the identified APIs can be invoked by an unprivileged 3rdparty miniapps. We have applied APIScope to five popular super apps (i.e., WeChat, WeCom, Baidu, QQ, and Tiktok) and found that all of them contain hidden APIs, many of which can be exploited due to missing security checks. We have also quantified the hidden APIs that may have security implications by verifying if they have access to resources protected by Android permissions. Furthermore, we demonstrate the potential security hazards by presenting various attack scenarios, including unauthorized access to any web pages, downloading and installing malicious software, and stealing sensitive information. We have reported our findings to the relevant vendors, some of whom have patched the vulnerabilities and rewarded us with bug bounties.
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a sub-field of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. The integration of vision and language has sparked a lot of attention as a result of this. The tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and an extensive review of the state of the arts approaches, key models design principles and discuss existing datasets, methods, their problem formulation and evaluation measures for VQA and Visual reasoning tasks to understand vision and language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
Knowledge graph reasoning (KGR), aiming to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems, etc. According to the graph types, the existing KGR models can be roughly divided into three categories, \textit{i.e.,} static models, temporal models, and multi-modal models. The early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey papers and open-source repositories comprehensively summarize and discuss models in this important direction. To fill the gap, we conduct a survey for knowledge graph reasoning tracing from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed consequently. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: //github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.