Neural networks (NNs) inspired by the forward-backward algorithm (FBA) are used as equalizers for bandlimited channels with a memoryless nonlinearity. The NN equalizers are combined with successive interference cancellation (SIC) to approach the information rates of joint detection and decoding (JDD) with considerably less complexity than JDD and other existing equalizers. Simulations for short-haul optical fiber links with square-law detection illustrate the gains of NNs compared with the complexity-limited FBA and Gibbs sampling.
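To make the setup concrete, below is a minimal numpy sketch of the channel class these equalizers target: bandlimited linear intersymbol interference followed by a memoryless square-law nonlinearity, as in short-haul IM/DD optical links. The filter taps, modulation levels, and noise level are illustrative assumptions, not the paper's simulation settings.

```python
# Minimal sketch (assumed parameters): bandlimited linear ISI followed by a
# memoryless square-law nonlinearity, as in short-haul IM/DD optical links.
import numpy as np

rng = np.random.default_rng(0)

def square_law_channel(symbols, taps, noise_std):
    """Linear filtering (channel memory), then |.|^2 detection plus noise."""
    filtered = np.convolve(symbols, taps, mode="same")
    return np.abs(filtered) ** 2 + noise_std * rng.standard_normal(symbols.size)

x = rng.choice([0.5, 1.0, 1.5, 2.0], size=1000)   # illustrative 4-PAM amplitudes
y = square_law_channel(x, taps=np.array([0.2, 1.0, 0.2]), noise_std=0.05)
```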
Financial networks are characterized by complex structures of mutual obligations. These obligations are fulfilled entirely or in part (when defaults occur) via a mechanism called clearing, which determines a set of payments that settle the claims by respecting rules such as limited liability, absolute priority, and proportionality (pro-rated payments). In the presence of shocks on the financial system, however, the clearing mechanism may lead to cascaded defaults and eventually to financial disaster. In this paper, we first study the clearing model under pro-rated payments of Eisenberg and Noe, and we derive novel necessary and sufficient conditions for the uniqueness of the clearing payments, valid for an arbitrary topology of the financial network. Then, we argue that the proportionality rule is one of the factors responsible for cascaded defaults, and that the overall system loss can be reduced if this rule is lifted. The proposed approach thus shifts the focus from individual interests to the overall system's interest in order to control and contain the adverse effects of cascaded failures, and we show that clearing payments in this setting can be computed by solving suitable convex optimization problems.
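As a concrete reference point for the pro-rated baseline, the sketch below computes the Eisenberg-Noe clearing vector by the standard monotone fixed-point iteration p <- min(p_bar, e + Pi^T p); the three-bank liability matrix and external assets are illustrative assumptions.

```python
# Minimal sketch: Eisenberg-Noe clearing under pro-rated payments via the
# monotone fixed-point iteration p <- min(p_bar, e + Pi^T p).
# The three-bank network data are illustrative, not from the paper.
import numpy as np

L = np.array([[0.0, 2.0, 1.0],   # L[i, j]: nominal liability of bank i to bank j
              [1.0, 0.0, 2.0],
              [1.0, 1.0, 0.0]])
e = np.array([0.5, 0.5, 1.0])    # external assets of each bank

p_bar = L.sum(axis=1)            # total obligations of each bank
Pi = np.divide(L, p_bar[:, None], out=np.zeros_like(L),
               where=p_bar[:, None] > 0)          # pro-rata payment shares

p = p_bar.copy()                 # start from full payment, iterate downward
for _ in range(10_000):
    p_new = np.minimum(p_bar, e + Pi.T @ p)       # pay what is owed or available
    converged = np.max(np.abs(p_new - p)) < 1e-12
    p = p_new
    if converged:
        break
print("clearing payments:", p)
```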
Simulation-based inference (SBI) is constantly in search of more expressive algorithms for accurately inferring the parameters of complex models from noisy data. We present consistency models for neural posterior estimation (CMPE), a new free-form conditional sampler for scalable, fast, and amortized SBI with generative neural networks. CMPE combines the advantages of normalizing flows and flow matching methods into a single generative architecture: It essentially distills a continuous probability flow and enables rapid few-shot inference with an unconstrained architecture that can be tailored to the structure of the estimation problem. Our empirical evaluation demonstrates that CMPE not only outperforms current state-of-the-art algorithms on three hard low-dimensional problems but also achieves competitive performance in a high-dimensional Bayesian denoising experiment and in estimating a computationally demanding multi-scale model of tumor spheroid growth.
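For intuition, here is a minimal sketch of the few-step consistency sampling loop that makes such rapid inference possible; `f_theta` is a hypothetical stand-in for a trained conditional consistency network, and the time schedule and `eps` are illustrative choices, not CMPE's actual configuration.

```python
# Minimal sketch of few-step consistency sampling for posterior draws.
# f_theta is a hypothetical stand-in; a real consistency network is learned.
import torch

def f_theta(x_t, t, y):                 # stand-in for the trained network
    return x_t / (1.0 + t)

def consistency_sample(y, dim, t_schedule=(80.0, 10.0, 1.0), eps=0.002):
    t_max = t_schedule[0]
    x = f_theta(t_max * torch.randn(dim), t_max, y)          # one-shot draw
    for t in t_schedule[1:]:                                 # optional refinements
        x_t = x + (t**2 - eps**2) ** 0.5 * torch.randn(dim)  # re-noise to time t
        x = f_theta(x_t, t, y)                               # map back to data
    return x

theta_draw = consistency_sample(y=torch.randn(4), dim=5)
```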
Quantum Machine Learning (QML) has surfaced as a pioneering framework addressing sequential control tasks and time-series modeling. It has demonstrated empirical quantum advantages notably within domains such as Reinforcement Learning (RL) and time-series prediction. A significant advancement lies in Quantum Recurrent Neural Networks (QRNNs), specifically tailored for memory-intensive tasks encompassing partially observable environments and non-linear time-series prediction. Nevertheless, QRNN-based models encounter challenges, notably prolonged training duration stemming from the necessity to compute quantum gradients using backpropagation-through-time (BPTT). This predicament is exacerbated when the complete model is executed on quantum devices, primarily due to the substantial demand for circuit evaluations arising from the parameter-shift rule. This paper introduces the Quantum Fast Weight Programmers (QFWP) as a solution to the temporal or sequential learning challenge. The QFWP leverages a classical neural network (referred to as the 'slow programmer') functioning as a quantum programmer to swiftly modify the parameters of a variational quantum circuit (termed the 'fast programmer'). Instead of completely overwriting the fast programmer at each time-step, the slow programmer generates parameter changes or updates for the quantum circuit parameters. This approach enables the fast programmer to incorporate past observations or information. Notably, the proposed QFWP model achieves learning of temporal dependencies without necessitating the use of quantum recurrent neural networks. Numerical simulations conducted in this study showcase the efficacy of the proposed QFWP model in both time-series prediction and RL tasks. The model exhibits performance levels either comparable to or surpassing those achieved by QLSTM-based models.
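The slow/fast split is easy to sketch: a classical network emits additive updates for the variational-circuit parameters, so past observations accumulate in the parameters without any recurrent connections. In the sketch below the quantum circuit is replaced by a placeholder function, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch of the slow/fast programmer split: a classical network emits
# additive updates for variational-circuit parameters, so past observations
# accumulate without recurrence. The circuit itself is a placeholder here.
import torch
import torch.nn as nn

obs_dim, n_vqc_params = 4, 12
slow_programmer = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                nn.Linear(32, n_vqc_params))

def fast_programmer(theta, obs):
    """Placeholder for a variational quantum circuit evaluated at params theta."""
    return torch.tanh(theta).sum() * obs.mean()   # stand-in expectation value

theta = torch.zeros(n_vqc_params)                 # fast weights
with torch.no_grad():
    for obs in torch.randn(10, obs_dim):          # a short observation stream
        theta = theta + slow_programmer(obs)      # update, do not overwrite
        output = fast_programmer(theta, obs)
```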
In neural network binarization, BinaryConnect (BC) and its variants are considered the standard. These methods apply the sign function in their forward pass and their respective gradients are backpropagated to update the weights. However, the derivative of the sign function is zero wherever it is defined, which consequently freezes training. Therefore, implementations of BC (e.g., BNN) usually replace the derivative of sign in the backward computation with identity or other approximate gradient alternatives. Although such practice works well empirically, it is largely a heuristic or "training trick." We aim to shed some light on these training tricks from the optimization perspective. Building on existing theory on ProxConnect (PC, a generalization of BC), we (1) equip PC with different forward-backward quantizers and obtain ProxConnect++ (PC++), which includes existing binarization techniques as special cases; (2) derive a principled way to synthesize forward-backward quantizers with automatic theoretical guarantees; (3) illustrate our theory by proposing an enhanced binarization algorithm, BNN++; (4) conduct image classification experiments on CNNs and vision transformers, and empirically verify that BNN++ generally achieves competitive results on binarizing these models.
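The "training trick" in question is easy to state in code. The sketch below implements the standard forward-backward quantizer pattern (sign in the forward pass, clipped identity in the backward pass) as a PyTorch autograd function; it illustrates the heuristic the paper formalizes, not BNN++ itself.

```python
# Minimal sketch of a forward-backward quantizer: sign in the forward pass,
# clipped identity ("straight-through") in the backward pass.
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                    # forward quantizer

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1)        # backward quantizer

w = torch.randn(5, requires_grad=True)
SignSTE.apply(w).sum().backward()               # gradients flow despite sign()
print(w.grad)
```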
Linear optical circuits can be used to manipulate the quantum states of photons as they pass through components including beam splitters and phase shifters. These photonic states possess a particularly high level of expressiveness, as they reside within the bosonic Fock space, an infinite-dimensional Hilbert space. However, in the domain of linear optical quantum computation, these basic components may not be sufficient to efficiently perform all computations of interest, such as universal quantum computation. To address this limitation, it is common to add auxiliary sources and detectors, which enable projections onto auxiliary photonic states and thus increase the versatility of the processes. In this paper, we introduce the $\textbf{LO}_{fi}$-calculus, a graphical language to reason about the infinite-dimensional bosonic Fock space with circuits composed of four core elements of linear optics: the phase shifter, the beam splitter, and auxiliary sources and detectors with bounded photon number. We present an equational theory that we prove to be complete: two $\textbf{LO}_{fi}$-circuits represent the same quantum process if and only if one can be transformed into the other with the rules of the $\textbf{LO}_{fi}$-calculus. We give a unique and compact universal form for such circuits.
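To make the underlying semantics concrete, here is a small numerical sketch (not the $\textbf{LO}_{fi}$-calculus itself) that builds the phase-shifter and beam-splitter unitaries on a truncated two-mode Fock space; the cutoff is an illustrative truncation of the infinite-dimensional space, so states near the cutoff are only approximately transformed.

```python
# Minimal numerical sketch: phase-shifter and beam-splitter unitaries on a
# truncated two-mode Fock space. The cutoff is an illustrative truncation.
import numpy as np
from scipy.linalg import expm

cutoff = 4                                          # photons per mode: 0..3
a = np.diag(np.sqrt(np.arange(1, cutoff)), k=1)     # annihilation operator
I = np.eye(cutoff)
a1, a2 = np.kron(a, I), np.kron(I, a)               # mode-1 and mode-2 operators

phi, theta = np.pi / 3, np.pi / 4
phase_shifter = expm(1j * phi * (a1.T @ a1))        # e^{i phi n_1}
beam_splitter = expm(theta * (a1.T @ a2 - a2.T @ a1))  # unitary on the truncation
```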
Interior point methods are widely used for different types of mathematical optimization problems. Many implementations of interior point methods in use today rely on direct linear solvers to solve systems of equations in each iteration. The need to solve ever larger optimization problems more efficiently and the rise of hardware accelerators for general-purpose computing have led to a large interest in using iterative linear solvers instead, with the major issue being the inevitable ill-conditioning of the linear systems that arise as the optimization progresses. We investigate the use of Krylov solvers for interior point methods in solving optimization problems from radiation therapy and support vector machines. We implement a prototype interior point method using a so-called doubly augmented formulation of the Karush-Kuhn-Tucker linear system of equations, originally proposed by Forsgren and Gill, and evaluate its performance on real optimization problems from radiation therapy and support vector machines. Crucially, our implementation uses a preconditioned conjugate gradient method with Jacobi preconditioning internally. Our measurements of the conditioning of the linear systems indicate that the Jacobi preconditioner improves the conditioning of the systems to a degree that they can be solved iteratively, but there is room for further improvement in that regard. Furthermore, profiling of our prototype code shows that it is suitable for GPU acceleration, which may further improve its performance in practice. Overall, our results indicate that our method can find solutions of acceptable accuracy in reasonable time, even with a simple Jacobi preconditioner.
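The iterative kernel is standard and easy to sketch: conjugate gradients with a Jacobi (diagonal) preconditioner. The snippet below applies it to a stand-in sparse SPD system rather than a real doubly augmented KKT matrix from radiation therapy or SVM training.

```python
# Minimal sketch: conjugate gradients with a Jacobi (diagonal) preconditioner
# via SciPy, on a stand-in SPD system (not a real doubly augmented KKT matrix).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 1000
A = sp.random(n, n, density=0.01, random_state=0)
A = A @ A.T + 10 * sp.eye(n)                        # stand-in SPD system
b = np.ones(n)

d = A.diagonal()                                    # Jacobi preconditioner
M = LinearOperator((n, n), matvec=lambda v: v / d)
x, info = cg(A, b, M=M)
print("converged" if info == 0 else f"cg returned info = {info}")
```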
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. Typical GCNs and their variants work under a homophily assumption (i.e., nodes of the same class are prone to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes of different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining intermediate representations, which introduces noise and irrelevant information into the result. However, these methods do not change the propagation mechanism itself, which works under the homophily assumption and is a fundamental part of GCNs. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism that can automatically change the propagation and aggregation process according to the homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measures of the homophily degree between node pairs, learned from topological and attribute information, respectively. We then incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end scheme, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.
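The core idea admits a compact sketch: give each edge a learned, signed homophily score so heterophilic neighbors can push representations apart instead of pulling them together. The scoring network and toy graph below are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch of signed, learnable propagation: each edge gets a homophily
# score in [-1, 1], so heterophilic neighbors repel rather than attract.
import torch
import torch.nn as nn

class AdaptiveProp(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)          # homophily degree per pair

    def forward(self, h, edge_index):
        src, dst = edge_index                       # (2, E) edge list
        s = torch.tanh(self.score(torch.cat([h[src], h[dst]], dim=-1)))
        out = h.clone()
        out.index_add_(0, dst, s * h[src])          # signed neighbor aggregation
        return out

h = torch.randn(5, 8)
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
h_new = AdaptiveProp(8)(h, edges)
```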
A community comprises members whose features and connections differ from those in other communities of a network. Detecting communities is of great significance in network analysis. Beyond the classical spectral clustering and statistical inference methods, deep learning techniques for community detection have developed significantly in recent years, owing to their advantages in handling high-dimensional network data. Hence, a comprehensive overview of the latest progress in community detection through deep learning is timely for both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of state-of-the-art methods, including deep learning models built upon deep neural networks, deep nonnegative matrix factorization, and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
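A sketch of the kind of diagnostic such experiments rely on: measure the maximum layer-to-layer weight increment as a function of depth. The stand-in weights below are drawn with the smooth 1/depth scaling the neural ODE limit assumes, purely to show what the diagnostic measures; the paper probes trained ResNet weights instead.

```python
# Sketch of a depth-scaling diagnostic: max layer-to-layer weight increment.
# Stand-in weights mimic the 1/depth scaling assumed by the neural ODE limit.
import torch

def max_increment(weights):
    return max(torch.linalg.norm(b - a) for a, b in zip(weights, weights[1:]))

for depth in (8, 32, 128):
    weights = [torch.randn(16, 16) / depth for _ in range(depth)]  # stand-in
    print(depth, float(max_increment(weights)))    # shrinks roughly like 1/depth
```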
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
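The recipe behind such deep perceptual metrics is compact, and a minimal sketch follows: unit-normalize features at a few VGG16 layers, accumulate squared differences, and average spatially. The layer indices and uniform channel weighting are simplifying assumptions; the learned metric additionally fits per-channel weights on human judgments.

```python
# Minimal sketch of a deep perceptual distance: unit-normalize features at a
# few VGG16 layers, accumulate squared differences, average spatially.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
relu_layers = {3, 8, 15, 22, 29}                    # assumed feature taps

def perceptual_distance(x, y):
    d = 0.0
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in relu_layers:
            fx = x / (x.norm(dim=1, keepdim=True) + 1e-10)   # unit-normalize
            fy = y / (y.norm(dim=1, keepdim=True) + 1e-10)
            d = d + ((fx - fy) ** 2).sum(dim=1).mean()       # spatial average
    return d

with torch.no_grad():
    print(perceptual_distance(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```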