Radix conversion of numeric data is a fundamental operation in digital computation. In this paper, we first propose a binary encoding for mixed-radix digits. Second, we present a variant of rANS coding based on this conversion that supports parallel decoding. Simulations show that, in serial mode, the proposed coding achieves higher throughput than the baseline (a speed-up factor of about 2x) with no loss of compression ratio, and that it outperforms an existing 2-way interleaved implementation.
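To make the starting point concrete, the following is a minimal sketch of plain serial, single-state rANS coding in Python; the alphabet and frequencies are hypothetical, and the paper's mixed-radix conversion, renormalization, and parallel-decoding variant are not modeled.

    # Minimal rANS over a toy alphabet; illustrative only.
    freqs = {"a": 3, "b": 1}                   # hypothetical symbol frequencies
    M = sum(freqs.values())                    # total frequency
    cum, acc = {}, 0
    for s in freqs:                            # cumulative frequency table
        cum[s] = acc
        acc += freqs[s]

    def encode(symbols, x=1):
        for s in reversed(symbols):            # rANS encodes in reverse order
            x = (x // freqs[s]) * M + cum[s] + x % freqs[s]
        return x

    def decode(x, n):
        out = []
        for _ in range(n):
            slot = x % M                       # locate the symbol by its slot
            s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
            x = freqs[s] * (x // M) + slot - cum[s]
            out.append(s)
        return out

    print(decode(encode(list("abaa")), 4))     # ['a', 'b', 'a', 'a']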
We propose a new algorithm for efficiently solving linear systems involving the damped Fisher matrix in large-scale settings where the number of parameters significantly exceeds the number of available samples. This problem is fundamental to natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that it is significantly faster than existing methods.
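The paper's algorithm is not reproduced here, but the NumPy/SciPy sketch below shows the low-rank structure such a solver can exploit: with n samples and p >> n parameters, a Woodbury rewrite reduces the damped Fisher solve to a Cholesky factorization of an n-by-n matrix. Sizes, damping, and data are hypothetical.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    n, p, lam = 50, 5000, 1e-3                 # hypothetical sizes and damping
    rng = np.random.default_rng(0)
    G = rng.standard_normal((n, p))            # per-sample gradients, one row each
    b = rng.standard_normal(p)                 # right-hand side

    # Damped Fisher: F = G.T @ G / n + lam * I (p x p, huge when p is large).
    # Woodbury: F^{-1} b = (b - G.T @ (n*lam*I + G G^T)^{-1} G b) / lam,
    # so only the small n x n matrix needs a Cholesky factorization.
    K = G @ G.T + n * lam * np.eye(n)
    x = (b - G.T @ cho_solve(cho_factor(K), G @ b)) / lam

    F = G.T @ G / n + lam * np.eye(p)          # explicit check (feasible at this size)
    print(np.allclose(F @ x, b))               # True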
We study a novel ensemble approach to feature selection based on hierarchical stacking, aimed at settings with non-stationarity and a limited number of samples relative to a large number of features. Our approach exploits the co-dependency between features through a hierarchical structure: a machine learning model is first trained on a subset of features, and its output is then refined by another model that uses the remaining features to minimize the target loss. This hierarchical structure allows for flexible depth and feature selection. By exploiting feature co-dependency hierarchically, the proposed approach overcomes the limitations of traditional feature selection methods and feature importance scores. Its effectiveness is demonstrated on synthetic and real-life datasets, showing improved performance, scalability, and stability compared to traditional methods and state-of-the-art approaches.
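A minimal two-level version of this idea, with illustrative model choices and an arbitrary feature split, might look as follows: the second model is fit on the first model's residuals using the held-out features.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 20))
    y = X[:, 0] + np.sin(X[:, 10]) + 0.1 * rng.standard_normal(500)

    subset, rest = np.arange(10), np.arange(10, 20)   # hypothetical split

    level1 = Ridge().fit(X[:, subset], y)             # first level: feature subset
    resid = y - level1.predict(X[:, subset])          # what level 1 missed
    level2 = GradientBoostingRegressor().fit(X[:, rest], resid)

    y_hat = level1.predict(X[:, subset]) + level2.predict(X[:, rest])
    print(np.mean((y - y_hat) ** 2))                  # in-sample MSE of the stack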
Generalized variational inference (GVI) provides an optimization-theoretic framework for statistical estimation that encapsulates many traditional estimation procedures. The typical GVI problem is to compute a distribution of parameters that maximizes the expected payoff minus the divergence of the distribution from a specified prior. In this way, GVI enables likelihood-free estimation with the ability to control the influence of the prior by tuning the so-called learning rate. Recently, GVI was shown to outperform traditional Bayesian inference when the model and prior distribution are misspecified. In this paper, we introduce and analyze a new GVI formulation based on utility theory and risk management. Our formulation maximizes the expected payoff subject to constraints on the maximizing distribution. We recover the original GVI distribution by choosing the feasible set to include a constraint on the divergence of the distribution from the prior; in doing so, we automatically determine the learning rate as the Lagrange multiplier of the constraint. In this setting, we transform the infinite-dimensional estimation problem into a two-dimensional convex program, and the reformulation further yields an analytic expression for the optimal density of parameters. In addition, we prove asymptotic consistency results for empirical approximations of our optimal distributions. Throughout, we draw connections between our estimation procedure and risk management; in fact, we demonstrate that it is equivalent to evaluating a risk measure. We test our procedure on an estimation problem with a misspecified model and prior distribution, and conclude with some extensions of our approach.
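In generic notation (ours, not the paper's), with payoff u(θ), prior π, and the Kullback-Leibler divergence as an illustrative choice of divergence, the penalized GVI problem and its constrained counterpart read

    \[
    \max_{q}\ \mathbb{E}_{\theta\sim q}[u(\theta)] - \tfrac{1}{\beta}\, D_{\mathrm{KL}}(q \,\|\, \pi)
    \qquad\text{vs.}\qquad
    \max_{q:\, D_{\mathrm{KL}}(q\|\pi)\le \varepsilon}\ \mathbb{E}_{\theta\sim q}[u(\theta)],
    \]

with the familiar Gibbs-form solution \(q^*(\theta)\propto \pi(\theta)\exp(\beta\, u(\theta))\); in the constrained problem the learning rate \(\beta\) emerges as the Lagrange multiplier of the divergence constraint, which is the mechanism described above.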
Obtaining solutions of partial differential equations with machine learning methods has drawn increasing attention in scientific computing and engineering applications. In this work, we first propose a coupled Extreme Learning Machine method (called CELM), incorporating the governing physical laws, to solve a class of fourth-order biharmonic equations by reformulating them as two well-posed Poisson problems. In addition, several activation functions, including tangent, Gaussian, sine, and trigonometric (sin+cos) functions, are introduced to assess the CELM method. Notably, the sine and trigonometric functions demonstrate a remarkable ability to minimize the approximation error of the CELM model. Finally, several numerical experiments are performed to study initialization approaches for the weights and biases of the hidden units in the CELM model and to explore the required number of hidden units. Numerical results show that the proposed CELM algorithm is highly accurate and efficient for solving the biharmonic equation in both regular and irregular domains.
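The reformulation mentioned above is the classical splitting of the biharmonic operator; in generic notation (ours), introducing an auxiliary variable v,

    \[
    \Delta^2 u = f \ \text{in}\ \Omega
    \quad\Longrightarrow\quad
    \Delta v = f \ \text{in}\ \Omega, \qquad \Delta u = v \ \text{in}\ \Omega,
    \]

so each Poisson problem can be approximated by one ELM network; the boundary conditions assigned to u and v, which determine the well-posedness of the split system, follow the paper.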
Approximate computing is a promising approach to reduce the power, delay, and area in hardware design for many error-resilient applications such as machine learning (ML) and digital signal processing (DSP) systems, in which multipliers are usually key arithmetic units. Due to the underlying architectural differences between ASICs and FPGAs, existing ASIC-based approximate multipliers do not offer comparable gains when implemented with FPGA resources. In this paper, we propose AMG, an open-source automated approximate multiplier generator for FPGAs driven by Bayesian optimization (BO) with parallel evaluation. The proposed method simplifies the exact half adders (HAs) for the initial partial product (PP) compression in a multiplier while preserving coarse-grained additions for the subsequent accumulation. The generated multipliers can be effectively mapped to the lookup tables (LUTs) and carry chains provided by modern FPGAs, reducing hardware costs with acceptable errors. Compared with 1167 multipliers from previous works, our generated multipliers form a Pareto front with average improvements of 28.70%-38.47% in the product of hardware cost and error. All source codes, reproduced multipliers, and our generated multipliers are available at https://github.com/phyzhenli/AMG.
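As a small illustration of the evaluation criterion, the sketch below extracts a Pareto front over (hardware cost, error) pairs; the candidate designs are hypothetical placeholders, not generated multipliers.

    # Keep a design if no other design is at least as good in both cost
    # and error and strictly better in at least one of them.
    designs = [("m1", 120.0, 0.02), ("m2", 90.0, 0.05),
               ("m3", 150.0, 0.01), ("m4", 95.0, 0.04), ("m5", 130.0, 0.03)]

    def pareto(points):
        front = []
        for name, cost, err in points:
            dominated = any(c <= cost and e <= err and (c, e) != (cost, err)
                            for _, c, e in points)
            if not dominated:
                front.append((name, cost, err))
        return sorted(front, key=lambda t: t[1])

    print(pareto(designs))   # [('m2', 90.0, 0.05), ('m4', 95.0, 0.04), ...]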
Learning network dynamics from empirical structure and spatio-temporal observation data is crucial for revealing the interaction mechanisms of complex networks across a wide range of domains. However, most existing methods only learn the dynamic behaviors generated by a specific ordinary differential equation instance, making them ineffective for new instances, and they generally require dense observations. Observation data, especially for emerging network dynamics, are usually difficult to obtain, which hampers model learning. How to learn accurate network dynamics from sparse, irregularly sampled, partial, and noisy observations therefore remains a fundamental challenge. We introduce Neural ODE Processes for Network Dynamics (NDP4ND), a new class of stochastic processes governed by stochastic, data-adaptive network dynamics, to overcome this challenge and learn continuous network dynamics from scarce observations. Extensive experiments on various network dynamics, including ecological population evolution, phototaxis movement, brain activity, epidemic spreading, and real-world empirical systems, demonstrate that the proposed method has excellent data adaptability and computational efficiency and can adapt to unseen emerging network dynamics, producing accurate interpolation and extrapolation while reducing the ratio of required observation data to only about 6% and improving the learning speed for new dynamics by three orders of magnitude.
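For readers new to the setting, the toy sketch below shows what "network dynamics as an ODE" means: node states evolve under a self term plus neighbor coupling through the adjacency matrix. In a neural-ODE treatment the two terms would be small neural networks fit to observations; here they are fixed toy functions, and nothing of NDP4ND's stochastic-process machinery is modeled.

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)     # hypothetical 3-node graph

    def dxdt(t, x):
        self_term = -x                          # node-wise decay
        coupling = A @ np.tanh(x)               # influence from neighbors
        return self_term + coupling

    sol = solve_ivp(dxdt, (0.0, 5.0), y0=[1.0, -0.5, 0.2],
                    t_eval=np.linspace(0.0, 5.0, 11))
    print(sol.y[:, -1])                         # node states at t = 5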
Propositional product logic is one of the basic fuzzy logics with continuous t-norms, employing the product (multiplication) t-norm on the unit interval [0,1]. Our aim is to combine well-established automated deduction (theorem proving) with fuzzy inference. As a first step, we devise a modification of the Davis-Putnam-Logemann-Loveland (DPLL) procedure with dichotomous branching that performs inference in product logic. We prove that the procedure is refutation sound and finitely complete. As a consequence, solutions to the deduction, satisfiability, and validity problems are proposed for the finite case. The presented results are applicable to the design of intelligent systems that exploit some kind of multi-step fuzzy inference.
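For reference, product logic interprets conjunction by the product t-norm and implication by its residuum (the Goguen implication) on [0,1]:

    \[
    x \otimes y = x \cdot y,
    \qquad
    x \Rightarrow y =
    \begin{cases}
    1 & \text{if } x \le y,\\
    y/x & \text{otherwise.}
    \end{cases}
    \]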
The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the training data. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, remains a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches and, ultimately, to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions, even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
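As a concrete baseline for the schemes such surveys cover, the following sketches uniform affine (asymmetric) quantization to b bits; per-channel scales, symmetric variants, and learned ranges are refinements on top of this.

    import numpy as np

    def quantize(x, b=4):
        # Map real values to b-bit integers via a scale and zero point.
        qmin, qmax = 0, 2 ** b - 1
        scale = (x.max() - x.min()) / (qmax - qmin)   # real range per int step
        zero_point = int(round(-x.min() / scale))     # integer representing 0.0
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    x = np.array([-1.0, -0.1, 0.0, 0.4, 1.5], dtype=np.float32)
    q, s, z = quantize(x)
    print(q, dequantize(q, s, z))   # 4-bit codes and their reconstruction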
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy in scenarios where objects of varied sizes appear in high-resolution images. Detection proceeds in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher-resolution regions identified as likely to improve detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain of analyzing a region at higher resolution, and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high-resolution test set collected from the YFCC100M dataset, where it maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.
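Schematically, the policy reduces to a budgeted selection loop like the one below; the region costs and predicted gains are hypothetical stand-ins for what R-net estimates and Q-net selects, and detection itself is omitted.

    regions = [  # (name, pixel cost, predicted accuracy gain) - hypothetical
        ("top-left", 250_000, 0.08),
        ("center",   400_000, 0.15),
        ("bottom",   300_000, 0.02),
    ]
    pixel_budget = 700_000        # pixels we may process at full resolution

    selected, spent = [], 0
    # Greedily zoom into regions with the highest predicted gain while
    # the pixel budget allows it.
    for name, cost, gain in sorted(regions, key=lambda r: -r[2]):
        if spent + cost <= pixel_budget:
            selected.append(name)
            spent += cost

    print(selected, spent)        # ['center', 'top-left'] 650000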