There is an intimate connection between numerical upscaling of multiscale PDEs and scattered data approximation of heterogeneous functions: the coarse variables selected for deriving an upscaled equation (in the former) correspond to the sampled information used for approximation (in the latter). As such, both problems can be thought of as recovering a target function from coarse data that are either chosen by an upscaling algorithm or determined by a physical measurement process. The purpose of this paper is to study, under such a setup and for a specific elliptic problem, how the lengthscale of the coarse data, which we refer to as the subsampled lengthscale, influences the accuracy of recovery under limited computational budgets. Our analysis and experiments show that reducing the subsampled lengthscale may improve the accuracy, yielding a guiding criterion for coarse-graining or data acquisition in this computationally constrained scenario and, in particular, direct insights for the implementation of the Gamblets method in the numerical homogenization literature. Moreover, reducing the lengthscale to zero may lead to a blow-up of the approximation error if the target function lacks sufficient regularity, suggesting the need for a stronger prior assumption on the target function. We introduce a singular weight function to address this issue, both theoretically and numerically. This work sheds light on the interplay among the lengthscale of the coarse data, the computational cost, the regularity of the target function, and the accuracy of approximations and numerical simulations.
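While the abstract does not spell out the model problem, a prototypical elliptic equation in this numerical-homogenization setting is
\[ -\nabla \cdot \big( a(x)\, \nabla u(x) \big) = f(x), \]
with a rough, heterogeneous coefficient $a$; the coarse data then amount to sampled linear measurements of the solution $u$, taken at the subsampled lengthscale.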
In this paper we develop efficient first-order algorithms for the generalized trust-region subproblem (GTRS), which has applications in signal processing, compressed sensing, and engineering. Although the GTRS is in general nonlinear and nonconvex, it is well known that objective value exactness holds for its SDP relaxation under a Slater condition. While polynomial-time SDP-based algorithms exist for the GTRS, their relatively high computational cost has motivated the development of custom approaches for solving the GTRS. In particular, recent work in this direction has developed first-order methods for the GTRS whose running times are linear in the sparsity (the number of nonzero entries) of the input data. In contrast to these algorithms, in this paper we develop algorithms for computing $\epsilon$-approximate solutions to the GTRS whose running times are linear in both the input sparsity and $\log(1/\epsilon)$ whenever a regularity parameter is positive. We complement our theoretical guarantees with numerical experiments comparing our approach against algorithms from the literature; these experiments highlight that our new algorithms significantly outperform prior state-of-the-art algorithms on sparse large-scale instances.
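For reference, a standard formulation of the GTRS (the exact sign and constraint conventions in the paper may differ) is
\[ \min_{x \in \mathbb{R}^n} \; x^\top A_0 x + 2 b_0^\top x + c_0 \quad \text{subject to} \quad x^\top A_1 x + 2 b_1^\top x + c_1 \le 0, \]
where the symmetric matrices $A_0$ and $A_1$ need not be positive semidefinite, which is the source of the nonconvexity.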
Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increases the complexity of implementation. This work seeks to fulfill the potential of the loss function in MLML without adding to the training procedure or its complexity. To this end, we propose two simple yet effective methods based on robust loss design, motivated by the observation that a model can identify missing labels during training with high precision. The first is a novel robust loss for negatives, the Hill loss, which re-weights negatives in the shape of a hill to alleviate the effect of false negatives. The second is a self-paced loss correction (SPLC) method, which uses a loss derived from the maximum likelihood criterion under an approximate distribution of missing labels. Comprehensive experiments on a wide range of multi-label image classification datasets demonstrate that our methods remarkably boost performance in MLML and establish a new state of the art among loss functions for MLML.
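As a rough illustration of the hill-shaped re-weighting idea, a minimal PyTorch sketch is given below; the exact loss form, the constant lam, and the pairing with the positive term are assumptions for illustration rather than the paper's definition.

```python
import torch

def hill_mlml_loss(logits, targets, lam=1.5):
    # Hedged sketch: a hill-shaped penalty on negatives for MLML.
    # With lam = 1.5, the gradient of (lam - p) * p**2 with respect to p is
    # 3 * p * (1 - p), a hill-shaped weight that damps updates both for easy
    # negatives (p near 0) and for confidently predicted "negatives" that are
    # likely missing positives (p near 1).
    p = torch.sigmoid(logits)
    neg_loss = (lam - p) * p.pow(2)            # hill-weighted penalty on negatives
    pos_loss = -torch.log(p.clamp_min(1e-8))   # standard BCE term on positives
    return (targets * pos_loss + (1.0 - targets) * neg_loss).mean()
```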
The use of mathematical models to make predictions about tumor growth and response to treatment has become increasingly prevalent in the clinical setting. The level of complexity within these models ranges broadly, and the calibration of more complex models correspondingly requires more detailed clinical data. This raises questions about how much data should be collected, and when, in order to minimize the total amount of data used and the time until a model can be calibrated accurately. To address these questions, we propose a Bayesian information-theoretic procedure that uses a gradient-based score function to determine the optimal data collection times for model calibration. The novel score function introduced in this work eliminates the need for a weight parameter used in a previous study's score function, while still yielding accurate and efficient model calibration using even fewer scans on a sample set of synthetic data simulating tumors with varying levels of radiosensitivity. We also conduct a robust analysis of the calibration accuracy and certainty, using both error and uncertainty metrics. Unlike the error analysis of the previous study, the inclusion of uncertainty analysis in this work, as a means for deciding when the algorithm can be terminated, provides a more realistic option for clinical decision-making, since it does not rely on data that will be collected later in time.
Backtracking search algorithms are often used to solve the Constraint Satisfaction Problem (CSP). The efficiency of backtracking search depends greatly on the variable ordering heuristic. Currently, the most commonly used heuristics are hand-crafted based on expert knowledge. In this paper, we propose a deep reinforcement learning based approach to automatically discover new variable ordering heuristics that are better adapted to a given class of CSP instances. We show that directly optimizing the search cost is hard for bootstrapping, and instead propose to optimize the expected cost of reaching a leaf node in the search tree. To capture the complex relations among the variables and constraints, we design a representation scheme based on a graph neural network that can process CSP instances of different sizes and constraint arities. Experimental results on random CSP instances show that the learned policies outperform classical hand-crafted heuristics in terms of minimizing the search tree size, and can effectively generalize to instances larger than those used in training.
We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results that match those in the i.i.d. setting up to a multiplicative factor that is proportional to the mixing time of the chain. Our theoretical guarantees of optimality are corroborated by numerical experiments.
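For context on the object being analyzed, the following is a minimal sketch of plain temporal difference learning with linear function approximation under i.i.d. observations; it is not the paper's VRFTD algorithm, and all names and parameter values are illustrative.

```python
import numpy as np

def td0_linear(features, rewards, next_features, gamma=0.9, step=0.05):
    # Hedged sketch: TD(0) with a linear value function V(s) = phi(s)^T theta.
    # Each tuple (phi, r, phi_next) is an i.i.d. observed transition.
    theta = np.zeros(features.shape[1])
    for phi, r, phi_next in zip(features, rewards, next_features):
        td_error = r + gamma * phi_next @ theta - phi @ theta
        theta += step * td_error * phi   # stochastic semi-gradient update
    return theta
```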
Many practical engineering systems and their components have multiple performance levels and failure modes. If such a system has a monotonically increasing structure function (system model) with respect to the performance of its components, and if every component affects the overall system performance, then it is said to be a multistate coherent system. Traditionally, the reliability analysis of multistate coherent systems has been carried out using paper-and-pencil or simulation-based methods. The former is prone to human error, while the latter requires substantial computational resources for large and complex systems whose components have multiple operational states. As a complementary approach, in this paper we propose to use higher-order-logic (HOL) theorem proving to develop a sound reasoning framework for analyzing the reliability of multistate coherent systems. This framework allows us to formally verify generic mathematical properties of multistate coherent systems with an arbitrary number of components and states. In particular, we present the HOL formalization of series and parallel multistate coherent systems and formally verify their deterministic and probabilistic properties using the HOL4 theorem prover. For illustration, we present the formal reliability analysis of a multistate oil and gas pipeline to demonstrate the effectiveness of our proposed framework.
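For orientation, the standard multistate structure functions for series and parallel configurations, with component state vector $x = (x_1, \dots, x_n)$, are
\[ \phi_{\text{series}}(x) = \min_{1 \le i \le n} x_i, \qquad \phi_{\text{parallel}}(x) = \max_{1 \le i \le n} x_i; \]
these are the textbook forms that the formalization presumably mirrors, stated here only as background rather than as the paper's exact definitions.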
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of the network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
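For context, the link alluded to above identifies the residual update with an explicit Euler step,
\[ h_{k+1} = h_k + \delta_L \, f(h_k, \theta_k), \qquad k = 0, \dots, L - 1, \]
and an ODE description $\dot h(t) = f(h(t), \theta(t))$ emerges only if the trained weights behave like samples $\theta_k \approx \theta(k/L)$ of a smooth function and the increments scale like $\delta_L \sim 1/L$ as the depth $L$ grows; the scaling regimes studied here probe exactly these assumptions.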
Network embedding has attracted considerable research attention recently. However, existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to accelerate with distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure that avoids the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient and amenable to distributed computing schemes, incurring no communication cost during the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets of different scales, ranging from thousands to billions of nodes and edges.
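A minimal sketch of the Gaussian-projection idea follows; the proximity orders, weights, and normalization are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def randne_embed(adj, dim=128, weights=(1.0, 1.0, 0.5, 0.25), seed=0):
    # Hedged sketch: embed nodes via a Gaussian random projection of weighted
    # powers of the adjacency matrix, accumulated iteratively so that no
    # high-order proximity matrix is ever formed explicitly.
    # `adj` may be a NumPy array or a scipy.sparse matrix of shape (n, n).
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    R = rng.standard_normal((n, dim)) / np.sqrt(dim)   # Gaussian projection matrix
    U, emb = R, weights[0] * R
    for w in weights[1:]:
        U = adj @ U           # apply one more power of the adjacency to the projection
        emb = emb + w * U     # accumulate the weighted high-order term
    return emb
```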
Many problems in signal processing reduce to nonparametric function estimation. We propose a new methodology, piecewise convex fitting (PCF), and give a two-stage adaptive estimate. In the first stage, the number and locations of the change points are estimated using strong smoothing. In the second stage, a constrained smoothing spline fit is performed, with the smoothing level chosen to minimize the MSE. The imposed constraint is that a single change point occurs in a region about each empirical change point of the first-stage estimate. This constraint is equivalent to requiring that the third derivative of the second-stage estimate have a single sign in a small neighborhood about each first-stage change point. We sketch how PCF may be applied to signal recovery, instantaneous frequency estimation, surface reconstruction, image segmentation, spectral estimation, and multivariate adaptive regression.
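As a rough illustration of the first stage only, the sketch below fits a heavily smoothed spline and reads off empirical change points from sign changes of its third derivative; the spline order, smoothing level, and change-point rule are assumptions for illustration, not the paper's estimator.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def first_stage_change_points(x, y, strong_smooth=None):
    # Hedged sketch of a first-stage change-point estimate via strong smoothing.
    # x is assumed sorted and strictly increasing.
    s = strong_smooth if strong_smooth is not None else 5.0 * len(x)
    spline = UnivariateSpline(x, y, k=5, s=s)     # quintic spline so the 3rd derivative is smooth
    d3 = spline.derivative(3)(x)                  # third derivative on the grid
    idx = np.where(np.diff(np.sign(d3)) != 0)[0]  # grid points where d3 changes sign
    return x[idx]
```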
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
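As background, here is a bare-bones randomized range-finder in the spirit of these sketching methods; note that this simplified version revisits the input matrix when forming $Q^\top A$, whereas the methods described above reconstruct the approximation from the sketch alone, and the oversampling value is an illustrative assumption.

```python
import numpy as np

def sketchy_low_rank(A, rank, oversample=10, seed=0):
    # Hedged sketch: classical randomized range-finder for a rank-`rank`
    # approximation A ~ U @ diag(s) @ Vt built from a random linear image of A.
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ Omega                       # the sketch: a random linear image of A
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the captured range
    B = Q.T @ A                         # project A onto that range (revisits A)
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]
```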