Iterative random sketching (IRS) offers a computationally expedient approach to solving linear systems. However, IRS' benefits can only be realized if the procedure can be appropriately tracked and stopped -- otherwise, the algorithm may stop before the desired accuracy is achieved, or it may run longer than necessary. Unfortunately, IRS solvers cannot access the residual norm without undermining their computational efficiency. While iterative random sketching solvers have access to noisy estimates of the residual, such estimates turn out to be insufficient to generate accurate estimates and confidence bounds for the true residual. Thus, in this work, we propose a moving average estimator for the system's residual, and rigorously develop practical uncertainty sets for our estimator. We then demonstrate the accuracy of our methods on a number of linear systems problems.
Neural networks have had discernible achievements in a wide range of applications. The wide-spread adoption also raises the concern of their dependability and reliability. Similar to traditional decision-making programs, neural networks can have defects that need to be repaired. The defects may cause unsafe behaviors, raise security concerns or unjust societal impacts. In this work, we address the problem of repairing a neural network for desirable properties such as fairness and the absence of backdoor. The goal is to construct a neural network that satisfies the property by (minimally) adjusting the given neural network's parameters (i.e., weights). Specifically, we propose CARE (\textbf{CA}usality-based \textbf{RE}pair), a causality-based neural network repair technique that 1) performs causality-based fault localization to identify the `guilty' neurons and 2) optimizes the parameters of the identified neurons to reduce the misbehavior. We have empirically evaluated CARE on various tasks such as backdoor removal, neural network repair for fairness and safety properties. Our experiment results show that CARE is able to repair all neural networks efficiently and effectively. For fairness repair tasks, CARE successfully improves fairness by $61.91\%$ on average. For backdoor removal tasks, CARE reduces the attack success rate from over $98\%$ to less than $1\%$. For safety property repair tasks, CARE reduces the property violation rate to less than $1\%$. Results also show that thanks to the causality-based fault localization, CARE's repair focuses on the misbehavior and preserves the accuracy of the neural networks.
Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three ones tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.
Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
Many recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms by encouraging iterative refinements toward a stable flow estimation. However, these RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation. They can converge poorly and thereby suffer from performance degradation. To combat these drawbacks, we propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer (using any black-box solver), and differentiates through this fixed point analytically (thus requiring $O(1)$ training memory). This implicit-depth approach is not predicated on any specific model, and thus can be applied to a wide range of SOTA flow estimation model designs. The use of these DEQ flow estimators allows us to compute the flow faster using, e.g., fixed-point reuse and inexact gradients, consumes $4\sim6\times$ times less training memory than the recurrent counterpart, and achieves better results with the same computation budget. In addition, we propose a novel, sparse fixed-point correction scheme to stabilize our DEQ flow estimators, which addresses a longstanding challenge for DEQ models in general. We test our approach in various realistic settings and show that it improves SOTA methods on Sintel and KITTI datasets with substantially better computational and memory efficiency.
On-demand delivery has become increasingly popular around the world. Motivated by a large grocery chain store who offers fast on-demand delivery services, we model and solve a stochastic dynamic driver dispatching and routing problem for last-mile delivery systems where on-time performance is the main target. The system operator needs to dispatch a set of drivers and specify their delivery routes facing random demand that arrives over a fixed number of periods. The resulting stochastic dynamic program is challenging to solve due to the curse of dimensionality. We propose a novel structured approximation framework to approximate the value function via a parametrized dispatching and routing policy. We analyze the structural properties of the approximation framework and establish its performance guarantee under large-demand scenarios. We then develop efficient exact algorithms for the approximation problem based on Benders decomposition and column generation, which deliver verifiably optimal solutions within minutes. The evaluation results on a real-world data set show that our framework outperforms the current policy of the company by 36.53% on average in terms of delivery time. We also perform several policy experiments to understand the value of dynamic dispatching and routing with varying fleet sizes and dispatch frequencies.
The fact that the millimeter-wave (mmWave) multiple-input multiple-output (MIMO) channel has sparse support in the spatial domain has motivated recent compressed sensing (CS)-based mmWave channel estimation methods, where the angles of arrivals (AoAs) and angles of departures (AoDs) are quantized using angle dictionary matrices. However, the existing CS-based methods usually obtain the estimation result through one-stage channel sounding that have two limitations: (i) the requirement of large-dimensional dictionary and (ii) unresolvable quantization error. These two drawbacks are irreconcilable; improvement of the one implies deterioration of the other. To address these challenges, we propose, in this paper, a two-stage method to estimate the AoAs and AoDs of mmWave channels. In the proposed method, the channel estimation task is divided into two stages, Stage I and Stage II. Specifically, in Stage I, the AoAs are estimated by solving a multiple measurement vectors (MMV) problem. In Stage II, based on the estimated AoAs, the receive sounders are designed to estimate AoDs. The dimension of the angle dictionary in each stage can be reduced, which in turn reduces the computational complexity substantially. We then analyze the successful recovery probability (SRP) of the proposed method, revealing the superiority of the proposed framework over the existing one-stage CS-based methods. We further enhance the reconstruction performance by performing resource allocation between the two stages. We also overcome the unresolvable quantization error issue present in the prior techniques by applying the atomic norm minimization method to each stage of the proposed two-stage approach. The simulation results illustrate the substantially improved performance with low complexity of the proposed two-stage method.
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging applications in mind, in this work we present the proximal nested sampling methodology to objectively compare alternative Bayesian imaging models for applications that use images to inform decisions under uncertainty. The methodology is based on nested sampling, a Monte Carlo approach specialised for model comparison, and exploits proximal Markov chain Monte Carlo techniques to scale efficiently to large problems and to tackle models that are log-concave and not necessarily smooth (e.g., involving l_1 or total-variation priors). The proposed approach can be applied computationally to problems of dimension O(10^6) and beyond, making it suitable for high-dimensional inverse imaging problems. It is validated on large Gaussian models, for which the likelihood is available analytically, and subsequently illustrated on a range of imaging problems where it is used to analyse different choices of dictionary and measurement model.
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
We propose a First-Order System Least Squares (FOSLS) method based on deep-learning for numerically solving second-order elliptic PDEs. The method we propose is capable of dealing with either variational and non-variational problems, and because of its meshless nature, it can also deal with problems posed in high-dimensional domains. We prove the $\Gamma$-convergence of the neural network approximation towards the solution of the continuous problem, and extend the convergence proof to some well-known related methods. Finally, we present several numerical examples illustrating the performance of our discretization.
Proactive dialogue system is able to lead the conversation to a goal topic and has advantaged potential in bargain, persuasion and negotiation. Current corpus-based learning manner limits its practical application in real-world scenarios. To this end, we contribute to advance the study of the proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to the non-cooperative user behavior -- the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining a high user satisfaction are not always converge, because the topics close to the goal and the topics user preferred may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.