Solving optimization problems with transient PDE constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT matrices, which scale with the size of the spatial discretization times the number of time steps. We propose a new two-level domain decomposition preconditioner to solve these linear systems when they are constrained by the heat equation. Our approach leverages the observation that the Schur complement is elliptic in time and thus amenable to classical domain decomposition methods. Further, the application of the preconditioner uses existing time integration routines to facilitate implementation and maximize software reuse. The performance of the preconditioner is examined in an empirical study demonstrating that the approach is scalable with respect to the number of time steps and subdomains.
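As a minimal illustration of the structure involved (a sketch with assumed discretizations and placeholder sizes, not the proposed preconditioner), the following NumPy/SciPy snippet assembles the all-at-once KKT system for a small 1-D heat-equation control problem with implicit Euler time stepping; its size is the spatial dimension times the number of time steps per block, as noted above.

```python
import numpy as np
import scipy.sparse as sp

# Toy 1-D heat-equation control problem, discretized all-at-once:
#   min 0.5*||y - y_d||^2 + 0.5*alpha*||u||^2   s.t.   C y = B u + f
# with C block-bidiagonal from implicit Euler (hypothetical sizes and values).
nx, nt, dt, alpha = 50, 20, 1e-2, 1e-4
Lap = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(nx, nx)) * (nx + 1) ** 2  # 1-D Laplacian
A_step = sp.identity(nx) + dt * Lap                                      # implicit Euler step

# All-at-once constraint: C = I_nt (x) A_step - (lower shift) (x) I_nx
C = sp.kron(sp.identity(nt), A_step) - sp.kron(sp.diags([1], [-1], shape=(nt, nt)), sp.identity(nx))
B = dt * sp.identity(nx * nt)         # distributed control acting on every step (assumption)

# KKT system in (state y, control u, adjoint p); each block has nx*nt rows.
KKT = sp.bmat([[sp.identity(nx * nt), None,                          C.T],
               [None,                 alpha * sp.identity(nx * nt), -B.T],
               [C,                    -B,                            None]], format="csr")
print("KKT size:", KKT.shape)

# Eliminating state and adjoint gives the Schur complement (reduced Hessian)
#   H = B^T C^{-T} C^{-1} B + alpha*I,
# whose application amounts to one forward and one backward time sweep.
```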
We consider the Biot model with block preconditioners and generalized eigenvalue problems for scalability and robustness to parameters. A discontinuous Galerkin discretization is employed, with the displacement and Darcy flux discretized by piecewise continuous $P_1$ elements and the pore pressure by piecewise constant $P_0$ elements with a stabilizing term. Parallel algorithms are designed to solve the resulting linear system. Specifically, the GMRES method is employed as the outer iteration, and block-triangular preconditioners are designed to accelerate convergence. Within the preconditioners, the elliptic operators are further approximated by incomplete Cholesky factorization or by a two-level additive overlapping Schwarz method whose coarse space is constructed from generalized eigenvalue problems in the overlaps (GenEO). Extensive numerical experiments show the scalability and parametric robustness of the resulting parallel algorithms.
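For readers unfamiliar with the outer Krylov setup, the sketch below runs GMRES with a block upper-triangular preconditioner on a generic two-by-two block saddle-point system. The toy matrices, the diagonal Schur-complement surrogate, and the exact inner factorizations are placeholders standing in for the discretized Biot system and for the incomplete Cholesky or GenEO-enriched Schwarz approximations described above.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy saddle-point block system  [[A, B^T], [B, -C]] [u; p] = rhs
rng = np.random.default_rng(0)
n, m = 200, 80
A = sp.diags([4.0], [0], shape=(n, n)) + sp.random(n, n, density=0.01, random_state=rng)
A = (A + A.T) * 0.5 + 4 * sp.identity(n)            # symmetric positive definite block
B = sp.random(m, n, density=0.02, random_state=rng)
C = sp.identity(m) * 1e-2
K = sp.bmat([[A, B.T], [B, -C]], format="csc")
rhs = rng.standard_normal(n + m)

# Block upper-triangular preconditioner P = [[A, B^T], [0, -S_hat]],
# with S_hat a cheap approximation of the Schur complement C + B A^{-1} B^T.
A_lu = spla.splu(A.tocsc())                          # exact solve here; ICC/Schwarz in practice
S_hat = (C + sp.diags((B @ B.T).diagonal() / A.diagonal().mean())).tocsc()
S_lu = spla.splu(S_hat)

def apply_prec(r):
    r1, r2 = r[:n], r[n:]
    p = -S_lu.solve(r2)                              # backward block substitution
    u = A_lu.solve(r1 - B.T @ p)
    return np.concatenate([u, p])

M = spla.LinearOperator((n + m, n + m), matvec=apply_prec)
x, info = spla.gmres(K, rhs, M=M)
print("GMRES info:", info, " residual:", np.linalg.norm(K @ x - rhs))
```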
In the automotive industry, the sequence of vehicles to be produced is determined ahead of the production day. However, some vehicles, referred to as failed vehicles, cannot be produced due to reasons such as material shortages or paint defects. These vehicles are pulled out of the sequence, and the vehicles in the succeeding positions are moved forward, potentially creating challenges for logistics or other scheduling concerns. This paper proposes a two-stage stochastic program for the mixed-model sequencing (MMS) problem with stochastic product failures and provides improvements to the second-stage problem. To tackle the exponential number of scenarios, we employ the sample average approximation approach and two solution methodologies. On the one hand, we develop an L-shaped decomposition-based algorithm; computational experiments show its superiority over solving the deterministic equivalent formulation with an off-the-shelf solver. On the other hand, we provide a tabu search algorithm, in addition to a greedy heuristic, to tackle case-study instances inspired by our car manufacturer partner. Numerical experiments show that the proposed solution methodologies generate high-quality solutions by utilizing a sample of scenarios. In particular, a robust sequence that is generated by considering car failures can decrease the expected work overload by more than 20\% for both small- and large-sized instances.
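The following toy sketch illustrates the sample-average idea for the second stage: given a fixed sequence, failure scenarios are sampled, failed cars are pulled out with successors moving forward, and single-station work overload is computed with a standard car-sequencing workload recursion. All processing times, failure probabilities, and station parameters below are hypothetical placeholders, not the case-study data or the paper's exact model.

```python
import numpy as np

# Sample-average estimate of expected work overload for a fixed sequence.
rng = np.random.default_rng(1)
n_cars, cycle, station_len = 40, 1.0, 1.5
proc_time = rng.choice([0.8, 1.3], size=n_cars, p=[0.6, 0.4])   # two model variants
fail_prob = np.full(n_cars, 0.05)                               # independent failures

def work_overload(seq_proc):
    """Standard single-station overload recursion for a launched sequence."""
    z, overload = 0.0, 0.0
    for p in seq_proc:
        finish = z + p
        overload += max(0.0, finish - station_len)       # work absorbed by a utility worker
        z = max(0.0, min(finish, station_len) - cycle)   # next starting position
    return overload

def saa_expected_overload(sequence, n_scenarios=1000):
    """Failed cars are pulled out and the remaining cars move forward,
    mimicking the second-stage evaluation under sampled scenarios."""
    total = 0.0
    for _ in range(n_scenarios):
        produced = rng.random(n_cars) >= fail_prob[sequence]
        total += work_overload(proc_time[sequence][produced])
    return total / n_scenarios

base_seq = np.arange(n_cars)
print("estimated expected overload:", saa_expected_overload(base_seq))
```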
The numerical solution of differential equations using machine learning-based approaches has gained significant popularity. Neural network-based discretization has emerged as a powerful tool for solving differential equations by parameterizing a set of functions. Various approaches, such as the deep Ritz method and physics-informed neural networks, have been developed for numerical solutions. Training algorithms, including gradient descent and greedy algorithms, have been proposed to solve the resulting optimization problems. In this paper, we focus on the variational formulation of the problem and propose a Gauss-Newton method for computing the numerical solution. We provide a comprehensive analysis of the superlinear convergence properties of this method, along with a discussion of semi-regular zeros of the vanishing gradient. Numerical examples are presented to demonstrate the efficiency of the proposed Gauss-Newton method.
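As a generic illustration of the update in question (not the paper's formulation or convergence analysis), the sketch below runs a Gauss-Newton iteration for fitting a tiny shallow tanh network to a target function in the least-squares sense; a small damping term is added purely for numerical safety.

```python
import numpy as np

# Gauss-Newton for fitting u(x) = sum_i c_i * tanh(a_i x + b_i) to data.
rng = np.random.default_rng(0)
m, x = 8, np.linspace(-1, 1, 200)            # hidden width, collocation points
target = np.sin(np.pi * x)                    # function to approximate
theta = 0.5 * rng.standard_normal(3 * m)      # parameters [a, b, c]

def residual_and_jacobian(theta):
    a, b, c = theta[:m], theta[m:2*m], theta[2*m:]
    t = np.tanh(np.outer(x, a) + b)           # (n_points, m)
    dt = 1.0 - t**2
    r = t @ c - target                         # residual vector
    J = np.hstack([dt * c * x[:, None],        # d r / d a
                   dt * c,                     # d r / d b
                   t])                         # d r / d c
    return r, J

for it in range(50):
    r, J = residual_and_jacobian(theta)
    # Gauss-Newton step: solve (J^T J + mu I) delta = -J^T r with tiny mu
    delta = np.linalg.solve(J.T @ J + 1e-8 * np.eye(3 * m), -J.T @ r)
    theta += delta
    if np.linalg.norm(delta) < 1e-10:
        break

r, _ = residual_and_jacobian(theta)
print("final residual norm:", np.linalg.norm(r))
```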
Given sparse observations of buoy velocities, oceanographers are interested in reconstructing ocean currents away from the buoys and identifying divergences in a current vector field. As a first and modular step, we focus on the time-stationary case, for instance by restricting to short time periods. Since we expect current velocity to be a continuous but highly nonlinear function of spatial location, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current reconstruction and divergence identification, due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we instead propose to put a standard stationary kernel on the divergence-free and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field only via mixed partial derivatives, we can still perform inference given the original data with just a small constant multiple of additional computational expense. We illustrate the benefits of our method with theory and experiments on synthetic and real ocean data.
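A sketch of the kernel construction in two dimensions is given below: squared-exponential kernels are placed on a scalar potential (whose gradient is the curl-free part) and a stream function (whose rotated gradient is the divergence-free part), and the covariance of the velocity field follows from mixed partial derivatives of those scalar kernels. The hyperparameters, toy field, and noise levels are assumptions for illustration, not the authors' code.

```python
import numpy as np

def rbf_hessian_blocks(X1, X2, ell, sigma2):
    """H[i, j] = d^2 k(x, x') / dx_i dx'_j for an RBF kernel, shape (2, 2, n1, n2)."""
    d = X1[:, None, :] - X2[None, :, :]                     # (n1, n2, 2)
    k = sigma2 * np.exp(-np.sum(d**2, axis=-1) / (2 * ell**2))
    H = np.empty((2, 2) + k.shape)
    for i in range(2):
        for j in range(2):
            H[i, j] = k * ((i == j) / ell**2 - d[..., i] * d[..., j] / ell**4)
    return H

def helmholtz_kernel(X1, X2, ell_phi=1.0, ell_psi=1.0, s_phi=1.0, s_psi=1.0):
    """(2*n1) x (2*n2) covariance of F = grad(Phi) + rot(Psi)."""
    Hphi = rbf_hessian_blocks(X1, X2, ell_phi, s_phi)       # curl-free contribution
    Hpsi = rbf_hessian_blocks(X1, X2, ell_psi, s_psi)
    R = np.array([[0.0, 1.0], [-1.0, 0.0]])
    Hrot = np.einsum("ab,bcnm,dc->adnm", R, Hpsi, R)        # R H R^T, divergence-free part
    K = Hphi + Hrot
    # assemble into a matrix ordered as (u_1..u_n, v_1..v_n)
    return np.block([[K[0, 0], K[0, 1]], [K[1, 0], K[1, 1]]])

# GP regression on stacked velocity observations (toy buoy data)
rng = np.random.default_rng(0)
Xtr = rng.uniform(-2, 2, size=(30, 2))                      # buoy locations
vel = np.column_stack([-Xtr[:, 1], Xtr[:, 0]])              # a divergence-free toy current
y = vel.T.ravel() + 0.01 * rng.standard_normal(60)
Ktr = helmholtz_kernel(Xtr, Xtr) + 1e-4 * np.eye(60)        # add observation noise variance
alpha = np.linalg.solve(Ktr, y)
Xte = np.array([[0.5, 0.5]])
pred = helmholtz_kernel(Xte, Xtr) @ alpha                   # posterior mean (u, v)
print("predicted current at (0.5, 0.5):", pred)
```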
In this paper, we are concerned with efficiently solving the sequences of regularized linear least squares problems that arise when employing Tikhonov-type regularization with regularization operators designed to enforce edge recovery. An optimal regularization parameter, which balances fidelity to the data against the edge-enforcing constraint term, is typically not known a priori. This adds to the total number of regularized linear least squares problems that must be solved before the final image can be recovered. Therefore, in this paper, we determine effective multigrid preconditioners for these sequences of systems. We focus our approach on the sequences that arise from the edge-preserving method introduced in [6], where we can exploit an interpretation of the regularization term as a diffusion operator; however, our methods are also applicable in other edge-preserving settings, such as iteratively reweighted least squares problems. Particular attention is paid to the selection of components of the multigrid preconditioner in order to achieve robustness across different ranges of the regularization parameter. In addition, we present a parameter culling approach that, when used with the L-curve heuristic, reduces the total number of solves required. We demonstrate our preconditioning and parameter culling routines on examples in computed tomography and image deblurring.
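The compact sketch below shows the kind of solve sequence and L-curve quantities involved: a grid of regularization parameters, one regularized least-squares solve per parameter, and the resulting (residual norm, regularization norm) pairs. A toy 1-D blurring operator, a first-difference operator, and plain unpreconditioned CG on the normal equations stand in for the actual edge-enforcing operator and the multigrid-preconditioned solver.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sequence of Tikhonov-type solves  min_x ||A x - b||^2 + lam * ||L x||^2
# over a grid of regularization parameters, collecting L-curve points.
n = 200
A = sp.diags([0.25, 0.5, 0.25], [-1, 0, 1], shape=(n, n), format="csr")   # toy blur
L = sp.diags([-1.0, 1.0], [0, 1], shape=(n - 1, n), format="csr")         # first differences

x_true = np.zeros(n); x_true[60:140] = 1.0                 # piecewise-constant "image"
rng = np.random.default_rng(0)
b = A @ x_true + 1e-3 * rng.standard_normal(n)

lcurve = []
for lam in np.logspace(-6, 0, 13):
    normal_op = (A.T @ A + lam * (L.T @ L)).tocsr()
    x, _ = spla.cg(normal_op, A.T @ b)                     # one solve per parameter
    lcurve.append((lam, np.linalg.norm(A @ x - b), np.linalg.norm(L @ x)))

for lam, fid, reg in lcurve:
    print(f"lambda={lam:.1e}  residual={fid:.3e}  ||Lx||={reg:.3e}")
```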
The increasingly crowded spectrum has spurred the design of joint radar-communications systems that share hardware resources and use the radio frequency spectrum efficiently. We study a general spectral coexistence scenario in which the channels and transmit signals of both the radar and communications systems are unknown at the receiver. In this dual-blind deconvolution (DBD) problem, a common receiver admits a multi-carrier wireless communications signal that is overlaid with the radar signal reflected off multiple targets. The communications and radar channels are represented by continuous-valued range-time and Doppler velocities of multiple transmission paths and multiple targets. We exploit the sparsity of both channels to solve the highly ill-posed DBD problem by casting it as a sum of multivariate atomic norms (SoMAN) minimization. We devise a semidefinite program to estimate the unknown target and communications parameters using the theory of positive-hyperoctant trigonometric polynomials (PhTP). Our theoretical analyses show that the minimum number of samples required for near-perfect recovery depends on the logarithm of the maximum of the number of radar targets and the number of communications paths, rather than on their sum. We show that our SoMAN method and PhTP formulations also apply to more general scenarios such as unsynchronized transmission, the presence of noise, and multiple emitters. Numerical experiments demonstrate strong parameter recovery performance across these scenarios.
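To make the observation model concrete, the toy sketch below generates overlaid measurements on a frequency-by-slow-time grid, where each radar target and each communications path contributes a two-dimensional complex exponential parameterized by a continuous delay and Doppler. The grid sizes, gains, delays, and stand-in spectra are hypothetical; this illustrates only the measurement model, not the SoMAN recovery program.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 16                                   # subcarriers x slow-time samples
nn, mm = np.meshgrid(np.arange(N) / N, np.arange(M) / M, indexing="ij")

def channel(params):
    """Each (gain, delay, Doppler) triple contributes a 2-D complex exponential."""
    return sum(g * np.exp(-2j * np.pi * (nn * tau + mm * nu)) for g, tau, nu in params)

targets = [(0.9, 12.3, 3.1), (0.5, 40.7, -2.4)]    # (gain, delay, Doppler) per target
paths   = [(1.0, 25.2, 0.8)]                        # (gain, delay, Doppler) per path

# Unknown radar spectrum and unknown multi-carrier symbols (stand-ins).
radar_spectrum = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
comm_symbols = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=(N, M))

Y = channel(targets) * radar_spectrum + channel(paths) * comm_symbols
Y += 0.01 * (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M)))
print("received overlaid samples, shape:", Y.shape)
```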
This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noise. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies used during the learning process are usually nonlinear and time-varying in order to satisfy the state and input constraints. In this paper, we provide a non-asymptotic error bound for least-squares estimation when the data trajectory is generated by any nonlinear and/or time-varying policy, as long as the generated state and action trajectories are bounded. This significantly generalizes the existing non-asymptotic guarantees for linear system identification, which usually consider i.i.d. random inputs or linear policies. Interestingly, our error bound is consistent with that for linear policies with respect to the dependence on the trajectory length, system dimensions, and excitation levels. Lastly, we demonstrate applications of our results to safe learning with robust model predictive control and provide numerical analysis.
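The following minimal sketch shows the least-squares estimator in question on a toy trajectory: the system evolves under a bounded nonlinear policy (here a saturated linear feedback, purely illustrative) plus i.i.d. excitation noise, and $(A, B)$ are recovered by regressing next states on current states and inputs. Dimensions, gains, and noise levels are assumptions.

```python
import numpy as np

# Least-squares identification of (A, B) from one trajectory of
#   x_{t+1} = A x_t + B u_t + w_t.
rng = np.random.default_rng(0)
nx, nu, T = 4, 2, 2000
A = 0.9 * np.eye(nx) + 0.05 * rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
K = 0.1 * rng.standard_normal((nu, nx))

X = np.zeros((T + 1, nx))
U = np.zeros((T, nu))
for t in range(T):
    U[t] = np.clip(K @ X[t], -1.0, 1.0) + 0.1 * rng.standard_normal(nu)  # policy + excitation
    X[t + 1] = A @ X[t] + B @ U[t] + 0.01 * rng.standard_normal(nx)      # process noise

Z = np.hstack([X[:-1], U])                        # regressors [x_t, u_t]
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = Theta[:nx].T, Theta[nx:].T
print("estimation error:", np.linalg.norm(np.hstack([A_hat - A, B_hat - B])))
```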
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development: both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and to assess their progress in the future.
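For intuition about the kind of trade-offs such a metric suite captures, the sketch below computes a few generic continual-learning quantities (final average performance, forgetting, forward transfer) from a task-by-task performance matrix. These are standard illustrative quantities with made-up numbers, not the specific metrics or evaluation framework proposed in this work.

```python
import numpy as np

# R[i, j] = evaluation score on task j after finishing training on task i.
R = np.array([[0.90, 0.40, 0.35],      # hypothetical scores for 3 sequential tasks
              [0.80, 0.85, 0.45],
              [0.70, 0.75, 0.88]])
baseline = np.array([0.30, 0.30, 0.30])    # score of an untrained reference model

T = R.shape[0]
avg_performance = R[-1].mean()                                          # plasticity side
forgetting = np.mean([R[:, j].max() - R[-1, j] for j in range(T - 1)])  # stability side
forward_transfer = np.mean([R[j - 1, j] - baseline[j] for j in range(1, T)])

print(f"final average performance: {avg_performance:.3f}")
print(f"average forgetting:        {forgetting:.3f}")
print(f"forward transfer:          {forward_transfer:.3f}")
```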
This paper revisits Graph Convolutional Neural Networks by bridging the gap between the spectral and spatial design of graph convolutions. We theoretically demonstrate the equivalence of the graph convolution process regardless of whether it is designed in the spatial or the spectral domain. The resulting general framework enables a spectral analysis of the most popular ConvGNNs, explaining their performance and showing their limits. Moreover, the proposed framework is used to design new convolutions in the spectral domain with a custom frequency profile while applying them in the spatial domain. We also propose a generalization of the depthwise separable convolution framework for graph convolutional networks, which decreases the total number of trainable parameters while preserving the capacity of the model. To the best of our knowledge, such a framework has never been used in the GNN literature. Our proposals are evaluated on both transductive and inductive graph learning problems. The results show the relevance of the proposed method and provide some of the first experimental evidence of the transferability of spectral filter coefficients from one graph to another. Our source code is publicly available at: //github.com/balcilar/Spectral-Designed-Graph-Convolutions
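A small NumPy sketch of the spectral-to-spatial bridge described above follows: a convolution support is designed in the spectral domain as a desired frequency response over the normalized Laplacian's eigenvalues, then applied as an ordinary spatial matrix product in a single graph-convolution layer. The toy random graph and the band-pass profile are assumptions for illustration, not the paper's designed convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                        # random undirected toy graph
deg = A.sum(1)
L = np.eye(n) - A / np.sqrt(np.outer(deg + 1e-12, deg + 1e-12))  # normalized Laplacian
lam, U = np.linalg.eigh(L)                            # graph Fourier basis

# Desired frequency profile: a band-pass response over the Laplacian spectrum.
profile = np.exp(-(lam - 1.0) ** 2 / 0.1)
C = U @ np.diag(profile) @ U.T                        # spatial convolution support

# One graph-convolution layer y = C X W on random node features.
X = rng.standard_normal((n, 8))
W = rng.standard_normal((8, 4))
Y = C @ X @ W
print("output node features:", Y.shape)
```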
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of the neural network. Even though downsampling reduces computation and the required communication bandwidth, it removes both redundant and salient information indiscriminately, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components that can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach while further reducing the input data size. Specifically, for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half the input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
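The sketch below shows the basic frequency-domain preprocessing this line of work relies on: an image plane is split into 8x8 blocks, each block is transformed by a 2-D DCT, the DCT coefficient indices become channels, and only a subset of channels is kept. The single random image plane and the magnitude-based selection heuristic are stand-ins for the paper's YCbCr pipeline and learned channel selection.

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)
img = rng.random((224, 224))                 # stand-in for one image plane
bs = 8                                       # DCT block size

# Reshape into 8x8 blocks and take a 2-D DCT of each block.
h, w = img.shape[0] // bs, img.shape[1] // bs
blocks = img.reshape(h, bs, w, bs).transpose(0, 2, 1, 3)          # (h, w, 8, 8)
freq = dctn(blocks, axes=(2, 3), norm="ortho")

# Every DCT coefficient index becomes a channel: a 28x28x64 tensor per plane.
channels = freq.reshape(h, w, bs * bs)

# Static channel selection: keep the k channels with the largest average magnitude
# (a simple heuristic standing in for the learned frequency selection).
k = 16
keep = np.argsort(np.abs(channels).mean(axis=(0, 1)))[-k:]
reduced = channels[:, :, keep]
print("frequency-domain input shape:", reduced.shape)
```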