The metriplectic formalism is useful for describing complete dynamical systems which conserve energy and produce entropy. This creates challenges for model reduction, as the elimination of high-frequency information will generally not preserve the metriplectic structure which governs long-term stability of the system. Based on proper orthogonal decomposition, a provably convergent metriplectic reduced-order model is formulated which is guaranteed to maintain the algebraic structure necessary for energy conservation and entropy formation. Numerical results on benchmark problems show that the proposed method is remarkably stable, leading to improved accuracy over long time scales at a moderate increase in cost over naive methods.
The energy consumption of wireless networks is a growing concern. In massive MIMO systems, which are being increasingly deployed as part of the 5G roll-out, the power amplifiers in the base stations have a large impact in terms of power demands. Most of the current massive MIMO precoders are designed to minimize the transmit power. However, the efficiency of the power amplifiers depend on their operating regime with respect to their saturation regime, and the consumed power proves to be non-linearly related to the transmit power. Power consumption-based equivalents of maximum ratio transmission, zero-forcing, and regularized zero-forcing precoders are therefore proposed. We show how the structure of the solutions radically changes. While all antennas should be active in order to minimize the transmit power, we find on the contrary that a smaller number of antennas should be activated if the objective is the power consumed by the power amplifiers.
In experiments that study social phenomena, such as peer influence or herd immunity, the treatment of one unit may influence the outcomes of others. Such "interference between units" violates traditional approaches for causal inference, so that additional assumptions are often imposed to model or limit the underlying social mechanism. For binary outcomes, we propose an approach that does not require such assumptions, allowing for interference that is both unmodeled and strong, with confidence intervals derived using only the randomization of treatment. However, the estimates will have wider confidence intervals and weaker causal implications than those attainable under stronger assumptions. The approach allows for the usage of regression, matching, or weighting, as may best fit the application at hand. Inference is done by bounding the distribution of the estimation error over all possible values of the unknown counterfactual, using an integer program. Examples are shown using using a vaccination trial and two experiments investigating social influence.
We consider inference for a collection of partially observed, stochastic, interacting, nonlinear dynamic processes. Each process is identified with a label called its unit, and our primary motivation arises in biological metapopulation systems where a unit corresponds to a spatially distinct sub-population. Metapopulation systems are characterized by strong dependence through time within a single unit and relatively weak interactions between units, and these properties make block particle filters an effective tool for simulation-based likelihood evaluation. Iterated filtering algorithms can facilitate likelihood maximization for simulation-based filters. We introduce a new iterated block particle filter algorithm applicable when parameters are unit-specific or shared between units. We demonstrate this algorithm by performing inference on a coupled epidemiological model describing spatiotemporal measles case report data for twenty towns.
In this paper, we develop a class of mixed finite element methods for the ferrofluid flow model proposed by Shliomis [Soviet Physics JETP, 1972]. We show that the energy stability of the weak solutions to the model is preserved exactly for both the semi- and fully discrete finite element solutions. Furthermore, we prove the existence and uniqueness of the discrete solutions and derive optimal error estimates for both the the semi- and fully discrete schemes. Numerical experiments confirm the theoretical results.
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and Hessian-vector computations, which can be costly and unscalable in practice. Recently, Hessian-free bilevel schemes have been proposed to resolve this issue, where the general idea is to use zeroth- or first-order methods to approximate the full hypergradient of the bilevel problem. However, we empirically observe that such approximation can lead to large variance and unstable training, but estimating only the response Jacobian matrix as a partial component of the hypergradient turns out to be extremely effective. To this end, we propose a new Hessian-free method, which adopts the zeroth-order-like method to approximate the response Jacobian matrix via taking difference between two optimization paths. Theoretically, we provide the convergence rate analysis for the proposed algorithms, where our key challenge is to characterize the approximation and smoothness properties of the trajectory-dependent estimator, which can be of independent interest. This is the first known convergence rate result for this type of Hessian-free bilevel algorithms. Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. Particularly, in our experiment on few-shot meta-learning with ResNet-12 network over the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame.
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. At a cost of NP-hard offline planning, prior works have introduced online convex optimization algorithms, although they suffer from nonparametric rate of regret. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over the state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle in which we optimistically select the best model in a high confidence region. Our algorithm is then more explorative when compared to previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding window (SW-OFU) strategy. With proper configuration, our algorithm is attains sublinear regret $O(T^{2/3})$. These algorithms utilize data from the current phase for tracking variations on the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees under LTV dynamical systems.
In this paper, we study the trace regression when a matrix of parameters B* is estimated via convex relaxation of a rank-penalized regression or via non-convex optimization. It is known that these estimators satisfy near-optimal error bounds under assumptions on rank, coherence, or spikiness of B*. We start by introducing a general notion of spikiness for B* that provides a generic recipe to prove restricted strong convexity for the sampling operator of the trace regression and obtain near-optimal and non-asymptotic error bounds for the estimation error. Similar to the existing literature, these results require the penalty parameter to be above a certain theory-inspired threshold that depends on the observation noise and the sampling operator which may be unknown in practice. Next, we extend the error bounds to the cases when the regularization parameter is chosen via cross-validation. This result is significant in that existing theoretical results on cross-validated estimators do not apply to our setting since the estimators we study are not known to satisfy their required notion of stability. Finally, using simulations on synthetic and real data, we show that the cross-validated estimator selects a nearly-optimal penalty parameter and outperforms the theory-inspired approach of selecting the parameter.
Hierarchical matrices provide a powerful representation for significantly reducing the computational complexity associated with dense kernel matrices. For general kernel functions, interpolation-based methods are widely used for the efficient construction of hierarchical matrices. In this paper, we present a fast hierarchical data reduction (HiDR) procedure with $O(n)$ complexity for the memory-efficient construction of hierarchical matrices with nested bases where $n$ is the number of data points. HiDR aims to reduce the given data in a hierarchical way so as to obtain $O(1)$ representations for all nearfield and farfield interactions. Based on HiDR, a linear complexity $\mathcal{H}^2$ matrix construction algorithm is proposed. The use of data-driven methods enables {better efficiency than other general-purpose methods} and flexible computation without accessing the kernel function. Experiments demonstrate significantly improved memory efficiency of the proposed data-driven method compared to interpolation-based methods over a wide range of kernels. Though the method is not optimized for any special kernel, benchmark experiments for the Coulomb kernel show that the proposed general-purpose algorithm offers competitive performance for hierarchical matrix construction compared to several state-of-the-art algorithms for the Coulomb kernel.
This work proposes a hyper-reduction method for nonlinear parametric dynamical systems characterized by gradient fields such as (port-)Hamiltonian systems and gradient flows. The gradient structure is associated with conservation of invariants or with dissipation and hence plays a crucial role in the description of the physical properties of the system. Traditional hyper-reduction of nonlinear gradient fields yields efficient approximations that, however, lack the gradient structure. We focus on Hamiltonian gradients and we propose to first decompose the nonlinear part of the Hamiltonian, mapped into a suitable reduced space, into the sum of d terms, each characterized by a sparse dependence on the system state. Then, the hyper-reduced approximation is obtained via discrete empirical interpolation (DEIM) of the Jacobian of the derived d-valued nonlinear function. The resulting hyper-reduced model retains the gradient structure and its computationally complexity is independent of the size of the full model. Moreover, a priori error estimates show that the hyper-reduced model converges to the reduced model and the Hamiltonian is asymptotically preserved. Whenever the nonlinear Hamiltonian gradient is not globally reducible, i.e. its evolution requires high-dimensional DEIM approximation spaces, an adaptive strategy is performed. This consists in updating the hyper-reduced Hamiltonian via a low-rank correction of the DEIM basis. Numerical tests demonstrate the applicability of the proposed approach to general nonlinear operators and runtime speedups compared to the full and the reduced models.
Gradient based meta-learning methods are prone to overfit on the meta-training set, and this behaviour is more prominent with large and complex networks. Moreover, large networks restrict the application of meta-learning models on low-power edge devices. While choosing smaller networks avoid these issues to a certain extent, it affects the overall generalization leading to reduced performance. Clearly, there is an approximately optimal choice of network architecture that is best suited for every meta-learning problem, however, identifying it beforehand is not straightforward. In this paper, we present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models that generalize well on unseen tasks in meta-learning. Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task. Rather, each task uses only a fraction of the kernels, and the selection of the kernels per task can be learnt dynamically as a part of the inner update steps. MetaDOCK compresses the meta-model as well as the task-specific inner models, thus providing significant reduction in model size for each task, and through constraining the number of active kernels for every task, it implicitly mitigates the issue of meta-overfitting. We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models. MetaDOCK couples well with popular meta-learning approaches such as iMAML. The efficacy of our method is validated on CIFAR-fs and mini-ImageNet datasets, and we have observed that our approach can provide improvements in model accuracy of up to 2% on standard meta-learning benchmark, while reducing the model size by more than 75%.