Consider the task of estimating a random vector $X$ from noisy observations $Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal Bayesian estimator is linear and positive definite if and only if the prior distribution on $X$ is a (non-degenerate) multivariate Gaussian. Furthermore, for $p > 2$, it is demonstrated that there are infinitely many priors that can induce such an estimator.
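As an illustration of the Gaussian case (a sketch not taken from the paper, and covering only the $p = 2$, squared-error endpoint of the range): for a non-degenerate Gaussian prior $X \sim \mathcal{N}(0, \Sigma)$ and $Y = X + Z$, the conditional mean is the linear map $\mathbb{E}[X \mid Y] = \Sigma(\Sigma + I)^{-1} Y$ with a positive-definite coefficient matrix, which can be checked numerically.

```python
import numpy as np

# Sketch (illustrative only): for X ~ N(0, Sigma) and Y = X + Z with Z standard normal,
# the Bayes estimator under squared error is the linear map E[X|Y] = Sigma (Sigma + I)^{-1} Y.
rng = np.random.default_rng(0)
d, n = 3, 200_000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)                    # non-degenerate prior covariance

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = X + rng.standard_normal((n, d))            # Y = X + Z

W = Sigma @ np.linalg.inv(Sigma + np.eye(d))   # linear Bayes estimator E[X|Y] = W Y

# Empirical regression of X on Y (least squares, no intercept) should recover W.
W_hat = np.linalg.lstsq(Y, X, rcond=None)[0].T
print(np.max(np.abs(W - W_hat)))               # small for large n
# W is symmetric (Sigma and (Sigma + I)^{-1} commute), so eigvalsh applies.
print(np.all(np.linalg.eigvalsh(W) > 0))       # positive definite
```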
Equipping the rototranslation group $SE(2)$ with a sub-Riemannian structure inspired by the visual cortex V1, we propose algorithms for image inpainting and enhancement based on hypoelliptic diffusion. We innovate on previous implementations of the methods of Citti, Sarti, and Boscain et al. by proposing an alternative, which we call WaxOn-WaxOff, that prevents fading and produces sharper results. We also exploit the sub-Riemannian structure to define a completely new unsharp filter on $SE(2)$, analogous to the classical unsharp filter for 2D image processing. We demonstrate our method on blood vessel enhancement in retinal scans.
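For reference, a minimal sketch of the classical 2D unsharp filter that the $SE(2)$ construction is analogous to; this is the planar baseline only, not the proposed sub-Riemannian filter, and the function name and parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_2d(img: np.ndarray, sigma: float = 2.0, amount: float = 1.0) -> np.ndarray:
    """Classical 2D unsharp masking: add back the high-frequency residual.

    The sub-Riemannian analogue replaces the planar Gaussian blur with
    hypoelliptic diffusion on the image lifted to SE(2).
    """
    blurred = gaussian_filter(img.astype(float), sigma=sigma)
    return img + amount * (img - blurred)
```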
The solution in the sense of Prager & Synge is an alternative to the commonly used notion of the numerical solution, which is considered as a limit of grid functions under mesh refinement. The Prager & Synge solution is defined as a hypersphere containing the projection of the true solution of the system of partial differential equations (PDEs) onto the computational grid, and it does not use any asymptotics. In the original variant it is determined using orthogonality properties specific to certain equations. In the proposed variant, the center and radius of the hypersphere are estimated using an ensemble of numerical solutions obtained by independent algorithms. This approach may easily be extended to solutions of an arbitrary system of partial differential equations, which significantly expands the domain of its applicability. Several options for the computation of the Prager & Synge solution are considered and compared herein. The first is based on the search for orthogonal truncation errors and their transformation. The second is based on the orthogonalization of approximation errors obtained using the defect correction method and applies a superposition of numerical solutions. These options are intrusive. In the third (nonintrusive) option, the information regarding the orthogonality of errors, which is crucial for the Prager & Synge approach, is replaced by information that stems from the properties of the ensemble of numerical solutions obtained by independent numerical algorithms. The angles between the truncation errors on such an ensemble, or the distances between elements of the ensemble, may be used in place of orthogonality. The variant based on the width of the ensemble of independent numerical solutions does not require any additional a priori information and is an approximate nonintrusive version of the method based on the orthogonalization of approximation errors.
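A minimal nonintrusive sketch of the ensemble-based estimate, under one plausible reading of the width-based variant (ensemble mean as the hypersphere center, half of the ensemble width as the radius); the exact formulas of the method may differ.

```python
import numpy as np
from itertools import combinations

def ensemble_center_radius(solutions):
    """Nonintrusive sketch (one reading, not the paper's exact formulas):
    take the mean of independent numerical solutions as the hypersphere
    center and half of the ensemble width (the maximum pairwise distance
    between solutions) as a radius estimate."""
    U = np.stack([np.ravel(u) for u in solutions])   # each solution as a grid vector
    center = U.mean(axis=0)
    width = max(np.linalg.norm(a - b) for a, b in combinations(U, 2))
    return center, width / 2.0
```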
Although there is mounting empirical evidence for the increase in affective polarization, few mechanistic models can explain its emergence at the population level. How such a phenomenon can emerge from divergent opinions of a population on an ideological issue remains an open question. In this paper, we establish that human normativity, that is, individual expression of normative opinions based on beliefs about the population, can lead to population-level polarization when ideological institutions distort beliefs in accordance with their objective of moving expressed opinion to one extreme. Using a game-theoretic model, we establish that individuals with more extreme opinions will have more extreme rhetoric and higher misperceptions about their outgroup members. Our model also shows that when social recommendation systems mediate institutional signals, we can observe the formation of different institutional communities, each with its unique structure and characteristics. Using the model, we identify practical strategies platforms can implement, such as reducing exposure to signals from ideological institutions and a tailored approach to content moderation, both of which can mitigate the affective polarization problem within their purview.
Most commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy is to regularize the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. In this paper, we use the so-called kernel mean embedding to show that the corresponding regularization can be rewritten as the Moreau envelope of some function in the reproducing kernel Hilbert space associated with $K$. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to prove properties of the MMD-regularized $f$-divergences and, in particular, of their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. Finally, we consider Wasserstein gradient flows starting from empirical measures and provide proof-of-concept numerical examples for $f$-divergences with both infinite and finite recession constants.
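For concreteness, a small sketch of the squared MMD used as the regularizer, computed between two empirical measures; the Gaussian kernel choice and the biased V-statistic form are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); a characteristic kernel on R^d
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd_squared(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2 between the empirical measures of
    samples x and y: E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())
```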
Quantum Relative Entropy (QRE) programming is a recently popular and challenging class of convex optimization problems with significant applications in quantum computing and quantum information theory. We are interested in modern interior point (IP) methods based on optimal self-concordant barriers for the QRE cone. A range of theoretical and numerical challenges associated with such barrier functions and the QRE cone has hindered the scalability of IP methods. To address these challenges, we propose a series of numerical and linear algebraic techniques and heuristics aimed at enhancing the efficiency of gradient and Hessian computations for the self-concordant barrier function, solving linear systems, and performing matrix-vector products. We also introduce and discuss some interesting concepts related to QRE, such as symmetric quantum relative entropy (SQRE). We further introduce a two-phase method for performing facial reduction that can significantly improve the performance of QRE programming. Our new techniques have been implemented in the latest version (DDS 2.2) of the software package DDS. In addition to handling QRE constraints, DDS accepts any combination of several other conic and non-conic convex constraints. Our comprehensive numerical experiments encompass several parts, including 1) a comparison of DDS 2.2 with Hypatia for the nearest correlation matrix problem, 2) using DDS to combine QRE constraints with various other constraint types, and 3) calculating the key rate for quantum key distribution (QKD) channels and presenting results for several QKD protocols.
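For reference, a minimal sketch of the quantum relative entropy function underlying the QRE cone, $D(X \,\|\, Y) = \mathrm{Tr}[X(\log X - \log Y)]$, evaluated directly via matrix logarithms; this only illustrates the function being modeled, not how DDS evaluates its barrier.

```python
import numpy as np
from scipy.linalg import logm

def quantum_relative_entropy(X, Y):
    """D(X || Y) = Tr[X (log X - log Y)] for positive-definite Hermitian X, Y.
    Direct matrix logarithms are used here for illustration only; an interior
    point solver would not evaluate the barrier this way."""
    return np.real(np.trace(X @ (logm(X) - logm(Y))))
```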
A recent development in Bayesian optimization is the use of local optimization strategies, which can deliver strong empirical performance on high-dimensional problems compared to traditional global strategies. The "folk wisdom" in the literature is that the focus on local optimization sidesteps the curse of dimensionality; however, little is known concretely about the expected behavior or convergence of Bayesian local optimization routines. We first study the behavior of the local approach, and find that the statistics of individual local solutions of Gaussian process sample paths are surprisingly good compared to what we would expect to recover from global methods. We then present the first rigorous analysis of such a Bayesian local optimization algorithm recently proposed by M\"uller et al. (2021), and derive convergence rates in both the noisy and noiseless settings.
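To fix ideas, a generic sketch of one Bayesian local-optimization step: sample the objective near the incumbent, fit a Gaussian process surrogate, and descend along a gradient of its posterior mean. This is an illustrative stand-in, not the exact algorithm of Müller et al. (2021) analyzed above, and all names and parameters are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def local_bo_step(f, x, rng, n_local=8, radius=0.1, lr=0.1, eps=1e-3):
    """One generic local Bayesian optimization step (illustrative, not the
    analyzed algorithm): fit a GP near x and take a surrogate gradient step."""
    d = x.size
    X = x + radius * rng.standard_normal((n_local, d))       # local design points
    y = np.array([f(xi) for xi in X])                        # possibly noisy evaluations
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=radius)).fit(X, y)
    mu = lambda z: gp.predict(z.reshape(1, -1))[0]           # posterior mean
    grad = np.array([(mu(x + eps * e) - mu(x - eps * e)) / (2 * eps)
                     for e in np.eye(d)])                    # central differences
    return x - lr * grad                                     # descend on the surrogate
```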
Generalizations and variations of the fundamental lemma by Willems et al. are an active topic of recent research. In this note, we explore and formalize the links between kernel regression and known nonlinear extensions of the fundamental lemma. Applying a transformation to the usual linear equation in Hankel matrices, we arrive at an alternative implicit kernel representation of the system trajectories while keeping the requirements on persistency of excitation. We show that this representation is equivalent to the solution of a specific kernel regression problem. We explore the possible structures of the underlying kernel as well as the system classes to which they correspond.
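As a reminder of the Hankel-matrix ingredient of the fundamental lemma, a small sketch that builds the depth-$L$ block Hankel matrix of an input trajectory and checks the usual persistency-of-excitation rank condition; the kernel representation itself is not reproduced here.

```python
import numpy as np

def block_hankel(u, L):
    """Depth-L block Hankel matrix H_L(u) of a signal u with T samples in R^m:
    column j stacks u[j], u[j+1], ..., u[j+L-1]."""
    T, m = u.shape
    cols = T - L + 1
    return np.vstack([u[i:i + cols].T for i in range(L)])    # shape (L*m, cols)

def persistently_exciting(u, L):
    """u is persistently exciting of order L iff H_L(u) has full row rank m*L."""
    H = block_hankel(u, L)
    return np.linalg.matrix_rank(H) == H.shape[0]
```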
The moving discontinuous Galerkin method with interface condition enforcement (MDG-ICE) is a high-order, r-adaptive method that treats the grid as a variable and weakly enforces the conservation law, constitutive law, and corresponding interface conditions in order to implicitly fit high-gradient flow features. In this paper, we develop an optimization solver based on the Levenberg-Marquardt algorithm that features an anisotropic, locally adaptive penalty method to enhance robustness and prevent cell degeneration in the computation of hypersonic, viscous flows. Specifically, we incorporate an anisotropic grid regularization based on the mesh-implied metric that inhibits grid motion in directions with small element length scales, an element shape regularization that inhibits nonlinear deformations of the high-order elements, and a penalty regularization that penalizes degenerate elements. Additionally, we introduce a procedure for locally scaling the regularization operators in an adaptive, elementwise manner in order to maintain grid validity. We apply the proposed MDG-ICE formulation to two- and three-dimensional test cases involving viscous shocks and/or boundary layers, including Mach 17.6 hypersonic viscous flow over a circular cylinder and Mach 5 hypersonic viscous flow over a sphere, which are very challenging test cases for conventional numerical schemes on simplicial grids. Even without artificial dissipation, the computed solutions are free from spurious oscillations and yield highly symmetric surface heat-flux profiles.
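For orientation, a minimal sketch of a single generic Levenberg-Marquardt step on a discrete least-squares residual; the anisotropic, locally adaptive regularization described above is not reproduced, and the damping shown is the standard diagonal scaling.

```python
import numpy as np

def levenberg_marquardt_step(J, r, lam):
    """One generic Levenberg-Marquardt step for a residual r(x) with Jacobian J:
    solve (J^T J + lam * diag(J^T J)) dx = -J^T r. The MDG-ICE solver augments
    this with anisotropic grid, shape, and penalty regularization (not shown)."""
    A = J.T @ J
    A += lam * np.diag(np.diag(A))
    return np.linalg.solve(A, -J.T @ r)
```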
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
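Adversarial domain classifiers of this kind are commonly implemented with a gradient reversal layer; a minimal PyTorch sketch follows, where layer sizes and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity in the forward pass, negated (scaled)
    gradient in the backward pass, so the feature extractor is trained
    adversarially against the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ImageLevelDomainClassifier(nn.Module):
    """Illustrative image-level domain classifier over backbone features
    (sizes are assumptions, not taken from the paper)."""
    def __init__(self, in_channels=512, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ReLU(),
            nn.Conv2d(256, 1, kernel_size=1),   # per-location domain logit
        )

    def forward(self, feat):
        return self.net(GradReverse.apply(feat, self.lam))
```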