Phase retrieval is the nonlinear inverse problem of recovering a true signal from its Fourier magnitude measurements. It arises in many applications such as astronomical imaging, X-Ray crystallography, microscopy, and more. The problem is highly ill-posed due to the phase-induced ambiguities and the large number of possible images that can fit to the given measurements. Thus, there's a rich history of enforcing structural priors to improve solutions including sparsity priors and deep-learning-based generative models. However, such priors are often limited in their representational capacity or generalizability to slightly different distributions. Recent advancements in using denoisers as regularizers for non-convex optimization algorithms have shown promising performance and generalization. We present a way of leveraging the prior implicitly learned by a denoiser to solve phase retrieval problems by incorporating it in a classical alternating minimization framework. Compared to performant denoising-based algorithms for phase retrieval, we showcase competitive performance with Fourier measurements on in-distribution images and notable improvement on out-of-distribution images.
Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many applications in the data science seek spatiotemporal reconstruction from a sequence of time-dependent objects with discontinuity or singularity, e.g. dynamic computerized tomography (CT) images with edges. Traditional methods based on Gaussian processes (GP) may not provide satisfactory solutions since they tend to offer over-smooth prior candidates. Recently, Besov process (BP) defined by wavelet expansions with random coefficients has been proposed as a more appropriate prior for this type of Bayesian inverse problems. While BP outperforms GP in imaging analysis to produce edge-preserving reconstructions, it does not automatically incorporate temporal correlation inherited in the dynamically changing images. In this paper, we generalize BP to the spatiotemporal domain (STBP) by replacing the random coefficients in the series expansion with stochastic time functions following Q-exponential process which governs the temporal correlation strength. Mathematical and statistical properties about STBP are carefully studied. A white-noise representation of STBP is also proposed to facilitate the point estimation through maximum a posterior (MAP) and the uncertainty quantification (UQ) by posterior sampling. Two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes compared with the classic STGP and a time-uncorrelated approach.
Visual-inertial odometry (VIO) is the most common approach for estimating the state of autonomous micro aerial vehicles using only onboard sensors. Existing methods improve VIO performance by including a dynamics model in the estimation pipeline. However, such methods degrade in the presence of low-fidelity vehicle models and continuous external disturbances, such as wind. Our proposed method, HDVIO, overcomes these limitations by using a hybrid dynamics model that combines a point-mass vehicle model with a learning-based component that captures complex aerodynamic effects. HDVIO estimates the external force and the full robot state by leveraging the discrepancy between the actual motion and the predicted motion of the hybrid dynamics model. Our hybrid dynamics model uses a history of thrust and IMU measurements to predict the vehicle dynamics. To demonstrate the performance of our method, we present results on both public and novel drone dynamics datasets and show real-world experiments of a quadrotor flying in strong winds up to 25 km/h. The results show that our approach improves the motion and external force estimation compared to the state-of-the-art by up to 33% and 40%, respectively. Furthermore, differently from existing methods, we show that it is possible to predict the vehicle dynamics accurately while having no explicit knowledge of its full state.
While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current unified theme. Our framework, IRGen, is a unified model that enables end-to-end differentiable search, thus achieving superior performance thanks to direct optimization. While developing IRGen we tackle the key technical challenge of converting an image into quite a short sequence of semantic units in order to enable efficient and effective retrieval. Empirical experiments demonstrate that our model yields significant improvement over three commonly used benchmarks, for example, 22.9\% higher than the best baseline method in precision@10 on In-shop dataset with comparable recall@10 score.
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generation models and represents a new paradigm distinct from traditional learning-to-rank methods. However, despite its rapid development, current generative retrieval methods are still limited. They typically rely on a heuristic function to transform predicted identifiers into a passage rank list, which creates a gap between the learning objective of generative retrieval and the desired passage ranking target. Moreover, the inherent exposure bias problem of text generation also persists in generative retrieval. To address these issues, we propose a novel framework, called LTRGR, that combines generative retrieval with the classical learning-to-rank paradigm. Our approach involves training an autoregressive model using a passage rank loss, which directly optimizes the autoregressive model toward the optimal passage ranking. This framework only requires an additional training step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conducted experiments on three public datasets, and our results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods, indicating its effectiveness and robustness.
Phase retrieval is the numerical procedure of recovering a complex-valued signal from knowledge about its amplitude and some additional information. Here, an indirect registration procedure, based on the large deformation diffeomorphic metric mapping (LDDMM) formalism, is investigated as a phase retrieval method for coherent diffractive imaging. The method attempts to find a deformation which transforms an initial, template image to match an unknown target image by comparing the diffraction pattern to the data. The exterior calculus framework is used to treat different types of deformations in a unified and coordinate-free way. The algorithm performance with respect to measurement noise, image topology, and particular action are explored through numerical examples.
Autonomous racing control is a challenging research problem as vehicles are pushed to their limits of handling to achieve an optimal lap time; therefore, vehicles exhibit highly nonlinear and complex dynamics. Difficult-to-model effects, such as drifting, aerodynamics, chassis weight transfer, and suspension can lead to infeasible and suboptimal trajectories. While offline planning allows optimizing a full reference trajectory for the minimum lap time objective, such modeling discrepancies are particularly detrimental when using offline planning, as planning model errors compound with controller modeling errors. Gaussian Process Regression (GPR) can compensate for modeling errors. However, previous works primarily focus on modeling error in real-time control without consideration for how the model used in offline planning can affect the overall performance. In this work, we propose a double-GPR error compensation algorithm to reduce model uncertainties; specifically, we compensate both the planner's model and controller's model with two respective GPR-based error compensation functions. Furthermore, we design an iterative framework to re-collect error-rich data using the racing control system. We test our method in the high-fidelity racing simulator Gran Turismo Sport (GTS); we find that our iterative, double-GPR compensation functions improve racing performance and iteration stability in comparison to a single compensation function applied merely for real-time control.
Unobserved confounding is a fundamental obstacle to establishing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification using fortuitous external aids, such as instrumental variables or proxies, or by means of the ID algorithm, using Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to develop a synthesis of the former and latter approaches to identification in causal inference to yield the most general identification algorithm in multivariate systems currently known -- the proximal ID algorithm. In addition to being able to obtain nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for the presence of unobserved confounders that would have otherwise prevented identification. In addition, we outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach by simulation studies and a data application.
We propose to use L\'evy {\alpha}-stable distributions for constructing priors for Bayesian inverse problems. The construction is based on Markov fields with stable-distributed increments. Special cases include the Cauchy and Gaussian distributions, with stability indices {\alpha} = 1, and {\alpha} = 2, respectively. Our target is to show that these priors provide a rich class of priors for modelling rough features. The main technical issue is that the {\alpha}-stable probability density functions do not have closed-form expressions in general, and this limits their applicability. For practical purposes, we need to approximate probability density functions through numerical integration or series expansions. Current available approximation methods are either too time-consuming or do not function within the range of stability and radius arguments needed in Bayesian inversion. To address the issue, we propose a new hybrid approximation method for symmetric univariate and bivariate {\alpha}-stable distributions, which is both fast to evaluate, and accurate enough from a practical viewpoint. Then we use approximation method in the numerical implementation of {\alpha}-stable random field priors. We demonstrate the applicability of the constructed priors on selected Bayesian inverse problems which include the deconvolution problem, and the inversion of a function governed by an elliptic partial differential equation. We also demonstrate hierarchical {\alpha}-stable priors in the one-dimensional deconvolution problem. We employ maximum-a-posterior-based estimation at all the numerical examples. To that end, we exploit the limited-memory BFGS and its bounded variant for the estimator.
Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vector). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale datasets and fine-tuned to this task via transfer learning using the contrastive language-audio pertaining (CLAP) technique. We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model. Hence, we introduce a trainable layer after the encoder to improve the text embedding produced by the encoder. In addition, we further refine the generated waveform by generating multiple candidate audio clips simultaneously and selecting the best one, which is determined in terms of the similarity score between the embedding of the candidate clips and the embedding of the target text label. Using the proposed method, our system ranks ${1}^{st}$ among the systems submitted to DCASE Challenge 2023 Task 7. The results of the ablation studies illustrate that the proposed techniques significantly improve sound generation performance. The codes for implementing the proposed system are available online.
Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called ``regression to the mean'' effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. Therefore, the outcome of a single step regression model is typically an aggregate of all possible explanations, therefore lacking details and realism. The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from low-quality and high-quality paired examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. Contrary to conditional denoising formulations, InDI directly proceeds by iteratively restoring the input low-quality image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.