Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works have achieved outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the models' inability to extract the geometric layout of visual features and their tendency to overfit to low-level details. Our preliminary work introduced a Geometric Layout Extractor (GLE) to capture the geometric layout of input features. However, the previous GLE does not fully exploit the information in the input features. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully exploit the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA, CVACT, and VIGOR by a large margin ($16.44\%$, $22.71\%$, and $17.02\%$ without polar transformation) while keeping same-area performance comparable to existing SOTA. Moreover, we provide detailed analyses of GeoDTR+.
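Concretely, models of this kind are usually trained as cross-view metric learning between a ground-image encoder and an aerial-image encoder. The sketch below shows a generic recipe of that setup with toy encoders and an in-batch soft-margin triplet loss; it is an illustrative stand-in under those assumptions, not the GeoDTR+ architecture, its GLE module, or CHSG.

```python
# Generic cross-view geo-localization training sketch (illustrative only,
# not GeoDTR+). Assumes paired (ground, aerial) images; encoders are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Tiny stand-in for a real backbone (e.g., a CNN or transformer)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-norm descriptors

def soft_margin_triplet(g, a):
    """Exhaustive in-batch soft-margin triplet loss commonly used in CVGL."""
    sim = g @ a.t()                       # cosine similarities, B x B
    pos = sim.diag().unsqueeze(1)         # matched ground/aerial pairs
    loss = F.softplus(sim - pos)          # penalize negatives rivaling the positive
    mask = ~torch.eye(len(g), dtype=torch.bool)
    return loss[mask].mean()

ground_enc, aerial_enc = ConvEncoder(), ConvEncoder()
opt = torch.optim.Adam(list(ground_enc.parameters()) + list(aerial_enc.parameters()), lr=1e-4)

ground = torch.randn(8, 3, 128, 512)      # dummy panoramic ground views
aerial = torch.randn(8, 3, 256, 256)      # dummy corresponding aerial views
loss = soft_margin_triplet(ground_enc(ground), aerial_enc(aerial))
loss.backward(); opt.step()
```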
Applications of deep learning methods to physical simulations such as Computational Fluid Dynamics (CFD) have so far been of limited industrial relevance. This paper demonstrates the development and application of a deep learning framework for real-time predictions of the impact of tip clearance variations on the aerodynamic performance of multi-stage axial compressors in gas turbines. The proposed C(NN)FD architecture is shown to scale to industrial applications and achieves, in real time, accuracy comparable to the CFD benchmark. The deployed model is readily integrated within the manufacturing and build process of gas turbines, providing the opportunity to analytically assess the impact on performance and potentially reduce the need for expensive physical tests.
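A minimal sketch of what such a surrogate can look like, assuming a hypothetical input of per-stage tip clearances and two scalar performance targets; the actual C(NN)FD model and its input/output representation are richer than this stand-in.

```python
# Hedged sketch of a surrogate mapping per-stage tip clearances to performance
# deltas (e.g., efficiency and mass flow). Names, shapes, and the architecture
# are assumptions for illustration, not the C(NN)FD design.
import torch
import torch.nn as nn

N_STAGES = 10          # hypothetical number of compressor stages
N_TARGETS = 2          # e.g., delta efficiency, delta mass flow

class ClearanceSurrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * N_STAGES, 64), nn.ReLU(),
            nn.Linear(64, N_TARGETS),
        )
    def forward(self, clearances):            # (batch, N_STAGES)
        return self.net(clearances.unsqueeze(1))

model = ClearanceSurrogate()
clearances = torch.rand(4, N_STAGES) * 1e-3   # dummy clearances in metres
print(model(clearances).shape)                # torch.Size([4, 2]) -- real-time inference
```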
This work explores the relationship between the set of Wardrop equilibria~(WE) of a routing game, the total demand of that game, and the occurrence of Braess's paradox~(BP). BP formalizes the counter-intuitive fact that, for some networks, removing a path from the network decreases congestion at the WE. For single origin-destination routing games with affine cost functions, the first part of this work provides tools for analyzing the evolution of the WE as the demand varies. It characterizes the piecewise affine nature of this dependence by showing that the set of directions in which the WE can vary in each piece is the solution of a variational inequality problem. In the process, we establish various properties of changes in the set of used and minimal-cost paths as the demand varies. As a consequence of these characterizations, we derive a procedure to obtain the WE for all demands above a certain threshold. The second part of the paper deals with detecting the presence of BP in a network. We supply a number of sufficient conditions that reveal the presence of BP and that are computationally tractable. We also discuss a different perspective on BP, where we establish that a path causing BP at a particular demand must be strictly beneficial to the network at a lower demand. Several examples throughout this work illustrate and elaborate on our findings.
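For readers unfamiliar with BP, the classic four-node example with affine costs illustrates the phenomenon; the short calculation below reproduces that textbook instance (it is not an example taken from the paper).

```python
# Worked example of Braess's paradox on the textbook four-node network with
# affine edge costs (a classic illustrative instance, not from the paper).
# Edges: o->a and b->d cost x/100; a->d and o->b cost 45; an optional zero-cost
# bridge connects a->b.
demand = 4000.0

# Without the bridge there are two symmetric paths, so the WE splits demand evenly.
cost_without = (demand / 2) / 100 + 45           # 20 + 45 = 65 per traveller

# With the bridge, routing all demand on o->a->b->d costs 40 + 0 + 40 = 80, while
# deviating to either original path would cost 40 + 45 = 85, so this is the WE.
cost_with = demand / 100 + 0 + demand / 100
cost_deviate = demand / 100 + 45

assert cost_with <= cost_deviate                 # equilibrium: no profitable deviation
assert cost_with > cost_without                  # adding the path raised everyone's cost
print(f"WE cost without bridge: {cost_without}, with bridge: {cost_with}")
```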
We couple the L1 discretization of the Caputo fractional derivative in time with a Galerkin scheme in space to devise a linear numerical method for the semilinear subdiffusion equation. Two important features of our analysis are the treatment of nonsmooth initial data and of a time-dependent diffusion coefficient. We prove the stability and convergence of the method under weak assumptions on the regularity of the diffusivity. We obtain error estimates that are optimal pointwise in space and global in time, and verify them with several numerical experiments.
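A minimal sketch of the L1 time discretization alone, applied to the scalar test problem $D_t^\alpha u = -\lambda u$ with $u(0)=1$ on a uniform grid; the paper's setting additionally involves the Galerkin spatial discretization, a time-dependent diffusivity, and nonsmooth data, none of which are reproduced here.

```python
# Hedged sketch: L1 discretization of the Caputo derivative for the scalar test
# problem D_t^alpha u = -lam * u, u(0) = 1 (time stepping only, no spatial part).
import math
import numpy as np

def l1_solve(alpha, lam, T, N, u0=1.0):
    """March the L1 scheme on a uniform grid with N steps up to time T."""
    tau = T / N
    c = 1.0 / (tau**alpha * math.gamma(2 - alpha))
    # L1 weights a_j = (j+1)^{1-alpha} - j^{1-alpha}
    a = np.arange(1, N + 1)**(1 - alpha) - np.arange(0, N)**(1 - alpha)
    u = np.empty(N + 1); u[0] = u0
    for n in range(1, N + 1):
        # history term: sum_{k=1}^{n-1} a_{n-k} (u^k - u^{k-1})
        hist = np.dot(a[n-1:0:-1], np.diff(u[:n])) if n > 1 else 0.0
        u[n] = c * (u[n-1] - hist) / (c + lam)       # implicit update
    return u

alpha, lam, T = 0.5, 1.0, 1.0
ref = l1_solve(alpha, lam, T, 8192)[-1]              # fine-grid reference value
for N in (32, 64, 128, 256):
    err = abs(l1_solve(alpha, lam, T, N)[-1] - ref)
    print(f"N={N:4d}  error at t=T: {err:.3e}")      # errors shrink as tau -> 0
```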
The investigation of mixture models is key to understanding and visualizing the distribution of multivariate data. Most mixture-model approaches are likelihood-based and are not adapted to distributions with finite support or without a well-defined density function. This study proposes Augmented Quantization, a reformulation of the classical quantization problem that uses the p-Wasserstein distance. This metric can be computed in very general distribution spaces, in particular with varying supports. The clustering interpretation of quantization is revisited in this more general framework. The performance of Augmented Quantization is first demonstrated on analytical toy problems. It is then applied to a practical case study involving river flooding, in which mixtures of Dirac and uniform distributions are built in the input space, enabling the identification of the most influential variables.
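As a small illustration of the ingredients (not the Augmented Quantization algorithm itself), the sketch below computes a p-Wasserstein distance between 1-D empirical samples and runs a Lloyd-style clustering loop, which is the classical quantization viewpoint the method generalizes.

```python
# Hedged sketch: a p-Wasserstein distance between 1-D empirical samples and a
# Lloyd-style clustering loop (the classical quantization viewpoint). This is a
# generic toy, not the Augmented Quantization algorithm of the paper.
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between two equal-size empirical samples: match sorted order statistics."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
# Toy data drawn from a mixture of a Dirac (point mass) and a Uniform component.
data = np.concatenate([np.full(500, 0.3), rng.uniform(0.6, 1.0, 500)])

# Lloyd iteration with two representatives: assign each point to its nearest
# representative, then recentre each representative on its cluster.
reps = np.array([0.1, 0.9])
for _ in range(20):
    labels = np.argmin(np.abs(data[:, None] - reps[None, :]), axis=1)
    reps = np.array([data[labels == k].mean() for k in range(2)])

print("representatives:", reps)                        # ~0.30 and ~0.80
print("W_2 quantization error:", wasserstein_1d(data, reps[labels]))
```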
Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used, while for physicists, who focus on the application when developing the code, better research productivity calls for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the compute-intensive part of the code. A more convenient and efficient approach would be to use a language that provides both high-level programming and high performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper, the applicability of the Julia language to HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interfacing with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large-scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified.
This paper addresses the problem of designing {\it continuous-discrete} unscented Kalman filter (UKF) implementation methods. More precisely, the aim is to propose MATLAB-based UKF algorithms for {\it accurate} and {\it robust} state estimation of stochastic dynamic systems. The accuracy of {\it continuous-discrete} nonlinear filters heavily depends on how the implementation method manages the discretization error arising at the filter prediction step. We suggest an elegant and accurate implementation framework for tracking the hidden states that utilizes the MATLAB built-in numerical integration schemes developed for solving ordinary differential equations (ODEs). The accuracy is boosted by the discretization error control built into all MATLAB ODE solvers, which automatically keeps the discretization error below the tolerance value provided by the user. Meanwhile, the robustness of the UKF filtering methods is examined in terms of their stability to roundoff errors. In contrast to the pseudo-square-root UKF implementations established in the engineering literature, which are based on one-rank Cholesky updates, we derive stable square-root methods that utilize $J$-orthogonal transformations for calculating the Cholesky square-root factors.
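The prediction step described above can be sketched in a few lines: sigma points are generated from a Cholesky square-root factor and each one is propagated through an error-controlled ODE solver. The example below uses SciPy's solve_ivp as a stand-in for the MATLAB built-in solvers; the model, tolerances, and unscented-transform parameters are illustrative assumptions, not the paper's exact algorithms.

```python
# Hedged sketch of a continuous-discrete UKF prediction step: sigma points are
# propagated through an error-controlled ODE solver (SciPy here, MATLAB solvers
# in the paper). Model and parameters are illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import cholesky

def pendulum(t, x):                       # toy drift f(x) for a noiseless pendulum
    return [x[1], -np.sin(x[0])]

def ukf_predict(x, P, t0, t1, rtol=1e-6, atol=1e-9, alpha=1.0, beta=2.0, kappa=0.0):
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    S = cholesky((n + lam) * P, lower=True)                         # square-root factor
    sigmas = np.column_stack([x, x[:, None] + S, x[:, None] - S])   # 2n+1 sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam))); Wm[0] = lam / (n + lam)
    Wc = Wm.copy(); Wc[0] += 1 - alpha**2 + beta
    # Propagate every sigma point with adaptive step-size control: the local
    # discretization error is kept below the requested tolerances automatically.
    prop = np.column_stack([
        solve_ivp(pendulum, (t0, t1), s, rtol=rtol, atol=atol).y[:, -1]
        for s in sigmas.T
    ])
    x_pred = prop @ Wm
    dx = prop - x_pred[:, None]
    P_pred = (dx * Wc) @ dx.T             # add process noise Q here if present
    return x_pred, P_pred

x0 = np.array([0.5, 0.0]); P0 = np.diag([0.01, 0.01])
print(ukf_predict(x0, P0, 0.0, 0.5))
```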
Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the oft-repetitive ones generated by LLaMA 2. Incremental updating yields lower BooookScore but a higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators. We release code and annotations after blind review to spur more principled research on book-length summarization.
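The aggregation behind BooookScore is simple once per-sentence error flags are available: the score is the fraction of summary sentences carrying no flag. Below is a minimal sketch of that aggregation with dummy annotations and illustrative error labels; the real pipeline obtains the flags from an LLM and uses the paper's own error taxonomy.

```python
# Minimal sketch of the BooookScore aggregation: the fraction of summary
# sentences that carry no coherence-error flag. The annotations and error
# labels below are dummy, illustrative values, not the paper's taxonomy.
from typing import Dict, List

def booookscore(sentence_errors: Dict[str, List[str]]) -> float:
    """Given each summary sentence's list of error flags, return the clean fraction."""
    clean = sum(1 for errs in sentence_errors.values() if not errs)
    return clean / len(sentence_errors)

summary_annotations = {
    "The story follows Ada across three decades.": [],
    "He then returns the letter.": ["unclear reference"],   # illustrative flag
    "The war ends before it begins.": ["inconsistency"],    # illustrative flag
    "Ada finally reconciles with her sister.": [],
}
print(booookscore(summary_annotations))   # 0.5
```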
As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees of freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are non-negative, acting on diffraction-limited optical intensity patterns at the input field-of-view (FOV). Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing, executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the product of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption under incoherent illumination. These findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.
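The non-negativity constraint mentioned above can be made concrete with a toy calculation: an incoherent system only realizes y = A x with A and x elementwise non-negative, yet a complex-valued linear transform can still be carried through such a system by splitting real and imaginary parts into non-negative components. The sketch below uses this generic encoding purely for illustration; it is not the diffractive scheme proposed in the work.

```python
# Toy illustration: a non-negative matrix acting on non-negative "intensities"
# can emulate an arbitrary complex-valued transform after a generic split of
# real/imaginary parts into non-negative components (not the paper's encoding).
import numpy as np

def pos(M):
    return np.maximum(M, 0)

rng = np.random.default_rng(1)
T = rng.normal(size=(5, 4)) + 1j * rng.normal(size=(5, 4))   # target complex transform
z = rng.normal(size=4) + 1j * rng.normal(size=4)             # complex input vector

Trp, Trm = pos(T.real), pos(-T.real)
Tip, Tim = pos(T.imag), pos(-T.imag)
v = np.concatenate([pos(z.real), pos(-z.real), pos(z.imag), pos(-z.imag)])   # >= 0

# Block matrix with only non-negative entries (what an incoherent system can do).
A = np.block([
    [Trp, Trm, Tim, Tip],   # positive contributions to Re(Tz)
    [Trm, Trp, Tip, Tim],   # negative contributions to Re(Tz)
    [Tip, Tim, Trp, Trm],   # positive contributions to Im(Tz)
    [Tim, Tip, Trm, Trp],   # negative contributions to Im(Tz)
])
y = A @ v                                                     # all-non-negative output
recovered = (y[:5] - y[5:10]) + 1j * (y[10:15] - y[15:20])    # signed recombination
print(np.allclose(recovered, T @ z))                          # True
```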
High-quality samples generated by score-based reverse diffusion algorithms provide evidence that deep neural networks (DNNs) trained for denoising can learn high-dimensional densities, despite the curse of dimensionality. However, recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two denoising DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, with a surprisingly small number of training images. This strong generalization demonstrates an alignment of powerful inductive biases in the DNN architecture and/or training algorithm with properties of the data distribution. We analyze these biases, demonstrating that the denoiser performs a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous image regions. We show that trained denoisers are inductively biased towards these geometry-adaptive harmonic representations by demonstrating that they arise even when the network is trained on image classes supported on low-dimensional manifolds, for which the harmonic basis is suboptimal. Additionally, we show that the denoising performance of the networks is near-optimal when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic.
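The shrinkage-in-an-adaptive-basis view can be probed numerically by eigendecomposing a denoiser's input-output Jacobian: eigenvectors play the role of the adaptive basis and eigenvalues act as per-component shrinkage factors. The sketch below does this for a tiny untrained CNN as a stand-in, purely to illustrate the procedure; it does not reproduce the paper's trained denoisers or results.

```python
# Hedged sketch: probe a denoiser's local behaviour by eigendecomposing its
# input-output Jacobian. The tiny untrained CNN is a stand-in, not a trained
# denoiser from the paper.
import torch
import torch.nn as nn

denoiser = nn.Sequential(                # stand-in denoiser f: R^{16x16} -> R^{16x16}
    nn.Conv2d(1, 8, 3, padding=1), nn.Softplus(),
    nn.Conv2d(8, 1, 3, padding=1),
)

x = torch.rand(1, 1, 16, 16)             # a dummy noisy "image"
flat = lambda img: denoiser(img.reshape(1, 1, 16, 16)).reshape(-1)
J = torch.autograd.functional.jacobian(flat, x.reshape(-1))   # 256 x 256 Jacobian

# Symmetrize so the eigendecomposition is real-valued; an untrained stand-in
# need not have a symmetric Jacobian.
evals, evecs = torch.linalg.eigh(0.5 * (J + J.T))
# Interpretation: eigenvectors form the adaptive basis; eigenvalues near 1 mean
# a component is preserved, eigenvalues near 0 mean it is suppressed (shrunk).
print("largest eigenvalues (shrinkage factors):", evals[-5:])
print("one basis vector, reshaped to image form:", evecs[:, -1].reshape(16, 16).shape)
```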
Degradation of image quality due to the presence of haze is a very common phenomenon. Existing methods such as DehazeNet [3] and MSCNN [11] addressed the drawbacks of hand-crafted haze-relevant features. However, these methods suffer from color distortion in gloomy (poorly illuminated) environments. In this paper, a cardinal (red, green, and blue) color fusion network for single-image haze removal is proposed. In the first stage, the network fuses the color information present in hazy images and generates multi-channel depth maps. The second stage estimates the scene transmission map from the generated dark channels using a multi-channel multi-scale convolutional neural network (McMs-CNN) to recover the original scene. To train the proposed network, we use two standard datasets, namely ImageNet [5] and D-HAZY [1]. Performance evaluation of the proposed approach is carried out using the structural similarity index (SSIM), mean squared error (MSE), and peak signal-to-noise ratio (PSNR). The analysis shows that the proposed approach outperforms existing state-of-the-art methods for single-image dehazing.
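Transmission-map-based dehazing pipelines of this kind ultimately invert the standard atmospheric scattering model I(x) = J(x) t(x) + A (1 - t(x)). The sketch below shows only that final recovery step, with dummy stand-ins for the transmission map and airlight that the proposed network would estimate.

```python
# Hedged sketch of the final recovery step shared by transmission-map-based
# dehazing methods: invert the atmospheric scattering model
# I(x) = J(x) * t(x) + A * (1 - t(x)) once t and A have been estimated.
# The transmission map and airlight below are dummy values, not network outputs.
import numpy as np

def recover_scene(hazy, transmission, airlight, t_min=0.1):
    """Return J = (I - A) / max(t, t_min) + A, clipped to valid intensities."""
    t = np.clip(transmission, t_min, 1.0)[..., None]    # avoid division blow-up
    scene = (hazy - airlight) / t + airlight
    return np.clip(scene, 0.0, 1.0)

hazy = np.random.rand(240, 320, 3)                      # stand-in hazy image in [0, 1]
transmission = np.random.rand(240, 320) * 0.7 + 0.2     # stand-in t(x) from a CNN
airlight = np.array([0.8, 0.82, 0.85])                  # stand-in atmospheric light A
dehazed = recover_scene(hazy, transmission, airlight)
print(dehazed.shape, dehazed.min() >= 0, dehazed.max() <= 1)
```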