Multimodality of the likelihood in Gaussian mixtures is a well-known problem. The choice of the initial parameter vector for the numerical optimizer may determine whether the optimizer finds the global maximum or gets trapped in a local maximum of the likelihood. We propose to use Hamiltonian Monte Carlo (HMC) to explore the part of the parameter space that has high likelihood. Each sampled parameter vector is used as the initial value for a quasi-Newton optimizer, and the resulting sample of (maximum) likelihood values is used to determine whether the likelihood is multimodal. We use a single simulated data set from a three-component bivariate mixture to develop and test the method. We use state-of-the-art HMC software, but experience difficulties when trying to apply HMC directly to the full model with 15 parameters. To improve the mixing of the Markov chain we explore various tricks, and conclude that for the data set at hand we have found the global maximum likelihood estimate.
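As a loose illustration of the multi-start idea (not the paper's implementation), the sketch below runs a quasi-Newton optimizer from many dispersed starting points on a deliberately simple one-dimensional two-component mixture; the starts are drawn at random here, where the paper would use HMC draws, and the spread of the resulting optima indicates multimodality.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Simulated data from a two-component mixture (far simpler than the paper's 15-parameter model).
data = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 50)])

def neg_log_lik(theta):
    """Negative log-likelihood of a two-component mixture with known unit variances."""
    w = 1 / (1 + np.exp(-theta[0]))          # mixing weight via logit transform
    m1, m2 = theta[1], theta[2]              # component means
    dens = w * norm.pdf(data, m1, 1) + (1 - w) * norm.pdf(data, m2, 1)
    return -np.sum(np.log(dens + 1e-300))

# Quasi-Newton (BFGS) runs from many dispersed starts; HMC draws would replace `starts`.
starts = rng.normal(0, 3, size=(20, 3))
optima = [minimize(neg_log_lik, s, method="BFGS") for s in starts]
values = np.array([o.fun for o in optima])

# Distinct clusters of optimal values signal a multimodal likelihood surface.
print("best NLL:", values.min().round(3), "| distinct optima:", np.unique(values.round(2)))
```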
We initiate the design and analysis of stabilization-free Virtual Element Methods for the Laplacian problem written in mixed form. A Virtual Element version of the lowest-order Raviart-Thomas Finite Element is considered. To reduce the computational costs, a suitable projection on the gradients of harmonic polynomials is employed. A complete theoretical analysis of stability and convergence is developed in the case of quadrilateral meshes. Some numerical tests highlighting the actual behaviour of the scheme are also provided.
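For reference, the standard mixed (dual) formulation of the Laplacian problem that such methods discretize reads as follows; this is the textbook statement, not a formula taken from the paper.

```latex
% Mixed form of the Laplacian problem: find (\sigma, u) with
%   \sigma = \nabla u, \quad -\operatorname{div}\sigma = f \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega.
% Weak formulation: find \sigma \in H(\operatorname{div};\Omega) and u \in L^2(\Omega) such that
\begin{aligned}
(\sigma,\tau)_{\Omega} + (u,\operatorname{div}\tau)_{\Omega} &= 0 && \forall\,\tau \in H(\operatorname{div};\Omega),\\
(\operatorname{div}\sigma, v)_{\Omega} &= -(f, v)_{\Omega} && \forall\, v \in L^2(\Omega).
\end{aligned}
```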
Linear regression is often used to perform mediation analysis. However, in many instances, the underlying relationships may not be linear, as in the case of placental-fetal hormones and fetal development. Although the exact functional form of the relationship may be unknown, one may hypothesize the general shape of the relationship. For these reasons, we develop a novel shape-restricted inference-based methodology for conducting mediation analysis. This work is motivated by an application in fetal endocrinology where researchers are interested in understanding the effects of pesticide application on birth weight, with human chorionic gonadotropin (hCG) as the mediator. We assume a practically plausible set of nonlinear effects of hCG on birth weight and a linear relationship between pesticide exposure and hCG, with both exposure-outcome and exposure-mediator models being linear in the confounding factors. Using the proposed methodology on data from a population-level prenatal screening program, with hCG as the mediator, we discovered that, while the natural direct effects suggest a positive association between pesticide application and birth weight, the natural indirect effects were negative.
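The abstract does not specify the paper's shape-restricted machinery; purely as an illustration of the general idea, the sketch below fits a monotone (isotonic) mediator-outcome curve alongside a linear exposure-mediator model and forms a crude indirect-effect estimate. The isotonic choice, variable names, and synthetic data are all assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
exposure = rng.normal(size=n)                       # e.g., pesticide exposure (synthetic)
mediator = 0.5 * exposure + rng.normal(size=n)      # e.g., hCG level, linear in exposure
outcome = np.tanh(mediator) - 0.2 * exposure + rng.normal(scale=0.5, size=n)

# Linear exposure -> mediator model, as assumed in the abstract.
a = LinearRegression().fit(exposure[:, None], mediator).coef_[0]

# Shape-restricted mediator -> outcome fit (isotonic here; the paper allows other shapes).
iso = IsotonicRegression(out_of_bounds="clip").fit(mediator, outcome)

# Crude natural indirect effect of a unit change in exposure: shift the mediator by `a`
# and compare predicted outcomes, averaging over the observed mediator distribution.
nie = np.mean(iso.predict(mediator + a) - iso.predict(mediator))
print(f"approximate natural indirect effect: {nie:.3f}")
```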
Discrepancies in decision-making between Autonomous Driving Systems (ADS) and human drivers underscore the need for intuitive human gaze predictors to bridge this gap, thereby improving user trust and experience. Existing gaze datasets, despite their value, suffer from noise that hampers effective training. Furthermore, current gaze prediction models exhibit inconsistency across diverse scenarios and demand substantial computational resources, restricting their on-board deployment in autonomous vehicles. We propose a novel adaptive cleansing technique for purging noise from existing gaze datasets, coupled with a robust, lightweight convolutional self-attention gaze prediction model. Our approach not only enhances model generalizability and performance by up to 12.13% but also reduces model complexity by up to 98.2% compared to the state-of-the-art, making in-vehicle deployment feasible to augment ADS decision visualization and performance.
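A generic sketch of a lightweight convolutional self-attention block is given below; the module name, sizes, and structure are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """A lightweight self-attention block over convolutional feature maps (generic sketch)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convolutions project features to low-dimensional query/key spaces.
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)               # (b, hw, c/r)
        k = self.key(x).flatten(2)                                 # (b, c/r, hw)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (b, hw, hw)
        v = self.value(x).flatten(2)                               # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                                # residual connection

# Example: attend over an 8-channel feature map of a road-scene frame.
print(ConvSelfAttention(8)(torch.randn(1, 8, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```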
Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims to assess the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. We hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.
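A minimal sketch of the three prompting regimes using the openai Python client; the sample question, the few-shot exemplar placeholder, and the model names are illustrative, not the paper's exact prompts.

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY in the environment

client = OpenAI()
question = ("A bond's duration is 5. If yields rise 1%, the approximate price change is?"
            "  A) -5%  B) +5%  C) -1%")

PROMPTS = {
    "zero_shot": question,
    "chain_of_thought": question + "\nLet's think step by step before answering.",
    "few_shot": (
        "Q: <illustrative solved CFA-style question>\nA: <worked answer>\n\n"  # exemplar placeholder
        f"Q: {question}\nA:"
    ),
}

for name, prompt in PROMPTS.items():
    reply = client.chat.completions.create(
        model="gpt-4",  # ChatGPT would be e.g. "gpt-3.5-turbo"
        messages=[{"role": "user", "content": prompt}],
    )
    print(name, "->", reply.choices[0].message.content[:80])
```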
The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues in memory footprint and inference latency. In this work, we propose a plug-and-play approach that incrementally compresses the intermediate activations of a specified span of tokens into compact ones, thereby reducing both memory and computational cost when processing subsequent context. Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach over sparse attention baselines in terms of fluency, n-gram matching, and semantic similarity. Finally, we comprehensively profile the benefit of context compression for improving system throughput. Code is available at //github.com/DRSY/KV_Compression.
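The paper learns a compression module; as a naive stand-in only, the sketch below mean-pools the cached keys/values of a token span into a fixed number of slots. The pooling choice, function name, and tensor shapes are assumptions.

```python
import torch

def compress_kv_span(keys, values, start, end, n_slots=4):
    """Naively compress the KV-cache entries of tokens [start, end) into `n_slots`
    pooled entries (a stand-in for a learned compression module).
    keys/values: (batch, heads, seq_len, head_dim)."""
    span_k, span_v = keys[:, :, start:end], values[:, :, start:end]
    # Split the span into n_slots chunks and mean-pool each chunk.
    pooled_k = torch.stack([c.mean(dim=2) for c in span_k.chunk(n_slots, dim=2)], dim=2)
    pooled_v = torch.stack([c.mean(dim=2) for c in span_v.chunk(n_slots, dim=2)], dim=2)
    new_k = torch.cat([keys[:, :, :start], pooled_k, keys[:, :, end:]], dim=2)
    new_v = torch.cat([values[:, :, :start], pooled_v, values[:, :, end:]], dim=2)
    return new_k, new_v

k = v = torch.randn(1, 8, 128, 64)           # 128 cached tokens
k2, v2 = compress_kv_span(k, v, 0, 64)       # 64-token span -> 4 slots
print(k.shape, "->", k2.shape)               # (1, 8, 128, 64) -> (1, 8, 68, 64)
```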
We study variants of the secretary problem, where $N$, the number of candidates, is a random variable, and the decision maker wants to maximize the probability of success -- picking the largest number among the $N$ candidates -- using only the relative ranks of the candidates revealed so far. We consider three forms of prior information about $\mathbf p$, the probability distribution of $N$. In the full information setting, we assume $\mathbf p$ to be fully known. In that case, we show that single-threshold strategies can achieve a $1/e$-approximation to the maximum probability of success among all possible strategies. In the upper bound setting, we assume that $N\leq \bar{n}$ (or $\mathbb E[N]\leq \bar{\mu}$), where $\bar{n}$ (or $\bar{\mu}$) is known. In that case, we show that randomization over single-threshold strategies can asymptotically achieve the optimal worst-case probability of success of $\frac{1}{\log(\bar{n})}$ (or $\frac{1}{\log(\bar{\mu})}$). Surprisingly, there is a single-threshold strategy (depending on $\bar{n}$) that succeeds with probability $2/e^2$ for all but an exponentially small fraction of distributions supported on $[\bar{n}]$. In the sampling setting, we assume that we have access to $m$ samples $N^{(1)},\ldots,N^{(m)}\sim_{iid} \mathbf p$. In that case, we show that if $N\leq T$ with probability at least $1-O(\epsilon)$ for some $T\in \mathbb N$, then $m\gtrsim \frac{1}{\epsilon^2}\max(\log(\frac{1}{\epsilon}),\epsilon \log(\frac{\log(T)}{\epsilon}))$ samples are enough to learn a strategy that is at most $\epsilon$-suboptimal, and we provide a lower bound of $\Omega(\frac{1}{\epsilon^2})$, showing that the sampling algorithm is optimal when $\epsilon=O(\frac{1}{\log\log(T)})$.
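A quick Monte Carlo check of a single-threshold strategy under random $N$ (reject a fixed-length prefix, then accept the first candidate beating it). The uniform prior and the $\mathbb E[N]/e$ threshold are illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_secretary(n: int, threshold: int) -> bool:
    """Single-threshold strategy: reject the first `threshold` candidates,
    then accept the first one that beats everything seen so far."""
    vals = rng.permutation(n)            # only relative ranks matter, so a permutation suffices
    best_seen = vals[:threshold].max() if threshold > 0 else -1
    for v in vals[threshold:]:
        if v > best_seen:
            return v == n - 1            # success iff we picked the overall best
    return False                         # never accepted anyone

# N drawn uniformly from {5,...,50}; one fixed threshold ~ E[N]/e (an illustrative
# choice -- the paper characterizes how to pick thresholds from the prior on N).
trials, threshold = 20_000, int(27.5 / np.e)
wins = sum(run_secretary(int(rng.integers(5, 51)), threshold) for _ in range(trials))
print(f"empirical success probability: {wins / trials:.3f}")
```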
Floods can be very destructive, causing heavy damage to life, property, and livelihoods. Global climate change and the consequent sea-level rise have increased the occurrence of extreme weather events, resulting in elevated and more frequent flood risk. Therefore, accurate and timely flood forecasting in coastal river systems is critical to facilitate effective flood management. However, the computational tools currently used are either slow or inaccurate. In this paper, we propose a flood prediction tool using a Graph Transformer Network (FloodGTN) for river systems. More specifically, FloodGTN learns the spatio-temporal dependencies of water levels at different monitoring stations using Graph Neural Networks (GNNs) and an LSTM. It is currently implemented to consider external covariates such as rainfall, tide, and the settings of hydraulic structures (e.g., outflows of dams, gates, pumps, etc.) along the river. We use a Transformer to learn the attention given to external covariates in computing water levels. We apply the FloodGTN tool to data from the South Florida Water Management District, which manages a coastal area prone to frequent storms and hurricanes. Experimental results show that FloodGTN outperforms the physics-based model (HEC-RAS), achieving a 70% improvement in accuracy while speeding up run times by at least 500x.
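A simplified stand-in for the GNN + LSTM portion of such a pipeline, sketched in plain PyTorch; the class name, adjacency handling, and dimensions are assumptions, and the Transformer over external covariates is omitted.

```python
import torch
import torch.nn as nn

class GraphLSTM(nn.Module):
    """Sketch: per-timestep graph convolution over stations, then an LSTM over time
    (a simplified stand-in for a GNN + LSTM water-level pipeline)."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.gnn = nn.Linear(n_features, hidden)    # shared node transform
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)            # predicted water level per station

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, stations, features); adj: row-normalized (stations, stations)
        b, t, s, f = x.shape
        h = torch.relu(adj @ self.gnn(x))            # neighbor averaging = 1-hop message passing
        h = h.permute(0, 2, 1, 3).reshape(b * s, t, -1)
        out, _ = self.lstm(h)                        # temporal dynamics per station
        return self.head(out[:, -1]).view(b, s)      # next-step level for each station

stations, adj = 5, torch.ones(5, 5) / 5              # toy fully connected river graph
model = GraphLSTM(n_features=3)                      # e.g., level, rainfall, gate setting
print(model(torch.randn(2, 24, stations, 3), adj).shape)  # torch.Size([2, 5])
```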
The problem of overdispersed claim counts and mismeasured covariates is common in insurance. On the one hand, the presence of overdispersion in the count data violates the homogeneity assumption, and on the other hand, measurement errors in covariates highlight the model risk issue in actuarial practice. The consequence can be inaccurate premium pricing, which would negatively affect business competitiveness. Our goal is to address these two modelling problems simultaneously by capturing the unobservable correlations between observations that arise from the overdispersed outcome and mismeasured covariates in the actuarial process. To this end, we establish novel connections between the count-based generalized linear mixed model (GLMM) and a popular error-correction tool for non-linear modelling, Simulation Extrapolation (SIMEX). We consider a modelling framework based on the hierarchical Bayesian paradigm. To our knowledge, the approach of combining hierarchical Bayes with SIMEX has not previously been discussed in the literature. We demonstrate the applicability of our approach on workplace absenteeism data. Our results indicate that the hierarchical Bayesian GLMM combined with SIMEX outperforms a naive GLMM / SIMEX in terms of goodness of fit.
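To make the SIMEX step concrete, here is a minimal frequentist sketch for a Poisson regression with one mismeasured covariate: extra noise is injected at increasing levels lambda, the attenuated slope is tracked, and a quadratic extrapolation back to lambda = -1 recovers an error-corrected estimate. The data, error variance, and quadratic extrapolant are assumptions; the paper's hierarchical Bayesian GLMM layer is omitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, sigma_u = 2000, 0.5                      # sigma_u: known measurement-error std
x_true = rng.normal(size=n)
x_obs = x_true + rng.normal(scale=sigma_u, size=n)      # mismeasured covariate
y = rng.poisson(np.exp(0.3 + 0.8 * x_true))             # overdispersion omitted for brevity

def poisson_slope(x):
    X = sm.add_constant(x)
    return sm.GLM(y, X, family=sm.families.Poisson()).fit().params[1]

# SIMEX: add *extra* error at levels lambda, track the attenuated slope, then
# extrapolate back to lambda = -1 (the no-measurement-error case) with a quadratic.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = [np.mean([poisson_slope(x_obs + rng.normal(scale=np.sqrt(l) * sigma_u, size=n))
                   for _ in range(20)]) for l in lambdas]
coefs = np.polyfit(lambdas, slopes, deg=2)
print("naive slope:", round(slopes[0], 3), "| SIMEX slope:", round(np.polyval(coefs, -1.0), 3))
```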
Steganography is the art of hiding information in plain sight. This form of covert communication can be used by bad actors to propagate malware, exfiltrate victim data, and communicate with other bad actors. Current image steganography defenses rely upon steganalysis, or the detection of hidden messages. These methods, however, are non-blind, as they require information about known steganography techniques and are easily bypassed. Recent work has instead focused on a defense mechanism known as sanitization, which eliminates hidden information from images. In this work, we introduce a novel blind deep learning steganography sanitization method that utilizes a diffusion model framework to sanitize universal and dependent steganography (DM-SUDS), which both removes hidden information and preserves image quality. We evaluate this approach against state-of-the-art deep learning sanitization frameworks and provide further detailed analysis through an ablation study. DM-SUDS outperforms previous sanitization methods and improves image preservation MSE by 71.32%, PSNR by 22.43%, and SSIM by 17.30%. This is the first blind deep learning image sanitization framework to meet these image quality results.
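The core intuition of diffusion-based sanitization can be sketched as "noise, then denoise": partially diffusing an image destroys fragile hidden payloads, and the reverse process restores perceptual content. In the sketch below, `denoiser` is a placeholder for a pretrained diffusion model's reverse process, and the linear beta schedule is a standard default, not DM-SUDS's exact setup.

```python
import torch

def sanitize(image: torch.Tensor, denoiser, t: int = 200, n_steps: int = 1000) -> torch.Tensor:
    """Sketch of diffusion-based sanitization: partially noise the image so fragile
    hidden payloads are destroyed, then denoise to recover perceptual content."""
    betas = torch.linspace(1e-4, 0.02, n_steps)          # standard linear beta schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    noisy = alpha_bar.sqrt() * image + (1 - alpha_bar).sqrt() * torch.randn_like(image)
    return denoiser(noisy, t)        # reverse diffusion from step t back to a clean image

# Usage with an identity stand-in for the pretrained denoiser:
clean = sanitize(torch.rand(1, 3, 64, 64), denoiser=lambda x, t: x.clamp(0, 1))
print(clean.shape)  # torch.Size([1, 3, 64, 64])
```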
Automatically creating a description of an image in a natural-language sentence, e.g., in English, is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses the different available models for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years. In addition, we discuss how such a model can be implemented. Finally, we evaluate the performance of the model using standard evaluation metrics.
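One such standard captioning metric is BLEU, which the machine translation literature popularized; a small example with nltk follows (the captions are made up for illustration).

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference captions written by annotators vs. a model-generated candidate (made up).
references = [
    "a man rides a horse on the beach".split(),
    "a person riding a horse along the shore".split(),
]
candidate = "a man riding a horse on the beach".split()

# BLEU compares n-gram overlap; smoothing avoids zero scores for short sentences.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```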