Large Language Models (LLMs) have emerged as powerful generative Artificial Intelligence solutions which can be applied to several fields and areas of work. This paper presents results and reflection of an experiment done to use the model GPT 3.5-Turbo to emulate some aspects of an inductive Thematic Analysis. Previous research on this subject has largely worked on conducting deductive analysis. Thematic Analysis is a qualitative method for analysis commonly used in social sciences and it is based on interpretations made by the human analyst(s) and the identification of explicit and latent meanings in qualitative data. Attempting an analysis based on human interpretation with an LLM clearly is a provocation but also a way to learn something about how these systems can or cannot be used in qualitative research. The paper presents the motivations for attempting this emulation, it reflects on how the six steps to a Thematic Analysis proposed by Braun and Clarke can at least partially be reproduced with the LLM and it also reflects on what are the outputs produced by the model. The paper used two existing datasets of open access semi-structured interviews, previously analysed with Thematic Analysis by other researchers. It used the previously produced analysis (and the related themes) to compare with the results produced by the LLM. The results show that the model can infer at least partially some of the main Themes. The objective of the paper is not to replace human analysts in qualitative analysis but to learn if some elements of LLM data manipulation can to an extent be of support for qualitative research.
Single-chain Markov chain Monte Carlo simulates realizations from a Markov chain to estimate expectations with the empirical average. The single-chain simulation is generally of considerable length and restricts many advantages of modern parallel computation. This paper constructs a novel many-short-chains Monte Carlo (MSC) estimator by averaging over multiple independent sums from Markov chains of a guaranteed short length. The computational advantage is the independent Markov chain simulations can be fast and may be run in parallel. The MSC estimator requires an importance sampling proposal and a drift condition on the Markov chain without requiring convergence analysis on the Markov chain. A non-asymptotic error analysis is developed for the MSC estimator under both geometric and multiplicative drift conditions. Empirical performance is illustrated on an autoregressive process and the P\'olya-Gamma Gibbs sampler for Bayesian logistic regression to predict cardiovascular disease.
Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with established methods. Existing methods for solving noisy inverse problems in IR, considers the pixel-wise data-fidelity. In this paper, we propose SaFaRI, a spatial-and-frequency-aware diffusion model for IR with Gaussian noise. Our model encourages images to preserve data-fidelity in both the spatial and frequency domains, resulting in enhanced reconstruction quality. We comprehensively evaluate the performance of our model on a variety of noisy inverse problems, including inpainting, denoising, and super-resolution. Our thorough evaluation demonstrates that SaFaRI achieves state-of-the-art performance on both the ImageNet datasets and FFHQ datasets, outperforming existing zero-shot IR methods in terms of LPIPS and FID metrics.
The expressivity of Graph Neural Networks (GNNs) can be entirely characterized by appropriate fragments of the first order logic. Namely, any query of the two variable fragment of graded modal logic (GC2) interpreted over labeled graphs can be expressed using a GNN whose size depends only on the depth of the query. As pointed out by [Barcelo & Al., 2020, Grohe, 2021], this description holds for a family of activation functions, leaving the possibibility for a hierarchy of logics expressible by GNNs depending on the chosen activation function. In this article, we show that such hierarchy indeed exists by proving that GC2 queries cannot be expressed by GNNs with polynomial activation functions. This implies a separation between polynomial and popular non polynomial activations (such as Rectified Linear Units) and answers an open question formulated by [Grohe, 21].
Computational analysis with the finite element method requires geometrically accurate meshes. It is well known that high-order meshes can accurately capture curved surfaces with fewer degrees of freedom in comparison to low-order meshes. Existing techniques for high-order mesh generation typically output meshes with same polynomial order for all elements. However, high order elements away from curvilinear boundaries or interfaces increase the computational cost of the simulation without increasing geometric accuracy. In prior work, we have presented one such approach for generating body-fitted uniform-order meshes that takes a given mesh and morphs it to align with the surface of interest prescribed as the zero isocontour of a level-set function. We extend this method to generate mixed-order meshes such that curved surfaces of the domain are discretized with high-order elements, while low-order elements are used elsewhere. Numerical experiments demonstrate the robustness of the approach and show that it can be used to generate mixed-order meshes that are much more efficient than high uniform-order meshes. The proposed approach is purely algebraic, and extends to different types of elements (quadrilaterals/triangles/tetrahedron/hexahedra) in two- and three-dimensions.
BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks. Yet, the majority of researchers have mainly concentrated on enhancements related to the model structure, such as relative position embedding and more efficient attention mechanisms. Others have delved into pretraining tricks associated with Masked Language Modeling, including whole word masking. DeBERTa introduced an enhanced decoder adapted for BERT's encoder model for pretraining, proving to be highly effective. We argue that the design and research around enhanced masked language modeling decoders have been underappreciated. In this paper, we propose several designs of enhanced decoders and introduce DrBERT (Decoder-refined BERT), a novel method for modeling training. Typically, a pretrained BERT model is fine-tuned for specific Natural Language Understanding (NLU) tasks. In our approach, we utilize the original BERT model as the encoder, making only changes to the decoder without altering the encoder. This approach does not necessitate extensive modifications to the model's architecture and can be seamlessly integrated into existing fine-tuning pipelines and services, offering an efficient and effective enhancement strategy. Compared to other methods, while we also incur a moderate training cost for the decoder during the pretraining process, our approach does not introduce additional training costs during the fine-tuning phase. We test multiple enhanced decoder structures after pretraining and evaluate their performance on the GLUE benchmark. Our results demonstrate that DrBERT, having only undergone subtle refinements to the model structure during pretraining, significantly enhances model performance without escalating the inference time and serving budget.
Accurately charting the progress of oil production is a problem of great current interest. Oil production is widely known to be cyclical: in any given system, after it reaches its peak, a decline will begin. With this in mind, Marion King Hubbert developed his peak theory in 1956 based on the bell-shaped curve that bears his name. In the present work, we consider a stochasticmodel based on the theory of diffusion processes and associated with the Hubbert curve. The problem of the maximum likelihood estimation of the parameters for this process is also considered. Since a complex system of equations appears, with a solution that cannot be guaranteed by classical numerical procedures, we suggest the use of metaheuristic optimization algorithms such as simulated annealing and variable neighborhood search. Some strategies are suggested for bounding the space of solutions, and a description is provided for the application of the algorithms selected. In the case of the variable neighborhood search algorithm, a hybrid method is proposed in which it is combined with simulated annealing. In order to validate the theory developed here, we also carry out some studies based on simulated data and consider 2 real crude oil production scenarios from Norway and Kazakhstan.
While Large Language Models (LLMs) have proven to be exceptional on a variety of tasks after alignment, they may still produce responses that contradict the context or world knowledge confidently, a phenomenon known as ``hallucination''. In this paper, we demonstrate that reducing the inconsistency between the external knowledge encapsulated in the training data and the intrinsic knowledge inherited in the pretraining corpus could mitigate hallucination in alignment. Specifically, we introduce a novel knowledge consistent alignment (KCA) approach, which involves automatically formulating examinations based on external knowledge for accessing the comprehension of LLMs. For data encompassing knowledge inconsistency, KCA implements several simple yet efficient strategies for processing. We illustrate the superior performance of the proposed KCA approach in mitigating hallucinations across six benchmarks using LLMs of different backbones and scales. Furthermore, we confirm the correlation between knowledge inconsistency and hallucination, signifying the effectiveness of reducing knowledge inconsistency in alleviating hallucinations. Our code, model weights, and data are public at \url{//github.com/fanqiwan/KCA}.
There has recently been an explosion of interest in how "higher-order" structures emerge in complex systems. This "emergent" organization has been found in a variety of natural and artificial systems, although at present the field lacks a unified understanding of what the consequences of higher-order synergies and redundancies are for systems. Typical research treat the presence (or absence) of synergistic information as a dependent variable and report changes in the level of synergy in response to some change in the system. Here, we attempt to flip the script: rather than treating higher-order information as a dependent variable, we use evolutionary optimization to evolve boolean networks with significant higher-order redundancies, synergies, or statistical complexity. We then analyse these evolved populations of networks using established tools for characterizing discrete dynamics: the number of attractors, average transient length, and Derrida coefficient. We also assess the capacity of the systems to integrate information. We find that high-synergy systems are unstable and chaotic, but with a high capacity to integrate information. In contrast, evolved redundant systems are extremely stable, but have negligible capacity to integrate information. Finally, the complex systems that balance integration and segregation (known as Tononi-Sporns-Edelman complexity) show features of both chaosticity and stability, with a greater capacity to integrate information than the redundant systems while being more stable than the random and synergistic systems. We conclude that there may be a fundamental trade-off between the robustness of a systems dynamics and its capacity to integrate information (which inherently requires flexibility and sensitivity), and that certain kinds of complexity naturally balance this trade-off.
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities: ie, text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs, in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (ie, 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
Bias correction can often improve the finite sample performance of estimators. We show that the choice of bias correction method has no effect on the higher-order variance of semiparametrically efficient parametric estimators, so long as the estimate of the bias is asymptotically linear. It is also shown that bootstrap, jackknife, and analytical bias estimates are asymptotically linear for estimators with higher-order expansions of a standard form. In particular, we find that for a variety of estimators the straightforward bootstrap bias correction gives the same higher-order variance as more complicated analytical or jackknife bias corrections. In contrast, bias corrections that do not estimate the bias at the parametric rate, such as the split-sample jackknife, result in larger higher-order variances in the i.i.d. setting we focus on. For both a cross-sectional MLE and a panel model with individual fixed effects, we show that the split-sample jackknife has a higher-order variance term that is twice as large as that of the `leave-one-out' jackknife.