As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely-cited barrier to the adoption of LLMs is their sensitivity to prompt wording - but interestingly, humans also display sensitivities to instruction changes in the form of response biases. As such, we argue that if LLMs are going to be used to approximate human opinions, it is necessary to investigate the extent to which LLMs also reflect human response biases, if at all. In this work, we use survey design as a case study, where human response biases caused by permutations in wordings of "prompts" have been extensively studied. Drawing from prior work in social psychology, we design a dataset and propose a framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior. These inconsistencies tend to be more prominent in models that have been instruction fine-tuned. Furthermore, even if a model shows a significant change in the same direction as humans, we find that perturbations that are not meant to elicit significant changes in humans may also result in a similar change. These results highlight the potential pitfalls of using LLMs to substitute humans in parts of the annotation pipeline, and further underscore the importance of finer-grained characterizations of model behavior. Our code, dataset, and collected samples are available at //github.com/lindiatjuatja/BiasMonkey
Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first introduce a novel prompt representation, called reference-enhanced representation, which includes schema information and randomly sampled cell values from tables to instruct LLMs in generating SQL queries. Then, in the first stage, question-SQL pairs are retrieved as few-shot demonstrations, prompting the LLM to generate a preliminary SQL (PreSQL). After that, the mentioned entities in PreSQL are parsed to conduct schema linking, which can significantly compact the useful information. In the second stage, with the linked schema, we simplify the prompt's schema information and instruct the LLM to produce the final SQL. Finally, as the post-refinement module, we propose using cross-consistency across different LLMs rather than self-consistency within a particular LLM. Our methods achieve new SOTA results on the Spider benchmark, with an execution accuracy of 87.6%.
Compared to other techniques, particle swarm optimization is more frequently utilized because of its ease of use and low variability. However, it is complicated to find the best possible solution in the search space in large-scale optimization problems. Moreover, changing algorithm variables does not influence algorithm convergence much. The PSO algorithm can be combined with other algorithms. It can use their advantages and operators to solve this problem. Therefore, this paper proposes the onlooker multi-parent crossover discrete particle swarm optimization (OMPCDPSO). To improve the efficiency of the DPSO algorithm, we utilized multi-parent crossover on the best solutions. We performed an independent and intensive neighborhood search using the onlooker bees of the bee algorithm. The algorithm uses onlooker bees and crossover. They do local search (exploitation) and global search (exploration). Each of these searches is among the best solutions (employed bees). The proposed algorithm was tested on the allocation problem, which is an NP-hard optimization problem. Also, we used two types of simulated data. They were used to test the scalability and complexity of the better algorithm. Also, fourteen 2D test functions and thirteen 30D test functions were used. They also used twenty IEEE CEC2005 benchmark functions to test the efficiency of OMPCDPSO. Also, to test OMPCDPSO's performance, we compared it to four new binary optimization algorithms and three classic ones. The results show that the OMPCDPSO version had high capability. It performed better than other algorithms. The developed algorithm in this research (OMCDPSO) in 36 test functions out of 47 (76.60%) is better than other algorithms. The Onlooker bees and multi-parent operators significantly impact the algorithm's performance.
In operations research (OR), predictive models often encounter out-of-distribution (OOD) scenarios where the data distribution differs from the training data distribution. In recent years, neural networks (NNs) are gaining traction in OR for their exceptional performance in fields such as image classification. However, NNs tend to make confident yet incorrect predictions when confronted with OOD data. Uncertainty estimation offers a solution to overconfident models, communicating when the output should (not) be trusted. Hence, reliable uncertainty quantification in NNs is crucial in the OR domain. Deep ensembles, composed of multiple independent NNs, have emerged as a promising approach, offering not only strong predictive accuracy but also reliable uncertainty estimation. However, their deployment is challenging due to substantial computational demands. Recent fundamental research has proposed more efficient NN ensembles, namely the snapshot, batch, and multi-input multi-output ensemble. This study is the first to provide a comprehensive comparison of a single NN, a deep ensemble, and the three efficient NN ensembles. In addition, we propose a Diversity Quality metric to quantify the ensembles' performance on the in-distribution and OOD sets in one single metric. The OR case study discusses industrial parts classification to identify and manage spare parts, important for timely maintenance of industrial plants. The results highlight the batch ensemble as a cost-effective and competitive alternative to the deep ensemble. It outperforms the deep ensemble in both uncertainty and accuracy while exhibiting a training time speedup of 7x, a test time speedup of 8x, and 9x memory savings.
As language models (LMs) become more capable, it is increasingly important to align them with human preferences. However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a novel PM framework that decomposes one global preference assessment into several interpretable features, obtains scalar scores for these features from a prompted LM, and aggregates these scores using a logistic regression classifier. Through these simple steps, CPMs allow to control which properties of the preference data are used to train the preference model and to build it based on features that are believed to underlie the human preference judgment. Our experiments show that CPMs not only improve generalization and are more robust to overoptimization than standard PMs, but also that best-of-n samples obtained using CPMs tend to be preferred over samples obtained using conventional PMs. Overall, our approach demonstrates the benefits of endowing PMs with priors about which features determine human preferences while relying on LM capabilities to extract those features in a scalable and robust way.
Predictions in the form of probability distributions are crucial for decision-making. Quantile regression enables this within spatial interpolation settings for merging remote sensing and gauge precipitation data. However, ensemble learning of quantile regression algorithms remains unexplored in this context. Here, we address this gap by introducing nine quantile-based ensemble learners and applying them to large precipitation datasets. We employed a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners include six stacking and three simple methods (mean, median, best combiner), combining six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within different stacking methods. We evaluated performance against QR using quantile scoring functions in a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation in contiguous US (CONUS). Stacking with QR and QRNN yielded the best results across quantile levels of interest (0.025, 0.050, 0.075, 0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900, 0.925, 0.950, 0.975), surpassing the reference method by 3.91% to 8.95%. This demonstrates the potential of stacking to improve probabilistic predictions in spatial interpolation and beyond.
Numerical experiments indicate that deep learning algorithms overcome the curse of dimensionality when approximating solutions of semilinear PDEs. For certain linear PDEs and semilinear PDEs with gradient-independent nonlinearities this has also been proved mathematically, i.e., it has been shown that the number of parameters of the approximating DNN increases at most polynomially in both the PDE dimension $d\in \mathbb{N}$ and the reciprocal of the prescribed accuracy $\epsilon\in (0,1)$. The main contribution of this paper is to rigorously prove for the first time that deep neural networks can also overcome the curse dimensionality in the approximation of a certain class of nonlinear PDEs with gradient-dependent nonlinearities.
In logistic regression modeling, Firth's modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth's modified estimator can be formulated as a penalized maximum likelihood estimator in which Jeffreys' prior is adopted as the penalty term. Despite its widespread use in practice, the formal verification of the corresponding estimate's existence has not been established. In this study, we establish the existence theorem of Firth's modified estimate in binomial logistic regression models, assuming only the full column rankness of the design matrix. We also discuss other binomial regression models obtained through alternating link functions and prove the existence of similar penalized maximum likelihood estimates for such models.
As artificial intelligence (AI) becomes more widespread, one question that arises is how human-AI interaction might impact human-human interaction. Chatbots, for example, are increasingly used as social companions, and while much is speculated, little is known empirically about how their use impacts human relationships. A common hypothesis is that relationships with companion chatbots are detrimental to social health by harming or replacing human interaction, but this hypothesis may be too simplistic, especially considering the social needs of users and the health of their preexisting human relationships. To understand how relationships with companion chatbots impact social health, we studied people who regularly used companion chatbots and people who did not use them. Contrary to expectations, companion chatbot users indicated that these relationships were beneficial to their social health, whereas non-users viewed them as harmful. Another common assumption is that people perceive conscious, humanlike AI as disturbing and threatening. Among both users and non-users, however, we found the opposite: perceiving companion chatbots as more conscious and humanlike correlated with more positive opinions and more pronounced social health benefits. Detailed accounts from users suggested that these humanlike chatbots may aid social health by supplying reliable and safe interactions, without necessarily harming human relationships, but this may depend on users' preexisting social needs and how they perceive both human likeness and mind in the chatbot.
LiDAR-based 3D perception algorithms have evolved rapidly alongside the emergence of large datasets. Nonetheless, considerable performance degradation often ensues when models trained on a specific dataset are applied to other datasets or real-world scenarios with different LiDAR. This paper aims to develop a unified model capable of handling different LiDARs, enabling continual learning across diverse LiDAR datasets and seamless deployment across heterogeneous platforms. We observe that the gaps among datasets primarily manifest in geometric disparities (such as variations in beams and point counts) and semantic inconsistencies (taxonomy conflicts). To this end, this paper proposes UniLiDAR, an occupancy prediction pipeline that leverages geometric realignment and semantic label mapping to facilitate multiple datasets training and mitigate performance degradation during deployment on heterogeneous platforms. Moreover, our method can be easily combined with existing 3D perception models. The efficacy of the proposed approach in bridging LiDAR domain gaps is verified by comprehensive experiments on two prominent datasets: OpenOccupancy-nuScenes and SemanticKITTI. UniLiDAR elevates the mIoU of occupancy prediction by 15.7% and 12.5%, respectively, compared to the model trained on the directly merged dataset. Moreover, it outperforms several SOTA methods trained on individual datasets. We expect our research to facilitate further study of 3D generalization, the code will be available soon.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.