This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning, focusing on the integration of image data into DCM's utility functions and its impact on model interpretability. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework. Our study reveals that neural network (NN) components learn and replicate tabular variable representations from images when co-occurrences exist, thereby compromising the interpretability of DCM parameters. We propose and benchmark two methodologies to address this challenge: architectural design adjustments to segregate redundant information, and isomorphic information mitigation through source information masking and inpainting. Our experiments, conducted on a semi-synthetic dataset, demonstrate that while architectural modifications prove inconclusive, direct mitigation at the data source shows to be a more effective strategy in maintaining the integrity of DCM's interpretable parameters. The paper concludes with insights into the applicability of our findings in real-world settings and discusses the implications for future research in hybrid modeling that combines complex data modalities. Full control of tabular and image data congruence is attained by using the MIT moral machine dataset, and both inputs are merged into a choice model by deploying the Learning Multinomial Logit (L-MNL) framework.
This study evaluates the ability of Large Language Model (LLM)-based Subpopulation Representative Models (SRMs) to generalize from empirical data, utilizing in-context learning with data from the 2016 and 2020 American National Election Studies. We explore generalization across response variables and demographic subgroups. While conditioning with empirical data improves performance on the whole, the benefit of in-context learning varies considerably across demographics, sometimes hurting performance for one demographic while helping performance for others. The inequitable benefits of in-context learning for SRM present a challenge for practitioners implementing SRMs, and for decision-makers who might come to rely on them. Our work highlights a need for fine-grained benchmarks captured from diverse subpopulations that test not only fidelity but generalization.
This paper introduces the Quantified Boolean Bayesian Network (QBBN), which provides a unified view of logical and probabilistic reasoning. The QBBN is meant to address a central problem with the Large Language Model (LLM), which has become extremely popular in Information Retrieval, which is that the LLM hallucinates. A Bayesian Network, by construction, cannot hallucinate, because it can only return answers that it can explain. We show how a Bayesian Network over an unbounded number of boolean variables can be configured to represent the logical reasoning underlying human language. We do this by creating a key-value version of the First-Order Calculus, for which we can prove consistency and completeness. We show that the model is trivially trained over fully observed data, but that inference is non-trivial. Exact inference in a Bayesian Network is intractable (i.e. $\Omega(2^N)$ for $N$ variables). For inference, we investigate the use of Loopy Belief Propagation (LBP), which is not guaranteed to converge, but which has been shown to often converge in practice. Our experiments show that LBP indeed does converge very reliably, and our analysis shows that a round of LBP takes time $O(N2^n)$, where $N$ bounds the number of variables considered, and $n$ bounds the number of incoming connections to any factor, and further improvements may be possible. Our network is specifically designed to alternate between AND and OR gates in a Boolean Algebra, which connects more closely to logical reasoning, allowing a completeness proof for an expanded version of our network, and also allows inference to follow specific but adequate pathways, that turn out to be fast.
Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite SciBench for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from mathematics, chemistry, and physics domains. Based on the dataset, we conduct an in-depth benchmarking study of representative open-source and proprietary LLMs with various prompting strategies. The results reveal that the current LLMs fall short of delivering satisfactory performance, with the best overall score of merely 43.22%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms the others and some strategies that demonstrate improvements in certain problem-solving skills could result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.
This paper introduces SudokuSens, a generative framework for automated generation of training data in machine-learning-based Internet-of-Things (IoT) applications, such that the generated synthetic data mimic experimental configurations not encountered during actual sensor data collection. The framework improves the robustness of resulting deep learning models, and is intended for IoT applications where data collection is expensive. The work is motivated by the fact that IoT time-series data entangle the signatures of observed objects with the confounding intrinsic properties of the surrounding environment and the dynamic environmental disturbances experienced. To incorporate sufficient diversity into the IoT training data, one therefore needs to consider a combinatorial explosion of training cases that are multiplicative in the number of objects considered and the possible environmental conditions in which such objects may be encountered. Our framework substantially reduces these multiplicative training needs. To decouple object signatures from environmental conditions, we employ a Conditional Variational Autoencoder (CVAE) that allows us to reduce data collection needs from multiplicative to (nearly) linear, while synthetically generating (data for) the missing conditions. To obtain robustness with respect to dynamic disturbances, a session-aware temporal contrastive learning approach is taken. Integrating the aforementioned two approaches, SudokuSens significantly improves the robustness of deep learning for IoT applications. We explore the degree to which SudokuSens benefits downstream inference tasks in different data sets and discuss conditions under which the approach is particularly effective.
Over the past few years, deep learning methods have been applied for a wide range of Software Engineering (SE) tasks, including in particular for the important task of automatically predicting and localizing faults in software. With the rapid adoption of increasingly complex machine learning models, it however becomes more and more difficult for scholars to reproduce the results that are reported in the literature. This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. Given some recent -- and very worrying -- findings regarding reproducibility and progress in other areas of applied machine learning, the goal of this work is to analyze to what extent the field of software engineering, in particular in the area of software fault prediction, is plagued by similar problems. We have therefore conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences. Our analysis revealed that scholars are apparently largely aware of the reproducibility problem, and about two thirds of the papers provide code for their proposed deep learning models. However, it turned out that in the vast majority of cases, crucial elements for reproducibility are missing, such as the code of the compared baselines, code for data pre-processing or code for hyperparameter tuning. In these cases, it therefore remains challenging to exactly reproduce the results in the current research literature. Overall, our meta-analysis therefore calls for improved research practices to ensure the reproducibility of machine-learning based research.
We study inductive bias in Transformers in the infinitely over-parameterized Gaussian process limit and argue transformers tend to be biased towards more permutation symmetric functions in sequence space. We show that the representation theory of the symmetric group can be used to give quantitative analytical predictions when the dataset is symmetric to permutations between tokens. We present a simplified transformer block and solve the model at the limit, including accurate predictions for the learning curves and network outputs. We show that in common setups, one can derive tight bounds in the form of a scaling law for the learnability as a function of the context length. Finally, we argue WikiText dataset, does indeed possess a degree of permutation symmetry.
Deep learning methods are achieving ever-increasing performance on many artificial intelligence tasks. A major limitation of deep models is that they are not amenable to interpretability. This limitation can be circumvented by developing post hoc techniques to explain the predictions, giving rise to the area of explainability. Recently, explainability of deep models on images and texts has achieved significant progress. In the area of graph data, graph neural networks (GNNs) and their explainability are experiencing rapid developments. However, there is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations. In this survey, we provide a unified and taxonomic view of current GNN explainability methods. Our unified and taxonomic treatments of this subject shed lights on the commonalities and differences of existing methods and set the stage for further methodological developments. To facilitate evaluations, we generate a set of benchmark graph datasets specifically for GNN explainability. We summarize current datasets and metrics for evaluating GNN explainability. Altogether, this work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. Along with the fast advance of RL in various domains. including robotics and game-playing, transfer learning arises as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques would be approachable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.
The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.