This paper investigates truck-involved crashes to determine the statistically significant factors that contribute to injury severity under different weather conditions. The analysis uses crash data for the state of Ohio between 2011 and 2015, available from the Highway Safety Information System. To determine whether weather conditions should be considered separately in truck safety analyses, parameter transferability tests are conducted; the results suggest, with a high level of statistical confidence, that weather conditions should be modeled separately. To this end, three separate mixed logit models are estimated for three weather conditions: normal, rainy, and snowy. The estimated models identify a variety of statistically significant factors influencing injury severity, and different weather conditions are found to have different contributing effects on injury severity in truck-involved crashes. The rural, rear-end, and sideswipe crash parameters are found to have significantly different levels of impact on injury severity. Based on the findings of this study, several countermeasures are suggested: 1) safety and enforcement programs should focus on female truck drivers, 2) variable speed limit signs should be used to lower truck speeds during rainy conditions, and 3) trucks should be restricted or prohibited on non-interstate roads during rainy and snowy conditions. These countermeasures could reduce the number and severity of truck-involved crashes under different weather conditions.
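As a brief illustration of the transferability test mentioned above, the sketch below computes a likelihood ratio statistic comparing a single pooled injury-severity model against separately estimated weather-specific models. The log-likelihood values and parameter count are hypothetical placeholders, not results from the paper.

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods at convergence (not values from the paper)
ll_pooled = -4120.5                                                   # single model on all crashes
ll_by_weather = {"normal": -2450.2, "rain": -980.7, "snow": -610.3}   # separate models
k = 12                                                                # parameters per model (assumed)

# Likelihood ratio statistic: -2 * [LL(pooled) - sum of LL(separate models)]
lr_stat = -2.0 * (ll_pooled - sum(ll_by_weather.values()))
df = k * (len(ll_by_weather) - 1)      # extra parameters gained by estimating separate models
p_value = chi2.sf(lr_stat, df)
print(f"LR = {lr_stat:.1f}, df = {df}, p = {p_value:.4f}")   # a small p-value favors separate models
```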
The current evaluation of Large Language Models (LLMs) predominantly relies on benchmarks that test their embedded knowledge through multiple-choice questions (MCQs), a format inherently suited to automated evaluation. Our study extends this evaluation to LLMs' pragmatic competence, a facet that remained underexamined before the advent of sophisticated LLMs, specifically in the context of Korean. We employ two distinct evaluation setups: the conventional MCQ format, adapted for automatic evaluation, and Open-Ended Questions (OEQs), assessed by human experts, to examine LLMs' narrative response capabilities without predefined options. Our findings reveal that GPT-4 excels, scoring 81.11 and 85.69 in the MCQ and OEQ setups, respectively, with HyperCLOVA X, an LLM optimized for Korean, following closely, especially in the OEQ setup, where it scores 81.56, a marginal difference of 4.13 points from GPT-4. Furthermore, while few-shot learning strategies generally enhance LLM performance, Chain-of-Thought (CoT) prompting introduces a bias toward literal interpretations, hindering accurate pragmatic inference. Considering the growing expectation that LLMs understand and produce language aligned with human communicative norms, our findings emphasize the importance of advancing LLMs' abilities to grasp and convey sophisticated meanings beyond mere literal interpretations.
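For the MCQ setup, scoring reduces to exact matching of the model's chosen option against the gold answer. The minimal sketch below illustrates this, with hypothetical model choices and gold labels rather than data from the benchmark.

```python
# Minimal MCQ scoring sketch; the option labels below are hypothetical placeholders.
def mcq_accuracy(predictions, gold):
    """predictions, gold: lists of option labels such as 'A'-'D'."""
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

model_choices = ["A", "C", "B", "D", "B"]   # parsed from model responses (hypothetical)
gold_answers  = ["A", "C", "D", "D", "B"]
print(f"MCQ score: {mcq_accuracy(model_choices, gold_answers):.2f}")
```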
Rental assistance programs provide individuals with financial assistance to prevent housing instability caused by evictions and avert homelessness. Since these programs operate under resource constraints, they must decide whom to prioritize. Typically, funding is distributed through a reactive or first-come, first-served allocation process that does not systematically consider the risk of future homelessness. We partnered with Allegheny County, PA to explore a proactive allocation approach that prioritizes individuals facing eviction based on their risk of future homelessness. Our machine learning system, which uses state and county administrative data to accurately identify individuals in need of support, outperforms simpler prioritization approaches by at least 20% while being fair and equitable across race and gender. Furthermore, our approach would identify 28% of the individuals who are overlooked by the current process and end up homeless. Beyond improvements to the rental assistance program in Allegheny County, this study can inform the development of evidence-based decision support tools in similar contexts, including lessons about data needs, model design, evaluation, and field validation.
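The sketch below illustrates the general idea of risk-based prioritization under a capacity constraint: a classifier trained on administrative features ranks applicants by predicted risk of future homelessness, and the highest-risk cases are selected. It uses synthetic data and a generic model, and is not the system deployed in Allegheny County.

```python
# Generic, hypothetical sketch of risk-based prioritization under a capacity constraint.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                                            # synthetic administrative features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 1.5).astype(int)    # synthetic homelessness label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

capacity = 50                                     # number of households the program can fund (assumed)
risk = model.predict_proba(X_te)[:, 1]
prioritized = np.argsort(-risk)[:capacity]        # highest predicted risk first
print(f"Share of prioritized cases that would end up homeless: {y_te[prioritized].mean():.2f}")
```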
The paper underscores the significance of Large Language Models (LLMs) in reshaping recommender systems, attributing their value to unique reasoning abilities absent in traditional recommenders. Unlike conventional systems, LLMs exhibit exceptional proficiency in recommending items even without direct user interaction data, showcasing their adeptness in comprehending the intricacies of language. This marks a fundamental paradigm shift in the realm of recommendations. Amidst a dynamic research landscape, researchers are actively harnessing the language comprehension and generation capabilities of LLMs to redefine the foundations of recommendation tasks. The investigation thoroughly explores the inherent strengths of LLMs within recommendation frameworks, encompassing nuanced contextual comprehension, seamless transitions across diverse domains, the adoption of unified approaches, holistic learning strategies that leverage shared data reservoirs, transparent decision-making, and iterative improvement. Despite their transformative potential, challenges persist, including sensitivity to input prompts, occasional misinterpretations, and unforeseen recommendations, necessitating continuous refinement and evolution of LLM-driven recommender systems.
Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation in ultrasound breast images, and such methods usually cannot be adapted to obtain desirable results on ultrasound videos. The main challenge in ultrasound video-based breast lesion segmentation is how to exploit intra-frame and inter-frame lesion cues simultaneously. To address this problem, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for video-based breast lesion segmentation. The main aspects of the proposed STPFNet are threefold. First, we adopt a unified network architecture to capture both the spatial dependencies within each ultrasound frame and the temporal correlations between different frames for ultrasound data representation. Second, we propose a new fusion module, termed Multi-Scale Feature Fusion (MSFF), to fuse spatial and temporal cues for lesion detection. MSFF helps determine the boundary contour of the lesion region, overcoming the issue of lesion boundary blurring. Third, we exploit the segmentation result of the previous frame as prior knowledge to suppress the noisy background and learn a more robust representation. In addition, we introduce a new publicly available ultrasound video breast lesion segmentation dataset, termed UVBLS200, which is specifically dedicated to breast lesion segmentation. It contains 200 videos, including 80 videos of benign lesions and 120 videos of malignant lesions. Experiments on the proposed dataset demonstrate that STPFNet achieves better breast lesion detection performance than state-of-the-art methods.
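To make the fusion idea concrete, the PyTorch sketch below shows a minimal multi-scale block that mixes spatial features of the current frame with temporal features propagated from the previous frame using dilated convolutions at several scales. The layer sizes and structure are illustrative assumptions, not the exact MSFF module of STPFNet.

```python
# Illustrative multi-scale spatial-temporal fusion block; sizes and structure are assumptions.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    def __init__(self, channels=64, scales=(1, 2, 4)):
        super().__init__()
        # one dilated-conv branch per scale, applied to concatenated spatial + temporal features
        self.branches = nn.ModuleList([
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in scales
        ])
        self.project = nn.Conv2d(len(scales) * channels, channels, kernel_size=1)

    def forward(self, spatial_feat, temporal_feat):
        x = torch.cat([spatial_feat, temporal_feat], dim=1)             # fuse intra- and inter-frame cues
        multi_scale = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.project(multi_scale)

# toy usage with random feature maps
cur = torch.randn(1, 64, 32, 32)     # features of the current frame
prev = torch.randn(1, 64, 32, 32)    # features propagated from the previous frame
fused = MultiScaleFusion()(cur, prev)
print(fused.shape)                   # torch.Size([1, 64, 32, 32])
```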
Our paper addresses the challenge of inferring causal effects in social network data, which is characterized by complex interdependencies among individuals that give rise to non-independence of units, interference (where a unit's outcome is affected by its neighbors' treatments), and additional confounding factors introduced by neighboring units. We propose a novel methodology combining graph neural networks and double machine learning, enabling accurate and efficient estimation of direct and peer effects from a single observational social network. Our approach uses graph isomorphism networks in conjunction with double machine learning to effectively adjust for network confounders and consistently estimate the desired causal effects. We demonstrate that our estimator is both asymptotically normal and semiparametrically efficient. A comprehensive evaluation against four state-of-the-art baseline methods on three semi-synthetic social network datasets shows that our method matches or surpasses them in precise causal effect estimation. Further, we illustrate the practical application of our method through a case study investigating the impact of Self-Help Group participation on financial risk tolerance. The results indicate a significant positive direct effect, underscoring the potential of our approach in social network analysis. Additionally, we explore the effect of network sparsity on estimation performance.
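The sketch below isolates the double machine learning step on synthetic data: nuisance models predict the outcome and the treatment from confounders with cross-fitting, and the treatment effect is recovered from the residuals. In the paper the network confounders are summarized by graph isomorphism networks; here a generic random forest stands in for those representations, so this is an illustration of the DML step only.

```python
# Double machine learning (partialling-out) sketch on synthetic data; the random forest
# is a stand-in for the GNN-based confounder representations used in the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 5))                  # (network) confounders
T = 0.8 * X[:, 0] + rng.normal(size=n)       # treatment influenced by confounders
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)   # outcome with true direct effect 2.0

# cross-fitted nuisance predictions E[Y|X] and E[T|X]
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, T, cv=5)

# regress the outcome residual on the treatment residual
y_res, t_res = Y - y_hat, T - t_hat
theta = (t_res @ y_res) / (t_res @ t_res)
print(f"Estimated direct effect: {theta:.3f}")   # approximately 2.0
```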
This paper proposes novel inferential procedures for discovering network Granger causality in high-dimensional vector autoregressive models. In particular, our main contribution is two multiple testing procedures designed to control the false discovery rate (FDR). The first procedure is based on the limiting normal distribution of the $t$-statistics constructed from the debiased lasso estimator. The second procedure is its bootstrap version. We also provide a robustification of the first procedure against arbitrary cross-sectional dependence using asymptotic e-variables. Their theoretical properties, including FDR control and power guarantees, are investigated. The finite-sample evidence suggests that both procedures successfully control the FDR while maintaining high power. Finally, the proposed methods are applied to discovering network Granger causality in a large set of macroeconomic variables and in regional house prices in the UK.
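As a concrete illustration of the first procedure's ingredients, the sketch below converts hypothetical debiased-lasso $t$-statistics into p-values via their limiting normal distribution and applies the Benjamini-Hochberg step-up rule. The paper's procedures may differ in their exact thresholding, so this is only a generic FDR-control example.

```python
# Generic FDR-control example on hypothetical debiased-lasso t-statistics.
import numpy as np
from scipy.stats import norm

t_stats = np.array([3.2, -0.4, 2.8, 1.1, -3.9, 0.2, 2.1])   # hypothetical t-statistics
p_vals = 2 * norm.sf(np.abs(t_stats))                        # two-sided p-values from the normal limit

def benjamini_hochberg(p, q=0.1):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.where(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

print(benjamini_hochberg(p_vals, q=0.1))   # True marks a discovered Granger-causal link
```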
Algorithmic verification of realistic systems against safety and other temporal requirements has suffered from the poor scalability of the employed formal approaches. To design systems with rigorous guarantees, many approaches still rely on exact models of the underlying systems. Since this assumption can rarely be met in practice, models have to be inferred from measurement data or bypassed completely. Whilst the former usually requires the model structure to be known a priori and immense amounts of data to be available, the latter gives rise to a plethora of restrictive mathematical assumptions about the unknown dynamics. In pursuit of developing scalable formal verification algorithms without shifting the problem to unrealistic assumptions, we employ the concept of barrier certificates, which can guarantee the safety of a system, and learn the certificate directly from a compact set of system trajectories. We use conditional mean embeddings to embed data from the system into a reproducing kernel Hilbert space (RKHS) and construct an RKHS ambiguity set that can be inflated to robustify the result with respect to a set of plausible transition kernels. We show how to solve the resulting program efficiently using sum-of-squares optimization and a Gaussian process envelope. Our approach lifts the need for restrictive assumptions on the system dynamics and uncertainty, and suggests an improvement in the sample complexity of verifying the safety of a system on a tested case study compared to a state-of-the-art approach.
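To illustrate one ingredient, the sketch below estimates the conditional mean embedding of an unknown transition kernel from sampled transitions and uses it to evaluate the expected next-step value of a candidate barrier certificate. The dynamics, kernel bandwidth, and barrier candidate are hypothetical, and the RKHS ambiguity set and sum-of-squares step described above are not included.

```python
# Conditional mean embedding sketch: estimate E[B(x') | x] from sampled transitions.
# Dynamics, kernel bandwidth, and the barrier candidate B are hypothetical assumptions.
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def barrier(x):
    return (x ** 2).sum(-1)                            # candidate barrier B(x) = ||x||^2

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))                  # sampled states
X_next = 0.9 * X + 0.05 * rng.normal(size=X.shape)     # observed next states (dynamics unknown to us)

lam = 1e-3
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), barrier(X_next))   # CME regression weights

x_query = np.array([[0.4, -0.2]])
expected_B_next = gaussian_kernel(X, x_query)[:, 0] @ alpha
print(f"B(x) = {barrier(x_query)[0]:.3f},  estimated E[B(x')|x] = {expected_B_next:.3f}")
```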
The problem of distributed optimization requires a group of networked agents to compute a parameter that minimizes the average of their local cost functions. While a variety of distributed optimization algorithms can solve this problem, they are typically vulnerable to "Byzantine" agents that do not follow the algorithm. Recent attempts to address this issue focus on single-dimensional functions or assume certain statistical properties of the functions at the agents. In this paper, we provide two resilient, scalable, distributed optimization algorithms for multi-dimensional functions. Our schemes involve two filters, (1) a distance-based filter and (2) a min-max filter, each of which removes neighborhood states that are extreme (defined precisely in our algorithms) at each iteration. We show that these algorithms can mitigate the impact of up to $F$ (unknown) Byzantine agents in the neighborhood of each regular agent. In particular, we show that if the network topology satisfies certain conditions, all of the regular agents' states are guaranteed to converge to a bounded region that contains the minimizer of the average of the regular agents' functions.
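A simplified sketch of the two filters is given below: the distance-based filter discards the $F$ neighbor states farthest from the agent's own state, and the min-max filter discards the $F$ largest and $F$ smallest values in each coordinate. The paper's full algorithms combine such filtering with consensus and gradient steps; this only illustrates the removal of extreme neighborhood states.

```python
# Simplified sketch of the two filtering steps; the full algorithms are more involved.
import numpy as np

def distance_filter(own_state, neighbor_states, F):
    """Keep all but the F neighbor states farthest (in Euclidean norm) from own_state."""
    dists = np.linalg.norm(neighbor_states - own_state, axis=1)
    keep = np.argsort(dists)[: max(len(neighbor_states) - F, 0)]
    return neighbor_states[keep]

def minmax_filter(values, F):
    """For a 1-D array holding one coordinate, drop the F largest and F smallest values."""
    order = np.argsort(values)
    return values[order[F: len(values) - F]] if len(values) > 2 * F else values[:0]

own = np.array([0.0, 0.0])
neighbors = np.array([[0.1, -0.2], [0.3, 0.1], [5.0, 7.0], [-0.2, 0.2], [0.0, 0.4]])  # one outlier
F = 1
print(distance_filter(own, neighbors, F))    # the Byzantine-looking state [5., 7.] is removed
print(minmax_filter(neighbors[:, 0], F))     # extreme first-coordinate values removed
```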
The commercialization of large language models (LLMs) has led to the common practice of high-level, API-only access to proprietary models. In this work, we show that, even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings center on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We show that this gives rise to a model image or model signature, which unlocks several capabilities at an affordable cost: efficiently discovering the LLM's hidden size, obtaining full-vocabulary outputs, detecting and disambiguating different model updates, identifying the source LLM given a single full LLM output, and even estimating the output-layer parameters. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) that allows for greater transparency and accountability.
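The rank argument behind the model image can be illustrated on a toy model: stacking full-vocabulary outputs over many prompts yields a matrix whose numerical rank matches the hidden size. The sketch below uses synthetic logits with vocabulary 1,000 and hidden size 16; recovering comparable outputs from a real API requires the additional steps described in the paper.

```python
# Toy illustration of the softmax-bottleneck rank argument on a synthetic model.
import numpy as np

rng = np.random.default_rng(3)
vocab, hidden, n_queries = 1000, 16, 64

W = rng.normal(size=(vocab, hidden))          # output (unembedding) matrix
H = rng.normal(size=(n_queries, hidden))      # final hidden states for n_queries prompts
logits = H @ W.T                              # each row lives in a hidden-dimensional subspace

singular_values = np.linalg.svd(logits, compute_uv=False)
numerical_rank = int((singular_values > 1e-8 * singular_values[0]).sum())
print(f"Estimated hidden size: {numerical_rank}")   # prints 16
```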
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators that greatly reduces this error. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and proposed estimators in terms of their power to estimate the causal quantities. The comparison is conducted across a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit varying levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are correctly accounted for.
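The abstract does not spell out the proposed estimators, so the sketch below is only a generic illustration of why confounding matters in this setting: on synthetic lending data, a naive difference in means overstates the effect of a credit decision on repayment, while an inverse-propensity-weighted estimate that adjusts for the confounder approximately recovers the true effect.

```python
# Generic confounding illustration on synthetic lending data; this is NOT the paper's estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 20000
credit_score = rng.normal(size=n)                                # confounder
p_treat = 1 / (1 + np.exp(-2 * credit_score))                    # better scores get limit increases
T = rng.binomial(1, p_treat)                                     # credit decision
repayment = 1.0 * T + 3.0 * credit_score + rng.normal(size=n)    # true treatment effect = 1.0

naive = repayment[T == 1].mean() - repayment[T == 0].mean()

e_hat = LogisticRegression().fit(credit_score[:, None], T).predict_proba(credit_score[:, None])[:, 1]
w1, w0 = T / e_hat, (1 - T) / (1 - e_hat)                        # stabilized (Hajek) IPW weights
ipw = (w1 @ repayment) / w1.sum() - (w0 @ repayment) / w0.sum()

print(f"naive estimate: {naive:.2f}  (biased upward by confounding)")
print(f"IPW estimate:   {ipw:.2f}  (close to the true effect of 1.0)")
```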