亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems that integrate data from images, text, and other sources using modality fusion techniques. This introduces new challenges to the existing LLM-based recommendation paradigm which relies solely on text modality information. Moreover, although Multimodal Large Language Models (MLLMs) capable of processing multi-modal inputs have emerged, how to equip MLLMs with multi-modal recommendation capabilities remains largely unexplored. To this end, in this paper, we propose the Multimodal Large Language Model-enhanced Multimodaln Sequential Recommendation (MLLM-MSR) model. To capture the dynamic user preference, we design a two-stage user preference summarization method. Specifically, we first utilize an MLLM-based item-summarizer to extract image feature given an item and convert the image into text. Then, we employ a recurrent user preference summarization generation paradigm to capture the dynamic changes in user preferences based on an LLM-based user-summarizer. Finally, to enable the MLLM for multi-modal recommendation task, we propose to fine-tune a MLLM-based recommender using Supervised Fine-Tuning (SFT) techniques. Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences.

相關內容

Vision Language Models (VLMs) have demonstrated strong capabilities across various visual understanding and reasoning tasks. However, their real-world deployment is often constrained by high latency during inference due to substantial compute required to process the large number of input tokens (predominantly from the image) by the LLM. To reduce inference costs, one can either downsize the LLM or reduce the number of input image-tokens, the latter of which has been the focus of many recent works around token compression. However, it is unclear what the optimal trade-off is, as both the factors directly affect the VLM performance. We first characterize this optimal trade-off between the number of visual tokens and LLM parameters by establishing scaling laws that capture variations in performance with these two factors. Our results reveal a surprising trend: for visual reasoning tasks, the inference-optimal behavior in VLMs, i.e., minimum downstream error at any given fixed inference compute, is achieved when using the largest LLM that fits within the inference budget while minimizing visual token count - often to a single token. While the token reduction literature has mainly focused on maintaining base model performance by modestly reducing the token count (e.g., $5-10\times$), our results indicate that the compute-optimal inference regime requires operating under even higher token compression ratios. Based on these insights, we take some initial steps towards building approaches tailored for high token compression settings. Code is available at //github.com/locuslab/llava-token-compression.

Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performances in the general Visual Question Answering (VQA). However, these models struggle with VQA questions that require external commonsense knowledge due to the challenges in generating high-quality prompts and the high computational costs of fine-tuning. In this work, we propose a novel graph-based multimodal commonsense knowledge distillation framework that constructs a unified relational graph over commonsense knowledge, visual objects and questions through a Graph Convolutional Network (GCN) following a teacher-student environment. This proposed framework is flexible with any type of teacher and student models without further fine-tuning, and has achieved competitive performances on the ScienceQA dataset.

Large language models (LLMs) have garnered significant attention in recent years due to their impressive performance. While considerable research has evaluated these models from various perspectives, the extent to which LLMs can perform implicit and explicit emotion retrieval remains largely unexplored. To address this gap, this study investigates LLMs' emotion retrieval capabilities in commonsense. Through extensive experiments involving multiple models, we systematically evaluate the ability of LLMs on emotion retrieval. Specifically, we propose a supervised contrastive probing method to verify LLMs' performance for implicit and explicit emotion retrieval, as well as the diversity of the emotional events they retrieve. The results offer valuable insights into the strengths and limitations of LLMs in handling emotion retrieval.

Traumatic Brain Injuries (TBIs) resulting from Road Traffic Crashes (RTCs) can have fatal and disabling effects on patients. In this study, we evaluated the TBIs outcomes of patients involved in RTCs and identify key contributing factors affecting these outcomes. Data on 207 patients recorded by physicians at a tertiary hospital in Bangladesh was collected. A random parameters multinomial logit model with heterogeneity in the means was utilized to assess patients outcomes in three categories: Non-surgical, Surgical, and Fatal. From the random parameters, the study found that male patients (55.48%) are more likely to experience surgical and fatal outcomes. Male motorcycle users have a higher probability of experiencing fatal consequences. Additionally, 60.94% of incidents on rural roads result in surgeries and fatalities, with nighttime crashes on these roads significantly increasing the likelihood of fatal outcomes. Key factors impacting the likelihood of TBIs outcomes include older age, pedestrian involvement, bus and truck crashes, speeding, wet pavements, overtaking, low visibility, and weekday crashes. The study identified two significant interaction variables that increase the probability of fatal outcomes from TBIs: the interactions between low visibility and bus involvement, and between overtaking and wet pavements. While these factors individually had a higher probability of leading to both surgical and fatal outcomes, together these factors increase the risks to fatalities. Overall, our findings provide more detailed insights about the impact of TBIs outcomes resulting from RTCs and emphasize the need to develop more effective measures to improve road safety and patient outcomes.

Multivariate Item Response Theory (MIRT) is sought-after widely by applied researchers looking for interpretable (sparse) explanations underlying response patterns in questionnaire data. There is, however, an unmet demand for such sparsity discovery tools in practice. Our paper develops a Bayesian platform for binary and ordinal item MIRT which requires minimal tuning and scales well on large datasets due to its parallelizable features. Bayesian methodology for MIRT models has traditionally relied on MCMC simulation, which cannot only be slow in practice, but also often renders exact sparsity recovery impossible without additional thresholding. In this work, we develop a scalable Bayesian EM algorithm to estimate sparse factor loadings from mixed continuous, binary, and ordinal item responses. We address the seemingly insurmountable problem of unknown latent factor dimensionality with tools from Bayesian nonparametrics which enable estimating the number of factors. Rotations to sparsity through parameter expansion further enhance convergence and interpretability without identifiability constraints. In our simulation study, we show that our method reliably recovers both the factor dimensionality as well as the latent structure on high-dimensional synthetic data even for small samples. We demonstrate the practical usefulness of our approach on three datasets: an educational assessment dataset, a quality-of-life measurement dataset, and a bio-behavioral dataset. All demonstrations show that our tool yields interpretable estimates, facilitating interesting discoveries that might otherwise go unnoticed under a pure confirmatory factor analysis setting.

Backdoor attacks present a significant threat to the robustness of Federated Learning (FL) due to their stealth and effectiveness. They maintain both the main task of the FL system and the backdoor task simultaneously, causing malicious models to appear statistically similar to benign ones, which enables them to evade detection by existing defense methods. We find that malicious parameters in backdoored models are inactive on the main task, resulting in a significantly large empirical loss during the machine unlearning process on clean inputs. Inspired by this, we propose MASA, a method that utilizes individual unlearning on local models to identify malicious models in FL. To improve the performance of MASA in challenging non-independent and identically distributed (non-IID) settings, we design pre-unlearning model fusion that integrates local models with knowledge learned from other datasets to mitigate the divergence in their unlearning behaviors caused by the non-IID data distributions of clients. Additionally, we propose a new anomaly detection metric with minimal hyperparameters to filter out malicious models efficiently. Extensive experiments on IID and non-IID datasets across six different attacks validate the effectiveness of MASA. To the best of our knowledge, this is the first work to leverage machine unlearning to identify malicious models in FL. Code is available at \url{//github.com/JiiahaoXU/MASA}.

Reverse Unrestricted MIxed DAta Sampling (RU-MIDAS) regressions are used to model high-frequency responses by means of low-frequency variables. However, due to the periodic structure of RU-MIDAS regressions, the dimensionality grows quickly if the frequency mismatch between the high- and low-frequency variables is large. Additionally the number of high-frequency observations available for estimation decreases. We propose to counteract this reduction in sample size by pooling the high-frequency coefficients and further reduce the dimensionality through a sparsity-inducing convex regularizer that accounts for the temporal ordering among the different lags. To this end, the regularizer prioritizes the inclusion of lagged coefficients according to the recency of the information they contain. We demonstrate the proposed method on two empirical applications, one on realized volatility forecasting with macroeconomic data and another on demand forecasting for a bicycle-sharing system with ridership data on other transportation types.

Multimodal Large Language Model (MLLM) recently has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLM. First of all, we present the formulation of MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at //github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.

This work aims to provide an engagement decision support tool for Beyond Visual Range (BVR) air combat in the context of Defensive Counter Air (DCA) missions. In BVR air combat, engagement decision refers to the choice of the moment the pilot engages a target by assuming an offensive stance and executing corresponding maneuvers. To model this decision, we use the Brazilian Air Force's Aerospace Simulation Environment (\textit{Ambiente de Simula\c{c}\~ao Aeroespacial - ASA} in Portuguese), which generated 3,729 constructive simulations lasting 12 minutes each and a total of 10,316 engagements. We analyzed all samples by an operational metric called the DCA index, which represents, based on the experience of subject matter experts, the degree of success in this type of mission. This metric considers the distances of the aircraft of the same team and the opposite team, the point of Combat Air Patrol, and the number of missiles used. By defining the engagement status right before it starts and the average of the DCA index throughout the engagement, we create a supervised learning model to determine the quality of a new engagement. An algorithm based on decision trees, working with the XGBoost library, provides a regression model to predict the DCA index with a coefficient of determination close to 0.8 and a Root Mean Square Error of 0.05 that can furnish parameters to the BVR pilot to decide whether or not to engage. Thus, using data obtained through simulations, this work contributes by building a decision support system based on machine learning for BVR air combat.

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.

北京阿比特科技有限公司