亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Recent advancements in LLMs have shown their significant potential in tasks like text summarization and generation. Yet, they often encounter difficulty while solving complex physics problems that require arithmetic calculation and a good understanding of concepts. Moreover, many physics problems include images that contain important details required to understand the problem's context. We propose an LMM-based chatbot to answer multimodal physics MCQs. For domain adaptation, we utilize the MM-PhyQA dataset comprising Indian high school-level multimodal physics problems. To improve the LMM's performance, we experiment with two techniques, RLHF (Reinforcement Learning from Human Feedback) and Image Captioning. In image captioning, we add a detailed explanation of the diagram in each image, minimizing hallucinations and image processing errors. We further explore the integration of Reinforcement Learning from Human Feedback (RLHF) methodology inspired by the ranking approach in RLHF to enhance the human-like problem-solving abilities of the models. The RLHF approach incorporates human feedback into the learning process of LLMs, improving the model's problem-solving skills, truthfulness, and reasoning capabilities, minimizing the hallucinations in the answers, and improving the quality instead of using vanilla-supervised fine-tuned models. We employ the LLaVA open-source model to answer multimodal physics MCQs and compare the performance with and without using RLHF.

相關內容

The `Jacobi prior' is an alternative Bayesian method for predictive models. It performs better than well-known methods such as Lasso, Ridge, Elastic Net, and MCMC-based Horse-Shoe Prior, particularly in terms of prediction accuracy and run-time. This method is implemented for Gaussian process classification, adeptly handling a nonlinear decision boundary. The Jacobi prior demonstrates its capability to manage partitioned data across global servers, making it highly useful in distributed computing environments. Additionally, we show that the Jacobi prior is more than a hundred times faster than these methods while maintaining similar predictive accuracy. As the method is both fast and accurate, it is advantageous for organisations looking to reduce their environmental impact and meet ESG standards. To demonstrate the effectiveness of the Jacobi prior, we conducted a detailed simulation study with four experiments focusing on statistical consistency, accuracy, and speed. We also present two empirical studies: the first evaluates credit risk by analysing default probability using data from the U.S. Small Business Administration (SBA), and the second uses the Jacobi prior for classifying stars, quasars, and galaxies in a three-class problem using multinomial logit regression on data from the Sloan Digital Sky Survey. Different filters were used as features in this study. All codes and datasets for this paper are available in the following GitHub repository : //github.com/sourish-cmi/Jacobi-Prior/

AI-generated faces have enriched human life, such as entertainment, education, and art. However, they also pose misuse risks. Therefore, detecting AI-generated faces becomes crucial, yet current detectors show biased performance across different demographic groups. Mitigating biases can be done by designing algorithmic fairness methods, which usually require demographically annotated face datasets for model training. However, no existing dataset comprehensively encompasses both demographic attributes and diverse generative methods, which hinders the development of fair detectors for AI-generated faces. In this work, we introduce the AI-Face dataset, the first million-scale demographically annotated AI-generated face image dataset, including real faces, faces from deepfake videos, and faces generated by Generative Adversarial Networks and Diffusion Models. Based on this dataset, we conduct the first comprehensive fairness benchmark to assess various AI face detectors and provide valuable insights and findings to promote the future fair design of AI face detectors. Our AI-Face dataset and benchmark code are publicly available at //github.com/Purdue-M2/AI-Face-FairnessBench.

Large language models (LLMs) have shown amazing capabilities in knowledge memorization and present. However, when it comes to domain-specific knowledge and downstream tasks like medical, general LLMs are often unable to give precise answers. In addition, when people want LLMs to answer classification questions, they usually go through instruction tuning first, however, LLMs do not always give a direct index of the categorization after instruction tuning. In this paper, we proposed LlamaCare, a fine-tuned medical language model, and Extended Classification Integration(ECI), a module to handle classification problems of LLMs. Our contributions are : (i) We fine-tuned a large language model of medical knowledge with very low carbon emissions and achieved similar performance with ChatGPT by a 24G GPU. (ii) We solved the problem of redundant categorical answers and improved the performance of LLMs by proposing a new module called Extended Classification Integration. (iii) We released our processed data for one-shot and few-shot training for some benchmarks such as PubMedQA and USMLE 1-3 step. Our method achieves a close effect with the state-of-the-art model in benchmarks while costing lower GPU resources compared to LLMs with the same quantity of parameters. Our models, codes, and datasets can be found in //github.com/Stephen-SMJ/LLamaCare

Neural Machine Translation (NMT) for low-resource languages is still a challenging task in front of NLP researchers. In this work, we deploy a standard data augmentation methodology by back-translation to a new language translation direction Cantonese-to-English. We present the models we fine-tuned using the limited amount of real data and the synthetic data we generated using back-translation including OpusMT, NLLB, and mBART. We carried out automatic evaluation using a range of different metrics including lexical-based and embedding-based. Furthermore. we create a user-friendly interface for the models we included in this\textsc{ CantonMT} research project and make it available to facilitate Cantonese-to-English MT research. Researchers can add more models into this platform via our open-source\textsc{ CantonMT} toolkit \url{//github.com/kenrickkung/CantoneseTranslation}.

In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. In combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state of the art methods on a set of established baselines as well as new ones introduced in this work and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. The work is made open source and available for use upon acceptance.

There remain two critical challenges that hinder the development of ERC. Firstly, there is a lack of exploration into mining deeper insights from the data itself for conversational emotion tasks. Secondly, the systems exhibit vulnerability to random modality feature missing, which is a common occurrence in realistic settings. Focusing on these two key challenges, we propose a novel framework for incomplete multimodal learning in ERC, called "Inverted Teacher-studEnt seArCH Network (ITEACH-Net)." ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Specifically, leveraging the tendency for emotional states to exhibit local stability within conversational contexts, ECCE captures these patterns and further perceives their evolution over time. Recognizing the varying challenges of handling incomplete versus complete data, ITS employs a teacher-student framework to decouple the respective computations. Subsequently, through Neural Architecture Search, the student model develops enhanced computational capabilities for handling incomplete data compared to the teacher model. During testing, we design a novel evaluation method, testing the model's performance under different missing rate conditions without altering the model weights. We conduct experiments on three benchmark ERC datasets, and the results demonstrate that our ITEACH-Net outperforms existing methods in incomplete multimodal ERC. We believe ITEACH-Net can inspire relevant research on the intrinsic nature of emotions within conversation scenarios and pave a more robust route for incomplete learning techniques. Codes will be made available.

Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods of incrementing sparsity to a desired target, which costs significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that we were able to obtain better losses in the first few training epochs with SNIPER, and that the final SNIPER-trained models outperformed constant-sparsity models and edged out dense models, with negligible difference in training time.

Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also intrigues great interests in the time series community. Among multiple advantages of transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review transformer schemes for time series modeling by highlighting their strengths as well as limitations through a new taxonomy to summarize existing time series transformers in two perspectives. From the perspective of network modifications, we summarize the adaptations of module level and architecture level of the time series transformers. From the perspective of applications, we categorize time series transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance. To the best of our knowledge, this paper is the first work to comprehensively and systematically summarize the recent advances of Transformers for modeling time series data. We hope this survey will ignite further research interests in time series Transformers.

Knowledge graphs are important resources for many artificial intelligence tasks but often suffer from incompleteness. In this work, we propose to use pre-trained language models for knowledge graph completion. We treat triples in knowledge graphs as textual sequences and propose a novel framework named Knowledge Graph Bidirectional Encoder Representations from Transformer (KG-BERT) to model these triples. Our method takes entity and relation descriptions of a triple as input and computes scoring function of the triple with the KG-BERT language model. Experimental results on multiple benchmark knowledge graphs show that our method can achieve state-of-the-art performance in triple classification, link prediction and relation prediction tasks.

ASR (automatic speech recognition) systems like Siri, Alexa, Google Voice or Cortana has become quite popular recently. One of the key techniques enabling the practical use of such systems in people's daily life is deep learning. Though deep learning in computer vision is known to be vulnerable to adversarial perturbations, little is known whether such perturbations are still valid on the practical speech recognition. In this paper, we not only demonstrate such attacks can happen in reality, but also show that the attacks can be systematically conducted. To minimize users' attention, we choose to embed the voice commands into a song, called CommandSong. In this way, the song carrying the command can spread through radio, TV or even any media player installed in the portable devices like smartphones, potentially impacting millions of users in long distance. In particular, we overcome two major challenges: minimizing the revision of a song in the process of embedding commands, and letting the CommandSong spread through the air without losing the voice "command". Our evaluation demonstrates that we can craft random songs to "carry" any commands and the modify is extremely difficult to be noticed. Specially, the physical attack that we play the CommandSongs over the air and record them can success with 94 percentage.

北京阿比特科技有限公司