
To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first efficiently drafts several future tokens and then verifies them in parallel. Unlike autoregressive decoding, Speculative Decoding facilitates the simultaneous decoding of multiple tokens per step, thereby accelerating inference. This paper presents a comprehensive overview and analysis of this promising decoding paradigm. We begin by providing a formal definition and formulation of Speculative Decoding. Then, we organize in-depth discussions on its key facets, including current leading techniques, the challenges faced, and potential future directions in this field. We aim for this work to serve as a catalyst for further research on Speculative Decoding, ultimately contributing to more efficient LLM inference.
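As a concrete illustration of the draft-then-verify loop described above, the sketch below implements the standard speculative sampling acceptance rule with toy stand-in models; the vocabulary size, the draft length gamma, and both "models" are placeholder assumptions, and a real system would run the target model's verification as a single parallel forward pass rather than a Python loop.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16  # toy vocabulary size (illustrative assumption)

def draft_model(prefix):
    """Toy stand-in for a small, fast drafter: returns a next-token distribution."""
    logits = rng.normal(size=VOCAB) + 2.0 * (np.arange(VOCAB) == (len(prefix) % VOCAB))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def target_model(prefix):
    """Toy stand-in for the large target LLM (verification is parallel in practice)."""
    logits = rng.normal(size=VOCAB) + 2.0 * (np.arange(VOCAB) == (len(prefix) % VOCAB))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def speculative_step(prefix, gamma=4):
    """One draft-then-verify decoding step using the speculative sampling rule."""
    # 1) Draft gamma future tokens autoregressively with the cheap model.
    drafted, q_dists, ctx = [], [], list(prefix)
    for _ in range(gamma):
        q = draft_model(ctx)
        t = int(rng.choice(VOCAB, p=q))
        drafted.append(t); q_dists.append(q); ctx.append(t)
    # 2) Verify the drafted tokens against the target model.
    accepted = []
    for i, t in enumerate(drafted):
        p = target_model(prefix + accepted)
        q = q_dists[i]
        if rng.random() < min(1.0, p[t] / q[t]):   # accept with probability min(1, p/q)
            accepted.append(t)
        else:                                      # reject: resample from the residual
            residual = np.maximum(p - q, 0.0)
            residual = residual / residual.sum() if residual.sum() > 0 else p
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            return prefix + accepted
    # 3) All drafts accepted: sample one bonus token from the target model.
    p = target_model(prefix + accepted)
    accepted.append(int(rng.choice(VOCAB, p=p)))
    return prefix + accepted

print(speculative_step([1, 2, 3]))   # multiple tokens can be emitted per step
```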

Related Content

Despite rapid development and widespread application, Large Vision-Language Models (LVLMs) face a serious challenge: they are prone to generating hallucinations. An over-reliance on linguistic priors has been identified as a key factor leading to these hallucinations. In this paper, we propose to alleviate this problem by introducing a novel image-biased decoding (IBD) technique. Our method derives the next-token probability distribution by contrasting predictions from a conventional LVLM with those of an image-biased LVLM, thereby amplifying correct information highly correlated with image content while mitigating hallucinatory errors caused by excessive dependence on text. We further conduct a comprehensive statistical analysis to validate the reliability of our method, and design an adaptive adjustment strategy to achieve robust and flexible handling under varying conditions. Experimental results across multiple evaluation metrics verify that our method, despite requiring no additional training data and only a minimal increase in model parameters, significantly reduces hallucinations in LVLMs and enhances the truthfulness of the generated responses.
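The contrast between the two prediction streams can be illustrated with a generic contrastive-decoding combination of logits; the weighting scheme and the parameter alpha below are assumptions for illustration, not the paper's exact IBD formulation or its adaptive adjustment strategy.

```python
import numpy as np

def contrastive_next_token_dist(logits_image_biased, logits_conventional, alpha=1.0):
    """Illustrative contrastive combination of two next-token logit vectors.

    Amplifies what the image-biased model adds on top of the conventional model;
    the actual IBD weighting and its adaptive adjustment may differ.
    """
    combined = (1.0 + alpha) * logits_image_biased - alpha * logits_conventional
    combined = combined - combined.max()          # numerical stability
    probs = np.exp(combined)
    return probs / probs.sum()

# toy example with a 5-token vocabulary
p = contrastive_next_token_dist(np.array([2.0, 0.5, 0.1, -1.0, 0.0]),
                                np.array([1.0, 0.4, 0.1, -1.0, 0.0]))
print(p.round(3))
```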

Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequential modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: \url{//gitlab.com/nct_tso_public/pitfalls_bn}
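A minimal sketch of the kind of BN-free, end-to-end CNN-LSTM the abstract alludes to is shown below; the backbone, the GroupNorm choice, and all sizes are illustrative assumptions rather than the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn

class BNFreeCNNLSTM(nn.Module):
    """Minimal sketch of a BN-free CNN-LSTM for frame-wise video prediction.

    GroupNorm replaces BatchNorm so that no statistics are shared across the
    frames and clips in a batch; sizes here are placeholders.
    """
    def __init__(self, num_classes=7, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.GroupNorm(8, 32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.GroupNorm(8, 64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):              # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.backbone(video.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)          # causal recurrence suits online tasks
        return self.head(out)              # per-frame logits: (B, T, num_classes)

logits = BNFreeCNNLSTM()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 8, 7])
```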

Human Activity Recognition (HAR) has been extensively studied, with recent emphasis on the implementation of advanced Machine Learning (ML) and Deep Learning (DL) algorithms for accurate classification. This study investigates the efficacy of two ML algorithms, eXtreme Gradient Boosting (XGBoost) and MiniRocket, in the realm of HAR using data collected from smartphone sensors. The experiments are conducted on a dataset obtained from the UCI repository, comprising accelerometer and gyroscope signals captured from 30 volunteers performing various activities while wearing a smartphone. The dataset undergoes preprocessing, including noise filtering and feature extraction, before being utilized for training and testing the classifiers. Monte Carlo cross-validation is employed to evaluate the models' robustness. The findings reveal that both XGBoost and MiniRocket attain accuracy, F1 score, and AUC values as high as 0.99 in activity classification. XGBoost exhibits a slightly superior performance compared to MiniRocket. Notably, both algorithms surpass the performance of other ML and DL algorithms reported in the literature for HAR tasks. Additionally, the study compares the computational efficiency of the two algorithms, revealing XGBoost's advantage in terms of training time. Furthermore, the performance of MiniRocket, which achieves accuracy and F1 values of 0.94, and an AUC value of 0.96 using raw data and utilizing only one channel from the sensors, highlights the potential of directly leveraging unprocessed signals. It also suggests potential advantages that could be gained by utilizing sensor fusion or channel fusion techniques. Overall, this research sheds light on the effectiveness and computational characteristics of XGBoost and MiniRocket in HAR tasks, providing insights for future studies in activity recognition using smartphone sensor data.
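The evaluation protocol described above (Monte Carlo cross-validation of an XGBoost classifier) can be sketched as follows; the synthetic stand-in data, hyperparameters, and number of splits are placeholders, not the study's actual configuration.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

# Stand-in data: the study uses the UCI smartphone HAR features, not a random matrix.
X = np.random.randn(500, 561)             # 561 engineered features, as in the UCI set
y = np.random.randint(0, 6, size=500)     # six activity classes

# Monte Carlo cross-validation: repeated random train/test splits.
splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=42)
accs, f1s = [], []
for train_idx, test_idx in splitter.split(X):
    clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred, average="macro"))

print(f"accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
print(f"macro F1: {np.mean(f1s):.3f} +/- {np.std(f1s):.3f}")
```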

Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP), where contrastive Self-Supervised Learning (SSL) is currently a mainstream approach. However, the reasons behind its remarkable effectiveness remain unclear. Specifically, in other research fields, contrastive SSL shares similarities in both theory and practical performance with non-contrastive SSL (e.g., alignment & uniformity, Barlow Twins, and VICReg). However, in SRL, contrastive SSL outperforms non-contrastive SSL significantly. Therefore, two questions arise: First, what commonalities enable various contrastive losses to achieve superior performance in SRL? Second, how can we make non-contrastive SSL, which is similar to contrastive SSL but ineffective in SRL, effective? To address these questions, we start from the perspective of gradients and discover that four effective contrastive losses can be integrated into a unified paradigm, which depends on three components: the Gradient Dissipation, the Weight, and the Ratio. Then, we conduct an in-depth analysis of the roles these components play in optimization and experimentally demonstrate their significance for model performance. Finally, by adjusting these components, we enable non-contrastive SSL to achieve outstanding performance in SRL.
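For reference, a minimal version of the prototypical contrastive objective used in SRL (SimCSE-style InfoNCE over in-batch negatives) is given below; it is a generic implementation for context, not the paper's unified gradient paradigm or its three components.

```python
import torch
import torch.nn.functional as F

def simcse_infonce(z1, z2, temperature=0.05):
    """InfoNCE over in-batch negatives, the prototypical contrastive loss in SRL.

    z1, z2: (B, d) embeddings of two views of the same sentences (e.g., obtained
    via dropout noise, SimCSE-style). Positives lie on the diagonal of the
    similarity matrix; all other in-batch pairs act as negatives.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature          # (B, B) scaled cosine similarities
    labels = torch.arange(z1.size(0))      # index of the positive for each row
    return F.cross_entropy(sim, labels)

loss = simcse_infonce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```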

This paper addresses the problem of how to select explanations for XAI (Explainable AI)-based Intelligent Decision Support Systems (IDSSs). IDSSs have shown promise in improving user decisions through XAI-generated explanations along with AI predictions. As the development of XAI has made various explanations available, we believe that IDSSs can be greatly improved if they can strategically select explanations that guide users to better decisions. This paper proposes X-Selector, a method for dynamically selecting explanations. X-Selector aims to guide users to better decisions by predicting the impact of different combinations of explanations on user decisions. We compared X-Selector's performance with two naive strategies (all possible explanations and explanations only for the most likely prediction) and two baselines (no explanation and no AI support). The results suggest X-Selector's potential to guide users to recommended decisions and improve performance when AI accuracy is high, while also revealing a remaining challenge when accuracy is low.
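At a high level, impact-aware explanation selection can be sketched as a search over explanation combinations scored by a predicted user decision; the user model and scoring function below are hypothetical placeholders, and X-Selector's actual prediction model and search strategy are not reproduced here.

```python
from itertools import combinations

def select_explanations(candidates, predict_user_decision, score_decision, max_k=2):
    """Generic sketch of impact-aware explanation selection.

    Enumerates combinations of candidate explanations, predicts the user's
    resulting decision with a (hypothetical) user model, and keeps the
    combination whose predicted decision scores best.
    """
    best_combo, best_score = (), float("-inf")
    for k in range(1, max_k + 1):
        for combo in combinations(candidates, k):
            decision = predict_user_decision(combo)   # hypothetical user model
            score = score_decision(decision)          # e.g., expected utility
            if score > best_score:
                best_combo, best_score = combo, score
    return best_combo

# toy usage with stand-in models
picked = select_explanations(
    ["saliency", "counterfactual", "confidence"],
    predict_user_decision=lambda combo: len(combo),   # placeholder
    score_decision=lambda d: -abs(d - 2),             # prefers two explanations
)
print(picked)
```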

Radar Automated Target Recognition (RATR) for Unmanned Aerial Vehicles (UAVs) involves transmitting Electromagnetic Waves (EMWs) and performing target type recognition on the received radar echo, crucial for defense and aerospace applications. Previous studies highlighted the advantages of multistatic radar configurations over monostatic ones in RATR. However, fusion methods in multistatic radar configurations often suboptimally combine classification vectors from individual radars probabilistically. To address this, we propose a fully Bayesian RATR framework employing Optimal Bayesian Fusion (OBF) to aggregate classification probability vectors from multiple radars. OBF, based on expected 0-1 loss, updates a Recursive Bayesian Classification (RBC) posterior distribution for target UAV type, conditioned on historical observations across multiple time steps. We evaluate the approach using simulated random walk trajectories for seven drones, correlating target aspect angles to Radar Cross Section (RCS) measurements in an anechoic chamber. Comparing against single radar Automated Target Recognition (ATR) systems and suboptimal fusion methods, our empirical results demonstrate that the OBF method integrated with RBC significantly enhances classification accuracy compared to other fusion methods and single radar configurations.
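The recursive Bayesian backbone of such a framework can be sketched as a per-time-step posterior update that fuses per-radar likelihoods under a conditional-independence assumption; the paper's Optimal Bayesian Fusion rule, derived from expected 0-1 loss, is not reproduced here.

```python
import numpy as np

def recursive_bayes_update(prior, radar_likelihoods):
    """One time step of recursive Bayesian classification with multi-radar fusion.

    prior: (C,) posterior over target classes from the previous step.
    radar_likelihoods: list of (C,) per-radar likelihoods p(observation | class).
    Assuming radar observations are conditionally independent given the class,
    the fused posterior is proportional to the prior times the product of the
    per-radar likelihoods.
    """
    posterior = prior.copy()
    for lik in radar_likelihoods:
        posterior *= lik
    return posterior / posterior.sum()

# toy example: 7 UAV classes, uniform prior, two radars observed over 3 steps
post = np.full(7, 1 / 7)
for _ in range(3):
    radar1 = np.random.dirichlet(np.ones(7))   # stand-in classification vectors
    radar2 = np.random.dirichlet(np.ones(7))
    post = recursive_bayes_update(post, [radar1, radar2])
print(post.round(3))
```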

The Alpha Magnetic Spectrometer (AMS) is a high-precision particle detector onboard the International Space Station containing six different subdetectors. The Transition Radiation Detector and Electromagnetic Calorimeter (ECAL) are used to separate electrons/positrons from the abundant cosmic-ray proton background. The positron flux measured in space by AMS falls with a power law which unexpectedly softens above 25 GeV and then hardens above 280 GeV. Several theoretical models try to explain these phenomena, and a purer measurement of positrons at higher energies is needed to help test them. The currently used methods to reject the proton background at high energies involve extrapolating shower features from the ECAL to use as inputs for boosted decision tree and likelihood classifiers. We present a new approach for particle identification with the AMS ECAL using deep learning (DL). By taking the energy deposition within all the ECAL cells as an input and treating them as pixels in an image-like format, we train an MLP, a CNN, and multiple ResNets and Convolutional vision Transformers (CvTs) as shower classifiers. Proton rejection performance is evaluated using Monte Carlo (MC) events and ISS data separately. For MC, using events with a reconstructed energy between 0.2 - 2 TeV, at 90% electron accuracy, the proton rejection power of our CvT model is more than 5 times that of the other DL models. Similarly, for ISS data with a reconstructed energy between 50 - 70 GeV, the proton rejection power of our CvT model is more than 2.5 times that of the other DL models.
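Treating the cell energies as an image leads naturally to CNN-style classifiers; the toy model below shows this framing only, with an assumed 18 x 72 cell grid and a far smaller network than the paper's ResNets or CvT models.

```python
import torch
import torch.nn as nn

class ShowerCNN(nn.Module):
    """Minimal CNN baseline that treats ECAL cell energies as a 1-channel image.

    The 18 x 72 grid used below is only an illustrative assumption about the
    cell layout; the paper's architectures are considerably larger.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)       # electron/positron vs. proton

    def forward(self, x):                  # x: (B, 1, H, W) cell energy deposits
        return self.head(self.features(x))

logits = ShowerCNN()(torch.rand(4, 1, 18, 72))
print(logits.shape)  # torch.Size([4, 2])
```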

This groundbreaking study explores the expanse of Large Language Models (LLMs), such as Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), across varied domains ranging from technology, finance, and healthcare to education. Despite their established prowess in Natural Language Processing (NLP), these LLMs have not been systematically examined for their impact on domains such as fitness and holistic well-being, urban planning, climate modelling, and disaster management. This review paper, in addition to furnishing a comprehensive analysis of the vast expanse and extent of LLMs' utility in diverse domains, identifies the research gaps and realms where the potential of LLMs is yet to be harnessed. This study uncovers innovative ways in which LLMs can leave a mark in fields like fitness and well-being, urban planning, climate modelling, and disaster response, which could inspire future research and applications in these avenues.

Large Language Models (LLMs) employing Chain-of-Thought (CoT) prompting have broadened the scope for improving multi-step reasoning capabilities. We generally divide multi-step reasoning into two phases: path generation, which generates the reasoning path(s), and answer calibration, which post-processes the reasoning path(s) to obtain a final answer. However, the existing literature lacks a systematic analysis of different answer calibration approaches. In this paper, we summarize a taxonomy of recent answer calibration techniques and break them down into step-level and path-level strategies. We then conduct a thorough evaluation of these strategies from a unified view, systematically scrutinizing step-level and path-level answer calibration across multiple paths. Experimental results reveal that combining the strengths of both strategies tends to yield the best outcomes. Our study holds the potential to illuminate key insights for optimizing multi-step reasoning with answer calibration.
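As one concrete example of a path-level strategy, the sketch below calibrates an answer by majority vote over multiple sampled reasoning paths (self-consistency style); the answer-extraction convention is an assumption, and step-level strategies are not shown.

```python
from collections import Counter

def path_level_calibration(reasoning_paths, extract_answer):
    """Path-level answer calibration by majority vote over sampled paths.

    reasoning_paths: list of generated CoT strings.
    extract_answer: callable mapping a reasoning path to its final answer.
    Returns the most frequent answer and its vote share.
    """
    answers = [extract_answer(p) for p in reasoning_paths]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)

# toy usage: three sampled CoT paths ending in "#### <answer>"
paths = ["... #### 42", "... #### 42", "... #### 41"]
print(path_level_calibration(paths, lambda p: p.split("####")[-1].strip()))
```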

Marine mammal communication is a complex field, hindered by the diversity of vocalizations and environmental factors. The Watkins Marine Mammal Sound Database (WMMD) is an extensive labeled dataset used in machine learning applications. However, the methods for data preparation, preprocessing, and classification found in the literature are quite disparate. This study first focuses on a brief review of the state-of-the-art benchmarks on the dataset, with an emphasis on clarifying data preparation and preprocessing methods. Subsequently, we propose the application of the Wavelet Scattering Transform (WST) in place of standard methods based on the Short-Time Fourier Transform (STFT). The study also tackles a classification task using an ad-hoc deep architecture with residual layers. We outperform the existing classification architecture by $6\%$ in accuracy using WST and $8\%$ using Mel spectrogram preprocessing, effectively reducing by half the number of misclassified samples, and reaching a top accuracy of $96\%$.
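Assuming the kymatio package's Scattering1D frontend, a WST front end for a 1-D clip might look like the sketch below; the values of J and Q and the clip length are placeholders, not the study's actual preprocessing configuration.

```python
import numpy as np
from kymatio.numpy import Scattering1D    # assumes the kymatio package is installed

# Illustrative Wavelet Scattering Transform front end for a 1-D audio clip.
T = 2 ** 14                               # number of samples in the clip (placeholder)
x = np.random.randn(T).astype(np.float32) # stand-in for a WMMD recording

scattering = Scattering1D(J=8, shape=T, Q=12)
Sx = scattering(x)                        # scattering coefficients, shape (n_coeffs, T / 2**J)
print(Sx.shape)

# The coefficient matrix can then be fed to a residual classifier in place of
# an STFT- or Mel-spectrogram representation.
```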
