
Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed, low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it difficult for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem-solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves final-answer accuracy on challenging problems in the MATH dataset. Code will be released at //github.com/lz1oceani/LLM-As-Hierarchical-Policy.
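The abstract does not spell out how the tournament compares solution groups. As a minimal sketch, assuming each group is summarized by its majority answer and that a pairwise judge decides each match, a single-elimination tournament could look like this. The `consistency_judge` is a hypothetical stand-in (it simply prefers the more self-consistent group); in practice the comparison might be delegated to an LLM, and none of these names come from the authors' code.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A solution group: the final answers the follower sampled under one leader hint.
Group = Sequence[str]


def group_summary(group: Group) -> Tuple[str, float]:
    """Return the group's majority answer and the fraction of chains agreeing with it."""
    answer, count = Counter(group).most_common(1)[0]
    return answer, count / len(group)


def consistency_judge(a: Group, b: Group) -> bool:
    """Hypothetical pairwise judge: True if group `a` wins (more self-consistent; ties go to `a`)."""
    return group_summary(a)[1] >= group_summary(b)[1]


def tournament_select(groups: List[Group],
                      judge: Callable[[Group, Group], bool] = consistency_judge) -> str:
    """Single-elimination tournament over non-empty solution groups; returns the winner's majority answer."""
    contenders = list(groups)
    while len(contenders) > 1:
        next_round = []
        # Pair contenders off; an odd one out gets a bye into the next round.
        for i in range(0, len(contenders) - 1, 2):
            a, b = contenders[i], contenders[i + 1]
            next_round.append(a if judge(a, b) else b)
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])
        contenders = next_round
    return group_summary(contenders[0])[0]
```

A single-elimination bracket needs only G - 1 pairwise comparisons for G groups, versus G(G - 1)/2 for a full round-robin, which keeps the number of (potentially expensive) judge calls low.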

Related Content

Language Modeling


Visual question answering · Automatic question answering · Boosting (an ensemble learning method) · Image retrieval · Performer ·

December 19, 2023

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, Wangmeng Zuo, Xinxing Xu, Rick Siow Mong Goh, Yong Liu

Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failed retrieval results are inconsistent with their relative captions. To address this issue, this work brings a Visual Question Answering (VQA) perspective to boosting CIR performance. The resulting VQA4CIR is a post-processing approach that can be plugged directly into existing CIR methods. Given the top-C images retrieved by a CIR method, VQA4CIR aims to reduce the adverse effect of failed retrieval results that are inconsistent with the relative caption. To find the retrieved images that are inconsistent with the relative caption, we resort to a "QA generation to VQA" self-verification pipeline. For QA generation, we suggest fine-tuning an LLM (e.g., LLaMA) to generate several question-answer pairs from each relative caption. We then fine-tune an LVLM (e.g., LLaVA) to obtain the VQA model. After feeding each retrieved image and question to the VQA model, one can identify the images that are inconsistent with the relative caption: those for which the VQA answer disagrees with the answer in the QA pair. Consequently, CIR performance can be boosted by modifying the ranks of the inconsistently retrieved images. Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.
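The reranking step can be sketched under simple assumptions: a hypothetical `vqa(image, question)` callable returns a short answer string, and the top-C images are reordered by how many generated QA pairs they fail. The abstract only says that the ranks of inconsistent images are modified; counting mismatches and using a stable sort is one illustrative policy, not necessarily the one VQA4CIR uses.

```python
from typing import Callable, List, Tuple

# (question, expected answer), as generated from the relative caption by the QA-generation model
QAPair = Tuple[str, str]


def rerank_top_c(ranked_images: List[str],
                 qa_pairs: List[QAPair],
                 vqa: Callable[[str, str], str]) -> List[str]:
    """Move images whose VQA answers contradict the QA pairs further down the list."""
    def mismatches(image: str) -> int:
        # Count questions where the VQA answer differs from the expected answer (case-insensitive).
        return sum(vqa(image, q).strip().lower() != a.strip().lower() for q, a in qa_pairs)

    # sorted() is stable, so images with equal mismatch counts keep the CIR method's original order.
    return sorted(ranked_images, key=mismatches)
```

Because only the top-C list is reordered, the underlying CIR model is untouched, which matches the plug-in, post-processing role described above.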

Image classification · Performer · Few-shot learning · MoDELS · Learning ·

December 19, 2023

On the Efficacy of Differentially Private Few-shot Image Classification

Marlon Tobaben, Aliaksandra Shysheya, John Bronskill, Andrew Paverd, Shruti Tople, Santiago Zanella-Beguelin, Richard E Turner, Antti Honkela

From arXiv, 49 pages, 24 figures; published in TMLR 12/2023 //openreview.net/forum?id=hFsr59Imzm

There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications including personalization and federated learning, it is crucial to perform well (i) in the few-shot setting, as obtaining large amounts of labeled data may be problematic; and (ii) on datasets from a wide variety of domains for use in various specialist settings. To understand under which conditions few-shot DP can be effective, we perform an exhaustive set of experiments that reveals how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary. We show that to achieve DP accuracy on par with non-private models, the shots per class must be increased as the privacy level increases. We also show that learning parameter-efficient FiLM adapters under DP is competitive with learning just the final classifier layer or learning all of the network parameters. Finally, we evaluate DP federated learning systems and establish state-of-the-art performance on the challenging FLAIR benchmark.
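For readers unfamiliar with how models are trained under DP, the standard DP-SGD recipe (per-example gradient clipping followed by Gaussian noise) is sketched below for a binary logistic-regression head, loosely analogous to learning only the final classifier layer. This is a generic illustration, not the paper's training setup, and it omits privacy accounting (deriving ε and δ from the noise multiplier, sampling rate, and number of steps).

```python
import numpy as np


def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for a binary logistic-regression head.

    w: (dim,) weights; X: (batch, dim) features; y: (batch,) labels in {0, 1}.
    """
    # Per-example gradients of the logistic loss: (sigmoid(x . w) - y) * x
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X
    # Clip each example's gradient to L2 norm at most clip_norm (bounds any one example's influence)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise scaled to the clipping bound to the summed gradient, then average
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_mean
```

Because the noise is added to the summed gradient and then divided by the number of examples, its relative effect shrinks as more examples are available, which offers one intuition for why stronger privacy (a larger noise multiplier) calls for more shots per class.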

Recurrent networks · Networking · MoDELS · Separated · Integration ·

December 19, 2023

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma

From arXiv, 5 pages, 3 figures, accepted by ICASSP 2024

Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies and is less effective at modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that can model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying recurrent neural networks (RNNs) with traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network because it captures recurrent patterns without using recurrent connections. Our recurrent module mainly comprises an enhanced dilated FSMN block that uses gated convolutional units (GCU) and dense connections. In addition, a bottleneck layer and an output layer are added to control the information flow. The recurrent module relies on linear projections and convolutions, allowing seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable improvements over MossFormer and surpasses other state-of-the-art methods on the WSJ0-2/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.
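The key property of an FSMN memory block is that each output frame is a learned, per-channel weighted sum of the current and earlier input frames, so no hidden state is carried from step to step. A minimal NumPy sketch of a dilated, look-back-only memory with a residual connection follows; it deliberately omits the gated convolutional units, dense connections, bottleneck, and output layers of the MossFormer2 block, and the function name and array shapes are illustrative assumptions.

```python
import numpy as np


def dilated_fsmn_memory(h, weights, dilation=1):
    """Memory block: m_t = h_t + sum_i weights[i] * h_{t - i*dilation}, treating frames before t=0 as zero.

    h: (T, D) sequence of hidden vectors; weights: (N, D) per-channel filter taps.
    """
    T, _ = h.shape
    memory = h.copy()  # residual connection
    for i in range(weights.shape[0]):
        shift = i * dilation
        if shift >= T:
            break  # this tap (and all later ones) would only see padding
        # Delay the whole sequence by `shift` frames, padding the start with zeros
        shifted = np.zeros_like(h)
        shifted[shift:] = h[:T - shift]
        memory += weights[i] * shifted
    return memory
```

Since each tap is just a delayed copy of the input, every time step is computed from inputs alone, which is what lets the module process the whole sequence in parallel rather than step by step.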

Large language models · Language modeling · MoDELS · Knowledge · Controller ·

December 18, 2023

Opportunities and Challenges of Applying Large Language Models in Building Energy Efficiency and Decarbonization Studies: An Exploratory Overview

Liang Zhang, Zhelun Chen

In recent years, the rapid advancement and impressive capabilities of Large Language Models (LLMs) have been evident across various domains. This paper explores the application, implications, and potential of LLMs in building energy efficiency and decarbonization studies. The wide-ranging capabilities of LLMs are examined in the context of the building energy field, including intelligent control systems, code generation, data infrastructure, knowledge extraction, and education. Despite the promising potential of LLMs, challenges including complex and expensive computation, data privacy, security and copyright, complexity in fine-tuned LLMs, and self-consistency are discussed. The paper concludes with a call for future research focused on the enhancement of LLMs for domain-specific tasks, multi-modal LLMs, and collaborative research between AI and energy experts.

Optimizer · Networking · Neural Networks · GPU · Kernelization ·

December 18, 2023

MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training

Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

From arXiv, accepted for publication at ASPLOS 2024

GPUs have become the mainstream platform for accelerating deep neural network training. However, GPUs face substantial challenges with GNNs, such as workload imbalance and irregular memory access, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges, but memory traffic remains significant. We argue that drastic performance improvements can only be achieved through vertical optimization that combines algorithm and system innovations, rather than treating speedup as an "after-thought" (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovations. (i) We introduce the MaxK nonlinearity, provide a theoretical analysis of the MaxK nonlinearity as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format, designed to store the data and indices of the feature matrix after the nonlinearity; (ii) we design a coalescing-enhanced forward computation with a row-wise-product-based SpGEMM kernel that uses CBSR for fetching the input feature matrix and strategically places a sparse output accumulation buffer in shared memory; (iii) we develop an optimized backward computation with an outer-product-based SSpMM kernel. We conduct extensive evaluations of MaxK-GNN and report the end-to-end system run-time. Experiments show that the MaxK-GNN system can approach the theoretical speedup limit given by Amdahl's law. We achieve accuracy comparable to SOTA GNNs at a significantly increased speed: 3.22×/4.24× speedups (vs. theoretical limits of 5.52×/7.27×) on Reddit compared to the DGL and GNNAdvisor implementations.
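The MaxK idea of keeping a fixed number of entries per row, together with a compact row storage in the spirit of CBSR, can be sketched in NumPy as follows. This is an illustrative reconstruction from the abstract, not the authors' CUDA kernels or exact CBSR layout; the Amdahl's-law helper shows how a theoretical speedup limit follows once the accelerated fraction of run-time and its acceleration factor are known. All function names are hypothetical.

```python
import numpy as np


def maxk(x, k):
    """MaxK nonlinearity: keep the k largest entries in each row, zero the rest."""
    idx = np.argpartition(x, -k, axis=1)[:, -k:]  # column indices of the top-k entries per row
    rows = np.arange(x.shape[0])[:, None]
    out = np.zeros_like(x)
    out[rows, idx] = x[rows, idx]
    return out


def to_balanced_rows(x, k):
    """Store only the kept values and their column indices; every row holds exactly k entries."""
    idx = np.sort(np.argpartition(x, -k, axis=1)[:, -k:], axis=1)
    rows = np.arange(x.shape[0])[:, None]
    return x[rows, idx], idx  # both shaped (num_rows, k)


def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when a fraction of run-time is sped up by `factor` (Amdahl's law)."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)
```

Because every row keeps exactly k entries, each row occupies the same storage, which is presumably what "balanced" refers to and what makes access patterns regular on a GPU. Letting `factor` go to infinity gives the upper bound 1 / (1 - accelerated_fraction).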

Networking · Optimizer · Lecture notes · Estimation/Estimator · Backbone ·

December 18, 2023

On the Benefits of Rate-Adaptive Transceivers: A Network Planning Study

Jasper Müller, Gabriele Di Rosa, Tobias Fehenberger, Mario Wenning, Sai Kireet Patri, Jörg-Peter Elbers, Carmen Mas-Machuca

From arXiv, Copyright 2023 IEEE. This work has been partially funded in the framework of the CELTIC-NEXT project AI-NET-PROTECT (Project ID C2019/3-4) (#16KIS1279K) and in the programme "Souverän. Digital. Vernetzt." joint project 6G-life (#16KISK002) by the German Federal Ministry of Education and Research

Flexible-grid Elastic Optical Networks (EONs) have been widely deployed in recent years to support the growing demand for bandwidth-intensive applications. To address this cost-efficiently, optimized utilization of EONs is required. Next-generation bandwidth-variable transceivers (BVTs) will offer increased adaptivity in symbol rate as well as modulation through probabilistic constellation shaping. In this work, we therefore investigate the impact of increased configuration granularity on various aspects of optical networks. We account for practical implementation considerations of BVT configurations for the estimation of the required signal-to-noise ratio. Additionally, an optimization algorithm is presented that selects the most efficient configuration for each considered data rate and bandwidth combination. Based on the advanced transceiver configurations, we conduct a network planning study using a physical-layer-aware algorithm for flexible-grid EONs, and present results for a national and a continental optical backbone network topology. Our research demonstrates that a rise in modulation rate adaptivity results in substantial savings in resources, decreasing the number of necessary lightpaths by as much as 20% in EONs. In contrast, increased symbol rate granularity only results in minor savings.
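The configuration-selection step can be sketched as a filter-then-minimize search: keep the transceiver configurations that carry the requested data rate within the available bandwidth, then pick the one needing the lowest SNR. The `BvtConfig` fields, the 0.1 roll-off, the 12.5 GHz slot width, and reading "most efficient" as "lowest required SNR" are all assumptions for illustration; the paper's optimization algorithm and its SNR model are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BvtConfig:
    symbol_rate_gbd: float      # symbol rate in GBd
    net_bits_per_symbol: float  # net bits per symbol (both polarizations, after FEC); PCS makes this fine-grained
    required_snr_db: float      # SNR needed for error-free operation (hypothetical values in the usage example)


def occupied_bandwidth_ghz(cfg: BvtConfig, rolloff: float = 0.1, slot_ghz: float = 12.5) -> float:
    """Spectrum occupied by the signal, rounded up to whole flexible-grid slots (assumed roll-off and slot width)."""
    raw = cfg.symbol_rate_gbd * (1.0 + rolloff)
    return math.ceil(raw / slot_ghz) * slot_ghz


def select_config(configs: Iterable[BvtConfig], data_rate_gbps: float,
                  bandwidth_ghz: float) -> Optional[BvtConfig]:
    """Lowest-required-SNR configuration that carries the data rate within the bandwidth budget."""
    feasible = [c for c in configs
                if c.symbol_rate_gbd * c.net_bits_per_symbol >= data_rate_gbps
                and occupied_bandwidth_ghz(c) <= bandwidth_ghz]
    return min(feasible, key=lambda c: c.required_snr_db, default=None)
```

With finer symbol-rate and modulation granularity, the candidate list grows, so for a given data rate and bandwidth the chosen configuration can sit closer to what the lightpath's SNR actually supports.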

Rank · Robustness · MoDELS · INFORMS · Regularization term ·

December 16, 2023

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

Yu-An Liu, Ruqing Zhang, Mingkun Zhang, Wei Chen, Maarten de Rijke, Jiafeng Guo, Xueqi Cheng

From arXiv, accepted by AAAI 2024

Neural ranking models (NRMs) have shown great success in information retrieval (IR). However, their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense against adversarial attacks on NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components: a natural ranking error for evaluating effectiveness and a boundary ranking error for assessing adversarial robustness. We then define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error, which makes it tractable to compute. Informed by our theoretical analysis, we design a novel perturbation-invariant adversarial training (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss in which one term encourages effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PIAT compared to existing adversarial defenses.
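A regularized surrogate loss of the kind described above can be sketched with a listwise softmax cross-entropy as the effectiveness term and a KL divergence between the score distributions on clean and perturbed documents as the smoothness term, in the style of TRADES. The specific terms, the weight `lam`, and the function names are assumptions; the abstract does not give PIAT's exact formulation.

```python
import numpy as np


def log_softmax(scores):
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    return shifted - np.log(np.sum(np.exp(shifted)))


def regularized_ranking_loss(clean_scores, perturbed_scores, relevance, lam=1.0):
    """Effectiveness term + lam * smoothness term for one query's candidate list.

    clean_scores / perturbed_scores: model scores for the original and perturbed documents;
    relevance: non-negative graded labels with at least one positive entry.
    """
    log_p_clean = log_softmax(clean_scores)
    log_p_pert = log_softmax(perturbed_scores)
    # Effectiveness: listwise softmax cross-entropy against the normalized relevance labels
    target = relevance / relevance.sum()
    effectiveness = -np.sum(target * log_p_clean)
    # Smoothness: KL(clean || perturbed) is zero when perturbations leave the score distribution unchanged
    invariance = np.sum(np.exp(log_p_clean) * (log_p_clean - log_p_pert))
    return effectiveness + lam * invariance
```

Raising `lam` trades some effectiveness for output that changes less under perturbation, which is the trade-off the method aims to balance.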

Better · Model evaluation · Robustness · ICLR · Extensibility ·

December 15, 2023

Closing the Gap: Achieving Better Accuracy-Robustness Tradeoffs Against Query-Based Attacks

Pascal Zimmer, Sébastien Andreina, Giorgia Azzurra Marson, Ghassan Karame

From arXiv, to appear in the Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2024

Although promising, existing defenses against query-based attacks share a common limitation: they offer increased robustness against attacks at the price of a considerable accuracy drop on clean samples. In this work, we show how to efficiently establish, at test time, a solid tradeoff between robustness and accuracy when mitigating query-based attacks. Given that these attacks necessarily explore low-confidence regions, our insight is that activating dedicated defenses, such as RND (Qin et al., NeurIPS 2021) and Random Image Transformations (Xie et al., ICLR 2018), only for low-confidence inputs is sufficient to prevent them. Our approach is independent of training and supported by theory. We verify the effectiveness of our approach for various existing defenses by conducting extensive experiments on CIFAR-10, CIFAR-100, and ImageNet. Our results confirm that our proposal can indeed enhance these defenses by providing better tradeoffs between robustness and accuracy compared to state-of-the-art approaches while being completely training-free.
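The test-time gating idea can be sketched as follows: compute the model's confidence on a query and route only low-confidence queries through a randomized defense. Here the defense adds Gaussian noise to the input, in the spirit of RND; the `model` callable, the confidence `threshold`, and `noise_std` are hypothetical parameters, and the sketch is not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def gated_predict(model, x, threshold, noise_std, rng):
    """Return a class label, applying a randomized defense only to low-confidence queries."""
    probs = softmax(model(x))
    if probs.max() >= threshold:
        # Confident query: answer from the clean input, so accuracy on such inputs is unaffected.
        return int(np.argmax(probs))
    # Low-confidence query (the region query-based attacks must probe): add Gaussian input noise,
    # in the spirit of RND, and predict on the noisy input instead.
    noisy_x = x + rng.normal(0.0, noise_std, size=x.shape)
    return int(np.argmax(model(noisy_x)))
```

The threshold controls the tradeoff: raising it sends more queries through the defense (more robustness, more clean-accuracy cost), and lowering it does the opposite, with no retraining in either case.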

Learner · MOOCs · Graph · Learning · Online ·

December 11, 2023

Finding Paths for Explainable MOOC Recommendation: A Learner Perspective

Jibril Frej, Neel Shah, Marta Knežević, Tanya Nazaretsky, Tanja Käser

The increasing availability of Massive Open Online Courses (MOOCs) has created a need for personalized course recommendation systems. These systems often combine neural networks with Knowledge Graphs (KGs) to achieve richer representations of learners and courses. While these enriched representations allow more accurate and personalized recommendations, explainability remains a significant challenge, one that is especially problematic in high-impact domains such as education and online learning. Recently, a novel class of recommender systems that uses reinforcement learning and graph reasoning over KGs has been proposed to generate explainable recommendations in the form of paths over a KG. Despite their accuracy and interpretability on e-commerce datasets, these approaches have scarcely been applied to the educational domain, and their use in practice has not been studied. In this work, we propose an explainable recommendation system for MOOCs that uses graph reasoning. To validate the practical implications of our approach, we conducted a user study examining user perceptions of our new explainable recommendations. We demonstrate the generalizability of our approach by conducting experiments on two educational datasets: COCO and Xuetang.
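An explanation "in the form of paths over a KG" can be illustrated with a tiny hand-built graph. The systems referenced above find such paths with reinforcement learning; the sketch below swaps in plain breadth-first search purely to show what the resulting explanation looks like. The graph, entity names, and relation names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (relation, neighbor entity)


def explain_path(kg: Dict[str, List[Edge]], learner: str, course: str) -> Optional[List[str]]:
    """Shortest learner-to-course path as alternating entities and relations, or None if unreachable."""
    parent: Dict[str, Tuple[str, str]] = {}  # entity -> (previous entity, relation used to reach it)
    seen = {learner}
    queue = deque([learner])
    while queue:
        node = queue.popleft()
        if node == course:
            # Walk parent links back to the learner, prepending (entity, relation) pairs
            path = [course]
            while node != learner:
                prev, rel = parent[node]
                path[:0] = [prev, rel]
                node = prev
            return path
        for rel, nbr in kg.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                parent[nbr] = (node, rel)
                queue.append(nbr)
    return None
```

Joining the returned list with spaces yields a human-readable explanation such as why a learner who took one course is shown another that covers a related concept.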

Graph · Networking · Learned · Performer · Deep learning ·

October 9, 2020

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, Michael Bronstein

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g., evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while also being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration, which achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
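The memory module at the heart of TGN can be sketched as follows, assuming the simplest options: an identity message function that concatenates the two endpoints' memories, the time elapsed since the node's last update, and the edge features, followed by a GRU memory updater. The raw time difference stands in for a learned time encoding, events are processed one at a time rather than in batches, the graph-based embedding module that reads from this memory is omitted, and the class and method names are hypothetical.

```python
import numpy as np


def _sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class TGNMemory:
    """Per-node memory updated from timed interaction events (identity message + GRU updater)."""

    def __init__(self, num_nodes, mem_dim, edge_dim, rng):
        # Message = [own memory, other endpoint's memory, time since own last update, edge features]
        msg_dim = 2 * mem_dim + 1 + edge_dim
        self.memory = np.zeros((num_nodes, mem_dim))
        self.last_update = np.zeros(num_nodes)
        # GRU parameters for the update (z), reset (r) and candidate (n) gates
        self.W = {g: rng.normal(0.0, 0.1, (mem_dim, msg_dim)) for g in "zrn"}
        self.U = {g: rng.normal(0.0, 0.1, (mem_dim, mem_dim)) for g in "zrn"}
        self.b = {g: np.zeros(mem_dim) for g in "zrn"}

    def _gru(self, msg, h):
        z = _sigmoid(self.W["z"] @ msg + self.U["z"] @ h + self.b["z"])
        r = _sigmoid(self.W["r"] @ msg + self.U["r"] @ h + self.b["r"])
        n = np.tanh(self.W["n"] @ msg + self.U["n"] @ (r * h) + self.b["n"])
        return (1.0 - z) * n + z * h

    def process_event(self, src, dst, t, edge_feat):
        """Update both endpoints' memories for an interaction between src and dst at time t."""
        # Snapshot both memories first so each message is built from pre-event states
        s_src, s_dst = self.memory[src].copy(), self.memory[dst].copy()
        for node, own, other in ((src, s_src, s_dst), (dst, s_dst, s_src)):
            dt = t - self.last_update[node]
            msg = np.concatenate([own, other, [dt], edge_feat])
            self.memory[node] = self._gru(msg, own)
            self.last_update[node] = t
```

Nodes not involved in an event keep their memory unchanged, so the memory acts as a compressed, continuously updated summary of each node's interaction history.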