While most research on controllable text generation has focused on steering base Language Models, the emerging instruction-tuning and prompting paradigm offers an alternate approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, using a subset of it to benchmark the performance of 9 different baselines and methods on Instruction-tuned Language Models. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation specifically with Instruction-tuned Language Models. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and a Large Language Model with in-context capabilities to automatically generate a constraint dataset. This method eliminates the field's dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.

Related content

Tracing · Path · INTERACT · Performer · Notability

June 13, 2024

Towards Accelerating Real-Time Path Tracing with Foveated Framework

Bipul Mohanto, Sven Kluge, Oliver Staadt

Path tracing is one of the most widespread rendering techniques for high-end graphics fidelity. However, its slow convergence and heavy noise make it infeasible for numerous real-time applications where physically correct photorealistic effects are salient. Additionally, the increased demand for pixel density, geometric complexity, advanced materials, and multiple lights hinders the algorithm from attaining an interactive frame rate for real-time applications. To address these issues, we developed a framework to accelerate path tracing through foveated rendering, a robust technique that leverages human vision. Our dynamic foveated path-tracing framework integrates fixation data and selectively lowers the rendering resolution towards the periphery. The framework is built on NVIDIA's OptiX 7.5 API with CUDA 12.1, serving as the base of future foveated path tracing research. Through comprehensive experimentation, we demonstrate the effectiveness of our framework in this paper. Depending on the scene complexity, our solution can enhance rendering performance by up to a factor of 25 without any notable visual differences. We further evaluated the framework using a structured error map algorithm with variable sample numbers and foveated area size.

Design · INTERACT · TOOLS · Paper · AVS

June 13, 2024

A Tangible Multi-Display Toolkit to Support the Collaborative Design Exploration of AV-Pedestrian Interfaces

Marius Hoggenmuller, Martin Tomitsch, Callum Parker, Trung Thanh Nguyen, Dawei Zhou, Stewart Worrall, Eduardo Nebot

The advent of cyber-physical systems, such as robots and autonomous vehicles (AVs), brings new opportunities and challenges for the domain of interaction design. Though there is consensus about the value of human-centred development, there is a lack of documented tailored methods and tools for involving multiple stakeholders in design exploration processes. In this paper we present a novel approach using a tangible multi-display toolkit. Orchestrating computer-generated imagery across multiple displays, the toolkit enables multiple viewing angles and perspectives to be captured simultaneously (e.g. top-view, first-person pedestrian view). Participants are able to directly interact with the simulated environment through tangible objects. At the same time, the objects physically simulate the interface's behaviour (e.g. through an integrated LED display). We evaluated the toolkit in design sessions with experts to collect feedback and input on the design of an AV-pedestrian interface. The paper reports on how the combination of tangible objects and multiple displays supports collaborative design explorations.

Mixup · Mixing · Manifold · Labeling · Linear

June 11, 2024

Tailoring Mixup to Data for Calibration

Quentin Bouniot, Pavlo Mozharovskyi, Florence d'Alché-Buc

Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a wide range of applications. Along with improved performance, Mixup is also a good technique for improving calibration and predictive uncertainty. However, mixing data carelessly can lead to manifold intrusion, i.e., conflicts between the synthetic labels assigned and the true label distributions, which can deteriorate calibration. In this work, we argue that the likelihood of manifold intrusion increases with the distance between the data to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples to mix, and define a flexible framework to do so without losing diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves performance and calibration of models, while being much more efficient. The code for our work is available at //github.com/qbouniot/sim_kernel_mixup.
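As a rough illustration of the idea in the abstract above, the sketch below draws each interpolation coefficient from a Beta distribution whose parameter shrinks as the two samples get farther apart, so distant pairs are barely mixed. The Gaussian kernel, the names `similarity_mixup`, `tau_max` and `bandwidth`, and the lower bound on the Beta parameter are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import numpy as np

def similarity_mixup(x, y, rng, tau_max=1.0, bandwidth=1.0):
    """Mix each sample with a random partner; closer pairs are mixed more strongly.

    x: array of shape (n, ...), y: array of shape (n, num_targets).
    Assumption: the Beta(alpha, alpha) parameter decays with the squared
    distance through a Gaussian kernel (illustrative choice only).
    """
    n = x.shape[0]
    perm = rng.permutation(n)
    # Squared Euclidean distance between each sample and its partner.
    d2 = np.sum((x - x[perm]).reshape(n, -1) ** 2, axis=1)
    # Normalise by the batch mean so that bandwidth is scale-free.
    d2 = d2 / (d2.mean() + 1e-12)
    alpha = tau_max * np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Beta parameters must be positive; a small alpha pushes lam towards 0 or 1.
    alpha = np.maximum(alpha, 1e-2)
    lam = rng.beta(alpha, alpha)
    lam_x = lam.reshape((n,) + (1,) * (x.ndim - 1))
    x_mix = lam_x * x + (1.0 - lam_x) * x[perm]
    y_mix = lam[:, None] * y + (1.0 - lam[:, None]) * y[perm]
    return x_mix, y_mix, lam
```

With one-hot labels, each row of `y_mix` remains a valid probability vector because it is a convex combination of two one-hot rows.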

ResNet · Scaling · Performer · Decomposed · Layer

June 10, 2024

Scaling ResNets in the Large-depth Regime

Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert

from arxiv, 44 pages, 9 figures. Updated with clarifications and additional references

Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to identity mapping. This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to a widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
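The three regimes described in the abstract above can be observed numerically with a toy model. The sketch below propagates a random vector through $L$ residual layers $h_{k+1} = h_k + \alpha_L W_k h_k$ with i.i.d. Gaussian weights and reports how much the norm changes. The purely linear layers, the width, and the function name `final_norm_ratio` are simplifying assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def final_norm_ratio(depth, alpha, width=64, seed=0):
    """Return ||h_L|| / ||h_0|| for a linear residual network.

    Assumption: each layer is h + alpha * (W @ h) with i.i.d. Gaussian
    entries of variance 1/width and no activation function.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)
    h0_norm = np.linalg.norm(h)
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h + alpha * (w @ h)
    return np.linalg.norm(h) / h0_norm

if __name__ == "__main__":
    L = 200
    for name, alpha in [("1", 1.0), ("1/sqrt(L)", 1 / np.sqrt(L)), ("1/L", 1 / L)]:
        print(f"alpha_L = {name}: norm ratio = {final_norm_ratio(L, alpha):.3g}")
```

In this toy setting, $\alpha_L = 1$ makes the norm blow up with depth, $\alpha_L = 1/L$ leaves the input almost unchanged, and $\alpha_L = 1/\sqrt{L}$ gives a ratio of order one that still differs from the identity.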

KNN · Performer · Speech recognition · End-to-end · E2E

June 7, 2024

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

from arxiv, Accepted to Interspeech 2024

Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach, Speaker-Smoothed kNN, that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model output by finding correctly pronounced tokens from its pre-built datastore during the decoding phase. Moreover, we use x-vectors to dynamically adjust the kNN interpolation parameters to address the data sparsity issue. This approach was validated using the KeSpeech and MagicData corpora under in-domain and all-domain settings. Our method consistently performs comparably to fine-tuning without the associated performance degradation during speaker changes. Furthermore, in the all-domain setting, our method achieves state-of-the-art results, reducing the CER in both single-speaker and multi-speaker test scenarios.
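The retrieval step mentioned in the abstract above can be pictured with a generic kNN interpolation sketch: the nearest datastore entries vote for their stored tokens, and the resulting distribution is blended with the model's own output. The function name `knn_interpolate`, the squared Euclidean distance, and the fixed weight `lam` are assumptions for illustration. The paper adjusts the interpolation parameters dynamically from x-vectors, which this sketch does not attempt to reproduce.

```python
import numpy as np

def knn_interpolate(model_probs, query, keys, values, k=4, temperature=1.0, lam=0.3):
    """Blend model probabilities with a kNN distribution from a datastore.

    model_probs: (vocab,) probabilities from the ASR model for one step.
    query: (dim,) hidden representation for the same step.
    keys: (n, dim) stored representations; values: (n,) stored token ids.
    Assumption: lam is a fixed interpolation weight (hypothetical default).
    """
    d2 = np.sum((keys - query) ** 2, axis=1)
    k = min(k, len(keys))
    nearest = np.argsort(d2)[:k]
    # Softmax over negative distances: closer neighbours get larger weights.
    logits = -d2[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    knn_probs = np.zeros_like(model_probs, dtype=float)
    # Accumulate weights per token id (several neighbours may share a token).
    np.add.at(knn_probs, values[nearest], weights)
    return lam * knn_probs + (1.0 - lam) * model_probs
```

Because both inputs are probability distributions and `lam` lies in [0, 1], the output is again a probability distribution.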

GPUs · Identifiable · MoDELS · Networking · Boosting (a technique for accelerating model training)

June 7, 2024

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely C4. The key insights of C4 are twofold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.

Discretization · MoDELS · Score · Estimation/Estimator · Perplexity

June 6, 2024

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

from arxiv, ICML 2024 Oral. Code at //github.com/louaaron/Score-Entropy-Discrete-Diffusion

Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. Experimentally, we test our Score Entropy Discrete Diffusion models (SEDD) on standard language modeling tasks. For comparable model sizes, SEDD beats existing language diffusion paradigms (reducing perplexity by $25$-$75$\%) and is competitive with autoregressive models, in particular outperforming GPT-2. Furthermore, compared to autoregressive models, SEDD generates faithful text without requiring distribution annealing techniques like temperature scaling (around $6$-$8\times$ better generative perplexity than un-annealed GPT-2), can trade compute and quality (similar quality with $32\times$ fewer network evaluations), and enables controllable infilling (matching nucleus sampling quality while enabling other strategies besides left-to-right prompting).

Sparse autoencoder · contrastive · Autoencoder · Sparse · Agent

June 6, 2024

Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents

Yoann Poupart

from arxiv, Workshop on Interpretable Policies in Reinforcement Learning @ RLC-2024, 18 pages and 15 figures

AI has led chess systems to a superhuman level, yet these systems heavily rely on black-box algorithms. This is unsustainable in ensuring transparency to the end-user, particularly when these systems are responsible for sensitive decision-making. Recent interpretability work has shown that the inner representations of Deep Neural Networks (DNNs) are fathomable and contain human-understandable concepts. Yet, these methods are seldom contextualised and are often based on a single hidden state, which makes them unable to interpret multi-step reasoning, e.g. planning. In this respect, we propose contrastive sparse autoencoders (CSAE), a novel framework for studying pairs of game trajectories. Using CSAE, we are able to extract and interpret concepts that are meaningful to the chess-agent plans. We primarily focused on a qualitative analysis of the CSAE features before proposing an automated feature taxonomy. Furthermore, to evaluate the quality of our trained CSAE, we devise sanity checks to rule out spurious correlations in our results.

Speech recognition · Gating · Performer · MoDELS · Noise

June 6, 2024

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring the injection of language-specific information into the ASR process. We apply this framework to cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of our gated datastore mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR.

Shuffle · Functional · ONCE · Convex function · Performer

June 6, 2024

On the Last-Iterate Convergence of Shuffling Gradient Methods

Zijian Liu, Zhengyuan Zhou

from arxiv, ICML 2024

Shuffling gradient methods are widely used in modern machine learning tasks and include three popular implementations: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). In contrast to their empirical success, the theoretical guarantees of shuffling gradient methods were not well understood for a long time. Until recently, convergence rates had been established only for the average iterate for convex functions and for the last iterate for strongly convex problems (using squared distance as the metric). However, when using the function value gap as the convergence criterion, existing theories cannot explain the good performance of the last iterate in different settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
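The three schemes named in the abstract above differ only in how the component order is chosen for each epoch. The sketch below is a minimal illustration of that difference that returns the last iterate. The function name `shuffling_gd`, the constant step size, and the unconstrained update are assumptions for illustration and do not reflect the step-size schedules analysed in the paper.

```python
import numpy as np

def shuffling_gd(grads, x0, lr, epochs, scheme, seed=0):
    """Run a shuffling gradient method and return the last iterate.

    grads: list of per-component gradient functions, one per f_i.
    scheme: "RR" (new permutation every epoch), "SO" (one permutation,
            reused every epoch), or "IG" (fixed order 0, 1, ..., n-1).
    """
    if scheme not in ("RR", "SO", "IG"):
        raise ValueError(f"unknown scheme: {scheme}")
    rng = np.random.default_rng(seed)
    n = len(grads)
    order = np.arange(n)              # IG keeps this order throughout
    if scheme == "SO":
        order = rng.permutation(n)    # shuffled once, before the first epoch
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        if scheme == "RR":
            order = rng.permutation(n)  # reshuffled at the start of every epoch
        for i in order:
            x = x - lr * grads[i](x)
    return x

if __name__ == "__main__":
    # Toy problem: f_i(x) = 0.5 * (x - a_i)^2, whose average is minimised at mean(a).
    targets = [1.0, 2.0, 3.0, 4.0]
    grads = [lambda x, a=a: x - a for a in targets]
    for scheme in ("RR", "SO", "IG"):
        print(scheme, float(shuffling_gd(grads, 0.0, lr=0.01, epochs=200, scheme=scheme)))
```

On this toy problem all three schemes end close to the minimiser 2.5, with a small offset that depends on the step size and on the order in which components are visited.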