
Regularized reduced order models (Reg-ROMs) are stabilization strategies that leverage spatial filtering to alleviate the spurious numerical oscillations generally displayed by the classical Galerkin ROM (G-ROM) in under-resolved numerical simulations of turbulent flows. In this paper, we propose a new Reg-ROM, the time-relaxation ROM (TR-ROM), which filters the marginally resolved scales. We compare the new TR-ROM with the two other Reg-ROMs in current use, i.e., the Leray ROM (L-ROM) and the evolve-filter-relax ROM (EFR-ROM), in the numerical simulation of the turbulent channel flow at $Re_{\tau} = 180$ and $Re_{\tau} = 395$ in both the reproduction and the predictive regimes. For each Reg-ROM, we investigate two different filters: (i) the differential filter (DF), and (ii) a new higher-order algebraic filter (HOAF). In our numerical investigation, we monitor the Reg-ROM performance with respect to the ROM dimension, $N$, and the filter order. We also perform sensitivity studies of the three Reg-ROMs with respect to the time interval, relaxation parameter, and filter radius. The numerical results yield the following conclusions: (i) All three Reg-ROMs are significantly more accurate than the G-ROM. (ii) All three Reg-ROMs are more accurate than the ROM projection, which represents the best theoretical approximation of the training data in the given ROM space. (iii) With the optimal parameter values, the TR-ROM is more accurate than the other two Reg-ROMs in all tests. (iv) For most $N$ values, DF yields the most accurate results for all three Reg-ROMs. (v) The optimal parameters trained in the reproduction regime are also optimal for the predictive regime for most $N$ values. (vi) All three Reg-ROMs are sensitive to the filter radius and the filter order, and the EFR-ROM and the TR-ROM are sensitive to the relaxation parameter. (vii) The optimal range for the filter radius and the effect of the relaxation parameter are similar for the two $Re_{\tau}$ values.
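To make the filtering and relaxation ideas in the abstract concrete, here is a minimal sketch rather than the paper's implementation. It assumes an orthonormal ROM basis (so the ROM mass matrix is the identity), a symmetric positive semidefinite ROM stiffness matrix `S`, and the simplest form of a time-relaxation term, $-\chi (a - \bar{a})$. The function names, the toy diagonal `S`, and the parameter values are illustrative assumptions.

```python
import numpy as np

def differential_filter(a, S, delta):
    """Solve (I + delta^2 S) a_bar = a for the filtered ROM coefficients.

    Assumes an orthonormal basis (identity mass matrix); delta is the filter radius.
    """
    n = a.shape[0]
    return np.linalg.solve(np.eye(n) + delta**2 * S, a)

def time_relaxation_term(a, S, delta, chi):
    """Relaxation term -chi * (a - a_bar); chi is the relaxation parameter.

    It damps only the part of a that the filter removes.
    """
    return -chi * (a - differential_filter(a, S, delta))

# Toy example (assumed): diagonal stiffness whose entries grow with the mode index,
# so higher modes stand in for the marginally resolved scales.
N = 6
S = np.diag(np.arange(1, N + 1, dtype=float) ** 2)
a = np.ones(N)

a_bar = differential_filter(a, S, delta=0.5)
relax = time_relaxation_term(a, S, delta=0.5, chi=10.0)
```

In this toy setting the filter attenuates higher modes more strongly, so the relaxation term acts mostly on the highest retained modes and leaves the lowest ones nearly untouched.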

Related content

Performer · Inference · Distillation · Computational cost · Learning ·

February 9, 2024

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Zheng Xiong,Risto Vuorio,Jacob Beck,Matthieu Zimmer,Kun Shao,Shimon Whiteson

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
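The two components described in this abstract can be illustrated with a minimal sketch, not the authors' code. It assumes a single linear map as the hypernetwork, a two-layer tanh MLP as the generated policy, and a mean-squared-error behavior-cloning loss as the distillation objective. All dimensions and names (`generate_policy`, `mlp_policy`, `distill_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions for observations, hidden units, actions, morphology embedding.
OBS_DIM, HIDDEN, ACT_DIM, MORPH_DIM = 8, 16, 4, 5
N_PARAMS = OBS_DIM * HIDDEN + HIDDEN + HIDDEN * ACT_DIM + ACT_DIM

# Hypernetwork (assumed linear here): morphology embedding -> flat MLP parameters.
H = rng.normal(scale=0.1, size=(MORPH_DIM, N_PARAMS))

def generate_policy(morph):
    """Generate robot-specific MLP parameters from a morphology embedding."""
    flat = morph @ H
    i = 0
    W1 = flat[i:i + OBS_DIM * HIDDEN].reshape(OBS_DIM, HIDDEN)
    i += OBS_DIM * HIDDEN
    b1 = flat[i:i + HIDDEN]
    i += HIDDEN
    W2 = flat[i:i + HIDDEN * ACT_DIM].reshape(HIDDEN, ACT_DIM)
    i += HIDDEN * ACT_DIM
    b2 = flat[i:i + ACT_DIM]
    return W1, b1, W2, b2

def mlp_policy(params, obs):
    """Run the small generated MLP; this is the only network needed per control step."""
    W1, b1, W2, b2 = params
    return np.tanh(np.tanh(obs @ W1 + b1) @ W2 + b2)

def distill_loss(student_action, teacher_action):
    """Assumed distillation objective: mean squared error against the teacher's action."""
    return float(np.mean((student_action - teacher_action) ** 2))

morph = rng.normal(size=MORPH_DIM)
params = generate_policy(morph)      # computed once per robot
obs = rng.normal(size=OBS_DIM)
action = mlp_policy(params, obs)     # computed at every step
```

The split mirrors the knowledge decoupling the abstract mentions: the hypernetwork carries knowledge shared across robots and runs once per morphology, while the cheap per-robot MLP runs at every step.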

Machine translation · Prompt · MoDELS · binary · Automator ·

February 8, 2024

A Prompt Response to the Demand for Automatic Gender-Neutral Translation

Beatrice Savoldi,Andrea Piergentili,Dennis Fucci,Matteo Negri,Luisa Bentivogli

from arxiv, Accepted at EACL 2024

Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models offer hitherto unforeseen possibilities, as they come with the distinct advantage of being versatile in various (sub)tasks when provided with explicit instructions. In this paper, we explore this potential to automate GNT by comparing MT with the popular GPT-4 model. Through extensive manual analyses, our study empirically reveals the inherent limitations of current MT systems in generating GNTs and provides valuable insights into the potential and challenges associated with prompting for neutrality.

Linear · Functional · Approximation · Regularization term · Approximation error ·

February 8, 2024

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

Semih Cayci,Niao He,R. Srikant

Natural policy gradient (NPG) methods with entropy regularization achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of $\tilde{O}(1/T)$ up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.

INFORMS · LORA · Model evaluation · Large language models · Extensibility ·

February 8, 2024

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Haotong Qin,Xudong Ma,Xingyu Zheng,Xiaoyang Li,Yang Zhang,Shouda Liu,Jie Luo,Xianglong Liu,Michele Magno

The LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to severely degrade and even fail to benefit from the finetuning of LoRA. This paper proposes a novel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate through information retention. The proposed IR-QLoRA mainly relies on two technologies derived from the perspective of unified information: (1) statistics-based Information Calibration Quantization allows the quantized parameters of the LLM to retain original information accurately; (2) finetuning-based Information Elastic Connection makes LoRA utilize elastic representation transformation with diverse information. Comprehensive experiments show that IR-QLoRA can significantly improve accuracy across LLaMA and LLaMA2 families under 2-4 bit-widths, e.g., 4-bit LLaMA-7B achieves 1.4% improvement on MMLU compared with the state-of-the-art methods. The significant performance gain requires only a tiny 0.31% additional time consumption, revealing the satisfactory efficiency of our IR-QLoRA. We highlight that IR-QLoRA enjoys excellent versatility, is compatible with various frameworks (e.g., NormalFloat and Integer quantization), and brings general accuracy gains. The code is available at //github.com/htqin/ir-qlora.

Activation function · Functional · Hidden layer · MoDELS · Networking ·

February 8, 2024

Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

Farhad Pourkamali-Anaraki,Tahamina Nasrin,Robert E. Jensen,Amy M. Peterson,Christopher J. Hansen

from arxiv, 7 figures

A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
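The distinction between shared and individual trainable parameters can be shown in a short sketch. It assumes the standard ELU and Softplus definitions with one trainable shape parameter each (`alpha` and `beta`). The function names and toy values are illustrative, and no training loop is shown.

```python
import numpy as np

def adaptive_elu(x, alpha):
    """ELU with shape parameter alpha: x if x > 0, else alpha * (exp(x) - 1)."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def adaptive_softplus(x, beta):
    """Softplus with shape parameter beta > 0: log(1 + exp(beta * x)) / beta."""
    return np.logaddexp(0.0, beta * x) / beta

# One sample passing through a hidden layer with 4 units (toy pre-activations).
x = np.array([[-2.0, -1.0, 0.5, 2.0]])

shared_alpha = 1.0                                   # one parameter for the whole layer
individual_alpha = np.array([0.5, 1.0, 1.5, 2.0])    # one parameter per hidden unit

out_shared = adaptive_elu(x, shared_alpha)
out_individual = adaptive_elu(x, individual_alpha)   # broadcasts across units
```

With individual parameters, each hidden unit can learn its own negative saturation level, which is the extra flexibility the abstract credits for the improved predictions.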

Integration · MoDELS · WMI · Weight · AI ·

February 7, 2024

A Unified Framework for Probabilistic Verification of AI Systems via Weighted Model Integration

Paolo Morettin,Andrea Passerini,Roberto Sebastiani

The probabilistic formal verification (PFV) of AI systems is in its infancy. So far, approaches have been limited to ad-hoc algorithms for specific classes of models and/or properties. We propose a unifying framework for the PFV of AI systems based on Weighted Model Integration (WMI), which allows the problem to be framed in very general terms. Crucially, this reduction enables the verification of many properties of interest, like fairness, robustness or monotonicity, over a wide range of machine learning models, without making strong distributional assumptions. We support the generality of the approach by solving multiple verification tasks with a single, off-the-shelf WMI solver, then discuss the scalability challenges and research directions related to this promising framework.

Bayesian network · Analysis · MoDELS · Large language models · Examples ·

February 7, 2024

A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models

Marc Braun,Jenny Kunz

The self-rationalising capabilities of LLMs are appealing because the generated explanations can give insights into the plausibility of the predictions. However, how faithful the explanations are to the predictions is questionable, raising the need to explore the patterns behind them further. To this end, we propose a hypothesis-driven statistical framework. We use a Bayesian network to implement a hypothesis about how a task (in our example, natural language inference) is solved, and its internal states are translated into natural language with templates. Those explanations are then compared to LLM-generated free-text explanations using automatic and human evaluations. This allows us to judge how similar the LLM's and the Bayesian network's decision processes are. We demonstrate the usage of our framework with an example hypothesis and two realisations in Bayesian networks. The resulting models do not exhibit a strong similarity to GPT-3.5. We discuss the implications of this as well as the framework's potential to approximate LLM decisions better in future work.

Optimizer · Statistic · Normalized · Discretization · Precision/Accuracy ·

February 7, 2024

Asymptotic Dynamics of Alternating Minimization for Non-Convex Optimization

Koki Okajima,Takashi Takahashi

from arxiv, 19 pages, 8 figures

This study investigates the asymptotic dynamics of alternating minimization applied to optimize a bilinear non-convex function with normally distributed covariates. We employ the replica method from statistical physics in a multi-step approach to precisely trace the algorithm's evolution. Our findings indicate that the dynamics can be described effectively by a two-dimensional discrete stochastic process, where each step depends on all previous time steps, revealing a memory dependency in the procedure. The theoretical framework developed in this work is broadly applicable for the analysis of various iterative algorithms, extending beyond the scope of alternating minimization.
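For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of alternating minimization on a bilinear least-squares objective. The specific model, $y_i = u^\top X_i v$ with Gaussian matrices $X_i$, is an assumption chosen for illustration and may differ from the setting studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5

# Assumed toy model: noiseless bilinear observations with Gaussian covariates.
X = rng.normal(size=(n, d, d))
u_true, v_true = rng.normal(size=d), rng.normal(size=d)
y = np.einsum("i,nij,j->n", u_true, X, v_true)

def loss(u, v):
    """Half mean squared error of the bilinear model; non-convex jointly in (u, v)."""
    pred = np.einsum("i,nij,j->n", u, X, v)
    return 0.5 * np.mean((y - pred) ** 2)

u, v = rng.normal(size=d), rng.normal(size=d)
losses = [loss(u, v)]
for _ in range(20):
    # With v fixed, the problem is linear least squares in u.
    A_u = np.einsum("nij,j->ni", X, v)
    u = np.linalg.lstsq(A_u, y, rcond=None)[0]
    # With u fixed, the problem is linear least squares in v.
    A_v = np.einsum("i,nij->nj", u, X)
    v = np.linalg.lstsq(A_v, y, rcond=None)[0]
    losses.append(loss(u, v))
```

Because each half-step exactly minimizes the loss over one block of variables, the recorded loss never increases, even though the joint problem is non-convex and convergence to the true factors is not guaranteed in general.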

Neural Networks · Networking · Reducible · Continuity · Inference ·

June 21, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami,Sehoon Kim,Zhen Dong,Zhewei Yao,Michael W. Mahoney,Kurt Keutzer

from arxiv, Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
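The core question in this abstract, mapping real values onto a small discrete set, can be illustrated with a minimal sketch of symmetric uniform quantization. This is one common scheme among the many such a survey covers. The function names and the 4-bit toy example are assumptions for illustration.

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Map real values to signed integers in [-qmax, qmax] with a single scale.

    Codes are stored as int8 here, so this sketch assumes num_bits <= 8.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from the integer codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_symmetric(w, num_bits=4)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))   # bounded by about scale / 2
```

The rounding error per value is at most half the step size `scale`, which makes the trade-off explicit: fewer bits mean a larger step and therefore a larger worst-case error.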

Named entity recognition · entity · Learned · Deep learning · Identifiable ·

March 13, 2020

A Survey on Deep Learning for Named Entity Recognition

Jing Li,Aixin Sun,Jianglei Han,Chenliang Li

from arxiv, 20 pages, 12 figures, 3 tables. arXiv admin note: text overlap with arXiv:1702.02098, arXiv:1904.10503 by other authors

Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, organization, etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.