
The meaning of a complex phrase in natural language is composed from the meanings of its individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of known components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods on compositional generalization. We find that they struggle with complex compositional questions due to cumulative errors over long reasoning chains and the intricate logic required for tool-making. Consequently, we propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method enhances the effectiveness of tool creation and usage with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.
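
For illustration, the sketch below (ours, not the paper's code; all function and tool names are hypothetical) shows the general pattern of answering a compositional question by reusing small tools for its sub-questions and then composing them; in HTM the tools themselves would be produced by an LLM under human guidance.

```python
# Hypothetical sketch of tool-based decomposition for a compositional question.
# Hand-written functions stand in for LLM-generated tools to show the composition step.
from typing import Callable, Dict, List

TOOLS: Dict[str, Callable] = {}          # tool registry: sub-question type -> reusable tool

def register_tool(name: str):
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("filter_by_color")
def filter_by_color(objects: List[dict], color: str) -> List[dict]:
    return [o for o in objects if o["color"] == color]

@register_tool("count")
def count(objects: List[dict]) -> int:
    return len(objects)

def answer(plan: List[tuple], state):
    """Execute a plan: a list of (tool_name, kwargs), one entry per sub-question."""
    for tool_name, kwargs in plan:
        state = TOOLS[tool_name](state, **kwargs) if kwargs else TOOLS[tool_name](state)
    return state

scene = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
# "How many red objects are there?" decomposed into two sub-questions.
plan = [("filter_by_color", {"color": "red"}), ("count", None)]
print(answer(plan, scene))  # 2
```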

Related content

A rectangulation is a decomposition of a rectangle into finitely many rectangles. Via natural equivalence relations, rectangulations can be seen as combinatorial objects with a rich structure, with links to lattice congruences, flip graphs, polytopes, lattice paths, Hopf algebras, etc. In this paper, we first revisit the structure of the respective equivalence classes: weak rectangulations that preserve rectangle-segment adjacencies, and strong rectangulations that preserve rectangle-rectangle adjacencies. We thoroughly investigate posets defined by adjacency in rectangulations of both kinds, and unify and simplify known bijections between rectangulations and permutation classes. This yields a uniform treatment of mappings between permutations and rectangulations that unifies the results from earlier contributions, and emphasizes parallelism and differences between the weak and the strong cases. Then, we consider the special case of guillotine rectangulations, and prove that they can be characterized - under all known mappings between permutations and rectangulations - by avoidance of two mesh patterns that correspond to "windmills" in rectangulations. This yields new permutation classes in bijection with weak guillotine rectangulations, and the first known permutation class in bijection with strong guillotine rectangulations. Finally, we address enumerative issues and prove asymptotic bounds for several families of strong rectangulations.
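
For readers unfamiliar with the guillotine property, the short sketch below (not from the paper) implements its recursive definition directly on coordinate tuples: a rectangulation is guillotine if some full horizontal or vertical cut splits it into two guillotine parts, and the "windmill" of five rectangles is the smallest configuration where this fails.

```python
# Minimal sketch of the recursive definition of a guillotine rectangulation.
# Rectangles are axis-aligned tuples (x1, y1, x2, y2) tiling their bounding box.
def is_guillotine(rects):
    if len(rects) <= 1:
        return True
    xs = sorted({r[0] for r in rects} | {r[2] for r in rects})
    ys = sorted({r[1] for r in rects} | {r[3] for r in rects})
    for x in xs[1:-1]:                                   # interior vertical cut candidates
        if all(r[2] <= x or r[0] >= x for r in rects):   # no rectangle straddles the cut
            left = [r for r in rects if r[2] <= x]
            right = [r for r in rects if r[0] >= x]
            if is_guillotine(left) and is_guillotine(right):
                return True
    for y in ys[1:-1]:                                   # interior horizontal cut candidates
        if all(r[3] <= y or r[1] >= y for r in rects):
            below = [r for r in rects if r[3] <= y]
            above = [r for r in rects if r[1] >= y]
            if is_guillotine(below) and is_guillotine(above):
                return True
    return False

# The five-rectangle "windmill" is the smallest non-guillotine rectangulation.
windmill = [(0, 0, 2, 1), (2, 0, 3, 2), (1, 2, 3, 3), (0, 1, 1, 3), (1, 1, 2, 2)]
print(is_guillotine(windmill))                     # False
print(is_guillotine([(0, 0, 1, 2), (1, 0, 2, 2)])) # True: one vertical cut suffices
```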

Lax-Wendroff Flux Reconstruction (LWFR) is a single-stage, high-order, quadrature-free method for solving hyperbolic conservation laws. We perform a cell-average decomposition of the LWFR scheme similar to the one used in the admissibility-preserving framework of Zhang and Shu (2010). By applying flux limiting to the time-averaged numerical flux, the decomposition is used to obtain an admissibility-preserving LWFR scheme. The admissibility-preservation framework is further extended to a newly proposed extension of the LWFR scheme for conservation laws with source terms; this is the first extension of the high-order Lax-Wendroff scheme that can handle source terms. Admissibility and accuracy are verified by numerical experiments on the Ten Moment equations of Livermore et al.
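
A much-simplified scalar illustration of the flux-limiting idea is sketched below (our sketch, assuming linear advection, a periodic grid, and a single global blending parameter; the actual LWFR scheme limits the time-averaged flux per interface): blend the high-order flux with a first-order Rusanov flux and choose the largest blending factor that keeps the updated cell averages admissible, here non-negative.

```python
import numpy as np

def rusanov(uL, uR, f, a):
    """First-order Rusanov (local Lax-Friedrichs) flux; positivity preserving under a CFL condition."""
    return 0.5 * (f(uL) + f(uR)) - 0.5 * a * (uR - uL)

def limited_update(u, f, a, lam, FH):
    """One finite-volume step on a periodic grid with blended flux F = theta*FH + (1-theta)*FL."""
    FL = rusanov(u, np.roll(u, -1), f, a)        # low-order flux at interface i+1/2
    def step(theta):
        F = theta * FH + (1.0 - theta) * FL
        return u - lam * (F - np.roll(F, 1))     # u_i - lam*(F_{i+1/2} - F_{i-1/2})
    if step(1.0).min() >= 0.0:                   # high-order update already admissible
        return step(1.0)
    lo, hi = 0.0, 1.0                            # bisect for the largest admissible theta
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if step(mid).min() >= 0.0 else (lo, mid)
    return step(lo)

# Linear advection u_t + u_x = 0; a deliberately non-monotone central flux plays the "high-order" flux.
n, lam = 100, 0.4
u = np.where(np.abs(np.linspace(0.0, 1.0, n, endpoint=False) - 0.5) < 0.1, 1.0, 0.0)
f = lambda q: q
FH = 0.5 * (f(u) + f(np.roll(u, -1)))
u_new = limited_update(u, f, a=1.0, lam=lam, FH=FH)
print(u_new.min() >= 0.0)                        # True: limited update stays non-negative
```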

This article presents factor copula approaches to model the temporal dependence of non-Gaussian (continuous or discrete) longitudinal data. Factor copula models are canonical vine copulas that explain the underlying dependence structure of multivariate data through latent variables, and can therefore be easily interpreted and applied to unbalanced longitudinal data. We develop regression models for continuous, binary, and ordinal longitudinal data including covariates, using factor copula constructions with subject-specific latent variables. By assuming homogeneous within-subject dependence, our proposed models allow for feasible parametric inference in moderate- to high-dimensional situations, using the two-stage inference functions for margins (IFM) estimation method. We assess the finite-sample performance of the proposed models with extensive simulation studies. In the empirical analysis, the proposed models are applied to different longitudinal responses from two real-world data sets. Moreover, we compare these models with some widely used random-effect models using standard model-selection techniques and find substantial improvements. Our studies suggest that factor copula models can be good alternatives to random-effect models and can provide better insight into the temporal dependence of longitudinal data of arbitrary nature.
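
As a rough illustration of the construction (not the authors' code; the margin, covariate, and parameter values are hypothetical), the sketch below simulates binary longitudinal responses from a one-factor Gaussian copula in which all measurements of a subject share a subject-specific latent variable, inducing homogeneous within-subject dependence.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_factor_copula(n_subjects=200, n_times=5, rho=0.6, beta=(-0.5, 1.0)):
    t = np.tile(np.arange(n_times), (n_subjects, 1)) / (n_times - 1)  # time covariate
    V = rng.standard_normal((n_subjects, 1))                          # subject-specific latent factor
    e = rng.standard_normal((n_subjects, n_times))                    # idiosyncratic noise
    Z = rho * V + np.sqrt(1 - rho**2) * e                             # one-factor structure, Var(Z)=1
    U = norm.cdf(Z)                                                   # copula uniforms
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * t)))                # marginal success probability
    Y = (U < p).astype(int)                                           # binary longitudinal response
    return t, Y

t, Y = simulate_factor_copula()
print(Y.shape, Y.mean())  # (200, 5); mean close to the average marginal probability
```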

Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where a speech recognition system generates a transcription that is subsequently passed to a translation model. While there has been growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively establish the advantages of integrating the acoustic signal directly into the translation process. This work proposes using contrastive evaluation to quantitatively measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role. Specifically, we evaluate Korean-English translation systems on a test set containing wh-phrases, for which prosodic features are necessary to produce a translation with the correct intent, whether it is a statement, a yes/no question, a wh-question, or another intent. Our results clearly demonstrate the value of direct translation systems over cascade models, with a notable 12.9% improvement in overall accuracy on ambiguous cases, along with up to a 15.6% increase in F1 score for one of the major intent categories. To the best of our knowledge, this is the first work to provide quantitative evidence that direct S2TT models can effectively leverage prosody. The code for our evaluation is openly available.
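
The contrastive-evaluation protocol can be summarized by the toy sketch below (our illustration; the label set and examples are hypothetical, not the paper's test set): each ambiguous utterance carries a gold intent, each system's translation is mapped to a predicted intent, and systems are compared by accuracy and per-intent F1.

```python
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, label):
    tp = sum(g == p == label for g, p in zip(gold, pred))
    fp = sum(p == label and g != label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Gold intents of ambiguous utterances and the intents recovered from each system's translation.
gold    = ["wh-question", "statement", "yn-question", "wh-question"]
direct  = ["wh-question", "statement", "yn-question", "statement"]
cascade = ["statement",   "statement", "yn-question", "statement"]

for name, pred in [("direct", direct), ("cascade", cascade)]:
    print(name, accuracy(gold, pred), f1(gold, pred, "wh-question"))
```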

Random probabilities are a key component of many nonparametric methods in Statistics and Machine Learning. To quantify comparisons between different laws of random probabilities, several works have started to use the elegant Wasserstein-over-Wasserstein distance. In this paper we prove that the infinite-dimensionality of the space of probabilities drastically deteriorates its sample complexity: the distance can be estimated from samples only at a rate slower than any polynomial in the sample size. We therefore propose a new distance that preserves many desirable properties of the former while achieving a parametric rate of convergence. In particular, our distance 1) metrizes weak convergence; 2) can be estimated numerically from samples with low complexity; 3) can be bounded analytically from above and below. The main ingredient is integral probability metrics, which lead to the name hierarchical IPM.
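
For orientation, the display below gives standard definitions of the two kinds of distances discussed: the Wasserstein-over-Wasserstein distance between laws of random probability measures, and a generic integral probability metric. The specific function class defining the proposed hierarchical IPM is not spelled out in the abstract, so this is only a template.

```latex
% Generic definitions, for orientation only. For laws P, Q of random
% probability measures on a metric space (X, d):
\[
  \mathrm{WoW}_p(P, Q)
  = \Big( \inf_{\gamma \in \Gamma(P, Q)}
      \int_{\mathcal{P}(X) \times \mathcal{P}(X)}
        W_p(\mu, \nu)^p \, \mathrm{d}\gamma(\mu, \nu) \Big)^{1/p},
  \qquad
  d_{\mathcal{F}}(P, Q)
  = \sup_{f \in \mathcal{F}}
    \Big| \int f \, \mathrm{d}P - \int f \, \mathrm{d}Q \Big|,
\]
% where W_p is the Wasserstein distance on \mathcal{P}(X), \Gamma(P, Q) is the
% set of couplings of P and Q, and f ranges over a class of functions on \mathcal{P}(X).
```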

AI-enabled synthetic biology has tremendous potential but also significantly increases biorisks and brings about a new set of dual-use concerns. The picture is complicated given the vast innovations envisioned to emerge from combining emerging technologies, as AI-enabled synthetic biology potentially scales up bioengineering into industrial biomanufacturing. However, the literature review indicates that goals such as maintaining a reasonable scope for innovation, or, more ambitiously, fostering a huge bioeconomy, do not necessarily conflict with biosafety; rather, the two need to go hand in hand. This paper presents a literature review of the issues and describes emerging frameworks for policy and practice that traverse the options of command-and-control, stewardship, bottom-up, and laissez-faire governance. Early-warning systems that enable prevention and mitigation of future AI-enabled biohazards, whether arising in the lab, from deliberate misuse, or in the public realm, will constantly need to evolve, and adaptive, interactive approaches should emerge. Although biorisk is subject to an established governance regime, and scientists generally adhere to biosafety protocols, even experimental but legitimate use by scientists could lead to unexpected developments. Recent advances in chatbots enabled by generative AI have revived fears that advanced biological insight can more easily get into the hands of malignant individuals or organizations. Given these issues, society needs to rethink how AI-enabled synthetic biology should be governed. We suggest visualizing the challenge at hand as whack-a-mole governance, although the emerging solutions are perhaps not so different either.

This paper investigates the distinctions between gradient methods applied to non-differentiable functions (NGDMs) and classical gradient descents (GDs) designed for differentiable functions. First, we demonstrate significant differences in the convergence properties of NGDMs compared to GDs, challenging the applicability of the extensive neural network convergence literature based on $L$-smoothness to non-smooth neural networks. Next, we demonstrate the paradoxical nature of NGDM solutions for $L_{1}$-regularized problems, showing that increasing the regularization penalty leads to an increase in the $L_{1}$ norm of optimal solutions in NGDMs. Consequently, we show that widely adopted $L_{1}$ penalization-based techniques for network pruning do not yield expected results. Finally, we explore the Edge of Stability phenomenon, indicating its inapplicability even to Lipschitz continuous convex differentiable functions, leaving its relevance to non-convex non-differentiable neural networks inconclusive. Our analysis exposes misguided interpretations of NGDMs in widely referenced papers and texts due to an overreliance on strong smoothness assumptions, emphasizing the necessity for a nuanced understanding of foundational assumptions in the analysis of these systems.
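
The toy sketch below (ours, not the paper's analysis) illustrates one facet of the phenomenon in the simplest setting: plain (sub)gradient descent on an $L_{1}$-regularized least-squares objective, which is what autodiff frameworks effectively run, does not produce exactly sparse iterates, whereas the proximal (ISTA) update does.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam_reg, lr, steps = 50, 20, 0.5, 0.01, 5000
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

def grad_smooth(w):                      # gradient of (1/2n) * ||Xw - y||^2
    return X.T @ (X @ w - y) / n

w_sub = np.zeros(d)                      # subgradient descent (sign(0) taken as 0)
w_ista = np.zeros(d)                     # proximal gradient descent (ISTA)
for _ in range(steps):
    w_sub -= lr * (grad_smooth(w_sub) + lam_reg * np.sign(w_sub))
    z = w_ista - lr * grad_smooth(w_ista)
    w_ista = np.sign(z) * np.maximum(np.abs(z) - lr * lam_reg, 0.0)  # soft threshold

print("exact zeros, subgradient:", int(np.sum(w_sub == 0.0)))   # typically 0
print("exact zeros, ISTA:      ", int(np.sum(w_ista == 0.0)))   # typically many
```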

We propose GrainGNN, a surrogate model for the evolution of polycrystalline grain structure under the rapid solidification conditions of metal additive manufacturing. High-fidelity simulations of solidification microstructures are typically performed using multicomponent partial differential equations (PDEs) with moving interfaces. The inherent randomness of the PDE initial conditions (grain seeds) necessitates ensemble simulations to predict microstructure statistics, e.g., grain size, aspect ratio, and crystallographic orientation. Currently, such ensemble simulations are prohibitively expensive, and surrogates are necessary. In GrainGNN, we use a dynamic graph to represent interface motion and topological changes due to grain coarsening. We use a reduced representation of the microstructure based on hand-crafted features, and we combine pattern-finding and graph-altering algorithms with two neural networks: a classifier (for topological changes) and a regressor (for interface motion). Both networks have an encoder-decoder architecture; the encoder has a multi-layer transformer long short-term memory architecture, and the decoder is a single-layer perceptron. We evaluate GrainGNN by comparing it with high-fidelity phase-field simulations for in-distribution and out-of-distribution grain configurations under laser powder bed fusion conditions. GrainGNN achieves 80\%--90\% pointwise accuracy and produces nearly identical distributions of scalar quantities of interest (QoI) to the phase-field simulations, as measured by the Kolmogorov-Smirnov test. GrainGNN's inference speedup (PyTorch on a single x86 CPU) over a high-fidelity phase-field simulation (CUDA on a single NVIDIA A100 GPU) is 150$\times$--2000$\times$ for a 100-initial-grain problem. Further, using GrainGNN, we model the formation of 11,600 grains in 220 seconds on a single CPU core.
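
The sketch below is only a schematic reading of the described design (dimensions, feature layout, and output heads are our hypothetical choices, not the paper's): an encoder combining transformer and LSTM layers over per-vertex features, feeding two single-layer decoders, a classifier for topological events and a regressor for interface motion.

```python
import torch
import torch.nn as nn

class GrainSurrogate(nn.Module):
    def __init__(self, feat_dim=16, d_model=64, n_heads=4, n_layers=2, n_event_classes=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.lstm = nn.LSTM(d_model, d_model, num_layers=n_layers, batch_first=True)
        self.classifier = nn.Linear(d_model, n_event_classes)  # topological-change logits per vertex
        self.regressor = nn.Linear(d_model, 2)                 # interface displacement per vertex

    def forward(self, x):
        # x: (batch, n_vertices, feat_dim) hand-crafted per-vertex features
        h = self.transformer(self.embed(x))
        h, _ = self.lstm(h)
        return self.classifier(h), self.regressor(h)

model = GrainSurrogate()
feats = torch.randn(8, 100, 16)                 # 8 graphs, 100 vertices, 16 features each
event_logits, displacement = model(feats)
print(event_logits.shape, displacement.shape)   # torch.Size([8, 100, 2]) for both heads
```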

Artificial neural networks thrive at solving classification problems for a particular, fixed task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge beyond the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and we qualitatively compare methods in terms of required memory, computation time, and storage.
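
The task-incremental setting can be sketched as below (our simplification, not a method from the survey): tasks arrive one at a time, and a crude quadratic penalty toward the previous task's weights stands in for regularization-based methods, with the penalty strength controlling the stability-plasticity trade-off.

```python
import torch
import torch.nn as nn

def train_sequence(model, task_loaders, lambda_reg=10.0, lr=1e-3, epochs=1):
    loss_fn = nn.CrossEntropyLoss()
    prev_params = None
    for loader in task_loaders:                       # tasks arrive sequentially
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                loss = loss_fn(model(x), y)           # plasticity: fit the current task
                if prev_params is not None:           # stability: stay near previous weights
                    loss = loss + lambda_reg * sum(
                        ((p - q) ** 2).sum()
                        for p, q in zip(model.parameters(), prev_params))
                opt.zero_grad(); loss.backward(); opt.step()
        prev_params = [p.detach().clone() for p in model.parameters()]
    return model

# Toy usage with random data standing in for two sequential tasks.
def toy_loader(n_classes=10):
    x, y = torch.randn(64, 32), torch.randint(0, n_classes, (64,))
    return [(x, y)]

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
train_sequence(model, [toy_loader(), toy_loader()])
```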

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
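
The central object in the benign-overfitting results surveyed here, the minimum-norm interpolating solution of an overparametrized linear regression, can be computed directly; the small sketch below (toy dimensions and data, for illustration only) shows that it fits noisy labels exactly while the test error remains controlled.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                   # many more parameters than samples
w_star = np.zeros(d); w_star[0] = 1.0            # simple signal in one coordinate
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)    # noisy labels

w_hat = np.linalg.pinv(X) @ y                    # minimum-l2-norm interpolator
X_test = rng.standard_normal((2000, d))
y_test = X_test @ w_star

print("train error:", np.mean((X @ w_hat - y) ** 2))            # essentially 0: noise is interpolated
print("test error :", np.mean((X_test @ w_hat - y_test) ** 2))  # fitting the noise adds little test error
```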
