Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, where an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for the importance of evaluating various aspects of model responses in multilingual instruction following and investigate the influence of different model configuration choices. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account in hyperparameter tuning and the IT data is large enough. English-trained LLMs are capable of generating correct-language, comprehensive and helpful responses in other languages, but suffer from low factuality and may occasionally have fluency errors.

Related content

Learning · Generalization theory · MoDELS · Transformation · Linear

June 4, 2024

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Tianyu He,Darshil Doshi,Aritra Das,Andrey Gromov

from arxiv, 21 pages, 19 figures

Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions $z = a \, x + b \, y \;\mathrm{mod}\; p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization phase is transient, necessitating early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing the highly structured representations in both phases, and discuss the learnt algorithm.
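The task family described in this abstract can be sketched in a few lines of Python. This is only an illustration of the setup as stated (tasks labeled by $(a, b) \in \mathbb{Z}_p^2$, some used for pre-training and the rest held out); the function names, the choice $p = 7$, the split size and the number of in-context examples are assumptions, not values from the paper.

```python
import itertools
import random


def make_tasks(p, n_pretrain, seed=0):
    """Split all task vectors (a, b) in Z_p^2 into pre-training and held-out sets."""
    tasks = list(itertools.product(range(p), repeat=2))
    rng = random.Random(seed)
    rng.shuffle(tasks)
    return tasks[:n_pretrain], tasks[n_pretrain:]


def sample_sequence(task, p, n_examples, rng):
    """Return in-context examples (x, y, z) with z = (a*x + b*y) mod p."""
    a, b = task
    seq = []
    for _ in range(n_examples):
        x, y = rng.randrange(p), rng.randrange(p)
        seq.append((x, y, (a * x + b * y) % p))
    return seq


# Hypothetical settings chosen only for illustration.
p = 7
pretrain_tasks, ood_tasks = make_tasks(p, n_pretrain=30)
rng = random.Random(1)
context = sample_sequence(pretrain_tasks[0], p, n_examples=4, rng=rng)
```

A model trained on sequences from `pretrain_tasks` would then be evaluated on sequences drawn from `ood_tasks`, whose $(a, b)$ vectors it has never seen.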

Discretization · MASS · MoDELS · Numerical analysis

June 4, 2024

Structure-preserving semi-convex-splitting numerical scheme for a Cahn-Hilliard cross-diffusion system in lymphangiogenesis

Ansgar Jüngel,Boyi Wang

A fully discrete semi-convex-splitting finite-element scheme with stabilization for a Cahn-Hilliard cross-diffusion system is analyzed. The system consists of parabolic fourth-order equations for the volume fraction of the fiber phase and solute concentration, modeling pre-patterning of lymphatic vessel morphology. The existence of discrete solutions is proved, and it is shown that the numerical scheme is energy stable up to stabilization, conserves the solute mass, and preserves the lower and upper bounds of the fiber phase fraction. Numerical experiments in two space dimensions using FreeFEM illustrate the phase segregation and pattern formation.

Graph · Attention · MoDELS · state-of-the-art · SimPLe

June 4, 2024

GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon

Sanhita Pathak,Vinay Kaushik,Brejesh Lall

from arxiv, 18 pages, 7 Figures and 6 Tables

Virtual try-on, a rapidly evolving field in computer vision, is transforming e-commerce by improving customer experiences through precise garment warping and seamless integration onto the human body. Existing methods such as TPS and flow address garment warping but overlook the finer contextual details. In this paper, we introduce a novel graph based warping technique which emphasizes the value of context in garment flow. Our graph based warping module generates a warped garment as well as a coarse person image, which is utilised by a simple refinement network to give a coarse virtual tryon image. The proposed work exploits a latent diffusion model to generate the final tryon, treating garment transfer as an inpainting task. The diffusion model is conditioned with decoupled cross attention based inversion of visual and textual information. We introduce an occlusion aware warping constraint that generates a dense warped garment without holes or occlusion. Our method, validated on the VITON-HD and Dresscode datasets, achieves state-of-the-art qualitative and quantitative results, with considerable improvement in garment warping, texture preservation, and overall realism.

Dataset · MoDELS · Model evaluation · Language modeling · Twitter

June 4, 2024

A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages

Saminu Mohammad Aliyu,Gregory Maksha Wajiga,Muhammad Murtala

from arxiv, 9 pages

The proliferation of online offensive language necessitates the development of effective detection mechanisms, especially in multilingual contexts. This study addresses the challenge by developing and introducing novel datasets for offensive language detection in three major Nigerian languages: Hausa, Yoruba, and Igbo. We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers. We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%. To further support research in offensive language detection, we plan to make the dataset and our models publicly available.

Branch · Hidden Markov model · Markov · MoDELS · Linear

June 3, 2024

An efficient solution to Hidden Markov Models on trees with coupled branches

Farzan Vafa,Sahand Hormoz

from arxiv, 24 + 6 pages, 5 figures

Hidden Markov Models (HMMs) are powerful tools for modeling sequential data, where the underlying states evolve in a stochastic manner and are only indirectly observable. Traditional HMM approaches are well-established for linear sequences, and have been extended to other structures such as trees. In this paper, we extend the framework of HMMs on trees to address scenarios where the tree-like structure of the data includes coupled branches -- a common feature in biological systems where entities within the same lineage exhibit dependent characteristics. We develop a dynamic programming algorithm that efficiently solves the likelihood, decoding, and parameter learning problems for tree-based HMMs with coupled branches. Our approach scales polynomially with the number of states and nodes, making it computationally feasible for a wide range of applications, and it does not suffer from the underflow problem. We demonstrate our algorithm by applying it to simulated data and propose self-consistency checks for validating the assumptions of the model used for inference. This work not only advances the theoretical understanding of HMMs on trees but also provides a practical tool for analyzing complex biological data where dependencies between branches cannot be ignored.
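To make the underflow problem mentioned in this abstract concrete, here is a minimal sketch of the standard scaled forward algorithm for an ordinary linear-chain HMM. It is not the authors' tree algorithm with coupled branches; it only shows the common rescaling idea on the simplest case. The two-state parameters `pi`, `A` and `B` are made-up values for illustration.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward pass for a linear-chain HMM; returns log P(obs).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        # Rescale so alpha sums to 1 and accumulate the log of the scale.
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik


# Hypothetical two-state model, used only to exercise the function.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
long_obs = [0, 1] * 5000
log_lik = forward_log_likelihood(pi, A, B, long_obs)
```

For the 10,000-symbol sequence above, the raw likelihood is far below the smallest positive double, so an unscaled product would collapse to zero, while the accumulated log-likelihood stays finite.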

Saccade · Less · CASES · Multimodal · Model evaluation

June 3, 2024

Evidence for five types of fixation during a random saccade eye tracking task: Implications for the study of oculomotor fatigue

Lee Friedman,Oleg V. Komogortsev

from arxiv, 23 pages, 19 figures

Our interest was to evaluate changes in fixation duration as a function of time-on-task (TOT) during a random saccade task. We employed a large, publicly available dataset. The frequency histogram of fixation durations was multimodal and modelled as a Gaussian mixture. We found five fixation types. The "ideal" response would be a single accurate saccade after each target movement, with a typical saccade latency of 200-250 msec, followed by a long fixation (> 800 msec) until the next target jump. We found fixations like this, but they comprised only 10% of all fixations and were the first fixation after target movement only 23.4% of the time. More frequently (57.4% of the time), the first fixation after target movement was short (117.7 msec mean) and was commonly followed by a corrective saccade. Across the entire 100 sec of the task, median total fixation duration decreased. This decrease was approximated with a power law fit with R^2=0.94. A detailed examination of the frequency of each of our five fixation types over TOT revealed that the three shortest duration fixation types became more and more frequent with TOT whereas the two longest fixations became less and less frequent. In all cases, the changes over TOT followed power law relationships, with R^2 values between 0.73 and 0.93. We concluded that, over the 100 second duration of our task, long fixations are common in the first 15 to 22 seconds but become less common after that. Short fixations are relatively uncommon in the first 15 to 22 seconds but become more and more common as the task progresses. Apparently, the ability to produce an ideal response, although somewhat likely in the first 22 seconds, rapidly declines. This might be related to a noted decline in saccade accuracy over time.
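A power law fit with an R^2 value, as reported in this abstract, can be sketched as a least-squares line in log-log space. The abstract does not say how the authors computed their fits, so the log-log regression here is an assumption, and the data below are synthetic numbers invented for the example, not values from the study.

```python
import math


def fit_power_law(t, y):
    """Fit y = c * t**k by least squares on (log t, log y).

    Returns (c, k, r2), where r2 is the coefficient of determination
    computed in log space.
    """
    lx = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx
    log_c = my - k * mx
    ss_res = sum((b - (log_c + k * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return math.exp(log_c), k, 1.0 - ss_res / ss_tot


# Synthetic, noise-free example: duration decays as 400 * t**(-0.1).
t = [1, 2, 5, 10, 20, 50, 100]
y = [400 * v ** -0.1 for v in t]
c, k, r2 = fit_power_law(t, y)
```

With real, noisy measurements the same function would return an R^2 below 1, which is how values such as 0.94 arise.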

Integration · Identical · Mutually independent · Example · Exact

June 1, 2024

Shadow Hamiltonians of structure-preserving integrators for Nambu mechanics

Atsushi Horikoshi

from arxiv, 14 pages, 4 figures, final version

Symplectic integrators are widely implemented numerical integrators for Hamiltonian mechanics, which preserve the Hamiltonian structure (symplecticity) of the system. Although the symplectic integrator does not conserve the energy of the system, it is well known that there exists a conserving modified Hamiltonian, called the shadow Hamiltonian. For the Nambu mechanics, which is a kind of generalized Hamiltonian mechanics, we can also construct structure-preserving integrators by the same procedure used to construct the symplectic integrators. In the structure-preserving integrator, however, the existence of shadow Hamiltonians is nontrivial. This is because the Nambu mechanics is driven by multiple Hamiltonians and it is nontrivial whether the time evolution by the integrator can be cast into the Nambu mechanical time evolution driven by multiple shadow Hamiltonians. In this paper we present a general procedure to calculate the shadow Hamiltonians of structure-preserving integrators for Nambu mechanics, and give an example where the shadow Hamiltonians exist. This is the first attempt to determine the concrete forms of the shadow Hamiltonians for a Nambu mechanical system. We show that the fundamental identity, which corresponds to the Jacobi identity in Hamiltonian mechanics, plays an important role in calculating the shadow Hamiltonians using the Baker-Campbell-Hausdorff formula. It turns out that the resulting shadow Hamiltonians have indefinite forms depending on how the fundamental identities are used. This is not a technical artifact, because the exact shadow Hamiltonians obtained independently have the same indefiniteness.
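The statement that a symplectic integrator does not conserve the energy but does conserve a modified quantity can be checked on the simplest Hamiltonian case. The sketch below is ordinary Hamiltonian mechanics, not Nambu mechanics, and is not taken from the paper: it applies the symplectic Euler method to the harmonic oscillator $H = (p^2 + q^2)/2$, for which direct substitution shows that $(p^2 + q^2)/2 - h\,p\,q/2$ is conserved exactly by the update. The step size and initial condition are arbitrary choices.

```python
def symplectic_euler(q, p, h, n_steps):
    """Symplectic Euler for H = (p^2 + q^2) / 2; returns the trajectory."""
    traj = [(q, p)]
    for _ in range(n_steps):
        p = p - h * q  # kick: uses dH/dq = q at the old position
        q = q + h * p  # drift: uses dH/dp = p at the updated momentum
        traj.append((q, p))
    return traj


def energy(q, p):
    """Original Hamiltonian of the harmonic oscillator."""
    return 0.5 * (p * p + q * q)


def modified_energy(q, p, h):
    """Quantity conserved exactly by the symplectic Euler update above."""
    return 0.5 * (p * p + q * q) - 0.5 * h * p * q


h = 0.1
traj = symplectic_euler(1.0, 0.0, h, 1000)
energies = [energy(q, p) for q, p in traj]
modified = [modified_energy(q, p, h) for q, p in traj]
energy_spread = max(energies) - min(energies)
modified_spread = max(modified) - min(modified)
```

Here `energy_spread` stays bounded but is clearly nonzero, while `modified_spread` is at the level of floating-point roundoff.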

MoDELS · Vision · Inference · Controller · Learning

May 31, 2024

Amortizing intractable inference in diffusion models for vision, language, and control

Siddarth Venkatraman,Moksh Jain,Luca Scimeca,Minsu Kim,Marcin Sendera,Mohsin Hasan,Luke Rowe,Sarthak Mittal,Pablo Lemos,Emmanuel Bengio,Alexandre Adam,Jarrid Rector-Brooks,Yoshua Bengio,Glen Berseth,Nikolay Malkin

from arxiv, Code: //github.com/GFNOrg/diffusion-finetuning

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.

Lyapunov · MoDELS · Diversity · Transformation · Dataset

May 31, 2024

Cyclic image generation using chaotic dynamics

Takaya Tanaka,Yutaka Yamaguti

Successive image generation using cyclic transformations is demonstrated by extending the CycleGAN model to transform images among three different categories. Repeated application of the trained generators produces sequences of images that transition among the different categories. The generated image sequences occupy a more limited region of the image space compared with the original training dataset. Quantitative evaluation using precision and recall metrics indicates that the generated images have high quality but reduced diversity relative to the training dataset. Such successive generation processes are characterized as chaotic dynamics in terms of dynamical system theory. Positive Lyapunov exponents estimated from the generated trajectories confirm the presence of chaotic dynamics, with the Lyapunov dimension of the attractor found to be comparable to the intrinsic dimension of the training data manifold. The results suggest that chaotic dynamics in the image space defined by the deep generative model contribute to the diversity of the generated images, constituting a novel approach for multi-class image generation. This model can be interpreted as an extension of classical associative memory to perform hetero-association among image categories.
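As a reminder of what a positive Lyapunov exponent means, the sketch below estimates the exponent of a one-dimensional textbook system, the logistic map. It is unrelated to the CycleGAN setup in this abstract, where exponents are estimated from high-dimensional image trajectories; the map, the parameter $r = 4$, the initial condition and the iteration counts are illustrative assumptions.

```python
import math


def lyapunov_logistic(r, x0, n_steps, n_burn=100):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    Averages log|f'(x)| along the orbit, with f'(x) = r * (1 - 2x).
    """
    x = x0
    for _ in range(n_burn):  # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        deriv = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(deriv, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps


lam = lyapunov_logistic(4.0, 0.2, 20000)
```

For $r = 4$ the exact exponent is $\ln 2 \approx 0.693$, so the estimate should be positive and close to that value, the signature of chaotic dynamics.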

Knowledge · INFORMS · Language representation · MoDELS · Extensibility

July 28, 2022

MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base

Hui Li,Xuekang Yang,Xin Zhao,Lin Yu,Jiping Zheng,Wei Sun

from arxiv, 11 pages, 6 figures

Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, the factual information contained in the input sentences has not been fully mined, and the external knowledge to be injected has not been strictly checked. As a result, the context information cannot be fully exploited, and either extra noise is introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu and introduces a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.