The main limiting factor in the development of robust multilingual dialogue evaluation metrics is the lack of multilingual data and the limited availability of open-source multilingual dialogue systems. In this work, we propose a workaround for this lack of data by leveraging a strong multilingual pretrained LLM and augmenting existing English dialogue data using Machine Translation. We empirically show that the naive approach of finetuning a pretrained multilingual encoder model with translated data is insufficient to outperform the strong baseline of finetuning a multilingual model with only source data. Instead, the best approach is to carefully curate the translated data using MT Quality Estimation metrics, excluding low-quality translations that hinder performance.
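A minimal sketch of the curation step, assuming a reference-free quality-estimation scorer; `qe_score` and the threshold are illustrative placeholders, not the paper's exact metric or cutoff:

```python
# A minimal sketch of QE-based data curation; `qe_score` stands in for any
# reference-free MT quality-estimation metric, and the threshold is an
# illustrative assumption.
def curate_translations(examples, qe_score, threshold=0.75):
    """examples: (source_text, translated_text, label) triples.
    Keeps only translations whose estimated quality clears the threshold,
    so low-quality MT output never reaches finetuning."""
    curated = []
    for src, mt, label in examples:
        if qe_score(src, mt) >= threshold:    # reference-free QE of the translation
            curated.append((mt, label))       # finetune on curated target-language data
    return curated
```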
Topic segmentation is critical for obtaining structured long documents and for improving downstream tasks such as information retrieval. Because they can automatically learn cues of topic shift from large amounts of labeled data, recent supervised neural models have greatly advanced long-document topic segmentation, yet they leave the deeper relationship between semantic coherence and topic segmentation underexplored. This paper therefore enhances a supervised model's ability to capture coherence from both the structure and the similarity perspectives, via two objectives: Topic-aware Sentence Structure Prediction (TSSP) and Contrastive Semantic Similarity Learning (CSSL). Specifically, the TSSP task forces the model to comprehend structural information by learning the original relations of adjacent sentences in a disarrayed document, which is constructed by jointly disrupting the original document at the topic and sentence levels. In addition, we utilize inter- and intra-topic information to construct contrastive samples and design the CSSL objective to ensure that sentence representations within the same topic have higher semantic similarity, while those in different topics are less similar. Extensive experiments show that a Longformer with our approach significantly outperforms previous state-of-the-art (SOTA) methods. On WIKI-727K, our approach improves the $F_{1}$ of the previous SOTA by 3.42 (73.74 $\rightarrow$ 77.16) and reduces $P_{k}$ by 1.11 points (15.0 $\rightarrow$ 13.89); it also achieves an average $P_{k}$ reduction of 0.83 points on WikiSection. An average $P_{k}$ drop of 2.82 points on two out-of-domain datasets further illustrates the robustness of our approach.
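A minimal sketch of a CSSL-style contrastive objective over sentence embeddings; the margin and pairing scheme are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a contrastive objective that raises intra-topic similarity and
# lowers inter-topic similarity; margin and pairing are illustrative.
import torch
import torch.nn.functional as F

def cssl_loss(emb, topic_ids, margin=0.5):
    """emb: (N, d) sentence embeddings; topic_ids: (N,) topic labels.
    Pulls same-topic pairs together and pushes different-topic pairs apart
    in cosine-similarity space (assumes both pair types occur in the batch)."""
    emb = F.normalize(emb, dim=-1)
    sim = emb @ emb.T                                   # pairwise cosine similarities
    same = topic_ids.unsqueeze(0) == topic_ids.unsqueeze(1)
    diag = torch.eye(emb.size(0), dtype=torch.bool, device=emb.device)
    intra = sim[same & ~diag]                           # same-topic pairs (positives)
    inter = sim[~same]                                  # cross-topic pairs (negatives)
    return (1.0 - intra).mean() + F.relu(inter - margin).mean()
```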
Resource scheduling and allocation is a critical component of many high-impact systems, ranging from congestion control to cloud computing. Finding better solutions to these problems often yields significant savings in resources and time, reduces device wear-and-tear, and may even lower carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: that is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning (RL), a solution paradigm well suited to sequential decision-making problems that are amenable to planning and to combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce an RL agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.
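A schematic sketch of a single-player memory-mapping game in the spirit of the mallocGame; the state/action encoding and the toy cost model are simplifying assumptions, not the paper's actual implementation:

```python
# Single-player game: place each buffer in fast or slow memory; reward at
# episode end is negative runtime, so high-reward trajectories correspond
# to fast memory mappings. Everything below is an illustrative toy model.

def simulated_runtime(tensors, placements, slow_penalty=10.0):
    """Toy stand-in for a hardware cost model: buffers spilled to slow
    memory (action 1) cost more than buffers in fast memory (action 0)."""
    return sum(size * (slow_penalty if act else 1.0)
               for (size, _), act in zip(tensors, placements))

class MemoryMappingGame:
    def __init__(self, tensors, fast_capacity):
        self.tensors = tensors            # (size, lifetime) per buffer
        self.fast_capacity = fast_capacity
        self.placements = []

    def legal_actions(self):
        # 0 = place next buffer in fast memory (if it fits), 1 = spill.
        # Lifetimes are ignored here for brevity.
        size, _ = self.tensors[len(self.placements)]
        used = sum(s for (s, _), a in zip(self.tensors, self.placements) if a == 0)
        return [0, 1] if used + size <= self.fast_capacity else [1]

    def step(self, action):
        self.placements.append(action)
        done = len(self.placements) == len(self.tensors)
        reward = -simulated_runtime(self.tensors, self.placements) if done else 0.0
        return reward, done
```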
Mesh optimization procedures are generally a combination of node smoothing and discrete operations that affect a small number of elements in order to improve the quality of the overall mesh. These procedures are useful as a post-processing step in mesh generation and in applications such as fluid simulations with severely deforming domains. To perform high-order mesh optimization, these ingredients must also be extended to high-order (curved) meshes. In this work, we present a method for performing local element operations on curved meshes. The operations discussed are edge/face swaps, edge collapses, and edge splitting (more generally, refinement) for triangular and tetrahedral meshes. Each local operation is performed by first identifying the patch of elements containing the edge/face being acted on, performing the operation as a straight-sided one by placing the high-order nodes via an isoparametric mapping from the master element, and then smoothing the high-order nodes of the elements in the patch by minimizing a Jacobian-based high-order mesh distortion measure. Since the initial straight-sided guess obtained by placing the nodes via the isoparametric mapping frequently produces invalid elements, the distortion measure must be regularized to permit mesh untangling; otherwise the optimization cannot succeed. We present several examples in 2D and 3D to demonstrate these local operations and show how they can be combined with a high-order node smoothing procedure to maintain mesh quality under severe deformations.
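As one concrete illustration of a Jacobian-based distortion measure with regularization (a common choice in the high-order meshing literature, not necessarily the exact measure used here), for an element mapping with Jacobian $J \in \mathbb{R}^{d \times d}$:

\[
\eta(J) \;=\; \frac{\lVert J \rVert_F^2}{d\,\bigl(\det J\bigr)^{2/d}},
\qquad
\det J \;\longrightarrow\; \tfrac{1}{2}\Bigl(\det J + \sqrt{(\det J)^2 + \delta^2}\Bigr),
\quad \delta > 0.
\]

The regularized determinant stays positive even for inverted elements, keeping $\eta$ finite and differentiable there, which is what allows the optimizer to untangle an invalid initial configuration.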
Inference for high-dimensional hidden Markov models is challenging due to the exponential-in-dimension computational cost of the forward algorithm. To address this issue, we introduce an innovative composite likelihood approach called Simulation-Based Composite Likelihood (SimBa-CL). With SimBa-CL, we approximate the likelihood by the product of its marginals, which we estimate using Monte Carlo sampling. In a similar vein to approximate Bayesian computation (ABC), SimBa-CL requires multiple simulations from the model; in contrast to ABC, however, it provides a likelihood approximation that guides the optimization of the parameters. Leveraging automatic differentiation libraries, it is simple to compute gradients and Hessians, not only to speed up optimization but also to build approximate confidence sets. We conclude with an extensive experimental section, where we empirically validate our theoretical results, conduct a comparative analysis with sequential Monte Carlo (SMC), and apply SimBa-CL to real-world Aphthovirus data.
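A minimal sketch of the product-of-marginals estimator; `simulate` and `emission_prob` are hypothetical placeholders for the model simulator and emission density, and the real method's factorization and variance-reduction details are omitted:

```python
# Sketch of a SimBa-CL-style estimator: approximate the joint likelihood by
# a product of per-component marginals, each estimated by Monte Carlo
# simulation from the model.
import numpy as np

def simba_cl(theta, y_obs, simulate, emission_prob, n_sims=500):
    """Returns log prod_k p_hat(y_k) = sum_k log mean_i p(y_k | x_k^(i))."""
    log_cl = 0.0
    for k, y_k in enumerate(y_obs):
        x_draws = simulate(theta, component=k, n=n_sims)   # hidden-state draws
        p_hat = np.mean([emission_prob(y_k, x) for x in x_draws])
        log_cl += np.log(p_hat + 1e-300)                   # guard against zero estimates
    return log_cl
```

Because the estimator is a composition of differentiable operations, an automatic differentiation library can supply gradients and Hessians of `simba_cl` with respect to `theta`, as the abstract notes.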
Large collections of time series are often organized into hierarchies with different levels of aggregation; examples include product and geographical groupings. Probabilistic coherent forecasting aims to produce forecasts that are consistent across these aggregation levels. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize the networks with a composite likelihood objective, allowing us to capture relationships between time series while maintaining high computational efficiency. Our approach demonstrates average accuracy improvements of 13.2% over state-of-the-art baselines on most datasets. We conduct ablation studies of the framework components and provide theoretical foundations for them. To assist related work, the code is available at https://github.com/Nixtla/neuralforecast.
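A minimal sketch of coherence by construction, assuming bottom-level sample paths (e.g., drawn from the network's mixture output) are aggregated with a summation matrix $S$; the names and the toy sampler are illustrative:

```python
# Sketch: sample the bottom level, aggregate with S, and every sample is
# coherent across levels by construction.
import numpy as np

def coherent_samples(S, bottom_sampler, n=1000):
    """S: (n_series, n_bottom) aggregation matrix; bottom_sampler(n)
    returns an (n, n_bottom) array of bottom-level forecast draws."""
    bottom = bottom_sampler(n)
    return bottom @ S.T          # aggregated levels inherit coherence exactly

# Example: two bottom series and their total, y_total = y_1 + y_2.
S = np.array([[1, 1], [1, 0], [0, 1]])
rng = np.random.default_rng(0)
draws = coherent_samples(S, lambda n: rng.gamma(2.0, 1.0, size=(n, 2)))
```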
Neural network generalizability is becoming a broad research field due to the increasing availability of datasets from different sources and for various tasks. The issue is even more pronounced for medical data, where a lack of methodological standards causes large variations in images provided by different imaging centers or acquired with different devices and cofactors. To overcome these limitations, we introduce a novel, generalizable, data- and task-agnostic framework for extracting salient features from medical images. The proposed quaternion wavelet network (QUAVE) can easily be integrated with any pre-existing medical image analysis or synthesis task, and it can be combined with real-, quaternion-, or hypercomplex-valued models, generalizing their adoption to single-channel data. QUAVE first extracts different sub-bands through the quaternion wavelet transform, yielding both low-frequency/approximation bands and high-frequency/fine-grained features. It then weights the most representative set of sub-bands, which serve as input to any other neural model for image processing in place of standard data samples. We conduct an extensive experimental evaluation comprising different datasets and diverse image analysis and synthesis tasks, including reconstruction, segmentation, and modality translation. We also evaluate QUAVE in combination with both real- and quaternion-valued models. Results demonstrate the effectiveness and generalizability of the proposed framework, which improves network performance while being flexible enough to be adopted in manifold scenarios.
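To illustrate the sub-band extraction and weighting idea, here is a sketch using an ordinary 2D discrete wavelet transform from PyWavelets as a stand-in for the quaternion wavelet transform (the actual QWT additionally yields phase components, which this simplification omits); the weights are illustrative:

```python
# Decompose an image into approximation/detail sub-bands, weight them, and
# stack them as channels for a downstream network, in place of raw samples.
import numpy as np
import pywt

def weighted_subbands(image, weights=(1.0, 0.5, 0.5, 0.25)):
    ll, (lh, hl, hh) = pywt.dwt2(image, "haar")   # low-freq + fine-grained bands
    bands = (ll, lh, hl, hh)
    return np.stack([w * b for w, b in zip(weights, bands)], axis=0)
```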
We revisit the direct sum question in communication complexity, which asks whether the resources needed to solve $n$ communication problems together are (approximately) the sum of the resources needed to solve these problems separately. Our work starts with the observation that Dinur and Meir's fortification lemma can be generalized to a fortification lemma for any sub-additive measure over sets. Applying this lemma to the cover number, we obtain a dual form of the cover number, called a "$\delta$-fooling set," which generalizes the classical fooling set: any rectangle that contains sufficiently many elements of a $\delta$-fooling set cannot be monochromatic. With this fact, we reprove the classic direct sum theorem for the cover number with a simple double-counting argument. Formally, let $S \subseteq (A\times B) \times O$ and $T \subseteq (P\times Q) \times Z$ be two communication problems; then $\log \mathsf{Cov}\left(S\times T\right) \geq \log \mathsf{Cov}\left(S\right) + \log\mathsf{Cov}(T) - \log\log\left(|P||Q|\right) - 4$, where $\mathsf{Cov}$ denotes the cover number. One issue with current deterministic direct sum theorems in communication complexity is that they provide no information when $n$ is small, especially when $n=2$. In this work, we prove a new direct sum theorem for protocol size which implies a better direct sum theorem for two functions in terms of protocol size. Formally, let $\mathsf{L}$ denote the protocol size of a communication problem; given a communication problem $F:A \times B \rightarrow \{0,1\}$, $\log\mathsf{L}\left(F\times F\right)\geq \log \mathsf{L}\left(F\right) +\Omega\left(\sqrt{\log\mathsf{L}\left(F\right)}\right)-\log\log\left(|A||B|\right) -4$. All our results are obtained in a similar way, using the $\delta$-fooling set to construct a hardcore for the direct sum problem.
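For orientation, recall the classical notion that the $\delta$-fooling set relaxes. A set $S \subseteq A \times B$ is a fooling set for $f$ with value $z$ if

\[
\forall (a,b) \in S:\; f(a,b) = z,
\qquad
\forall (a,b) \neq (a',b') \in S:\; f(a,b') \neq z \ \text{or}\ f(a',b) \neq z.
\]

Every $z$-monochromatic rectangle then contains at most one element of $S$, so any cover of $f^{-1}(z)$ by monochromatic rectangles needs at least $|S|$ rectangles, and the deterministic communication complexity is at least $\log_2 |S|$. The $\delta$-fooling set weakens "at most one element" to "not too many elements," which is the property exploited by the double-counting argument above.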
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in observable data in representation form. Separating the underlying factors of variation into variables with semantic meaning benefits the learning of explainable representations of data, imitating the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively review DRL from various aspects, including motivations, definitions, methodologies, evaluations, applications, and model designs. We discuss work on DRL based on two well-recognized definitions, i.e., the Intuitive Definition and the Group Theory Definition. We further categorize the methodologies for DRL into five groups: Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches, and Other Approaches. We also analyze principles for designing DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.
We investigate a lattice-structured LSTM model for Chinese named entity recognition (NER), which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word-sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
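A minimal sketch of the lexicon-matching step that produces the word lattice; the span representation is an illustrative choice:

```python
# Build the word lattice: every multi-character substring of the input that
# appears in the lexicon becomes a word path attached to its span; the
# lattice LSTM's gated cells later fuse these paths with the characters.
def build_lattice(chars, lexicon):
    spans = []                                        # (start, end, word) triples
    for i in range(len(chars)):
        for j in range(i + 2, len(chars) + 1):        # words of length >= 2
            word = "".join(chars[i:j])
            if word in lexicon:
                spans.append((i, j, word))
    return spans
```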
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
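At the core of the architecture is scaled dot-product attention, $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(QK^{\top}/\sqrt{d_k}\right)V$; a minimal NumPy sketch (multi-head projections and masking omitted):

```python
# Scaled dot-product attention: each query attends to all keys and returns
# a weighted sum of the corresponding values.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values
```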