
Alongside the continuous effort to improve AI performance by developing more sophisticated models, researchers have also turned their attention to the emerging concept of data-centric AI, which emphasizes the central role of data in a systematic machine learning training process. Model development has nonetheless continued apace, and one result of this progress is the Transformer architecture, which is highly capable across multiple domains such as Natural Language Processing (NLP), Computer Vision (CV), and Time Series Forecasting (TSF). Its performance, however, depends heavily on input data preprocessing and output data evaluation, justifying a data-centric approach to future research. We argue that data-centric AI is essential for training AI models efficiently, particularly transformer-based TSF models. However, there is a gap regarding the integration of transformer-based TSF and data-centric AI. This survey aims to close this gap through an extensive literature review organized around the proposed taxonomy. We review previous research from a data-centric AI perspective and intend to lay the groundwork for the future development of transformer-based architectures and data-centric AI.
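
As one concrete illustration of the data-centric preprocessing this survey argues for, the sketch below (not taken from any surveyed paper; the function and parameter names are illustrative) shows a common way a raw series can be windowed and per-window normalized before being fed to a transformer-based forecaster.

```python
import numpy as np

def make_windows(series: np.ndarray, context_len: int, horizon: int):
    """Slice a 1-D series into (input, target) pairs for a forecaster.

    Each input window is z-normalized with its own mean/std, a common
    data-centric preprocessing step for transformer forecasters.
    """
    inputs, targets = [], []
    for start in range(len(series) - context_len - horizon + 1):
        ctx = series[start : start + context_len]
        tgt = series[start + context_len : start + context_len + horizon]
        mean, std = ctx.mean(), ctx.std() + 1e-8
        inputs.append((ctx - mean) / std)
        targets.append((tgt - mean) / std)   # keep targets on the same scale
    return np.stack(inputs), np.stack(targets)

# Example: 96-step context, 24-step forecast horizon
series = np.sin(np.linspace(0, 50, 2000)) + 0.1 * np.random.randn(2000)
X, y = make_windows(series, context_len=96, horizon=24)
print(X.shape, y.shape)   # (1881, 96) (1881, 24)
```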

Related Content

Taxonomy is the practice and science of classification. Wikipedia's categories illustrate one taxonomy, and a complete taxonomy of Wikipedia categories can be extracted by automatic means. As of 2009, it had been shown that a manually constructed taxonomy, such as that of a computational lexicon like WordNet, can be used to improve and restructure the Wikipedia category taxonomy. In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. A taxonomy may then include a single child with multiple parents; for example, "car" might appear under both "vehicle" and "steel structure", though to some this merely means that "car" is part of several different taxonomies. A taxonomy may also simply organize things into groups or be an alphabetical list, although in that case the term "vocabulary" is more appropriate. In current usage within knowledge management, a taxonomy is considered narrower than an ontology, since ontologies apply a wider variety of relation types. Mathematically, a hierarchical taxonomy is a tree structure of classifications over a given set of objects. At the top of this structure is a single classification that applies to all objects, the root node. Nodes below the root are more specific classifications that apply to subsets of the total set of classified objects. Reasoning thus progresses from the general to the more specific.
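
To make the tree-structured definition above concrete, here is a minimal Python sketch (purely illustrative; the class name and example labels are assumptions) of a hierarchical taxonomy with a root node and increasingly specific children, plus a walk from a specific classification back to the most general one.

```python
from dataclasses import dataclass, field

@dataclass
class TaxonomyNode:
    """A node in a hierarchical taxonomy: the root applies to all objects,
    children apply to increasingly specific subsets."""
    name: str
    children: list["TaxonomyNode"] = field(default_factory=list)
    parent: "TaxonomyNode | None" = None

    def add_child(self, name: str) -> "TaxonomyNode":
        child = TaxonomyNode(name, parent=self)
        self.children.append(child)
        return child

    def path_to_root(self) -> list[str]:
        """Walk from a specific classification back to the most general one."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

root = TaxonomyNode("Entity")
vehicle = root.add_child("Vehicle")
car = vehicle.add_child("Car")
print(car.path_to_root())   # ['Car', 'Vehicle', 'Entity']
```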

Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. To address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different angles. Nevertheless, a comprehensive survey is still notably absent. In this paper, we propose a review of the trends in integrating knowledge with large language models, covering a taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of the different methods and point out potential research directions for the future. We hope this survey offers the community quick access to, and a comprehensive overview of, this research area, with the intention of inspiring future research endeavors.
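
For readers unfamiliar with retrieval augmentation, one of the two strategies named above, the following is a minimal retrieve-then-read sketch; `embed_fn` and `llm_generate` are hypothetical placeholders for an embedding model and an LLM call, not part of any surveyed method.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 3):
    """Return the k documents whose embeddings are most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

def answer_with_retrieval(question: str, embed_fn, llm_generate, docs, doc_vecs):
    """Retrieve-then-read: prepend retrieved passages so the model can rely on
    external, up-to-date text rather than parametric knowledge alone."""
    context = retrieve(embed_fn(question), doc_vecs, docs)
    prompt = ("Answer using the context below.\n\n" + "\n".join(context)
              + f"\n\nQuestion: {question}\nAnswer:")
    return llm_generate(prompt)
```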

The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With unprecedented scale in training and model parameters, the capability of large language models has been dramatically improved, leading to human-like performance in understanding, language synthesis, and common-sense reasoning, among others. Such a major leap forward in general AI capacity will change how personalization is conducted. For one thing, it will reshape the way humans interact with personalization systems. Instead of being a passive medium of information filtering, large language models provide the foundation for active user engagement. On top of such a new foundation, user requests can be proactively explored, and users' required information can be delivered in a natural and explainable way. For another, it will considerably expand the scope of personalization, growing it from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as a general-purpose interface, personalization systems may compile user requests into plans, call the functions of external tools to execute those plans, and integrate the tools' outputs to complete end-to-end personalization tasks. Today, large language models are still being developed, while their application to personalization remains largely unexplored. We therefore consider it the right time to review the challenges in personalization and the opportunities to address them with LLMs. In particular, we dedicate this perspective paper to discussing the following aspects: the development of and challenges for existing personalization systems, the newly emerged capabilities of large language models, and potential ways of using large language models for personalization.
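
The compile-execute-integrate pattern described above can be sketched roughly as follows; the tool names and the `plan_fn`/`integrate_fn` placeholders are illustrative assumptions standing in for LLM calls, not a specific system from the paper.

```python
from typing import Callable

# Hypothetical tool registry; a real system would wrap search, recommenders, calendars, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_catalog": lambda q: f"[items matching '{q}']",
    "check_calendar": lambda q: f"[free slots near '{q}']",
}

def run_personalized_request(request: str, plan_fn, integrate_fn) -> str:
    """Compile a user request into tool calls, execute them, and integrate the results.

    plan_fn and integrate_fn stand in for LLM calls: the first returns a list of
    (tool_name, argument) pairs, the second writes the final personalized answer.
    """
    plan = plan_fn(request)                      # e.g. [("search_catalog", "running shoes")]
    observations = []
    for tool_name, arg in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:                         # skip hallucinated tool names
            continue
        observations.append(f"{tool_name}({arg}) -> {tool(arg)}")
    return integrate_fn(request, observations)
```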

Knowledge graph reasoning (KGR), which aims to deduce new facts from existing facts based on logic rules mined from knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the use of KGs in many AI applications, such as question answering and recommendation systems. According to graph type, existing KGR models can be roughly divided into three categories, i.e., static models, temporal models, and multi-modal models. Early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a survey of knowledge graph reasoning, tracing its development from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: //github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
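
As an example of the "general knowledge graph embedding models" that early static KGR works apply, the sketch below scores triples in the TransE style, treating a relation as a translation in embedding space; the toy data is illustrative only.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE-style plausibility of a triple (head, relation, tail):
    the relation acts as a translation, so h + r should land near t."""
    return -float(np.linalg.norm(h + r - t))

def rank_tails(h: np.ndarray, r: np.ndarray, entity_embs: np.ndarray) -> np.ndarray:
    """Score every candidate tail entity and return indices from best to worst."""
    distances = np.linalg.norm(entity_embs - (h + r), axis=1)
    return np.argsort(distances)

# Toy example with random 16-dim embeddings for 100 entities
rng = np.random.default_rng(0)
entity_embs = rng.normal(size=(100, 16))
relation = rng.normal(size=16)
print(rank_tails(entity_embs[0], relation, entity_embs)[:5])
```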

The rapid development of deep learning has brought great progress to segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, the past years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the types of segmentation problems (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize existing label-efficient segmentation methods from a unified perspective that addresses an important question: how to bridge the gap between weak supervision and dense prediction. Current methods are mostly based on heuristic priors such as cross-pixel similarity, cross-label constraint, cross-view consistency, and cross-image relation. Finally, we share our opinions on future research directions for label-efficient deep segmentation.
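
To make one of these priors concrete, the snippet below sketches a simple cross-view consistency objective (purely illustrative, not a specific method from the survey): per-pixel predictions for two augmented views of the same image, once aligned, are encouraged to agree.

```python
import numpy as np

def cross_view_consistency(prob_view1: np.ndarray, prob_view2: np.ndarray) -> float:
    """Cross-view consistency prior: per-pixel class probabilities predicted for
    two augmented views of the same image (aligned back to a common frame)
    should agree; their mean squared disagreement serves as an extra loss."""
    return float(np.mean((prob_view1 - prob_view2) ** 2))

# Toy example: 4-class probability maps of size 8x8 for two views
view1 = np.random.dirichlet(np.ones(4), size=(8, 8))
view2 = np.random.dirichlet(np.ones(4), size=(8, 8))
print(cross_view_consistency(view1, view2))
```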

Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach to generate synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and design synthesizable molecules by decoding from optimized conditional codes, demonstrating the potential to solve both problems of design and synthesis simultaneously. The approach leverages neural networks to probabilistically model the synthetic trees, one reaction step at a time, according to reactivity rules encoded in a discrete action space of reaction templates. We train these networks on hundreds of thousands of artificial pathways generated from a pool of purchasable compounds and a list of expert-curated templates. We validate our method with (a) the recovery of molecules using conditional generation, (b) the identification of synthesizable structural analogs, and (c) the optimization of molecular structures given oracle functions relevant to drug discovery.
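
A rough sketch of the step-by-step conditional decoding loop described above follows; `choose_template`, `choose_reactant`, and `apply_template` are hypothetical placeholders for the learned policy networks and the reaction-template executor, so this is a schematic rather than the paper's implementation.

```python
import numpy as np

def decode_synthetic_pathway(target_emb: np.ndarray,
                             choose_template, choose_reactant, apply_template,
                             max_steps: int = 10):
    """Greedy sketch of conditional pathway decoding: at each step a policy,
    conditioned on the target embedding and the current intermediate, picks a
    reaction template and a purchasable reactant, and a template engine applies it."""
    pathway, current = [], None
    for _ in range(max_steps):
        template = choose_template(target_emb, current)
        if template is None:          # policy signals "stop"
            break
        reactant = choose_reactant(target_emb, current, template)
        current = apply_template(template, current, reactant)
        pathway.append((template, reactant, current))
    return pathway, current
```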

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.

We study the problem of efficient semantic segmentation for large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre-/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection methods. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module that progressively increases the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200x faster than existing approaches. Moreover, RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
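
To illustrate why random point sampling is attractive here, the sketch below (illustrative only, not the RandLA-Net code) downsamples a large cloud in linear time per stage, as might happen between encoder stages of a point network.

```python
import numpy as np

def random_downsample(points: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    """Randomly keep a fraction of the points; O(N) and memory-light,
    unlike farthest-point sampling, which is O(N^2) in its naive form."""
    n_keep = max(1, int(len(points) * ratio))
    idx = np.random.choice(len(points), size=n_keep, replace=False)
    return points[idx]

# Progressive downsampling across encoder stages
cloud = np.random.rand(100_000, 3).astype(np.float32)   # x, y, z coordinates
for stage in range(4):
    cloud = random_downsample(cloud, ratio=0.25)
    print(f"stage {stage}: {len(cloud)} points")
```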

Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address this, many efforts have been made to train complex models with small data in an unsupervised or semi-supervised fashion. In this paper, we review recent progress on these two major categories of methods. A wide spectrum of small-data models will be categorized in a big picture, where we show how they interplay with each other to motivate the exploration of new ideas. We review the criteria for learning transformation-equivariant, disentangled, self-supervised, and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs), and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on unsupervised and semi-supervised methods, we also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exhaustive encyclopedia that includes all related works. Instead, we aim to explore the main ideas, principles, and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.

We introduce a multi-task setup for identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports the construction of a scientific knowledge graph, which we use to analyze information in the scientific literature.
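
A minimal sketch of a shared span representation, in the spirit of (but not identical to) SciIE, is shown below; the construction is an assumption for illustration, with the same span vector intended to feed separate entity, relation, and coreference scoring heads.

```python
import numpy as np

def span_representation(token_embs: np.ndarray, start: int, end: int) -> np.ndarray:
    """A simple shared span representation: the two endpoint embeddings plus the
    mean of the span, concatenated. All three tasks score spans built this way."""
    inside = token_embs[start : end + 1].mean(axis=0)
    return np.concatenate([token_embs[start], token_embs[end], inside])

# Toy sentence of 10 tokens with 8-dim embeddings; the same span vectors would
# feed separate entity, relation, and coreference heads.
tokens = np.random.randn(10, 8)
span = span_representation(tokens, 2, 4)
print(span.shape)   # (24,)
```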
