好男人在线观看免费2019,精品自在线观看影片天天看

Deep Neural Networks often inherit spurious correlations embedded in training data and hence may fail to generalize to unseen domains, which have different distributions from the domain to provide training data. M. Arjovsky et al. (2019) introduced the concept out-of-distribution (o.o.d.) risk, which is the maximum risk among all domains, and formulated the issue caused by spurious correlations as a minimization problem of the o.o.d. risk. Invariant Risk Minimization (IRM) is considered to be a promising approach to minimize the o.o.d. risk: IRM estimates a minimum of the o.o.d. risk by solving a bi-level optimization problem. While IRM has attracted considerable attention with empirical success, it comes with few theoretical guarantees. Especially, a solid theoretical guarantee that the bi-level optimization problem gives the minimum of the o.o.d. risk has not yet been established. Aiming at providing a theoretical justification for IRM, this paper rigorously proves that a solution to the bi-level optimization problem minimizes the o.o.d. risk under certain conditions. The result also provides sufficient conditions on distributions providing training data and on a dimension of feature space for the bi-leveled optimization problem to minimize the o.o.d. risk.

相關內容

優化器

關注 4

語言模型化 · Extensibility · MoDELS · 有向 · INFORMS ·

2023 年 9 月 13 日

A Comprehensive Overview of Large Language Models

Humza Naveed,Asad Ullah Khan,Shi Qiu,Muhammad Saqib,Saeed Anwar,Muhammad Usman,Naveed Akhtar,Nick Barnes,Ajmal Mian

from arxiv, Work in-progress

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations of the underlying neural networks, context length improvements, model alignment, training datasets, benchmarking, efficiency and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides that overview to the research community. It not only focuses on a systematic treatment of the existing literature on a broad range of LLM related concept, but also pays special attention to providing comprehensive summaries with extensive details about the individual existing models, datasets and major insights. We also pay heed to aligning our overview with the emerging outlook of this research direction by accounting for the other recently materializing reviews of the broader research direction of LLMs. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of this research direction. This review article is intended to not only provide a systematic survey, but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research direction.

可約的 · 優化器 · 計算成本 · FAST · 代價 ·

2023 年 9 月 12 日

Fast Simulation of High-Depth QAOA Circuits

Danylo Lykov,Ruslan Shaydulin,Yue Sun,Yuri Alexeev,Marco Pistoia

from arxiv, Additional references added in v2

Until high-fidelity quantum computers with a large number of qubits become widely available, classical simulation remains a vital tool for algorithm design, tuning, and validation. We present a simulator for the Quantum Approximate Optimization Algorithm (QAOA). Our simulator is designed with the goal of reducing the computational cost of QAOA parameter optimization and supports both CPU and GPU execution. Our central observation is that the computational cost of both simulating the QAOA state and computing the QAOA objective to be optimized can be reduced by precomputing the diagonal Hamiltonian encoding the problem. We reduce the time for a typical QAOA parameter optimization by eleven times for $n = 26$ qubits compared to a state-of-the-art GPU quantum circuit simulator based on cuQuantum. Our simulator is available on GitHub: //github.com/jpmorganchase/QOKit

MoDELS · 可辨認的 · Analysis · Machine Learning · Learning ·

2023 年 9 月 12 日

Modeling Supply and Demand in Public Transportation Systems

Miranda Bihler,Hala Nelson,Erin Okey,Noe Reyes Rivas,John Webb,Anna White

from arxiv, 28 pages, 2022 REU project at James Madison University

The Harrisonburg Department of Public Transportation (HDPT) aims to leverage their data to improve the efficiency and effectiveness of their operations. We construct two supply and demand models that help the department identify gaps in their service. The models take many variables into account, including the way that the HDPT reports to the federal government and the areas with the most vulnerable populations in Harrisonburg City. We employ data analysis and machine learning techniques to make our predictions.

MoDELS · Analysis · 穩健性 ·

2023 年 9 月 12 日

A Survey of Local Differential Privacy and Its Variants

Likun Qin,Nan Wang,Tianshuo Qiu

The introduction and advancements in Local Differential Privacy (LDP) variants have become a cornerstone in addressing the privacy concerns associated with the vast data produced by smart devices, which forms the foundation for data-driven decision-making in crowdsensing. While harnessing the power of these immense data sets can offer valuable insights, it simultaneously poses significant privacy risks for the users involved. LDP, a distinguished privacy model with a decentralized architecture, stands out for its capability to offer robust privacy assurances for individual users during data collection and analysis. The essence of LDP is its method of locally perturbing each user's data on the client-side before transmission to the server-side, safeguarding against potential privacy breaches at both ends. This article offers an in-depth exploration of LDP, emphasizing its models, its myriad variants, and the foundational structure of LDP algorithms.

估計/估計量 · 混合時間 · Markov · 馬爾可夫鏈 · 混合 ·

2023 年 9 月 12 日

Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time

Geoffrey Wolfer

We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations. Unlike most previous works which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation. Specifically, we estimate the contraction coefficient introduced in Wolfer [2020], inspired from Dobrushin's. This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains. We improve existing fully data-dependent confidence intervals around this contraction coefficient, which are both easier to compute and thinner than spectral counterparts. Furthermore, we introduce a novel analysis beyond the worst-case scenario by leveraging additional information about the transition matrix. This allows us to derive instance-dependent rates for estimating the matrix with respect to the induced uniform norm, and some of its mixing properties.

Performer · 長短期記憶網絡 · 線性的 · 學成 · INTERACT ·

2023 年 9 月 10 日

A Unifying Framework of Bilinear LSTMs

Mohit Rajpal,Bryan Kian Hsiang Low

from arxiv, paper abandoned and will never be submitted for peer review

This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.

知識 (knowledge) · 語言模型化 · MoDELS · NLU · Learning ·

2022 年 11 月 17 日

A Survey of Knowledge-Enhanced Pre-trained Language Models

Linmei Hu,Zeyi Liu,Ziwang Zhao,Lei Hou,Liqiang Nie,Juanzi Li

Pre-trained Language Models (PLMs) which are trained on large text corpus via self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.

Learning · 泛化理論 · 概率近似正確 · 泛化誤差 · 少試學習 ·

2022 年 7 月 29 日

A Survey of Learning on Small Data

Xiaofeng Cao,Weixin Bu,Shengjun Huang,Yingpeng Tang,Yaming Guo,Yi Chang,Ivor W. Tsang

Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.

圖卷積神經網絡/圖卷積網絡 · 圖卷積 · Networking · 圖 · 卷積 ·

2021 年 9 月 8 日

Interpretable and Efficient Heterogeneous Graph Convolutional Network

Yaming Yang,Ziyu Guan,Jianxin Li,Wei Zhao,Jiangtao Cui,Quan Wang

from arxiv, This paper has been accepted by TKDE 2021

Graph Convolutional Network (GCN) has achieved extraordinary success in learning effective task-specific representations of nodes in graphs. However, regarding Heterogeneous Information Network (HIN), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for a target object, which hinders both effectiveness and interpretability; (2) they often need to generate intermediate meta-path based dense graphs, which leads to high computational complexity. To address the above issues, we propose an interpretable and efficient Heterogeneous Graph Convolutional Network (ie-HGCN) to learn the representations of objects in HINs. It is designed as a hierarchical aggregation architecture, i.e., object-level aggregation first, followed by type-level aggregation. The novel architecture can automatically extract useful meta-paths for each object from all possible meta-paths (within a length limit), which brings good model interpretability. It can also reduce the computational cost by avoiding intermediate HIN transformation and neighborhood attention. We provide theoretical analysis about the proposed ie-HGCN in terms of evaluating the usefulness of all possible meta-paths, its connection to the spectral graph convolution on HINs, and its quasi-linear time complexity. Extensive experiments on three real network datasets demonstrate the superiority of ie-HGCN over the state-of-the-art methods.

卷積神經網絡 · Neural Networks · Performer · Seven · Processing（編程語言） ·

2019 年 1 月 17 日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Asifullah Khan,Anabia Sohail,Umme Zahoora,Aqsa Saeed Qureshi

from arxiv, Number of Pages: 60 Number of Figures: 11 Number of Tables:1

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.