
Better · SGD · Neural Networks · Networking · Networks

October 11, 2023

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu

From arXiv, 52 pages, 4 figures, 2 tables. In NeurIPS 2023.

The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However, a deep understanding of how SAM works, especially in the setting of nonlinear neural networks and classification tasks, remains largely missing. This paper fills this gap by demonstrating why SAM generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks. The loss landscape of our studied problem is nonsmooth, thus current explanations for the success of SAM based on the Hessian information are insufficient. Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features. Experiments on both synthetic and real data corroborate our theory.
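For reference, SAM as it is commonly stated minimizes the loss at an approximate worst-case point within radius ρ of the current weights: it first steps to w + ρ·g/‖g‖ (g being the gradient at w) and then applies the gradient computed there to w. The sketch below is a minimal full-batch version in plain Python; the names `sam_step` and `sgd_step`, the toy quadratic loss, and the values of `lr` and `rho` are illustrative assumptions, and it does not reproduce the two-layer convolutional ReLU setting analyzed in the paper.

```python
import math
from typing import Callable, List

Vector = List[float]


def sgd_step(w: Vector, grad_fn: Callable[[Vector], Vector], lr: float) -> Vector:
    """Plain gradient step: the gradient is evaluated at w itself."""
    return [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    rho: float,
    eps: float = 1e-12,
) -> Vector:
    """One SAM-style update (illustrative sketch, full-batch gradients)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # Ascent step of length rho along the normalized gradient; eps guards against g == 0.
    perturb = [rho * gi / (norm + eps) for gi in g]
    w_adv = [wi + pi for wi, pi in zip(w, perturb)]
    # The gradient at the perturbed point is applied to the ORIGINAL weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
def quad_grad(v: Vector) -> Vector:
    return list(v)


w0 = [3.0, 4.0]
print(sam_step(w0, quad_grad, lr=0.1, rho=0.5))  # approximately [2.67, 3.56]
print(sgd_step(w0, quad_grad, lr=0.1))           # approximately [2.7, 3.6]
```

The only difference from the SGD step is where the gradient is evaluated, which costs a second gradient computation per update.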

Related Content

Better

Functional · MoDELS · Estimation/Estimator · Maximum · Learning

November 27, 2023

Should We Learn Most Likely Functions or Parameters?

Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

From arXiv, NeurIPS 2023. Code available at //github.com/activatedgeek/function-space-map

Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks and prove conditions under which the procedure is well-behaved, as well as a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
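The point that the most likely parameters depend on the parametrization follows from the change-of-variables formula: a density picks up a Jacobian factor when its variable is transformed, so its mode can move. The sketch below is a textbook illustration under assumed choices (a Gamma(2, 1) density for a scalar θ and the reparametrization θ = exp(z)), not the paper's function-space procedure; it locates each mode by a simple grid search.

```python
import math


def argmax_on_grid(f, lo: float, hi: float, steps: int) -> float:
    """Return the grid point in [lo, hi] with the largest value of f."""
    best_x, best_val = lo, f(lo)
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


def p_theta(theta: float) -> float:
    # Assumed density: Gamma(shape=2, rate=1), p(theta) = theta * exp(-theta), theta > 0.
    return theta * math.exp(-theta)


def p_z(z: float) -> float:
    # Reparametrize theta = exp(z): p_z(z) = p_theta(exp(z)) * |d theta / dz| = p_theta(exp(z)) * exp(z).
    return p_theta(math.exp(z)) * math.exp(z)


theta_mode = argmax_on_grid(p_theta, 1e-6, 10.0, 100_000)
z_mode = argmax_on_grid(p_z, -5.0, 5.0, 100_000)
# Mode found directly in theta vs. mode found in z, mapped back to theta.
print(round(theta_mode, 3), round(math.exp(z_mode), 3))  # prints: 1.0 2.0
```

Analytically, p_z(z) = exp(2z − e^z) peaks at z = ln 2, so the "most likely θ" is 1 in one parametrization and 2 in the other, even though both describe the same distribution.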

Continuity · Dynamic Sampling · Samples · Learning · DSS

November 27, 2023

Continual Test-time Domain Adaptation via Dynamic Sample Selection

Yanshuo Wang, Jie Hong, Ali Cheraghian, Shafin Rahman, David Ahmedt-Aristizabal, Lars Petersson, Mehrtash Harandi

From arXiv, 2024 IEEE/CVF Winter Conference on Applications of Computer Vision

The objective of Continual Test-time Domain Adaptation (CTDA) is to gradually adapt a pre-trained model to a sequence of target domains without accessing the source data. This paper proposes a Dynamic Sample Selection (DSS) method for CTDA. DSS consists of dynamic thresholding, positive learning, and negative learning processes. Traditionally, models learn from unlabeled data collected in unknown environments and rely equally on the pseudo-labels of all samples to update their parameters through self-training. However, these pseudo-labels contain noisy predictions, so not all samples are equally trustworthy. Therefore, in our method, a dynamic thresholding module is first designed to separate suspected low-quality samples from high-quality ones. The selected low-quality samples are more likely to be wrongly predicted. We then apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information. We conduct extensive experiments that demonstrate the effectiveness of our proposed method for CTDA in the image domain, where it outperforms state-of-the-art methods. Furthermore, our approach is also evaluated in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
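The selection-then-learning pipeline can be sketched as a per-batch loss. Every specific choice below is an assumption, not the paper's formulation: the threshold is the batch mean of the maximum softmax probability (recomputed for each batch, hence "dynamic"), positive learning is cross-entropy on the pseudo-label of samples above the threshold, and negative learning lowers the probability of each sample's least likely class via −log(1 − p_k). The function name `dss_style_loss` is hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dss_style_loss(batch_logits: List[List[float]]) -> Tuple[float, int, int]:
    """Hypothetical sketch of dynamic sample selection with positive/negative learning.

    Returns (loss, number of high-quality samples, number of low-quality samples).
    """
    probs = [softmax(row) for row in batch_logits]
    confidences = [max(p) for p in probs]
    # Assumed dynamic threshold: mean confidence of the current batch.
    threshold = sum(confidences) / len(confidences)

    pos_terms: List[float] = []
    neg_terms: List[float] = []
    for p, conf in zip(probs, confidences):
        if conf >= threshold:
            # High-quality sample: positive learning on its pseudo-label.
            pseudo = p.index(conf)
            pos_terms.append(-math.log(p[pseudo]))
        # Every sample: negative learning on a complementary label
        # (assumed here to be the class the model currently believes least).
        comp = p.index(min(p))
        neg_terms.append(-math.log(1.0 - p[comp]))

    pos = sum(pos_terms) / max(len(pos_terms), 1)
    neg = sum(neg_terms) / len(neg_terms)
    n_high = len(pos_terms)
    return pos + neg, n_high, len(probs) - n_high


batch = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 3.0, 0.0]]
loss, n_high, n_low = dss_style_loss(batch)
print(n_high, n_low, loss)
```

In an actual adaptation loop this scalar would be backpropagated through the model; the plain-Python version only shows how samples are partitioned and which loss terms each group receives.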

Automator · MoDELS · Networking · Learning · Machine Learning

November 27, 2023

Machine Learning-Based Jamun Leaf Disease Detection: A Comprehensive Review

Auvick Chandra Bhowmik, Dr. Md. Taimur Ahad, Yousuf Rayhan Emon

Jamun leaf diseases pose a significant threat to agricultural productivity, negatively impacting both yield and quality in the jamun industry. The advent of machine learning has opened up new avenues for tackling these diseases effectively, and early detection and diagnosis are essential for successful crop management. While no automated systems have yet been developed specifically for jamun leaf disease detection, various automated systems based on image processing techniques have been implemented for similar disease-detection tasks. This paper presents a comprehensive review of machine learning methodologies for diagnosing plant leaf diseases through image classification, which can be adapted for jamun leaf disease detection. It assesses the strengths and limitations of various Vision Transformer models, including the transfer learning model and vision transformer (TLMViT), SLViT, SE-ViT, IterationViT, Tiny-LeViT, IEM-ViT, GreenViT, and PMViT. Additionally, the paper reviews models such as the Dense Convolutional Network (DenseNet), the Residual Neural Network ResNet-50V2, EfficientNet, an ensemble model, a Convolutional Neural Network (CNN), and the Locally Reversible Transformer. These machine learning models have been evaluated on various datasets, demonstrating their real-world applicability. This review not only sheds light on current advancements in the field but also provides valuable insights for future research directions in machine learning-based jamun leaf disease detection and classification.

MoDELS · Performer · GPT-4V · Embodied Intelligence · Model Performance

November 27, 2023

Can Vision-Language Models Think from a First-Person Perspective?

Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu

Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate eighteen popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as an automatic judge for single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still have considerable room for improvement in first-person perspective tasks. Meanwhile, increasing the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.

Graph · Graphics Processing Unit · MoDELS · Networking · Performer

November 26, 2023

Will More Expressive Graph Neural Networks do Better on Generative Tasks?

Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

Graph generation poses a significant challenge, as it involves predicting a complete graph with multiple nodes and edges from only a given label. This task also carries fundamental importance for numerous real-world applications, including de novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs in the context of the molecular graph generation task by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs on six different molecular generative objectives on the ZINC-250k dataset in two different generative frameworks: autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM. Through extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results compared with 17 non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de novo molecular design.

White-Box · Transformation · Learning · Sparsity · Representation

November 24, 2023

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma

From arXiv. This paper integrates the works arXiv:2306.01129 and arXiv:2308.16271 into a complete story. In this paper, we improve the writing and organization and also add conceptual, empirical, and theoretical improvements over the previous work. V2: small typo fixes and formatting improvements.

In this paper, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, such as sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The quality of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information gain and extrinsic sparsity of the learned representation. From this perspective, popular deep network architectures, including transformers, can be viewed as realizing iterative schemes to optimize this measure. In particular, we derive a transformer block from alternating optimization on parts of this objective: the multi-head self-attention operator compresses the representation by implementing an approximate gradient descent step on the coding rate of the features, and the subsequent multi-layer perceptron sparsifies the features. This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable. We show, by way of a novel connection between denoising and compression, that the inverse to the aforementioned compressive encoding can be realized by the same class of CRATE architectures. Thus, the so-derived white-box architectures are universal to both encoders and decoders. Experiments show that these networks, despite their simplicity, indeed learn to compress and sparsify representations of large-scale real-world image and text datasets, and achieve performance very close to highly engineered transformer-based models: ViT, MAE, DINO, BERT, and GPT2. We believe the proposed computational framework demonstrates great potential in bridging the gap between theory and practice of deep learning, from a unified perspective of data compression. Code is available at: //ma-lab-berkeley.github.io/CRATE.
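The "coding rate of the features" that the attention step descends on is, in the rate-reduction literature this work builds on, a log-determinant of the feature Gram matrix. The sketch below assumes the form R(Z) = ½ log det(I + d/(Nε²) Z Zᵀ) for features Z with d rows and N token columns (check the paper for its exact constants and for the subspace-wise compression term); the function names and example matrices are illustrative. It shows only the compression intuition: with the same total energy, tokens collapsed onto fewer directions have a lower coding rate. It is not an implementation of CRATE's attention or MLP blocks.

```python
import math
from typing import List

Matrix = List[List[float]]


def logdet_spd(a: Matrix) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    # det(A) = prod(diag(L))^2, so log det(A) = 2 * sum(log diag(L)).
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def coding_rate(z: Matrix, eps: float = 1.0) -> float:
    """Assumed form: R(Z) = 1/2 * logdet(I + d / (N * eps^2) * Z Z^T).

    z has d rows (feature dimensions) and N columns (tokens).
    """
    d, n = len(z), len(z[0])
    alpha = d / (n * eps ** 2)
    m = [
        [(1.0 if i == j else 0.0) + alpha * sum(z[i][k] * z[j][k] for k in range(n)) for j in range(d)]
        for i in range(d)
    ]
    return 0.5 * logdet_spd(m)


# Two tokens with equal total energy: spread over two directions vs. collapsed onto one.
spread = [[1.0, 0.0], [0.0, 1.0]]     # columns (1, 0) and (0, 1)
collapsed = [[1.0, 1.0], [0.0, 0.0]]  # both columns equal to (1, 0)
print(coding_rate(spread), coding_rate(collapsed))  # ln 2 ≈ 0.693 vs. 0.5 * ln 3 ≈ 0.549
```

Lowering R for tokens that should share a subspace is the "compress" half of the objective; the sparse rate reduction measure balances it against an expansion term and a sparsity penalty.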

Ground Truth · Identifiable · Dataset · HTTPS · Computational Learning Theory

December 15, 2021

Do Feature Attribution Methods Correctly Attribute Features?

Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, Julie Shah

From arXiv, AAAI 2022. Video summary at //www.youtube.com/watch?v=kAodFw6jvvo

Feature attribution methods are popular in interpretable machine learning. These methods compute the attribution of each input feature to represent its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation, a problem complicated in particular by the lack of ground-truth attribution. To address this, we propose a dataset modification procedure to induce such ground truth. Using this procedure, we evaluate three common methods: saliency maps, rationales, and attentions. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods when applied to datasets in the wild. We further discuss possible avenues for remedy and recommend that new attribution methods be tested against ground truth before deployment. The code is available at //github.com/YilunZhou/feature-attribution-evaluation.
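Once a dataset modification has planted features whose relevance is known by construction, any attribution method can be scored against that ground truth. The metric below, precision of the top-k attributed features, is a hypothetical choice for illustration, as are the function name and the example scores; the paper's actual evaluation criteria may differ.

```python
from typing import Sequence, Set


def attribution_precision_at_k(attributions: Sequence[float], ground_truth: Set[int], k: int) -> float:
    """Fraction of the k most-attributed features that belong to the ground-truth set.

    Assumes larger absolute attribution means "more important" (sign is ignored).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(range(len(attributions)), key=lambda i: abs(attributions[i]), reverse=True)
    top_k = ranked[:k]
    return sum(1 for i in top_k if i in ground_truth) / k


# Hypothetical attribution scores for five input features.
scores = [0.1, -0.9, 0.05, 0.7, 0.0]
print(attribution_precision_at_k(scores, ground_truth={1, 3}, k=2))  # 1.0
print(attribution_precision_at_k(scores, ground_truth={1, 2}, k=2))  # 0.5
```

A method that ranks the planted features highly scores near 1.0; a method that spreads importance onto irrelevant features scores lower, which is the kind of deficiency a ground-truth benchmark can expose.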

Global Minimum · Optimizer · Minimum · Nonconvex · Approximation

March 24, 2021

Why Do Local Methods Solve Nonconvex Problems?

From arXiv. This is Chapter 21 of the book "Beyond the Worst-Case Analysis of Algorithms".

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though optimizing non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue: optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.
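A standard toy instance of this phenomenon is rank-one symmetric matrix factorization, f(u) = ‖vvᵀ − uuᵀ‖²_F, which is nonconvex but whose only local minima are the global minima u = ±v (the origin is a saddle point). The sketch below runs plain gradient descent from several starting points; the target `v`, the starting points, the step size, and the iteration count are assumptions chosen for illustration, not examples taken from the chapter.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def loss(u: Vec, v: Vec) -> float:
    """f(u) = ||v v^T - u u^T||_F^2 for 2-vectors."""
    return sum((v[i] * v[j] - u[i] * u[j]) ** 2 for i in range(2) for j in range(2))


def grad(u: Vec, v: Vec) -> Vec:
    # grad f(u) = -4 (v v^T - u u^T) u = -4 (v . u) v + 4 ||u||^2 u
    vu = v[0] * u[0] + v[1] * u[1]
    uu = u[0] * u[0] + u[1] * u[1]
    return (-4 * vu * v[0] + 4 * uu * u[0], -4 * vu * v[1] + 4 * uu * u[1])


def gradient_descent(u0: Vec, v: Vec, lr: float = 0.05, steps: int = 2000) -> Vec:
    u = u0
    for _ in range(steps):
        g = grad(u, v)
        u = (u[0] - lr * g[0], u[1] - lr * g[1])
    return u


v = (1.0, 0.5)
starts: List[Vec] = [(0.3, -0.8), (-0.5, 0.2), (0.9, 0.9), (-0.1, -0.4)]
results = [gradient_descent(u0, v) for u0 in starts]
for u0, u in zip(starts, results):
    print(u0, "->", (round(u[0], 4), round(u[1], 4)), "loss", f"{loss(u, v):.1e}")
```

Every run ends at +v or −v with loss near zero: the local, gradient-based method finds a global minimum because this objective has no spurious local minima, which is the structural property the chapter formalizes for richer problems.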

Graph · Neural Networks · state-of-the-art · SimPLe · Vectorization

October 1, 2018

How Powerful are Graph Neural Networks?

Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka

Graph Neural Networks (GNNs) for representation learning of graphs broadly follow a neighborhood aggregation framework, where the representation vector of a node is computed by recursively aggregating and transforming feature vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties and limitations. Here, we present a theoretical framework for analyzing the expressive power of GNNs in capturing different graph structures. Our results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. We then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theoretical findings on a number of graph classification benchmarks, and demonstrate that our model achieves state-of-the-art performance.
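Why some popular variants cannot distinguish simple structures can be seen in a single aggregation step. The paper's architecture, the Graph Isomorphism Network (GIN), updates a node as h_v' = MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u), using a sum over neighbors. The sketch below uses scalar node features, two example neighborhoods, and the identity in place of the MLP; all of these are simplifying assumptions. It shows that mean and max aggregation map the neighbor multisets {1, 1} and {1} to the same value, while sum does not.

```python
from typing import Callable, Dict, List

Aggregator = Callable[[List[float]], float]

AGGREGATORS: Dict[str, Aggregator] = {
    "sum": lambda xs: sum(xs),
    "mean": lambda xs: sum(xs) / len(xs),
    "max": lambda xs: max(xs),
}


def gin_update(h_v: float, neighbor_feats: List[float], eps: float = 0.0) -> float:
    """GIN-style update with the MLP replaced by the identity (illustrative simplification)."""
    return (1.0 + eps) * h_v + sum(neighbor_feats)


# Two neighborhoods that differ only in multiplicity: {1.0, 1.0} versus {1.0}.
nbrs_a = [1.0, 1.0]
nbrs_b = [1.0]
for name, agg in AGGREGATORS.items():
    same = agg(nbrs_a) == agg(nbrs_b)
    print(name, "collides" if same else "distinguishes")
# sum distinguishes; mean collides; max collides

print(gin_update(0.5, nbrs_a), gin_update(0.5, nbrs_b))  # 2.5 1.5
```

Sum aggregation preserves how many neighbors carry each feature, which is what lets a GIN-style update match the Weisfeiler-Lehman test's ability to separate such neighborhoods.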

Long Short-Term Memory · Named Entity Recognition · MoDELS · Better · Gating

May 15, 2018

Chinese NER Using Lattice LSTM

Yue Zhang, Jie Yang

From arXiv, accepted at ACL 2018 as a long paper.

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
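Building the lattice starts with finding every lexicon word that matches a span of the input characters. The sketch below performs only that matching step (no LSTM); the function name `lattice_words`, the `max_len` cap, the example sentence 南京市长江大桥 ("Nanjing Yangtze River Bridge"), and the toy lexicon are hypothetical choices for illustration.

```python
from typing import List, Set, Tuple


def lattice_words(chars: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every lexicon word matching a span of 2..max_len characters.

    end is inclusive, so the word is chars[start:end + 1].
    """
    matches = []
    for start in range(len(chars)):
        for end in range(start + 1, min(start + max_len, len(chars))):
            word = chars[start:end + 1]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


sentence = "南京市长江大桥"  # "Nanjing Yangtze River Bridge"
lexicon = {
    "南京",      # Nanjing
    "南京市",    # Nanjing City
    "市长",      # mayor
    "长江",      # Yangtze River
    "长江大桥",  # Yangtze River Bridge
    "江大桥",    # Jiang Daqiao, a person name (spurious in this sentence)
    "大桥",      # bridge
}
for start, end, word in lattice_words(sentence, lexicon):
    print(start, end, word)
```

Overlapping candidates such as 长江大桥 and 江大桥 are kept side by side instead of being resolved by a segmenter; in a lattice LSTM, gated cells then weigh how much each word path contributes, which is how the model avoids committing to a single, possibly wrong, segmentation.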


