
We study the benefits of complex-valued weights for neural networks. We prove that shallow complex neural networks with quadratic activations have no spurious local minima. In contrast, shallow real neural networks with quadratic activations have infinitely many spurious local minima under the same conditions. In addition, we provide specific examples to demonstrate that complex-valued weights turn poor local minima into saddle points.
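To make the setting concrete, here is a minimal numpy sketch of the two function classes being compared: a shallow network with quadratic activations using real weights versus complex weights. This only illustrates the forward maps, not the paper's landscape analysis; the hidden width, dimensions, and data are arbitrary choices for the example.

```python
import numpy as np

def quad_net_real(W, X):
    # Shallow real network with quadratic activation: f(x) = sum_j (w_j . x)^2
    return np.sum((X @ W.T) ** 2, axis=1)

def quad_net_complex(W, X):
    # Same architecture with complex weights: f(x) = sum_j |w_j^H x|^2
    # (the modulus makes the output real-valued)
    return np.sum(np.abs(X @ W.conj().T) ** 2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 inputs, 3 features
W_real = rng.normal(size=(2, 3))          # 2 hidden units
W_cplx = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))

y_real = quad_net_real(W_real, X)
y_cplx = quad_net_complex(W_cplx, X)
```

Both maps are nonnegative sums of squared (moduli of) linear forms; the paper's claim concerns how the loss landscapes over `W` differ between the two parameterizations.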

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum to develop and nurture an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique, broad scope facilitates the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematics and computational analysis, and engineering and applications. Official website:

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features has recently been proposed as a proxy for the measure of their importance. So far, predictive information has only been identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.
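As a rough intuition pump (not the paper's variational bottleneck), one can score input features by how much the prediction degrades when a feature is replaced with noise: features whose removal perturbs the output carry predictive information, those whose removal changes nothing do not. The toy linear "network" and all parameters below are illustrative assumptions.

```python
import numpy as np

def noise_attribution(predict, x, n_samples=200, sigma=1.0, seed=0):
    """Score each input feature by the mean squared change in the
    prediction when that feature is replaced by Gaussian noise --
    a crude occlusion-style proxy for an input-side bottleneck."""
    rng = np.random.default_rng(seed)
    base = predict(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        noisy = np.tile(x, (n_samples, 1))
        noisy[:, i] = rng.normal(0.0, sigma, size=n_samples)
        preds = np.array([predict(row) for row in noisy])
        scores[i] = np.mean((preds - base) ** 2)
    return scores

# Toy "network": only the first two features influence the output.
w = np.array([3.0, -2.0, 0.0, 0.0])
predict = lambda x: float(w @ x)

x = np.array([1.0, 1.0, 1.0, 1.0])
scores = noise_attribution(predict, x)
```

The two irrelevant features get near-zero scores. The actual method instead learns the noise injection end-to-end so that only inputs tied to predictive latent features pass through.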

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at https://github.com/facebookresearch/SlowFast
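The "temporally-persistent features" objective can be sketched as a contrastive loss in which two clips sampled from the same video are positives and clips from other videos are negatives. The numpy InfoNCE below is a generic sketch under that assumption, not the paper's exact formulation; embedding size, batch size, and temperature are arbitrary.

```python
import numpy as np

def infonce_loss(z_a, z_b, tau=0.1):
    """Contrastive objective: clip i of view A should match clip i of
    view B (same video, positives on the diagonal) against all others."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                    # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
noise = 0.05 * rng.normal(size=(8, 16))

loss_matched = infonce_loss(z, z + noise)          # clips from the same videos
loss_random = infonce_loss(z, rng.normal(size=(8, 16)))  # unrelated clips
```

Minimizing this loss pushes clips from the same video (even 60 seconds apart, per the study's finding) toward the same embedding.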

Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we demonstrate the effectiveness of our method compared with state-of-the-art counterparts.
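The core idea of "removing dependencies between features via learned sample weights" can be illustrated on a toy problem: find weights over training samples that drive the weighted covariance between features toward zero. The finite-difference optimizer below is a deliberately simple stand-in for the paper's reweighting procedure; all sizes and learning-rate choices are assumptions for the demo.

```python
import numpy as np

def decorrelation_loss(theta, F):
    """Sum of squared off-diagonal entries of the weighted feature covariance."""
    w = np.exp(theta)
    w = w / w.sum()                              # sample weights on the simplex
    mu = w @ F                                   # weighted feature means
    C = (F * w[:, None]).T @ F - np.outer(mu, mu)
    off = C - np.diag(np.diag(C))
    return np.sum(off ** 2)

def learn_weights(F, steps=300, lr=1.0, eps=1e-5):
    """Finite-difference gradient descent on the sample-weight logits."""
    n = F.shape[0]
    theta = np.zeros(n)
    for _ in range(steps):
        base = decorrelation_loss(theta, F)
        grad = np.zeros(n)
        for k in range(n):
            t = theta.copy()
            t[k] += eps
            grad[k] = (decorrelation_loss(t, F) - base) / eps
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
n = 40
f1 = rng.normal(size=n)
f2 = 0.8 * f1 + 0.2 * rng.normal(size=n)   # spuriously correlated feature
F = np.stack([f1, f2], axis=1)

loss_before = decorrelation_loss(np.zeros(n), F)   # uniform weights
theta = learn_weights(F)
loss_after = decorrelation_loss(theta, F)
```

Under the learned weights the two features look (nearly) independent, which is the property that lets a downstream model ignore the spurious correlation.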

Graph neural networks (GNNs) have emerged as a powerful paradigm for embedding-based entity alignment due to their capability of identifying isomorphic subgraphs. However, in real knowledge graphs (KGs), the counterpart entities usually have non-isomorphic neighborhood structures, which easily causes GNNs to yield different representations for them. To tackle this problem, we propose a new KG alignment network, namely AliNet, aiming at mitigating the non-isomorphism of neighborhood structures in an end-to-end manner. As the direct neighbors of counterpart entities are usually dissimilar due to the schema heterogeneity, AliNet introduces distant neighbors to expand the overlap between their neighborhood structures. It employs an attention mechanism to highlight helpful distant neighbors and reduce noises. Then, it controls the aggregation of both direct and distant neighborhood information using a gating mechanism. We further propose a relation loss to refine entity representations. We perform thorough experiments with detailed ablation studies and analyses on five entity alignment datasets, demonstrating the effectiveness of AliNet.
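The gated blending of direct and distant neighborhood information can be sketched in a few lines of numpy: aggregate 1-hop and 2-hop neighbor features separately, then mix them with a per-node sigmoid gate. This is a bare-bones caricature of AliNet's layer (mean aggregation instead of attention, an assumed gate parameterization), not its actual architecture.

```python
import numpy as np

def row_normalize(A):
    d = A.sum(axis=1, keepdims=True)
    return A / np.maximum(d, 1.0)

def gated_aggregate(A, X, w_gate):
    """Blend mean-aggregated 1-hop and 2-hop neighborhood features with a
    per-node sigmoid gate controlling how much distant context is used."""
    A1 = row_normalize(A)
    # Nodes reachable within two hops (includes self via a back-and-forth step).
    A2 = row_normalize((A @ A > 0).astype(float))
    h1 = A1 @ X                                  # direct-neighbor aggregation
    h2 = A2 @ X                                  # distant-neighbor aggregation
    g = 1.0 / (1.0 + np.exp(-(X @ w_gate)))      # (n,) gate values in (0, 1)
    return g[:, None] * h1 + (1.0 - g)[:, None] * h2

# Toy graph: a 4-node path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
H = gated_aggregate(A, X, rng.normal(size=5))
```

Expanding to 2-hop neighbors is what lets counterpart entities with non-isomorphic 1-hop neighborhoods still share overlapping context.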

Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, that play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining HGP-SL operator with graph neural networks, we perform graph level representation learning with focus on graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.
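The node-selection half of such a pooling operator can be sketched as a gPool-style top-k selection: score nodes by a learned projection, keep the top fraction, and induce the subgraph on the kept nodes. Note this sketch omits HGP-SL's structure-learning step (refining the pooled adjacency), and the scoring rule is an assumption, not the paper's criterion.

```python
import numpy as np

def topk_pool(A, X, p, ratio=0.5):
    """Score nodes by projection onto p, keep the top `ratio` fraction, and
    induce the subgraph on the kept nodes. Scores also gate the kept
    features, as in learned top-k pooling."""
    scores = X @ p / np.linalg.norm(p)
    k = max(1, int(round(ratio * X.shape[0])))
    idx = np.argsort(scores)[-k:]                 # indices of kept nodes
    X_pool = X[idx] * np.tanh(scores[idx])[:, None]
    A_pool = A[np.ix_(idx, idx)]                  # induced subgraph
    return A_pool, X_pool, idx

rng = np.random.default_rng(0)
n = 6
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                       # symmetric, no self-loops
X = rng.normal(size=(n, 4))
A_pool, X_pool, idx = topk_pool(A, X, rng.normal(size=4))
```

Because induced subgraphs can lose connectivity, HGP-SL's structure-learning mechanism re-estimates edges among the kept nodes at each layer; the sketch above stops at the selection step.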

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
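The coarsening step described here (a soft cluster assignment that pools both features and adjacency) can be written directly: with assignment matrix S, the next layer sees features S^T Z and adjacency S^T A S. The single-matrix GNN stand-ins below are simplifications of the paper's GNN modules.

```python
import numpy as np

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def diffpool_layer(A, X, W_embed, W_assign):
    """One DiffPool-style step: a soft cluster assignment S coarsens both
    the node features (S^T Z) and the adjacency (S^T A S)."""
    Z = np.maximum(A @ X @ W_embed, 0.0)     # GNN-style embedding (ReLU)
    S = softmax(A @ X @ W_assign, axis=1)    # (n, k) soft cluster assignments
    X_coarse = S.T @ Z                       # (k, d') pooled cluster features
    A_coarse = S.T @ A @ S                   # (k, k) cluster connectivity
    return A_coarse, X_coarse, S

rng = np.random.default_rng(0)
n, d, k = 8, 5, 3
A = (rng.random((n, n)) < 0.3).astype(float)
A = ((A + A.T) > 0).astype(float)
np.fill_diagonal(A, 0.0)
X = rng.normal(size=(n, d))
A_c, X_c, S = diffpool_layer(A, X, rng.normal(size=(d, d)), rng.normal(size=(d, k)))
```

Because S is a softmax output rather than a hard assignment, the whole coarsening is differentiable and can be trained end-to-end with the rest of the GNN.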

Multi-relation question answering is a challenging task, as it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.
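The hop-by-hop traversal itself is easy to illustrate: follow one predicted relation per hop through the knowledge base, keeping the intermediate entities so the chain stays inspectable. The tiny KB and relation names below are hypothetical; in the real model each hop's relation is predicted from a learned question representation rather than given.

```python
# Toy knowledge base of (head entity, relation) -> tail entity triples.
KB = {
    ("forrest_gump", "directed_by"): "robert_zemeckis",
    ("robert_zemeckis", "born_in"): "chicago",
}

def answer(entity, relation_sequence, kb):
    """Follow one relation per hop, recording intermediate entities so the
    reasoning chain is traceable and observable."""
    trace = [entity]
    for rel in relation_sequence:
        entity = kb[(entity, rel)]
        trace.append(entity)
    return entity, trace

# "Where was the director of Forrest Gump born?" decomposes into two hops.
final, trace = answer("forrest_gump", ["directed_by", "born_in"], KB)
```

The returned `trace` is what makes failure diagnosis possible: if the final answer is wrong, one can see at which hop the predicted relation went astray and intervene manually.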

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rates under state-of-the-art defenses compared to other attacks. Our attack placed first, with 92.76% accuracy, in a public MNIST black-box attack challenge.
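One ingredient of any such generator-based attack, independent of the GAN training itself, is keeping the perturbation "small-magnitude" and the result a valid input. A common way to do this (an assumption here, not necessarily AdvGAN's exact scheme) is to clip the generator's raw output to an L-infinity ball and clamp the sum to the pixel range:

```python
import numpy as np

def bounded_adversarial_example(x, raw_perturbation, eps=0.1):
    """Clip a generator's raw output to the L-infinity ball of radius eps
    around x, then clamp the result to the valid pixel range [0, 1]."""
    delta = np.clip(raw_perturbation, -eps, eps)
    return np.clip(x + delta, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random(16)                        # a "flattened image" in [0, 1]
g_out = rng.normal(scale=0.5, size=16)    # stand-in for the generator output
x_adv = bounded_adversarial_example(x, g_out, eps=0.1)
```

The payoff described in the abstract is efficiency: once trained, producing `x_adv` for a new instance is a single forward pass plus this cheap projection, with no per-example optimization loop.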

Recently, deep learning has achieved very promising results in visual object tracking. Deep neural networks in existing tracking methods require a lot of training data to learn a large number of parameters. However, training data is not sufficient for visual object tracking, as annotations of a target object are only available in the first frame of a test sequence. In this paper, we propose to learn hierarchical features for visual object tracking by using tree-structure-based Recursive Neural Networks (RNN), which have fewer parameters than other deep neural networks, e.g. Convolutional Neural Networks (CNN). First, we learn RNN parameters to discriminate between the target object and background in the first frame of a test sequence. A tree structure over local patches of an exemplar region is randomly generated by using a bottom-up greedy search strategy. Given the learned RNN parameters, we create two dictionaries regarding target regions and corresponding local patches based on the learned hierarchical features from both top and leaf nodes of multiple random trees. In each of the subsequent frames, we conduct sparse dictionary coding on all candidates to select the best candidate as the new target location. In addition, we update the two dictionaries online to handle appearance changes of target objects. Experimental results demonstrate that our feature learning algorithm can significantly improve tracking performance on benchmark datasets.
