
In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and $n$-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the non-linear approximation rates, metric entropy, and $n$-widths of its absolute convex hull. The upper bounds depend upon the order of smoothness of the parameterization. This result is applied to dictionaries of ridge functions corresponding to shallow neural networks, and the resulting bounds improve upon existing results in many cases. Next, we provide a method for lower bounding the metric entropy and $n$-widths of variation spaces which contain certain classes of ridge functions. This result gives sharp lower bounds on the $L^2$-approximation rates, metric entropy, and $n$-widths for variation spaces corresponding to neural networks with a range of important activation functions, including ReLU$^k$, sigmoidal activation functions with bounded variation, and the B-spline activation functions.
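For orientation (this is the standard definition, not quoted from the article, and the article's normalization and notation may differ): given a dictionary $\mathbb{D}$ in a Hilbert space, the variation space is the linear span of $\mathbb{D}$ normed by the gauge of the closed absolute convex hull,
\[
\mathcal{K}(\mathbb{D}) = \overline{\mathrm{conv}}(\pm\mathbb{D}), \qquad \|f\|_{\mathcal{K}(\mathbb{D})} = \inf\{\, t > 0 : f \in t\,\mathcal{K}(\mathbb{D}) \,\}.
\]
For shallow networks, $\mathbb{D}$ consists of ridge functions such as $x \mapsto \sigma(\omega \cdot x + b)$ with $\sigma$ one of the activation functions above, so the approximation rates, metric entropy, and $n$-widths in question are all properties of this convex hull.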

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. The journal welcomes submissions of high-quality papers contributing to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

Quantified responsibility ascription in complex scenarios is of crucial importance in current debates regarding collective action, for example in the face of various environmental crises. Within this endeavor, we recently proposed considering a probabilistic view of causation, rather than the deterministic views employed in much of the previous formal responsibility literature, and presented a corresponding framework as well as initial candidate functions applicable to a range of scenarios. In the current paper, we extend this contribution by formally evaluating the qualities of the proposed functions through an axiomatic approach. First, we decompose responsibility ascription into distinct contributing functions, before defining a number of desirable properties, or axioms, for these. Afterwards, we evaluate the proposed functions with regard to compliance with these axioms. We find an incompatibility between the axioms determining the upper and lower bounds of one of the contributing functions, imposing a choice between two variants: one satisfying the upper-bound axiom and one satisfying the lower-bound axiom. For the other contributing function, we are able to axiomatically characterize one specific function. As this incompatibility extends to the combined responsibility function, we finally present a maximally axiom-compliant combined function for each variant.
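The abstract does not spell out the candidate functions, but the probabilistic view of causation can be illustrated by a toy contributing function that scores an action by how much it raised the probability of the outcome. Everything below, from the function name to the difference-and-clip rule, is a hypothetical stand-in for illustration, not the authors' proposal:

```python
def probability_raising(p_outcome_with: float, p_outcome_without: float) -> float:
    """Hypothetical contributing function: how much the agent's action
    raised the probability of the outcome, clipped at zero.
    A stand-in for illustration only, not the authors' definition."""
    return max(0.0, p_outcome_with - p_outcome_without)

# Example: acting raises the probability of harm from 0.2 to 0.7.
print(probability_raising(0.7, 0.2))  # 0.5
```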

Frequency information lies at the basis of discriminating between textures, and therefore between different objects. Classical CNN architectures limit frequency learning through fixed filter sizes, and lack a way of controlling it explicitly. Here, we build on structured receptive field filters with a Gaussian derivative basis. Rather than using predetermined derivative orders, which typically result in fixed frequency responses for the basis functions, we learn these orders. We show that by learning the order of the basis we can accurately learn the frequency of the filters, and hence adapt to the optimal frequencies for the underlying learning task. We investigate the well-founded mathematical formulation of fractional derivatives in order to adapt the filter frequencies during training. Our formulation leads to parameter savings and data efficiency when compared to standard CNNs and the Gaussian derivative CNN filter networks that we build upon.
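A minimal sketch of the underlying mechanism, assuming the standard frequency-domain definition of a fractional derivative via the multiplier $(i\omega)^\alpha$; the paper's actual filter construction, normalization, and 2-D basis may differ:

```python
import numpy as np

def fractional_gaussian_derivative(sigma: float, alpha: float, size: int = 65):
    """Sample a 1-D Gaussian derivative filter of (possibly fractional)
    order alpha, built in the Fourier domain with the multiplier
    (i*omega)**alpha. Sketch of the general idea only."""
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2 * sigma**2))          # Gaussian envelope
    g /= g.sum()                                 # unit-mass normalization
    omega = 2 * np.pi * np.fft.fftfreq(size)     # angular frequencies
    multiplier = (1j * omega) ** alpha           # fractional differentiation
    return np.real(np.fft.ifft(multiplier * np.fft.fft(g)))

# Order 1.5 sits between the first- and second-derivative responses,
# so the filter's peak frequency moves continuously with alpha.
print(fractional_gaussian_derivative(sigma=3.0, alpha=1.5)[:5])
```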

For a graph class $\mathcal{C}$, the $\mathcal{C}$-Edge-Deletion problem asks, for a given graph $G$, to delete the minimum number of edges from $G$ in order to obtain a graph in $\mathcal{C}$. We study the $\mathcal{C}$-Edge-Deletion problem for $\mathcal{C}$ the class of permutation graphs, interval graphs, and other related graph classes. It follows from Courcelle's Theorem that these problems are fixed-parameter tractable when parameterized by treewidth. In this paper, we present concrete FPT algorithms for these problems. By giving explicit algorithms and analyzing them in detail, we obtain algorithms that are significantly faster than those obtained through Courcelle's Theorem.
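To make the problem concrete, here is a brute-force sketch of $\mathcal{C}$-Edge-Deletion for tiny graphs. The recognizer `in_class` is caller-supplied, and the toy run substitutes triangle-freeness for an actual interval- or permutation-graph test purely for illustration; this exponential search over edge subsets is exactly what the treewidth-parameterized algorithms avoid:

```python
from itertools import combinations

def edge_deletion_bruteforce(vertices, edges, in_class):
    """Find a minimum set of edges whose deletion leaves a graph that
    satisfies the membership predicate `in_class`. Exponential in |E|."""
    for k in range(len(edges) + 1):              # try 0, 1, 2, ... deletions
        for deleted in combinations(edges, k):
            remaining = [e for e in edges if e not in deleted]
            if in_class(vertices, remaining):
                return list(deleted)             # minimum-size deletion set
    return None

def triangle_free(vs, es):
    """Toy stand-in for a real class recognizer."""
    s = set(map(frozenset, es))
    return not any(frozenset((a, b)) in s and frozenset((b, c)) in s
                   and frozenset((a, c)) in s for a, b, c in combinations(vs, 3))

print(edge_deletion_bruteforce([1, 2, 3], [(1, 2), (2, 3), (1, 3)], triangle_free))
```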

We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< \alpha <\frac 1 2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a well-behaved distribution $\mathcal{D}$ and the remaining $(1-\alpha)$-fraction are arbitrary. The goal is to output a small list of vectors, at least one of which is close to the mean of $\mathcal{D}$. We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees, with running time $O(n^{1 + \epsilon_0} d)$, for any fixed $\epsilon_0 > 0$. All prior algorithms for this problem had additional polynomial factors in $\frac 1 \alpha$. We leverage this result, together with additional techniques, to obtain the first almost-linear time algorithms for clustering mixtures of $k$ separated well-behaved distributions, nearly-matching the statistical guarantees of spectral methods. Prior clustering algorithms inherently relied on an application of $k$-PCA, thereby incurring runtimes of $\Omega(n d k)$. This marks the first runtime improvement for this basic statistical problem in nearly two decades. The starting point of our approach is a novel and simpler near-linear time robust mean estimation algorithm in the $\alpha \to 1$ regime, based on a one-shot matrix multiplicative weights-inspired potential decrease. We crucially leverage this new algorithmic framework in the context of the iterative multi-filtering technique of Diakonikolas et al. '18, '20, providing a method to simultaneously cluster and downsample points using one-dimensional projections -- thus, bypassing the $k$-PCA subroutines required by prior algorithms.
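As a concrete picture of the input model, the sketch below generates a corrupted dataset with an $\alpha$-fraction of Gaussian inliers and returns a naive candidate list. The subset-averaging "estimator" is a placeholder showing the interface only; it is not the paper's multi-filtering algorithm and carries none of its guarantees:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_corrupted_sample(n, d, alpha, true_mean):
    """List-decoding setup: an alpha-fraction of inliers from N(true_mean, I)
    and a (1 - alpha)-fraction of arbitrary points (uniform noise stands in
    for the adversary here)."""
    n_in = int(alpha * n)
    inliers = rng.normal(true_mean, 1.0, size=(n_in, d))
    outliers = rng.uniform(-50, 50, size=(n - n_in, d))
    return np.vstack([inliers, outliers])

def naive_list_estimate(points, n_candidates, subset_size):
    """Placeholder: means of random subsets as a candidate list. A real
    list-decodable estimator guarantees one candidate lands near the true
    mean; this baseline does not."""
    idx = [rng.choice(len(points), subset_size, replace=False)
           for _ in range(n_candidates)]
    return [points[i].mean(axis=0) for i in idx]

T = make_corrupted_sample(n=1000, d=5, alpha=0.3, true_mean=np.ones(5))
candidates = naive_list_estimate(T, n_candidates=8, subset_size=50)
print(len(candidates), candidates[0].shape)  # 8 candidate means in R^5
```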

The fundamental sparsest cut problem takes as input a graph $G$ together with edge costs and demands, and seeks a cut that minimizes the ratio between the cost and the demand across the cut. For $n$-node graphs~$G$ of treewidth~$k$, Chlamt\'a\v{c}, Krauthgamer, and Raghavendra (APPROX 2010) presented an algorithm that yields a factor-$2^{2^k}$ approximation in time $2^{O(k)} \cdot \operatorname{poly}(n)$. Later, Gupta, Talwar, and Witmer (STOC 2013) showed how to obtain a $2$-approximation algorithm with a blown-up run time of $n^{O(k)}$. An intriguing open question is whether one can simultaneously achieve the best of both results, that is, a factor-$2$ approximation in time $2^{O(k)} \cdot \operatorname{poly}(n)$. In this paper, we make significant progress towards this goal via the following results: (i) A factor-$O(k^2)$ approximation that runs in time $2^{O(k)} \cdot \operatorname{poly}(n)$, directly improving the work of Chlamt\'a\v{c} et al. while keeping the run time single-exponential in $k$. (ii) For any $\varepsilon>0$, a factor-$O(1/\varepsilon^2)$ approximation whose run time is $2^{O(k^{1+\varepsilon}/\varepsilon)} \cdot \operatorname{poly}(n)$, implying a constant-factor approximation whose run time is nearly single-exponential in $k$ and a factor-$O(\log^2 k)$ approximation in time $k^{O(k)} \cdot \operatorname{poly}(n)$. Key to these results is a new measure of a tree decomposition that we call the combinatorial diameter, which may be of independent interest.
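For intuition about the objective being approximated, the following exhaustive search computes the exact sparsest cut on a toy graph. The pair-keyed `cost` and `demand` dictionaries are illustrative; real instances are far beyond this exponential enumeration, which is why the approximation algorithms above matter:

```python
from itertools import combinations

def sparsest_cut_bruteforce(nodes, cost, demand):
    """Minimize cost(S, V \\ S) / demand(S, V \\ S) over all nontrivial
    cuts. Exponential in |V|; shown only to pin down the objective."""
    best, best_ratio = None, float("inf")
    nodes = list(nodes)
    for r in range(1, len(nodes)):
        for S in combinations(nodes, r):
            S = set(S)
            crossing = [(u, v) for u in S for v in nodes if v not in S]
            c = sum(cost.get(frozenset(e), 0) for e in crossing)
            d = sum(demand.get(frozenset(e), 0) for e in crossing)
            if d > 0 and c / d < best_ratio:
                best, best_ratio = S, c / d
    return best, best_ratio

cost = {frozenset((0, 1)): 1, frozenset((1, 2)): 3, frozenset((0, 2)): 1}
demand = {frozenset((0, 1)): 1, frozenset((1, 2)): 1, frozenset((0, 2)): 1}
print(sparsest_cut_bruteforce([0, 1, 2], cost, demand))  # ({0}, 1.0)
```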

A key advantage of isogeometric discretizations is that their eigenfrequencies and eigenmodes are accurate and well behaved. For degree two and higher, however, optical branches of spurious outlier frequencies and modes may appear due to boundaries or reduced continuity at patch interfaces. In this paper, we introduce a variational approach based on perturbed eigenvalue analysis that eliminates outlier frequencies without negatively affecting the accuracy of the remainder of the spectrum and modes. We then propose a pragmatic iterative procedure that estimates the perturbation parameters in such a way that the outlier frequencies are effectively reduced. We demonstrate that our approach allows for a much larger critical time-step size in explicit dynamics calculations. In addition, we show that the critical time-step size obtained with the proposed approach does not depend on the polynomial degree of the spline basis functions.
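The link between outlier frequencies and explicit dynamics is the standard stability bound for the central-difference scheme, $\Delta t_{\mathrm{crit}} = 2/\omega_{\max}$: spurious high outlier frequencies inflate $\omega_{\max}$ and hence shrink the admissible time step. A minimal sketch, with a generic toy system standing in for an isogeometric stiffness/mass pair:

```python
import numpy as np
from scipy.linalg import eigh

def critical_time_step(K, M):
    """Critical step of the undamped central-difference scheme,
    dt_crit = 2 / omega_max, where omega_max is the largest natural
    frequency of the generalized eigenproblem K phi = omega^2 M phi."""
    omega_sq = eigh(K, M, eigvals_only=True)
    return 2.0 / np.sqrt(omega_sq.max())

# Toy 3-DOF system: removing outlier (spuriously high) frequencies lowers
# omega_max, which is exactly what permits the larger explicit time steps
# reported above.
K = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
M = np.eye(3) / 3.0
print(critical_time_step(K, M))
```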

Graph signals are signals with an irregular structure that can be described by a graph. Graph neural networks (GNNs) are information processing architectures tailored to these graph signals and made of stacked layers that compose graph convolutional filters with nonlinear activation functions. Graph convolutions endow GNNs with invariance to permutations of the graph nodes' labels. In this paper, we consider the design of trainable nonlinear activation functions that take into consideration the structure of the graph. This is accomplished by using graph median filters and graph max filters, which mimic linear graph convolutions and are shown to retain the permutation invariance of GNNs. We also discuss the modifications to the backpropagation algorithm necessary to train local activation functions. The advantages of localized activation function architectures are demonstrated in four numerical experiments: source localization on synthetic graphs, authorship attribution of 19th-century novels, movie recommender systems, and scientific article classification. In all cases, localized activation functions are shown to improve model capacity.
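A minimal sketch of a (non-trainable) graph median filter used as a localized activation: each node's output is the median over its closed neighborhood, which is permutation invariant by construction since the median ignores node ordering. The trainable, weighted version described above is omitted, and the names and NumPy formulation are illustrative:

```python
import numpy as np

def graph_median_activation(x, adjacency):
    """Replace each node's value with the median over its closed
    neighborhood (itself plus its neighbors)."""
    out = np.empty(len(x))
    for v in range(len(x)):
        neigh = np.flatnonzero(adjacency[v])
        out[v] = np.median(np.append(x[neigh], x[v]))
    return out

# 4-node path graph: the median suppresses the isolated spike at node 1,
# something a pointwise ReLU could not do.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
x = np.array([0.0, 10.0, 0.0, 0.0])
print(graph_median_activation(x, A))  # node 1 -> median of {0, 10, 0} = 0
```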

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space via a feature transformation (e.g., a Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn just one metric space (a single feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metrics progressively and nonlinearly, as in deep learning, using only existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, follow it with a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys several desirable properties: it can indeed learn metrics progressively, and it performs well on a number of datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
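A minimal forward-pass sketch of the layering idea, with a random linear map standing in for a learned Mahalanobis-style metric layer; the online per-layer update rule, which is the substance of ODML, is deliberately omitted, and all names here are illustrative:

```python
import numpy as np

class MetricLayer:
    """One 'metric layer': a linear map L so that distances in the new
    space are Mahalanobis distances ||Lx - Ly|| in the old one. Random
    initialization stands in for whatever online metric learner is used."""
    def __init__(self, dim_in, dim_out, seed=0):
        rng = np.random.default_rng(seed)
        self.L = rng.normal(scale=dim_in ** -0.5, size=(dim_out, dim_in))

    def forward(self, x):
        return self.L @ x

def odml_forward(x, layers):
    """Alternate metric layers with ReLU nonlinearities, mirroring the
    metric-layer / nonlinear-layer stacking described above."""
    for layer in layers[:-1]:
        x = np.maximum(layer.forward(x), 0.0)   # metric layer + ReLU
    return layers[-1].forward(x)                 # final metric layer

layers = [MetricLayer(8, 6, seed=0), MetricLayer(6, 4, seed=1)]
x, y = np.ones(8), np.zeros(8)
dist = np.linalg.norm(odml_forward(x, layers) - odml_forward(y, layers))
print(dist)  # distance between samples in the learned deep metric space
```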

In this paper, we study the optimal convergence rate for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely, when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is: both strongly convex and smooth, either strongly convex or smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
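For intuition about where the spectral gap enters, here is a deliberately simple, un-accelerated cousin of the method: decentralized gradient descent with a doubly stochastic mixing matrix $W$. The paper's actual algorithm applies Nesterov's acceleration to the dual of the consensus-constrained problem; everything below (names, step size, the toy quadratics) is illustrative:

```python
import numpy as np

def decentralized_gradient(grads, W, x0, steps=200, lr=0.05):
    """Distributed minimization of F(x) = sum_i f_i(x): each node keeps a
    local copy x_i, takes a local gradient step, and averages with its
    neighbors through W. Convergence speed degrades with the spectral gap
    of W, the same quantity that appears in the paper's bounds."""
    X = np.array(x0, dtype=float)                      # one row per node
    for _ in range(steps):
        G = np.array([g(x) for g, x in zip(grads, X)])
        X = W @ (X - lr * G)                           # local step + gossip
    return X

# Three nodes minimizing f_i(x) = 0.5 * (x - a_i)^2; the optimum of the
# sum is the average of the a_i. With a constant step size this simple
# scheme only reaches a neighborhood of consensus at the optimum.
a = [1.0, 2.0, 6.0]
grads = [lambda x, ai=ai: x - ai for ai in a]
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
print(decentralized_gradient(grads, W, x0=np.zeros((3, 1))))  # ~3.0 each
```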

We demonstrate that many detection methods are designed to identify only a sufficiently accurate bounding box, rather than the best available one. To address this issue, we propose a simple and fast modification to the existing methods, called Fitness NMS. This method is tested with the DeNet model and obtains a significantly improved MAP at greater localization accuracies without a loss in evaluation rate. Next, we derive a novel bounding box regression loss based on a set of IoU upper bounds that better matches the goal of IoU maximization while still providing good convergence properties. Following these novelties, we investigate RoI clustering schemes for improving evaluation rates for the DeNet \textit{wide} model variants and provide an analysis of localization performance at various input image dimensions. We obtain a MAP[0.5:0.95] of 33.6\%@79Hz and 41.8\%@5Hz for MSCOCO on a Titan X (Maxwell).
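The idea of ranking boxes by localization quality rather than classification confidence alone can be sketched as rescoring before greedy NMS. Here `fitness` is a per-box estimate of expected IoU, and the product rescoring is a simplification of the Fitness NMS idea, not the paper's exact formulation:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def rescored_nms(boxes, cls_scores, fitness, iou_thresh=0.5):
    """Greedy NMS over boxes ranked by cls_score * fitness, so that
    well-localized boxes survive suppression, not merely confidently
    classified ones."""
    order = sorted(range(len(boxes)),
                   key=lambda i: cls_scores[i] * fitness[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
# Box 1 wins over the overlapping box 0 because its fitness is higher,
# even though box 0 has the higher classification score.
print(rescored_nms(boxes, cls_scores=[0.9, 0.8, 0.7], fitness=[0.6, 0.9, 0.8]))
```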
