We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written "backslash" or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, the "transpose dot" operator $\{\}^{T_\bullet}$, that allows for the reversal of operators. We further demonstrate the elegance of the operator approach in a programming language with generic linear algebra operators, such as Julia \cite{bezanson2017julia}, and show that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to "operator overloading," where backslash would normally have to be rewritten to take advantage of operators, with "generic programming" there is no such need.
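As a rough illustration of the idea in the abstract above, the following sketch obtains a reverse-mode gradient by solving a triangular linear system. It is not the authors' Julia implementation: it uses plain scalar entries rather than operator-valued matrix elements, and the example function $f(x) = x^2 \sin(x^2)$, the helper `back_substitute`, and the node names `v0` to `v3` are assumptions chosen for illustration.

```python
import math
import numpy as np


def back_substitute(U, b):
    """Solve U g = b for an upper-triangular matrix U by back substitution."""
    n = len(b)
    g = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contribution of the already-solved unknowns g[i+1:].
        g[i] = (b[i] - U[i, i + 1:] @ g[i + 1:]) / U[i, i]
    return g


def gradient_by_triangular_solve(x):
    # Forward pass over a small computational graph (illustrative choice):
    #   v0 = x, v1 = v0**2, v2 = sin(v1), v3 = v1 * v2
    v0 = x
    v1 = v0 ** 2
    v2 = math.sin(v1)
    v3 = v1 * v2

    # L[i, j] is the local partial derivative d v_i / d v_j.
    # Each node depends only on earlier nodes, so L is strictly lower triangular.
    L = np.zeros((4, 4))
    L[1, 0] = 2 * v0
    L[2, 1] = math.cos(v1)
    L[3, 1] = v2
    L[3, 2] = v1

    # Reverse mode: solve (I - L)^T g = e_out, an upper-triangular system.
    # g[j] is then the total derivative d v3 / d v_j.
    e_out = np.array([0.0, 0.0, 0.0, 1.0])
    g = back_substitute((np.eye(4) - L).T, e_out)
    return v3, g[0]


value, grad = gradient_by_triangular_solve(0.7)
```

Here `grad` can be checked against the hand-derived expression $2x\sin(x^2) + 2x^3\cos(x^2)$. The sketch only covers scalar nodes; replacing the scalar entries of `L` with operators is the part the abstract attributes to generic programming.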

Related content

Scenario · Diameter · Graph · FOCS · SODA

October 18, 2023

Simpler and Higher Lower Bounds for Shortcut Sets

Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu

From arXiv. To appear in SODA 2024. Abstract shortened to fit arXiv requirements.

We provide a variety of lower bounds for the well-known shortcut set problem: how much can one decrease the diameter of a directed graph on $n$ vertices and $m$ edges by adding $O(n)$ or $O(m)$ shortcuts from the transitive closure of the graph? Our results are based on a vast simplification of the recent construction of Bodwin and Hoppenworth [FOCS 2023] which was used to show an $\widetilde{\Omega}(n^{1/4})$ lower bound for the $O(n)$-sized shortcut set problem. We highlight that our simplification completely removes the use of the convex sets by Bárány and Larman [Math. Ann. 1998] used in all previous lower bound constructions. Our simplification also removes the need for randomness and further removes some log factors. This allows us to generalize the construction to higher dimensions, which in turn can be used to show the following results. For $O(m)$-sized shortcut sets, we show an $\Omega(n^{1/5})$ lower bound, improving on the previous best $\Omega(n^{1/8})$ lower bound. For all $\varepsilon > 0$, we show that there exists a $\delta > 0$ such that there are $n$-vertex $O(n)$-edge graphs $G$ where adding any shortcut set of size $O(n^{2-\varepsilon})$ keeps the diameter of $G$ at $\Omega(n^\delta)$. This improves the sparsity of the constructed graph compared to a known similar result by Hesse [SODA 2003]. We also consider the sourcewise setting for shortcut sets: given a graph $G=(V,E)$ and a set $S\subseteq V$, how much can we decrease the sourcewise diameter of $G$, $\max_{(s, v) \in S \times V, \text{dist}(s, v) < \infty} \text{dist}(s,v)$, by adding a set of edges $H$ from the transitive closure of $G$? We show that for any integer $d \ge 2$, there exists a graph $G=(V, E)$ on $n$ vertices and $S \subseteq V$ with $|S| = \widetilde{\Theta}(n^{3/(d+3)})$, such that when adding $O(n)$ or $O(m)$ shortcuts, the sourcewise diameter is $\widetilde{\Omega}(|S|^{1/3})$.
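To make the problem statement above concrete, the sketch below computes the diameter of a small directed graph (the largest finite shortest-path distance) before and after adding one shortcut edge from its transitive closure. The directed path graph, the chosen shortcut, and the helper names are assumptions for illustration only; the sketch does not reproduce any construction from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest finite distance over all ordered pairs of vertices."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Directed path 0 -> 1 -> ... -> 7 (illustrative input).
n = 8
path = {i: [i + 1] if i + 1 < n else [] for i in range(n)}
before = diameter(path)

# Add the shortcut (0, 7); vertex 7 is reachable from 0, so this edge
# belongs to the transitive closure of the path.
shortcut = {u: list(vs) for u, vs in path.items()}
shortcut[0].append(n - 1)
after = diameter(shortcut)
```

On this input a single shortcut lowers the diameter only from 7 to 6, since pairs such as (1, 7) are unaffected.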

Attention · MoDELS · Graph · CHAP · Integration

October 18, 2023

AMR Parsing with Causal Hierarchical Attention and Pointers

Chao Lou, Kewei Tu

From arXiv. EMNLP 2023.

Translation-based AMR parsers have recently gained popularity due to their simplicity and effectiveness. They predict linearized graphs as free texts, avoiding explicit structure modeling. However, this simplicity neglects structural locality in AMR graphs and introduces unnecessary tokens to represent coreferences. In this paper, we introduce new target forms of AMR parsing and a novel model, CHAP, which is equipped with causal hierarchical attention and the pointer mechanism, enabling the integration of structures into the Transformer decoder. We empirically explore various alternative modeling options. Experiments show that our model outperforms baseline models on four out of five benchmarks in the setting of no additional data.

Maximum likelihood estimation · Maximum likelihood · Vector space · Estimation/Estimator · Comprehensibility

October 18, 2023

Optimising Distributions with Natural Gradient Surrogates

Jonathan So, Richard E. Turner

Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient has a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.

Performer · ChatGPT · Analysis · Biased · Performance

October 18, 2023

Bias in Emotion Recognition with ChatGPT

Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

From arXiv. 5 pages, 4 figures, 6 tables.

This technical report explores ChatGPT's ability to recognize emotions from text, which can serve as the basis of various applications such as interactive chatbots, data annotation, and mental health analysis. While prior research has shown ChatGPT's basic ability in sentiment analysis, its performance in more nuanced emotion recognition has not yet been explored. Here, we conducted experiments to evaluate its emotion recognition performance across different datasets and emotion labels. Our findings indicate a reasonable level of reproducibility in its performance, with noticeable improvement through fine-tuning. However, the performance varies with different emotion labels and datasets, highlighting an inherent instability and possible bias. The choice of dataset and emotion labels significantly impacts ChatGPT's emotion recognition performance. This paper sheds light on the importance of dataset and label selection, and the potential of fine-tuning in enhancing ChatGPT's emotion recognition capabilities, providing a groundwork for better integration of emotion analysis in applications using ChatGPT.

Networking · SimPLe · Example · Neuron · Analysis

October 16, 2023

First Steps Towards a Runtime Analysis of Neuroevolution

Paul Fischer, Emil Lundt Larsen, Carsten Witt

From arXiv. 27 pages; full version of paper published at FOGA 2023 and available at ACM.

We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network. We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers. Our results show that the proposed algorithm is generally efficient on two example problems designed for one neuron and efficient with at least constant probability on the example problem for a two-layer network. In particular, the so-called harmonic mutation operator, which chooses steps of size $j$ with probability proportional to $1/j$, turns out to be a good choice for the underlying search space. However, for the case of one neuron, we also identify situations with hard-to-overcome local optima. Experimental investigations of our neuroevolutionary algorithm and a state-of-the-art CMA-ES support the theoretical findings.
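The step-size rule mentioned in the abstract above, choosing a step of size $j$ with probability proportional to $1/j$, can be sketched as follows. The maximum step size `r`, the function names, and the use of Python's `random.choices` are assumptions for illustration; the abstract does not specify the range of $j$ or how the step is applied to the network parameters.

```python
import random


def harmonic_probabilities(r):
    """Probabilities p_j proportional to 1/j for step sizes j = 1, ..., r."""
    weights = [1.0 / j for j in range(1, r + 1)]
    total = sum(weights)  # the r-th harmonic number
    return [w / total for w in weights]


def harmonic_step(r, rng=random):
    """Draw one step size from {1, ..., r} under the harmonic distribution."""
    steps = list(range(1, r + 1))
    return rng.choices(steps, weights=harmonic_probabilities(r), k=1)[0]


# Example: distribution over step sizes 1..10 and a single seeded draw.
probs = harmonic_probabilities(10)
step = harmonic_step(10, rng=random.Random(0))
```

Under this distribution small steps are the most likely, but a step of size $j$ is still only $j$ times less likely than a step of size 1, so large jumps remain reasonably frequent.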

Class · MoDELS · Feature space · motivation · CASES

October 15, 2023

Generalized Neural Collapse for a Large Number of Classes

Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu

From arXiv. 32 pages, 12 figures.

Neural collapse provides an elegant mathematical characterization of learned last-layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insights but also motivate new techniques for improving practical deep models. However, most of the existing empirical and theoretical studies of neural collapse focus on the case where the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to cases where the number of classes is much larger than the dimension of the feature space, which broadly occur in language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon, where the minimum one-vs-rest margin is maximized. We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study to show that generalized neural collapse provably occurs under the unconstrained feature model with a spherical constraint, under certain technical conditions on the feature dimension and the number of classes.

Analysis · Linear · Weight · motivation · Performer

October 13, 2023

Mediation Analysis using Semi-parametric Shape-Restricted Regression with Applications

Qing Yin, Jong-Hyeon Jeong, Xu Qin, Shyamal D Peddada, Jennifer Adibi

Linear regression is often used to perform mediation analysis. However, in many instances, the underlying relationships may not be linear, as in the case of placental-fetal hormones and fetal development. Although the exact functional form of the relationship may be unknown, one may hypothesize the general shape of the relationship. For these reasons, we develop a novel shape-restricted inference-based methodology for conducting mediation analysis. This work is motivated by an application in fetal endocrinology where researchers are interested in understanding the effects of pesticide application on birth weight, with human chorionic gonadotropin (hCG) as the mediator. We assume a practically plausible set of nonlinear effects of hCG on birth weight and a linear relationship between pesticide exposure and hCG, with both exposure-outcome and exposure-mediator models being linear in the confounding factors. Applying the proposed methodology to data from a population-level prenatal screening program, with hCG as the mediator, we discovered that, while the natural direct effects suggest a positive association between pesticide application and birth weight, the natural indirect effects were negative.

Divergence · Marginalization · Decomposition · MoDELS · Exact

October 13, 2023

Computing Marginal and Conditional Divergences between Decomposable Models with Applications

Loong Kuan Lee, Geoffrey I. Webb, Daniel F. Schmidt, Nico Piatkowski

From arXiv. 10 pages, 8 figures. Accepted at the IEEE International Conference on Data Mining (ICDM) 2023.

The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications, but doing so naively is intractable. Computing the alpha-beta divergence -- a family of divergences that includes the Kullback-Leibler divergence and Hellinger distance -- between the joint distribution of two decomposable models, i.e., chordal Markov networks, can be done in time exponential in the treewidth of these models. However, reducing the dissimilarity between two high-dimensional objects to a single scalar value can be uninformative. Furthermore, in applications such as supervised learning, the divergence over a conditional distribution might be of more interest. Therefore, we propose an approach to compute the exact alpha-beta divergence between any marginal or conditional distribution of two decomposable models. Doing so tractably is non-trivial, as we need to decompose the divergence between these distributions and therefore require a decomposition over the marginal and conditional distributions of these models. Consequently, we provide such a decomposition and also extend existing work to compute the marginal and conditional alpha-beta divergence between these decompositions. We then show how our method can be used to analyze distributional changes by first applying it to a benchmark image dataset. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers. Code for all experiments is available at: //lklee.dev/pub/2023-icdm/code

TOOLS · Regular · Language modeling · MoDELS · Scenario

October 13, 2023

On Tools for Completeness of Kleene Algebra with Hypotheses

Damien Pous, Jurriaan Rot, Jana Wagemaker

In the literature on Kleene algebra, a number of variants have been proposed which impose additional structure specified by a theory, such as Kleene algebra with tests (KAT) and the recent Kleene algebra with observations (KAO), or make specific assumptions about certain constants, as for instance in NetKAT. Many of these variants fit within the unifying perspective offered by Kleene algebra with hypotheses, which comes with a canonical language model constructed from a given set of hypotheses. For the case of KAT, this model corresponds to the familiar interpretation of expressions as languages of guarded strings. A relevant question therefore is whether Kleene algebra together with a given set of hypotheses is complete with respect to its canonical language model. In this paper, we revisit, combine and extend existing results on this question to obtain tools for proving completeness in a modular way. We showcase these tools by giving new and modular proofs of completeness for KAT, KAO and NetKAT, and we prove completeness for new variants of KAT: KAT extended with a constant for the full relation, KAT extended with a converse operation, and a version of KAT where the collection of tests only forms a distributive lattice.

MoDELS · Likelihood · Better · Reducible · Maximum likelihood

October 12, 2023

Calibrating Likelihoods towards Consistency in Summarization Models

Polina Zablotskaia, Misha Khalman, Rishabh Joshi, Livio Baldini Soares, Shoshana Jakobovits, Joshua Maynez, Shashi Narayan

Despite recent advances in abstractive text summarization, current summarization models still generate factually inconsistent summaries, reducing their utility for real-world applications. We argue that the main reason for such behavior is that summarization models trained with a maximum likelihood objective assign high probability to plausible sequences given the context, but they often do not accurately rank sequences by their consistency. In this work, we solve this problem by calibrating the likelihood of model-generated sequences to better align with a consistency metric measured by natural language inference (NLI) models. The human evaluation study and automatic metrics show that the calibrated models generate more consistent and higher-quality summaries. We also show that the models trained using our method return probabilities that are better aligned with the NLI scores, which significantly increases the reliability of summarization models.