A four-legged robot has learned to run on sand at a faster pace than humans jog on solid ground. With low energy use and few failures, this rapid robot shows the value of combining data-driven learning with accurate yet simple models.

Transform · Cluster · Example · Learning · Curvature

September 27, 2023

Ellipsoid fitting with the Cayley transform

Omar Melikechi, David B. Dunson

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
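The abstract does not spell out how CTEF uses the Cayley transform, so the following is only a sketch of the classical transform itself: a skew-symmetric matrix S is mapped to the orthogonal matrix Q = (I - S)(I + S)^{-1}, which gives an unconstrained way to parameterize a rotation such as the orientation of an ellipsoid's axes. The helper names (`cayley`, `inverse`, and so on) and the example matrix are illustrative assumptions, not part of the paper.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    eye = identity(n)
    aug = [list(a[i]) + eye[i] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(s: Matrix) -> Matrix:
    """Map a skew-symmetric matrix S to Q = (I - S)(I + S)^{-1}."""
    n = len(s)
    eye = identity(n)
    minus = [[eye[i][j] - s[i][j] for j in range(n)] for i in range(n)]
    plus = [[eye[i][j] + s[i][j] for j in range(n)] for i in range(n)]
    # I + S is always invertible because S has purely imaginary eigenvalues.
    return matmul(minus, inverse(plus))


# Hypothetical 3x3 skew-symmetric input (S^T = -S).
S = [[0.0, 0.5, -0.2],
     [-0.5, 0.0, 0.3],
     [0.2, -0.3, 0.0]]
Q = cayley(S)
QtQ = matmul(transpose(Q), Q)  # should be the identity up to rounding
```

Because every skew-symmetric matrix yields an orthogonal Q, an optimizer can search over the free entries of S without enforcing orthogonality constraints explicitly.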

Learning rate · MoDELS · Learning · Logit · Scaling

September 25, 2023

Small-scale proxies for large-scale Transformer training instabilities

Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the $\mu$Param (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.
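To make the second instability concrete, here is a small sketch of what "divergence of the output logits from the log probabilities" means. Log probabilities equal the logits minus log Z, where Z is the softmax normalizer, so the two drift apart whenever log Z moves away from zero. The abstract does not name the mitigation it uses; the auxiliary penalty on log² Z shown here (often called a z-loss) and its coefficient are assumptions for illustration.

```python
import math
from typing import List


def log_sum_exp(logits: List[float]) -> float:
    """Numerically stable log Z, where Z = sum(exp(logit))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def log_probs(logits: List[float]) -> List[float]:
    """Log-softmax: each logit minus log Z."""
    lz = log_sum_exp(logits)
    return [x - lz for x in logits]


def z_penalty(logits: List[float], coeff: float = 1e-4) -> float:
    """Assumed auxiliary term coeff * (log Z)^2 that pulls log Z toward 0."""
    return coeff * log_sum_exp(logits) ** 2


logits = [2.0, 0.5, -1.0]
shifted = [x + 10.0 for x in logits]  # same distribution, larger logits

# The predicted distribution is identical, but log Z grows by the shift,
# so the logits sit further from the log probabilities.
same_distribution = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(log_probs(logits), log_probs(shifted))
)
```

The cross-entropy loss alone cannot distinguish `logits` from `shifted`, which is why a separate term is needed if one wants to keep log Z small.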

INFORMS · Representation · Processing (programming language) · Mutually independent · Stationary

September 24, 2023

A recursive representation for decoupling time-state dependent jumps from jump-diffusion processes

Qinjing Qiu, Reiichiro Kawai

from arxiv, 25 pages, 3 figures

We establish a recursive representation that fully decouples jumps from a large class of multivariate inhomogeneous stochastic differential equations with jumps of general time-state dependent unbounded intensity, rather than of Lévy-driven type, which benefits substantially from independent and stationary increments. The recursive representation, along with a few related ones, is derived by making use of a jump time of the underlying dynamics as an information relay point in passing the past on to a previous iteration step to fill in the missing information on the unobserved trajectory ahead. We prove that the proposed recursive representations converge exponentially fast in the limit, and can be represented in a form similar to Picard iterates under the probability measure with its jump component suppressed. On the basis of each iterate, we construct upper and lower bounding functions that also converge towards the true solution as the iterations proceed. We provide numerical results to justify our theoretical findings.

Learning · Channel · Separable · Precision/accuracy · Feedforward

September 23, 2023

Tight bounds on Pauli channel learning without entanglement

Senrui Chen, Changhun Oh, Sisi Zhou, Hsin-Yuan Huang, Liang Jiang

from arxiv, 22 pages, 1 figure. Comments welcome!

Entanglement is a useful resource for learning, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize separable states, measurements, and operations between the main system of interest and an ancillary system. These algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. We prove a tight lower bound for learning Pauli channels without entanglement that closes a cubic gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ rounds of measurements. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for characterizing Pauli noise.
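As a rough illustration of the gap between the two bounds, the sketch below evaluates the scalings $2^n\varepsilon^{-2}$ and $\varepsilon^{-2}$ with every hidden constant set to 1. That choice is an assumption made only for illustration: $\Theta(\cdot)$ fixes the growth rate, not the actual number of rounds, and the function names are hypothetical.

```python
import math


def rounds_without_entanglement(n_qubits: int, eps: float) -> float:
    """Scaling 2^n / eps^2 (hidden constant assumed to be 1)."""
    return 2 ** n_qubits / eps ** 2


def rounds_with_entanglement(eps: float) -> float:
    """Scaling 1 / eps^2 (hidden constant assumed to be 1)."""
    return 1.0 / eps ** 2


eps = 0.1
for n in (2, 10, 20):
    ratio = rounds_without_entanglement(n, eps) / rounds_with_entanglement(eps)
    # Under these assumptions the ratio is 2^n, independent of eps.
    print(f"n={n}: entanglement-free cost is about {ratio:.0f}x larger")
```

The point of the comparison is that the overhead of forgoing entanglement grows exponentially with the number of qubits while the dependence on the target error is the same in both settings.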

Identifiable · entity · Named entity recognition · Precision/accuracy · Analysis

September 23, 2023

Named entity recognition using GPT for identifying comparable companies

Eurico Covas

from arxiv, 10 pages, 1 figure, to be submitted to a journal

For both public and private firms, comparable companies' analysis is widely used as a method for company valuation. In particular, the method is of great value for the valuation of private equity companies. The several approaches to the comparable companies' method usually rely on a qualitative approach to identifying similar peer companies, which tends to use established industry classification schemes and/or analyst intuition and knowledge. However, more quantitative methods have started being used in the literature and in the private equity industry, in particular machine learning clustering and natural language processing (NLP). For NLP methods, the process consists of extracting product entities from, e.g., the company's website or company descriptions from some financial database system and then performing similarity analysis. Here, using companies' descriptions/summaries from publicly available companies' Wikipedia websites, we show that using large language models (LLMs), such as GPT from OpenAI, has a much higher precision and success rate than using the standard named entity recognition (NER) methods, which use manual annotation. We demonstrate quantitatively a higher precision rate, and show that, qualitatively, it can be used to create appropriate comparable companies peer groups which could then be used for equity valuation.

Conformer · Speech recognition · End-to-end · MoDELS · External memory

September 22, 2023

Memory-augmented conformer for improved end-to-end long-form ASR

Carlos Carvalho, Alberto Abad

Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory for long utterances.

Tensor · Rank · Discretization · Subspace · INFORMS

September 22, 2023

Discreteness of asymptotic tensor ranks

Jop Briët, Matthias Christandl, Itai Leigh, Amir Shpilka, Jeroen Zuiddam

Tensor parameters that are amortized or regularized over large tensor powers, often called "asymptotic" tensor parameters, play a central role in several areas including algebraic complexity theory (constructing fast matrix multiplication algorithms), quantum information (entanglement cost and distillable entanglement), and additive combinatorics (bounds on cap sets, sunflower-free sets, etc.). Examples are the asymptotic tensor rank, asymptotic slice rank and asymptotic subrank. Recent works (Costa-Dalai, Blatter-Draisma-Rupniewski, Christandl-Gesmundo-Zuiddam) have investigated notions of discreteness (no accumulation points) or "gaps" in the values of such tensor parameters. We prove a general discreteness theorem for asymptotic tensor parameters of order-three tensors and use this to prove that (1) over any finite field (and in fact any finite set of coefficients in any field), the asymptotic subrank and the asymptotic slice rank have no accumulation points, and (2) over the complex numbers, the asymptotic slice rank has no accumulation points. Central to our approach are two new general lower bounds on the asymptotic subrank of tensors, which measures how much a tensor can be diagonalized. The first lower bound says that the asymptotic subrank of any concise three-tensor is at least the cube-root of the smallest dimension. The second lower bound says that any concise three-tensor that is "narrow enough" (has one dimension much smaller than the other two) has maximal asymptotic subrank. Our proofs rely on new lower bounds on the maximum rank in matrix subspaces that are obtained by slicing a three-tensor in the three different directions. We prove that for any concise tensor, the product of any two such maximum ranks must be large, and as a consequence there are always two distinct directions with large max-rank.

MoDELS · Predictor/decision function · Performer · MASS · Curvature

September 22, 2023

Predictor models for high-performance wheel loading

Koji Aoshima, Arvid Fälldin, Eddie Wadbro, Martin Servin

from arxiv, 22 pages, 19 figures

Autonomous wheel loading involves selecting actions that maximize the total performance over many repetitions. The actions should be well adapted to the current state of the pile and its future states. Selecting the best actions is difficult since the pile states are consequences of previous actions and thus are highly unknown. To aid the selection of actions, this paper investigates data-driven models to predict the loaded mass, time, work, and resulting pile state of a loading action given the initial pile state. Deep neural networks were trained on data using over 10,000 simulations to an accuracy of 91-97%, with the pile state represented either by a heightmap or by its slope and curvature. The net outcome of sequential loading actions is predicted by repeating the model inference at five milliseconds per loading. As errors accumulate during the inferences, long-horizon predictions need to be combined with a physics-based model.

Statistic · MoDELS · SimPLe · DATE · Networks

September 21, 2023

Overcoming near-degeneracy in the autologistic actor attribute model

Alex Stivala

from arxiv, Added a paper to literature survey

The autologistic actor attribute model, or ALAAM, is the social influence counterpart of the better-known exponential-family random graph model (ERGM) for social selection. Extensive experience with ERGMs has shown that the problem of near-degeneracy which often occurs with simple models can be overcome by using "geometrically weighted" or "alternating" statistics. In the much more limited empirical applications of ALAAMs to date, the problem of near-degeneracy, although theoretically expected, appears to have been less of an issue. In this work I present a comprehensive survey of ALAAM applications, showing that this model has to date only been used with relatively small networks, in which near-degeneracy does not appear to be a problem. I show near-degeneracy does occur in simple ALAAM models of larger empirical networks, define some geometrically weighted ALAAM statistics analogous to those for ERGM, and demonstrate that models with these statistics do not suffer from near-degeneracy and hence can be estimated where they could not be with the simple statistics.

Language modeling · MoDELS · IR · Likelihood · Masked language modeling

October 20, 2020

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng

from arxiv, Accepted by WSDM2021

Recently pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with higher likelihood is deemed as more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP can achieve exciting performance under both the zero- and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
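The pair construction behind the ROP task can be sketched roughly as follows. The abstract only says that two word sets are sampled from the document language model and that the more likely set is treated as more representative. Everything else here is an assumption made for illustration: the unsmoothed maximum-likelihood unigram model, the fixed set size, sampling with replacement, and all function names. The released PROP code may differ on each point.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def document_language_model(tokens: List[str]) -> Dict[str, float]:
    """Unsmoothed unigram model: P(w | d) = count(w) / len(d). (Assumption.)"""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sample_word_set(lm: Dict[str, float], size: int, rng: random.Random) -> List[str]:
    """Draw `size` words from the document language model, with replacement."""
    words = list(lm)
    weights = [lm[w] for w in words]
    return rng.choices(words, weights=weights, k=size)


def log_likelihood(word_set: List[str], lm: Dict[str, float]) -> float:
    """Log-probability of generating the word set under the unigram model."""
    return sum(math.log(lm[w]) for w in word_set)


def make_rop_pair(tokens: List[str], size: int,
                  rng: random.Random) -> Tuple[List[str], List[str], int]:
    """Return two sampled word sets and a label: 1 if the first is at least as likely."""
    lm = document_language_model(tokens)
    set_a = sample_word_set(lm, size, rng)
    set_b = sample_word_set(lm, size, rng)
    label = 1 if log_likelihood(set_a, lm) >= log_likelihood(set_b, lm) else 0
    return set_a, set_b, label


# Hypothetical toy document.
doc = "retrieval models rank documents and retrieval models score queries".split()
set_a, set_b, label = make_rop_pair(doc, size=3, rng=random.Random(0))
```

In pre-training, the resulting label would serve as the pairwise preference target that the Transformer learns to predict for the two word sets, alongside the MLM objective.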