Inspired by developments in quantum computing, building quantum-inspired classical hardware to solve computationally hard problems has been receiving increasing attention. By introducing systematic sparsification techniques, we propose and demonstrate a massively parallel architecture, termed the sparse Ising Machine (sIM). Exploiting the sparsity of the resultant problem graphs, the sIM achieves ideal parallelism: the key figure of merit, flips per second, scales linearly with the total number of probabilistic bits (p-bits) in the system. This makes the sIM up to 6 orders of magnitude faster than a CPU implementing standard Gibbs sampling. When compared to optimized implementations in TPUs and GPUs, the sIM delivers a measured speedup of up to ~5-18x. In benchmark combinatorial optimization problems such as integer factorization, the sIM can reliably factor semi-primes up to 32 bits, far larger than previous attempts from D-Wave and other probabilistic solvers. Strikingly, the sIM beats competition-winning SAT solvers (by up to ~4-700x in runtime to reach 95% accuracy) in solving hard instances of the 3SAT problem. A surprising observation is that even when the asynchronous sampling is made inexact with simultaneous updates using faster clocks, the sIM can find the correct ground state with further speedup. The problem encoding and sparsification techniques we introduce can be readily applied to other Ising Machines (classical and quantum), and the asynchronous architecture we present can be used for scaling the demonstrated 5,000 to 10,000 p-bits to 1,000,000 or more through CMOS or emerging nanodevices.
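The link between sparsity and parallel updates can be illustrated with a small software sketch. It assumes the standard p-bit update rule, m_i = sgn(tanh(beta * I_i) - r) with r uniform in (-1, 1), and uses greedy graph coloring to find groups of p-bits that share no edge and can therefore be updated together without breaking Gibbs sampling. The ring graph, the uniform coupling, and all function names are illustrative assumptions, not the sIM hardware design.

```python
import math
import random


def ring_neighbors(n):
    """Adjacency lists of an n-node ring, a deliberately sparse toy graph."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def greedy_coloring(neighbors):
    """Give each node the smallest color not already used by a neighbor."""
    colors = {}
    for node in sorted(neighbors):
        used = {colors[n] for n in neighbors[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def pbit_sweep(state, coupling, bias, neighbors, colors, beta, rng):
    """Update all p-bits once, one color group at a time.

    Nodes of the same color are never neighbors, so updating them from a
    shared snapshot is equivalent to updating them one after another.
    """
    for color in sorted(set(colors.values())):
        snapshot = dict(state)
        for i in state:
            if colors[i] != color:
                continue
            local = bias[i] + coupling * sum(snapshot[j] for j in neighbors[i])
            # p-bit rule: +1 with probability (1 + tanh(beta * local)) / 2
            state[i] = 1 if math.tanh(beta * local) > rng.uniform(-1.0, 1.0) else -1


rng = random.Random(0)
neighbors = ring_neighbors(6)
colors = greedy_coloring(neighbors)
state = {i: rng.choice([-1, 1]) for i in neighbors}
bias = {i: 0.0 for i in neighbors}
for _ in range(100):
    pbit_sweep(state, 1.0, bias, neighbors, colors, beta=2.0, rng=rng)
```

On this ring the coloring needs only two colors, so half of the p-bits can be updated in each step regardless of the ring's size. The number of colors depends on the graph's connectivity rather than on the number of nodes, which is why a sparse graph allows the update rate to grow with system size.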

Related content

Sparsification

Automator · Optimizer · Performer · Machine Learning · Reducible

November 29, 2021

Naive Automated Machine Learning

Felix Mohr,Marcel Wever

An essential task of Automated Machine Learning (AutoML) is the problem of automatically finding the pipeline with the best generalization performance on a given dataset. This problem has been addressed with sophisticated black-box optimization techniques such as Bayesian Optimization, Grammar-Based Genetic Algorithms, and tree search algorithms. Most of the current approaches are motivated by the assumption that optimizing the components of a pipeline in isolation may yield sub-optimal results. We present Naive AutoML, an approach that does precisely this: it optimizes the different algorithms of a pre-defined pipeline scheme in isolation. The pipeline finally returned is obtained by simply taking the best algorithm for each slot. The isolated optimization leads to substantially reduced search spaces, and, surprisingly, this approach yields comparable and sometimes even better performance than current state-of-the-art optimizers.
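The slot-by-slot idea can be sketched in a few lines. The slot names, the candidate algorithms, the default pipeline, and the score table below are all made up for illustration; a real system would obtain scores by validating each candidate on the dataset, and the choice to hold the other slots at a default while one slot is optimized is an assumption of this sketch.

```python
# Hypothetical pipeline scheme: two slots, three candidates each
SLOTS = {
    "scaler": ["none", "standard", "minmax"],
    "learner": ["tree", "knn", "svm"],
}
DEFAULTS = {"scaler": "none", "learner": "tree"}

# Made-up validation scores, for illustration only
SCALER_GAIN = {"none": 0.00, "standard": 0.03, "minmax": 0.02}
LEARNER_SCORE = {"tree": 0.80, "knn": 0.78, "svm": 0.85}

evaluations = []  # records every pipeline that gets scored


def evaluate(pipeline):
    """Stand-in for training and validating a pipeline."""
    evaluations.append(dict(pipeline))
    return LEARNER_SCORE[pipeline["learner"]] + SCALER_GAIN[pipeline["scaler"]]


def naive_automl(slots, defaults, evaluate):
    """Pick the best candidate for each slot while the others stay at default."""
    chosen = {}
    for slot, candidates in slots.items():
        def score(candidate):
            trial = dict(defaults)
            trial[slot] = candidate
            return evaluate(trial)
        chosen[slot] = max(candidates, key=score)
    return chosen


best = naive_automl(SLOTS, DEFAULTS, evaluate)
```

Here `best` is the standard scaler with the SVM learner, found with 3 + 3 = 6 evaluations instead of the 3 x 3 = 9 a joint grid would need. The gap between the sum and the product of the slot sizes widens quickly as slots are added, which is the reduction in search space the abstract refers to.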

CC · Processing (programming language) · Tractable · Normalized · CASE

November 28, 2021

Computational Complexity of Normalizing Constants for the Product of Determinantal Point Processes

Naoto Ohsaka,Tatsuya Matsuoka

from arxiv, 59 pages. This is an extended version of our conference paper presented at ICML 2020

We consider the product of determinantal point processes (DPPs), a point process whose probability mass is proportional to the product of principal minors of multiple matrices, as a natural, promising generalization of DPPs. We study the computational complexity of computing its normalizing constant, which is among the most essential probabilistic inference tasks. Our complexity-theoretic results (almost) rule out the existence of efficient algorithms for this task unless the input matrices are forced to have favorable structures. In particular, we prove the following: (1) Computing $\sum_S\det({\bf A}_{S,S})^p$ exactly for every (fixed) positive even integer $p$ is UP-hard and Mod$_3$P-hard, which gives a negative answer to an open question posed by Kulesza and Taskar. (2) $\sum_S\det({\bf A}_{S,S})\det({\bf B}_{S,S})\det({\bf C}_{S,S})$ is NP-hard to approximate within a factor of $2^{O(|I|^{1-\epsilon})}$ or $2^{O(n^{1/\epsilon})}$ for any $\epsilon>0$, where $|I|$ is the input size and $n$ is the order of the input matrix. This result is stronger than the #P-hardness for the case of two matrices derived by Gillenwater. (3) There exists a $k^{O(k)}n^{O(1)}$-time algorithm for computing $\sum_S\det({\bf A}_{S,S})\det({\bf B}_{S,S})$, where $k$ is the maximum rank of $\bf A$ and $\bf B$ or the treewidth of the graph formed by nonzero entries of $\bf A$ and $\bf B$. Such parameterized algorithms are said to be fixed-parameter tractable. These results can be extended to the fixed-size case. Further, we present two applications of fixed-parameter tractable algorithms given a matrix $\bf A$ of treewidth $w$: (4) We can compute a $2^{\frac{n}{2p-1}}$-approximation to $\sum_S\det({\bf A}_{S,S})^p$ for any fractional number $p>1$ in $w^{O(wp)}n^{O(1)}$ time. (5) We can find a $2^{\sqrt n}$-approximation to unconstrained MAP inference in $w^{O(w\sqrt n)}n^{O(1)}$ time.
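As a concrete reference point, the normalizing constant can be evaluated by brute force on a tiny matrix. The sketch below sums the p-th powers of all principal minors, which takes 2^n determinant evaluations, and checks the well-known closed form for p = 1, where the sum equals det(I + A). The example matrix and the function names are illustrative assumptions; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination; the empty matrix has det 1."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def principal_minor(matrix, subset):
    """det(A_{S,S}) for an index subset S."""
    return det([[matrix[i][j] for j in subset] for i in subset])


def normalizing_constant(matrix, p=1):
    """Sum of det(A_{S,S})**p over all 2^n subsets S (brute force)."""
    n = len(matrix)
    return sum(
        principal_minor(matrix, s) ** p
        for k in range(n + 1)
        for s in combinations(range(n), k)
    )


A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
I_plus_A = [[A[i][j] + (1 if i == j else 0) for j in range(3)] for i in range(3)]
z1 = normalizing_constant(A, 1)  # 21, same as det(I + A)
z2 = normalizing_constant(A, 2)  # 63, no determinant shortcut is used here
```

For p = 1 a single determinant replaces the exponential sum. For p = 2 the sketch falls back to full enumeration, the kind of computation the hardness results above suggest cannot be avoided in general.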

Performer · Autoencoder · Reducible · Decoding · Processing (programming language)

November 28, 2021

Scalable and Efficient Neural Speech Coding: A Hybrid Design

Kai Zhen,Jongmo Sung,Mi Suk Lee,Seungkwon Beak,Minje Kim

from arxiv, IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2021 (Accepted for publication)

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact model components to NWC, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models have a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore any reconstruction loss that its preceding modules have created. CMRL can scale down to cover lower bitrates as well, for which it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC's quantization as a differentiable process, so that the system can be trained end to end. The decoder of the proposed system has either one NWC (0.12 million parameters) in the low to medium bitrate range (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although the decoding complexity is not yet as low as that of conventional speech codecs, it is significantly reduced from that of other neural speech coders, such as a WaveNet-based vocoder. For wide-band speech coding quality, our system yields comparable or superior performance to AMR-WB and Opus on TIMIT test utterances at low and medium bitrates. The proposed system can scale up to higher bitrates to achieve near transparent performance.
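The residual coding cascade can be illustrated without any neural network. In the sketch below a uniform scalar quantizer stands in for each NWC module (a deliberate simplification, not the paper's codec): every stage codes whatever error the earlier stages left behind, and the decoder sums the stage outputs. The signal values, step sizes, and function names are illustrative assumptions.

```python
def quantize(signal, step):
    """Uniform scalar quantizer, a stand-in for one coding module."""
    return [step * round(x / step) for x in signal]


def cascade_encode(signal, steps):
    """Each stage codes the residual left by all preceding stages."""
    residual = list(signal)
    stages = []
    for step in steps:
        coded = quantize(residual, step)
        stages.append(coded)
        residual = [r - c for r, c in zip(residual, coded)]
    return stages


def cascade_decode(stages, use=None):
    """Sum the first `use` stage outputs (all stages when `use` is None)."""
    return [sum(values) for values in zip(*stages[:use])]


signal = [0.83, -0.41, 0.27, 0.04]
stages = cascade_encode(signal, steps=[0.5, 0.1])

coarse = cascade_decode(stages, use=1)  # low bitrate: first stage only
fine = cascade_decode(stages)           # higher bitrate: both stages

coarse_error = max(abs(s - c) for s, c in zip(signal, coarse))
fine_error = max(abs(s - f) for s, f in zip(signal, fine))
```

Decoding with one stage leaves an error of at most half the first step size, and adding the second stage shrinks it to at most half the second step size. Dropping or adding stages is what lets a cascaded design trade bitrate against quality.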

Reducible · Statistic · Inference

November 27, 2021

Is Causal Reasoning Harder than Probabilistic Reasoning?

Milan Mossé,Duligur Ibeling,Thomas Icard

Many tasks in statistical and causal inference can be construed as problems of entailment in a suitable formal language. We ask whether those problems are more difficult, from a computational perspective, for causal probabilistic languages than for pure probabilistic (or "associational") languages. Despite several senses in which causal reasoning is indeed more complex -- both expressively and inferentially -- we show that causal entailment (or satisfiability) problems can be systematically and robustly reduced to purely probabilistic problems. Thus there is no jump in computational complexity. Along the way we answer several open problems concerning the complexity of well known probability logics, in particular demonstrating the $\exists\mathbb{R}$-completeness of a polynomial probability calculus, as well as a seemingly much simpler system, the logic of comparative conditional probability.

Optimizer · Reducible · Robustness · Maximum likelihood estimation · Saddle point

November 26, 2021

Robust and Efficient Optimization Using a Marquardt-Levenberg Algorithm with R Package marqLevAlg

Viviane Philipps,Boris P Hejblum,Mélanie Prague,Daniel Commenges,Cécile Proust-Lima

from arxiv, 20 pages, 4 figures

Implementations in R of classical general-purpose algorithms for local optimization generally have two major limitations which cause difficulties in applications to complex problems: convergence criteria that are too loose and computation times that are too long. By relying on a Marquardt-Levenberg algorithm (MLA), a Newton-like method particularly robust for solving local optimization problems, we provide, with the marqLevAlg package, an efficient and general-purpose local optimizer which (i) prevents convergence to saddle points by using a stringent convergence criterion based on the relative distance to minimum/maximum in addition to the stability of the parameters and of the objective function; and (ii) reduces the computation time in complex settings by allowing parallel calculations at each iteration. We demonstrate through a variety of cases from the literature that our implementation reliably and consistently reaches the optimum (even when other optimizers fail), and that it also substantially reduces computation time in complex settings, as illustrated by maximum likelihood estimation of several sophisticated statistical models.

Graph · Learned · Learning to learn · Entity · Positive semidefinite

October 19, 2021

Learning to Learn Graph Topologies

Xingyue Pu,Tianyue Cao,Xiaoyun Zhang,Xiaowen Dong,Siheng Chen

from arxiv, Accepted at NeurIPS 2021

Learning a graph topology to reveal the underlying relationship between data entities plays an important role in various machine learning and data analysis tasks. Under the assumption that structured data vary smoothly over a graph, the problem can be formulated as a regularised convex optimisation over a positive semidefinite cone and solved by iterative algorithms. Classic methods require an explicit convex function to reflect generic topological priors, e.g. the $\ell_1$ penalty for enforcing sparsity, which limits the flexibility and expressiveness in learning rich topological structures. We propose to learn a mapping from node data to the graph structure based on the idea of learning to optimise (L2O). Specifically, our model first unrolls an iterative primal-dual splitting algorithm into a neural network. The key structural proximal projection is replaced with a variational autoencoder that refines the estimated graph with enhanced topological properties. The model is trained in an end-to-end fashion with pairs of node data and graph samples. Experiments on both synthetic and real-world data demonstrate that our model is more efficient than classic iterative algorithms in learning a graph with specific topological properties.

MoDELS · INFORMS · Factorized · Recommender systems · Pruning

February 20, 2021

$FM^2$: Field-matrixed Factorization Machines for Recommender Systems

Yang Sun,Junwei Pan,Alex Zhang,Aaron Flores

from arxiv, In Proceedings of the Web Conference 2021 (WWW 2021), April 19-23, 2021, Ljubljana, Slovenia. 10 pages

Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved to be important, and several works consider fields in their models. In this paper, we propose a novel approach to model the field information effectively and efficiently. The proposed approach is a direct improvement of FwFM, and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new explanation of FM and FwFM within the FmFM framework, and compare it with the FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize the dimension while keeping the model performance. The FmFM model can also be optimized further by caching the intermediate vectors, and it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform the FFM, which is more complex. The FmFM model's performance is also comparable to DNN models, which require many more FLOPs at runtime.
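A minimal sketch of the pairwise term may help. It assumes the interaction between features i and j takes the form (v_i M) . v_j, where M is a matrix specific to the pair of fields the two features belong to. Under that assumption an identity matrix recovers the plain FM dot product, and a non-square M lets the two fields use embeddings of different sizes. All feature names, field names, and numbers below are invented for illustration.

```python
def row_times_matrix(v, matrix):
    """Row vector v (length a) times an a-by-b matrix gives a length-b vector."""
    return [
        sum(v[r] * matrix[r][c] for r in range(len(v)))
        for c in range(len(matrix[0]))
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fmfm_interactions(active, embeddings, field_of, field_matrices):
    """Sum of (v_i M_{field(i), field(j)}) . v_j over pairs of active features."""
    total = 0.0
    for index, i in enumerate(active):
        for j in active[index + 1:]:
            matrix = field_matrices[(field_of[i], field_of[j])]
            total += dot(row_times_matrix(embeddings[i], matrix), embeddings[j])
    return total


# Hypothetical example: a 2-dimensional user embedding and a 3-dimensional ad embedding
embeddings = {"user=u1": [1.0, 2.0], "ad=a7": [3.0, 4.0, 5.0]}
field_of = {"user=u1": "user", "ad=a7": "ad"}
field_matrices = {("user", "ad"): [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]}  # 2 x 3
score = fmfm_interactions(["user=u1", "ad=a7"], embeddings, field_of, field_matrices)

# With an identity matrix the term reduces to an ordinary FM dot product
fm_like = fmfm_interactions(
    ["x", "y"],
    {"x": [1.0, 2.0], "y": [3.0, 4.0]},
    {"x": "f1", "y": "f2"},
    {("f1", "f2"): [[1.0, 0.0], [0.0, 1.0]]},
)
```

Because the projected vector v_i M depends only on feature i and the field of j, it can be computed once and reused. That reuse is one plausible reading of the caching of intermediate vectors mentioned above.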

Performer · Estimation/Estimator · Softmax · Regularization term · Attention mechanism

September 30, 2020

Rethinking Attention with Performers

Krzysztof Choromanski,Valerii Likhosherstov,David Dohan,Xingyou Song,Andreea Gane,Tamas Sarlos,Peter Hawkins,Jared Davis,Afroz Mohiuddin,Lukasz Kaiser,David Belanger,Lucy Colwell,Adrian Weller

from arxiv, 36 pages. This is an updated version of a previous submission which can be found at arXiv:2006.03555. See //github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and //github.com/google-research/google-research/tree/master/performer for Performer code

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing the effectiveness of the novel attention-learning paradigm leveraged by Performers.
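The core of the approximation can be sketched in plain Python. It relies on the identity exp(q . k) = E_w[exp(w . q - |q|^2 / 2) * exp(w . k - |k|^2 / 2)] for w drawn from a standard Gaussian, which yields strictly positive random features. This sketch uses independent Gaussian projections, omits the orthogonalization that FAVOR+ adds, and skips the usual 1/sqrt(d) scaling of the scores. The toy queries, keys, values, and function names are assumptions.

```python
import math
import random


def positive_features(x, projections):
    """Positive random features whose inner products estimate exp(q . k)."""
    half_sq_norm = sum(v * v for v in x) / 2.0
    scale = 1.0 / math.sqrt(len(projections))
    return [
        scale * math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - half_sq_norm)
        for w in projections
    ]


def linear_attention(queries, keys, values, projections):
    """Attention from key summaries computed once; no query-key matrix is built."""
    q_feats = [positive_features(q, projections) for q in queries]
    k_feats = [positive_features(k, projections) for k in keys]
    m, d_v = len(projections), len(values[0])
    kv = [
        [sum(kf[r] * v[c] for kf, v in zip(k_feats, values)) for c in range(d_v)]
        for r in range(m)
    ]
    k_sum = [sum(kf[r] for kf in k_feats) for r in range(m)]
    outputs = []
    for qf in q_feats:
        norm = sum(a * b for a, b in zip(qf, k_sum))
        outputs.append(
            [sum(qf[r] * kv[r][c] for r in range(m)) / norm for c in range(d_v)]
        )
    return outputs


def softmax_attention(queries, keys, values):
    """Exact softmax attention, quadratic in sequence length, for comparison."""
    outputs = []
    for q in queries:
        weights = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) / total
             for c in range(len(values[0]))]
        )
    return outputs


rng = random.Random(0)
projections = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(4000)]
queries = [[0.3, -0.2], [0.1, 0.2]]
keys = [[0.1, 0.4], [-0.3, 0.1], [0.2, 0.2]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

approx = linear_attention(queries, keys, values, projections)
exact = softmax_attention(queries, keys, values)
```

The key summaries `kv` and `k_sum` are built in one pass over the keys and then shared by every query, so the cost grows linearly with sequence length for a fixed number of features. The two outputs agree only approximately, with an error that shrinks as more random features are used.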

Reducible · Machine Translation · MoDELS · NMT · Extensibility

May 29, 2018

Bi-Directional Neural Machine Translation with Synthetic Parallel Data

Xing Niu,Michael Denkowski,Marine Carpuat

from arxiv, Accepted at the 2nd Workshop on Neural Machine Translation and Generation (WNMT 2018)

Despite impressive progress in high-resource settings, Neural Machine Translation (NMT) still struggles in low-resource and out-of-domain scenarios, often failing to match the quality of phrase-based translation. We propose a novel technique that combines back-translation and multilingual NMT to improve performance in these difficult cases. Our technique trains a single model for both directions of a language pair, allowing us to back-translate source or target monolingual data without requiring an auxiliary model. We then continue training on the augmented parallel data, enabling a cycle of improvement for a single model that can incorporate any source, target, or parallel data to improve both translation directions. As a byproduct, these models can reduce training and deployment costs significantly compared to uni-directional models. Extensive experiments show that our technique outperforms standard back-translation in low-resource scenarios, improves quality on cross-domain tasks, and effectively reduces costs across the board.

Attention mechanism · Sparse · Machine Translation · NMT · Transformation

May 21, 2018

Sparse and Constrained Attention for Neural Machine Translation

Chaitanya Malaviya,Pedro Ferreira,André F. T. Martins

from arxiv, Proceedings of ACL 2018

In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided in three language pairs.
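One way to make the bounded-attention idea concrete is to treat constrained sparsemax as a projection: find the distribution p closest to the score vector z in Euclidean distance, subject to 0 <= p_i <= u_i and sum(p) = 1. That formulation is an assumption of this sketch. Under it the solution has the form p_i = min(u_i, max(0, z_i - tau)), and the threshold tau can be found by bisection. The bisection solver and the example numbers are illustrative and are not the authors' algorithm.

```python
def constrained_sparsemax(scores, upper, iterations=100):
    """Project scores onto the simplex with per-entry upper bounds."""
    if sum(upper) < 1.0:
        raise ValueError("upper bounds must sum to at least 1")

    def mass(tau):
        # Total probability assigned for a given threshold; nonincreasing in tau
        return sum(min(u, max(0.0, z - tau)) for z, u in zip(scores, upper))

    lo = min(scores) - max(upper)  # every entry sits at its bound: mass >= 1
    hi = max(scores)               # every entry is zero: mass = 0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2.0
    return [min(u, max(0.0, z - tau)) for z, u in zip(scores, upper)]


scores = [2.0, 1.0, 0.1]

# Loose bounds: behaves like ordinary sparsemax, all mass on the top word
unbounded = constrained_sparsemax(scores, [1.0, 1.0, 1.0])

# The first word may receive at most 0.6, so the rest spills to the second
bounded = constrained_sparsemax(scores, [0.6, 1.0, 1.0])
```

With loose bounds the result is approximately [1, 0, 0]. With the first bound at 0.6 it becomes approximately [0.6, 0.4, 0], which stays sparse while respecting the cap. In a translation model the caps would presumably come from each source word's fertility minus the attention it has already received.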