Employing a matrix mask, a vector subdivision scheme is a fast iterative averaging algorithm for computing refinable vector functions in wavelet methods for numerical PDEs and for producing smooth curves in CAGD. In sharp contrast to the well-studied scalar subdivision schemes, vector subdivision schemes are much less well understood: Lagrange and (generalized) Hermite subdivision schemes are essentially the only vector subdivision schemes studied in the literature. Because many wavelets used in numerical PDEs are derived from refinable vector functions whose matrix masks do not come from Hermite subdivision schemes, it is necessary to introduce and study vector subdivision schemes for general matrix masks in order to compute wavelets and refinable vector functions efficiently. For a general matrix mask, we show that there is only one meaningful way of defining a vector subdivision scheme. Motivated by vector cascade algorithms and recent studies on Hermite subdivision schemes, we define a vector subdivision scheme for an arbitrary matrix mask and then prove that the convergence of the newly defined vector subdivision scheme is equivalent to the convergence of its associated vector cascade algorithm. We also study convergence rates of vector subdivision schemes. The results of this paper not only bridge the gap and establish intrinsic links between vector subdivision schemes and vector cascade algorithms but also strengthen and generalize currently known results on Lagrange and (generalized) Hermite subdivision schemes. Several examples are provided to illustrate the results of this paper on various types of vector subdivision schemes and their convergence rates.
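To make the averaging rule concrete, below is a minimal numpy sketch of one subdivision step under the common convention $(S_a v)(n)=\sum_{k} a(n-2k)v(k)$; the mask values are purely illustrative, and normalization conventions (e.g., an extra factor of $2$) vary across the literature.

```python
import numpy as np

def subdivision_step(mask, v):
    """One step of a vector subdivision operator:
    (S_a v)(n) = sum_k a(n - 2k) v(k),
    where mask is a dict {offset: (r x r) matrix} and
    v is a dict {integer: length-r vector}."""
    out = {}
    for k, vk in v.items():
        for off, a in mask.items():
            n = off + 2 * k
            out[n] = out.get(n, np.zeros(len(vk))) + a @ vk
    return out

# Toy example: a 2x2 matrix mask supported on {-1, 0, 1} (hypothetical values).
mask = {-1: 0.25 * np.eye(2), 0: 0.5 * np.eye(2), 1: 0.25 * np.eye(2)}
v0 = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
v1 = subdivision_step(mask, v0)  # refined vector data on the half-integer grid
```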
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the \emph{$\Phi$-Hedge} algorithm -- a generic algorithm capable of learning a large class of equilibria for NFGs. We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the \emph{$\Phi$-Hedge} algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp $\widetilde{\mathcal{O}}(\sqrt{XAT})$ EFCE-regret under bandit feedback in an EFG with $X$ information sets, $A$ actions, and $T$ episodes. To the best of our knowledge, this is the first such rate, and it matches the information-theoretic lower bound.
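For intuition, plain Hedge on an NFG is OMD with the negative-entropy regularizer, whose log-partition function is the log-sum-exp; the dilated regularizers above extend this to the EFG tree structure. A minimal sketch of the basic NFG update (losses and step size illustrative):

```python
import numpy as np

def hedge_step(p, loss, eta):
    """One multiplicative-weights (Hedge) update: p_i ∝ p_i * exp(-eta * loss_i).
    Computed in log-space for numerical stability."""
    logits = np.log(p) - eta * loss
    logits -= logits.max()              # stabilize before exponentiating
    w = np.exp(logits)
    return w / w.sum()

# Toy usage over 4 pure strategies of a normal-form game (illustrative losses).
p = np.full(4, 0.25)
for t in range(100):
    loss = np.random.rand(4)            # stand-in for observed losses
    p = hedge_step(p, loss, eta=0.1)
```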
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning. However, it summarizes the true positive rates (TPRs) over all false positive rates (FPRs) in the ROC space, which may include FPRs of no practical relevance in some applications. The partial AUC, as a generalization of the AUC, summarizes only the TPRs over a specific range of FPRs and is thus a more suitable performance measure in many real-world situations. Although partial AUC optimization over a range of FPRs has been studied, existing algorithms are not scalable to big data and not applicable to deep learning. To address this challenge, we cast the problem into a non-smooth difference-of-convex (DC) program for any smooth predictive function (e.g., a deep neural network), which allows us to develop an efficient approximated gradient descent method based on the Moreau envelope smoothing technique, inspired by recent advances in non-smooth DC optimization. To increase efficiency on large data, we use an efficient stochastic block coordinate update in our algorithm. Our proposed algorithm can also be used to minimize the sum of ranked range loss, which also lacks efficient solvers. We establish a complexity of $\tilde O(1/\epsilon^6)$ for finding a nearly $\epsilon$-critical solution. Finally, we numerically demonstrate the effectiveness of our proposed algorithms for both partial AUC maximization and sum of ranked range loss minimization.
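As a point of reference, the quantity being optimized can be evaluated empirically as follows; this minimal numpy sketch only computes the (non-smooth) empirical partial AUC over an FPR window, not the paper's smoothed DC surrogate or its stochastic algorithm.

```python
import numpy as np

def empirical_partial_auc(pos_scores, neg_scores, alpha, beta):
    """Empirical partial AUC restricted to FPR in [alpha, beta]:
    the average pairwise indicator over positives and the negatives whose
    scores fall in the top [alpha*n, beta*n] ranks (those realize that FPR range).
    pos_scores, neg_scores: 1-D numpy arrays of model scores."""
    neg_sorted = np.sort(neg_scores)[::-1]        # descending
    n = len(neg_sorted)
    lo, hi = int(np.floor(alpha * n)), int(np.ceil(beta * n))
    hard_negs = neg_sorted[lo:hi]                 # negatives in the FPR window
    return (pos_scores[:, None] > hard_negs[None, :]).mean()
```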
Deep neural networks (DNNs) are currently predominantly trained using first-order methods. Some of these methods (e.g., Adam, AdaGrad, RMSprop, and their variants) incorporate a small amount of curvature information by using a diagonal matrix to precondition the stochastic gradient. Recently, effective second-order methods, such as KFAC, K-BFGS, Shampoo, and TNT, have been developed for training DNNs by preconditioning the stochastic gradient with layer-wise block-diagonal matrices. Here we propose a "mini-block Fisher (MBF)" preconditioned gradient method that lies between these two classes of methods. Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix, where for each layer in the DNN, whether it is convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size. Our novel approach exploits the parallelism of GPUs to efficiently perform computations on the large number of matrices in each layer. Consequently, MBF's per-iteration computational cost is only slightly higher than that of first-order methods. The performance of our proposed method is compared to that of several baseline methods, on both autoencoder and CNN problems, to validate its effectiveness both in terms of time efficiency and generalization power. Finally, it is proved that an idealized version of MBF converges linearly.
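To illustrate the idea of mini-block preconditioning, here is a toy numpy sketch for a fully connected layer in which each output row receives its own mini-block of the empirical Fisher; the grouping and block size are illustrative assumptions, not the paper's exact recipe (which also batches these solves on the GPU).

```python
import numpy as np

def mini_block_precondition(per_example_grads, grad, damping=1e-3):
    """Precondition a layer gradient with a mini-block empirical Fisher.
    per_example_grads: (N, out, in) per-example gradients of the layer weight;
    grad: (out, in) averaged gradient. One (in x in) mini-block per output row
    (an illustrative grouping)."""
    N, out_dim, in_dim = per_example_grads.shape
    precond = np.empty_like(grad)
    for r in range(out_dim):
        Gr = per_example_grads[:, r, :]                  # (N, in)
        F = Gr.T @ Gr / N + damping * np.eye(in_dim)     # mini-block Fisher
        precond[r] = np.linalg.solve(F, grad[r])         # F^{-1} g_r
    return precond
```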
The Unsplittable Flow on a Path (UFP) problem has garnered considerable attention as a challenging combinatorial optimization problem with notable practical implications. Steered by its pivotal applications in power engineering, the present work formulates a novel generalization of UFP in which demands and capacities in the input instance are monotone step functions over the set of edges. As an initial step towards tackling this generalization, we draw on and extend ideas from prior research to devise a quasi-polynomial time approximation scheme (QPTAS) under the premise that the demands and capacities lie in a quasi-polynomial range. Second, retaining the same assumption, an efficient logarithmic approximation is introduced for the single-source variant of the problem. Finally, we round out the contributions by designing a (kind of) black-box reduction that, under some mild conditions, allows us to translate LP-based approximation algorithms for the studied problem into their counterparts for the Alternating Current Optimal Power Flow (AC OPF) problem -- a fundamental workflow in the operation and control of power systems.
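For concreteness, the basic feasibility constraint of the generalized problem can be stated in a few lines; this sketch treats each task's demand as an arbitrary per-edge list standing in for a monotone step function, and does not attempt the QPTAS itself.

```python
def feasible(selected, capacities):
    """Feasibility check for the generalized UFP: the total demand placed on
    every edge must stay within its capacity.
    selected: list of (s, t, demand) tasks using edges s..t-1 (0-indexed),
    where demand[j] is the task's load on edge s+j (a stand-in for a
    monotone step function); capacities: list of per-edge capacities."""
    load = [0.0] * len(capacities)
    for s, t, demand in selected:
        for j, e in enumerate(range(s, t)):
            load[e] += demand[j]
    return all(l <= c for l, c in zip(load, capacities))
```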
Computer Aided Design (CAD) is widely used in the creation and optimization of various industrial systems and processes. Transforming a CAD geometry into a computational discretization that can be used to solve PDEs requires care and a deep knowledge of the selected computational method. In this article, we present a novel integrated collocation scheme based on smart clouds. It allows us to transform a CAD geometry into a complete point collocation model, aware of the base geometry, with minimal effort. For this process, only the geometry of the domain, in the form of a STEP file, and the boundary conditions are needed. We also introduce an adaptive refinement process for the resulting smart cloud using an \textit{a posteriori} error indicator. The scheme can be applied to any 2D or 3D geometry and any PDE, and is compatible with most point collocation approaches. We illustrate this with the meshfree Generalized Finite Difference (GFD) method applied to steady linear elasticity problems. We further show that each step of this process, from the initial discretization to the refinement strategy, is connected and is affected by the approach selected in the previous step, thus requiring an integrated scheme in which the whole solution process is considered at once.
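As a sketch of the GFD building block used here, the following numpy snippet derives Laplacian stencil weights at a cloud point from an (unweighted, for brevity) least-squares fit of a 2D quadratic Taylor expansion; practical GFD typically adds distance-based weighting, and at least five neighbors are needed.

```python
import numpy as np

def gfd_laplacian_weights(center, neighbors):
    """Generalized-finite-difference weights at `center` from a local point
    cloud `neighbors` (shape (n, 2), n >= 5), via a least-squares fit of the
    2D quadratic Taylor expansion. Returns weights w such that
    sum_i w[i] * (u_i - u_center) approximates the Laplacian of u at center."""
    dx = neighbors[:, 0] - center[0]
    dy = neighbors[:, 1] - center[1]
    # Columns multiply u_x, u_y, u_xx, u_yy, u_xy in the Taylor series.
    A = np.column_stack([dx, dy, 0.5 * dx**2, 0.5 * dy**2, dx * dy])
    M = np.linalg.pinv(A)     # rows map the differences (u_i - u_c) to derivatives
    return M[2] + M[3]        # u_xx + u_yy -> Laplacian stencil weights
```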
This paper considers a convex composite optimization problem with affine constraints, which includes problems of minimizing a smooth convex objective function over the intersection of (simple) convex sets, or regularized with multiple (simple) functions. Motivated by high-dimensional applications in which exact projection/proximal computations are not tractable, we propose a \textit{projection-free} augmented Lagrangian-based method in which primal updates are carried out using a \textit{weak proximal oracle} (WPO). In an earlier work, the WPO was shown to be more powerful than the standard \textit{linear minimization oracle} (LMO) that underlies conditional gradient methods (aka Frank-Wolfe methods). Moreover, the WPO is computationally tractable for many high-dimensional problems of interest, including those motivated by recovery of low-rank matrices and tensors, and optimization over polytopes that admit efficient LMOs. The main result of this paper shows that, under a certain curvature assumption (which is weaker than strong convexity), our WPO-based algorithm achieves an ergodic convergence rate of $O(1/T)$ for both the objective residual and the feasibility gap. This result, to the best of our knowledge, improves upon the $O(1/\sqrt{T})$ rate of existing LMO-based projection-free methods for this class of problems. Empirical experiments on a low-rank and sparse covariance matrix estimation task and the Max Cut semidefinite relaxation demonstrate the superiority of our method over state-of-the-art LMO-based Lagrangian methods.
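For context, the rank-one LMO that Frank-Wolfe-type methods rely on over a nuclear-norm ball is cheap to state; the sketch below shows only this baseline building block (the WPO used in the paper is a strictly stronger oracle).

```python
import numpy as np

def nuclear_ball_lmo(G, tau):
    """Linear minimization oracle over the nuclear-norm ball {X : ||X||_* <= tau}:
    argmin_X <G, X> = -tau * u1 v1^T, where (u1, v1) is the top singular pair
    of G. In practice only the leading pair is needed, so an iterative solver
    (e.g., power iteration) replaces the full SVD used here for brevity."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return -tau * np.outer(U[:, 0], Vt[0])
```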
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. In the special case of matrix sensing, this problem has been studied extensively through the notion of the Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, existing results can handle the case of a general objective function with noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework that solves the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis of the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
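As a rough illustration of the key quantity, the sketch below Monte-Carlo-estimates a lower bound on a rank-restricted RIP constant of a measurement operator; the normalization of the $A_i$ is an assumption stated in the comments, and a true RIP constant is a supremum that sampling can only bound from below.

```python
import numpy as np

def estimate_rip_constant(A_list, rank, n, trials=200, rng=None):
    """Monte-Carlo lower bound on the rank-restricted RIP constant of the
    operator A(X) = (<A_i, X>)_i: samples random rank-`rank` n x n matrices
    and records the worst deviation of ||A(X)||^2 / ||X||_F^2 from 1.
    Assumes the A_i are normalized so that E||A(X)||^2 = ||X||_F^2
    (e.g., i.i.d. N(0, 1/m) entries with m = len(A_list))."""
    rng = rng or np.random.default_rng(0)
    worst = 0.0
    for _ in range(trials):
        U = rng.standard_normal((n, rank))
        V = rng.standard_normal((n, rank))
        X = U @ V.T
        X /= np.linalg.norm(X)                    # unit Frobenius norm
        y = np.array([np.sum(A * X) for A in A_list])
        worst = max(worst, abs(y @ y - 1.0))
    return worst
```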
Multiple instance learning (MIL) is a powerful tool for weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, current MIL methods are usually based on the independent and identically distributed (i.i.d.) hypothesis and thus neglect the correlations among different instances. To address this problem, we propose a new framework, called correlated MIL, and provide a proof of convergence. Based on this framework, we devise a Transformer-based MIL (TransMIL), which explores both morphological and spatial information. The proposed TransMIL can effectively handle unbalanced/balanced and binary/multi-class classification with good visualization and interpretability. We conducted experiments on three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods. The test AUC for binary tumor classification reaches 93.09% on the CAMELYON16 dataset, and the AUCs for cancer subtype classification reach 96.03% and 98.82% on the TCGA-NSCLC and TCGA-RCC datasets, respectively.
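To show the core mechanism in isolation, here is a minimal numpy sketch of correlation-aware MIL pooling via one round of scaled dot-product self-attention over instance embeddings; TransMIL itself stacks full Transformer blocks with positional encodings, which this strips away.

```python
import numpy as np

def correlated_mil_pool(H, Wq, Wk):
    """Minimal correlated-MIL pooling: scaled dot-product self-attention over
    the N instance embeddings H (shape (N, d0), with Wq, Wk of shape (d0, d)),
    followed by mean pooling into a single bag-level representation."""
    d = Wq.shape[1]
    scores = (H @ Wq) @ (H @ Wk).T / np.sqrt(d)   # (N, N) instance correlations
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ H).mean(axis=0)                # bag-level embedding
```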
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks, including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. After further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP achieves strong performance in both zero-resource and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
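The ROP pair construction can be sketched in a few lines; this bare-bones version uses an unsmoothed unigram document language model, whereas PROP's actual sampling also incorporates corpus-level statistics.

```python
import numpy as np
from collections import Counter

def sample_rop_pair(doc_tokens, set_size=5, rng=None):
    """Sketch of the ROP pre-training pair: sample two word sets from a
    unigram document language model and label the one with higher likelihood
    as the more representative. Returns (more, less) representative sets."""
    rng = rng or np.random.default_rng(0)
    counts = Counter(doc_tokens)
    words = list(counts)
    probs = np.array([counts[w] for w in words], dtype=float)
    probs /= probs.sum()

    def draw():
        idx = rng.choice(len(words), size=set_size, p=probs, replace=True)
        return [words[i] for i in idx], float(np.sum(np.log(probs[idx])))

    (s1, ll1), (s2, ll2) = draw(), draw()
    return (s1, s2) if ll1 >= ll2 else (s2, s1)
```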
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from those of the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
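To illustrate the shared-projection idea, here is a toy numpy sketch of a layer-wise matching loss in which learned matrices map between the teacher's and student's hidden dimensions; the shapes and the symmetric two-direction matching are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def layer_projection_loss(teacher_h, student_h, U, V):
    """Layer-wise distillation loss with shared projections.
    teacher_h: (T, d_t) teacher hidden states; student_h: (T, d_s) student
    hidden states; U: (d_t, d_s) down-projection; V: (d_s, d_t) up-projection.
    Matches the two representations in both spaces with MSE terms."""
    down = np.mean((teacher_h @ U - student_h) ** 2)   # teacher -> student space
    up = np.mean((student_h @ V - teacher_h) ** 2)     # student -> teacher space
    return down + up
```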