国产成人精品三级在线,操下面视频在线观看免费欧美,亚洲日本国产成人一区二区,精品亚洲色欲AV区二区三区,欧美韩国日本国产一区在线观看

Forgetting is an important concept in knowledge representation and automated reasoning with widespread applications across a number of disciplines. A standard forgetting operator, characterized in [Lin and Reiter'94] in terms of model-theoretic semantics and primarily focusing on the propositional case, opened up a new research subarea. In this paper, a new operator called weak forgetting, dual to standard forgetting, is introduced and both together are shown to offer a new more uniform perspective on forgetting operators in general. Both the weak and standard forgetting operators are characterized in terms of entailment and inference, rather than a model theoretic semantics. This naturally leads to a useful algorithmic perspective based on quantifier elimination and the use of Ackermman's Lemma and its fixpoint generalization. The strong formal relationship between standard forgetting and strongest necessary conditions and weak forgetting and weakest sufficient conditions is also characterized quite naturally through the entailment-based, inferential perspective used. The framework used to characterize the dual forgetting operators is also generalized to the first-order case and includes useful algorithms for computing first-order forgetting operators in special cases. Practical examples are also included to show the importance of both weak and standard forgetting in modeling and representation.

相關內容

操作

關注 0

極大似然 · 似然 · 自編碼器 · Performer · 規范化的 ·

2023 年 6 月 28 日

Maximum Likelihood Training of Autoencoders

Peter Sorrenson,Felix Draxler,Armand Rousselot,Sander Hummerich,Lea Zimmermann,Ullrich K?the

Maximum likelihood training has favorable statistical properties and is popular for generative modeling, especially with normalizing flows. On the other hand, generative autoencoders promise to be more efficient than normalizing flows due to the manifold hypothesis. In this work, we introduce successful maximum likelihood training of unconstrained autoencoders for the first time, bringing the two paradigms together. To do so, we identify and overcome two challenges: Firstly, existing maximum likelihood estimators for free-form networks are unacceptably slow, relying on iteration schemes whose cost scales linearly with latent dimension. We introduce an improved estimator which eliminates iteration, resulting in constant cost (roughly double the runtime per batch of a vanilla autoencoder). Secondly, we demonstrate that naively applying maximum likelihood to autoencoders can lead to divergent solutions and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model. We call our model the maximum likelihood autoencoder (MLAE).

維數災難 · 操作 · Learning · 泛函 · MoDELS ·

2023 年 6 月 28 日

The curse of dimensionality in operator learning

Samuel Lanthaler,Andrew M. Stuart

Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a curse of dimensionality, defined precisely here in terms of representations of the infinite-dimensional input and output function spaces. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that the general curse of dimensionality can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of dimensionality related to the infinite-dimensional input and output function spaces.

Networking · Neural Networks · 可理解性 · 模型評估 · 全 ·

2023 年 6 月 27 日

Understanding the Effect of the Long Tail on Neural Network Compression

Harvey Dam,Vinu Joseph,Aditya Bhaskara,Ganesh Gopalakrishnan,Saurav Muralidharan,Michael Garland

Network compression is now a mature sub-field of neural network research: over the last decade, significant progress has been made towards reducing the size of models and speeding up inference, while maintaining the classification accuracy. However, many works have observed that focusing on just the overall accuracy can be misguided. E.g., it has been shown that mismatches between the full and compressed models can be biased towards under-represented classes. This raises the important research question, can we achieve network compression while maintaining "semantic equivalence" with the original network? In this work, we study this question in the context of the "long tail" phenomenon in computer vision datasets observed by Feldman, et al. They argue that memorization of certain inputs (appropriately defined) is essential to achieving good generalization. As compression limits the capacity of a network (and hence also its ability to memorize), we study the question: are mismatches between the full and compressed models correlated with the memorized training data? We present positive evidence in this direction for image classification tasks, by considering different base architectures and compression schemes.

估計/估計量 · 操作 · 核化 · 情景 · 再生核希爾伯特空間 ·

2023 年 6 月 27 日

Conditional expectation using compactification operators

Suddhasattwa Das

The separate tasks of denoising, conditional expectation and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that are stable to numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.

多樣性 · 閾值 · 估計/估計量 · 變換 · Learning ·

2023 年 6 月 26 日

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression

Allan Raventós,Mansheej Paul,Feng Chen,Surya Ganguli

from arxiv, The first two authors contributed equally

Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally $\textit{new}$ tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a $\textit{task diversity threshold}$ for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks as it behaves like a Bayesian estimator with the $\textit{non-diverse pretraining task distribution}$ as the prior. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over $\textit{all tasks}$, including those not seen during pretraining. These results highlight that, when pretrained on data with task diversity greater than the threshold, transformers $\textit{can}$ solve fundamentally new tasks in-context. Importantly, this capability hinges on it deviating from the Bayes optimal estimator with the pretraining distribution as the prior. This study underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at //github.com/mansheej/icl-task-diversity.

正則化項 · Subspace · 線性的 · 可約的 · Principle ·

2023 年 6 月 26 日

Subspace Recycling for Sequences of Shifted Systems with Applications in Image Recovery

Misha E. Kilmer,Eric de Sturler

For many applications involving a sequence of linear systems with slowly changing system matrices, subspace recycling, which exploits relationships among systems and reuses search space information, can achieve huge gains in iterations across the total number of linear system solves in the sequence. However, for general (i.e., non-identity) shifted systems with the shift value varying over a wide range, the properties of the linear systems vary widely as well, which makes recycling less effective. If such a sequence of systems is embedded in a nonlinear iteration, the problem is compounded, and special approaches are needed to use recycling effectively. In this paper, we develop new, more efficient, Krylov subspace recycling approaches for large-scale image reconstruction and restoration techniques that employ a nonlinear iteration to compute a suitable regularization matrix. For each new regularization matrix, we need to solve regularized linear systems, ${\bf A} + \gamma_\ell {\bf E}_k$, for a sequence of regularization parameters, $\gamma_\ell$, to find the optimally regularized solution that, in turn, will be used to update the regularization matrix. In this paper, we analyze system and solution characteristics to choose appropriate techniques to solve each system rapidly. Specifically, we use an inner-outer recycling approach with a larger, principal recycle space for each nonlinear step and smaller recycle spaces for each shift. We propose an efficient way to obtain good initial guesses from the principle recycle space and smaller shift-specific recycle spaces that lead to fast convergence. Our method is substantially reduces the total number of matrix-vector products that would arise in a naive approach. Our approach is more generally applicable to sequences of shifted systems where the matrices in the sum are positive semi-definite.

正則化項 · 操作 · 奇異的 · AIM · 奇異值 ·

2023 年 6 月 26 日

Characterizations of Adjoint Sobolev Embedding Operators with Applications in Inverse Problems

Simon Hubmer,Ekaterina Sherina,Ronny Ramlau

from arxiv, 35 pages, 2 figures

We consider the Sobolev embedding operator $E_s : H^s(\Omega) \to L_2(\Omega)$ and its role in the solution of inverse problems. In particular, we collect various properties and investigate different characterizations of its adjoint operator $E_s^*$, which is a common component in both iterative and variational regularization methods. These include variational representations and connections to boundary value problems, Fourier and wavelet representations, as well as connections to spatial filters. Moreover, we consider characterizations in terms of Fourier series, singular value decompositions and frame decompositions, as well as representations in finite dimensional settings. While many of these results are already known to researchers from different fields, a detailed and general overview or reference work containing rigorous mathematical proofs is still missing. Hence, in this paper we aim to fill this gap by collecting, introducing and generalizing a large number of characterizations of $E_s^*$ and discuss their use in regularization methods for solving inverse problems. The resulting compilation can serve both as a reference as well as a useful guide for its efficient numerical implementation in practice.

優化器 · 估計/估計量 · 情景 · GM · 方差 ·

2023 年 6 月 23 日

Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables

Jakob Runge

from arxiv, 41 pages, published as spotlight paper in 35th Conference on Neural Information Processing Systems (NeurIPS 2021); this version has an updated Supplementary Material with corrected proofs (also updated in NeurIPS proceedings)

The problem of selecting optimal backdoor adjustment sets to estimate causal effects in graphical models with hidden and conditioned variables is addressed. Previous work has defined optimality as achieving the smallest asymptotic estimation variance and derived an optimal set for the case without hidden variables. For the case with hidden variables there can be settings where no optimal set exists and currently only a sufficient graphical optimality criterion of limited applicability has been derived. In the present work optimality is characterized as maximizing a certain adjustment information which allows to derive a necessary and sufficient graphical criterion for the existence of an optimal adjustment set and a definition and algorithm to construct it. Further, the optimal set is valid if and only if a valid adjustment set exists and has higher (or equal) adjustment information than the Adjust-set proposed in Perkovi{\'c} et al. [Journal of Machine Learning Research, 18: 1--62, 2018] for any graph. The results translate to minimal asymptotic estimation variance for a class of estimators whose asymptotic variance follows a certain information-theoretic relation. Numerical experiments indicate that the asymptotic results also hold for relatively small sample sizes and that the optimal adjustment set or minimized variants thereof often yield better variance also beyond that estimator class. Surprisingly, among the randomly created setups more than 90\% fulfill the optimality conditions indicating that also in many real-world scenarios graphical optimality may hold. Code is available as part of the python package \url{//github.com/jakobrunge/tigramite}.

優化器 · Processing（編程語言） · MoDELS · 學成 · 最優化 ·

2021 年 12 月 19 日

Introduction to Online Convex Optimization

Elad Hazan

from arxiv, arXiv admin note: text overlap with arXiv:1909.03550

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.

Networking · 層 · MoDELS · tuning · Performance ·

2020 年 7 月 1 日

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Denny Zhou,Mao Ye,Chen Chen,Tianjian Meng,Mingxing Tan,Xiaodan Song,Quoc Le,Qiang Liu,Dale Schuurmans

from arxiv, ICML 2020

For deploying a deep learning model into production, it needs to be both accurate and compact to meet the latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretic guarantee. Our method is motivated by model compression. It consists of three stages. In the first stage, we sufficiently widen the deep thin network and train it until convergence. In the second stage, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by letting the thin network imitate the immediate outputs of the wide network from layer to layer. In the last stage, we further fine tune this well initialized deep thin network. The theoretical guarantee is established by using mean field analysis, which shows the advantage of layerwise imitation over traditional training deep thin networks from scratch by backpropagation. We also conduct large-scale empirical experiments to validate our approach. By training with our method, ResNet50 can outperform ResNet101, and BERT_BASE can be comparable with BERT_LARGE, where both the latter models are trained via the standard training procedures as in the literature.