
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
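
The gist of the stochastic-gating approximation can be illustrated in a few lines. The sketch below is our own construction, not the paper's exact estimator: it stores one empirical activation probability per hidden unit of a small ReLU MLP (the "one parameter per unit"), then estimates a squared-error FSD by Monte Carlo over Bernoulli gates shared between the two networks; the paper derives a more efficient treatment of the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def activation_probs(weights, X):
    """One parameter per hidden unit: the empirical probability it fires."""
    probs, H = [], X
    for W in weights[:-1]:
        Z = H @ W.T
        probs.append((Z > 0).mean(axis=0))
        H = np.maximum(Z, 0.0)
    return probs

def gated_forward(weights, gates, x):
    """Linearised network: each ReLU is replaced by a fixed 0/1 gate."""
    h = x
    for W, g in zip(weights[:-1], gates):
        h = g * (W @ h)
    return weights[-1] @ h

def fsd_mc(w1, w2, probs, X, n_draws=512):
    """Monte-Carlo FSD estimate E||f1 - f2||^2, gates shared across networks."""
    total = 0.0
    for _ in range(n_draws):
        gates = [rng.random(p.shape) < p for p in probs]
        x = X[rng.integers(len(X))]
        diff = gated_forward(w1, gates, x) - gated_forward(w2, gates, x)
        total += float(diff @ diff)
    return total / n_draws

dims = [8, 16, 16, 4]
w1 = [rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i]) for i in range(3)]
w2 = [W + 0.1 * rng.normal(size=W.shape) for W in w1]  # a nearby second network
X = rng.normal(size=(256, 8))
print(fsd_mc(w1, w2, activation_probs(w1, X), X))
```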

Related content

This article explains the usage of the R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While several packages can estimate causal effects adequately, none provides a collection of structural models following the conventional statistical approach developed by Hernán and Robins (2020). CausalModels addresses this gap in R's causal inference software by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be overlooked, as they may be more appropriate or efficient for particular problems. While implementations of these statistical models are scattered across a number of causal packages, CausalModels provides a simple and accessible framework with a consistent modeling pipeline across a variety of statistical methods for estimating causal effects in a single R package. It includes common methods such as standardization, IP weighting, G-estimation, outcome regression, instrumental variables, and propensity matching.
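
As an illustration of one of the listed methods, a generic inverse-probability (IP) weighting estimate of an average treatment effect might look as follows. This is a Python sketch with simulated data of our own, not the CausalModels interface (the package itself is in R).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ip_weighted_ate(X, a, y):
    """Hajek-style IP-weighted estimate of E[Y^1] - E[Y^0]."""
    ps = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]  # propensity scores
    w = a / ps + (1 - a) / (1 - ps)                             # IP weights
    mu1 = np.sum(w * a * y) / np.sum(w * a)
    mu0 = np.sum(w * (1 - a) * y) / np.sum(w * (1 - a))
    return mu1 - mu0

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
a = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # confounded
y = 2.0 * a + X[:, 0] + rng.normal(size=2000)                    # true effect = 2
print(ip_weighted_ate(X, a, y))
```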

Suppose we have individual-level data from an internal study and various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, which promises to improve statistical inference in the internal data; however, the additional use of external summary data may lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and a large estimation bias can emerge even if the bias of the external summary statistics is small. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is shown to be no larger than that using only internal data. We propose a data-fused efficient estimator that achieves this bound, resolving the efficiency paradox. In addition, we propose a debiased estimator that, by employing an adaptive lasso penalty, enjoys a selection consistency property: it achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics, thereby resolving the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset illustrate the proposed methods.
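
The efficiency paradox is easy to reproduce in a toy one-parameter problem. The simulation below is our own construction, not the paper's semiparametric estimator: it fuses an internal sample mean with an external summary estimate of the same mean, showing that treating the noisy summary as an exact constraint can be worse than ignoring it, while weighting by its variance improves on both.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, tau, reps = 1.0, 100, 0.2, 50000  # true mean, internal n, summary sd

mse = {"internal only": 0.0, "summary as exact": 0.0, "variance-weighted": 0.0}
for _ in range(reps):
    internal = rng.normal(mu, 1.0, n).mean()   # variance 1/n
    external = rng.normal(mu, tau)             # noisy summary, variance tau^2
    w = (1 / n) / (1 / n + tau**2)             # inverse-variance fusion weight
    fused = (1 - w) * internal + w * external
    mse["internal only"] += (internal - mu) ** 2
    mse["summary as exact"] += (external - mu) ** 2  # hard constraint use
    mse["variance-weighted"] += (fused - mu) ** 2

for name, total in mse.items():
    print(f"{name}: MSE = {total / reps:.5f}")
# With tau^2 > 1/n, the "exact constraint" use loses efficiency;
# the variance-weighted fusion beats both.
```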

Efficient and robust control using spiking neural networks (SNNs) is still an open problem. Whilst the behaviour of biological agents is produced through sparse and irregular spiking patterns, which provide both robust and efficient control, the activity patterns in most artificial spiking neural networks used for control are dense and regular -- resulting in potentially less efficient codes. Additionally, most existing control solutions require network training or optimization, even for fully identified systems, complicating their implementation in on-chip low-power solutions. The neuroscience theory of Spike Coding Networks (SCNs) offers a fully analytical solution for implementing dynamical systems in recurrent spiking neural networks -- while maintaining irregular, sparse, and robust spiking activity -- but it is not clear how to apply it directly to control problems. Here, we extend SCN theory by incorporating closed-form optimal estimation and control. The resulting networks function as a spiking equivalent of a linear-quadratic-Gaussian controller. We demonstrate robust spiking control of simulated spring-mass-damper and cart-pole systems in the face of several perturbations, including input and system noise, system disturbances, and neural silencing. Because our approach does not require learning or optimization, it offers opportunities for deploying fast and efficient task-specific on-chip spiking controllers with biologically realistic activity.
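
For reference, the classical building block these networks emulate can be computed in a few lines. The sketch below derives a discrete-time LQR gain by Riccati iteration and applies it to an Euler-discretised spring-mass-damper system (one of the demos above); a full LQG controller would pair this with a Kalman filter, and the spiking implementation itself is beyond the scope of this sketch.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=1000):
    """Discrete-time LQR gain via fixed-point Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Spring-mass-damper, Euler-discretised with step dt.
dt, m, k, c = 0.01, 1.0, 2.0, 0.5
A = np.array([[1.0, dt], [-k / m * dt, 1.0 - c / m * dt]])
B = np.array([[0.0], [dt / m]])
K = lqr_gain(A, B, Q=np.eye(2), R=np.array([[0.1]]))

x = np.array([1.0, 0.0])        # initial displacement, zero velocity
for _ in range(1000):
    x = A @ x - B @ (K @ x)     # closed-loop step u = -Kx
print(x)                        # driven towards the origin
```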

The identification of choice models is crucial for understanding consumer behavior, designing marketing policies, and developing new products. The identification of parametric choice-based demand models, such as the multinomial logit (MNL) model, is typically straightforward. However, nonparametric models, which are highly effective and flexible in explaining customer choices, may suffer from the curse of dimensionality and lose their identifiability. For example, the ranking-based model, a nonparametric model designed to mirror the random utility maximization (RUM) principle, is known to be nonidentifiable from the collection of choice probabilities alone. In this paper, we develop a new class of nonparametric models that does not suffer from nonidentifiability. Our model assumes bounded rationality of consumers, which results in symmetric demand cannibalization and, intriguingly, enables full identification: the model can be uniquely constructed from its observed choice probabilities over assortments. We further propose an efficient estimation framework that combines column generation and expectation-maximization algorithms. Using real-world data, we show that our choice model achieves competitive prediction accuracy compared with state-of-the-art benchmarks, despite incorporating the bounded-rationality assumption, which could in theory limit the model's representation power.
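
To make the ranking-based baseline concrete, the snippet below (a generic illustration, not this paper's model) computes choice probabilities for a RUM model given a distribution over preference rankings; the identification problem stated above is that distinct ranking distributions can induce the same collection of such probabilities.

```python
from itertools import permutations

def choice_prob(lam, assortment, item):
    """P(item | S): total weight of rankings whose top option in S is item."""
    return sum(w for sigma, w in lam.items()
               if next(j for j in sigma if j in assortment) == item)

# Three options; 0 is the outside (no-purchase) option, always offered.
lam = {sigma: 1 / 6 for sigma in permutations((0, 1, 2))}  # uniform over rankings
for S in [{0, 1}, {0, 2}, {0, 1, 2}]:
    print(S, {j: round(choice_prob(lam, S, j), 3) for j in sorted(S)})
```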

In this paper, we extend the Generalized Finite Difference Method (GFDM) to unknown compact submanifolds of Euclidean space, identified by randomly sampled data that (almost surely) lie in the interior of the manifolds. Theoretically, we formalize GFDM by exploiting a representation of smooth functions on the manifolds via Taylor expansions of polynomials defined on the tangent bundles. We illustrate the approach by approximating the Laplace-Beltrami operator, where a stable approximation is achieved by combining the Generalized Moving Least-Squares algorithm with a novel linear program that relaxes the diagonal-dominance constraint on the estimator, allowing a feasible solution even when higher-order polynomials are employed. We establish the theoretical convergence of GFDM in solving Poisson PDEs and numerically demonstrate its accuracy on simple smooth manifolds of low and moderately high co-dimension, as well as on unknown 2D surfaces. For the Dirichlet Poisson problem, where no data points on the boundary are available, we employ GFDM with a volume-constraint approach that imposes the boundary conditions on data points close to the boundary. When the location of the boundary is unknown, we introduce a novel technique to detect points close to the boundary without estimating the distance of the sampled points to the boundary. We demonstrate the effectiveness of imposing the boundary conditions on the points detected by this technique, compared with imposing them on all points within a certain distance of the boundary; the latter is sensitive to the choice of truncation distance and requires knowledge of the boundary location.
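
The core least-squares construction is easy to illustrate in flat space. The sketch below is our simplification (the paper works in tangent-plane coordinates with GMLS weights and a linear program): it recovers Laplacian stencil weights at a point from scattered neighbours by pseudo-inverting a second-order Taylor system.

```python
import numpy as np

def laplacian_weights(center, nbrs):
    """Least-squares GFDM-style weights w with w @ (f(nbrs) - f(center)) ~ Lap f."""
    d = nbrs - center  # offsets, shape (m, 2)
    # Columns match the Taylor unknowns (fx, fy, fxx, fxy, fyy).
    A = np.column_stack([d[:, 0], d[:, 1],
                         d[:, 0] ** 2 / 2, d[:, 0] * d[:, 1], d[:, 1] ** 2 / 2])
    Ainv = np.linalg.pinv(A)
    return Ainv[2] + Ainv[4]   # rows recovering fxx and fyy; their sum is the Laplacian

rng = np.random.default_rng(1)
c = np.zeros(2)
nb = rng.normal(scale=0.05, size=(12, 2))       # scattered neighbours
w = laplacian_weights(c, nb)
f = lambda p: p[..., 0] ** 2 + p[..., 1] ** 2   # Laplacian = 4 everywhere
print(w @ (f(nb) - f(c)))                       # ~ 4, exact for quadratics
```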

Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces, i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have lower variance than their original non-decomposed versions, while preserving zero bias. Through simulations, we empirically verify our theoretical results and probe the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
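
One simple instance of the decomposition, under assumptions stronger than those in the paper (both policies factorise over two independent binary sub-actions, the reward splits additively, and each component is observed), is sketched below; applying per-sub-action weights to per-sub-action rewards avoids the variance blow-up of the product weight while both estimators stay unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
mu1, mu2 = 0.5, 0.5   # behaviour marginals P(a_d = 1), sub-actions independent
pi1, pi2 = 0.9, 0.2   # target marginals

a1 = (rng.random(n) < mu1).astype(float)
a2 = (rng.random(n) < mu2).astype(float)
r1 = a1 + rng.normal(0, 0.5, n)       # reward component driven by sub-action 1
r2 = 2 * a2 + rng.normal(0, 0.5, n)   # reward component driven by sub-action 2

w1 = np.where(a1 == 1, pi1 / mu1, (1 - pi1) / (1 - mu1))
w2 = np.where(a2 == 1, pi2 / mu2, (1 - pi2) / (1 - mu2))

joint = w1 * w2 * (r1 + r2)       # standard IS on the joint action
decomposed = w1 * r1 + w2 * r2    # per-sub-action IS (decomposed sketch)

for name, est in [("joint IS", joint), ("decomposed IS", decomposed)]:
    print(f"{name}: mean {est.mean():.3f}, std {est.std():.3f}")  # truth = 1.3
```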

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
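
The claim that residual networks are discretisations can be made concrete in a few lines: an explicit Euler step of dh/dt = f(h) with unit step size is exactly a residual block h + f(h). The toy vector field below is our own choice.

```python
import numpy as np

def f(h, W):
    """A toy parameterised vector field dh/dt = tanh(W h)."""
    return np.tanh(W @ h)

def neural_ode_euler(h0, W, t0=0.0, t1=1.0, steps=100):
    """Explicit Euler solve; steps=1 with (t1 - t0) = 1 is one residual block."""
    h, dt = h0, (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, W)
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) / 2
print(neural_ode_euler(rng.normal(size=4), W))
```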

The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace: we first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork. We propose a subnetwork selection strategy that aims to maximally preserve the model's predictive uncertainty. Empirically, our approach compares favourably with ensembles and with less expressive posterior approximations over full networks.
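
A minimal sketch of the idea, using logistic regression as a stand-in for a deep network and marginal posterior variance as a simple proxy for the selection criterion (our simplification, not the paper's exact strategy): obtain the MAP, then keep a full-covariance Gaussian over only the K highest-variance weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = rng.normal(size=(400, 10))
y = (rng.random(400) < sigmoid(X @ rng.normal(size=10))).astype(float)

# Stage 1: MAP estimate of all weights (gradient ascent, Gaussian prior).
w, prior_prec = np.zeros(10), 1.0
for _ in range(1000):
    w += 0.1 * (X.T @ (y - sigmoid(X @ w)) - prior_prec * w) / len(X)

# Stage 2: full-covariance Gaussian over the K highest-variance weights only.
p = sigmoid(X @ w)
H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(10)  # GGN/Hessian
marginal_var = np.diag(np.linalg.inv(H))
sub = np.argsort(marginal_var)[-3:]               # K = 3 subnetwork
cov = np.linalg.inv(H[np.ix_(sub, sub)])          # full covariance on the subset

def predict(x, n_samples=512):
    """Predictive mean: sample subnetwork weights, keep the rest at the MAP."""
    ws = np.tile(w, (n_samples, 1))
    ws[:, sub] = rng.multivariate_normal(w[sub], cov, size=n_samples)
    return sigmoid(ws @ x).mean()

print(predict(X[0]))
```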

For deploying a deep learning model into production, it needs to be both accurate and compact to meet latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretical guarantee. Our method is motivated by model compression and consists of three stages. In the first stage, we sufficiently widen the deep thin network and train it until convergence. In the second stage, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by letting the thin network imitate the intermediate outputs of the wide network from layer to layer. In the last stage, we further fine-tune this well-initialized deep thin network. The theoretical guarantee is established via mean field analysis, which shows the advantage of layerwise imitation over traditionally training deep thin networks from scratch by backpropagation. We also conduct large-scale empirical experiments to validate our approach. Trained with our method, ResNet50 can outperform ResNet101, and BERT_BASE can be comparable with BERT_LARGE, where the latter models are both trained via standard procedures as in the literature.
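
A sketch of the second (warm-up) stage follows, with teacher and student widths reconciled by per-layer linear projections, which are our assumption for illustration; the paper's exact imitation setup may differ.

```python
import torch
import torch.nn as nn

d_in, d_wide, d_thin, depth = 32, 256, 64, 4

def mlp(widths):
    return nn.ModuleList(
        nn.Sequential(nn.Linear(a, b), nn.ReLU()) for a, b in zip(widths, widths[1:])
    )

wide = mlp([d_in] + [d_wide] * depth)   # stage-1 teacher, assumed already trained
thin = mlp([d_in] + [d_thin] * depth)   # student to be warmed up
proj = nn.ModuleList(nn.Linear(d_wide, d_thin) for _ in range(depth))  # assumption

opt = torch.optim.Adam([*thin.parameters(), *proj.parameters()], lr=1e-3)
for step in range(200):
    x = torch.randn(128, d_in)          # stand-in batch
    h_w, h_t, loss = x, x, torch.tensor(0.0)
    for layer_w, layer_t, pr in zip(wide, thin, proj):
        h_w = layer_w(h_w).detach()                   # frozen teacher activations
        h_t = layer_t(h_t)
        loss = loss + ((h_t - pr(h_w)) ** 2).mean()   # imitate layer by layer
    opt.zero_grad(); loss.backward(); opt.step()
```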

Spectral clustering (SC) is a popular clustering technique for finding strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require computing the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.
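
A common form of such a relaxed objective is sketched below (see the paper for the exact formulation): a soft-cut term plus an orthogonality regulariser that discourages degenerate assignments. For brevity we optimise free assignment logits for a single graph, whereas the paper trains a GNN so the clustering function transfers to out-of-sample graphs.

```python
import torch

def mincut_loss(S, A):
    """Relaxed normalised minCUT plus orthogonality regulariser.
    S: (n, k) soft cluster assignments; A: (n, n) adjacency matrix."""
    d = A.sum(dim=1)
    cut = torch.trace(S.T @ A @ S) / torch.trace(S.T @ (d[:, None] * S))
    StS = S.T @ S
    k = S.shape[1]
    ortho = torch.norm(StS / torch.norm(StS) - torch.eye(k) / k ** 0.5)
    return -cut + ortho   # maximise within-cluster connectivity, keep clusters balanced

n, k = 12, 3
A = (torch.rand(n, n) > 0.7).float()
A = ((A + A.T) > 0).float().fill_diagonal_(0)   # random symmetric graph
logits = torch.randn(n, k, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)
for _ in range(100):
    loss = mincut_loss(torch.softmax(logits, dim=1), A)
    opt.zero_grad(); loss.backward(); opt.step()
print(torch.softmax(logits, dim=1).argmax(dim=1))   # hard cluster labels
```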
