In this article, we present an extension of the splitting algorithm proposed in [22] to networks of conservation laws with piecewise linear discontinuous flux functions in the unknown. We start with the discussion of a suitable Riemann solver at the junction and then describe a strategy for applying the splitting algorithm on the network. In particular, we focus on two types of junctions, i.e., junctions where the number of outgoing roads does not exceed the number of incoming roads (dispersing type) and junctions with two incoming and one outgoing road (merging type). Finally, numerical examples demonstrate the accuracy of the splitting algorithm by comparisons to the exact solution and other approaches used in the literature.

Related content

Estimation/Estimator · Performer · NLP · Identifiable · LogME

October 20, 2022

Evidence > Intuition: Transferability Estimation for Encoder Selection

Elisa Bassignana, Max Müller-Eberstein, Mike Zhang, Barbara Plank

From arXiv; accepted at EMNLP 2022 (main conference)

With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori - as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable. However, encoder transferability estimation has received little to no attention in NLP. In this paper, we propose to generate quantitative evidence to predict which LM, out of a pool of models, will perform best on a target task without having to fine-tune all candidates. We provide a comprehensive study on LM ranking for 10 NLP tasks spanning the two fundamental problem types of classification and structured prediction. We adopt the state-of-the-art Logarithm of Maximum Evidence (LogME) measure from Computer Vision (CV) and find that it positively correlates with final LM performance in 94% of the setups. In the first study of its kind, we further compare transferability measures with the de facto standard of human practitioner ranking, finding that evidence from quantitative metrics is more robust than pure intuition and can help identify unexpected LM candidates.

Networking · GM · MoDELS · Inference · motivation

October 20, 2022

Graphical model inference with external network data

Jack Jewson, Li Li, Laura Battaglia, Stephen Hansen, David Rossell, Piotr Zwiernik

A frequent challenge when using graphical models in applications is that the sample size is limited relative to the number of parameters to be learned. Our motivation stems from applications where one has external data, in the form of networks between variables, that provides valuable information to help improve inference. Specifically, we depict the relation between COVID-19 cases and social and geographical network data, and between stock market returns and economic and policy networks extracted from text data. We propose a graphical LASSO framework where likelihood penalties are guided by the external network data. We also propose a spike-and-slab prior framework that depicts how partial correlations depend on the networks, which helps interpret the fitted graphical model and its relationship to the network. We develop computational schemes and software implementations in R and probabilistic programming languages. Our applications show how incorporating network data can significantly improve interpretation, statistical accuracy, and out-of-sample prediction, in some instances using significantly sparser graphical models than would have otherwise been estimated.
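As a rough illustration of what a network-guided penalty could look like, the standard graphical LASSO objective can be written with edge-specific penalties $\lambda_{ij}$. The log-linear link between $\lambda_{ij}$ and an external network with adjacency entries $a_{ij}$ shown here is an assumption made for illustration, not necessarily the form used by the authors; $S$ denotes the sample covariance matrix and $\Theta = (\theta_{ij})$ the precision matrix.

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0}
  \; \log\det\Theta - \operatorname{tr}(S\Theta)
  - \sum_{i<j} \lambda_{ij}\,\lvert\theta_{ij}\rvert,
\qquad
\lambda_{ij} = \exp(\beta_0 + \beta_1 a_{ij}) \quad \text{(hypothetical link)}
```

Under this sketch, a negative $\beta_1$ would shrink partial correlations less for pairs of variables that are connected in the external network.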

Optimizer · Diversity · Robot · Range · Functional

October 20, 2022

Discovering Many Diverse Solutions with Bayesian Optimization

Natalie Maus, Kaiwen Wu, David Eriksson, Jacob Gardner

Bayesian optimization (BO) is a popular approach for sample-efficient optimization of black-box objective functions. While BO has been successfully applied to a wide range of scientific applications, traditional approaches to single-objective BO only seek to find a single best solution. This can be a significant limitation in situations where solutions may later turn out to be intractable. For example, a designed molecule may turn out to violate constraints that can only be reasonably evaluated after the optimization process has concluded. To address this issue, we propose Rank-Ordered Bayesian Optimization with Trust-regions (ROBOT) which aims to find a portfolio of high-performing solutions that are diverse according to a user-specified diversity metric. We evaluate ROBOT on several real-world applications and show that it can discover large sets of high-performing diverse solutions while requiring few additional function evaluations compared to finding a single best solution.

MoDELS · Scaling · Optimizer · Learning · Reinforcement learning

October 19, 2022

Scaling Laws for Reward Model Overoptimization

Leo Gao, John Schulman, Jacob Hilton

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart's law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed "gold-standard" reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-$n$ sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.
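Best-of-$n$ sampling, as mentioned in the abstract above, can be sketched in a few lines: draw $n$ candidates, score them with the proxy reward model, and keep the highest-scoring one. The toy simulation below is an assumption-laden illustration, not the authors' setup: the gold score is a standard normal draw and the proxy is the gold score plus independent Gaussian noise. The names `best_of_n` and `simulate` are hypothetical.

```python
import random


def best_of_n(candidates, proxy_reward):
    """Return the candidate with the highest proxy reward."""
    return max(candidates, key=proxy_reward)


def simulate(n, trials=2000, noise=1.0, seed=0):
    """Average gold and proxy scores of the best-of-n pick in a toy model.

    Toy assumption: gold ~ N(0, 1) and proxy = gold + N(0, noise^2).
    """
    rng = random.Random(seed)
    total_gold = 0.0
    total_proxy = 0.0
    for _ in range(trials):
        samples = []
        for _ in range(n):
            gold = rng.gauss(0.0, 1.0)
            samples.append((gold, gold + rng.gauss(0.0, noise)))
        # Select on the proxy score only; the gold score is never seen.
        chosen = best_of_n(samples, proxy_reward=lambda s: s[1])
        total_gold += chosen[0]
        total_proxy += chosen[1]
    return total_gold / trials, total_proxy / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, simulate(n))
```

In this toy model the proxy score of the selected sample is inflated relative to its gold score, which is the gap that selection on an imperfect proxy creates. It does not reproduce the functional forms or the eventual decline in gold score studied in the paper.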

Functional · Approximation · Activation function · Networking · Performer

October 19, 2022

A new activation for neural networks and its approximation

Jianfei Li, Han Feng, Ding-Xuan Zhou

Deep learning with deep neural networks (DNNs) has recently attracted tremendous attention from various fields of science and technology. Activation functions for a DNN define the output of a neuron given an input or set of inputs. They are essential and inevitable in learning non-linear transformations and performing diverse computations among successive neuron layers. Thus, the design of activation functions is still an important topic in deep learning research. Meanwhile, theoretical studies on the approximation ability of DNNs with activation functions have been conducted within the last few years. In this paper, we propose a new activation function, named "DLU", and investigate its approximation ability for functions with various smoothness and structures. Our theoretical results show that DLU networks can achieve approximation performance competitive with rational and ReLU networks, and have some advantages. Numerical experiments are conducted comparing DLU with the existing activations ReLU, Leaky ReLU, and ELU, which illustrate the good practical performance of DLU.

Performer · MoDELS · Identifiable · Interpretability · Functional

October 18, 2022

Anticipating Performativity by Predicting from Predictions

Celestine Mendler-Dünner, Frances Ding, Yixin Wang

From arXiv; to appear at NeurIPS 2022; revision corresponds to the camera-ready version

Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they aim to predict. Understanding the causal effect of these predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes. This can make the causal effects of predictions on outcomes impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability, and despite the hardness of this problem in full generality, we highlight three natural scenarios where the causal relationship between covariates, predictions and outcomes can be identified from observational data: randomization in predictions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. Empirically we show that given our identifiability conditions hold, standard variants of supervised learning that predict from predictions by treating the prediction as an input feature can indeed find transferable functional relationships that allow for conclusions about newly deployed predictive models. These positive results fundamentally rely on model predictions being recorded during data collection, bringing forward the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.

Integration · Regularization term · Reducible · Normalized · Discretization

October 18, 2022

Analyses of the contour integral method for time fractional subdiffusion-normal transport equation

Fugui Ma, Lijing Zhao, Weihua Deng, Yejuan Wang

From arXiv; 34 pages, 4 figures, 12 tables

In this work, we theoretically and numerically discuss the time fractional subdiffusion-normal transport equation, which depicts a crossover from sub-diffusion (as $t\rightarrow 0$) to normal diffusion (as $t\rightarrow \infty$). Firstly, the well-posedness and regularities of the model are studied by using the bivariate Mittag-Leffler function. Theoretical results show that after introducing the first-order derivative operator, the regularity of the solution can be improved substantially. Then, a high-precision numerical scheme is developed that applies regardless of whether the initial value is smooth or non-smooth. More specifically, we use the contour integral method (CIM) with a parameterized hyperbolic contour to approximate the temporal local and non-local operators, and employ the standard Galerkin finite element method for spatial discretization. Rigorous error estimates show that the proposed numerical scheme has spectral accuracy in time and optimal convergence order in space. Besides, we further improve the algorithm and reduce the computational cost by using barycentric Lagrange interpolation. Finally, the obtained theoretical results as well as the acceleration algorithm are verified by several 1-D and 2-D numerical experiments, which also show that the numerical scheme developed in this paper is effective and robust.

BART · Mixing time · Mixing · MCMC · Performer

October 17, 2022

A Mixing Time Lower Bound for a Simplified Version of BART

Omer Ronen, Theo Saarinen, Yan Shuo Tan, James Duncan, Bin Yu

Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression algorithm. The posterior is a distribution over sums of decision trees, and predictions are made by averaging approximate samples from the posterior. The combination of strong predictive performance and the ability to provide uncertainty measures has led BART to be commonly used in the social sciences, biostatistics, and causal inference. BART uses Markov Chain Monte Carlo (MCMC) to obtain approximate posterior samples over a parameterized space of sums of trees, but it has often been observed that the chains are slow to mix. In this paper, we provide the first lower bound on the mixing time for a simplified version of BART in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution. Our lower bound for the mixing time grows exponentially with the number of data points. Inspired by this new connection between the mixing time and the number of data points, we perform rigorous simulations on BART. We show qualitatively that BART's mixing time increases with the number of data points. The slow mixing time of the simplified BART suggests a large variation between different runs of the simplified BART algorithm and a similar large variation is known for BART in the literature. This large variation could result in a lack of stability in the models, predictions, and posterior intervals obtained from the BART MCMC samples. Our lower bound and simulations suggest increasing the number of chains with the number of data points.

Networking · Residual network · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

From arXiv; published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
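The perceived link between ResNets and neural ODEs mentioned above rests on reading each residual block as one forward Euler step. The sketch below illustrates only that idealized reading, under assumptions the abstract itself questions: every layer applies the same function `f` (so the weights trivially form a smooth function of depth) and the residual branch is scaled by `1 / depth`. The names `residual_forward` and `decay` are hypothetical.

```python
import math


def residual_forward(x, depth, f):
    """Apply `depth` residual blocks of the form x <- x + f(x) / depth."""
    for _ in range(depth):
        x = x + f(x) / depth
    return x


def decay(x):
    """Hypothetical layer function; the matching ODE is dx/dt = -x."""
    return -x


if __name__ == "__main__":
    exact = math.exp(-1.0)  # solution of dx/dt = -x at t = 1 with x(0) = 1
    for depth in (10, 100, 1000):
        print(depth, residual_forward(1.0, depth, decay), exact)
```

With these assumptions the network output approaches the ODE solution as depth grows. The paper reports that weights trained by stochastic gradient descent can follow different scaling regimes, in which case this limit need not hold.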

Directed · Graph · Networking · Vectorization · Neural Networks

December 10, 2020

Directional Graph Networks

Dominique Beaini, Saro Passaro, Vincent Létourneau, William L. Hamilton, Gabriele Corso, Pietro Liò

From arXiv; 9 pages, 11-page appendix, 6 figures. Subtitle: Anisotropic aggregation in graph neural networks via directional vector fields

In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. We show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph. First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. Then we propose the use of the Laplacian eigenvectors as such a vector field, and we show that the method generalizes CNNs on an n-dimensional grid, and is provably more discriminative than standard GNNs regarding the Weisfeiler-Lehman 1-WL test. Finally, we bring the power of CNN data augmentation to graphs by providing a means of doing reflection, rotation and distortion on the underlying directional field. We evaluate our method on different standard benchmarks and see a relative error reduction of 8% on the CIFAR10 graph dataset and 11% to 32% on the molecular ZINC dataset. An important outcome of this work is that it enables translating any physical or biological problem with intrinsic directional axes into a graph network formalism with an embedded directional field.
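The idea of a directional derivative along a field built from a Laplacian eigenvector can be sketched on a path graph, where the first non-trivial eigenvector has a closed form. This is a minimal sketch under stated assumptions, not the authors' implementation: the field on edge (i, j) is taken as `phi[j] - phi[i]`, it is L1-normalized per node, and the function names are hypothetical.

```python
import math


def path_graph(n):
    """Adjacency list of the path graph 0 - 1 - ... - (n - 1)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def fiedler_vector_path(n):
    """First non-trivial Laplacian eigenvector of a path graph (closed form)."""
    return [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]


def directional_derivative(adj, phi, x):
    """Aggregate the signal x along the field F_ij = phi[j] - phi[i].

    Each node's outgoing field weights are L1-normalized (an assumption),
    then used to weight the differences x[j] - x[i] over its neighbours.
    """
    out = []
    for i, nbrs in adj.items():
        field = [phi[j] - phi[i] for j in nbrs]
        norm = sum(abs(w) for w in field) or 1.0
        out.append(sum(w / norm * (x[j] - x[i]) for w, j in zip(field, nbrs)))
    return out


if __name__ == "__main__":
    n = 6
    adj = path_graph(n)
    phi = fiedler_vector_path(n)
    x = [float(i) for i in range(n)]
    print(directional_derivative(adj, phi, x))
```

For the linear signal `x[i] = i`, every node yields a derivative of about -1: the eigenvector decreases with the node index, so the field points toward node 0 while the signal grows toward node `n - 1`. A plain mean over neighbours would instead return 0 at interior nodes and could not tell the two directions apart.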