
Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of race-based systemic inequalities is an important step towards a more equitable research system. However, few large-scale analyses have been performed on this topic, mostly because of the lack of robust race-disambiguation algorithms. Author-identifying information does not generally include the author's race. Therefore, an algorithm must be employed that uses known information about authors, i.e., their names, to infer their perceived race. Nevertheless, like any other algorithm, the process of racial inference can introduce biases if it is not carefully considered. When the research is focused on understanding race-based inequalities, such biases undermine the objectives of the investigation and may perpetuate inequities. The goal of this article is to assess the biases introduced by the different approaches used for name-based racial inference. We use information from the US census and mortgage applications to infer the race of US author names in the Web of Science. We estimate the effects of using given versus family names, thresholds versus continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race and ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article fills an important research gap that will allow more systematic and unbiased studies of racial disparity in science.
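
To make the threshold-versus-distribution contrast concrete, the following minimal sketch applies both approaches to a toy census-style surname table. The surnames, probabilities, and the 0.9 threshold are illustrative assumptions, not the article's data or pipeline.

```python
# Sketch: threshold vs. continuous name-based race inference.
# The surname table below is illustrative, NOT real census data.
surname_probs = {
    # surname: {race: P(race | surname)}
    "GARCIA":     {"White": 0.05, "Black": 0.01, "Hispanic": 0.92, "Asian": 0.02},
    "WASHINGTON": {"White": 0.05, "Black": 0.90, "Hispanic": 0.03, "Asian": 0.02},
    "SMITH":      {"White": 0.73, "Black": 0.22, "Hispanic": 0.03, "Asian": 0.02},
}

def infer_threshold(surname, threshold=0.9):
    """Assign a single race only if its probability clears the threshold."""
    probs = surname_probs[surname]
    race, p = max(probs.items(), key=lambda kv: kv[1])
    return race if p >= threshold else None  # unclassified otherwise

def infer_continuous(surnames):
    """Aggregate fractional counts: each author contributes P(race | name)."""
    totals = {}
    for s in surnames:
        for race, p in surname_probs[s].items():
            totals[race] = totals.get(race, 0.0) + p
    return totals

authors = ["SMITH", "SMITH", "WASHINGTON", "GARCIA"]
print([infer_threshold(s) for s in authors])  # both SMITH authors -> None
print(infer_continuous(authors))              # fractional counts keep minority mass
```

Note how the threshold run leaves both SMITH authors unclassified and drops their Black probability mass entirely, while the continuous counts retain it; this is the mechanism behind the under-counting of Black authors described above.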

Related content

The journal 《計算機信息》 publishes high-quality papers that expand the scope of operations research and computing, seeking original research papers on theory, methodology, experiments, systems, and applications, as well as novel survey and tutorial papers and papers describing new and useful software tools.

June 22, 2021

Discovering the partial differential equations underlying a spatio-temporal dataset from very limited observations is of paramount interest in many scientific fields. However, it remains an open question when model discovery algorithms based on sparse regression can actually recover the underlying physical processes. We trace the poor performance of Lasso-based model discovery algorithms back to their potential variable-selection inconsistency: even if the true model is present in the library, it might not be selected. By revisiting the irrepresentability condition (IRC) of the Lasso, we gain some insight into when this might occur. We then show that the adaptive Lasso is more likely to satisfy the IRC than the Lasso and propose to integrate it within a deep learning model discovery framework with stability selection and error control. Experimental results show that we can recover several nonlinear and chaotic canonical PDEs with a single set of hyperparameters from a very limited number of samples at high noise levels.
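
As a concrete illustration of the two-stage adaptive Lasso reweighting, here is a minimal sketch on a toy Burgers-like term library. The synthetic data, library, and regularization strength are illustrative assumptions; the paper's framework additionally wraps this in deep learning, stability selection, and error control, none of which are reproduced here.

```python
# Sketch: adaptive Lasso term selection on a toy PDE library.
# u, u_x, u_xx stand in for sampled field values and derivatives
# (random here purely for illustration).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n = 200
u, u_x, u_xx = rng.normal(size=(3, n))
library = np.column_stack([u, u_x, u_xx, u * u_x, u * u_xx])  # candidate terms
names = ["u", "u_x", "u_xx", "u*u_x", "u*u_xx"]
u_t = 0.1 * u_xx - 1.0 * u * u_x + 0.05 * rng.normal(size=n)  # Burgers-like target

# Stage 1: OLS gives pilot coefficients for the adaptive weights.
beta_ols = LinearRegression(fit_intercept=False).fit(library, u_t).coef_
weights = 1.0 / (np.abs(beta_ols) ** 1.0 + 1e-12)  # gamma = 1

# Stage 2: adaptive Lasso = plain Lasso on columns rescaled by 1/weight.
scaled = library / weights
lasso = Lasso(alpha=1e-3, fit_intercept=False).fit(scaled, u_t)
beta = lasso.coef_ / weights

for name, b in zip(names, beta):
    if abs(b) > 1e-8:
        print(f"{name}: {b:+.3f}")  # ideally only u_xx and u*u_x survive
```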

Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
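
A toy conjugate-Gaussian sketch of the posterior-contraction argument (illustrative, not the paper's experiments): a feature that is identically zero in training keeps its weight at the prior, so posterior samples over that weight disagree once the feature activates at test time, whereas the MAP point estimate pins it to zero.

```python
# Sketch: why a Bayesian model average can hurt under covariate shift.
# Toy Bayesian linear regression with a feature that is identically zero
# in training (a "dead" input dimension).
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, tau2 = 100, 0.1, 1.0                          # noise and prior variances
X = np.column_stack([rng.normal(size=n), np.zeros(n)])   # dimension 2 is dead
y = 2.0 * X[:, 0] + np.sqrt(sigma2) * rng.normal(size=n)

# Conjugate Gaussian posterior: Sigma = (X^T X / sigma2 + I / tau2)^-1
Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mu = Sigma @ X.T @ y / sigma2                            # posterior mean == MAP

# The posterior over the dead weight never contracts (stays at the prior):
print("posterior var of dead weight:", Sigma[1, 1])      # ~ tau2 = 1.0

# Under covariate shift the dead feature "switches on" at test time:
x_shift = np.array([1.0, 3.0])
print("MAP weight for dead feature:", mu[1])             # exactly 0 -> robust
print("BMA predictive variance at x_shift:",
      x_shift @ Sigma @ x_shift + sigma2)                # inflated by the prior
```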

Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notation remains mostly unexploited by today's systems. In this paper, we present the first in-depth study of the distribution of mathematical notation in two large scientific corpora: the open-access arXiv (2.5B mathematical objects) and zbMATH, the mathematical reviewing service for pure and applied mathematics (61M mathematical objects). Our study lays a foundation for future research on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use cases: for example, assisting semantic extraction systems, improving scientific search engines, and facilitating specialized math recommendation systems. The contributions of our research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with `Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as a first contribution to math recommendation systems. To expedite future research, we have made our source code and data available.
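
As a hint of how textual queries can be linked to formulae, the sketch below ranks formulae by co-occurrence with query terms over a toy corpus. The corpus, tokenization, and scoring are illustrative assumptions, not the actual arXiv/zbMATH pipeline.

```python
# Sketch: linking textual queries to mathematical notation by co-occurrence.
from collections import Counter, defaultdict

# (formula, surrounding-text) pairs as they might be extracted from papers
corpus = [
    (r"P_n^{(\alpha,\beta)}(x)", "jacobi polynomial orthogonal weight"),
    (r"P_n^{(\alpha,\beta)}(x)", "jacobi polynomial recurrence"),
    (r"\zeta(s)", "riemann zeta function critical strip"),
    (r"\zeta(s)", "zeta function euler product"),
]

cooc = defaultdict(Counter)          # term -> Counter of formulae
for formula, text in corpus:
    for term in text.split():
        cooc[term][formula] += 1

def retrieve(query, k=3):
    """Rank formulae by summed co-occurrence with the query terms."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(cooc.get(term, Counter()))
    return scores.most_common(k)

print(retrieve("Jacobi polynomial"))  # -> P_n^{(\alpha,\beta)}(x) ranked first
```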

Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require computing some discrepancy between the data and the generative model. This is, for example, the case for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a large number of realisations from the model for different parameter values, which can be a significant challenge when simulation is an expensive operation. In this paper, we propose to enhance this approach by enforcing "sample diversity" in simulations of our models, implemented through the use of quasi-Monte Carlo (QMC) point sets. Our key results are sample complexity bounds which demonstrate that, under smoothness conditions on the generator, QMC can significantly reduce the number of samples required to obtain a given level of accuracy when using three of the most common discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the Sinkhorn divergence. This is complemented by a simulation study which highlights that improved accuracy is sometimes also possible in settings not covered by the theory.
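
The following sketch illustrates the basic recipe on a toy model: draw quasi-Monte Carlo points instead of uniform ones, push them through the generator's smooth inverse-CDF map, and compare samples to data with a discrepancy. The one-dimensional model, kernel bandwidth, and sample sizes are illustrative assumptions; SciPy's scrambled Sobol' sequences stand in for the QMC point sets.

```python
# Sketch: quasi-Monte Carlo samples from a generative model via its
# pushforward map, compared on a simple MMD estimate.
import numpy as np
from scipy.stats import qmc, norm

def generator(u, theta):
    """Toy intractable model: pushforward of uniforms, G(u) = theta + z."""
    return theta + norm.ppf(u)          # inverse-CDF map, smooth in u

def mmd2(x, y, bw=1.0):
    """Biased squared MMD with a Gaussian kernel."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(2)
data = rng.normal(loc=0.5, size=512)    # observed data, true theta = 0.5
m = 256                                 # power of two, as Sobol' prefers

# Plain Monte Carlo vs. scrambled Sobol' (QMC) inputs to the generator
u_mc = rng.uniform(size=m)
u_qmc = qmc.Sobol(d=1, scramble=True, seed=3).random(m).ravel()

for name, u in [("MC", u_mc), ("QMC", u_qmc)]:
    print(name, mmd2(generator(u, theta=0.5), data))
```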

Assessment of structural safety status is of paramount importance for existing bridges, and accurate evaluation of flutter probability is essential for long-span bridges. In current engineering practice, the flutter critical wind speed is usually estimated at the design stage by wind tunnel tests and is sensitive to modal frequencies and damping ratios. After construction, the structural properties of existing structures change with time due to various factors, such as structural deterioration and periodic environmental effects. The structural dynamic properties, such as modal frequencies and damping ratios, can no longer be assumed to retain their initial values, and deterioration should be included when estimating the life-cycle flutter probability. This paper proposes an evaluation framework to assess the life-cycle flutter probability of long-span bridges, considering the deterioration of structural properties, based on field monitoring data. The Bayesian approach is employed for modal identification of a suspension bridge with a main span of 1650 m, and field monitoring data from 2010-2015 are analyzed to determine the deterioration functions of modal frequencies and damping ratios, as well as their inter-seasonal fluctuations. According to the historical trend, the long-term structural properties can be predicted, and the probability distributions of the flutter critical wind speed for each year in the long term are calculated. Consequently, the life-cycle flutter probability is estimated based on the predicted modal frequencies and damping ratios.
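
A minimal sketch of the life-cycle calculation described above, assuming hypothetical deterioration trends, a hypothetical stand-in for the flutter critical wind speed model, and an illustrative Gumbel wind climate; none of these numbers come from the paper.

```python
# Sketch: life-cycle flutter probability from predicted modal properties.
import numpy as np

rng = np.random.default_rng(4)
years = np.arange(1, 51)                      # 50-year life cycle

def u_crit(freq, damping):
    """Hypothetical stand-in for a flutter critical wind speed model."""
    return 400.0 * freq * (1.0 + 10.0 * damping)        # m/s, illustrative

p_flutter = []
for t in years:
    # Predicted modal properties: slow linear deterioration + scatter
    freq = rng.normal(0.18 - 2e-4 * t, 0.005, size=10_000)      # Hz
    damping = rng.normal(0.01 - 5e-5 * t, 0.002, size=10_000)
    ucr = u_crit(freq, np.clip(damping, 0.0, None))
    # Annual maximum wind speed at deck height (Gumbel, illustrative)
    u_max = rng.gumbel(loc=45.0, scale=5.0, size=10_000)        # m/s
    p_flutter.append(np.mean(u_max > ucr))                      # annual probability

# Life-cycle probability: flutter occurs in at least one year
life_cycle_p = 1.0 - np.prod(1.0 - np.array(p_flutter))
print(f"life-cycle flutter probability: {life_cycle_p:.4f}")
```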

Data summarization is a valuable tool for deriving knowledge from large data streams and has proven its usefulness in a great number of applications. Summaries can be found by optimizing submodular functions, which map subsets of data to real values indicating their "representativeness" and which should be maximized to find a diverse summary of the underlying data. In this paper, we study exemplar-based clustering as a submodular function and provide a GPU algorithm to cope with its high computational complexity. We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision computation compared to conventional CPU algorithms. We also show that the GPU algorithm provides remarkable runtime benefits not only with workstation-grade GPUs but also with low-power embedded computation units, for which speedups of up to 35x are possible. Furthermore, we apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the resulting summaries help with steering this specific process to cut costs and reduce the production of defective parts. Beyond pure speedup considerations, we show that our approach can provide summaries within reasonable time frames for this kind of industrial, real-world data.
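
For reference, the exemplar-based clustering objective and the standard greedy maximization look as follows in a minimal CPU/NumPy sketch; the paper's GPU kernels and half-precision tricks are not reproduced here, and the data and summary size are illustrative.

```python
# Sketch: exemplar-based clustering as monotone submodular maximization
# with the standard greedy algorithm ((1 - 1/e) approximation).
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 2))                 # snapshot of a data stream
e0 = np.zeros((1, 2))                         # auxiliary phantom exemplar

def loss(S):
    """L(S): mean squared distance of each point to its nearest exemplar."""
    d = np.min(((X[:, None, :] - S[None, :, :]) ** 2).sum(-1), axis=1)
    return d.mean()

def f(S):
    """Submodular utility: loss reduction relative to the phantom alone."""
    return loss(e0) - loss(np.vstack([e0, S])) if len(S) else 0.0

summary_idx = []
for _ in range(5):                             # summary of 5 exemplars
    gains = [f(X[summary_idx + [i]]) for i in range(len(X))]
    summary_idx.append(int(np.argmax(gains)))
print("exemplar indices:", summary_idx)
```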

Pairwise comparison matrices are increasingly used in settings where some pairs are missing. However, there exist few inconsistency indices for such incomplete data sets, and no reasonable measure has an associated threshold. This paper generalises the famous rule of thumb for the acceptable level of inconsistency, proposed by Saaty, to incomplete pairwise comparison matrices. The extension is based on choosing the missing elements such that the maximal eigenvalue of the incomplete matrix is minimised. Consequently, the well-established values of the random index cannot be adopted: the inconsistency of random matrices is found to be a function of matrix size and the number of missing elements, with a nearly linear dependence on the latter variable. Our results can be directly built into decision-making software and used by practitioners as a statistical criterion for accepting or rejecting an incomplete pairwise comparison matrix.
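
A minimal sketch of the eigenvalue-minimizing completion on a 4x4 matrix with one missing pair; the matrix is illustrative, and the classical random index shown applies only to complete matrices, which is exactly the gap the paper's recomputed indices fill.

```python
# Sketch: Saaty-style consistency check for an incomplete pairwise
# comparison matrix, completing the missing pair so that the principal
# eigenvalue is minimized.
import numpy as np
from scipy.optimize import minimize_scalar

# 4x4 reciprocal matrix, missing pair at (1,4)/(4,1) marked with nan
A = np.array([
    [1.0,    2.0, 4.0, np.nan],
    [0.5,    1.0, 2.0, 2.0],
    [0.25,   0.5, 1.0, 1.0],
    [np.nan, 0.5, 1.0, 1.0],
])

def lambda_max(log_x):
    """Principal eigenvalue of A with the missing pair set to exp(log_x)."""
    B = A.copy()
    B[0, 3] = np.exp(log_x)
    B[3, 0] = np.exp(-log_x)
    return np.max(np.linalg.eigvals(B).real)

res = minimize_scalar(lambda_max)               # optimal completion
lam, n = res.fun, 4
CI = (lam - n) / (n - 1)                        # Saaty consistency index
RI_complete = 0.90                              # classical random index, n = 4
# NOTE: for incomplete matrices the paper recomputes the random index as a
# function of size and number of missing entries; 0.90 is the complete case.
print(f"lambda_max = {lam:.4f}, CI = {CI:.4f}, CR = {CI / RI_complete:.4f}")
```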

This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (for example, robustness to bounded-norm adversarial perturbations). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and of specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking the largest violation of the specification) and solve a Lagrangian relaxation of this problem to obtain an upper bound on the worst-case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
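
For intuition about sound output bounds, here is a minimal interval bound propagation (IBP) sketch on a tiny random network; IBP is a simpler and looser relative of the Lagrangian relaxation above, not the paper's method, and the network, radius, and specification are illustrative.

```python
# Sketch: interval bound propagation (IBP) for a tiny ReLU network,
# certifying that the output stays below a bound over an input box.
import numpy as np

rng = np.random.default_rng(6)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def ibp_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through an affine layer soundly."""
    mid, rad = (lo + hi) / 2, (hi - lo) / 2
    center = W @ mid + b
    radius = np.abs(W) @ rad          # worst case over the box
    return center - radius, center + radius

# Specification: output <= 10 for all x with ||x - x0||_inf <= 0.1
x0, eps = np.array([0.5, -0.3]), 0.1
lo, hi = ibp_affine(x0 - eps, x0 + eps, W1, b1)
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)          # ReLU is monotone
lo, hi = ibp_affine(lo, hi, W2, b2)
print("certified output upper bound:", hi[0])          # sound, possibly loose
print("property verified:", hi[0] <= 10.0)
```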

Many current applications use recommendations to modify natural user behavior, for example, to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup, where recommendation candidates are evaluated by their coherence with past user behavior by predicting either the missing entries in the user-item matrix or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome relative to the organic user behavior. We show that this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes under random exposure. We compare our method against state-of-the-art factorization methods as well as new causal recommendation approaches, and show significant improvements.
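
The classic tool for emulating random exposure from biased logs is inverse propensity scoring (IPS), sketched below on synthetic data; this illustrates the reweighting idea, not necessarily the paper's exact algorithm, and all propensities and outcome rates are made up.

```python
# Sketch: estimating the value of uniformly random exposure from logs
# collected under a biased recommendation policy, via IPS.
import numpy as np

rng = np.random.default_rng(7)
n_logs, n_items = 20_000, 20

# Biased logging policy: popular items are shown far more often, and
# (here) the popular items happen to have weaker outcome rates.
propensity = np.linspace(1.0, 20.0, n_items)
propensity /= propensity.sum()                  # P(item | logging policy)
true_effect = np.linspace(0.9, 0.1, n_items)    # P(outcome | item shown)

items = rng.choice(n_items, size=n_logs, p=propensity)
outcomes = (rng.random(n_logs) < true_effect[items]).astype(float)

# Naive average reflects the biased policy; IPS reweights each logged
# event by (target probability / logging probability) to emulate the
# uniformly random exposure described above.
naive_value = outcomes.mean()
w = (1.0 / n_items) / propensity[items]
ips_value = np.mean(w * outcomes)

print("value under biased logging:  ", naive_value)          # skewed low
print("IPS randomized-value estimate:", ips_value)           # ~0.5
print("ground-truth randomized value:", true_effect.mean())  # 0.5
```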

Recent years have seen a revival of interest in textual entailment, sparked by (i) the emergence of powerful deep neural network learners for natural language processing and (ii) the timely development of large-scale evaluation datasets such as SNLI. Recast as natural language inference, the problem now amounts to detecting the relation between pairs of statements: they either contradict or entail one another, or they are mutually neutral. Current research in natural language inference is effectively limited to English. In this paper, we propose to advance research in SNLI-style natural language inference toward multilingual evaluation. To that end, we provide test data for four major languages: Arabic, French, Spanish, and Russian. We experiment with a set of baseline systems based on cross-lingual word embeddings and machine translation. While our best system scores an average accuracy of just over 75%, we focus largely on enabling further research in multilingual inference.
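
For orientation, a minimal InferSent-style baseline over a shared cross-lingual embedding space might look like the sketch below; the embeddings, data, and labels are random placeholders, and real experiments would load aligned multilingual vectors and the provided test sets.

```python
# Sketch: an NLI baseline over cross-lingual word embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
vocab_size, dim, n_pairs = 1000, 50, 600
emb = rng.normal(size=(vocab_size, dim))        # shared cross-lingual space

def encode(token_ids):
    """Sentence vector = mean of (cross-lingual) word embeddings."""
    return emb[token_ids].mean(axis=0)

def pair_features(u, v):
    """Standard NLI pair features: concatenation, abs-difference, product."""
    return np.concatenate([u, v, np.abs(u - v), u * v])

# Placeholder corpus: random token sequences with random gold labels
X = np.stack([pair_features(encode(rng.integers(0, vocab_size, 12)),
                            encode(rng.integers(0, vocab_size, 12)))
              for _ in range(n_pairs)])
y = rng.integers(0, 3, n_pairs)                 # entail / contradict / neutral

clf = LogisticRegression(max_iter=1000).fit(X[:500], y[:500])
print("held-out accuracy:", clf.score(X[500:], y[500:]))  # ~chance on noise
```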
