曰本中文字幕一区二区三区高清,欧美亚洲视频一区二区三区老牛

Sports betting's recent federal legalisation in the USA coincides with the golden age of machine learning. If bettors can leverage data to reliably predict the probability of an outcome, they can recognise when the bookmaker's odds are in their favour. As sports betting is a multi-billion dollar industry in the USA alone, identifying such opportunities could be extremely lucrative. Many researchers have applied machine learning to the sports outcome prediction problem, generally using accuracy to evaluate the performance of predictive models. We hypothesise that for the sports betting problem, model calibration is more important than accuracy. To test this hypothesis, we train models on NBA data over several seasons and run betting experiments on a single season, using published odds. We show that optimising the predictive model for calibration leads to greater returns than optimising for accuracy, on average (return on investment of $+34.69\%$ versus $-35.17\%$) and in the best case ($+36.93\%$ versus $+5.56\%$). These findings suggest that for sports betting (or any probabilistic decision-making problem), calibration is a more important metric than accuracy. Sports bettors who wish to increase profits should therefore optimise their predictive model for calibration.

相關內容

模型評估

關注 1730

機器學習系統設計系統評估標準

奇異的 · Learning · MoDELS · DNN · Neural Networks ·

2023 年 8 月 23 日

Quantifying degeneracy in singular models via the learning coefficient

Edmund Lau,Daniel Murfet,Susan Wei

from arxiv, 22 pages, 10 figures

Deep neural networks (DNN) are singular statistical models which exhibit complex degeneracies. In this work, we illustrate how a quantity known as the \emph{learning coefficient} introduced in singular learning theory quantifies precisely the degree of degeneracy in deep neural networks. Importantly, we will demonstrate that degeneracy in DNN cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. Importantly, the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows the local learning coefficient can reveal the inductive bias of stochastic opitmizers for more or less degenerate critical points.

Attention · altmetrics · 相互獨立的 · 相關系數 · 成比例 ·

2023 年 8 月 22 日

Does society show differential attention to researchers based on gender and field?

Sara M. González-Betancor,Pablo Dorta-González

from arxiv, 23 pages, 5 figures, 7 tables

While not all researchers prioritize social impact, it is undeniably a crucial aspect that adds significance to their work. The objective of this paper is to explore potential gender differences in the social attention paid to researchers and to examine their association with specific fields of study. To achieve this goal, the paper analyzes four dimensions of social influence and examines three measures of social attention to researchers. The dimensions are media influence (mentions in mainstream news), political influence (mentions in public policy reports), social media influence (mentions in Twitter), and educational influence (mentions in Wikipedia). The measures of social attention to researchers are: proportion of publications with social mentions (social attention orientation), mentions per publication (level of social attention), and mentions per mentioned publication (intensity of social attention). By analyzing the rankings of authors -- for the four dimensions with the three measures in the 22 research fields of the Web of Science database -- and by using Spearman correlation coefficients, we conclude that: 1) significant differences are observed between fields; 2) the dimensions capture different and independent aspects of the social impact. Finally, we use non-parametric means comparison tests to detect gender bias in social attention. We conclude that for most fields and dimensions with enough non-zero altmetrics data, gender differences in social attention are not predominant, but are still present and vary across fields.

CASES · Performer · 估計/估計量 · Learning · 主動學習 ·

2023 年 8 月 21 日

Test-time augmentation-based active learning and self-training for label-efficient segmentation

Bella Specktor-Fadida,Anna Levchakov,Dana Schonberger,Liat Ben-Sira,Dafna Ben-Bashat,Leo Joskowicz

from arxiv, Accepted to MICCAI MILLanD workshop 2023

Deep learning techniques depend on large datasets whose annotation is time-consuming. To reduce annotation burden, the self-training (ST) and active-learning (AL) methods have been developed as well as methods that combine them in an iterative fashion. However, it remains unclear when each method is the most useful, and when it is advantageous to combine them. In this paper, we propose a new method that combines ST with AL using Test-Time Augmentations (TTA). First, TTA is performed on an initial teacher network. Then, cases for annotation are selected based on the lowest estimated Dice score. Cases with high estimated scores are used as soft pseudo-labels for ST. The selected annotated cases are trained with existing annotated cases and ST cases with border slices annotations. We demonstrate the method on MRI fetal body and placenta segmentation tasks with different data variability characteristics. Our results indicate that ST is highly effective for both tasks, boosting performance for in-distribution (ID) and out-of-distribution (OOD) data. However, while self-training improved the performance of single-sequence fetal body segmentation when combined with AL, it slightly deteriorated performance of multi-sequence placenta segmentation on ID data. AL was helpful for the high variability placenta data, but did not improve upon random selection for the single-sequence body data. For fetal body segmentation sequence transfer, combining AL with ST following ST iteration yielded a Dice of 0.961 with only 6 original scans and 2 new sequence scans. Results using only 15 high-variability placenta cases were similar to those using 50 cases. Code is available at: //github.com/Bella31/TTA-quality-estimation-ST-AL

可辨認的 · Taxonomy · motivation · AI · 人工智能 ·

2023 年 8 月 21 日

Hybrid classical-quantum computing: are we forgetting the classical part in the binomial?

Esther Villar-Rodriguez,Aitor Gomez-Tejedor,Eneko Osaba

from arxiv, 2 pages, 1 figure, paper accepted for being presented in the upcoming IEEE International Conference on Quantum Computing and Engineering - IEEE QCE 2023

The expectations arising from the latest achievements in the quantum computing field are causing that researchers coming from classical artificial intelligence to be fascinated by this new paradigm. In turn, quantum computing, on the road towards usability, needs classical procedures. Hybridization is, in these circumstances, an indispensable step but can also be seen as a promising new avenue to get the most from both computational worlds. Nonetheless, hybrid approaches have now and will have in the future many challenges to face, which, if ignored, will threaten the viability or attractiveness of quantum computing for real-world applications. To identify them and pose pertinent questions, a proper characterization of the hybrid quantum computing field, and especially hybrid solvers, is compulsory. With this motivation in mind, the main purpose of this work is to propose a preliminary taxonomy for classifying hybrid schemes, and bring to the fore some questions to stir up researchers minds about the real challenges regarding the application of quantum computing.

Performer · GPT-4 · 數據集 · 講稿 · 語言模型化 ·

2023 年 8 月 18 日

How susceptible are LLMs to Logical Fallacies?

Amirreza Payandeh,Dan Pluth,Jordan Hosier,Xuesu Xiao,Vijay K. Gurbani

This paper investigates the rational thinking capability of Large Language Models (LLMs) in multi-round argumentative debates by exploring the impact of fallacious arguments on their logical reasoning performance. More specifically, we present Logic Competence Measurement Benchmark (LOGICOM), a diagnostic benchmark to assess the robustness of LLMs against logical fallacies. LOGICOM involves two agents: a persuader and a debater engaging in a multi-round debate on a controversial topic, where the persuader tries to convince the debater of the correctness of its claim. First, LOGICOM assesses the potential of LLMs to change their opinions through reasoning. Then, it evaluates the debater's performance in logical reasoning by contrasting the scenario where the persuader employs logical fallacies against one where logical reasoning is used. We use this benchmark to evaluate the performance of GPT-3.5 and GPT-4 using a dataset containing controversial topics, claims, and reasons supporting them. Our findings indicate that both GPT-3.5 and GPT-4 can adjust their opinion through reasoning. However, when presented with logical fallacies, GPT-3.5 and GPT-4 are erroneously convinced 41% and 69% more often, respectively, compared to when logical reasoning is used. Finally, we introduce a new dataset containing over 5k pairs of logical vs. fallacious arguments. The source code and dataset of this work are made publicly available.

DBpedia · RDF · 知識 (knowledge) · Continuity · ONCE ·

2023 年 8 月 18 日

Anna Formica,Francesco Taglino

from arxiv, 37 pages, 16 figures

Evaluating semantic relatedness of Web resources is still an open challenge. This paper focuses on knowledge-based methods, which represent an alternative to corpus-based approaches, and rely in general on the availability of knowledge graphs. In particular, we have selected 10 methods from the existing literature, that have been organized according to it adjacent resources, triple patterns, and triple weights-based methods. They have been implemented and evaluated by using DBpedia as reference RDF knowledge graph. Since DBpedia is continuously evolving, the experimental results provided by these methods in the literature are not comparable. For this reason, in this work, such methods have been experimented by running them all at once on the same DBpedia release and against 14 well-known golden datasets. On the basis of the correlation values with human judgment obtained according to the experimental results, weighting the RDF triples in combination with evaluating all the directed paths linking the compared resources is the best strategy in order to compute semantic relatedness in DBpedia.

變換 · 層 · 線性的 · 操作 · Performer ·

2023 年 8 月 18 日

How important are specialized transforms in Neural Operators?

Ritam Majumdar,Shirish Karande,Lovekesh Vig

from arxiv, 8 pages, 3 figures, 4 tables

Simulating physical systems using Partial Differential Equations (PDEs) has become an indispensible part of modern industrial process optimization. Traditionally, numerical solvers have been used to solve the associated PDEs, however recently Transform-based Neural Operators such as the Fourier Neural Operator and Wavelet Neural Operator have received a lot of attention for their potential to provide fast solutions for systems of PDEs. In this work, we investigate the importance of the transform layers to the reported success of transform based neural operators. In particular, we record the cost in terms of performance, if all the transform layers are replaced by learnable linear layers. Surprisingly, we observe that linear layers suffice to provide performance comparable to the best-known transform-based layers and seem to do so with a compute time advantage as well. We believe that this observation can have significant implications for future work on Neural Operators, and might point to other sources of efficiencies for these architectures.

泛函 · Learning · 均值 · MoDELS · 視覺問答 ·

2023 年 8 月 16 日

Learning the meanings of function words from grounded language using a visual question answering model

Eva Portelance,Michael C. Frank,Dan Jurafsky

Interpreting a seemingly-simple function word like "or", "behind", or "more" can require logical, numerical, and relational reasoning. How are such words learned by children? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network based visual question answering models apparently can learn to use function words as part of answering questions about complex visual scenes. In this paper, we study what these models learn about function words, in the hope of better understanding how the meanings of these words can be learnt by both models and children. We show that recurrent models trained on visually grounded language learn gradient semantics for function words requiring spacial and numerical reasoning. Furthermore, we find that these models can learn the meanings of logical connectives "and" and "or" without any prior knowledge of logical reasoning, as well as early evidence that they can develop the ability to reason about alternative expressions when interpreting language. Finally, we show that word learning difficulty is dependent on frequency in models' input. Our findings offer evidence that it is possible to learn the meanings of function words in visually grounded context by using non-symbolic general statistical learning algorithms, without any prior knowledge of linguistic meaning.

知識 (knowledge) · INFORMS · 語言表示 · MoDELS · Extensibility ·

2022 年 7 月 28 日

MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base

Hui Li,Xuekang Yang,Xin Zhao,Lin Yu,Jiping Zheng,Wei Sun

from arxiv, 11 pages, 6 figures

Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, factual information contained in the input sentences have not been fully mined, and the external knowledge for injecting have not been strictly checked. As a result, the context information cannot be fully exploited and extra noise will be introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu, and introduce a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.

過擬合 · SimPLe · Principle · 模型評估 · 統計量 ·

2021 年 3 月 16 日

Deep learning: a statistical viewpoint

Peter L. Bartlett,Andrea Montanari,Alexander Rakhlin

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.