
Vast amounts of (open) data are increasingly used to make arguments about crisis topics such as climate change and global pandemics. Data visualizations are central to bringing these viewpoints to broader publics. However, visualizations often conceal the many contexts involved in their production, ranging from decisions made in research labs about collecting and sharing data to choices made in editorial rooms about which data stories to tell. In this paper, we examine how data visualizations about climate change and COVID-19 are produced in popular science magazines, using Scientific American, an established English-language popular science magazine, as a case study. To do this, we apply the analytical concept of data journeys (Leonelli, 2020) in a mixed methods study that centers on interviews with Scientific American staff and is supplemented by a visualization analysis of selected charts. In particular, we discuss the affordances of working with open data, the role of collaborative data practices, and how the magazine works to counter misinformation and increase transparency. This work makes an empirical contribution by offering insight into the data (visualization) practices of science communicators and by demonstrating how the concept of data journeys can be used as an analytical framework.

Related content

Data visualization is the study of the visual representation of data.

This paper contributes to the study of optimal experimental design for Bayesian inverse problems governed by partial differential equations (PDEs). We derive estimates for the parametric regularity of multivariate double integration problems over high-dimensional parameter and data domains arising in Bayesian optimal design problems. We provide a detailed analysis of these double integration problems using two approaches: a full tensor product and a sparse tensor product combination of quasi-Monte Carlo (QMC) cubature rules over the parameter and data domains. Specifically, we show that the latter approach significantly improves the convergence rate, exhibiting performance comparable to that of QMC integration of a single high-dimensional integral. Furthermore, we numerically verify the predicted convergence rates for an elliptic PDE problem with an unknown diffusion coefficient in two spatial dimensions, offering empirical evidence supporting the theoretical results and highlighting practical applicability.
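
For orientation, one common design criterion that takes this double-integral form is the expected information gain; the precise quantity analysed in the paper may differ, so the display below is purely illustrative of the nested integration over the data domain Y and the parameter domain Theta.

% Illustrative double integral in Bayesian optimal experimental design
% (the expected information gain of a design d; not necessarily the exact
% functional treated in the paper).
\[
  \mathrm{EIG}(d)
  = \int_{Y} \int_{\Theta}
      \log\!\left(
        \frac{p(y \mid \theta, d)}
             {\int_{\Theta} p(y \mid \theta', d)\,\pi(\theta')\,\mathrm{d}\theta'}
      \right)
      p(y \mid \theta, d)\,\pi(\theta)\,\mathrm{d}\theta\,\mathrm{d}y .
\]
% A full tensor product approach applies separate QMC rules over \Theta and Y
% and pays the product of their costs; the sparse (combination-type) tensor
% product couples the two rules so that the overall cost behaves roughly like
% a single high-dimensional QMC integral.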

Neural network sparsity has attracted much research interest due to its similarity to biological schemes and its high energy efficiency. However, existing methods depend on lengthy training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They avoid the high training cost but usually suffer from noticeable accuracy degradation because they neglect to set a reasonable sparsity rate for each layer. Previous methods for finding sparsity rates mainly focus on the training-aware scenario and usually fail to converge stably under the PTS setting, with its limited data and much lower training cost. In this paper, we propose a fast and controllable post-training sparsity (FCPTS) framework. By incorporating a differentiable bridge function and a controllable optimization objective, our method allows for rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a predetermined global sparsity rate. Equipped with these techniques, we surpass state-of-the-art methods by a large margin, e.g., over 30% improvement for ResNet-50 on ImageNet at a sparsity rate of 80%. Our plug-and-play code and supplementary materials are open-sourced at //github.com/ModelTC/FCPTS.
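
The general mechanism can be sketched as follows (an illustrative toy, not the authors' implementation; the function names and the per-layer error proxy are hypothetical): free per-layer parameters are mapped through a sigmoid "bridge" to sparsity rates in (0, 1), and a penalty on the parameter-weighted average rate steers the allocation toward the desired global sparsity.

# Illustrative sketch (not the FCPTS code): learn per-layer sparsity rates
# with a differentiable bridge to a predetermined global sparsity target.
import torch

def layer_sparsities(logits):
    # Sigmoid acts as a smooth bridge from free parameters to rates in (0, 1).
    return torch.sigmoid(logits)

def global_sparsity(rates, layer_sizes):
    # Parameter-weighted average sparsity across layers.
    sizes = torch.tensor(layer_sizes, dtype=rates.dtype)
    return (rates * sizes).sum() / sizes.sum()

def allocation_loss(rates, layer_sizes, layer_errors, target=0.8, lam=10.0):
    # Trade a hypothetical per-layer error proxy against the global target.
    err = (rates * layer_errors).sum()
    constraint = (global_sparsity(rates, layer_sizes) - target) ** 2
    return err + lam * constraint

# Toy example: three layers with different sizes and pruning sensitivities.
layer_sizes = [1_000, 10_000, 5_000]
layer_errors = torch.tensor([3.0, 1.0, 2.0])   # stand-in sensitivities
logits = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = allocation_loss(layer_sparsities(logits), layer_sizes, layer_errors)
    loss.backward()
    opt.step()

print(layer_sparsities(logits).detach())       # learned per-layer rates

Because the whole pipeline is differentiable, the allocation converges in a few hundred gradient steps while the constraint term keeps the weighted average rate at the requested global level.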

This thesis is a corpus-based, quantitative, and typological analysis of the functions of Early Slavic participle constructions and their finite competitors ($jegda$-'when'-clauses). The first part leverages detailed linguistic annotation of Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels to obtain indirect evidence for the different potential functions of participle clauses and their main finite competitor, and to understand the roles of compositionality and default discourse reasoning as explanations for the distribution of participle constructions and $jegda$-clauses in the corpus. The second part uses massively parallel data to analyze typological variation in how languages express the semantic space of English $when$, whose scope encompasses that of Early Slavic participle constructions and $jegda$-clauses. Probabilistic semantic maps are generated, and statistical methods (including Kriging, Gaussian Mixture Modelling, and precision-and-recall analysis) are used to induce cross-linguistically salient dimensions from the parallel corpus and to study conceptual variation within the semantic space of the hypothetical concept WHEN.
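
As a small illustration of one statistical step mentioned above (a sketch on synthetic coordinates, not the thesis data), a Gaussian mixture can be fitted to the 2-D coordinates of contexts on a probabilistic semantic map to induce clusters of uses:

# Minimal sketch: Gaussian mixture modelling over 2-D semantic-map
# coordinates. The coordinates below are synthetic placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Pretend each row is a 'when'-context embedded on the semantic map.
coords = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(100, 2)),   # e.g. temporal-overlap uses
    rng.normal([2.0, 1.0], 0.3, size=(100, 2)),   # e.g. conditional/generic uses
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(coords)
clusters = gmm.predict(coords)                     # induced groupings of contexts
print(gmm.means_, np.bincount(clusters))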

When improperly accounted for, anonymization of the GPS locations of observations can bias a spatial model's parameter estimates and attenuate spatial predictions; this issue arises in applications ranging from public health to paleoseismology. In this work, we demonstrate that a newly introduced method for geostatistical modeling in the presence of anonymized point locations can be extended to account for more general kinds of positional uncertainty due to location anonymization, including both jittering (a form of random perturbation of GPS coordinates) and geomasking (reporting only the name of the area containing the true GPS coordinates). We further provide a numerical integration scheme that flexibly accounts for the positional uncertainty as well as spatial and covariate information. We apply the method to women's secondary education completion data in the 2018 Nigeria Demographic and Health Survey (NDHS), which contains jittered point locations, and the 2016 Nigeria Multiple Indicator Cluster Survey (NMICS), which contains geomasked locations. We show that accounting for the positional uncertainty in the surveys can improve predictions in terms of their continuous ranked probability score.
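
The core idea behind such an integration scheme can be sketched as follows (a toy using Gaussian jitter purely for illustration; the DHS displacement mechanism and the paper's actual quadrature rule differ): rather than evaluating a spatial surface at the reported coordinate, average it over the plausible true locations implied by the known anonymization mechanism.

# Hedged sketch of the general idea (not the paper's implementation):
# integrate a spatial surface over the jitter distribution instead of
# evaluating it at the reported, anonymized coordinate.
import numpy as np

def risk_surface(x, y):
    # Hypothetical smooth spatial surface (e.g. predicted completion rate).
    return 1.0 / (1.0 + np.exp(-(0.5 * x - 0.3 * y)))

def integrated_prediction(x_obs, y_obs, jitter_sd=0.05, n_points=400, seed=0):
    # Monte Carlo integration: average the surface over candidate true
    # locations drawn from the (assumed Gaussian) jitter distribution.
    rng = np.random.default_rng(seed)
    xs = rng.normal(x_obs, jitter_sd, n_points)
    ys = rng.normal(y_obs, jitter_sd, n_points)
    return risk_surface(xs, ys).mean()

print(risk_surface(1.0, 2.0))            # naive: ignores anonymization
print(integrated_prediction(1.0, 2.0))   # accounts for positional uncertainty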

Community detection algorithms try to extract a mesoscale structure from the available network data, generally avoiding any explicit assumption regarding the quantity and quality of information conveyed by specific sets of edges. In this paper, we show that the core of ideological/discursive communities on X/Twitter can be effectively identified by uncovering the most informative interactions in an authors-audience bipartite network through a maximum-entropy null model. The analysis is performed on three X/Twitter datasets related to the main political events of 2022 in Italy, using four state-of-the-art algorithms (three descriptive, one inferential) as benchmarks and manually annotating nearly 300 verified users based on their political affiliation. In terms of information content, the communities obtained with the entropy-based algorithm are comparable to those obtained with some of the benchmarks. However, applied to the authors-audience bipartite network, this methodology uses just a small sample of the available data to identify the central users of each community, returns a neater partition of the user set into just a few easily interpretable communities, and clusters well-known political figures in a way that better matches actual political alliances than the benchmarks do. Our results provide an important insight into online debates, highlighting that online interaction networks are mostly shaped by the activity of a small set of users who enjoy public visibility even outside social media.
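
The underlying principle can be illustrated with a simpler stand-in for the paper's entropy-based null model (whose details differ): keep an author-author link only if the two authors share more audience members than a random-overlap null would predict, here scored with a hypergeometric test.

# Toy illustration of null-model filtering on a bipartite authors-audience
# network (a hypergeometric overlap test, not the paper's maximum-entropy model).
import numpy as np
from scipy.stats import hypergeom

def overlap_pvalue(audience_a, audience_b, n_users):
    # P(overlap >= observed) if audience_b were drawn at random from all users.
    shared = len(audience_a & audience_b)
    return hypergeom.sf(shared - 1, n_users, len(audience_a), len(audience_b))

n_users = 10_000
audience_a = set(range(0, 300))          # toy audiences (user ids)
audience_b = set(range(100, 400))        # overlaps heavily with audience_a
audience_c = set(np.random.default_rng(0).choice(n_users, 300, replace=False))

print(overlap_pvalue(audience_a, audience_b, n_users))  # tiny -> keep the link
print(overlap_pvalue(audience_a, audience_c, n_users))  # large -> discard the link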

Whether or not the Kronecker coefficients of the symmetric group count some set of combinatorial objects is a longstanding open question. In this work we show that a given Kronecker coefficient is proportional to the rank of a projector that can be measured efficiently using a quantum computer. In other words, a Kronecker coefficient counts the dimension of the vector space spanned by the accepting witnesses of a QMA verifier, where QMA is the quantum analogue of NP. This implies that approximating the Kronecker coefficients to within a given relative error is not harder than a certain natural class of quantum approximate counting problems that captures the complexity of estimating thermal properties of quantum many-body systems. A second consequence is that deciding positivity of Kronecker coefficients is contained in QMA, complementing a recent NP-hardness result of Ikenmeyer, Mulmuley and Walter. We obtain similar results for the related problem of approximating row sums of the character table of the symmetric group. Finally, we discuss an efficient quantum algorithm that approximates normalized Kronecker coefficients to inverse-polynomial additive error.
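
For background (a standard character-theoretic fact, not a result of this paper), the Kronecker coefficient admits the expression below, which also identifies it as the dimension of an invariant subspace, the quantity reinterpreted here as the rank of an efficiently measurable projector.

% Standard character formula for the Kronecker coefficient of S_n:
\[
  g(\lambda, \mu, \nu)
  = \frac{1}{n!} \sum_{\sigma \in S_n}
      \chi^{\lambda}(\sigma)\, \chi^{\mu}(\sigma)\, \chi^{\nu}(\sigma)
  = \dim \left( V_{\lambda} \otimes V_{\mu} \otimes V_{\nu} \right)^{S_n},
\]
% i.e. the multiplicity of V_\nu in the tensor product V_\lambda \otimes V_\mu.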

Biometric systems based on brain activity have been proposed as an alternative to passwords or to complement current authentication techniques. By leveraging the unique brainwave patterns of individuals, these systems offer the possibility of creating authentication solutions that are resistant to theft, hands-free, accessible, and potentially even revocable. However, despite the growing stream of research in this area, faster progress is hindered by reproducibility problems. Issues such as the lack of standard reporting schemes for performance results and system configuration, or the absence of common evaluation benchmarks, make comparability and proper assessment of different biometric solutions challenging. Furthermore, barriers to future work are erected when, as is so often the case, source code is not published open access. To bridge this gap, we introduce NeuroIDBench, a flexible open-source tool to benchmark brainwave-based authentication models. It incorporates nine diverse datasets, implements a comprehensive set of pre-processing parameters and machine learning algorithms, enables testing under two common adversary models (known vs. unknown attacker), and allows researchers to generate full performance reports and visualizations. We use NeuroIDBench to investigate the shallow classifiers and deep learning-based approaches proposed in the literature, and to test robustness across multiple sessions. We observe a 37.6% reduction in Equal Error Rate (EER) for unknown-attacker scenarios (typically not tested in the literature), and we highlight the importance of session variability to brainwave authentication. All in all, our results demonstrate the viability and relevance of NeuroIDBench in streamlining fair comparisons of algorithms, thereby furthering the advancement of brainwave-based authentication through robust methodological practices.
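
For reference, the EER reported by such benchmarks is the operating point where the false acceptance and false rejection rates coincide; a minimal, tool-independent sketch of its computation from genuine and impostor similarity scores (toy score distributions, unrelated to NeuroIDBench internals):

# Computing the Equal Error Rate (EER) from genuine and impostor scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 500)      # toy scores for the legitimate user
impostor = rng.normal(0.4, 0.1, 500)     # toy scores for attackers

labels = np.concatenate([np.ones_like(genuine), np.zeros_like(impostor)])
scores = np.concatenate([genuine, impostor])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer_index = np.argmin(np.abs(fpr - fnr))        # threshold where FAR ~ FRR
eer = (fpr[eer_index] + fnr[eer_index]) / 2
print(f"EER ~= {eer:.3f}")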

In this study, explainable machine learning techniques are applied to predict the toxicity of mussels in the Gulf of Trieste (Adriatic Sea) caused by harmful algal blooms. By analysing a newly created 28-year dataset containing records of toxic phytoplankton in mussel farming areas and toxin concentrations in mussels (Mytilus galloprovincialis), we train and evaluate the performance of ML models to accurately predict diarrhetic shellfish poisoning (DSP) events. The random forest model provided the best prediction of positive toxicity results based on the F1 score. Explainability methods such as permutation importance and SHAP identified key species (Dinophysis fortii and D. caudata) and environmental factors (salinity, river discharge and precipitation) as the best predictors of DSP outbreaks. These findings are important for improving early warning systems and supporting sustainable aquaculture practices.
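
A minimal sketch of this kind of pipeline on synthetic data (the 28-year monitoring dataset itself is not reproduced here; the feature names are placeholders for the kinds of predictors mentioned above):

# Random forest + permutation importance on synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "dinophysis_fortii": rng.poisson(5, 1000),     # toy cell counts
    "dinophysis_caudata": rng.poisson(3, 1000),
    "salinity": rng.normal(37, 1, 1000),
    "river_discharge": rng.gamma(2, 50, 1000),
})
# Toy label: DSP event more likely when Dinophysis cell counts are high.
y = (X["dinophysis_fortii"] + X["dinophysis_caudata"]
     + rng.normal(0, 2, 1000) > 9).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, scoring="f1", random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")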

Self-training (ST) is a simple yet effective semi-supervised learning method. However, why and how ST improves generalization performance by using potentially erroneous pseudo-labels is still not well understood. To deepen the understanding of ST, we derive and analyze a sharp characterization of the behavior of iterative ST when training a linear classifier by minimizing a ridge-regularized convex loss on binary Gaussian mixtures, in the asymptotic limit where the input dimension and data size diverge proportionally. The results show that ST improves generalization in different ways depending on the number of iterations. When the number of iterations is small, ST improves generalization performance by fitting the model to relatively reliable pseudo-labels and updating the model parameters by a large amount at each iteration. This suggests that ST works as intuition predicts. On the other hand, with many iterations, ST can gradually improve the direction of the classification plane by updating the model parameters incrementally, using soft labels and small regularization. It is argued that this is because the small updates of ST can extract information from the data in an almost noiseless way. However, in the presence of label imbalance, ST underperforms supervised learning with true labels in terms of generalization performance. To overcome this, two heuristics are proposed that enable ST to achieve performance nearly comparable to that of supervised learning even under significant label imbalance.
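
The procedure being analysed can be illustrated with a small toy experiment (a sketch of iterative self-training with a ridge-regularized linear classifier on a binary Gaussian mixture; it mirrors the setup, not the asymptotic analysis itself):

# Toy iterative self-training on a binary Gaussian mixture with soft labels.
import numpy as np

rng = np.random.default_rng(0)
d, n_lab, n_unlab = 50, 100, 2000
mu = rng.normal(0, 1, d) / np.sqrt(d)        # class mean direction

def sample(n):
    y = rng.choice([-1, 1], n)
    X = y[:, None] * mu + rng.normal(0, 1, (n, d))
    return X, y

X_lab, y_lab = sample(n_lab)
X_unlab, y_unlab = sample(n_unlab)

def ridge_fit(X, y, lam):
    # Closed-form ridge regression on +/-1 (or soft) labels.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w = ridge_fit(X_lab, y_lab, lam=1.0)         # initial supervised fit
for _ in range(10):
    soft = np.tanh(X_unlab @ w)              # soft pseudo-labels
    w = ridge_fit(X_unlab, soft, lam=1.0)    # refit on pseudo-labelled data

acc = np.mean(np.sign(X_unlab @ w) == y_unlab)
print(f"accuracy after self-training: {acc:.3f}")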

Artificial neural networks thrive at classification for a particular, rigid task, acquiring knowledge through generalized learning behaviour during a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework for continually determining the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and we qualitatively compare methods in terms of required memory, computation time, and storage.
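
For context, task-incremental results of this kind are commonly summarised by an accuracy matrix from which average accuracy and forgetting are derived; a short sketch with toy numbers (these are the usual metric definitions from the continual learning literature, not necessarily the exact protocol of this survey):

# Accuracy matrix A[i, j] = accuracy on task j after learning tasks 1..i+1.
import numpy as np

A = np.array([
    [0.95, 0.00, 0.00],
    [0.70, 0.93, 0.00],
    [0.55, 0.75, 0.91],
])  # rows: after training task 1..3; columns: evaluated task 1..3 (toy values)

T = A.shape[0]
avg_acc = A[-1, :].mean()                        # final average accuracy
forgetting = np.mean([A[:T - 1, j].max() - A[-1, j] for j in range(T - 1)])
print(f"average accuracy {avg_acc:.3f}, average forgetting {forgetting:.3f}")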
