
Unlike most bibliometric studies, which focus on publications, we take Big Data research as a case study and introduce a novel bibliometric approach that unfolds the status of a given scientific community from an individual-level perspective. We study the academic age, production, and research focus of the community of authors active in Big Data research. Artificial Intelligence (AI) is selected as a reference area for comparative purposes. Results show that "Big Data" is a growing research topic with an expanding community of authors, driven in particular by new authors joining every year. Compared to AI, Big Data attracts authors with a longer academic age, who can be regarded as having accumulated some publishing experience before entering the community. Despite the highly skewed distribution of productivity among researchers in both communities, Big Data authors have higher values of both research focus and production than AI authors. Considering the community size, overall academic age, and persistence of publishing on the topic, our results support the idea of Big Data as a research topic that is attractive to researchers. We argue that the community-focused indicators proposed in this study can be generalized to investigate the development and dynamics of other research fields and topics.
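
The abstract centres on three author-level indicators. The sketch below is only a minimal illustration of how such indicators might be derived from a flat table of publication records; the field names and the concrete definitions (e.g., research focus as the share of an author's output on the topic) are assumptions for illustration, not the paper's formulas.

```python
from collections import defaultdict

# Hypothetical publication records: (author, year, on_topic) triples, where
# on_topic marks whether the paper belongs to the studied topic (e.g., Big Data).
records = [
    ("alice", 2012, False), ("alice", 2016, True), ("alice", 2018, True),
    ("bob",   2017, True),  ("bob",   2018, True),
]

def author_indicators(records, census_year=2018):
    """Compute assumed per-author indicators: academic age, production, focus."""
    first_year = {}
    total = defaultdict(int)
    on_topic = defaultdict(int)
    for author, year, topical in records:
        first_year[author] = min(year, first_year.get(author, year))
        total[author] += 1
        on_topic[author] += int(topical)
    return {
        a: {
            "academic_age": census_year - first_year[a],  # years since first paper
            "production": on_topic[a],                    # papers on the topic
            "research_focus": on_topic[a] / total[a],     # share of output on the topic
        }
        for a in total
    }

print(author_indicators(records))
```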

Related Content

Formal software verification techniques are widely used to specify and prove the functional correctness of programs. However, proofs of nonfunctional properties such as time complexity are usually carried out with pen and paper. Inefficient code in terms of time complexity may cause massive performance problems in large-scale complex systems. We present a proof of concept for using the Dafny verification tool to specify and verify the worst-case time complexity of binary search. This approach can also be used for academic purposes as a new way to teach algorithms and complexity.
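
The paper's verification is done statically in Dafny; as a rough Python stand-in, the sketch below instruments binary search with a runtime assertion of the same worst-case bound (at most floor(log2 n) + 1 loop iterations). This is only an illustration of the property being verified, not the paper's Dafny proof.

```python
import math

def binary_search(a, key):
    """Binary search over a sorted list, instrumented with a runtime check of
    the logarithmic iteration bound that the paper proves statically."""
    lo, hi = 0, len(a)          # search in the half-open interval [lo, hi)
    iterations = 0
    bound = math.floor(math.log2(len(a))) + 1 if a else 0
    while lo < hi:
        iterations += 1
        assert iterations <= bound, "logarithmic bound violated"
        mid = (lo + hi) // 2
        if a[mid] < key:
            lo = mid + 1
        elif a[mid] > key:
            hi = mid
        else:
            return mid
    return -1                   # key not present

# The dynamic assertion only spot-checks runs; a verifier discharges it for all inputs.
assert binary_search(list(range(1000)), 637) == 637
```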

This paper explains scalable methods for extracting and analyzing Covid-19 vaccine data. Using Big Data tools such as Hadoop and Hive, we collect and analyze massive Covid-19 data sets of confirmed cases, fatalities, and vaccinations, totaling about 3.2 GB. We show that it is possible to store and process such massive data with these Big Data tools. The paper then performs a spatio-temporal analysis and visualizes the results with maps, charts, and pie charts. We illustrate that the more people are vaccinated, the fewer the confirmed cases.
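
The paper's pipeline runs on Hadoop/Hive; as a small stand-in, the pandas sketch below performs the same kind of spatio-temporal aggregation, grouping confirmed-case and vaccination counts by region and month. The CSV layout and column names are assumptions for illustration, not the paper's schema.

```python
import pandas as pd

# Assumed CSV layouts; the paper's actual data live in Hive tables.
confirmed = pd.read_csv("confirmed.csv")      # columns: region, date, confirmed
vaccinated = pd.read_csv("vaccination.csv")   # columns: region, date, doses

for df in (confirmed, vaccinated):
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Spatio-temporal aggregation: totals per region and month,
# analogous to a GROUP BY region, month query in Hive.
monthly = (
    confirmed.groupby(["region", "month"])["confirmed"].sum().to_frame()
    .join(vaccinated.groupby(["region", "month"])["doses"].sum())
    .reset_index()
)

# A simple way to probe the "more vaccinated, fewer confirmed" observation:
print(monthly[["confirmed", "doses"]].corr())
```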

The SARS-CoV-2 virus and the COVID-19 disease have posed unprecedented and overwhelming demands, challenges, and opportunities for domain-, model-, and data-driven modeling. This paper provides a comprehensive review of the challenges, tasks, methods, progress, gaps, and opportunities in modeling COVID-19 problems, data, and objectives. It constructs a research landscape of COVID-19 modeling tasks and methods, and further categorizes, summarizes, compares, and discusses the related methods and progress in modeling COVID-19 epidemic transmission processes and dynamics, case identification and tracing, infection diagnosis and medical treatments, non-pharmaceutical interventions and their effects, drug and vaccine development, psychological, economic, and social influence and impact, and misinformation. The modeling methods involve mathematical and statistical models, domain-driven modeling by epidemiological compartmental models, medical and biomedical analysis, AI and data science (in particular shallow and deep machine learning), simulation modeling, social science methods, and hybrid modeling.
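
Among the methods the review surveys, epidemiological compartmental models are the most compact to illustrate. The sketch below is a minimal SIR model with Euler integration; the parameter values are illustrative only and are not fitted to COVID-19 data or taken from the review.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """One Euler step of the classic SIR compartmental model
    (fractions s + i + r = 1)."""
    new_infections = beta * s * i * dt
    new_recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries

# Illustrative parameters only (R0 = beta / gamma = 2.5).
s, i, r = 0.999, 0.001, 0.0
beta, gamma = 0.5, 0.2
trajectory = []
for day in range(160):
    s, i, r = sir_step(s, i, r, beta, gamma)
    trajectory.append((day, s, i, r))

peak_day, _, peak_i, _ = max(trajectory, key=lambda t: t[2])
print(f"epidemic peak on day {peak_day} with {peak_i:.1%} of the population infectious")
```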

Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane application programming interfaces (APIs) that may be leveraged by user-defined software-defined networking (SDN) control. This offers great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming Protocol-independent Packet Processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community, and it is supported by various software and hardware platforms. In the first part of this paper, we give a tutorial on data plane programming models, the P4 programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we categorize a large body of literature on P4-based applied research into different research domains, summarize the contributions of these papers, and extract prototypes, target platforms, and source code availability. For each research domain, we analyze how the reviewed works benefit from P4's core features. Finally, we discuss potential next steps based on our findings.
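
P4's central abstraction is the match-action table: the control plane installs entries, and the data plane matches packet headers against them and executes the corresponding action. The toy Python model below conveys only that concept; it is not P4 code, and real P4 programs are compiled to hardware or software targets.

```python
# Conceptual stand-in for a P4 exact-match table, not actual P4.

def forward(packet, port):
    packet["egress_port"] = port

def drop(packet):
    packet["dropped"] = True

# Table keyed on an exact match of the destination address; the control plane
# would populate these entries at runtime.
ipv4_exact_table = {
    "10.0.0.1": (forward, {"port": 1}),
    "10.0.0.2": (forward, {"port": 2}),
}

def apply_table(packet):
    action, params = ipv4_exact_table.get(packet["dst_addr"], (drop, {}))
    action(packet, **params)
    return packet

print(apply_table({"dst_addr": "10.0.0.2"}))   # forwarded out port 2
print(apply_table({"dst_addr": "192.0.2.9"}))  # table miss: default action drops
```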

Corruption has a huge impact on economic growth, democracy, and inequality, and its consequences at the human level are incalculable. However, a government turnover may be expected to generate significant changes in the way public contracting is done, and thus in the levels and types of corruption involved in public procurement. In this respect, México went through a historic government transition in 2018. In this work, we analyze data from more than 1.5 million contracts awarded between 2013 and 2020 to study to what extent this change of government affected the characteristics of public contracting, and we try to determine whether these changes affect how corruption takes place. To do this, we propose a statistical framework to compare the characteristics of the contracting practices within each administration, separating the contracts into different classes depending on whether or not they were made with companies that have now been identified as being involved in corrupt practices. We found that, even though the total number of contracts and the amount of resources spent on contracts with corrupt companies decreased after the government transition, many of the patterns followed to contract suppliers labeled as corrupt were maintained, and those in which changes did occur are suggestive of a larger risk of corruption.
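
The abstract does not specify the statistical framework, so the sketch below only illustrates the general pattern of comparing a contract feature between the "corrupt-supplier" class and the rest within each administration, here with a two-sample Kolmogorov-Smirnov test. The column names and test choice are assumptions, not the paper's method.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Assumed layout: one row per contract with the award amount, the administration
# (pre- or post-2018 transition), and a flag for suppliers later identified as corrupt.
contracts = pd.read_csv("contracts.csv")  # columns: amount, administration, corrupt_supplier

for admin, group in contracts.groupby("administration"):
    flagged = group.loc[group["corrupt_supplier"], "amount"]
    clean = group.loc[~group["corrupt_supplier"], "amount"]
    stat, p_value = ks_2samp(flagged, clean)
    print(f"{admin}: KS statistic = {stat:.3f}, p-value = {p_value:.3g}")
```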

Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the varying hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., by making a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but takes heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, a 2.32x longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two factors driving the performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should account for heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate its impacts.
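
To make the heterogeneity effect concrete, the toy sketch below runs federated averaging over scalar "models" and crudely simulates one heterogeneity factor, device failure, as a per-device dropout probability. It is a minimal illustration under these assumptions, not the paper's platform or its trace-driven experiments.

```python
import random

def fed_avg_round(global_model, devices, failure_prob=0.1):
    """One round of FedAvg over scalar models; a device may fail to upload."""
    updates = []
    for local_data in devices:
        if random.random() < failure_prob:          # device offline / upload failed
            continue
        local_model = global_model
        for x in local_data:                        # toy local "training":
            local_model += 0.1 * (x - local_model)  # move toward the local mean
        updates.append(local_model)
    if not updates:                                 # every participant dropped out
        return global_model
    return sum(updates) / len(updates)              # unweighted federated averaging

random.seed(0)
devices = [[random.gauss(mu, 1.0) for _ in range(20)] for mu in (0.0, 1.0, 5.0)]
model = 0.0
for _ in range(50):
    model = fed_avg_round(model, devices, failure_prob=0.3)
print(f"global model after 50 rounds: {model:.2f}")
```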

Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge is simply knowing where to start. The sheer breadth and diversity of deep learning techniques makes it difficult to determine which scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue by providing an overview of many widely used deep learning models, spanning visual, sequential, and graph-structured data, their associated tasks and different training methods, along with techniques for using deep learning with less data and for better interpreting these complex models, two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries, and open-source deep learning pipelines and pretrained models developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.

Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure and discusses their importance and methods for defining their value ranges. The research then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. The study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that arise when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
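
As a concrete reference point for the optimization algorithms such reviews cover, the sketch below implements random search, one of the standard HPO baselines. The objective function is a synthetic stand-in with a known optimum; in a real HPO run it would train and evaluate a model.

```python
import math
import random

def objective(lr, batch_size):
    """Synthetic validation loss with a minimum near lr=1e-2, batch_size=64."""
    return (math.log10(lr) + 2) ** 2 + (math.log2(batch_size) - 6) ** 2 / 10

def random_search(n_trials=50, seed=0):
    """Random search: sample each hyper-parameter from its range, keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)                  # log-uniform learning rate
        batch_size = rng.choice([16, 32, 64, 128, 256])
        loss = objective(lr, batch_size)
        if best is None or loss < best[0]:
            best = (loss, {"lr": lr, "batch_size": batch_size})
    return best

print(random_search())
```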

We consider the task of learning the parameters of a {\em single} component of a mixture model when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve it with lower computational and sample complexity than solving the original problem, in which one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared to existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
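
The abstract does not spell out the matrix-based algorithm, so the sketch below only illustrates the flavour of the search problem for a Gaussian mixture: a few samples known to come from the target component act as side information and seed a soft, EM-like reweighting that estimates only that component's mean. This is a hedged illustration, not the paper's method.

```python
import numpy as np

def estimate_target_component(data, side_samples, sigma=1.0, iters=20):
    """Estimate one mixture component's mean from side-information samples,
    without fitting the full mixture (illustrative only)."""
    mu = side_samples.mean(axis=0)                    # seed from side information
    for _ in range(iters):
        d2 = ((data - mu) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))            # responsibility-like weights
        mu = (w[:, None] * data).sum(axis=0) / w.sum()
    return mu

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 1.0, size=(500, 2)) for c in (-4.0, 0.0, 4.0)])
side = rng.normal(4.0, 1.0, size=(5, 2))              # a few samples from one component
print(estimate_target_component(data, side))          # should land near (4, 4)
```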

Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics, and information retrieval. Interest in SA has been constantly growing in both academia and industry in recent years. Moreover, there is an increasing need for appropriate resources and datasets, in particular for low-resource languages such as Persian. These datasets play an important role in designing and developing opinion mining platforms based on supervised, semi-supervised, or unsupervised methods. In this paper, we outline the entire process of developing a manually annotated sentiment corpus, SentiPers, which covers formal and informal written contemporary Persian. To the best of our knowledge, SentiPers is a unique sentiment corpus for Persian with such rich annotation at three different levels: document level, sentence level, and entity/aspect level. The corpus contains more than 26,000 sentences of user opinions from the digital products domain and benefits from special characteristics such as quantifying the positivity or negativity of an opinion by assigning each sentence a number within a specific range. Furthermore, we present statistics on the various components of our corpus as well as a study of the inter-annotator agreement among the annotators. Finally, we discuss some of the challenges that we faced during the annotation process.
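
The abstract reports inter-annotator agreement; the sketch below shows one common way to compute such a figure, Cohen's kappa over paired labels. The corpus's actual agreement measure and label scale are not stated in the abstract, so both (including the -2..+2 sentiment scale used here) are assumptions for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(labels_a) | set(labels_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy sentence-level sentiment scores from two hypothetical annotators.
annotator_1 = [+2, +1, 0, -1, -2, +1, 0, +2]
annotator_2 = [+2, +1, 0, -2, -2, +1, +1, +2]
print(f"Cohen's kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")
```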
