
The past decade has witnessed a rapid increase in technology ownership across rural areas of India, signifying the potential for ICT initiatives to empower rural households. In our work, we focus on the web infrastructure of one such ICT, Digital Green, which started in 2008. Following a participatory approach to content production, Digital Green disseminates instructional agricultural videos to smallholder farmers via human mediators to improve the adoption of farming practices. Its web-based data tracker, CoCo, captures data related to these processes, storing the attendance and adoption logs of over 2.3 million farmers across three continents and twelve countries. Using this data, we model the components of the Digital Green ecosystem: the past attendance and adoption behaviours of farmers, the content of the videos screened to them, and their demographic features across five states in India. We use statistical tests to identify the factors that distinguish farmers with higher adoption rates and to understand why they adopt more than others. Our research finds that farmers with higher adoption rates adopt videos of shorter duration and belong to smaller villages. The co-attendance and co-adoption networks of farmers indicate that farmers benefit greatly from past adopters of a video from their village and group when it comes to adopting practices from the same video. Following our analysis, we model the adoption of practices from a video as a prediction problem to identify and assist farmers who might face challenges in adoption in each of the five states. We experiment with different model architectures and achieve macro-F1 scores ranging from 79% to 89% using a Random Forest classifier. Finally, we measure the importance of different features using SHAP values and provide implications for improving the adoption rates of nearly a million farmers across five states in India.
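A minimal sketch of how this prediction-plus-explanation step could look, assuming a synthetic feature table (the column names below are illustrative placeholders, not the actual CoCo schema), using scikit-learn's Random Forest and the SHAP TreeExplainer:

```python
# Sketch only: framing video-practice adoption as a binary prediction task and
# inspecting per-feature contributions with SHAP. All data here is synthetic.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "video_duration_sec": rng.normal(480, 120, n),   # hypothetical features
    "village_size": rng.integers(50, 2000, n),
    "past_adoption_rate": rng.random(n),
    "screenings_attended": rng.integers(0, 30, n),
})
# Synthetic label standing in for "adopted a practice from the video".
y = (X["past_adoption_rate"] + rng.normal(0, 0.2, n) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("macro-F1:", f1_score(y_te, clf.predict(X_te), average="macro"))

# SHAP values quantify how much each feature pushes a prediction up or down.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)
```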

Related Content

Every day, millions of people read Wikipedia. When navigating the vast space of available topics using embedded hyperlinks, readers follow different trajectories in terms of the sequence of articles they visit. Understanding these navigation patterns is crucial to better serve readers' needs and to address structural biases and knowledge gaps. However, systematic studies of navigation on Wikipedia have been limited by a lack of publicly available data, owing to the commitment to protect readers' privacy by not storing or sharing potentially sensitive data. In this paper, we address the question of how well reader navigation can be approximated using publicly available resources, most notably the Wikipedia clickstream data. We systematically quantify the difference between real navigation sequences and synthetic ones generated from the clickstream data, through 6 different experiments across 8 Wikipedia language versions. Overall, we find that these differences are statistically significant, but the effect sizes are small, often well within 10%. We thus provide quantitative evidence for the utility of the Wikipedia clickstream data as a public resource, showing that it closely captures reader navigation on Wikipedia and constitutes a sufficient approximation for most practical downstream applications relying on data from readers. More generally, our study provides an example of how clickstream-like data can empower broader research on navigation in other online platforms while protecting users' privacy.
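One common way to synthesize navigation sequences from clickstream-style aggregates of (referrer, target, count) triples is a first-order Markov walk; the toy sketch below illustrates that general idea and is not the paper's exact experimental procedure:

```python
# Sketch: build a transition table from (referrer, target, count) rows and
# sample synthetic navigation sequences proportionally to the counts.
import random
from collections import defaultdict

clickstream = [                      # toy rows standing in for the public dump
    ("Cat", "Felidae", 120),
    ("Cat", "Dog", 80),
    ("Dog", "Wolf", 60),
    ("Felidae", "Lion", 40),
]

transitions = defaultdict(list)
for referrer, target, count in clickstream:
    transitions[referrer].append((target, count))

def synthetic_sequence(start, max_len=10, rng=random.Random(0)):
    """Walk the transition graph, choosing the next article in proportion to counts."""
    seq = [start]
    while len(seq) < max_len and transitions[seq[-1]]:
        targets, counts = zip(*transitions[seq[-1]])
        seq.append(rng.choices(targets, weights=counts, k=1)[0])
    return seq

print(synthetic_sequence("Cat"))
```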

With the recent explosive growth of mobile devices such as smartphones and tablets, guaranteeing a consistent web appearance across all environments has become a significant problem. This happens simply because it is hard to keep track of how web pages render across devices of different sizes and types. Fixing the inconsistent appearance of web pages can therefore be difficult, and the incurred cost can be huge, e.g., poor user experience and the financial loss that follows from it. Recently, automated web repair techniques have been proposed to automatically resolve inconsistent web page appearance, focusing on improving usability. However, the generated patches tend to disrupt the web page's layout, rendering the repaired page aesthetically unpleasing, e.g., with distorted images or misaligned components. In this paper, we propose an automated repair approach for web pages based on meta-heuristic algorithms that can assure both usability and aesthetics. The key novelty that empowers our approach is a novel fitness function that allows us to evolve buggy web pages toward the best solution that optimizes both usability and aesthetics at the same time. Empirical evaluations show that our approach successfully resolves mobile-friendly problems in 94% of the evaluation subjects, significantly outperforming state-of-the-art baseline techniques in terms of both usability and aesthetics.
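A hedged sketch of the central idea, a fitness function that rewards candidates scoring well on both usability and aesthetics inside a meta-heuristic search loop; the candidate encoding, scoring stand-ins, weights, and the hill-climbing loop below are illustrative, not the authors' implementation:

```python
# Sketch only: combined usability/aesthetics fitness driving a simple search.
import random

def fitness(candidate, w_usability=0.6, w_aesthetics=0.4):
    """Higher is better; both sub-scores are assumed to lie in [0, 1]."""
    usability = usability_score(candidate)            # e.g. mobile-friendliness checks
    aesthetics = 1.0 - layout_distortion(candidate)   # e.g. distorted images, misalignment
    return w_usability * usability + w_aesthetics * aesthetics

# Toy stand-ins: a candidate is a vector of CSS adjustments; real scorers would
# render the page and inspect its layout.
def usability_score(candidate):
    return 1.0 - abs(sum(candidate)) / (len(candidate) + 1)

def layout_distortion(candidate):
    return min(1.0, sum(abs(x) for x in candidate) / (10 * len(candidate)))

def repair(seed_candidate, generations=100, rng=random.Random(0)):
    """Hill-climbing stand-in for the meta-heuristic search over patches."""
    best = seed_candidate
    for _ in range(generations):
        neighbour = [x + rng.uniform(-0.5, 0.5) for x in best]
        if fitness(neighbour) > fitness(best):
            best = neighbour
    return best

print(fitness(repair([2.0, -1.0, 3.0])))
```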

Deep neural networks are parametrized by thousands or millions of parameters and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to integrate these models into edge devices such as smartphones and wearable devices. To address this problem, knowledge distillation (KD), which uses a pre-trained high-capacity network to train a much smaller network suitable for edge devices, has been widely employed. In this paper, for the first time, we study the applicability and challenges of using KD for time-series data from wearable devices. Successful application of KD requires specific choices of data augmentation methods during training, yet it is not known whether there exists a coherent strategy for choosing an augmentation approach during KD. In this paper, we report the results of a detailed study that compares and contrasts various common choices and some hybrid data augmentation strategies in KD-based human activity analysis. Research in this area is often limited because few comprehensive databases from wearable devices are available in the public domain. Our study considers databases ranging from small-scale, publicly available ones to one derived from a large-scale interventional study of human activity and sedentary behavior. We find that the choice of data augmentation techniques during KD has a variable level of impact on end performance, and that the optimal network choice as well as the data augmentation strategy are specific to the dataset at hand. However, we also conclude with a general set of recommendations that can provide strong baseline performance across databases.
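A minimal sketch of KD with a data-augmentation step applied during distillation on windowed wearable sensor data; the 1D-CNN architectures, jitter/scaling augmentation, and hyperparameters are illustrative assumptions rather than the paper's setup:

```python
# Sketch: standard KD objective (soft KL term + hard cross-entropy) applied to
# augmented accelerometer windows of shape (batch, channels, time).
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(x):
    """Common time-series augmentations: random jitter plus magnitude scaling."""
    jitter = 0.05 * torch.randn_like(x)
    scale = 1.0 + 0.1 * torch.randn(x.size(0), 1, 1)
    return (x + jitter) * scale

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Distillation loss: temperature-softened KL to the teacher + hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy high-capacity teacher and compact student for 6 activity classes.
teacher = nn.Sequential(nn.Conv1d(3, 64, 5, padding=2), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 6))
student = nn.Sequential(nn.Conv1d(3, 16, 5, padding=2), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 6))

x = torch.randn(32, 3, 128)        # 32 windows, 3 axes, 128 samples each
y = torch.randint(0, 6, (32,))
x_aug = augment(x)
with torch.no_grad():
    t_logits = teacher(x_aug)
loss = kd_loss(student(x_aug), t_logits, y)
loss.backward()
```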

Bayesian calibration is widely used for inverse analysis and uncertainty quantification of complex systems when both computer models and observation data are available. In the present work, we focus on large-scale fluid-structure interaction systems characterized by large structural deformations. Numerical methods for solving these problems, including embedded/immersed boundary methods, are typically not differentiable and lack smoothness. We propose a framework built on the unscented Kalman filter/inversion to efficiently calibrate such complicated models against noisy observation data and to provide uncertainty estimates. The approach is derivative-free and non-intrusive, and is of particular value when the forward model is computationally expensive and provided as a black box that is impractical to differentiate. The framework is demonstrated and validated by successfully calibrating the model parameters of a piston problem and identifying the damage field of an airfoil under transonic buffeting.
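A heavily simplified sketch of a single sigma-point (unscented) Kalman update for derivative-free calibration of a black-box forward model; the full unscented Kalman inversion used in work like this adds regularization/artificial dynamics and careful weighting, and the toy forward model, data, and noise levels here are assumptions:

```python
# Sketch: one derivative-free sigma-point update of the parameter mean/covariance,
# using only forward-model evaluations (no gradients).
import numpy as np

def unscented_update(m, C, y, forward_model, obs_cov, kappa=0.0):
    d = m.size
    lam = kappa                                    # simple scaling choice
    L = np.linalg.cholesky((d + lam) * C)
    sigma = np.vstack([m, m + L.T, m - L.T])       # 2d + 1 sigma points
    w = np.full(2 * d + 1, 1.0 / (2 * (d + lam)))
    w[0] = lam / (d + lam)
    G = np.array([forward_model(s) for s in sigma])   # black-box evaluations only
    g_mean = w @ G
    C_gg = (G - g_mean).T @ np.diag(w) @ (G - g_mean) + obs_cov
    C_tg = (sigma - m).T @ np.diag(w) @ (G - g_mean)
    K = C_tg @ np.linalg.inv(C_gg)                 # Kalman gain
    m_new = m + K @ (y - g_mean)
    C_new = C - K @ C_gg @ K.T + 1e-8 * np.eye(d)  # small jitter for stability
    return m_new, C_new

# Toy example: recover parameters of a nonlinear black-box map from noisy data.
rng = np.random.default_rng(0)
true_theta = np.array([1.5, -0.5])
fwd = lambda t: np.array([t[0] ** 2 + t[1], np.sin(t[0]) + t[1] ** 2])
y = fwd(true_theta) + 0.05 * rng.normal(size=2)
m, C = np.zeros(2), np.eye(2)
for _ in range(20):
    m, C = unscented_update(m, C, y, fwd, 1e-2 * np.eye(2))
print(m)  # should move toward a parameter set consistent with y
```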

In contrast to previous surveys, the present work is not focused on reviewing the datasets used in the network security field. Many of the available public labeled datasets represent network behavior only for a particular time period, and given the rate of change in malicious behavior and the serious challenge of labeling and maintaining these datasets, they quickly become obsolete. Therefore, this work focuses on the analysis of current labeling methodologies applied to network-based data. In the field of network security, the process of labeling a representative network traffic dataset is particularly challenging and costly, since very specialized knowledge is required to classify network traces. Consequently, most current traffic labeling methods are based on the automatic generation of synthetic network traces, which hides many of the essential aspects necessary for correctly differentiating between normal and malicious behavior. Alternatively, a few other methods incorporate non-expert users in the labeling process of real traffic with the help of visual and statistical tools. However, after conducting an in-depth analysis, it appears that all current labeling methods suffer from fundamental drawbacks regarding the quality, volume, and speed of the resulting datasets. This lack of consistent methods for continuously generating a representative dataset with an accurate and validated methodology must be addressed by the network security research community. Moreover, a consistent labeling methodology is a fundamental condition for facilitating the acceptance of novel detection approaches based on statistical and machine learning techniques.

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their application to modeling tabular data (inference or generation) remains highly challenging. This work provides an overview of state-of-the-art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions. To the best of our knowledge, this is the first in-depth look at deep learning approaches for tabular data. This work can serve as a valuable starting point and guide for researchers and practitioners interested in deep learning with tabular data.
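As an illustration of the "data transformations" family mentioned above, the sketch below maps heterogeneous columns to a homogeneous numeric representation before feeding a standard neural model; the column names and data are toy assumptions, not examples from the survey:

```python
# Sketch: encode categoricals and scale numerics, then train a plain MLP.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 47, 31, 52],
    "income": [32_000, 81_000, 45_000, 60_000],
    "occupation": ["clerk", "engineer", "clerk", "teacher"],
    "defaulted": [0, 0, 1, 0],
})

transform = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["occupation"]),
])
model = Pipeline([
    ("transform", transform),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)),
])
model.fit(df.drop(columns="defaulted"), df["defaulted"])
```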

Entity linking (EL) for rapidly growing short text (e.g., search queries and news titles) is critical to industrial applications. Most existing approaches, which rely on adequate context for long-text EL, are not effective for concise and sparse short text. In this paper, we propose a novel framework called Multi-turn Multiple-choice Machine reading comprehension (M3) that tackles short-text EL from a new perspective: a query is generated for each ambiguous mention by exploiting its surrounding context, and an option-selection module identifies the gold entity among the candidates using that query. In this way, the M3 framework lets the limited context interact sufficiently with the candidate entities during encoding, and implicitly considers the dissimilarities within the candidate set at the selection stage. In addition, we design a two-stage verifier incorporated into M3 to address the unlinkable-mention problem that commonly arises in short text. To further account for topical coherence and the interdependence among referenced entities, M3 processes mentions in a multi-turn, sequential manner, looking back at historical cues. Evaluation shows that our M3 framework achieves state-of-the-art performance on five Chinese and English datasets for real-world short-text EL.
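A loose sketch of the multiple-choice reading-comprehension view of short-text EL: a query built from the mention and its context is paired with each candidate entity description and scored by an option-selection head. The model name, candidate descriptions, and query template are assumptions, and neither the multi-turn mechanism nor the verifier is shown; this is not the authors' M3 implementation:

```python
# Sketch: score (query, candidate description) pairs with a multiple-choice head.
# The head is randomly initialized here, so this shows shapes and flow rather
# than a trained linker; in practice the model would be fine-tuned on EL data.
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

context = "Jordan scored 40 points for the Bulls last night."
query = f"In the sentence: '{context}', which entity does 'Jordan' refer to?"
candidates = [
    "Michael Jordan, American former professional basketball player.",
    "Jordan, a country in Western Asia.",
    "Jordan River, a river in the Middle East.",
]

enc = tok([query] * len(candidates), candidates,
          padding=True, truncation=True, return_tensors="pt")
enc = {k: v.unsqueeze(0) for k, v in enc.items()}   # (batch=1, num_choices, seq_len)
with torch.no_grad():
    scores = model(**enc).logits                    # (1, num_choices)
print(candidates[scores.argmax(dim=-1).item()])
```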

In recent years, with the rise of Cloud Computing (CC), many companies providing services in the cloud have added a new series of services to their catalog, such as data mining (DM) and data processing, taking advantage of the vast computing resources available to them. Different service-definition proposals have been put forward to describe CC services in a comprehensive way. Bearing in mind that each provider has its own definition of the logic of its services, and specifically of its DM services, the possibility of describing services in a flexible way across providers is fundamental in order to maintain the usability and portability of this type of CC service. The use of semantic technologies based on the Linked Data (LD) proposal for the definition of services allows DM services to be designed and modelled with a high degree of interoperability. In this article, a schema for the definition of DM services on CC is presented that covers all key aspects of a CC service, such as prices, interfaces, Service Level Agreements, instances, and experimentation workflows, among others. The proposal is based on LD, so it reuses other schemata to obtain a better definition of the service. To validate the schema, a series of DM services have been created in which some of the best-known algorithms, such as Random Forest and KMeans, are modeled as services.
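A hedged sketch of what describing a DM service as Linked Data could look like using rdflib; the namespace URI, property names, and service details are illustrative assumptions, not the schema actually proposed in the article:

```python
# Sketch: publish a Random Forest DM service description as RDF triples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

DMS = Namespace("http://example.org/dm-service#")   # hypothetical vocabulary
g = Graph()
g.bind("dms", DMS)

svc = URIRef("http://example.org/services/random-forest")
g.add((svc, RDF.type, DMS.DataMiningService))
g.add((svc, DMS.algorithm, Literal("Random Forest")))
g.add((svc, DMS.pricePerHour, Literal("0.35", datatype=XSD.decimal)))
g.add((svc, DMS.interface, Literal("REST")))
g.add((svc, DMS.serviceLevelAgreement, Literal("99.9% availability")))

print(g.serialize(format="turtle"))
```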

The goal of the named entity recognition (NER) task is to classify proper nouns in a text into classes such as person, location, and organization. This is an important preprocessing step in many NLP tasks such as question answering and summarization. Although many studies have been conducted in this area for English, and state-of-the-art NER systems have reached performance higher than 90 percent in terms of F1 measure, there are very few studies of this task for Persian. One of the main causes of this may be the lack of a standard Persian NER dataset for training and testing NER systems. In this research, we create a standard, sufficiently large tagged Persian NER dataset that will be distributed for free for research purposes. In order to construct such a standard dataset, we studied the standard NER datasets constructed for English and found that almost all of them are built from news texts. We therefore collected documents from ten news websites. Then, to provide annotators with guidelines for tagging these documents, we studied the guidelines used to construct the CoNLL and MUC standard English datasets and defined our own guidelines, taking Persian linguistic rules into account.

Content-based video retrieval is an approach for facilitating the searching and browsing of large video collections over the World Wide Web. In this approach, video analysis is conducted on low-level visual properties extracted from video frames. We believed that, in order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique that employs multiple features for indexing and retrieval would be more effective in the discrimination and search tasks for videos. To validate this claim, content-based indexing and retrieval systems were implemented using color histograms, various texture features, and other approaches. Videos were stored in an Oracle 9i database, and a user study measured the correctness of responses.
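A minimal sketch of frame-level color-histogram indexing with histogram-intersection similarity, one of the simplest instances of such a feature-based approach; the sampling rate, bin counts, and file paths are illustrative assumptions:

```python
# Sketch: index videos by an averaged HSV color histogram and rank by
# histogram intersection.
import cv2
import numpy as np

def frame_histogram(frame, bins=(8, 8, 8)):
    """3D HSV color histogram, normalized so clips of any length compare fairly."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def video_signature(path, sample_every=30):
    """Average histogram over sampled frames as a simple video-level index."""
    cap, hists, i = cv2.VideoCapture(path), [], 0
    ok, frame = cap.read()
    while ok:
        if i % sample_every == 0:
            hists.append(frame_histogram(frame))
        ok, frame = cap.read()
        i += 1
    cap.release()
    return np.mean(hists, axis=0)

def similarity(sig_a, sig_b):
    return float(np.minimum(sig_a, sig_b).sum())   # histogram intersection

# Hypothetical usage, assuming a dict `corpus` of {video_id: signature}:
# query = video_signature("query.mp4")
# ranked = sorted(corpus.items(), key=lambda kv: similarity(query, kv[1]), reverse=True)
```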
