In the past, redaction involved the use of black or white markers or paper cut-outs to obscure content on physical paper. Today many redactions take place on digital PDF documents, and redaction is often performed by software tools. Typical redaction tools remove text from PDF documents and draw a black or white rectangle in its place, mimicking a physical redaction. This practice is thought to be secure when the redacted text is removed and cannot be "copy-pasted" from the PDF document. We find this common conception is false -- existing PDF redactions can be broken by precise measurements of non-redacted character positioning information. We develop a deredaction tool for automatically finding and breaking these vulnerable redactions. We report on 11 different redaction tools, finding the majority do not remove redaction-breaking information, including some Adobe Acrobat workflows. We empirically measure the information leaks, finding some redactions leak upwards of 15 bits of information, creating a 32,768-fold reduction in the space of potential redacted texts. We demonstrate a lower bound on the impact of these leaks via a 22,120 document study, including 18,975 Office of the Inspector General (OIG) investigation reports, where we find 769 vulnerable named-entity redactions. We find leaked information reduces the contents for 164 of these redacted names to fewer than 494 possibilities from a 7 million name dictionary. We show the impact of these findings by breaking redactions from the Epstein/Maxwell case, Manafort case, and a released Snowden document. Moreover, we develop an efficient algorithm for locating copy-pastable redactions and find over 100,000 poorly redacted words in US court documents. Current PDF text redaction methods are insufficient for named entity protection.
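As a rough illustration of why leaked positioning information narrows the candidate space, the sketch below filters a list of names by rendered width and converts leaked bits into a reduction factor. The glyph widths, the candidate names, the tolerance, and the function names are all hypothetical; this is not the authors' deredaction tool, and a real analysis would need the document's actual font metrics and layout behaviour.

```python
from typing import Dict, Iterable, List

# Hypothetical advance widths in font units; a real analysis would read
# these from the font embedded in the PDF.
ADVANCE: Dict[str, int] = {
    "A": 667, "B": 667, "M": 833, "W": 944,
    "a": 556, "b": 556, "i": 222, "l": 222, "n": 556, "o": 556,
}


def text_width(text: str) -> int:
    """Total advance width of `text` under the toy metrics above."""
    return sum(ADVANCE[ch] for ch in text)


def candidates_matching_width(observed: int, names: Iterable[str],
                              tolerance: int = 2) -> List[str]:
    """Keep only names whose rendered width is within `tolerance` of the
    width implied by the surrounding, non-redacted glyph positions."""
    return [n for n in names if abs(text_width(n) - observed) <= tolerance]


def reduction_factor(leaked_bits: int) -> int:
    """Each leaked bit halves the candidate space."""
    return 2 ** leaked_bits


NAMES = ["Ann", "Bob", "Mia", "Will"]  # hypothetical dictionary
print(candidates_matching_width(1779, NAMES))  # names as wide as the gap
print(reduction_factor(15))                    # 2**15
```

With these toy metrics a width match narrows the list without necessarily identifying a single name, which is consistent with the abstract's framing of the leak as a reduction in the candidate space.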

Related content

TOOLS

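This new edition of the TOOLS conference series revives the tradition of the 50 conferences held from 1989 to 2012. TOOLS originally stood for "Technology of Object-Oriented Languages and Systems" and later grew to cover all innovative aspects of software technology. Many of today's most important software concepts were first introduced here. TOOLS 50+1, held in 2019 near Kazan, Russia, continued the series with the same spirit of innovation, enthusiasm for all things software-related, combination of scientific soundness and industrial applicability, and openness to all trends and communities in the field. Official website:

Networking · Comprehensibility · Gating · Exact ·

July 21, 2022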

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

Andrew M. Saxe,Shagun Sodhani,Sam Lewallen

from arxiv, ICML 2022; 23 pages; 10 figures

Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at //www.saxelab.org/gated-dln .
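The claim that gating lets an otherwise linear network compute a nonlinear function can be seen in a toy scalar case. The sketch below is an assumption-laden illustration, not the paper's formulation: each pathway is a product of two scalar weights, the gates are hand-picked indicator functions of the input, and the names `Pathway`, `gated_linear_forward` and `PATHWAYS` are hypothetical.

```python
from typing import Callable, List, Tuple

# A pathway is (gate, w1, w2): two stacked linear weights switched on or off
# by a gate. Gating directly on the input is an assumption made for brevity.
Pathway = Tuple[Callable[[float], float], float, float]


def gated_linear_forward(x: float, pathways: List[Pathway]) -> float:
    """Sum of gated linear pathways: sum_p gate_p(x) * w2_p * w1_p * x."""
    return sum(gate(x) * w2 * w1 * x for gate, w1, w2 in pathways)


# Two pathways: one active for non-negative inputs, one for negative inputs.
PATHWAYS: List[Pathway] = [
    (lambda x: 1.0 if x >= 0 else 0.0, 1.0, 1.0),
    (lambda x: 1.0 if x < 0 else 0.0, 1.0, -1.0),
]

for x in (-3.0, 0.5, 2.0):
    print(x, gated_linear_forward(x, PATHWAYS))
```

Every active pathway is linear in the input, yet the gated sum equals the absolute value of the input, which no single linear map can represent.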

Partition · Weight · Extensibility · CASES · Vectorization ·

July 20, 2022

Reflexivity of Partitions Induced by Weighted Poset Metric and Combinatorial Metric

Yang Xu,Haibin Kan,Guangyue Han

Let $\mathbf{H}$ be the Cartesian product of a family of finite abelian groups. Via a polynomial approach, we give sufficient conditions for a partition of $\mathbf{H}$ induced by weighted poset metric to be reflexive, which also become necessary for some special cases. Moreover, by examining the roots of the Krawtchouk polynomials, we establish non-reflexive partitions of $\mathbf{H}$ induced by combinatorial metric. When $\mathbf{H}$ is a vector space over a finite field $\mathbb{F}$, we consider the property of admitting MacWilliams identity (PAMI) and the MacWilliams extension property (MEP) for partitions of $\mathbf{H}$. With some invariance assumptions, we show that two partitions of $\mathbf{H}$ admit MacWilliams identity if and only if they are mutually dual and reflexive, and any partition of $\mathbf{H}$ satisfying the MEP is in fact an orbit partition induced by some subgroup of $\Aut_{\mathbb{F}}(\mathbf{H})$, which is necessarily reflexive. As an application of the aforementioned results, we establish partitions of $\mathbf{H}$ induced by combinatorial metric that do not satisfy the MEP, which further enable us to provide counter-examples to a conjecture proposed by Pinheiro, Machado and Firer in \cite{39}.
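For readers unfamiliar with the polynomials mentioned above, the sketch below evaluates Krawtchouk polynomials using the standard definition $K_k(x) = \sum_{j=0}^{k} (-1)^j (q-1)^{k-j} \binom{x}{j} \binom{n-x}{k-j}$ and lists their integer roots in $\{0, \dots, n\}$. The function names are hypothetical, and how such roots lead to non-reflexive partitions is part of the paper's argument and is not reproduced here.

```python
from math import comb
from typing import List


def krawtchouk(k: int, x: int, n: int, q: int = 2) -> int:
    """Evaluate the degree-k Krawtchouk polynomial K_k(x) for length n over
    an alphabet of size q, at an integer point 0 <= x <= n."""
    return sum(
        (-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
        for j in range(k + 1)
    )


def integer_roots(k: int, n: int, q: int = 2) -> List[int]:
    """Integer points in 0..n at which K_k vanishes."""
    return [x for x in range(n + 1) if krawtchouk(k, x, n, q) == 0]


print(integer_roots(1, 4))  # K_1(x) = 4 - 2x for n = 4, q = 2
print(integer_roots(2, 4))  # K_2(x) = 2x^2 - 8x + 6 for n = 4, q = 2
```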

Continuity · INFORMS · Comprehensibility · SimPLe · Range ·

July 19, 2022

Identification and characterization of misinformation superspreaders on social media

Matthew R. DeVerna,Rachith Aiyappa,Diogo Pacheco,John Bryden,Filippo Menczer

The world's digital information ecosystem continues to struggle with the spread of misinformation. Prior work has suggested that users who consistently disseminate a disproportionate amount of low-credibility content -- so-called superspreaders -- are at the center of this problem. We quantitatively confirm this hypothesis and introduce simple metrics to predict the top misinformation superspreaders several months into the future. We then conduct a qualitative review to characterize the most prolific superspreaders and analyze their sharing behaviors. Superspreaders include pundits with large followings, low-credibility media outlets, personal accounts affiliated with those media outlets, and a range of influencers. They are primarily political in nature and use more toxic language than the typical user sharing misinformation. We also find concerning evidence suggesting that Twitter may be overlooking prominent superspreaders. We hope this work will further public understanding of bad actors and promote steps to mitigate their negative impacts on healthy digital discourse.

Stack Overflow · Overflow · Exchangeable · Processing (programming language) · Knowledge ·

July 19, 2022

An empirical study of question discussions on Stack Overflow

Wenhan Zhu,Haoxiang Zhang,Ahmed E. Hassan,Michael W. Godfrey

from arxiv, 27 pages, 9 figures

Stack Overflow provides a means for developers to exchange knowledge. While much previous research on Stack Overflow has focused on questions and answers (Q&A), recent work has shown that discussions in comments also contain rich information. On Stack Overflow, discussions through comments and chat rooms can be tied to questions or answers. In this paper, we conduct an empirical study that focuses on the nature of question discussions. We observe that: (1) Question discussions occur at all phases of the Q&A process, with most beginning before the first answer is received. (2) Both askers and answerers actively participate in question discussions; the likelihood of their participation increases as the number of comments increases. (3) There is a strong correlation between the number of question comments and the question answering time (i.e., more discussed questions receive answers more slowly); also, questions with a small number of comments are likely to be answered more quickly than questions with no discussion. Our findings suggest that question discussions contain a rich trove of data that is integral to the Q&A processes on Stack Overflow. We further suggest how future research can leverage the information in question discussions, along with the commonly studied Q&A information.

Agent · Proportional · Facebook AI Research · motivation · CASE ·

July 19, 2022

On Existence of Truthful Fair Cake Cutting Mechanisms

Biaoshuai Tao

from arxiv, 32 pages, 6 figures in one table

We study the fair division problem on divisible heterogeneous resources (the cake cutting problem) with strategic agents, where each agent can manipulate his/her private valuation in order to receive a better allocation. A (direct-revelation) mechanism takes agents' reported valuations as input and outputs an allocation that satisfies a given fairness requirement. A natural and fundamental open problem, first raised by [Chen et al., 2010] and subsequently raised by [Procaccia, 2013] [Aziz and Ye, 2014] [Branzei and Miltersen, 2015] [Menon and Larson, 2017] [Bei et al., 2017] [Bei et al., 2020], etc., is whether there exists a deterministic, truthful and envy-free (or even proportional) cake cutting mechanism. In this paper, we resolve this open problem by proving that there does not exist a deterministic, truthful and proportional cake cutting mechanism, even in the special case where all of the following hold: 1. there are only two agents; 2. each agent's valuation is a piecewise-constant function; 3. each agent is hungry: each agent has a strictly positive value on any part of the cake. The impossibility result extends to the case where the mechanism is allowed to leave some part of the cake unallocated. To circumvent this impossibility result, we aim to design mechanisms that possess a certain degree of truthfulness. Motivated by the kind of truthfulness possessed by the classical I-cut-you-choose protocol, we propose a weaker notion of truthfulness: the proportional risk-averse truthfulness. We show that the well-known moving-knife (Dubins-Spanier) procedure and Even-Paz algorithm do not have this truthful property. We propose a mechanism that is proportionally risk-averse truthful and envy-free, and a mechanism that is proportionally risk-averse truthful that always outputs allocations with connected pieces.
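The classical I-cut-you-choose protocol referred to above can be sketched for two hungry agents with piecewise-constant valuations. The representation (a list of positive integer densities over equal-width intervals of the cake $[0, 1]$) and the function names are assumptions made for illustration; the sketch shows only the classical protocol and its proportionality, not the mechanisms proposed in the paper.

```python
from fractions import Fraction
from typing import List, Tuple


def value(density: List[int], a: Fraction, b: Fraction) -> Fraction:
    """Value of the piece [a, b] under a piecewise-constant density defined
    on len(density) equal-width intervals of [0, 1]."""
    n = len(density)
    total = Fraction(0)
    for i, d in enumerate(density):
        lo, hi = Fraction(i, n), Fraction(i + 1, n)
        overlap = min(b, hi) - max(a, lo)
        if overlap > 0:
            total += d * overlap
    return total


def half_point(density: List[int]) -> Fraction:
    """Cut position that splits the cake into two pieces of equal value for
    this agent. Assumes every density is strictly positive (hungry agent)."""
    n = len(density)
    target = value(density, Fraction(0), Fraction(1)) / 2
    acc = Fraction(0)
    for i, d in enumerate(density):
        piece = Fraction(d, n)
        if acc + piece >= target:
            return Fraction(i, n) + (target - acc) / d
        acc += piece
    return Fraction(1)


def cut_and_choose(cutter: List[int], chooser: List[int]) -> Tuple[Fraction, str]:
    """The cutter halves the cake by their own valuation; the chooser takes
    the piece they value more. Returns (cut position, chooser's side)."""
    cut = half_point(cutter)
    left = value(chooser, Fraction(0), cut)
    right = value(chooser, cut, Fraction(1))
    return cut, "left" if left >= right else "right"


cutter, chooser = [1, 3], [3, 1]  # hypothetical valuations
cut, side = cut_and_choose(cutter, chooser)
print(cut, side)
```

In this example both agents receive at least half of the cake by their own valuation, which is the proportionality guarantee; the abstract's question is whether such guarantees can coexist with truthfulness.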

Statistic · TEAM · Bitcoin · Blockchain · Paper ·

July 19, 2022

A Survey on EOSIO Systems Security: Vulnerability, Attack, and Mitigation

Ningyu He,Haoyu Wang,Lei Wu,Xiapu Luo,Yao Guo,Xiangqun Chen

from arxiv, 34 pages, 12 figures

EOSIO, one of the most representative blockchain 3.0 platforms, introduces many new features, e.g., the delegated proof of stake consensus algorithm and updatable smart contracts, enabling much higher transactions per second and a prosperous decentralized application (DApp) ecosystem. According to the statistics, it has reached nearly 18 billion USD, taking third place in the whole cryptocurrency market, following Bitcoin and Ethereum. Loopholes, however, are hiding in the shadows. EOSBet, a famous gambling DApp, was attacked twice within a month and lost more than 1 million USD. No existing work has surveyed EOSIO from a security researcher's perspective. To fill this gap, in this paper we collected all attack events that have occurred against EOSIO and systematically studied their root causes, i.e., vulnerabilities lurking in all the components EOSIO relies on, as well as the corresponding attacks and mitigations. We also summarized some best practices for DApp developers, the official EOSIO team, and security researchers as future directions.

Analysis · Sentiment analysis · Dataset · INFORMS · Comprehensibility ·

July 19, 2022

Urdu Speech and Text Based Sentiment Analyzer

Waqar Ahmad,Maryam Edalati

from arxiv, Sentiment Analysis, Opinion Mining, Urdu language, polarity assessment, lexicon-based method

Discovering what other people think has always been a key aspect of our information-gathering strategy. People can now actively utilize information technology to seek out and comprehend the ideas of others, thanks to the increased availability and popularity of opinion-rich resources such as online review sites and personal blogs. Because of its crucial function in understanding people's opinions, sentiment analysis (SA) is a crucial task. Existing research, on the other hand, is primarily focused on the English language, with just a small amount of study devoted to low-resource languages. For sentiment analysis, this work presents a new multi-class Urdu dataset based on user evaluations. Twitter was used to obtain the Urdu dataset. Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive and negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish the baseline result. Five different lexicon- and rule-based algorithms, including Naive Bayes, Stanza, TextBlob, VADER, and Flair, are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
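To make the idea of lexicon- and rule-based polarity assessment concrete, here is a minimal sketch of such a classifier. The lexicon entries are English placeholders standing in for Urdu words, and the single negation rule and the tie-breaking choice are assumptions; it does not reproduce any of the tools named in the abstract.

```python
from typing import Dict, Set

# Hypothetical toy lexicon; a real Urdu lexicon would map Urdu words
# to polarity scores.
LEXICON: Dict[str, int] = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
NEGATORS: Set[str] = {"not"}


def polarity(text: str) -> str:
    """Sum lexicon scores over whitespace tokens. A negator flips the sign
    of the next lexicon word. Ties (score 0) are labelled positive, an
    arbitrary choice for this two-class sketch."""
    score = 0
    negate = False
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(token, 0)
        if weight:
            score += -weight if negate else weight
            negate = False
    return "positive" if score >= 0 else "negative"


for review in ("great phone", "not good", "terrible battery but good screen"):
    print(review, "->", polarity(review))
```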

Learning · Multimodal · Representation learning · Vision · Attention ·

June 8, 2022

Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data

Shohreh Deldari,Hao Xue,Aaqib Saeed,Jiayuan He,Daniel V. Smith,Flora D. Salim

from arxiv, 36 pages, 5 figures, 9 tables, Survey paper

Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in the fields of computer vision, speech, and natural language processing (NLP), and more recently in other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training. Acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that have been freely obtained from the raw data. Unlike existing reviews of SSRL that have predominantly focused upon methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of an SSRL framework, 3) compare existing models in terms of their objective function, network architecture and potential applications, and 4) review existing multimodal techniques in each category and various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that utilise multimodal and/or temporal data.

Language representation · Knowledge neurons · MoDELS · Graph · Knowledge graph ·

September 17, 2019

K-BERT: Enabling Language Representation with Knowledge Graph

Weijie Liu,Peng Zhou,Zhe Zhao,Zhiruo Wang,Qi Ju,Haotang Deng,Ping Wang

from arxiv, 8 pages, 20190917

Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called the knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and visible matrix to limit the impact of knowledge. K-BERT can easily inject domain knowledge into the models by being equipped with a KG, without pre-training by itself, because it is capable of loading model parameters from the pre-trained BERT. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts.

Learned · RNN · Gating · INFORMS · Performer ·

October 25, 2018

Learning with Interpretable Structure from RNN

Bo-Jian Hou,Zhi-Hua Zhou

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Considering that the interpretation of deep learning models has attracted extensive attention in recent years, it will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs), whose inner mechanism is still not clearly understood. We find that a Finite State Automaton (FSA) that processes sequential data has a more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn an FSA from an RNN based on two different clustering methods. We first give a graphical illustration of the FSA for human beings to follow, which shows the interpretability. From the FSA's point of view, we then analyze how the performance of RNNs is affected by the number of gates, as well as the semantic meaning behind the transition of numerical hidden states. Our results suggest that RNNs with a simple gated structure, such as the Minimal Gated Unit (MGU), are more desirable, and that the transitions in the FSA leading to a specific classification result are associated with corresponding words which are understandable by human beings.
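One way to picture the step from clustered hidden states to an automaton is sketched below. It assumes the RNN's hidden states have already been assigned to cluster ids (the clustering itself is omitted) and keeps, for each (state, symbol) pair, the most frequently observed next state. The majority-vote rule, the data, and the function names are assumptions for illustration and are not claimed to be either of the paper's two methods.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple

# A trajectory pairs an input string with the cluster ids of the hidden
# states visited, so len(states) == len(symbols) + 1.
Trajectory = Tuple[str, Sequence[int]]
FSA = Dict[Tuple[int, str], int]


def build_fsa(trajectories: List[Trajectory]) -> FSA:
    """For each (state, symbol) pair keep the most frequent next state."""
    counts: DefaultDict[Tuple[int, str], Counter] = defaultdict(Counter)
    for symbols, states in trajectories:
        for t, symbol in enumerate(symbols):
            counts[(states[t], symbol)][states[t + 1]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def run_fsa(fsa: FSA, symbols: str, start: int = 0) -> int:
    """Follow the learned transitions from `start` and return the end state."""
    state = start
    for symbol in symbols:
        state = fsa[(state, symbol)]
    return state


# Hypothetical cluster-labelled trajectories.
TRAJECTORIES: List[Trajectory] = [
    ("ab", [0, 1, 2]),
    ("aa", [0, 1, 1]),
    ("ab", [0, 1, 2]),
    ("ab", [0, 1, 1]),  # a noisy observation, outvoted by the two above
]

fsa = build_fsa(TRAJECTORIES)
print(fsa)
print(run_fsa(fsa, "aab"))
```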