亚洲精品无码黄色网站在线观看_啊灬啊灬啊灬快灬深用两性_女女啪啪激烈高潮喷出网站免费_蘑菇视频免费版在线观看版下载_曰韩精品国产亚洲色变态另类_久久久久久久久_欧美日韩亚洲成色二本道三区

The way the media presents events can significantly affect public perception, which in turn can alter people's beliefs and views. Media bias describes a one-sided or polarizing perspective on a topic. This article summarizes the research on computational methods to detect media bias by systematically reviewing 3140 research papers published between 2019 and 2022. To structure our review and support a mutual understanding of bias across research domains, we introduce the Media Bias Taxonomy, which provides a coherent overview of the current state of research on media bias from different perspectives. We show that media bias detection is a highly active research field, in which transformer-based classification approaches have led to significant improvements in recent years. These improvements include higher classification accuracy and the ability to detect more fine-granular types of bias. However, we have identified a lack of interdisciplinarity in existing projects, and a need for more awareness of the various types of media bias to support methodologically thorough performance evaluations of media bias detection systems. Concluding from our analysis, we see the integration of recent machine learning advancements with reliable and diverse bias assessment strategies from other research areas as the most promising area for future research contributions in the field.

相關內容

有(you)偏

關注 0

AI · MoDELS · Responsible AI · Principle · 操作 ·

2024 年 2 月 13 日

Evolving AI Risk Management: A Maturity Model based on the NIST AI Risk Management Framework

Ravit Dotan,Borhane Blili-Hamelin,Ravi Madhavan,Jeanna Matthews,Joshua Scarpino

Researchers, government bodies, and organizations have been repeatedly calling for a shift in the responsible AI community from general principles to tangible and operationalizable practices in mitigating the potential sociotechnical harms of AI. Frameworks like the NIST AI RMF embody an emerging consensus on recommended practices in operationalizing sociotechnical harm mitigation. However, private sector organizations currently lag far behind this emerging consensus. Implementation is sporadic and selective at best. At worst, it is ineffective and can risk serving as a misleading veneer of trustworthy processes, providing an appearance of legitimacy to substantively harmful practices. In this paper, we provide a foundation for a framework for evaluating where organizations sit relative to the emerging consensus on sociotechnical harm mitigation best practices: a flexible maturity model based on the NIST AI RMF.

AI · Guidance · CASES · Performer · motivation ·

2024 年 2 月 13 日

Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence

Cary Coglianese,Colton R. Crum

from arxiv, 12 pages, 1 figure

Fervent calls for more robust governance of the harms associated with artificial intelligence (AI) are leading to the adoption around the world of what regulatory scholars have called a management-based approach to regulation. Recent initiatives in the United States and Europe, as well as the adoption of major self-regulatory standards by the International Organization for Standardization, share in common a core management-based paradigm. These management-based initiatives seek to motivate an increase in human oversight of how AI tools are trained and developed. Refinements and systematization of human-guided training techniques will thus be needed to fit within this emerging era of management-based regulatory paradigm. If taken seriously, human-guided training can alleviate some of the technical and ethical pressures on AI, boosting AI performance with human intuition as well as better addressing the needs for fairness and effective explainability. In this paper, we discuss the connection between the emerging management-based regulatory frameworks governing AI and the need for human oversight during training. We broadly cover some of the technical components involved in human-guided training and then argue that the kinds of high-stakes use cases for AI that appear of most concern to regulators should lean more on human-guided training than on data-only training. We hope to foster a discussion between legal scholars and computer scientists involving how to govern a domain of technology that is vast, heterogenous, and dynamic in its applications and risks.

HCI · TOCHI · 可辨認的 · ACM Conference on Computer Supported Cooperative Work and Social Computing · DIS ·

2024 年 2 月 12 日

Cruising Queer HCI on the DL: A Literature Review of LGBTQ+ People in HCI

Jordan Taylor,Ellen Simpson,Anh-Ton Tran,Jed Brubaker,Sarah Fox,Haiyi Zhu

LGBTQ+ people have received increased attention in HCI research, paralleling a greater emphasis on social justice in recent years. However, there has not been a systematic review of how LGBTQ+ people are researched or discussed in HCI. In this work, we review all research mentioning LGBTQ+ people across the HCI venues of CHI, CSCW, DIS, and TOCHI. Since 2014, we find a linear growth in the number of papers substantially about LGBTQ+ people and an exponential increase in the number of mentions. Research about LGBTQ+ people tends to center experiences of being politicized, outside the norm, stigmatized, or highly vulnerable. LGBTQ+ people are typically mentioned as a marginalized group or an area of future research. We identify gaps and opportunities for (1) research about and (2) the discussion of LGBTQ+ in HCI and provide a dataset to facilitate future Queer HCI research.

語言模型化 · 類別 · 大語言模型 · MoDELS · 過擬合 ·

2024 年 2 月 12 日

NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

Lizhou Fan,Wenyue Hua,Lingyao Li,Haoyang Ling,Yongfeng Zhang

from arxiv, 23 pages, 7 figures, 2 tables

Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequate in offering a rigorous evaluation of the full extent of reasoning abilities that LLMs are capable of achieving. They are also prone to the risk of overfitting, as these benchmarks, being publicly accessible and static, allow models to potentially tailor their responses to specific benchmark metrics, thereby inflating their performance. Addressing these limitations, our research introduces a new benchmark, named NPHardEval. This benchmark is designed to evaluate the reasoning abilities of LLMs across a broad spectrum of 900 algorithmic questions, extending up to the NP-Hard complexity class. These questions are meticulously chosen to represent a wide range of complexity class below the NP-hard complexity class, offering a rigorous measure of the reasoning ability of LLMs. Through this study, we shed light on the current state of reasoning in LLMs, providing an objective and rigorous perspective through the comparison of LLMs' performance across complex classes. Moreover, this benchmark is designed with a dynamic update mechanism, where the datapoints are refreshed on a monthly basis. Such regular updates play a crucial role in mitigating the risk of LLMs overfitting to the benchmark, promoting a more accurate and reliable assessment of their reasoning capabilities. The benchmark dataset and code of NPHardEval are available at //github.com/casmlab/NPHardEval.

MoDELS · Pair · 方陣 · 均值 · Microsoft Windows ·

2024 年 2 月 12 日

Analyzing Currency Fluctuations: A Comparative Study of GARCH, EWMA, and IV Models for GBP/USD and EUR/GBP Pairs

Narayan Tondapu

In this study, we examine the fluctuation in the value of the Great Britain Pound (GBP). We focus particularly on its relationship with the United States Dollar (USD) and the Euro (EUR) currency pairs. Utilizing data from June 15, 2018, to June 15, 2023, we apply various mathematical models to assess their effectiveness in predicting the 20-day variation in the pairs' daily returns. Our analysis involves the implementation of Exponentially Weighted Moving Average (EWMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and Implied Volatility (IV) models. To evaluate their performance, we compare the accuracy of their predictions using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) metrics. We delve into the intricacies of GARCH models, examining their statistical characteristics when applied to the provided dataset. Our findings suggest the existence of asymmetric returns in the EUR/GBP pair, while such evidence is inconclusive for the GBP/USD pair. Additionally, we observe that GARCH-type models better fit the data when assuming residuals follow a standard t-distribution rather than a standard normal distribution. Furthermore, we investigate the efficacy of different forecasting techniques within GARCH-type models. Comparing rolling window forecasts to expanding window forecasts, we find no definitive superiority in either approach across the tested scenarios. Our experiments reveal that for the GBP/USD pair, the most accurate volatility forecasts stem from the utilization of GARCH models employing a rolling window methodology. Conversely, for the EUR/GBP pair, optimal forecasts are derived from GARCH models and Ordinary Least Squares (OLS) models incorporating the annualized implied volatility of the exchange rate as an independent variable.

CASE · MoDELS · Taxonomy · NLP · 可理解性 ·

2024 年 2 月 11 日

Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification

Shanshan Xu,T. Y. S. S Santosh,Oana Ichim,Barbara Plank,Matthias Grabmair

In legal decisions, split votes (SV) occur when judges cannot reach a unanimous decision, posing a difficulty for lawyers who must navigate diverse legal arguments and opinions. In high-stakes domains, understanding the alignment of perceived difficulty between humans and AI systems is crucial to build trust. However, existing NLP calibration methods focus on a classifier's awareness of predictive performance, measured against the human majority class, overlooking inherent human label variation (HLV). This paper explores split votes as naturally observable human disagreement and value pluralism. We collect judges' vote distributions from the European Court of Human Rights (ECHR), and present SV-ECHR, a case outcome classification (COC) dataset with SV information. We build a taxonomy of disagreement with SV-specific subcategories. We further assess the alignment of perceived difficulty between models and humans, as well as confidence- and human-calibration of COC models. We observe limited alignment with the judge vote distribution. To our knowledge, this is the first systematic exploration of calibration to human judgements in legal NLP. Our study underscores the necessity for further research on measuring and enhancing model calibration considering HLV in legal decision tasks.

邊緣化 · SimPLe · 有向 · MoDELS · 樣本 ·

2024 年 2 月 9 日

The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording

Benjamin Fuller,Rashmi Pai,Alexander Russell

from arxiv, 24 pages, no prior publication

Risk-limiting audits (RLAs) are the established techniques for verifying large elections. While they provide rigorous guarantees of correctness, widespread adoption has been impeded by both efficiency concerns and the fact they offer statistical, rather than absolute, conclusions. We define new families of audits that help to address these issues. Our new audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations rather than a single decision; this can reflect the presence of ambiguous marks, which appear regularly on hand-marked ballots. We show that this simple expedient can offer significant efficiency improvements with only minor changes to existing auditing infrastructure. We establish that these "Bayesian" comparison audits are indeed risk-limiting in the formal sense of (Fuller, Harrison, and Russell, 2022). We then define a new type of post-election audit we call a contested audit. These call for each candidate to provide a cast-vote record table advancing their own claim to victory. We prove that these audits offer remarkable sample efficiency: they guarantee negligible risk with only a constant number of ballot inspections. This is a first for an audit with provable soundness. These results are formulated in a game-based security model that specify quantitative soundness and completeness guarantees. Finally, we observe that these audits provide a direct means to handle contestation of election results affirmed by conventional RLAs.

echo回聲（移動應用） · 回合 · 3D · 估計/估計量 · Vision ·

2024 年 2 月 9 日

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision

Lingyu Zhu,Esa Rahtu,Hang Zhao

This paper focuses on perceiving and navigating 3D environments using echoes and RGB image. In particular, we perform depth estimation by fusing RGB image with echoes, received from multiple orientations. Unlike previous works, we go beyond the field of view of the RGB and estimate dense depth maps for substantially larger parts of the environment. We show that the echoes provide holistic and in-expensive information about the 3D structures complementing the RGB image. Moreover, we study how echoes and the wide field-of-view depth maps can be utilised in robot navigation. We compare the proposed methods against recent baselines using two sets of challenging realistic 3D environments: Replica and Matterport3D. The implementation and pre-trained models will be made publicly available.

Networking · INTERACT · Agent · Networks · MoDELS ·

2024 年 2 月 9 日

United We Fall: On the Nash Equilibria of Multiplex and Multilayer Network Games

Raman Ebrahimi,Parinaz Naghizadeh

Network games provide a framework to study strategic decision making processes that are governed by structured interdependencies among agents. However, existing models do not account for environments in which agents simultaneously interact over multiple networks, or when agents operate over multiple action dimensions. In this paper, we propose new models of multiplex network games to capture the different modalities of interactions among strategic agents, and multilayer network games to capture their interactions over multiple action dimensions. We explore how the properties of the constituent networks of a multiplex/multilayer network can undermine or support the existence, uniqueness, and stability of the game's Nash equilibria. Notably, we highlight that both the largest and smallest eigenvalues of the constituent networks (reflecting their connectivity and two-sidedness, respectively) are instrumental in determining the uniqueness of the multiplex/multilayer network game's equilibrium. Together, our findings shed light on the reasons for the fragility of equilibria when agents interact over networks of networks, and point out potential interventions to alleviate them.

學成 · 大數據 · 相同 · 人工智能 · 統計方法 ·

2020 年 5 月 5 日

A Survey of Learning Causality with Data: Problems and Methods

Ruocheng Guo,Lu Cheng,Jundong Li,P. Richard Hahn,Huan Liu

from arxiv, 35 pages, accepted by ACM CSUR

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.