For over two decades, Internet access has been a topic of research and debate. Up-to-date evidence about key predictors such as age is important given not only the complexities of access to the online medium but also the ever-changing nature of the Internet. This paper provides a stocktake of current trends in Internet access in New Zealand and their association with age. It relies on secondary analysis of data from a larger online panel survey of 1,001 adult users. Chi-square tests of independence and Cramér's V were used for analysis. A key finding is an emerging gap in the quality of Internet access: while fibre is the predominant type of broadband connection at home, older adults are significantly less likely to have it and more likely to adopt wireless broadband. In addition, a large majority across all age groups hold a positive view of the Internet. This share was higher among older adults, who, interestingly, were also slightly more likely to say that their concern about the security of their personal details online had increased in the last year. The implications of the results are discussed and some directions for future research are proposed.
A simple way of obtaining robust estimates of the "center" (or the "location") and of the "scatter" of a dataset is to use the maximum likelihood estimate with a class of heavy-tailed distributions, regardless of the "true" distribution generating the data. We observe that the maximum likelihood problem for the Cauchy distributions, which have particularly heavy tails, is geodesically convex and therefore efficiently solvable (Cauchy distributions are parametrized by the upper half-plane, i.e., by the hyperbolic plane). Moreover, it has an appealing geometrical meaning: the data points, living on the boundary of the hyperbolic plane, attract the parameter with unit forces, and we search for the point where these forces are in equilibrium. This picture generalizes to several classes of multivariate distributions with heavy tails, including, in particular, the multivariate Cauchy distributions. The hyperbolic plane is replaced by symmetric spaces of noncompact type. Geodesic convexity gives us an efficient numerical solution of the maximum likelihood problem for these distribution classes, which can then be used for robust estimates of location and spread, thanks to the heavy tails of these distributions.
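As a concrete illustration of the robustness idea in this abstract, here is a minimal one-dimensional sketch (mine, not the paper's geodesic method): the classical EM-style fixed-point iteration for the Cauchy maximum likelihood estimate of location and scale, in which the weights 1/(1+z²) automatically downweight outliers. Function and variable names are illustrative.

```python
import numpy as np

def cauchy_mle(x, iters=200):
    """EM-style fixed-point iteration for the 1D Cauchy location/scale MLE.
    The weights w_i = 1/(1 + z_i^2) shrink toward zero for outlying points,
    which is what makes the resulting estimates robust."""
    # crude robust starting values: median and half the interquartile range
    mu = np.median(x)
    sigma = np.subtract(*np.percentile(x, [75, 25])) / 2.0
    for _ in range(iters):
        z = (x - mu) / sigma
        w = 1.0 / (1.0 + z**2)
        mu = np.sum(w * x) / np.sum(w)                      # weighted location update
        sigma = np.sqrt(2.0 * np.sum(w * (x - mu)**2) / len(x))  # scale update
    return mu, sigma
```

On data consisting mostly of standard-normal points plus a cluster of gross outliers, the Cauchy MLE location stays near the bulk of the data while the sample mean is dragged toward the outliers.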
The diffusion of AI and big data is reshaping decision-making processes by increasing the amount of information that supports decisions while reducing direct interaction with data and empirical evidence. This paradigm shift introduces new sources of uncertainty, as limited data observability results in ambiguity and a lack of interpretability. The need for the proper analysis of data-driven strategies motivates the search for new models that can describe this type of bounded access to knowledge. This contribution presents a novel theoretical model for uncertainty in knowledge representation and its transfer mediated by agents. We provide a dynamical description of knowledge states by endowing our model with a structure to compare and combine them. Specifically, an update is represented through combinations, and its explainability is based on its consistency in different dimensional representations. We look at inequivalent knowledge representations in terms of multiplicity of inferences, preference relations, and information measures. Furthermore, we define a formal analogy with two scenarios that illustrate non-classical uncertainty in terms of ambiguity (Ellsberg's model) and reasoning about knowledge mediated by other agents observing data (Wigner's friend). Finally, we discuss some implications of the proposed model for data-driven strategies, with special attention to reasoning under uncertainty about business value dimensions and the design of measurement tools for their assessment.
Document-based question-answering (QA) tasks are crucial for precise information retrieval. While some existing work evaluates large language models' (LLMs') performance on retrieving and answering questions from documents, their performance on QA types that require exact answer selection from predefined options or numerical extraction has yet to be fully assessed. In this paper, we focus on this underexplored setting and conduct an empirical analysis of LLMs (GPT-4 and GPT-3.5) on question types including single-choice, yes-no, multiple-choice, and number-extraction questions over documents. We use the Cogtale dataset for evaluation, which provides human expert-tagged responses, offering a robust benchmark for precision and factual grounding. We found that LLMs, particularly GPT-4, can precisely answer many single-choice and yes-no questions given relevant context, demonstrating their efficacy in information retrieval tasks. However, their performance diminishes on multiple-choice and number-extraction formats, lowering the models' overall performance on the task and indicating that they may not yet be reliable for it. This limits the use of LLMs in applications demanding precise information extraction from documents, such as meta-analysis tasks. These findings hinge on the assumption that the retriever furnishes the pertinent context needed for accurate responses, underscoring the need for further research on the efficacy of retrieval mechanisms in enhancing question-answering performance. Our work offers a framework for ongoing dataset evaluation, ensuring that LLM applications for information retrieval and document analysis continue to meet evolving standards.
We define morphological operators and filters for directional images whose pixel values are unit vectors. This requires an ordering relation for unit vectors, which we obtain by using depth functions; these provide a centre-outward ordering with respect to a specified centre vector. We apply our operators to synthetic directional images and compare them with classical morphological operators for grey-scale images. As application examples, we enhance the fault region in a compressed glass foam and segment misaligned fibre regions of glass-fibre-reinforced polymers.
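A toy sketch of the idea, under simplifying assumptions not taken from the abstract above: directions are stored as angles, and the "depth" of a unit vector is reduced to the cosine of its angular distance from a chosen centre direction, giving the centre-outward order needed for erosion (the directional analogue of a grey-scale minimum filter). The depth functions in the actual work are more general.

```python
import numpy as np

def erode_directional(theta, theta_c, k=3):
    """Erosion on an angle-valued image: within each k-by-k window, pick the
    direction farthest from the centre direction theta_c, i.e. the one with
    the lowest toy 'depth' cos(angle difference)."""
    H, W = theta.shape
    pad = k // 2
    tp = np.pad(theta, pad, mode="edge")      # replicate border pixels
    out = np.empty_like(theta)
    for i in range(H):
        for j in range(W):
            win = tp[i:i + k, j:j + k].ravel()
            depth = np.cos(win - theta_c)     # 1 = at the centre, -1 = opposite
            out[i, j] = win[np.argmin(depth)] # most "outward" direction wins
    return out
```

As with grey-scale erosion, a single outlying direction spreads to its neighbourhood, which is what makes the dual opening/closing filters useful for detecting misaligned regions.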
A central task in knowledge compilation is to compile a CNF-SAT instance into a succinct representation format that allows efficient operations such as testing satisfiability, counting, or enumerating all solutions. Useful representation formats studied in this area range from ordered binary decision diagrams (OBDDs) to circuits in decomposable negation normal form (DNNFs). While it is known that there exist CNF formulas that require exponential size representations, the situation is less well studied for other types of constraints than Boolean disjunctive clauses. The constraint satisfaction problem (CSP) is a powerful framework that generalizes CNF-SAT by allowing arbitrary sets of constraints over any finite domain. The main goal of our work is to understand for which types of constraints (also called the constraint language) it is possible to efficiently compute representations of polynomial size. We answer this question completely and prove two tight characterizations of efficiently compilable constraint languages, depending on whether the target format is structured. We first identify the combinatorial property of ``strong blockwise decomposability'' and show that if a constraint language has this property, we can compute DNNF representations of linear size. For all other constraint languages we construct families of CSP-instances that provably require DNNFs of exponential size. For a subclass of ``strong uniformly blockwise decomposable'' constraint languages we obtain a similar dichotomy for structured DNNFs. In fact, strong (uniform) blockwise decomposability even allows efficient compilation into multi-valued analogs of OBDDs and FBDDs, respectively. Thus, we get complete characterizations for all knowledge compilation classes between O(B)DDs and DNNFs.
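To make the compile-then-count theme concrete, here is a small illustrative sketch (mine, not the paper's construction): counting the models of a CNF formula by Shannon expansion along a fixed variable order with memoisation. This implicitly builds an OBDD-like decision structure, and the number of distinct (level, residual formula) pairs bounds the running time, mirroring how diagram size governs the cost of counting.

```python
from functools import lru_cache

# A CNF is a collection of clauses; a clause is a frozenset of literals,
# where literal +v means "variable v is true" and -v means "v is false".
def count_models(clauses, variables):
    """Count satisfying assignments via Shannon expansion along a fixed
    variable order, memoising on the residual formula at each level."""
    order = tuple(variables)

    def simplify(cls, lit):
        """Condition the clause set on literal `lit` being true."""
        out = set()
        for c in cls:
            if lit in c:
                continue              # clause satisfied, drop it
            c2 = c - {-lit}           # remove the falsified literal
            if not c2:
                return None           # empty clause: contradiction
            out.add(c2)
        return frozenset(out)

    @lru_cache(maxsize=None)
    def go(i, cls):
        if cls is None:
            return 0                  # contradiction on this branch
        if i == len(order):
            return 1                  # all variables decided, formula satisfied
        v = order[i]
        return go(i + 1, simplify(cls, v)) + go(i + 1, simplify(cls, -v))

    return go(0, frozenset(clauses))
```

The memoisation is exactly where a fixed variable order pays off: branches that reach the same residual formula at the same level are merged, just as isomorphic OBDD nodes are.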
Researchers rely on academic web search engines to find scientific sources, but search engine mechanisms may selectively present content that aligns with biases embedded in the queries. This study examines whether confirmation-biased queries submitted to Google Scholar and Semantic Scholar yield skewed results. Six queries, on topics across the health and technology domains (such as "vaccines" or "internet use"), were analyzed for disparities in search results. We confirm that biased queries (targeting "benefits" or "risks") skew search results in line with the bias, with technology-related queries displaying more pronounced disparities. Overall, Semantic Scholar exhibited fewer disparities than Google Scholar. Topics rated as more polarizing did not consistently show more skewed results. Academic search results that perpetuate confirmation bias have strong implications for both researchers and citizens searching for evidence. More research is needed to explore how scientific inquiry and academic search engines interact.
Consider the family of power divergence statistics based on $n$ trials, each leading to one of $r$ possible outcomes. This includes the log-likelihood ratio and Pearson's statistic as important special cases. It is known that in certain regimes (e.g., when $r$ is of order $n^2$ and the allocation is asymptotically uniform as $n\to\infty$) the power divergence statistic converges in distribution to a linear transformation of a Poisson random variable. We establish explicit error bounds in the Kolmogorov (or uniform) metric to complement this convergence result, which may be applied for any values of $n$, $r$ and the index parameter $\lambda$ for which such a finite-sample bound is meaningful. We further use this Poisson approximation result to derive error bounds in Gaussian approximation of the power divergence statistics.
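For readers unfamiliar with the family: with observed counts $X_1,\dots,X_r$ over the $r$ outcomes (so $\sum_j X_j = n$) and null probabilities $p_1,\dots,p_r$, the Cressie–Read power divergence statistic with index $\lambda$ is

$$T_\lambda = \frac{2}{\lambda(\lambda+1)} \sum_{j=1}^{r} X_j\left[\left(\frac{X_j}{n p_j}\right)^{\lambda} - 1\right],$$

with $\lambda = 1$ recovering Pearson's statistic $\sum_j (X_j - np_j)^2/(np_j)$ and the limit $\lambda \to 0$ the log-likelihood ratio statistic $2\sum_j X_j \log\!\big(X_j/(np_j)\big)$.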
The trace plot is seldom used in meta-analysis, yet it is a very informative plot. In this article, we define and illustrate the trace plot and discuss why it is important. The Bayesian version of the plot combines the posterior density of tau, the between-study standard deviation, with the shrunken estimates of the study effects as a function of tau. With a small or moderate number of studies, tau is not estimated with much precision, and parameter estimates and shrunken study-effect estimates can vary widely depending on the true value of tau. The trace plot allows visualization of this sensitivity to tau alongside a display of which values of tau are plausible and which are implausible. A comparable frequentist or empirical Bayes version provides similar results. The concepts are illustrated using examples in meta-analysis and meta-regression; implementation in R is facilitated in a Bayesian or frequentist framework using the bayesmeta and metafor packages, respectively.
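A minimal numerical sketch of the frequentist version of the quantities being traced (illustrative only, not the bayesmeta/metafor implementation): for each candidate tau, compute the weighted pooled mean and the conditional shrunken study effects; plotting the resulting curves against tau is the trace plot. The data and variable names below are hypothetical.

```python
import numpy as np

def shrunken_effects(y, s2, tau):
    """Given between-study SD tau, return the pooled mean and the conditional
    shrunken study effects: each estimate is pulled toward the pooled mean,
    more strongly the smaller tau is relative to the within-study variance."""
    w = 1.0 / (s2 + tau**2)                          # inverse-variance weights
    mu = np.sum(w * y) / np.sum(w)                   # pooled mean at this tau
    theta = (tau**2 * y + s2 * mu) / (tau**2 + s2)   # shrunken study effects
    return mu, theta

# trace the shrunken effects over a grid of tau values (hypothetical data)
y = np.array([0.1, 0.3, 0.5])        # study effect estimates
s2 = np.array([0.04, 0.04, 0.04])    # within-study variances
taus = np.linspace(0.0, 1.0, 50)
trace = np.array([shrunken_effects(y, s2, t)[1] for t in taus])
```

At tau = 0 every study is shrunk completely to the pooled mean; as tau grows the shrunken effects fan out toward the raw study estimates, which is exactly the spread the trace plot makes visible.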
Orthogonal meta-learners, such as the DR-learner, R-learner and IF-learner, are increasingly used to estimate conditional average treatment effects. They improve convergence rates relative to naïve meta-learners (e.g., the T-, S- and X-learner) through de-biasing procedures that involve applying standard learners to specifically transformed outcome data. This leads them to disregard the possibly constrained outcome space, which can be particularly problematic for dichotomous outcomes: these are typically transformed to values that are no longer constrained to the unit interval, making it difficult for standard learners to guarantee predictions within the unit interval. To address this, we construct orthogonal meta-learners for the prediction of counterfactual outcomes that respect the outcome space. As such, the obtained i-learner or imputation learner is more generally expected to outperform existing learners, even when the outcome is unconstrained, as we confirm empirically in simulation studies and an analysis of critical care data. Our development also sheds broader light on the construction of orthogonal learners for other estimands.
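For orientation, here is a compact sketch of one existing orthogonal learner of the kind discussed above, the DR-learner, on simulated data (simplified: a randomized treatment with known propensity 0.5, per-arm linear outcome models, and no cross-fitting; all names are illustrative). The doubly robust pseudo-outcome below is the sort of transformed outcome the abstract refers to, and it can indeed leave the unit interval even when Y is binary.

```python
import numpy as np

def linfit(X, y):
    """Ordinary least squares with intercept; returns a prediction function."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
A = rng.binomial(1, 0.5, size=n)                          # randomized treatment
Y = X[:, 0] + A * 1.0 + rng.normal(scale=0.1, size=n)     # true CATE is 1

# stage 1: nuisance outcome models, fitted separately per treatment arm
m0 = linfit(X[A == 0], Y[A == 0])(X)
m1 = linfit(X[A == 1], Y[A == 1])(X)

# stage 2: doubly robust pseudo-outcome, then regress it on X
pi = 0.5                                                  # known propensity
mA = np.where(A == 1, m1, m0)
phi = (A - pi) / (pi * (1 - pi)) * (Y - mA) + m1 - m0
cate_hat = linfit(X, phi)(X)                              # estimated CATE
```

The key point for the abstract's argument is that `phi` is unbounded by construction, so nothing constrains `cate_hat`, or imputed counterfactual outcomes derived from it, to a valid outcome range.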