Despite decision-making being a vital goal of data visualization, little work has been done to differentiate the decision-making tasks within our field. While visualization task taxonomies and typologies exist, they are often too granular for describing complex decision goals and decision-making processes, thus limiting their potential use in designing decision-support tools. In this paper, we contribute a typology of decision-making tasks that was iteratively refined from a list of design goals distilled from a literature review. Our typology is concise and consists of only three tasks: choose, activate, and create. These tasks were originally proposed by the scientific community; we extend them and provide definitions that are suitable for the visualization community. Our proposed typology offers two benefits. First, it facilitates the composition of decisions using these three tasks, allowing for flexible and clear descriptions across varying complexities and domains. Second, diagrams created using this typology encourage productive discourse between visualization designers and domain experts by abstracting the intricacies of data, thereby promoting clarity and rigorous analysis of decision-making processes. We motivate the use of our typology through four case studies and demonstrate the benefits of our approach through semi-structured interviews conducted with experienced members of the visualization community, comprising academic and industry experts, who have contributed to developing or publishing decision-support systems for domain experts. Our interviewees composed diagrams using our typology to delineate the decision-making processes that drive their decision-support tools, demonstrating its descriptive capacity and effectiveness.
A major challenge when describing the origin of life is to explain how instructional information control systems emerge naturally and spontaneously from mere molecular dynamics. So far, no one has clarified how information control emerged ab initio and how primitive control mechanisms in life might have evolved, becoming increasingly refined. Based on recent experimental results showing that chemical computation does not require the presence of life-related chemistry, we elucidate the origin and early evolution of information handling by chemical automata, from information processing (computation) to information storage (memory) and information transmission (communication). In contrast to other theories that assume the existence of initial complex structures, our narrative starts from trivial self-replicators whose interaction gives rise to more powerful molecular machines. By describing precisely the primordial transitions in chemistry-based computation, our metaphor is capable of explaining the above-mentioned gaps and can be translated to other models of computation, which allow us to explore biological phenomena at multiple spatial and temporal scales. At the end of our manuscript, we propose some ways to extend our ideas, including experimental validation of our theory (both in vitro and in silico).
In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the Fuss-Free Network (FFNet), which is characterized by its simple and efficient structure, consisting of only a backbone network and a multi-scale feature fusion structure. The multi-scale feature fusion structure is a simple structure consisting of three branches, each equipped only with a focus transition module, and combines the features from these branches through the concatenation operation. Our proposed crowd counting model is trained and evaluated on four widely used public datasets, and it achieves accuracy that is comparable to that of existing complex models. Furthermore, we conduct a comprehensive evaluation by replacing the existing backbones of various models such as FFNet and CCTrans with different networks, including MobileNet-v3, ConvNeXt-Tiny, and Swin-Transformer-Small. The experimental results further indicate that excellent crowd counting performance can be achieved with the simplified structure we propose.
This work investigates the performance of intelligent reflective surface (IRS)-assisted uplink non-orthogonal multiple access (NOMA) in energy-constrained networks. Specifically, we formulate and solve two optimization problems: the first aims at minimizing the sum of users' transmit power, while the second targets maximizing the system-level energy efficiency (EE). The two problems are solved by jointly optimizing the users' transmit powers and the beamforming coefficients at the IRS subject to the users' individual uplink rate and transmit power constraints. A novel and low-complexity algorithm is developed to optimize the IRS beamforming coefficients by optimizing the objective function over a \textit{complex circle manifold} (CCM). To efficiently optimize the IRS phase shifts over the manifold, the optimization problem is reformulated into a feasibility expansion problem, which is reduced to a max-min signal-to-interference-plus-noise ratio (SINR) problem. Then, with the aid of a smoothing technique, the exact penalty method is applied to transform the problem from constrained to unconstrained. The proposed solution is compared against three semi-definite programming (SDP)-based benchmarks: semi-definite relaxation (SDR), SDP-difference of convex (SDP-DC), and sequential rank-one constraint relaxation (SROCR). The results show that the manifold algorithm provides better performance than the SDP-based benchmarks, and at a much lower computational complexity, for both the transmit power minimization and EE maximization problems. The results also reveal that IRS-NOMA is only superior to orthogonal multiple access (OMA) when the users' target achievable rate requirements are relatively high.
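As a rough illustration of what optimizing over a complex circle manifold involves, the sketch below takes one gradient step on unit-modulus coefficients using the standard geometry of that manifold: project the Euclidean gradient onto the tangent space, step, then renormalize each entry. It is not the algorithm of the abstract above, whose details are not given here; the function names, the gradient values, and the step size are all hypothetical.

```python
import cmath


def retract_to_circle(coeffs):
    """Map each nonzero complex entry back to unit modulus."""
    return [c / abs(c) for c in coeffs]


def tangent_projection(theta, euclidean_grad):
    """Remove, entry by entry, the gradient component parallel to theta."""
    return [g - (g * t.conjugate()).real * t
            for t, g in zip(theta, euclidean_grad)]


def manifold_step(theta, euclidean_grad, step_size):
    """One descent step: project, move, then retract onto the manifold."""
    direction = tangent_projection(theta, euclidean_grad)
    moved = [t - step_size * d for t, d in zip(theta, direction)]
    return retract_to_circle(moved)


# Hypothetical unit-modulus coefficients and gradient values.
theta = [cmath.exp(1j * phase) for phase in (0.0, 1.0, 2.5)]
grad = [0.3 + 0.4j, -1.0 + 0.2j, 0.5 - 0.7j]

tangent = tangent_projection(theta, grad)
theta_next = manifold_step(theta, grad, step_size=0.1)
```

Because every iterate is renormalized, the unit-modulus constraint on the phase shifts holds by construction and needs no separate relaxation step.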
Min-max problems are important in multi-agent sequential decision-making because they improve the performance of the worst-performing agent in the network. However, solving the multi-agent min-max problem is challenging. We propose a modular, distributed, online planning-based algorithm that is able to approximate the solution of the min-max objective in networked Markov games, assuming that the agents communicate within a network topology and the transition and reward functions are neighborhood-dependent. This set-up is encountered in the multi-robot setting. Our method consists of two phases at every planning step. In the first phase, each agent obtains sample returns based on its local reward function, by performing online planning. Using the samples from online planning, each agent constructs a concave approximation of its underlying local return as a function of only the action of its neighborhood at the next planning step. In the second phase, the agents deploy a distributed optimization framework that converges to the optimal immediate next action for each agent, based on the function approximations of the first phase. We demonstrate our algorithm's performance through formation control simulations.
We propose a novel performance metric for articulated robots with distributed directional sensors called the sensor observability analysis (SOA). These robot-mounted distributed directional sensors (e.g., joint torque sensors) change their individual sensing directions as the joints move. SOA transforms individual sensor axes in joint space to provide the cumulative sensing quality of these sensors to observe each task-space axis, akin to forward kinematics for sensors. For example, certain joint configurations may align joint torque sensors in such a way that they are unable to observe interaction forces in one or more task-space axes. The resultant sensor observability performance metrics can then be used in optimization and in null-space control to avoid sensor observability singular configurations or to maximize sensor observability in particular directions. We use the specific case of force sensing in serial robot manipulators to showcase the analysis. Parallels are drawn between sensor observability and the traditional kinematic manipulability; SOA is shown to be more generalizable in terms of analysing non-joint-mounted sensors and can potentially be applied to sensor types other than force sensors. Simulations and experiments using a custom 3-DOF robot and the Baxter robot demonstrate the utility and importance of sensor observability in physical interactions.
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
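To make the quantization question concrete, here is a minimal sketch of uniform affine quantization, one of the simplest ways to map real values onto a fixed discrete set. It is an illustration only, not a method taken from the survey; the function names and the sample weights are hypothetical.

```python
def quantize_uniform(values, num_bits=4):
    """Map floats to integer codes in [0, 2**num_bits - 1] on a uniform grid."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, which would otherwise give a zero scale.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_uniform(codes, scale, lo):
    """Recover approximate float values from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights, stored as 4-bit codes plus one scale and one offset.
weights = [-1.0, -0.5, 0.0, 0.25, 0.5]
codes, scale, lo = quantize_uniform(weights, num_bits=4)
restored = dequantize_uniform(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With rounding to the nearest grid point, the reconstruction error of each value is bounded by half the grid spacing `scale`, which shows the trade-off between bit width and accuracy.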
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs on low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
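As one concrete instance of the first category, the sketch below shows magnitude-based pruning, a common baseline that zeroes the smallest-magnitude weights. It is an assumed illustration, not a specific method analyzed in the survey; the function name and the example weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights; half of them are removed.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
# pruned is [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

The zeroed entries can then be skipped or stored in a sparse format, which is where the memory and operation savings would come from.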
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
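To show why a metric such as PSNR counts as a simple, shallow function, the sketch below computes it from the standard definition, 10 log10(MAX^2 / MSE), on flat lists of pixel intensities. The toy pixel values are hypothetical and chosen only to show that two differently structured distortions with the same mean squared error receive the same score.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "images": one localized error versus a uniform shift.
reference = [100, 100, 100, 100]
one_bad_pixel = [110, 100, 100, 100]   # squared errors: 100, 0, 0, 0
uniform_shift = [105, 105, 105, 105]   # squared errors: 25, 25, 25, 25

score_a = psnr(reference, one_bad_pixel)
score_b = psnr(reference, uniform_shift)
```

Both distortions have a mean squared error of 25, so `score_a` equals `score_b` even though the errors are distributed very differently across the image.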