New technologies for sensing and communication act as enablers for cooperative driving applications. Sensors detect objects in the surrounding environment, and information such as their current location is exchanged among vehicles. To cope with the vehicles' mobility, this information needs to be as fresh as possible for cooperative driving applications to operate properly. The age of information (AoI) has been proposed as a metric for evaluating the freshness of information, recently also within the context of intelligent transportation systems (ITS). We investigate mechanisms to reduce the AoI of data transported in the form of beacon messages while controlling their emission rate. We aim to balance packet collision probability and beacon frequency using the average peak age of information (PAoI) as a metric. This metric, however, only accounts for the generation time of the data, not for application-specific aspects such as the location of the transmitting vehicle. We thus propose a new way of interpreting the AoI by considering information context, thereby incorporating vehicles' locations. As an example, we characterize this importance using the orientation and distance of the involved vehicles. In particular, we introduce a weighting coefficient used in combination with the PAoI to evaluate information freshness, thus emphasizing information from more important neighbors. We further design the beaconing approach to meet a given AoI requirement, thereby saving resources on the wireless channel while keeping the AoI minimal. We illustrate the effectiveness of our approach in Manhattan-like urban scenarios, reaching pre-specified targets for the AoI of beacon messages.
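For illustration, the following minimal Python sketch shows one way a location- and orientation-based weighting coefficient could be combined with the average peak AoI; the function names and the specific weighting form are assumptions for illustration, not the definition used in the paper.

```python
import math

def weight(dx, dy, heading_deg, max_range=300.0):
    """Illustrative importance weight from relative position and orientation.

    dx, dy: neighbor position relative to the ego vehicle (m).
    heading_deg: ego heading; neighbors ahead count more than those behind.
    """
    dist = math.hypot(dx, dy)
    proximity = max(0.0, 1.0 - dist / max_range)      # closer -> more important
    bearing = math.degrees(math.atan2(dy, dx))
    alignment = 0.5 * (1.0 + math.cos(math.radians(bearing - heading_deg)))
    return proximity * alignment                      # weight in [0, 1]

def weighted_peak_aoi(reception_times, generation_times, w):
    """Average peak AoI (age just before each new beacon arrives), scaled by w.
    reception_times[i] and generation_times[i] refer to the i-th received beacon."""
    peaks = [reception_times[i] - generation_times[i - 1]
             for i in range(1, len(reception_times))]
    return w * sum(peaks) / len(peaks)
```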
Line attributes such as width and dashing are commonly used to encode information. However, many questions on the perception of line attributes remain, such as how many levels of attribute variation can be distinguished or which line attributes are preferred for which tasks. We conducted three studies to develop guidelines for using stylized lines to encode scalar data. In our first study, participants drew stylized lines to encode uncertainty information. Uncertainty is usually visualized alongside other data, so alternative visual channels are important for its visualization; additionally, uncertainty -- e.g., in weather forecasts -- is a familiar topic to most people. We therefore picked it for the visualization scenarios in study 1. We used the results of this study to determine the most common line attributes for drawing uncertainty: dashing, luminance, wave amplitude, and width. While these line attributes were especially common for drawing uncertainty, they are also commonly used in other areas. In studies 2 and 3, we investigated the discriminability of the line attributes determined in study 1. Studies 2 and 3 did not require specific application areas; thus, their results apply to visualizing any scalar data with line attributes. We evaluated the just-noticeable differences (JNDs) and derived recommendations for perceptually distinct line levels. We found that participants could discriminate considerably more levels for the line attribute width than for wave amplitude, dashing, or luminance.
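As a rough illustration of how a JND estimate translates into perceptually distinct levels, the Python sketch below assumes a constant Weber fraction (the JND grows proportionally with the base value); the numeric values are placeholders, not results from the studies.

```python
def distinct_levels(lo, hi, weber_fraction):
    """Count perceptually distinct levels between lo and hi when the
    just-noticeable difference grows proportionally with the base value
    (Weber's law): each level is (1 + k) times the previous one."""
    levels = [lo]
    while levels[-1] * (1.0 + weber_fraction) <= hi:
        levels.append(levels[-1] * (1.0 + weber_fraction))
    return levels

# e.g., line widths from 0.5 to 8 px with a hypothetical Weber fraction of 0.2
print(len(distinct_levels(0.5, 8.0, 0.2)))
```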
Rust is an emerging, strongly-typed programming language focusing on efficiency and memory safety. With an increasing number of projects adopting Rust, knowing how to use Unsafe Rust is crucial for Rust security. We observed that the description of safety requirements in Unsafe Rust programming is not yet unified: the current unsafe API documentation in the standard library varies in quality, exhibiting both inconsistency and insufficiency. To enhance Rust security, we suggest that unsafe API documentation systematically describe the safety requirements users must uphold. In this paper, we conducted the first comprehensive empirical study on safety requirements across unsafe boundaries. We studied the unsafe API documentation in the standard library and defined 19 safety properties (SP). We then labeled 416 unsafe APIs with these properties and analyzed their correlations to obtain interpretable results. To validate practical usability and SP coverage, we categorized existing Rust CVEs up to 2023-07-08 and performed a statistical analysis of std unsafe API usage across the crates.io ecosystem. In addition, we conducted a user survey to gain insights into four aspects from experienced Rust programmers. We received 50 valid responses and confirmed our classification with statistical significance.
Pan-sharpening, one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic (PAN) images into multispectral (MS) images to obtain high-resolution multispectral images. Thanks to the powerful fitting ability and efficient feature extraction of deep learning, a variety of deep pan-sharpening methods have been proposed and achieve remarkable performance. However, current pan-sharpening methods usually require paired PAN and MS images as input, which limits their usage in some scenarios. To address this issue, we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., edges that reflect the contours of the input PAN images. This motivates us to develop a PAN-agnostic representation that stores a set of base edges from which the contours of the corresponding PAN image can be composed. As a result, the pan-sharpening task can be performed with only the MS image at inference time. To this end, a memory-based network is adopted to extract and memorize the spatial details during the training phase and to replace the process of obtaining spatial information from PAN images at inference time; we call it the Memory-based Spatial Details Network (MSDN). Finally, we integrate the proposed MSDN module into existing deep learning-based pan-sharpening methods to obtain an end-to-end pan-sharpening network. Extensive experiments on the Gaofen1 and WorldView-4 satellites verify that our method reconstructs good spatial details without PAN images and achieves the best performance. The code is available at //github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git.
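The following PyTorch sketch illustrates the general idea of a learnable memory of "base edge" prototypes queried by MS-derived features; the module name, dimensions, and attention-style retrieval are illustrative assumptions, not the actual MSDN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialDetailMemory(nn.Module):
    """Illustrative memory bank: stores learnable 'base edge' prototypes and,
    given MS-derived features, retrieves a soft combination of them so that
    high-frequency details can be composed without a PAN image at inference."""

    def __init__(self, num_slots=256, dim=64):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, dim))  # memory items

    def forward(self, feat):                      # feat: (B, C, H, W), C == dim
        b, c, h, w = feat.shape
        q = feat.permute(0, 2, 3, 1).reshape(-1, c)       # one query per pixel
        attn = F.softmax(q @ self.slots.t(), dim=-1)      # similarity to each slot
        out = attn @ self.slots                           # retrieved detail features
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

# usage: details = SpatialDetailMemory()(ms_features); fused = ms_features + details
```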
Machine learning (ML) components are increasingly incorporated into software products, yet developers face challenges in transitioning from ML prototypes to products. Academic researchers struggle to propose solutions to these challenges and to evaluate interventions because they often do not have access to closed-source ML products from industry. In this study, we define and identify open-source ML products, curating a dataset of 262 repositories from GitHub to facilitate further research and education. As a start, we explore six broad research questions related to different development activities and report 21 findings from a sample of 30 ML products from the dataset. Our findings reveal a variety of development practices and architectural decisions surrounding different types and uses of ML models that offer ample opportunities for future research innovations. We also find very little evidence of industry best practices such as model testing and pipeline automation within the open-source ML products, which leaves room for further investigation of their potential impact on development and the eventual end-user experience of these products.
The analysis of public affairs documents is crucial for citizens as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. This is crucial, and sometimes a matter of life or death, for companies whose operations depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in such documents. In this work, we analyze the performance of LLMs in classifying public affairs documents. The classification of these documents is a natural multi-label task and presents important challenges. We use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess the performance of four different Spanish LLMs in classifying up to 30 different topics in the data under different configurations. The results show that LLMs can be of great use for processing domain-specific documents, such as those in the domain of public affairs.
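Since the task is multi-label, evaluation typically aggregates per-topic decisions; the scikit-learn sketch below shows micro- and macro-averaged F1 on binarized topic sets, with hypothetical topic names and predictions rather than the paper's data.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

# hypothetical gold and predicted topic sets for three documents
gold = [{"taxation", "energy"}, {"health"}, {"energy", "transport"}]
pred = [{"taxation"}, {"health", "education"}, {"energy", "transport"}]

mlb = MultiLabelBinarizer().fit(gold + pred)   # fix the topic vocabulary
y_true = mlb.transform(gold)
y_pred = mlb.transform(pred)

print(f1_score(y_true, y_pred, average="micro"))  # aggregate over all topic decisions
print(f1_score(y_true, y_pred, average="macro"))  # average of per-topic F1 scores
```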
Future wireless networks and sensing systems will benefit from access to large chunks of spectrum above 100 GHz to achieve terabit-per-second data rates in 6th Generation (6G) cellular systems and to improve the accuracy and reach of Earth exploration, sensing, and radio astronomy applications. The latter are extremely sensitive to interference from artificial signals; thus, the spectrum above 100 GHz features several bands that are protected from active transmissions under current spectrum regulations. To provide more agile access to the spectrum for both services, active and passive users will have to coexist without harming passive sensing operations. In this paper, we provide the first fundamental analysis of the Radio Frequency Interference (RFI) that large-scale terrestrial deployments introduce in different satellite sensing systems now orbiting the Earth. We develop a geometry-based analysis and extend it into a data-driven model that accounts for realistic propagation, building obstruction, and ground reflection, for network topologies with up to $10^5$ nodes over more than $85$ km$^2$. We show that the presence of harmful RFI depends on several factors, including network load, density and topology, satellite orientation, and building density. The results and methodology provide the foundation for the development of coexistence solutions and spectrum policy towards 6G.
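As a simple point of reference for the geometry-based part of such an analysis, the Python sketch below aggregates free-space interference from many terrestrial transmitters at a satellite receiver; the link-budget terms and numbers are illustrative and omit the realistic propagation, obstruction, and reflection effects modeled in the paper.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)

def aggregate_rfi_dbw(tx_power_dbw, tx_gain_dbi, rx_gain_dbi, distances_m, freq_hz):
    """Sum, in linear units, the interference contributed by each active
    terrestrial transmitter as seen by the satellite's passive sensor."""
    total_w = 0.0
    for d in distances_m:
        p_rx_dbw = tx_power_dbw + tx_gain_dbi + rx_gain_dbi - fspl_db(d, freq_hz)
        total_w += 10 ** (p_rx_dbw / 10)
    return 10 * math.log10(total_w)

# e.g., 1000 links at ~500 km slant range, illustrative gains, 150 GHz carrier
print(aggregate_rfi_dbw(-10, 30, 45, [5e5] * 1000, 150e9))
```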
Before-and-after study frameworks are widely adopted to evaluate the effectiveness of transportation policies and emerging technologies. However, factors such as seasonality, holidays, and lane closures might interfere with the evaluation by inducing variation in traffic volume between the before and after periods. In practice, limited effort has been made to eliminate the effects of these factors. In this study, an extreme gradient boosting (XGBoost)-based propensity score matching method is proposed to reduce the biases caused by traffic volume variation between the before and after periods. To evaluate the effectiveness of the proposed method, we selected a corridor in the City of Chandler, Arizona, where an advanced traffic signal control system was recently implemented. The results indicate that the proposed method effectively eliminates the variation in traffic volume caused by the global COVID-19 pandemic during the evaluation. In addition, the results of the t-test and the Kolmogorov-Smirnov (KS) test demonstrate that the proposed method outperforms other conventional propensity score matching methods. The proposed method is also transferable to other before-and-after evaluation studies and can significantly assist transportation engineers in eliminating the impact of traffic volume variation on the evaluation process.
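A minimal sketch of the general pattern, assuming xgboost and scikit-learn: fit a classifier to predict the period (before vs. after) from confounding covariates and match observations on the resulting propensity scores. Variable names and hyperparameters are illustrative, not those of the study.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.neighbors import NearestNeighbors

def xgb_propensity_matching(X, treated):
    """Illustrative XGBoost-based propensity score matching.

    X: covariates such as day of week, holiday flags, lane-closure indicators.
    treated: 1 for 'after'-period observations, 0 for 'before'-period ones.
    Returns pairs (after_index, matched_before_index)."""
    treated = np.asarray(treated)
    model = XGBClassifier(n_estimators=200, max_depth=3)
    model.fit(X, treated)
    scores = model.predict_proba(X)[:, 1]        # propensity of being 'after'

    after_idx = np.where(treated == 1)[0]
    before_idx = np.where(treated == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(scores[before_idx].reshape(-1, 1))
    _, matches = nn.kneighbors(scores[after_idx].reshape(-1, 1))
    return list(zip(after_idx, before_idx[matches.ravel()]))
```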
In many engineering applications it is useful to reason about "negative information". For example, in planning problems, providing an optimal solution amounts to giving a feasible solution (the "positive" information) together with a proof that no feasible solution is better than the one given (the "negative" information). We model negative information by introducing the concept of "norphisms", as opposed to the positive information of morphisms. A "nategory" is a category that has "nom"-sets in addition to hom-sets and specifies the interaction between norphisms and morphisms. In particular, we have composition rules of the form morphism + norphism $\to$ norphism. Norphisms do not compose by themselves; rather, they use morphisms as catalysts. After providing several applied examples, we connect nategories to enriched category theory. Specifically, we prove that categories enriched in de Paiva's dialectica categories GC, in the case C = Set and equipped with a modified monoidal product, define nategories which satisfy additional regularity properties. This formalizes negative information categorically in a way that makes negative and positive morphisms equal citizens.
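In symbols, a rule of the shape "morphism + norphism $\to$ norphism" combines a morphism and a norphism sharing an endpoint to produce a norphism; the two generic shapes such a rule can take are sketched below. The notation $\nrightarrow$ for norphisms is ours, and the paper specifies exactly which rules of these shapes a nategory carries.

```latex
\[
\frac{f : X \to Y \qquad n : Y \nrightarrow Z}{X \nrightarrow Z}
\qquad\qquad
\frac{n : X \nrightarrow Y \qquad g : Y \to Z}{X \nrightarrow Z}
\]
% No rule has the shape norphism + norphism -> norphism:
% norphisms compose only with morphisms acting as catalysts.
```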
Convolutional networks are considered shift invariant, but it has been demonstrated that their response may vary with the exact location of objects. In this paper we demonstrate that the most commonly investigated datasets exhibit a bias: objects are over-represented at the center of the image during training. This bias and the boundary condition of these networks can have a significant effect on their performance, and their accuracy drops significantly as an object approaches the boundary. We also demonstrate how this effect can be mitigated with data augmentation techniques.
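A common form of such mitigation is translation augmentation, which moves objects away from the image center during training; the torchvision snippet below is a minimal sketch, with the translation range and padding value as illustrative choices rather than the settings used in the paper.

```python
import torchvision.transforms as T

# Random translations counteract the centering bias of common datasets.
# The fill value determines the boundary condition introduced by the shift
# (zero padding here), which itself interacts with the boundary effect.
augment = T.Compose([
    T.RandomAffine(degrees=0, translate=(0.25, 0.25), fill=0),
    T.ToTensor(),
])
```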
With the rapid proliferation of smart mobile devices, federated learning (FL) has been widely considered for application in wireless networks for distributed model training. However, data heterogeneity, e.g., non-independent and identically distributed (non-IID) data and different sizes of training data among clients, poses major challenges to wireless FL. Limited communication resources complicate the implementation of the fair scheduling required for training on heterogeneous data and further degrade the overall performance. To address this issue, this paper focuses on performance analysis and optimization for wireless FL that accounts for data heterogeneity in combination with wireless resource allocation. Specifically, we first develop a closed-form expression for an upper bound on the FL loss function, with a particular emphasis on data heterogeneity described by a dataset size vector and a data divergence vector. Then we formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE). Next, via the Lyapunov drift technique, we transform the CRE optimization problem into a series of tractable problems. Extensive experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
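The Lyapunov-based transformation follows the standard virtual-queue / drift-plus-penalty pattern, sketched below in Python; the variable names, the per-round objective, and the constant V are illustrative and do not reproduce the paper's actual bound or scheduling routine.

```python
def virtual_queue_update(q, energy_used, energy_budget):
    """Virtual queue for the long-term energy constraint: the queue grows when
    a round exceeds the per-round budget and drains otherwise; keeping it
    stable enforces the constraint on average."""
    return max(q + energy_used - energy_budget, 0.0)

def drift_plus_penalty(loss_bound_term, q, energy_used, V):
    """Per-round objective of a drift-plus-penalty scheme: trade off the upper
    bound on the FL loss against queue-weighted energy use. Larger V puts more
    emphasis on the learning objective relative to the energy constraint."""
    return V * loss_bound_term + q * energy_used
```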