Energy is today the most critical environmental challenge. The amount of carbon emissions contributing to climate change is significantly influenced by both the production and consumption of energy. Measuring and reducing the energy consumption of services is a crucial step toward reducing adverse environmental effects caused by carbon emissions. Millions of websites rely on online advertisements to generate revenue, with most websites earning most or all of their revenues from ads. As a result, hundreds of billions of online ads are delivered daily to internet users to be rendered in their browsers. Both the delivery and rendering of each ad consume energy. This study investigates how much energy online ads use in the rendering process and offers a way for predicting it as part of rendering the ad. To the best of the authors' knowledge, this is the first study to calculate the energy usage of single advertisements in the rendering process. Our research further introduces different levels of consumption by which online ads can be classified based on energy efficiency. This classification will allow advertisers to add energy efficiency metrics and optimize campaigns towards consuming less possible.
The reconfigurable intelligent surface (RIS) is a promising technology that enables wireless communication systems to achieve improved performance by intelligently manipulating wireless channels. In this paper, we consider the sum-rate maximization problem in a downlink multi-user multi-input-single-output (MISO) channel via space-division multiple access (SDMA). Two major challenges of this problem are the high dimensionality due to the large number of RIS elements and the difficulty to obtain the full channel state information (CSI), which is assumed known in many algorithms proposed in the literature. Instead, we propose a hybrid machine learning approach using the weighted minimum mean squared error (WMMSE) precoder at the base station (BS) and a dedicated neural network (NN) architecture, RISnet, for RIS configuration. The RISnet has a good scalability to optimize 1296 RIS elements and requires partial CSI of only 16 RIS elements as input. We show it achieves a high performance with low requirement for channel estimation for geometric channel models obtained with ray-tracing simulation. The unsupervised learning lets the RISnet find an optimized RIS configuration by itself. Numerical results show that a trained model configures the RIS with low computational effort, considerably outperforms the baselines, and can work with discrete phase shifts.
In the field of monocular 3D detection, it is common practice to utilize scene geometric clues to enhance the detector's performance. However, many existing works adopt these clues explicitly such as estimating a depth map and back-projecting it into 3D space. This explicit methodology induces sparsity in 3D representations due to the increased dimensionality from 2D to 3D, and leads to substantial information loss, especially for distant and occluded objects. To alleviate this issue, we propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy. Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations. We treat these representations as Neural Radiance Fields (NeRF) and then employ volume rendering to recover RGB images and depth maps. To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception. Extensive experiments conducted on the KITTI-3D benchmark and Waymo Open Dataset demonstrate the effectiveness of MonoNeRD. Codes are available at //github.com/cskkxjk/MonoNeRD.
Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports the direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40k annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.
Grammatical error correction aims to correct ungrammatical sentences automatically. Recently, some work has demonstrated the excellent capabilities of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduced GrammarGPT, an open-source LLM, to preliminary explore its potential for native Chinese grammatical error correction. The core recipe of GrammarGPT is to leverage the hybrid dataset of ChatGPT-generated and human-annotated. For grammatical errors with clues, we proposed a heuristic method to guide ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collected ungrammatical sentences from publicly available websites and manually corrected them. In addition, we employed an error-invariant augmentation method to enhance the ability of the model to correct native Chinese grammatical errors. We ultimately constructed about 1k parallel data and utilized these data to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT outperforms the existing SOTA system significantly. Although model parameters are 20x larger than the SOTA baseline, the required amount of data for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT ranks $3^{rd}$ on NLPCC2023 SharedTask1, demonstrating our approach's effectiveness. The code and data are available at \url{//github.com/FreedomIntelligence/GrammarGPT}.
Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, cheaper than on-demand ones but that can be revoked at any time. Our framework encloses four modules: Pre-Scheduling, Initial Mapping, Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work \cite{brum2022sbac} by formally describing the Multi-FedLS resource manager framework and its modules. Experiments were conducted with three Cross-Silo FL applications on CloudLab and a proof-of-concept confirms that Multi-FedLS can be executed on a multi-cloud composed by AWS and GCP, two commercial cloud providers. Results show that the problem of executing Cross-Silo FL applications in multi-cloud environments with preemptible VMs can be efficiently resolved using a mathematical formulation, fault tolerance techniques, and a simple heuristic to choose a new VM in case of revocation.
The emergence of new communication technologies allows us to expand our understanding of distributed control and consider collaborative decision-making paradigms. With collaborative algorithms, certain local decision-making entities (or agents) are enabled to communicate and collaborate on their actions with one another to attain better system behavior. By limiting the amount of communication, these algorithms exist somewhere between centralized and fully distributed approaches. To understand the possible benefits of this inter-agent collaboration, we model a multi-agent system as a common-interest game in which groups of agents can collaborate on their actions to jointly increase the system welfare. We specifically consider $k$-strong Nash equilibria as the emergent behavior of these systems and address how well these states approximate the system optimal, formalized by the $k$-strong price of anarchy ratio. Our main contributions are in generating tight bounds on the $k$-strong price of anarchy in finite resource allocation games as the solution to a tractable linear program. By varying $k$ --the maximum size of a collaborative coalition--we observe exactly how much performance is gained from inter-agent collaboration. To investigate further opportunities for improvement, we generate upper bounds on the maximum attainable $k$-strong price of anarchy when the agents' utility function can be designed.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.
User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, {\em predictive engagement}, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can improve automatic evaluation metrics for open-domain dialogue systems, as shown by correlation with human judgements. This suggests that predictive engagement can be used as a real-time feedback for training better dialogue models.
Image segmentation is still an open problem especially when intensities of the interested objects are overlapped due to the presence of intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed where Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined and membership functions of the clusters described by the level set function are introduced to rewrite the energy as a data term of the proposed model. Similar to popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colourful images and images with multiple objects, respectively. It has been extensively tested on both synthetic and real images that are widely used in the literature and public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate that advantages of the proposed model in terms of bias correction and segmentation accuracy.