This paper provides an overview of the Internet of Things (IoT) and its significance. It discusses the concept of Man-in-the-Middle (MitM) attacks in detail, including their causes, potential solutions, and challenges in detecting and preventing such attacks. The paper also addresses the current issues related to IoT security and explores future methods and facilities for improving detection and prevention mechanisms against MitM.
This paper deals with the Multi-robot Exploration (MRE) under communication constraints problem. We propose a novel intermittent rendezvous method that allows robots to explore an unknown environment while sharing maps at rendezvous locations through agreements. In our method, robots update the agreements to spread the rendezvous locations during the exploration and prioritize exploring unknown areas near them. To generate the agreements automatically, we reduced the MRE to instances of the Job Shop Scheduling Problem (JSSP) and ensured intermittent communication through a temporal connectivity graph. We evaluate our method in simulation in various virtual urban environments and a Gazebo simulation using the Robot Operating System (ROS). Our results suggest that our method can be better than using relays or maintaining intermittent communication with a base station since we can explore faster without additional hardware to create a relay network.
In constrained parameter estimation, the classical constrained Cramer-Rao bound (CCRB) and the recent Lehmann-unbiased CCRB (LU-CCRB) are lower bounds on the performance of mean-unbiased and Lehmann-unbiased estimators, respectively. Both the CCRB and the LU-CCRB require differentiability of the likelihood function, which can be a restrictive assumption. Additionally, these bounds are local bounds that are inappropriate for predicting the threshold phenomena of the constrained maximum likelihood (CML) estimator. The constrained Barankin-type bound (CBTB) is a nonlocal mean-squared-error (MSE) lower bound for constrained parameter estimation that does not require differentiability of the likelihood function. However, this bound requires a restrictive mean-unbiasedness condition in the constrained set. In this work, we propose the Lehmann-unbiased CBTB (LU-CBTB) on the weighted MSE (WMSE). This bound does not require differentiability of the likelihood function and assumes uniform Lehmann-unbiasedness, which is less restrictive than the CBTB uniform mean-unbiasedness. We show that the LU-CBTB is tighter than or equal to the LU-CCRB and coincides with the CBTB for linear constraints. For nonlinear constraints the LU-CBTB and the CBTB are different and the LU-CBTB can be a lower bound on the WMSE of constrained estimators in cases, where the CBTB is not. In the simulations, we consider direction-of-arrival estimation of an unknown constant modulus discrete signal. In this case, the likelihood function is not differentiable and constrained Cramer-Rao-type bounds do not exist, while CBTBs exist. It is shown that the LU-CBTB better predicts the CML estimator performance than the CBTB, since the CML estimator is Lehmann-unbiased but not mean-unbiased.
This study explores the capabilities of Large Language Models (LLMs), particularly OpenAI's ChatGPT, in addressing the challenges associated with software modeling, explicitly focusing on the bidirectional traceability problem between design models and code. The objective of this study is to demonstrate the proficiency of ChatGPT in understanding and integrating specific requirements into design models and code and its potential to offer solutions to the bidirectional traceability problem through a case study. The findings indicate that ChatGPT is capable of generating design models and code from natural language requirements, thereby bridging the gap between these requirements and software modeling. Despite its limitations in suggesting a specific method to resolve the problem using ChatGPT itself, it exhibited the capacity to provide corrections to be consistent between design models and code. As a result, the study concludes that achieving bidirectional traceability between design models and code is feasible using ChatGPT.
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various deepfake detection and text fake news detection methods have been proposed, they are only designed for single-modality forgery based on binary classification, let alone analyzing and reasoning subtle forgery traces across different modalities. In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM^4). DGM^4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content, which requires deeper reasoning of multi-modal media manipulation. To support a large-scale investigation, we construct the first DGM^4 dataset, where image-text pairs are manipulated by various approaches, with rich annotation of diverse manipulations. Moreover, we propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities. HAMMER performs 1) manipulation-aware contrastive learning between two uni-modal encoders as shallow manipulation reasoning, and 2) modality-aware cross-attention by multi-modal aggregator as deep manipulation reasoning. Dedicated manipulation detection and grounding heads are integrated from shallow to deep levels based on the interacted multi-modal information. To exploit more fine-grained contrastive learning for cross-modal semantic alignment, we further integrate Manipulation-Aware Contrastive Loss with Local View and construct a more advanced model HAMMER++. Finally, we build an extensive benchmark and set up rigorous evaluation metrics for this new research problem. Comprehensive experiments demonstrate the superiority of HAMMER and HAMMER++.
With Ethereum's transition from Proof-of-Work to Proof-of-Stake in September 2022 came another paradigm shift, the Proposer-Builder Separation (PBS) scheme. PBS was introduced to decouple the roles of selecting and ordering transactions in a block (i.e., the builder), from those validating its contents and proposing the block to the network as the new head of the blockchain (i.e., the proposer). In this landscape, proposers are the validators in the Proof-of-Stake consensus protocol, while now relying on specialized block builders for creating blocks with the highest value for the proposer. Additionally, relays act as mediators between builders and proposers. We study PBS adoption and show that the current landscape exhibits significant centralization amongst the builders and relays. Further, we explore whether PBS effectively achieves its intended objectives of enabling hobbyist validators to maximize block profitability and preventing censorship. Our findings reveal that although PBS grants validators the opportunity to access optimized and competitive blocks, it tends to stimulate censorship rather than reduce it. Additionally, we demonstrate that relays do not consistently uphold their commitments and may prove unreliable. Specifically, proposers do not always receive the complete promised value, and the censorship or filtering capabilities pledged by relays exhibit significant gaps.
We present a novel approach to robust pose graph optimization based on Graduated Non-Convexity (GNC). Unlike traditional GNC-based methods, the proposed approach employs an adaptive shape function using B-spline to optimize the shape of the robust kernel. This aims to reduce GNC iterations, boosting computational speed without compromising accuracy. When integrated with the open-source riSAM algorithm, the method demonstrates enhanced efficiency across diverse datasets. Accompanying open-source code aims to encourage further research in this area. //github.com/SNU-DLLAB/AGNC-PGO
This paper explores the imperative need and methodology for developing a localized Large Language Model (LLM) tailored for Arabic, a language with unique cultural characteristics that are not adequately addressed by current mainstream models like ChatGPT. Key concerns additionally arise when considering cultural sensitivity and local values. To this end, the paper outlines a packaged solution, including further pre-training with Arabic texts, supervised fine-tuning (SFT) using native Arabic instructions and GPT-4 responses in Arabic, and reinforcement learning with AI feedback (RLAIF) using a reward model that is sensitive to local culture and values. The objective is to train culturally aware and value-aligned Arabic LLMs that can serve the diverse application-specific needs of Arabic-speaking communities. Extensive evaluations demonstrated that the resulting LLM called `AceGPT' is the SOTA open Arabic LLM in various benchmarks, including instruction-following benchmark (i.e., Arabic Vicuna-80 and Arabic AlpacaEval), knowledge benchmark (i.e., Arabic MMLU and EXAMs), as well as the newly-proposed Arabic cultural \& value alignment benchmark. Notably, AceGPT outperforms ChatGPT in the popular Vicuna-80 benchmark when evaluated with GPT-4, despite the benchmark's limited scale. % Natural Language Understanding (NLU) benchmark (i.e., ALUE) Codes, data, and models are in //github.com/FreedomIntelligence/AceGPT.
This paper updates the survey of AI accelerators and processors from past three years. This paper collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Two new trends plots based on accelerator release dates are included in this year's paper, along with the additional trends of some neuromorphic, photonic, and memristor-based inference accelerators.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.