By opportunistically engaging mobile users (workers), mobile crowdsensing (MCS) networks have emerged as an important approach to facilitating the sharing of data sensed and gathered by heterogeneous mobile devices. To assign tasks among workers while keeping overheads low, this paper introduces a series of stable matching mechanisms, which are integrated into a novel hybrid service trading paradigm combining a futures trading mode and a spot trading mode to ensure seamless MCS service provisioning. In the futures trading mode, we determine a set of long-term workers for each task through an overbooking-enabled in-advance many-to-many matching (OIA3M) mechanism, while characterizing the associated risks through statistical analysis. In the spot trading mode, we investigate how fluctuations in long-term workers' resources lead to violations of tasks' service quality requirements, and formalize the spot trading mode for tasks with violated service quality requirements under practical budget constraints, where the task-worker mapping is carried out via onsite many-to-many matching (O3M) and onsite many-to-one matching (OMOM). We theoretically show that our proposed matching mechanisms satisfy stability, individual rationality, fairness, and computational efficiency. Comprehensive evaluations verify that these properties hold under practical network settings, while revealing commendable performance in terms of running time, participant interactions, and service quality.
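As a hedged illustration of the kind of stable task-worker matching such mechanisms build on (not the paper's OIA3M/O3M/OMOM designs themselves), the sketch below implements a generic worker-proposing deferred-acceptance matching with task capacities; the preference lists and capacities are made-up inputs.

```python
# Minimal sketch: worker-proposing deferred acceptance with task capacities.
# This is a generic stable-matching routine, not the paper's actual mechanisms.
from collections import defaultdict

def stable_many_to_one(worker_prefs, task_prefs, capacity):
    """worker_prefs: {worker: [tasks, most preferred first]}
       task_prefs:   {task: [workers, most preferred first]}
       capacity:     {task: max number of workers it can hold}"""
    rank = {t: {w: i for i, w in enumerate(ws)} for t, ws in task_prefs.items()}
    next_choice = {w: 0 for w in worker_prefs}   # index of next task to propose to
    assigned = defaultdict(list)                 # task -> currently held workers
    free = list(worker_prefs)
    while free:
        w = free.pop()
        if next_choice[w] >= len(worker_prefs[w]):
            continue                             # worker exhausted its preference list
        t = worker_prefs[w][next_choice[w]]
        next_choice[w] += 1
        assigned[t].append(w)
        if len(assigned[t]) > capacity[t]:
            # task rejects its least-preferred held worker, who proposes again later
            worst = max(assigned[t], key=lambda x: rank[t][x])
            assigned[t].remove(worst)
            free.append(worst)
    return dict(assigned)

workers = {"w1": ["t1", "t2"], "w2": ["t1", "t2"], "w3": ["t2", "t1"]}
tasks = {"t1": ["w2", "w1", "w3"], "t2": ["w1", "w3", "w2"]}
print(stable_many_to_one(workers, tasks, {"t1": 1, "t2": 2}))
```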
The deployment of federated learning (FL) within vertical heterogeneous networks, such as those enabled by high-altitude platform stations (HAPS), offers the opportunity to engage a wide array of clients, each endowed with distinct communication and computational capabilities. This diversity not only enhances the training accuracy of FL models but also hastens their convergence. Yet, applying FL in these expansive networks presents notable challenges, particularly the significant non-IIDness of client data distributions. Such data heterogeneity often results in slower convergence and reduced model training performance. Our study introduces a client selection strategy tailored to address this issue by leveraging user network traffic behaviour. This strategy predicts and classifies clients based on their network usage patterns while prioritizing user privacy. By strategically selecting clients whose data exhibit similar patterns for participation in FL training, our approach fosters a more uniform and representative data distribution across the network. Our simulations demonstrate that this targeted client selection methodology significantly reduces the training loss of FL models in HAPS networks, thereby effectively tackling a crucial challenge in implementing large-scale FL systems.
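To make the selection idea concrete, the following sketch clusters clients by toy traffic features and picks one cluster of similar clients per FL round. The feature set, the use of k-means, and the round-robin selection rule are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: cluster clients by network-traffic features, then select
# clients from one cluster per round so participants have similar data patterns.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy per-client traffic features, e.g. [mean throughput, burstiness, video share]
traffic = rng.random((100, 3))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(traffic)

def select_clients(round_idx, per_round=10):
    """Pick clients from the cluster assigned to this round (round-robin over clusters)."""
    cluster = round_idx % kmeans.n_clusters
    members = np.flatnonzero(kmeans.labels_ == cluster)
    return rng.choice(members, size=min(per_round, members.size), replace=False)

print(select_clients(round_idx=3))
```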
With the increasing demand for computing capability under limited resource and power budgets, it is crucial to deploy applications to customized accelerators like FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, they are limited in scope and capability to support sufficient FPGA-oriented optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an optimizing framework built on multi-level intermediate representation (MLIR). POM has several features that demonstrate its scope and capability for performance optimization. First, most HLS tools depend exclusively on a single-level IR to perform all optimizations, which packs excessive information into the IR and makes debugging an arduous task. In contrast, POM introduces three layers of IR to perform operations at suitable abstraction levels, streamlining implementation and debugging and offering better flexibility, extensibility, and a more systematic design. Second, POM integrates the polyhedral model into MLIR, enabling advanced dependence analysis and various FPGA-oriented loop transformations. By representing nested loops with integer sets and maps, loop transformations can be conducted conveniently through manipulations of polyhedral semantics. Finally, to further reduce design effort, POM provides a user-friendly programming interface (DSL) that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine is provided to efficiently search for high-performance optimization schemes and generate optimized accelerators automatically. Experimental results show that POM achieves a $6.46\times$ average speedup on typical benchmark suites and a $6.06\times$ average speedup on real-world applications compared to the state-of-the-art.
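As a rough illustration of what an automatic DSE engine does (POM's actual cost model, scheduling primitives, and search strategy are not described in this abstract), the sketch below brute-forces loop tiling and unrolling factors against a toy latency and resource model.

```python
# Minimal sketch: exhaustive design space exploration over tiling/unrolling
# factors with a purely illustrative latency/DSP model and budget.
import itertools

TRIP_COUNT = 1024          # assumed loop trip count
DSP_BUDGET = 128           # assumed FPGA DSP budget

def estimate(tile, unroll):
    """Toy cost model: unrolling consumes DSPs, tiling trades latency for data reuse."""
    dsps = unroll * 2
    latency = (TRIP_COUNT / unroll) * (1.0 + 0.1 * (TRIP_COUNT / tile))
    return latency, dsps

best = None
for tile, unroll in itertools.product([16, 32, 64, 128], [1, 2, 4, 8, 16]):
    latency, dsps = estimate(tile, unroll)
    if dsps <= DSP_BUDGET and (best is None or latency < best[0]):
        best = (latency, tile, unroll)

print(f"best latency={best[0]:.0f} cycles with tile={best[1]}, unroll={best[2]}")
```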
Service robots are becoming increasingly integrated into our daily lives to help us with various tasks. In such environments, robots frequently encounter new objects while working and need to learn them in an open-ended fashion. Furthermore, such robots must be able to recognize a wide range of object categories. In this paper, we present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem. In particular, we form ensemble methods based on deep representations and handcrafted 3D shape descriptors. To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly. The proposed model is suitable for open-ended learning scenarios where the number of 3D object categories is not fixed and can grow over time. We have performed extensive sets of experiments to assess the performance of the proposed approach in offline and open-ended scenarios. For evaluation purposes, in addition to real object datasets, we generate a large synthetic household-objects dataset consisting of 27,000 views of 90 objects. Experimental results demonstrate the effectiveness of the proposed method on online few-shot 3D object recognition tasks, as well as its superior performance over state-of-the-art open-ended learning approaches. Furthermore, our results show that while ensemble learning is modestly beneficial in offline settings, it is significantly beneficial in lifelong few-shot learning situations. Additionally, we demonstrate the effectiveness of our approach in both simulated and real-robot settings, where the robot rapidly learns new categories from limited examples.
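The sketch below illustrates the general memory-based ensemble idea with stubbed feature extractors, per-representation instance memories, and nearest-neighbour voting; the descriptors, distance, and voting rule are placeholders rather than the paper's actual components.

```python
# Minimal sketch: each representation keeps its own instance memory; a query is
# classified by nearest-neighbour lookup per representation plus a majority vote.
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(0)
P_deep = rng.normal(size=(64, 32))     # stand-in for a deep feature extractor
P_shape = rng.normal(size=(64, 16))    # stand-in for a handcrafted 3D descriptor
EXTRACTORS = {"deep": lambda x: x @ P_deep, "shape": lambda x: x @ P_shape}
memory = {name: defaultdict(list) for name in EXTRACTORS}   # rep -> label -> instances

def teach(view, label):
    """Store the new view instantly in every representation's memory."""
    for name, extract in EXTRACTORS.items():
        memory[name][label].append(extract(view))

def recognise(view):
    votes = []
    for name, extract in EXTRACTORS.items():
        q = extract(view)
        label, _ = min(
            ((lbl, np.linalg.norm(q - inst))
             for lbl, insts in memory[name].items() for inst in insts),
            key=lambda t: t[1],
        )
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]

teach(rng.random(64), "mug")
teach(rng.random(64), "bowl")
print(recognise(rng.random(64)))
```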
Clarifying users' information needs is an essential component of modern search systems. While most approaches for constructing clarifying prompts rely on query facets, the impact of facet quality remains relatively unexplored. In this work, we concentrate on facet quality through the notion of facet coherency and assess its importance for the overall usefulness of clarification in search. We find that existing evaluation procedures do not account for facet coherency, as evidenced by the poor correlation between coherency and automated metrics. Moreover, we propose a coherency classifier and assess the prevalence of incoherent facets in a well-established clarification dataset. Our findings can serve as motivation for future work on the topic.
Interactive user interfaces have increasingly explored AI's role in enhancing communication efficiency and productivity in collaborative tasks. AI tools such as chatbots and smart replies aim to enhance conversation quality and improve team performance. Early AI assistants were limited by predefined knowledge bases and decision trees. However, the advent of Large Language Models (LLMs) such as ChatGPT has revolutionized AI assistants, employing advanced deep learning architectures to generate context-aware, coherent, and personalized responses. Consequently, ChatGPT-based AI assistants provide a more natural and efficient user experience across various tasks and domains. In this paper, we study how LLMs such as ChatGPT can be used to improve work efficiency in collaborative workplaces. Specifically, we present an LLM-based Smart Reply (LSR) system that utilizes ChatGPT to generate personalized responses in daily collaborative scenarios, while adapting to context and communication style based on prior responses. Our two-step process first generates a preliminary response type (e.g., Agree, Disagree) to provide a generalized direction for message generation, thus reducing response drafting time. We conducted an experiment in which participants completed simulated work tasks, involving a Dual N-back test and subtask scheduling through Google Calendar, while interacting with researchers posing as co-workers. Our findings indicate that the proposed LSR reduces overall workload, as measured by the NASA TLX, and improves work performance and productivity in the N-back task. We also provide qualitative feedback on participants' experiences as well as design recommendations to inform future directions for the design of these technologies.
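A minimal sketch of the two-step flow described above, with a canned stand-in for the LLM call; the prompts, response types, and the `call_llm` helper are illustrative assumptions, not the system's actual interface.

```python
# Minimal sketch: step 1 chooses a coarse response type, step 2 drafts the reply
# conditioned on that type and the user's prior style.
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM backend (e.g. a ChatGPT API call); returns canned text here."""
    return "Agree" if "Choose ONE response type" in prompt else "Sounds good - I can take the 2 pm slot."

RESPONSE_TYPES = ["Agree", "Disagree", "Ask for details", "Defer"]

def smart_reply(conversation: str, user_style_examples: str) -> str:
    # Step 1: pick a response type to give the reply a general direction.
    reply_type = call_llm(
        f"Given this workplace conversation:\n{conversation}\n"
        f"Choose ONE response type from {RESPONSE_TYPES} for the next reply."
    )
    # Step 2: draft the full reply conditioned on the chosen type and prior style.
    return call_llm(
        f"Conversation:\n{conversation}\n"
        f"Write a short reply that expresses '{reply_type}', matching the tone of these "
        f"earlier messages by the user:\n{user_style_examples}"
    )

print(smart_reply("Co-worker: Can you cover the 2 pm scheduling subtask?",
                  "Sure thing! / Happy to help."))
```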
The performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which frequently lead to tracking failure. However, the development of modern tracking architectures requires large training sets, which do not exist in transparent object tracking. We present two contributions addressing these issues. We propose Trans2k, the first transparent object tracking training dataset, which consists of over 2k sequences with 104,343 images overall, annotated with bounding boxes and segmentation masks. Standard trackers trained on this dataset consistently improve by up to 16%. Our second contribution is a new distractor-aware transparent object tracker (DiTra) that treats localization accuracy and target identification as separate tasks and implements them with a novel architecture. DiTra sets a new state-of-the-art in transparent object tracking and generalizes well to opaque objects.
We investigate multiuser uplink communication from multiple single-antenna users to a base station (BS) that is equipped with a movable-antenna (MA) array and adopts zero-forcing receivers to decode the multiple signals. We aim to optimize the MAs' positions at the BS to minimize the total transmit power of all users subject to the minimum rate requirement. After applying suitable transformations, we show that the problem is equivalent to minimizing the sum of the reciprocals of the eigenvalues of a matrix that is a function of all MAs' positions. Subsequently, the projected gradient descent (PGD) method is utilized to find a locally optimal solution. In particular, unlike the latest related work, we exploit the eigenvalue decomposition to derive a closed-form gradient for the PGD, which greatly facilitates practical implementation. We demonstrate by simulations that, via careful optimization of all MAs' positions in our proposed design, the total transmit power of all users can be decreased significantly compared to competitive benchmarks.
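A hedged numerical sketch of this approach under a simplified single-path line-of-sight model: positions are updated by projected gradient descent, and the gradient of the objective $\mathrm{tr}(A^{-1}) = \sum_i 1/\lambda_i(A)$ with $A(\mathbf{p}) = H(\mathbf{p})^H H(\mathbf{p})$ is obtained in closed form through the eigendecomposition, using $\mathrm{d}\,\mathrm{tr}(A^{-1}) = -\mathrm{tr}(A^{-2}\,\mathrm{d}A)$. The channel model, user angles, aperture, and step size are assumptions for illustration, not the paper's setup.

```python
# Minimal sketch: PGD over 1-D movable-antenna positions minimizing trace(A^{-1}),
# with the gradient formed from the eigendecomposition of A = H^H H.
import numpy as np

rng = np.random.default_rng(1)
N, K, wavelength, aperture = 8, 4, 1.0, 8.0          # antennas, users, assumed geometry
theta = rng.uniform(-np.pi / 3, np.pi / 3, K)        # assumed user angles
k0 = 2 * np.pi / wavelength

def channel(p):
    """N x K steering matrix for antennas at positions p (single-path model)."""
    return np.exp(1j * k0 * np.outer(p, np.sin(theta)))

def objective_and_grad(p):
    H = channel(p)
    A = H.conj().T @ H                               # K x K Hermitian
    lam, V = np.linalg.eigh(A)
    f = np.sum(1.0 / lam)                            # = trace(A^{-1})
    A_inv2 = (V / lam**2) @ V.conj().T               # A^{-2} via the eigendecomposition
    grad = np.zeros(N)
    for m in range(N):
        dH = np.zeros_like(H)
        dH[m, :] = 1j * k0 * np.sin(theta) * H[m, :] # derivative of row m w.r.t. p[m]
        dA = dH.conj().T @ H + H.conj().T @ dH
        grad[m] = -np.real(np.trace(A_inv2 @ dA))    # d trace(A^{-1}) / dp_m
    return f, grad

p = np.sort(rng.uniform(0, aperture, N))             # initial antenna positions
step = 0.05
for _ in range(200):
    f, g = objective_and_grad(p)
    p = np.clip(p - step * g, 0.0, aperture)         # projection onto the aperture
print(f"final objective trace(A^-1) = {objective_and_grad(p)[0]:.3f}")
```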
Collaborative Edge Computing (CEC) is a new edge computing paradigm that enables neighboring edge servers to share computational resources with each other. Although CEC can enhance the utilization of computational resources, it still suffers from resource waste. The primary reason is that end-users from the same area are likely to offload similar tasks to edge servers, thereby leading to duplicate computations. To improve system efficiency, the computation results of previously executed tasks can be cached and then reused by subsequent tasks. However, most existing computation reuse algorithms only consider one edge server, which significantly limits the effectiveness of computation reuse. To address this issue, this paper applies computation reuse in CEC networks to exploit the collaboration among edge servers. We formulate an optimization problem that aims to minimize the overall task response time and decompose it into a caching subproblem and a scheduling subproblem. By analyzing the properties of optimal solutions, we show that the optimal caching decisions can be efficiently searched using the bisection method. For the scheduling subproblem, we utilize projected gradient descent and backtracking to find a local minimum. Numerical results show that our algorithm significantly reduces the response time in various situations.
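The following is a generic projected-gradient-with-backtracking routine of the kind used for the scheduling subproblem; the quadratic objective and the box feasible set are placeholders for the paper's response-time model and scheduling constraints.

```python
# Minimal sketch: projected gradient descent with Armijo backtracking on a toy
# convex objective over a box constraint (placeholder feasible set).
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random((5, 5)); Q = Q @ Q.T + np.eye(5)       # toy objective 0.5 x^T Q x - b^T x
b = rng.random(5)

f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
project = lambda x: np.clip(x, 0.0, 1.0)              # projection onto the box [0, 1]^5

x = project(rng.random(5))
for _ in range(100):
    g = grad(x)
    step, fx = 1.0, f(x)
    while True:                                       # backtracking (Armijo) line search
        x_new = project(x - step * g)
        if f(x_new) <= fx - 1e-4 * g @ (x - x_new) or step < 1e-10:
            break
        step *= 0.5
    if np.linalg.norm(x - x_new) < 1e-8:              # stationary point of the projected step
        break
    x = x_new
print("local minimum:", np.round(x, 3), "objective:", round(f(x), 4))
```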
Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas -- feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR, and YAGO3-10 datasets, respectively. The results validate our central hypothesis -- that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.
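A hedged sketch of the three core operations on a single entity-relation pair: permute the embedding components, interleave them into a 2D reshaping (a simplified stand-in for the paper's chequered reshaping), and apply circular (wrap-around) convolution. The dimensions and single random permutation are illustrative; the full model with multiple permutations and score projection is omitted.

```python
# Minimal sketch: feature permutation + interleaved reshaping + circular convolution.
import torch
import torch.nn.functional as F

d, H, W = 100, 10, 20          # embedding dim and reshaping height/width (H * W = 2d)
e = torch.randn(d)             # entity embedding
r = torch.randn(d)             # relation embedding

perm = torch.randperm(d)       # feature permutation shared by entity and relation
e_p, r_p = e[perm], r[perm]

# Interleave entity and relation components into a 2D grid (simplified reshaping).
stacked = torch.stack([e_p, r_p], dim=1).reshape(H, W)

x = stacked.view(1, 1, H, W)                               # (batch, channel, H, W)
kernel = torch.randn(32, 1, 3, 3)                          # 32 convolutional filters
x = F.pad(x, (1, 1, 1, 1), mode="circular")                # wrap-around padding
feature_maps = F.conv2d(x, kernel)                         # circular convolution
print(feature_maps.shape)                                  # torch.Size([1, 32, 10, 20])
```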
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer either from a high computational cost that grows exponentially with the number of GCN layers, or from a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) while using much less memory (2.2GB vs. 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all existing GCN training algorithms fail to train due to out-of-memory issues. Finally, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.
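A hedged sketch of the training loop described above: the graph is partitioned into clusters (here a random partition stands in for a graph clustering algorithm such as METIS), and each SGD step runs a small GCN only on one cluster's induced subgraph, so memory scales with the cluster rather than the whole graph. The two-layer model, toy graph, and normalization are simplifications.

```python
# Minimal sketch: Cluster-GCN-style training where each step touches one cluster.
import torch

num_nodes, feat_dim, hidden, num_classes, num_clusters = 1000, 32, 64, 7, 10
X = torch.randn(num_nodes, feat_dim)
A = (torch.rand(num_nodes, num_nodes) < 0.01).float()      # toy dense adjacency
y = torch.randint(num_classes, (num_nodes,))
clusters = torch.randint(num_clusters, (num_nodes,))       # placeholder for METIS partition

W1 = torch.randn(feat_dim, hidden, requires_grad=True)
W2 = torch.randn(hidden, num_classes, requires_grad=True)
opt = torch.optim.Adam([W1, W2], lr=1e-2)

def gcn_forward(A_sub, X_sub):
    """Two GCN layers restricted to one cluster: row-normalized A_hat X W."""
    A_hat = A_sub + torch.eye(A_sub.size(0))                # add self-loops
    A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)          # simple row normalization
    h = torch.relu(A_hat @ X_sub @ W1)
    return A_hat @ h @ W2

for step in range(100):
    c = step % num_clusters                                 # pick one cluster per step
    idx = torch.nonzero(clusters == c).squeeze(1)
    logits = gcn_forward(A[idx][:, idx], X[idx])            # neighborhood search stays in the cluster
    loss = torch.nn.functional.cross_entropy(logits, y[idx])
    opt.zero_grad(); loss.backward(); opt.step()
print("final mini-cluster loss:", round(loss.item(), 3))
```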