Mental health is a pressing concern in today's digital age, particularly among youth who are deeply intertwined with technology. Despite the influx of technology solutions addressing mental health issues, youth often remain sidelined during the design process. While co-design methods have been employed to improve participation by youth, many such initiatives are limited to design activities and lack training for youth to research and develop solutions for themselves. In this case study, we detail our 8-week remote, collaborative research initiative called Youth WellTech, designed to facilitate remote co-design sprints aimed at equipping youth with the tools and knowledge to envision and design tech futures for their own communities. We pilot this initiative with 12 student technology evangelists across 8 countries globally to foster the sharing of mental health challenges and diverse perspectives. We highlight insights from our experiences running this global program remotely, its structure, and recommendations for co-research.
In recent years, the expansion of internet technology and advancements in automation have brought significant attention to autonomous driving technology. Major automobile manufacturers, including Volvo, Mercedes-Benz, and Tesla, have progressively introduced products ranging from assisted-driving vehicles to semi-autonomous vehicles. However, this period has also witnessed several traffic safety incidents involving self-driving vehicles. For instance, in March 2016, a Google self-driving car was involved in a minor collision with a bus. At the time of the accident, the autonomous vehicle was attempting to merge into the right lane but failed to dynamically respond to the real-time environmental information during the lane change. It incorrectly assumed that the approaching bus would slow down to avoid it, leading to a low-speed collision with the bus. This incident highlights the current technological shortcomings and safety concerns associated with autonomous lane-changing behavior, despite the rapid advancements in autonomous driving technology. Lane-changing is among the most common and hazardous behaviors in highway driving, significantly impacting traffic safety and flow. Therefore, lane-changing is crucial for traffic safety, and accurately predicting drivers' lane change intentions can markedly enhance driving safety. This paper introduces a deep learning-based prediction method for autonomous driving lane change behavior, aiming to facilitate safe lane changes and thereby improve road safety.
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We aim to solve this problem via online distributionally robust Markov decision processes (DRMDPs), where the learning algorithm actively interacts with the source domain while seeking the optimal performance under the worst possible dynamics that is within an uncertainty set of the source domain's transition kernel. We provide the first study on online DRMDPs with function approximation for off-dynamics RL. We find that DRMDPs' dual formulation can induce nonlinearity, even when the nominal transition kernel is linear, leading to error propagation. By designing a $d$-rectangular uncertainty set using the total variation distance, we remove this additional nonlinearity and bypass the error propagation. We then introduce DR-LSVI-UCB, the first provably efficient online DRMDP algorithm for off-dynamics RL with function approximation, and establish a polynomial suboptimality bound that is independent of the state and action space sizes. Our work makes the first step towards a deeper understanding of the provable efficiency of online DRMDPs with linear function approximation. Finally, we substantiate the performance and robustness of DR-LSVI-UCB through different numerical experiments.
State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial task-specific fine-tuning across multiple languages, often perform poorly in languages with minimal representation in the pretraining corpus, and struggle to incorporate new languages after the pretraining phase. In this work, we present a novel modular dense retrieval model that learns from the rich data of a single high-resource language and effectively zero-shot transfers to a wide array of languages, thereby eliminating the need for language-specific labeled data. Our model, ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages. Further analysis reveals that our modular approach is highly data-efficient, effectively adapts to out-of-distribution data, and significantly reduces energy consumption and carbon emissions. By demonstrating its proficiency in zero-shot scenarios, ColBERT-XM marks a shift towards more sustainable and inclusive retrieval systems, enabling effective information accessibility in numerous languages. We publicly release our code and models for the community.
With the rapid development of Mobile Edge Computing (MEC), various real-time applications have been deployed to benefit people's daily lives. The performance of these applications relies heavily on the freshness of collected environmental information, which can be quantified by its Age of Information (AoI). In the traditional definition of AoI, it is assumed that the status information can be actively sampled and directly used. However, for many MEC-enabled applications, the desired status information is updated in an event-driven manner and necessitates data processing. To better serve these applications, we propose a new definition of AoI and, based on the redefined AoI, we formulate an online AoI minimization problem for MEC systems. Notably, the problem can be interpreted as a Markov Decision Process (MDP), thus enabling its solution through Reinforcement Learning (RL) algorithms. Nevertheless, the traditional RL algorithms are designed for MDPs with completely unknown system dynamics and hence usually suffer long convergence times. To accelerate the learning process, we introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness. Numerical results demonstrate that our algorithm outperforms the benchmarks under various scenarios.
We introduce CyberDemo, a novel approach to robotic imitation learning that leverages simulated human demonstrations for real-world tasks. By incorporating extensive data augmentation in a simulated environment, CyberDemo outperforms traditional in-domain real-world demonstrations when transferred to the real world, handling diverse physical and visual conditions. Regardless of its affordability and convenience in data collection, CyberDemo outperforms baseline methods in terms of success rates across various tasks and exhibits generalizability with previously unseen objects. For example, it can rotate novel tetra-valve and penta-valve, despite human demonstrations only involving tri-valves. Our research demonstrates the significant potential of simulated human demonstrations for real-world dexterous manipulation tasks. More details can be found at //cyber-demo.github.io
The integration of medical imaging, computational analysis, and robotic technology has brought about a significant transformation in minimally invasive surgical procedures, particularly in the realm of laparoscopic rectal surgery (LRS). This specialized surgical technique, aimed at addressing rectal cancer, requires an in-depth comprehension of the spatial dynamics within the narrow space of the pelvis. Leveraging Magnetic Resonance Imaging (MRI) scans as a foundational dataset, this study incorporates them into Computer-Aided Design (CAD) software to generate precise three-dimensional (3D) reconstructions of the patient's anatomy. At the core of this research is the analysis of the surgical workspace, a critical aspect in the optimization of robotic interventions. Sophisticated computational algorithms process MRI data within the CAD environment, meticulously calculating the dimensions and contours of the pelvic internal regions. The outcome is a nuanced understanding of both viable and restricted zones during LRS, taking into account factors such as curvature, diameter variations, and potential obstacles. This paper delves deeply into the complexities of workspace analysis for robotic LRS, illustrating the seamless collaboration between medical imaging, CAD software, and surgical robotics. Through this interdisciplinary approach, the study aims to surpass traditional surgical methodologies, offering novel insights for a paradigm shift in optimizing robotic interventions within the complex environment of the pelvis.
The fiercely competitive business environment and increasingly personalized customization needs are driving the digital transformation and upgrading of the manufacturing industry. IIoT intelligence, which can provide innovative and efficient solutions for various aspects of the manufacturing value chain, illuminates the path of transformation for the manufacturing industry. It's time to provide a systematic vision of IIoT intelligence. However, existing surveys often focus on specific areas of IIoT intelligence, leading researchers and readers to have biases in their understanding of IIoT intelligence, that is, believing that research in one direction is the most important for the development of IIoT intelligence, while ignoring contributions from other directions. Therefore, this paper provides a comprehensive overview of IIoT intelligence. We first conduct an in-depth analysis of the inevitability of manufacturing transformation and study the successful experiences from the practices of Chinese enterprises. Then we give our definition of IIoT intelligence and demonstrate the value of IIoT intelligence for industries in fucntions, operations, deployments, and application. Afterwards, we propose a hierarchical development architecture for IIoT intelligence, which consists of five layers. The practical values of technical upgrades at each layer are illustrated by a close look on lighthouse factories. Following that, we identify seven kinds of technologies that accelerate the transformation of manufacturing, and clarify their contributions. The ethical implications and environmental impacts of adopting IIoT intelligence in manufacturing are analyzed as well. Finally, we explore the open challenges and development trends from four aspects to inspire future researches.
Sociotechnical research increasingly includes the social sub-networks that emerge from large-scale sociotechnical infrastructure, including the infrastructure for building open source software. This paper addresses these numerous sub-networks as advantageous for researchers. It provides a methodological synthesis focusing on how researchers can best span adjacent social sub-networks during engaged field research. Specifically, we describe practices and artifacts that aid movement from one social subsystem within a more extensive technical infrastructure to another. To surface the importance of spanning sub-networks, we incorporate a discussion of social capital and the role of technical infrastructure in its development for sociotechnical researchers. We then characterize a five-step process for spanning social sub-networks during engaged field research: commitment, context mapping, jargon competence, returning value, and bridging. We then present our experience studying corporate open source software projects and the role of that experience in accelerating our work in open source scientific software research as described through the lens of bridging social capital. Based on our analysis, we offer recommendations for engaging in fieldwork in adjacent social sub-networks that share a technical context and discussion of how the relationship between social and technically acquired social capital is a missing but critical methodological dimension for research on large-scale sociotechnical research.
Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can be also applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.
Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg). However, the recent MedISeg publications usually focus on presentations of the major contributions (e.g., network architectures, training strategies, and loss functions) while unwittingly ignoring some marginal implementation details (also known as "tricks"), leading to a potential problem of the unfair experimental result comparisons. In this paper, we collect a series of MedISeg tricks for different model implementation phases (i.e., pre-training model, data pre-processing, data augmentation, model implementation, model inference, and result post-processing), and experimentally explore the effectiveness of these tricks on the consistent baseline models. Compared to paper-driven surveys that only blandly focus on the advantages and limitation analyses of segmentation models, our work provides a large number of solid experiments and is more technically operable. With the extensive experimental results on both the representative 2D and 3D medical image datasets, we explicitly clarify the effect of these tricks. Moreover, based on the surveyed tricks, we also open-sourced a strong MedISeg repository, where each of its components has the advantage of plug-and-play. We believe that this milestone work not only completes a comprehensive and complementary survey of the state-of-the-art MedISeg approaches, but also offers a practical guide for addressing the future medical image processing challenges including but not limited to small dataset learning, class imbalance learning, multi-modality learning, and domain adaptation. The code has been released at: //github.com/hust-linyi/MedISeg