We propose a smart dimming sunglasses system for individuals with photophobia, particularly those who are easily irritated by intense light. The system uses a spatial light modulator (SLM) to selectively filter the light entering the eye based on scene detection by a camera. By controlling the transmittance of each SLM pixel with a modulation function, the proposed sunglasses provide automated, non-linear dimming of the field of view as well as flexible light modulation that meets the photophobic user's visual requirements. However, an occlusion mask created on the SLM, whose low transmittance is meant to block incoming light rays, appears blurred to the eye because the eye's focal plane is not on the SLM, and it therefore blocks the light stimulus insufficiently. Past studies addressed this problem with an aperture-based expanded mask, but the excessively large expansion ratio used in that approach leads to over-blocking (occlusion leak). In this work, we build an optimization model by simulating the defocused occlusion mask and determining the effective contribution of the degraded pixels based on the occlusion efficiency of the pixel transmittance. Whereas the unprocessed mask cannot provide sufficient occlusion and the aperture-based expanded mask causes occlusion leak, our optimized mask attenuates intensely bright areas to an appropriate brightness without incorrectly attenuating surrounding areas that need no modulation.
In the modern era of deep learning, network parameters play a vital role in model efficiency, but large parameter counts demand extensive computation and memory, which may not be suitable for real-time intelligent robot grasping tasks. The present research focuses on maintaining model efficiency by introducing sparsity without compromising model accuracy in the robot grasping domain. Specifically, we introduce two lightweight neural networks, Sparse-GRConvNet and Sparse-GINNet, which leverage sparsity for grasp pose generation by integrating the Edge-PopUp algorithm. This algorithm identifies the top K% of edges according to their score values. Both Sparse-GRConvNet and Sparse-GINNet are designed to generate high-quality grasp poses in real time at every pixel location, enabling robots to effectively manipulate unfamiliar objects. We extensively trained our models on two benchmark datasets: the Cornell Grasping Dataset (CGD) and the Jacquard Grasping Dataset (JGD). Both models outperform the current state-of-the-art methods, achieving an accuracy of 97.75% on CGD with only 10% of the weights of GR-ConvNet and 50% of the weights of GI-NNet, respectively. Additionally, on JGD, Sparse-GRConvNet achieves an accuracy of 85.77% with 30% of the weights of GR-ConvNet, and Sparse-GINNet achieves an accuracy of 81.11% with 10% of the weights of GI-NNet. To validate the performance of our proposed models, we conducted extensive experiments using the Anukul (Baxter) hardware cobot.
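The edge-selection step described above (keeping the top K% of edges by score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `top_k_mask` and `sparse_forward` are hypothetical, a layer is reduced to a flat list of edges, and how the scores are learned is outside the scope of the sketch.

```python
def top_k_mask(scores, keep_fraction):
    """Return a 0/1 mask that keeps the highest-scoring fraction of edges."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(scores)))
    # Rank edge indices by score, highest first, and keep the first n_keep.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


def sparse_forward(weights, scores, inputs, keep_fraction):
    """Weighted sum over the kept edges only; pruned edges contribute nothing."""
    mask = top_k_mask(scores, keep_fraction)
    return sum(w * m * x for w, m, x in zip(weights, mask, inputs))


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.3]       # hypothetical per-edge scores
    weights = [1.0, 2.0, 3.0, 4.0]      # hypothetical edge weights
    print(top_k_mask(scores, 0.5))
    print(sparse_forward(weights, scores, [1.0, 1.0, 1.0, 1.0], 0.5))
```

Because the mask depends only on the ranking of the scores, changing `keep_fraction` trades model size against capacity without touching the weights themselves.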
Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences up to 32 times longer than vanilla Transformers and 2 to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance.
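To illustrate what blockwise computation of self-attention means, the sketch below computes attention for a single query while visiting keys and values one block at a time, using a running maximum and running normalizer so that the full row of attention weights is never held in memory. It is a generic online-softmax sketch with hypothetical function names, not the BPT implementation, and it does not show the feedforward fusion mentioned in the abstract.

```python
import math


def _logit(q, k):
    """Scaled dot product between one query and one key."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def attention_reference(q, keys, values):
    """Standard softmax attention for one query, materializing all weights."""
    logits = [_logit(q, k) for k in keys]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_blockwise(q, keys, values, block_size):
    """Same result, but keys and values are processed block by block."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        logits = [_logit(q, k) for k in k_blk]
        new_max = max(running_max, max(logits))
        # Rescale everything accumulated so far to the new running maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for l, v in zip(logits, v_blk):
            w = math.exp(l - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[0.1, 0.2], [0.3, -0.4], [1.0, 1.0], [-0.5, 0.2], [0.0, 0.7]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.5, 0.5]]
    print(attention_reference(q, keys, values))
    print(attention_blockwise(q, keys, values, block_size=2))
```

The two functions agree up to floating-point rounding; only the peak working set differs, since the blockwise version needs one block of logits at a time.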
Recommendation systems aim to provide users with relevant suggestions, but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causal and logical inferences, representing the user's interests in an interpretable way. Our approach, LLM reasoning graphs (LLMRG), has four components: chained graph reasoning, divergent extension, self-verification and scoring, and knowledge base self-improvement. The resulting reasoning graph is encoded using graph neural networks, and the encoding serves as additional input to improve conventional recommender systems without requiring extra user or item information. Our approach demonstrates how LLMs can enable more logical and interpretable recommender systems through personalized reasoning graphs. LLMRG allows recommendations to benefit from both engineered recommendation systems and LLM-derived reasoning graphs. We demonstrate the effectiveness of LLMRG in enhancing base recommendation models on benchmarks and in real-world scenarios.
Metaverse-enabled digital healthcare systems are expected to exploit an unprecedented amount of personal health data while ensuring that sensitive or private information about individuals is not disclosed. Machine learning and artificial intelligence (ML/AI) techniques can be widely utilized in metaverse healthcare systems, such as virtual clinics and intelligent consultations. In such scenarios, the key challenge is that data privacy laws might not allow virtual clinics to share their medical data with other parties. Moreover, clinical ML/AI models themselves carry extensive information about the medical datasets, such that private attributes can be easily inferred by malicious actors in the metaverse if the models are not rigorously privatized. In this paper, inspired by the idea of "incognito mode", which has recently been developed as a promising solution to safeguard metaverse users' privacy, we propose global differential privacy for distributed metaverse healthcare systems. In our scheme, a randomized mechanism in the form of artificial "mix-up" noise is applied to the federated clinical ML/AI models before they are shared with other peers. In this way, we provide an adjustable level of distributed privacy against both malicious actors and honest-but-curious metaverse servers. Our evaluations on the Breast Cancer Wisconsin Dataset (BCWD) highlight the privacy-utility trade-off (PUT) in terms of diagnosis accuracy and loss function for different levels of privacy. We also compare our private scheme with the non-private centralized setup in terms of diagnosis accuracy.
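The idea of adding noise to a model before sharing it can be sketched as below. The abstract does not specify the noise distribution or calibration of the "mix-up" noise, so this sketch substitutes the classical Gaussian mechanism as a stand-in; the function names, the clipping step, and the choice to treat the clipping norm as the L2 sensitivity are all assumptions made for illustration.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= clip_norm:
        return list(update)
    return [u * clip_norm / norm for u in update]


def gaussian_sigma(clip_norm, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (needs 0 < epsilon < 1).

    Assumption: clip_norm is used as the L2 sensitivity of one update.
    """
    if not 0.0 < epsilon < 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_update(update, clip_norm, epsilon, delta, rng):
    """Clip a model update, then add independent Gaussian noise per coordinate."""
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    clipped = clip_update(update, clip_norm)
    return [u + rng.gauss(0.0, sigma) for u in clipped]


if __name__ == "__main__":
    rng = random.Random(0)                 # seeded only for reproducibility
    local_update = [0.8, -1.5, 0.3]        # hypothetical model update
    print(privatize_update(local_update, clip_norm=1.0,
                           epsilon=0.5, delta=1e-5, rng=rng))
```

A smaller `epsilon` yields a larger noise scale, which is the privacy-utility trade-off the abstract refers to: stronger privacy comes at the cost of a noisier shared model.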
The goal of community detection over graphs is to recover underlying labels/attributes of users (e.g., political affiliation) given the connectivity between users (represented by the adjacency matrix of a graph). There has been significant recent progress on understanding the fundamental limits of community detection when the graph is generated from a stochastic block model (SBM). Specifically, sharp information-theoretic limits and efficient algorithms have been obtained for SBMs as a function of $p$ and $q$, which represent the intra-community and inter-community connection probabilities. In this paper, we study the community detection problem while preserving the privacy of the individual connections (edges) between the vertices. Focusing on the notion of $(\epsilon, \delta)$-edge differential privacy (DP), we seek to understand the fundamental tradeoffs between $(p, q)$, the DP budget $(\epsilon, \delta)$, and computational efficiency for exact recovery of the community labels. To this end, we present and analyze the associated information-theoretic tradeoffs for three broad classes of differentially private community recovery mechanisms: a) stability-based mechanisms; b) sampling-based mechanisms; and c) graph perturbation mechanisms. Our main findings are that stability-based and sampling-based mechanisms lead to a superior tradeoff between $(p,q)$ and the privacy budget $(\epsilon, \delta)$; however, this comes at the expense of higher computational complexity. On the other hand, despite their low complexity, graph perturbation mechanisms require the privacy budget $\epsilon$ to scale as $\Omega(\log(n))$ for exact recovery. To the best of our knowledge, this is the first work to study the impact of privacy constraints on the fundamental limits for community detection.
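One standard way to realize a graph perturbation mechanism is randomized response on the adjacency matrix, where each potential edge is flipped independently with probability $1/(1+e^{\epsilon})$, which gives $\epsilon$-edge DP. The sketch below assumes this particular instance; the abstract does not say which perturbation the analysis uses, and the function names are hypothetical.

```python
import math
import random


def flip_probability(epsilon):
    """Probability of flipping each edge indicator under randomized response."""
    return 1.0 / (1.0 + math.exp(epsilon))


def perturb_graph(adj, epsilon, rng):
    """Return a noisy symmetric adjacency matrix with a zero diagonal.

    Each unordered vertex pair is flipped independently with the
    probability given by flip_probability(epsilon).
    """
    n = len(adj)
    p_flip = flip_probability(epsilon)
    noisy = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < p_flip:
                bit = 1 - bit
            noisy[i][j] = noisy[j][i] = bit
    return noisy


if __name__ == "__main__":
    rng = random.Random(1)                 # seeded only for reproducibility
    adj = [[0, 1, 1, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 1],
           [0, 0, 1, 0]]
    for row in perturb_graph(adj, epsilon=1.0, rng=rng):
        print(row)
```

Any off-the-shelf community detection algorithm can then be run on the noisy graph. For small $\epsilon$ the flip probability approaches 1/2 and the community structure is washed out, which matches the intuition behind the $\Omega(\log(n))$ requirement stated above.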
Over the past few decades, ubiquitous sensors and systems have been an integral part of humans' everyday life. They augment human capabilities and provide personalized experiences across diverse contexts such as healthcare, education, and transportation. However, the widespread adoption of ubiquitous computing has also brought forth concerns regarding fairness and equitable treatment. As these systems can make automated decisions that impact individuals, it is essential to ensure that they do not perpetuate biases or discriminate against specific groups. While fairness in ubiquitous computing has been an acknowledged concern since the 1990s, it remains understudied within the field. To bridge this gap, we propose a framework that incorporates fairness considerations into system design, including prioritizing stakeholder perspectives, inclusive data collection, fairness-aware algorithms, appropriate evaluation criteria, enhancing human engagement while addressing privacy concerns, and interactive improvement and regular monitoring. Our framework aims to guide the development of fair and unbiased ubiquitous computing systems, ensuring equal treatment and positive societal impact.
The Weighted Connectivity Augmentation Problem is the problem of augmenting the edge-connectivity of a given graph by adding links of minimum total cost. This work focuses on connectivity augmentation problems in the Steiner setting, where we are not interested in the connectivity between all nodes of the graph, but only the connectivity between a specified subset of terminals. We consider two related settings. In the Steiner Augmentation of a Graph problem ($k$-SAG), we are given a $k$-edge-connected subgraph $H$ of a graph $G$. The goal is to augment $H$ by including links and nodes from $G$ of minimum cost so that the edge-connectivity between nodes of $H$ increases by 1. In the Steiner Connectivity Augmentation Problem ($k$-SCAP), we are given a Steiner $k$-edge-connected graph connecting terminals $R$, and we seek to add links of minimum cost to create a Steiner $(k+1)$-edge-connected graph for $R$. Note that $k$-SAG is a special case of $k$-SCAP. All of the above problems can be approximated to within a factor of 2 using e.g. Jain's iterative rounding algorithm for Survivable Network Design. In this work, we leverage the framework of Traub and Zenklusen to give a $(1 + \ln{2} +\varepsilon)$-approximation for the Steiner Ring Augmentation Problem (SRAP): given a cycle $H = (V(H),E)$ embedded in a larger graph $G = (V, E \cup L)$ and a subset of terminals $R \subseteq V(H)$, choose a subset of links $S \subseteq L$ of minimum cost so that $(V, E \cup S)$ has 3 pairwise edge-disjoint paths between every pair of terminals. We show this yields a polynomial time algorithm with approximation ratio $(1 + \ln{2} + \varepsilon)$ for $2$-SCAP. We obtain an improved approximation guarantee of $(1.5+\varepsilon)$ for SRAP in the case that $R = V(H)$, which yields a $(1.5+\varepsilon)$-approximation for $k$-SAG for any $k$.
Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated as a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art methods and reveal the necessity of learning sparse structures for each document.
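The initial document graph, a disjoint union of sentence-level word co-occurrence graphs, can be sketched as follows. The sliding-window definition of co-occurrence, the `window` parameter, and the function names are assumptions for illustration; the trainable cross-sentence edges and the structure learning step are not shown.

```python
def sentence_edges(sent_id, tokens, window):
    """Co-occurrence edges between distinct words within one sentence.

    Nodes are (sentence index, word) pairs, so the same word appearing in
    two sentences yields two separate nodes.
    """
    edges = set()
    for i, left in enumerate(tokens):
        # Pair each word with the next (window - 1) words in the sentence.
        for right in tokens[i + 1:i + window]:
            if left != right:
                a, b = sorted([(sent_id, left), (sent_id, right)])
                edges.add((a, b))
    return edges


def document_graph(sentences, window=2):
    """Disjoint union of the sentence-level graphs: (nodes, edges)."""
    nodes, edges = set(), set()
    for sent_id, tokens in enumerate(sentences):
        nodes.update((sent_id, word) for word in tokens)
        edges |= sentence_edges(sent_id, tokens, window)
    return nodes, edges


if __name__ == "__main__":
    doc = [["the", "bank", "raised", "rates"],
           ["the", "river", "bank"]]
    nodes, edges = document_graph(doc, window=2)
    print(len(nodes), "nodes,", len(edges), "edges")
    for edge in sorted(edges):
        print(edge)
```

Keeping "bank" as two distinct nodes, one per sentence, is what lets a later stage treat the two senses differently; edges across sentences would only be introduced by the learned structure.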
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
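The decomposition described above can be written out explicitly. The notation below is an assumption, since the abstract does not fix symbols: $x$ and $y$ are the two views, and $x'$ is a subview of $x$, that is, a deterministic function of $x$, so that $I(x, x'; y) = I(x; y)$.

```latex
% Two-term decomposition via the chain rule of mutual information.
\begin{align}
I(x; y) &= I(x, x'; y) \\
        &= I(x'; y) + I(x; y \mid x').
\end{align}
% With progressively more informed subviews
% x^{1} \subset x^{2} \subset \dots \subset x^{N} = x:
\begin{equation}
I(x; y) = I(x^{1}; y) + \sum_{i=2}^{N} I(x^{i}; y \mid x^{i-1}).
\end{equation}
```

Each term on the right-hand side is at most the total $I(x; y)$ and typically much smaller, which is why bounding the terms separately can sidestep the underestimation that affects a single contrastive bound on the full quantity.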
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.