The demand for psychological counselling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counselling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based assistive tool leveraging Large Language Models (LLMs) for question-answering in psychological consultation settings to ease the demand for mental health professions. Our framework combines pre-trained LLMs with real-world professional Q\&A from psychologists and extensively crawled psychological articles. The Psy-LLM framework serves as a front-end tool for healthcare professionals, allowing them to provide immediate responses and mindfulness activities to alleviate patient stress. Additionally, it functions as a screening tool to identify urgent cases requiring further assistance. We evaluated the framework using intrinsic metrics, such as perplexity, and extrinsic evaluation metrics, with human participant assessments of response helpfulness, fluency, relevance, and logic. The results demonstrate the effectiveness of the Psy-LLM framework in generating coherent and relevant answers to psychological questions. This article discusses the potential and limitations of using large language models to enhance mental health support through AI technologies.
In recent years, DL has developed rapidly, and personalized services are exploring using DL algorithms to improve the performance of the recommendation system. For personalized services, a successful recommendation consists of two parts: attracting users to click the item and users being willing to consume the item. If both tasks need to be predicted at the same time, traditional recommendation systems generally train two independent models. This approach is cumbersome and does not effectively model the relationship between the two subtasks of "click-consumption". Therefore, in order to improve the success rate of recommendation and reduce computational costs, researchers are trying to model multi-task learning. At present, existing multi-task learning models generally adopt hard parameter sharing or soft parameter sharing architecture, but these two architectures each have certain problems. Therefore, in this work, we propose a novel recommendation model based on real recommendation scenarios, Deep Cross network based on RNN for partial parameter sharing (DCRNN). The model has three innovations: 1) It adopts the idea of cross network and uses RNN network to cross-process the features, thereby effectively improves the expressive ability of the model; 2) It innovatively proposes the structure of partial parameter sharing; 3) It can effectively capture the potential correlation between different tasks to optimize the efficiency and methods for learning different tasks.
We consider a fully-decentralized scenario in which no central trusted entity exists and all clients are honest-but-curious. The state-of-the-art approaches to this problem often rely on cryptographic protocols, such as multiparty computation (MPC), that require mapping real-valued data to a discrete alphabet, specifically a finite field. These approaches, however, can result in substantial accuracy losses due to computation overflows. To address this issue, we propose A-MPC, a private analog MPC protocol that performs all computations in the analog domain. We characterize the privacy of individual datasets in terms of $(\epsilon, \delta)$-local differential privacy, where the privacy of a single record in each client's dataset is guaranteed against other participants. In particular, we characterize the required noise variance in the Gaussian mechanism in terms of the required $(\epsilon,\delta)$-local differential privacy parameters by solving an optimization problem. Furthermore, compared with existing decentralized protocols, A-MPC keeps the privacy of individual datasets against the collusion of all other participants, thereby, in a notably significant improvement, increasing the maximum number of colluding clients tolerated in the protocol by a factor of three compared with the state-of-the-art collaborative learning protocols. Our experiments illustrate that the accuracy of the proposed $(\epsilon,\delta)$-locally differential private logistic regression and linear regression models trained in a fully-decentralized fashion using A-MPC closely follows that of a centralized one performed by a single trusted entity.
Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This over-simplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing and understanding cross-file context is often required to complete the code correctly. To fill in this gap, we propose CrossCodeEval, a diverse and multilingual code completion benchmark that necessitates an in-depth cross-file contextual understanding to complete the code accurately. CrossCodeEval is built on a diverse set of real-world, open-sourced, permissively-licensed repositories in four popular programming languages: Python, Java, TypeScript, and C#. To create examples that strictly require cross-file context for accurate completion, we propose a straightforward yet efficient static-analysis-based approach to pinpoint the use of cross-file context within the current file. Extensive experiments on state-of-the-art code language models like CodeGen and StarCoder demonstrate that CrossCodeEval is extremely challenging when the relevant cross-file context is absent, and we see clear improvements when adding these context into the prompt. However, despite such improvements, the pinnacle of performance remains notably unattained even with the highest-performing model, indicating that CrossCodeEval is also capable of assessing model's capability in leveraging extensive context to make better code completion. Finally, we benchmarked various methods in retrieving cross-file context, and show that CrossCodeEval can also be used to measure the capability of code retrievers.
Robotic dexterous grasping is a challenging problem due to the high degree of freedom (DoF) and complex contacts of multi-fingered robotic hands. Existing deep reinforcement learning (DRL) based methods leverage human demonstrations to reduce sample complexity due to the high dimensional action space with dexterous grasping. However, less attention has been paid to hand-object interaction representations for high-level generalization. In this paper, we propose a novel geometric and spatial hand-object interaction representation, named DexRep, to capture dynamic object shape features and the spatial relations between hands and objects during grasping. DexRep comprises Occupancy Feature for rough shapes within sensing range by moving hands, Surface Feature for changing hand-object surface distances, and Local-Geo Feature for local geometric surface features most related to potential contacts. Based on the new representation, we propose a dexterous deep reinforcement learning method to learn a generalizable grasping policy DexRepNet. Experimental results show that our method outperforms baselines using existing representations for robotic grasping dramatically both in grasp success rate and convergence speed. It achieves a 93% grasping success rate on seen objects and higher than 80% grasping success rates on diverse objects of unseen categories in both simulation and real-world experiments.
Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot's local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in a simulation environment to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset.
We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove the shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Components Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a margin.
The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily, and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of previously unknown 42 malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.
In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in an unconstrained scenario. For this purpose, the FIQA method should consider both the intrinsic property and the recognizability of the face image. Most previous works aim to estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which only considers the information from partial intra-class. However, these methods ignore the valuable information from the inter-class, which is for estimating to the recognizability of face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class similarity distributions and inter-class similarity distributions. With these quality pseudo-labels, we are capable of training a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses the state-of-the-arts by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.
Defensive deception is a promising approach for cyberdefense. Although defensive deception is increasingly popular in the research community, there has not been a systematic investigation of its key components, the underlying principles, and its tradeoffs in various problem settings. This survey paper focuses on defensive deception research centered on game theory and machine learning, since these are prominent families of artificial intelligence approaches that are widely employed in defensive deception. This paper brings forth insights, lessons, and limitations from prior work. It closes with an outline of some research directions to tackle major gaps in current defensive deception research.
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.