Asymmetry is a crucial characteristic of bilateral mammograms (Bi-MG) when abnormalities are developing. It is widely utilized by radiologists for diagnosis. The question of 'what the symmetrical Bi-MG would look like when the asymmetrical abnormalities have been removed ?' has not yet received strong attention in the development of algorithms on mammograms. Addressing this question could provide valuable insights into mammographic anatomy and aid in diagnostic interpretation. Hence, we propose a novel framework, DisAsymNet, which utilizes asymmetrical abnormality transformer guided self-adversarial learning for disentangling abnormalities and symmetric Bi-MG. At the same time, our proposed method is partially guided by randomly synthesized abnormalities. We conduct experiments on three public and one in-house dataset, and demonstrate that our method outperforms existing methods in abnormality classification, segmentation, and localization tasks. Additionally, reconstructed normal mammograms can provide insights toward better interpretable visual cues for clinical diagnosis. The code will be accessible to the public.
Heavy Ball (HB) nowadays is one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic in gradient-based methods accelerates the training process of modern machine learning models. However, the progress on establishing its theoretical foundation of acceleration is apparently far behind its empirical success. Existing provable acceleration results are of the quadratic or close-to-quadratic functions, as the current techniques of showing HB's acceleration are limited to the case when the Hessian is fixed. In this work, we develop some new techniques that help show acceleration beyond quadratics, which is achieved by analyzing how the change of the Hessian at two consecutive time points affects the convergence speed. Based on our technical results, a class of Polyak-\L{}ojasiewicz (PL) optimization problems for which provable acceleration can be achieved via HB is identified. Moreover, our analysis demonstrates a benefit of adaptively setting the momentum parameter. (Update: 08/29/2023) Erratum is added in Appendix J. This is an updated version that fixes an issue in the previous version. An additional condition needs to be satisfied for the acceleration result of HB beyond quadratics in this work, which naturally holds when the dimension is one or, more broadly, when the Hessian is diagonal. We elaborate on the issue in Appendix J.
The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model in a generic zero-shot segmentation approach. With the zero-shot segmentation capacity, SAM achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline that utilizes the SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding box). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public Monuseg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
There is a growing interest in using pose estimation algorithms for video-based assessment of Bradykinesia in Parkinson's Disease (PD) to facilitate remote disease assessment and monitoring. However, the accuracy of pose estimation algorithms in videos from video streaming services during Telehealth appointments has not been studied. In this study, we used seven off-the-shelf hand pose estimation models to estimate the movement of the thumb and index fingers in videos of the finger-tapping (FT) test recorded from Healthy Controls (HC) and participants with PD and under two different conditions: streaming (videos recorded during a live Zoom meeting) and on-device (videos recorded locally with high-quality cameras). The accuracy and reliability of the models were estimated by comparing the models' output with manual results. Three of the seven models demonstrated good accuracy for on-device recordings, and the accuracy decreased significantly for streaming recordings. We observed a negative correlation between movement speed and the model's accuracy for the streaming recordings. Additionally, we evaluated the reliability of ten movement features related to bradykinesia extracted from video recordings of PD patients performing the FT test. While most of the features demonstrated excellent reliability for on-device recordings, most of the features demonstrated poor to moderate reliability for streaming recordings. Our findings highlight the limitations of pose estimation algorithms when applied to video recordings obtained during Telehealth visits, and demonstrate that on-device recordings can be used for automatic video-assessment of bradykinesia in PD.
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field. Can we also boost end-to-end (E2E) speech-to-text translation (ST) by leveraging consistency regularization? In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in regular and zero-shot scenarios. Experiments on the MuST-C benchmark show that our approaches achieve state-of-the-art (SOTA) performance in most translation directions. The analyses prove that regularization brought by the intra-modal consistency, instead of modality gap, is crucial for the regular E2E ST, and the cross-modal consistency could close the modality gap and boost the zero-shot E2E ST performance.
Lawful Interception (LI) is a legal obligation of Communication Service Providers (CSPs) to provide interception capabilities to Law Enforcement Agencies (LEAs) in order to gain insightful data from network communications for criminal proceedings, e.g., network identifiers for tracking suspects. With the privacy-enhancements of network identifiers in the 5th generation of mobile networks (5G), LEAs need to interact with CSPs for network identifier resolution. This raises new privacy issues, as untrusted CSPs are able to infer sensitive information about ongoing investigations, e.g., the identities of their subscribers under suspicion. In this work, we propose P3LI5, a novel system that enables LEAs to privately query CSPs for network identifier resolution leveraging on an information retrieval protocol, SparseWPIR, that is based on private information retrieval and its weakly private version. As such, P3LI5 can be adapted to various operational scenarios with different confidentiality or latency requirements, by selectively allowing a bounded information leakage for improved performance. We implement P3LI5 on the 5G LI infrastructure using well known open-source projects and demonstrate its scalability to large databases while retaining low latency. To the best of our knowledge, P3LI5 is the first proposal for addressing the privacy issues raised by the mandatory requirement for LI on the 5G core network.
Ranking is a crucial module using in the recommender system. In particular, the ranking module using in our YoungTao recommendation scenario is to provide an ordered list of items to users, to maximize the click number throughout the recommendation session for each user. However, we found that the traditional ranking method for optimizing Click-Through rate(CTR) cannot address our ranking scenario well, since it completely ignores user leaving, and CTR is the optimization goal for the one-step recommendation. To effectively undertake the purpose of our ranking module, we propose a long-term optimization goal, named as CTE (Click-Through quantity expectation), for explicitly taking the behavior of user leaving into account. Based on CTE, we propose an effective model trained by reinforcement learning. Moreover, we build a simulation environment from offline log data for estimating PBR and CTR. We conduct extensive experiments on offline datasets and an online e-commerce scenario in TaoBao. Experimental results show that our method can boost performance effectively
We introduce ChatSQC, an innovative chatbot system that combines the power of OpenAI's Large Language Models (LLM) with a specific knowledge base in Statistical Quality Control (SQC). Our research focuses on enhancing LLMs using specific SQC references, shedding light on how data preprocessing parameters and LLM selection impact the quality of generated responses. By illustrating this process, we hope to motivate wider community engagement to refine LLM design and output appraisal techniques. We also highlight potential research opportunities within the SQC domain that can be facilitated by leveraging ChatSQC, thereby broadening the application spectrum of SQC. A primary goal of our work is to equip practitioners with a tool capable of generating precise SQC-related responses, thereby democratizing access to advanced SQC knowledge. To continuously improve ChatSQC, we ask the SQC community to provide feedback, highlight potential issues, request additional features, and/or contribute via pull requests through our public GitHub repository. Additionally, the team will continue to explore adding supplementary reference material that would further improve the contextual understanding of the chatbot. Overall, ChatSQC serves as a testament to the transformative potential of AI within SQC, and we hope it will spur further advancements in the integration of AI in this field.
Recently, Mutual Information (MI) has attracted attention in bounding the generalization error of Deep Neural Networks (DNNs). However, it is intractable to accurately estimate the MI in DNNs, thus most previous works have to relax the MI bound, which in turn weakens the information theoretic explanation for generalization. To address the limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information theoretic explanation for generalization, and derive a tighter generalization bound than the state-of-the-art relaxations.
Within the rapidly developing Internet of Things (IoT), numerous and diverse physical devices, Edge devices, Cloud infrastructure, and their quality of service requirements (QoS), need to be represented within a unified specification in order to enable rapid IoT application development, monitoring, and dynamic reconfiguration. But heterogeneities among different configuration knowledge representation models pose limitations for acquisition, discovery and curation of configuration knowledge for coordinated IoT applications. This paper proposes a unified data model to represent IoT resource configuration knowledge artifacts. It also proposes IoT-CANE (Context-Aware recommendatioN systEm) to facilitate incremental knowledge acquisition and declarative context driven knowledge recommendation.
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities leaned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at //github.com/kstant0725/SpectralNet .