With the increasing application of deep learning in various domains, salient object detection in optical remote sensing images (ORSI-SOD) has attracted significant attention. However, most existing ORSI-SOD methods rely predominantly on local information from low-level features to infer salient boundary cues, supervised by boundary ground truth, yet fail to sufficiently optimize and protect that local information. In addition, almost all approaches ignore the potential of the last decoder layer for maintaining the integrity of saliency maps. To address these issues, we propose a novel method named boundary-semantic collaborative guidance network (BSCGNet) with a dual-stream feedback mechanism. First, we propose a boundary protection calibration (BPC) module, which effectively reduces the loss of edge position information during forward propagation and suppresses noise in low-level features without relying on boundary ground truth. Second, based on the BPC module, a dual feature feedback complementary (DFFC) module is proposed, which aggregates boundary-semantic dual features and provides effective feedback to coordinate features across different layers, thereby enhancing cross-scale knowledge communication. Finally, to obtain more complete saliency maps, we consider the uniqueness of the last layer of the decoder for the first time and propose the adaptive feedback refinement (AFR) module, which further refines feature representation and eliminates differences between features through a unique feedback mechanism. Extensive experiments on three benchmark datasets demonstrate that BSCGNet exhibits distinct advantages in challenging scenarios and outperforms 17 recent state-of-the-art (SOTA) approaches. Code and results have been released on GitHub: //github.com/YUHsss/BSCGNet.
We propose hybrid digital-analog learning algorithms on Rydberg atom arrays, combining the potentially practical utility and near-term realizability of quantum learning with the rapidly scaling architectures of neutral atoms. Our construction requires only single-qubit operations in the digital setting and global driving according to the Rydberg Hamiltonian in the analog setting. We perform a comprehensive numerical study of our algorithm on both classical and quantum data, given respectively by handwritten digit classification and unsupervised quantum phase boundary learning. We show in the two representative problems that digital-analog learning is not only feasible in the near term, but also requires shorter circuit depths and is more robust to realistic error models as compared to digital learning schemes. Our results suggest that digital-analog learning opens a promising path towards improved variational quantum learning experiments in the near term.
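For orientation, the analog stage above refers to global driving under the Rydberg Hamiltonian. A commonly used form is written out below. The symbols (Rabi frequency \Omega, laser phase \phi, detuning \Delta, van der Waals coefficient C_6, atom positions \mathbf{r}_j) follow the usual textbook convention and are an assumption here, since the abstract does not state the authors' exact parameterization.

```latex
\frac{H(t)}{\hbar}
  = \frac{\Omega(t)}{2} \sum_{j} \left( e^{\mathrm{i}\phi(t)} \, |g_j\rangle\langle r_j| + \text{h.c.} \right)
  - \Delta(t) \sum_{j} n_j
  + \sum_{j<k} \frac{C_6}{|\mathbf{r}_j - \mathbf{r}_k|^{6}} \, n_j n_k ,
\qquad n_j = |r_j\rangle\langle r_j| .
```

In this form, "global" driving means that \Omega, \phi and \Delta carry no site index, so the same control is applied to every atom, while the digital layer would supply the site-resolved single-qubit operations.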
Fast, incremental evolution of physics instrumentation raises the question of efficient software abstraction and transferability of algorithms across similar technologies. This contribution aims to provide an answer by introducing Track Lab, a modern data acquisition program focusing on extensibility and high performance. Shipping with a documented API and more than 20 standard modules, Track Lab allows complex analysis pipelines to be constructed from simple, reusable building blocks. Thanks to its multi-threaded infrastructure, data can be clustered, filtered, aggregated and plotted concurrently in real time. In addition, full hardware support for Timepix2 and Timepix3 pixel detectors and embedded photomultiplier systems enables such analysis to be carried out online during data acquisition. Repetitive procedures can be automated with support for motorized stages and X-ray tubes. Freely distributed on 7 popular operating systems and 2 CPU architectures, Track Lab is a versatile tool for high energy physics research.
Distributing quantum information between remote systems will necessitate the integration of emerging quantum components with existing communication infrastructure. This requires understanding the channel-induced degradations of the transmitted quantum signals, beyond the typical characterization methods for classical communication systems. Here we report on a comprehensive characterization of a Boston-Area Quantum Network (BARQNET) telecom fiber testbed, measuring the time-of-flight, polarization, and phase noise imparted on transmitted signals. We further design and demonstrate a compensation system that is both resilient to these noise sources and compatible with integration of emerging quantum memory components on the deployed link. These results have utility for future work on the BARQNET as well as other quantum network testbeds in development, enabling near-term quantum networking demonstrations and informing what areas of technology development will be most impactful in advancing future system capabilities.
With the rapid development of deep learning in various fields of science and technology, such as speech recognition, image classification, and natural language processing, it has recently also been widely applied in functional data analysis (FDA) with some empirical success. However, because the input is infinite-dimensional, functional learning tasks, and nonlinear functional regression in particular, require a powerful dimension reduction method. In this paper, based on the idea of smooth kernel integral transformation, we propose a functional deep neural network with an efficient and fully data-dependent dimension reduction method. The architecture of our functional net consists of three parts: a kernel embedding step, which is an integral transformation with a data-dependent smooth kernel; a projection step, which reduces dimension by projecting onto an eigenfunction basis derived from the embedding kernel; and finally an expressive deep ReLU neural network for prediction. The smooth kernel embedding makes our functional net discretization invariant, efficient, and robust to noisy observations, lets it use information in both the input functions and the response data, and requires only a small number of discrete points to preserve generalization performance. We conduct theoretical analysis, including approximation error and generalization error analysis, and numerical simulations to verify these advantages of our functional net.
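The three-step architecture described above (kernel embedding, eigenfunction projection, ReLU network) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel with a fixed bandwidth stands in for the data-dependent smooth kernel, integrals are approximated by a Riemann sum on a uniform grid, and the function names (`gaussian_kernel`, `embed_and_project`, `relu_mlp`) are hypothetical.

```python
import numpy as np

def gaussian_kernel(s, t, bandwidth=0.1):
    # Assumed smooth kernel; the paper's kernel is data-dependent.
    return np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * bandwidth ** 2))

def embed_and_project(F, grid, n_components=5, bandwidth=0.1):
    """Map sampled functions F (n_samples, n_points) to low-dimensional coefficients."""
    dt = grid[1] - grid[0]                      # uniform grid spacing (assumption)
    K = gaussian_kernel(grid, grid, bandwidth)  # symmetric kernel matrix on the grid
    # Kernel embedding: (L_K f)(s) = integral of K(s, t) f(t) dt, as a Riemann sum.
    embedded = F @ K * dt
    # Eigenfunctions of the discretized integral operator, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(K * dt)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order] / np.sqrt(dt)     # normalized in the discrete L2 sense
    # Projection: inner products of the embedded function with each eigenfunction.
    return embedded @ basis * dt

def relu_mlp(X, weights, biases):
    """Forward pass of a fully connected ReLU network with a linear output layer."""
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]

# Usage: three input functions observed at 50 points, reduced to 4 coefficients.
grid = np.linspace(0.0, 1.0, 50)
F = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid), grid ** 2])
coeffs = embed_and_project(F, grid, n_components=4)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 1))]  # untrained
biases = [np.zeros(8), np.zeros(1)]
prediction = relu_mlp(coeffs, weights, biases)
```

The network weights here are random placeholders; in practice they would be fitted to the responses, and the kernel would be chosen from the data as the abstract describes.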
Safety assessment of crash and conflict avoidance systems is important for both the automotive industry and other stakeholders. One type of system that needs such an assessment is a driver monitoring system (DMS) with some intervention (e.g., warning or nudging) when the driver looks off-road for too long. Although using computer simulation to assess safety systems is becoming increasingly common, it is not yet commonly used for systems that affect driver behavior, such as DMSs. Models that generate virtual crashes, taking crash-causation mechanisms into account, are needed to assess these systems. However, few such models exist, and those that do have not been thoroughly validated on real-world data. This study aims to address this research gap by validating a rear-end crash-causation model which is based on four crash-causation mechanisms related to driver behavior: a) off-road glances, b) too-short headway, c) not braking with the maximum deceleration possible, and d) sleepiness (not reacting before the crash). The pre-crash kinematics were obtained from the German GIDAS in-depth crash database. Challenges with the validation process were identified and addressed. Most notably, a process was developed to transform the generated crashes to mimic the crash severity distribution in GIDAS. This step was necessary because GIDAS does not include property-damage-only (PDO) crashes, while the generated crashes cover the full range of severities (including low-severity crashes, of which many are PDOs). Our results indicate that the proposed model is a reasonably good crash generator. We further demonstrated that the model is a valid method for assessing DMSs in virtual simulations; it shows the safety impact of shortening the longest off-road glances. As expected, cutting away long off-road glances substantially reduces the number of crashes that occur and reduces the average delta-v.
As a pivotal approach in machine learning and data science, manifold learning aims to uncover the intrinsic low-dimensional structure within complex nonlinear manifolds in high-dimensional space. By exploiting the manifold hypothesis, various techniques for nonlinear dimension reduction have been developed to facilitate visualization, classification, clustering, and the extraction of key insights. Although existing manifold learning methods have achieved remarkable successes, they still suffer from extensive distortions of the global structure, which hinder the understanding of underlying patterns. Scalability issues also limit their applicability to large-scale data. Here, we propose a scalable manifold learning (scML) method that can handle large-scale and high-dimensional data efficiently. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the non-landmarks into the learned space based on constrained locally linear embedding (CLLE). We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types, and applied it to analyze single-cell transcriptomics and detect anomalies in electrocardiogram (ECG) signals. scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure. The experiments demonstrate notable robustness in embedding quality as the sample rate decreases.
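The second stage described above, placing non-landmarks into an embedding already learned for the landmarks, can be illustrated with ordinary locally linear reconstruction weights. This is a hedged sketch only: the abstract does not specify the constraints used in CLLE, so standard sum-to-one LLE weights are swapped in, and the function names and the regularization constant are hypothetical.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights (summing to one) that best reconstruct x from its neighbors."""
    Z = neighbors - x                          # neighbors centered on x, shape (k, D)
    G = Z @ Z.T                                # local Gram matrix, shape (k, k)
    # Regularize so the linear system is well posed even when k exceeds D.
    G = G + (reg * np.trace(G) + 1e-12) * np.eye(len(G))
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

def place_non_landmarks(X_rest, X_land, Y_land, k=5):
    """Embed non-landmarks as weighted combinations of embedded landmarks.

    X_rest: (n, D) non-landmark points in the original space
    X_land: (m, D) landmark points in the original space
    Y_land: (m, d) landmark coordinates in the learned low-dimensional space
    """
    Y_rest = np.empty((len(X_rest), Y_land.shape[1]))
    for i, x in enumerate(X_rest):
        dist = np.linalg.norm(X_land - x, axis=1)
        idx = np.argsort(dist)[:k]             # k nearest landmarks
        w = lle_weights(x, X_land[idx])
        Y_rest[i] = w @ Y_land[idx]            # reuse the weights in the embedding
    return Y_rest
```

Because only the landmarks need a full embedding, the cost of this step grows linearly with the number of non-landmarks, which is the kind of scaling the abstract aims for.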
Far-field speech recognition is a challenging task that conventionally uses signal-processing beamforming to combat noise and interference. However, its performance is usually limited by heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system, extending the end-to-end speech recognition system to include speech enhancement. The framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) is adopted to form the neural beamformer. Several pooling strategies for combining look directions are then compared to find the optimal approach. Moreover, source direction information is integrated into the beamforming to explore its usefulness as a prior, which is often available, especially in multi-modality scenarios. Experiments on different microphone array geometries are conducted to evaluate robustness to variation in microphone spacing. Large in-house databases are used to evaluate the effectiveness of the proposed framework, and the proposed method achieves a 19.26% improvement over a strong baseline.
Network meta-analysis (NMA) combines evidence from multiple trials to compare the effectiveness of a set of interventions. In public health research, interventions are often complex, made up of multiple components or features. This makes it difficult to define a common set of interventions on which to perform the analysis. One approach to this problem is component network meta-analysis (CNMA) which uses a meta-regression framework to define each intervention as a subset of components whose individual effects combine additively. In this paper, we are motivated by a systematic review of complex interventions to prevent obesity in children. Due to considerable heterogeneity across the trials, these interventions cannot be expressed as a subset of components but instead are coded against a framework of characteristic features. To analyse these data, we develop a bespoke CNMA-inspired model that allows us to identify the most important features of interventions. We define a meta-regression model with covariates on three levels: intervention, study, and follow-up time, as well as flexible interaction terms. By specifying different regression structures for trials with and without a control arm, we relax the assumption from previous CNMA models that a control arm is the absence of intervention components. Furthermore, we derive a correlation structure that accounts for trials with multiple intervention arms and multiple follow-up times. Although our model was developed for the specifics of the obesity data set, it has wider applicability to any set of complex interventions that can be coded according to a set of shared features.
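The additive assumption of standard CNMA mentioned above can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the paper.

```latex
% Assumed notation: x_{kc} = 1 if intervention k contains component c, and 0 otherwise;
% \beta_c is the effect of component c; d_{ab} is the relative effect of a versus b.
\theta_k = \sum_{c=1}^{C} x_{kc}\, \beta_c ,
\qquad
d_{ab} = \theta_a - \theta_b = \sum_{c=1}^{C} \left( x_{ac} - x_{bc} \right) \beta_c .
```

Under this formulation a control arm with no components has \theta_k = 0, which is the "absence of intervention components" assumption that the model in the abstract relaxes.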
This research explores the reliability of deep learning, specifically Long Short-Term Memory (LSTM) networks, for estimating the Hurst parameter in fractional stochastic processes. The study focuses on three types of processes: fractional Brownian motion (fBm), the fractional Ornstein-Uhlenbeck (fOU) process, and linear fractional stable motion (lfsm). The work involves fast generation of extensive datasets for fBm and fOU so that the LSTM network can be trained on a large volume of data in a feasible time. The study analyses the accuracy of the LSTM network's Hurst parameter estimation using various performance measures such as RMSE, MAE, MRE, and quantiles of the absolute and relative errors. It finds that the LSTM outperforms traditional statistical methods for fBm and fOU processes; however, it has limited accuracy on lfsm processes. The research also examines the effect of training length and evaluation sequence length on the LSTM's performance. The methodology is applied to estimate the Hurst parameter in Li-ion battery degradation data and to obtain confidence bounds for the estimate. The study concludes that while deep learning methods show promise in parameter estimation of fractional processes, their effectiveness is contingent on the process type and the quality of training data.
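To make the training-data step concrete, the sketch below generates labeled fBm paths with known Hurst parameters. It uses the exact but slow Cholesky method on the fractional Gaussian noise covariance, which is an assumption made for clarity; the abstract does not say which fast generator the authors use, and the function names are hypothetical.

```python
import numpy as np

def fgn_autocovariance(n, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lags 0..n-1."""
    k = np.arange(n, dtype=float)
    h2 = 2.0 * hurst
    return 0.5 * (np.abs(k + 1) ** h2 - 2.0 * np.abs(k) ** h2 + np.abs(k - 1) ** h2)

def sample_fbm(n, hurst, rng):
    """Sample one fBm path of n unit steps, starting at 0 (returns n + 1 values)."""
    gamma = fgn_autocovariance(n, hurst)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    cov = gamma[lags]                          # Toeplitz covariance of the increments
    chol = np.linalg.cholesky(cov)             # O(n^3): fine for short sequences only
    increments = chol @ rng.standard_normal(n)
    return np.concatenate([[0.0], np.cumsum(increments)])

def make_dataset(n_paths, n, rng, low=0.05, high=0.95):
    """Paths paired with the Hurst parameters that generated them (regression targets)."""
    hursts = rng.uniform(low, high, size=n_paths)
    paths = np.stack([sample_fbm(n, h, rng) for h in hursts])
    return paths, hursts
```

A sequence model such as an LSTM would then be trained to regress `hursts` from `paths` (or from their increments). For large datasets a faster generator, for example one based on circulant embedding, would be the more practical choice.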
We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas such as computer vision, few-shot learning for audio recognition has been under-studied. We formulate the few-shot AED problem and explore different ways of applying traditional supervised methods to this setting, as well as a variety of meta-learning approaches that are conventionally used to solve few-shot classification problems. Compared to supervised baselines, meta-learning models achieve superior performance, showing their effectiveness in generalizing to new audio events. Our analysis, including the impact of initialization and domain discrepancy, further validates the advantage of meta-learning approaches in few-shot AED.
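As an illustration of how a meta-learning classifier handles a single few-shot episode, the sketch below uses nearest-prototype classification in an embedding space, in the style of prototypical networks. The abstract does not name the specific meta-learning methods evaluated, so this choice is an assumption; the embeddings are taken as given, and the function names are hypothetical.

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Mean embedding per class, computed from the few labeled support examples."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify_queries(query_emb, classes, protos):
    """Assign each query to the class whose prototype is nearest (squared Euclidean)."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```

In a full system the embeddings would come from an audio encoder trained over many such episodes, so that new event classes can be recognized from a handful of labeled clips without retraining.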