Audio source separation is often achieved by estimating the magnitude spectrogram of each source, and then applying a phase recovery (or spectrogram inversion) algorithm to retrieve time-domain signals. Typically, spectrogram inversion is treated as an optimization problem involving one or several terms in order to promote estimates that comply with a consistency property, a mixing constraint, and/or a target magnitude objective. Nonetheless, it is still unclear which set of constraints and problem formulation is the most appropriate in practice. In this paper, we design a general framework for deriving spectrogram inversion algorithms, which is based on formulating optimization problems by combining these objectives either as soft penalties or hard constraints. We solve these by means of algorithms that perform alternating projections on the subsets corresponding to each objective/constraint. Our framework encompasses existing techniques from the literature as well as novel algorithms. We investigate the potential of these approaches for a speech enhancement task. In particular, one of our novel algorithms outperforms other approaches in a realistic setting where the magnitudes are estimated beforehand using a neural network.
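To illustrate the alternating-projection idea in the abstract above, the following minimal sketch alternates between two of the three objectives: the mixing constraint and the target magnitudes. It is not the authors' algorithm. The consistency projection (which would require an STFT and its inverse) is deliberately omitted, each spectrogram is treated as a flat list of complex bins, and all function names are hypothetical.

```python
import cmath

def project_magnitude(bins, target_mag):
    """Keep each bin's phase, replace its modulus by the target magnitude."""
    return [cmath.rect(v, cmath.phase(x)) for x, v in zip(bins, target_mag)]

def project_mixing(sources, mixture):
    """Spread the mixing error equally so the sources sum to the mixture."""
    n_src = len(sources)
    errors = [m - sum(s[k] for s in sources) for k, m in enumerate(mixture)]
    return [[s[k] + errors[k] / n_src for k in range(len(mixture))]
            for s in sources]

def alternate_projections(mixture, target_mags, n_iter=50):
    # Assumption: initialise every source with its target magnitude
    # and the phase of the mixture.
    sources = [project_magnitude(mixture, v) for v in target_mags]
    for _ in range(n_iter):
        sources = project_mixing(sources, mixture)
        sources = [project_magnitude(s, v)
                   for s, v in zip(sources, target_mags)]
    return sources
```

Because the loop ends on the magnitude projection, the returned estimates match the target magnitudes exactly, which corresponds to treating the magnitude objective as a hard constraint and the mixing objective as the softer one. Swapping the order of the two projections would reverse those roles.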
Magnetic inversion is a non-destructive geophysical method that aims to estimate the subsurface susceptibility distribution from surface magnetic anomaly data. Recently, supervised deep learning methods have been widely used in many geophysical fields, including magnetic inversion. However, these methods rely heavily on synthetic training data, and their performance is limited because the synthetic data are not independently and identically distributed with the field data. We therefore propose to perform magnetic inversion by self-supervised deep learning. The proposed self-supervised knowledge-driven 3D magnetic inversion method (SSKMI) learns on the target field data through a closed loop of the inversion and forward models. Given that the parameters of the forward model are preset, SSKMI can optimize the inversion model by minimizing the mean absolute error between observed and re-estimated surface magnetic anomalies. In addition, the proposed inversion model contains a knowledge-driven module, which makes the deep learning method more explainable. Comparative experiments demonstrate that the knowledge-driven module accelerates the training of the proposed method and achieves better results. Since magnetic inversion is an ill-posed task, SSKMI constrains the inversion model by a guideline in the auxiliary loop. The experimental results demonstrate that the proposed method is a reliable magnetic inversion method with outstanding performance.
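The closed loop described in the abstract above (inversion model, then fixed forward model, then mean absolute error against the observed data) can be sketched in a few lines. This is only a toy under strong assumptions: the forward model is a small fixed matrix `G`, the "inversion model" is a linear map `W` standing in for the neural network, and training uses plain subgradient descent. None of these names or choices come from the paper, and the knowledge-driven module and auxiliary loop are not represented.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def sign(v):
    return (v > 0) - (v < 0)

def mae(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def train_closed_loop(G, d, steps=200, lr=0.05):
    """Fit W so that G @ (W @ d) re-estimates the observed data d."""
    n_d, n_m = len(G), len(G[0])
    W = [[0.0] * n_d for _ in range(n_m)]  # hypothetical linear inversion model
    history = []
    for _ in range(steps):
        m = matvec(W, d)        # inversion: data -> model
        d_hat = matvec(G, m)    # preset forward model: model -> data
        history.append(mae(d, d_hat))
        r_sign = [sign(x - y) for x, y in zip(d, d_hat)]
        # Subgradient of the MAE with respect to the model m.
        g_m = [-sum(G[i][j] * r_sign[i] for i in range(n_d)) / n_d
               for j in range(n_m)]
        # Chain rule through m = W @ d; G itself is never updated.
        for j in range(n_m):
            for i in range(n_d):
                W[j][i] -= lr * g_m[j] * d[i]
    return W, history
```

No ground-truth model appears anywhere in the loop: the only supervision is the observed data itself, which is what makes the scheme self-supervised. With a fixed step size the loss is not guaranteed to decrease monotonically, so `history` is returned for inspection.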
Instance segmentation in electron microscopy (EM) volumes poses a significant challenge due to the complex morphology of instances and insufficient annotations. Self-supervised learning has recently emerged as a promising solution, enabling the acquisition of prior knowledge of cellular tissue structures that are essential for EM instance segmentation. However, existing pretraining methods often lack the ability to capture complex visual patterns and relationships between voxels, which results in the acquired prior knowledge being insufficient for downstream EM analysis tasks. In this paper, we propose a novel pretraining framework that leverages multiscale visual representations to capture both voxel-level and feature-level consistency in EM volumes. Specifically, our framework enforces voxel-level consistency between the outputs of a Siamese network by a reconstruction function, and incorporates a cross-attention mechanism for soft feature matching to achieve fine-grained feature-level consistency. Moreover, we propose a contrastive learning scheme on the feature pyramid to extract discriminative features across multiple scales. We extensively pretrain our method on four large-scale EM datasets, achieving promising performance improvements in representative tasks of neuron and mitochondria instance segmentation.
The rhythm of synthetic speech is usually too smooth, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to contain discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so as to improve the performance of FSD, the spatial reconstructed local attention Res2Net (SR-LA Res2Net) is proposed. Specifically, Res2Net is used as a backbone network to obtain multiscale information, and enhanced with a spatial reconstruction mechanism to avoid losing important information when the channel group is constantly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among all single systems.
Electroaerodynamic propulsion, where force is produced through collisions between electrostatically accelerated ions and neutral air molecules, is an attractive alternative to propeller- and flapping wing-based methods for micro air vehicle (MAV) flight due to its silent and solid-state nature. One major barrier to adoption is its limited thrust efficiency at useful disk loading levels. Ducted actuators comprising multiple serially-integrated acceleration stages are a potential solution, allowing individual stages to operate at higher efficiency while maintaining a useful total thrust, and potentially improving efficiency through various aerodynamic and fluid dynamic mechanisms. In this work, we investigate the effects of duct and emitter electrode geometries on actuator performance, then show how a combination of increasing cross-sectional aspect ratio and serial integration of multiple stages can be used to produce overall thrust densities comparable to commercial propulsors. An optimized five-stage device attains a thrust density of about 18 N/m$^2$ at a thrust efficiency of about 2 mN/W, among the highest values ever measured at this scale. We further show how this type of thruster can be integrated under the wings of a MAV-scale fixed wing platform, pointing towards future use as a distributed propulsion system.
The sensor placement problem is a common problem that arises when monitoring correlated phenomena, such as temperature and precipitation. Existing approaches to this problem typically use discrete optimization methods, which are computationally expensive and cannot scale to large problems. We address the sensor placement problem in correlated environments by reducing it to a regression problem that can be efficiently solved using sparse Gaussian processes (SGPs). Our approach can handle both discrete sensor placement problems (where sensors are limited to a subset of a given set of locations) and continuous sensor placement problems (where sensors can be placed anywhere in a bounded continuous region). We further generalize our approach to handle sensors with a non-point field of view and integrated observations. Our experimental results on three real-world datasets show that our approach generates sensor placements that result in reconstruction quality that is consistently on par with or better than the prior state-of-the-art approach while being significantly faster. Our computationally efficient approach enables both large-scale sensor placement and fast robotic sensor placement for informative path planning algorithms.
Single-particle traces of the diffusive motion of molecules, cells, or animals are by now routinely measured, similar to stochastic records of stock prices or weather data. Deciphering the stochastic mechanism behind the recorded dynamics is vital in understanding the observed systems. Typically, the task is to decipher the exact type of diffusion and/or to determine system parameters. The tools used in this endeavor are currently being revolutionized by modern machine-learning techniques. In this Perspective we provide an overview of recently introduced machine-learning methods for diffusive time series, most notably those that competed successfully in the Anomalous-Diffusion-Challenge. As such methods are often criticized for their lack of interpretability, we focus on means to include uncertainty estimates and feature-based approaches, both improving interpretability and providing concrete insight into the learning process of the machine. We expand the discussion by examining predictions on different out-of-distribution data. We also comment on expected future developments.
It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.
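A toy version of the learning setting in the abstract above: noisy challenge-response pairs are collected and a linear regression recovers the hidden parameters. This sketch assumes a scalar response that is exactly linear in a binary challenge vector plus Gaussian noise, which is a simplification chosen for illustration and not the PUF model analysed in the paper. The function names, the noise level, and the challenge distribution are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(challenges, responses):
    """Least squares via the normal equations (C^T C) w = C^T r."""
    d = len(challenges[0])
    ctc = [[sum(c[i] * c[j] for c in challenges) for j in range(d)]
           for i in range(d)]
    ctr = [sum(c[i] * r for c, r in zip(challenges, responses))
           for i in range(d)]
    return solve(ctc, ctr)

def simulate(w_true, n, sigma, rng):
    """Draw n random binary challenges and their noisy linear responses."""
    cs = [[rng.choice((0.0, 1.0)) for _ in w_true] for _ in range(n)]
    rs = [sum(w * c for w, c in zip(w_true, ch)) + rng.gauss(0.0, sigma)
          for ch in cs]
    return cs, rs
```

In this toy model the estimation error shrinks as more challenge-response pairs are collected, which is the qualitative behaviour that sample-complexity bounds of the kind described above make precise.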
Difference-in-differences is undoubtedly one of the most widely used methods for evaluating the causal effect of an intervention in observational (i.e., nonrandomized) settings. The approach is typically used when pre- and post-exposure outcome measurements are available, and one can reasonably assume that the association of the unobserved confounder with the outcome has the same absolute magnitude in the two exposure arms, and is constant over time; a so-called parallel trends assumption. The parallel trends assumption may not be credible in many practical settings, including if the outcome is binary, a count, or polytomous, as well as when an uncontrolled confounder exhibits non-additive effects on the distribution of the outcome, even if such effects are constant over time. We introduce an alternative approach that replaces the parallel trends assumption with an odds ratio equi-confounding assumption under which an association between treatment and the potential outcome under no-treatment is identified with a well-specified generalized linear model relating the pre-exposure outcome and the exposure. Because the proposed method identifies any causal effect that is conceivably identified in the absence of confounding bias, including nonlinear effects such as quantile treatment effects, the approach is aptly called Universal Difference-in-differences (UDiD). Both fully parametric and more robust semiparametric UDiD estimators are described and illustrated in a real-world application concerning the causal effects of a Zika virus outbreak on birth rate in Brazil.
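For reference, the classical difference-in-differences estimator that relies on the parallel trends assumption discussed above can be written as a difference of group-mean changes. The sketch below shows only this standard estimator, not the UDiD estimators proposed in the paper. The function and argument names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def classical_did(pre_treated, post_treated, pre_control, post_control):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(post_treated) - mean(pre_treated)
    control_change = mean(post_control) - mean(pre_control)
    return treated_change - control_change
```

The subtraction of the control-group change is exactly where parallel trends enters: it assumes the treated group would have changed by the same additive amount had it not been exposed. For bounded outcomes such as binary or count data, that additive assumption can be implausible, which is the limitation the abstract points to.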
In scientific simulations, observations, and experiments, the cost of transferring data to and from disk and across networks has become a significant bottleneck that particularly impacts subsequent data analysis and visualization. To address this challenge, compression techniques have been widely adopted. However, traditional lossy compression approaches often require setting error tolerances conservatively to respect the numerical sensitivities of a wide variety of post hoc data analyses, some of which may not even be known a priori. Progressive data compression and retrieval has emerged as a solution, allowing for the adaptive handling of compressed data according to the needs of a given post-processing task. However, few analysis algorithms natively support progressive data processing, and adapting compression techniques, file formats, client/server frameworks, and APIs to support progressivity can be challenging. This work presents a general framework that supports progressive-precision data queries independently of the underlying data compressor or number representation. Our approach is based on a multiple-component representation that successively, with each new component, reduces the error between the original and compressed field, allowing each field in the progressive sequence to be expressed as a partial sum of components. We have implemented our approach on top of four popular scientific data compressors and have evaluated its behavior on several real-world data sets from the SDRBench collection. Numerical results indicate that our framework is effective in terms of accuracy compared to each of the standalone compressors it builds upon. In addition, (de)compression time is proportional to the number and granularity of components. Finally, our framework allows for fully lossless compression using lossy compressors when a sufficient number of components are employed.
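The multiple-component representation described above can be sketched with any error-bounded lossy compressor. In this minimal illustration the compressor is replaced by uniform scalar quantization, a stand-in chosen for simplicity and not one of the four compressors used in the paper. Each component encodes the residual left by the previous ones, and a field in the progressive sequence is a partial sum of components. All names are hypothetical.

```python
def quantize(values, step):
    """Toy lossy 'compressor': round to a grid, error at most step / 2."""
    return [round(v / step) * step for v in values]

def progressive_components(field, steps):
    """Encode the field as components, each refining the previous residual."""
    components, residual = [], list(field)
    for step in steps:
        comp = quantize(residual, step)
        components.append(comp)
        residual = [r - c for r, c in zip(residual, comp)]
    return components

def reconstruct(components, k):
    """Partial sum of the first k components."""
    n = len(components[0])
    return [sum(comp[i] for comp in components[:k]) for i in range(n)]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))
```

A reader that needs only coarse precision retrieves the first component, while a more sensitive analysis requests further components and adds them on. In this toy, the error after `k` components is bounded by half of the `k`-th quantization step, up to floating-point rounding.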
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. The typical GCN and its variants work under a homophily assumption (i.e., nodes of the same class are prone to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes of different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining the intermediate representations, which leads to noise and irrelevant information in the result. However, these methods do not change the propagation mechanism, a fundamental part of GCNs, which works under the homophily assumption. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which are learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end manner, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and achieves competitive performance under homophily.
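The idea of letting a pairwise homophily degree control propagation can be illustrated with one aggregation step in which each neighbor's contribution is scaled by that degree. This is a simplified sketch, not the model from the paper: the homophily degrees are given as fixed numbers in [0, 1] instead of being learned from topology and attributes, and the normalization and the `self_weight` parameter are assumptions made for the example.

```python
def propagate(features, edges, homophily, self_weight=1.0):
    """One weighted-average propagation step on an undirected graph.

    features:  list of per-node feature lists
    edges:     list of (i, j) pairs, each undirected edge listed once
    homophily: dict mapping (i, j) to a degree in [0, 1] (fixed here)
    """
    n, dim = len(features), len(features[0])
    out = [[self_weight * v for v in features[i]] for i in range(n)]
    total = [self_weight] * n
    for (i, j) in edges:
        h = homophily[(i, j)]
        # Apply the same degree in both directions of the edge.
        for a, b in ((i, j), (j, i)):
            for k in range(dim):
                out[a][k] += h * features[b][k]
            total[a] += h
    return [[v / total[i] for v in out[i]] for i in range(n)]
```

With a degree of 1 on every edge this reduces to plain neighborhood averaging, so a node connected to a different-class neighbor has its representation pulled toward that neighbor. Setting the degree of such an edge to 0 removes that pull, which is the intuition behind distinguishing representations of nodes from different classes.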