Due to the emergence of various wireless sensing technologies, numerous positioning algorithms have been introduced in the literature, categorized into \emph{geometry-driven positioning} (GP) and \emph{data-driven positioning} (DP). These approaches have complementary limitations, e.g., a non-line-of-sight issue for GP and the lack of a real-time training dataset for DP, calling for combining the two for practical use. To this end, this paper introduces a new principle, called \emph{combinatorial data augmentation} (CDA), a catalyst for the two approaches' tight integration. Specifically, GP-based datasets augmented from different combinations of positioning entities can be used as DP's inputs and labels. We confirm CDA's effectiveness through field experiments based on WiFi \emph{round-trip times} (RTTs) and \emph{inertial measurement units} (IMUs) by designing several CDA-driven positioning algorithms. First, we show that CDA enables us to quantify the uncertainties of the given position estimates. We can then filter out unreliable estimates for WiFi RTT positioning and compute the covariance matrix of a Kalman filter that integrates the two position estimates derived from WiFi RTT and IMUs. The mean and standard deviation of the resultant positioning error are $1.65$ (m) and $1.01$ (m), respectively. Next, we use the above position estimate as a real-time label for \emph{fingerprint-based positioning} (FBP), which is shown to provide acceptable positioning accuracy, namely an average positioning error of $1.51$ (m) with a standard deviation of $0.88$ (m). Lastly, we discuss the proposed CDA's potential from positioning and beyond-positioning perspectives.
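As a rough illustration of the combinatorial idea described above, the sketch below computes one geometric position estimate per 3-anchor combination and uses the spread of those estimates as an uncertainty score. The abstract does not specify the actual CDA design, so the 2D trilateration, the anchor layout, the RMS spread measure, and the names `trilaterate`, `cda_estimates`, and `spread` are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def trilaterate(anchors, ranges):
    """Solve for (x, y) from three anchors and ranges (linearized, exact for 3)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = ranges
    # Subtracting the first circle equation from the others gives a 2x2 linear system.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 + y3**2 - x1**2 - y1**2
    det = a11 * a22 - a12 * a21  # zero only if the three anchors are collinear
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def cda_estimates(anchors, ranges):
    """One position estimate per combination of three anchors (assumed scheme)."""
    idx = range(len(anchors))
    return [trilaterate([anchors[i] for i in c], [ranges[i] for i in c])
            for c in combinations(idx, 3)]


def spread(points):
    """Centroid of the estimates and their RMS distance from it."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    rms = (sum(dist(p, (cx, cy)) ** 2 for p in points) / len(points)) ** 0.5
    return (cx, cy), rms


# Hypothetical layout: four anchors at the corners of a 10 m square.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
truth = (2.0, 3.0)
clean = [dist(truth, a) for a in anchors]
biased = clean[:3] + [clean[3] + 3.0]  # one range inflated, mimicking NLOS

_, clean_rms = spread(cda_estimates(anchors, clean))
_, biased_rms = spread(cda_estimates(anchors, biased))
```

With consistent ranges all combinations agree, so the spread is near zero; a single biased range makes the combinations disagree, which is the kind of signal that could be used to flag an unreliable estimate.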
Vision transformers have achieved great success in many computer vision tasks. Most methods generate vision tokens by splitting an image into a regular and fixed grid and treating each cell as a token. However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes. The tokens in TCFormer can not only focus on important areas but also adjust the token shapes to fit the semantic concept and adopt a fine resolution for regions containing critical details, which is beneficial to capturing detailed information. Extensive experiments show that TCFormer consistently outperforms its counterparts on different challenging human-centric tasks and datasets, including whole-body pose estimation on COCO-WholeBody and 3D human mesh reconstruction on 3DPW. Code is available at //github.com/zengwang430521/TCFormer.git.
We introduce and analyze various Regularized Combined Field Integral Equations (CFIER) formulations of time-harmonic Navier equations in media with piece-wise constant material properties. These formulations can be derived systematically starting from suitable coercive approximations of Dirichlet-to-Neumann operators (DtN), and we present a periodic pseudodifferential calculus framework within which the well-posedness of CFIER formulations can be established. We also use the DtN approximations to derive and analyze Optimized Schwarz (OS) methods for the solution of elastodynamics transmission problems. The pseudodifferential calculus we develop in this paper relies on careful singularity splittings of the kernels of Navier boundary integral operators, which are also the basis of high-order Nystr\"om quadratures for their discretizations. Based on these high-order discretizations, we investigate the rate of convergence of iterative solvers applied to CFIER and OS formulations of scattering and transmission problems. We present a variety of numerical results that illustrate that the CFIER methodology leads to important computational savings over the classical CFIE one whenever iterative solvers are used for the solution of the ensuing discretized boundary integral equations. Finally, we show that the OS methods are competitive in the high-frequency high-contrast regime.
Removing noise from processed images is very important, and it should be done in a way that preserves the important information in the image. This paper presents a decision-based nonlinear algorithm for eliminating band lines, drop lines, marks, band loss, and impulses in images. The algorithm performs two simultaneous operations, namely, detection of corrupted pixels and evaluation of new pixels to replace them. These artifacts are removed without damaging edges and details. However, the restricted window size renders the median operation less effective whenever noise is excessive; in that case, the proposed algorithm automatically switches to mean filtering. The performance of the algorithm is analyzed in terms of Mean Square Error [MSE], Peak-Signal-to-Noise Ratio [PSNR], Signal-to-Noise Ratio Improved [SNRI], Percentage Of Noise Attenuated [PONA], and Percentage Of Spoiled Pixels [POSP]. The results are compared with standard algorithms already in use, and the improved performance of the proposed algorithm is presented. The advantage of the proposed algorithm is that a single algorithm can replace the several independent algorithms otherwise required to remove different artifacts.
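The detect-then-replace logic with a median-to-mean fallback described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes that corrupted pixels are exactly those at the extreme values 0 or 255, that the window is a fixed 3x3 neighbourhood, and that the switch to mean filtering happens when the window median is itself an extreme value. The function name `decision_based_filter` is hypothetical.

```python
from statistics import median


def decision_based_filter(img, low=0, high=255):
    """Replace only pixels flagged as impulses; leave all other pixels untouched."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] not in (low, high):
                continue  # detection step: pixel judged uncorrupted (assumed rule)
            # 3x3 window, clipped at the image border
            window = [img[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            med = median(window)
            if med not in (low, high):
                out[r][c] = med  # light noise: the median is a trustworthy value
            else:
                # Heavy noise: the median is itself an impulse, so fall back to
                # the mean of the clean neighbours (or of the window if none).
                clean = [v for v in window if v not in (low, high)]
                pool = clean if clean else window
                out[r][c] = sum(pool) / len(pool)
    return out


light = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
heavy = [[255, 255, 255], [255, 255, 20], [255, 255, 255]]
light_out = decision_based_filter(light)
heavy_out = decision_based_filter(heavy)
```

Because uncorrupted pixels are skipped entirely, edges and fine detail are only altered where a pixel was flagged, which is the property the abstract emphasizes.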
Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performance of human localization and human pose estimation has significantly improved in recent reports, these techniques are not perfect, and erroneous localization and pose estimation can be expected among video frames. Studies on the integration of these techniques into a generic pipeline that is robust to the noise introduced by those errors are still lacking. This paper addresses that gap. We explored and developed two working pipelines that suit visual-based positioning and pose estimation tasks. Analyses of the proposed pipelines were conducted on a badminton game. We showed that the concept of tracking by detection could work well, and that errors in position and pose could be effectively handled by a linear interpolation technique using information from nearby frames. The results showed that the Visual-based Positioning and Pose Estimation could deliver position and pose estimations with good spatial and temporal resolutions.
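The linear interpolation step mentioned above can be illustrated with a short sketch. It assumes that frames with a failed or rejected detection are marked `None` and that each valid frame carries a coordinate tuple; how the paper decides which frames are erroneous is not stated in the abstract, and the name `interpolate_missing` is hypothetical.

```python
def interpolate_missing(track):
    """Fill None entries by linear interpolation between the nearest valid frames."""
    filled = list(track)
    valid = [i for i, p in enumerate(track) if p is not None]
    for a, b in zip(valid, valid[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)  # fractional position of frame t between a and b
            filled[t] = tuple((1 - w) * pa + w * pb
                              for pa, pb in zip(track[a], track[b]))
    return filled  # gaps before the first or after the last valid frame stay None


# Per-frame (x, y) positions; frames 1 and 2 are missing.
track = [(0.0, 0.0), None, None, (3.0, 6.0), None]
filled = interpolate_missing(track)
```

The same routine applies unchanged to pose keypoints if each frame holds a flattened tuple of joint coordinates.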
Despite significant advances in deep learning across many application areas, explaining the inner processes of deep learning models remains an important and open question. The purpose of this article is to describe and substantiate the geometric and topological view of the learning process of neural networks. Our attention is focused on the internal representation of neural networks and on the dynamics of changes in the topology and geometry of the data manifold on different layers. We also propose a method for assessing the generalizing ability of neural networks based on topological descriptors. In this paper, we use the concepts of topological data analysis and intrinsic dimension, and we present a wide range of experiments on different datasets and different configurations of convolutional neural network architectures. In addition, we consider the issue of the geometry of adversarial attacks in the classification task and spoofing attacks on face recognition systems. Our work is a contribution to the development of an important area of explainable and interpretable AI through the example of computer vision.
We apply a reinforcement meta-learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile. The system is implemented as a policy that maps navigation system outputs directly to commanded rates of change for the missile's control surface deflections. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model, and demonstrate that the system adapts to both a large flight envelope and off-nominal flight conditions including perturbation of aerodynamic coefficient parameters and center of pressure locations, and flexible body dynamics. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction and imperfect seeker stabilization. We compare our system's performance to a longitudinal model of proportional navigation coupled with a three loop autopilot, and find that our system outperforms this benchmark by a large margin. Additional experiments investigate the impact of removing the recurrent layer from the policy and value function networks, performance with an infrared seeker, and flexible body dynamics.
White noise is a fundamental and fairly well understood stochastic process that forms the conceptual basis for many other processes, as well as for the modeling of time series. Here we offer a fresh perspective on white noise that, grounded in combinatorial considerations, yields new insights for both modelling and theoretical purposes. To this end, we incorporate the ordinal pattern analysis approach, which allows us to abstract a time series as a sequence of patterns and their associated permutations, and introduce a simple functional over permutations that partitions them into classes encoding their level of asymmetry. We compute the exact probability mass function (p.m.f.) of this functional over the symmetric group of degree $n$, thus providing the description for the case of an infinite white noise realization. This p.m.f. can be conveniently approximated by a continuous probability density from an exponential family, the Gaussian, hence providing natural sufficient statistics that enable a convenient and simple statistical analysis through ordinal patterns. Such analysis is exemplified on experimental data for the spatial increments from tracks of gold nanoparticles in 3D diffusion.
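The ordinal pattern abstraction referred to above can be sketched as follows: each length-$n$ window of the series is replaced by the permutation that sorts it. The asymmetry functional itself is not defined in the abstract, so it is not reproduced here; the names `ordinal_patterns` and `pattern_frequencies`, the Gaussian noise source, and the seed are assumptions for illustration.

```python
import random
from collections import Counter


def ordinal_patterns(series, n):
    """Map each length-n window to the permutation of indices that sorts it."""
    patterns = []
    for i in range(len(series) - n + 1):
        window = series[i:i + n]
        patterns.append(tuple(sorted(range(n), key=lambda k: window[k])))
    return patterns


def pattern_frequencies(series, n):
    """Relative frequency of each ordinal pattern observed in the series."""
    counts = Counter(ordinal_patterns(series, n))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


# For white noise, every one of the n! patterns should be (nearly) equally likely.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(60000)]
freq = pattern_frequencies(noise, 3)
```

For $n = 3$ the six frequencies should each be close to $1/6$; a functional over permutations, such as the one the abstract introduces, would then be evaluated on these patterns.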
Modern neural network training relies heavily on data augmentation for improved generalization. After the initial success of label-preserving augmentations, there has been a recent surge of interest in label-perturbing approaches, which combine features and labels across training samples to smooth the learned decision surface. In this paper, we propose a new augmentation method that leverages the first and second moments extracted and re-injected by feature normalization. We replace the moments of the learned features of one training image by those of another, and also interpolate the target labels. As our approach is fast, operates entirely in feature space, and mixes different signals than prior methods, one can effectively combine it with existing augmentation methods. We demonstrate its efficacy across benchmark data sets in computer vision, speech, and natural language processing, where it consistently improves the generalization performance of highly competitive baseline networks.
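The moment replacement and label interpolation described above can be sketched on plain lists. This is a minimal illustration under stated assumptions: each feature vector stands for one channel of one sample, the moments are the mean and population standard deviation, and the mixing weight `lam` and the stabilizer `eps` are hypothetical parameters. The names `moment_exchange` and `mix_labels` are not from the source.

```python
from statistics import fmean, pstdev


def moment_exchange(feat_a, feat_b, eps=1e-5):
    """Normalize feat_a with its own moments, then re-inject the moments of feat_b."""
    mu_a, sd_a = fmean(feat_a), pstdev(feat_a)
    mu_b, sd_b = fmean(feat_b), pstdev(feat_b)
    return [sd_b * (x - mu_a) / (sd_a + eps) + mu_b for x in feat_a]


def mix_labels(y_a, y_b, lam):
    """Interpolate one-hot (or soft) labels with weight lam on sample a."""
    return [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]


feat_a = [1.0, 2.0, 3.0]
feat_b = [10.0, 20.0, 30.0]
mixed_feat = moment_exchange(feat_a, feat_b)
mixed_label = mix_labels([1.0, 0.0], [0.0, 1.0], lam=0.9)
```

The mixed feature keeps the normalized shape of sample a while carrying the first and second moments of sample b, and the label is blended to match.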
Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.
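To make the notion of high-order graph convolution concrete, the sketch below applies a simple low-pass filter $k$ times to node features. The abstract does not state which filter is used, so the choice here, $\frac{1}{2}(I + D^{-1/2} A D^{-1/2})$, is an assumption, as are the function name `graph_convolve` and the toy graph. The adaptive selection of the order $k$ is not shown.

```python
def graph_convolve(adj, feats, k):
    """Apply the filter (I + D^-1/2 A D^-1/2) / 2 to the node features k times."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    for _ in range(k):
        smoothed = []
        for i in range(n):
            row = []
            for f in range(len(feats[0])):
                # Symmetrically normalized aggregation over the neighbours of i.
                agg = sum(adj[i][j] * feats[j][f] / (deg[i] * deg[j]) ** 0.5
                          for j in range(n) if adj[i][j])
                row.append(0.5 * (feats[i][f] + agg))
            smoothed.append(row)
        feats = smoothed
    return feats


# Toy graph with two components: edge 0-1 and edge 2-3.
adj = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
feats = [[1.0], [0.0], [5.0], [7.0]]
smoothed = graph_convolve(adj, feats, k=1)
```

Nodes in the same component end up with matching features while the two components stay apart, which is why a larger order can expose cluster structure that a low, fixed order may miss.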
In this paper, we focus on three problems in deep learning based medical image segmentation. Firstly, U-net, as a popular model for medical image segmentation, is difficult to train when the number of convolutional layers increases, even though a deeper network usually has a better generalization ability because of more learnable parameters. Secondly, the exponential linear unit (ELU), as an alternative to ReLU, is not much different from ReLU when the network of interest gets deep. Thirdly, the Dice loss, as one of the pervasive loss functions for medical image segmentation, is not effective when the prediction is close to the ground truth and will cause oscillation during training. To address the aforementioned three problems, we propose and validate a deeper network that can fit medical image datasets that are usually small in sample size. Meanwhile, we propose a new loss function to accelerate the learning process and a combination of different activation functions to improve the network performance. Our experimental results suggest that our network is comparable or superior to state-of-the-art methods.
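For reference, the standard soft Dice loss discussed above can be written in a few lines. This is the common textbook form, not the new loss function the abstract proposes, which is not specified there; the smoothing constant `eps` and the name `dice_loss` are assumptions.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P.G| / (|P| + |G|), on flattened probability maps."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
disjoint = dice_loss([1.0, 0.0], [0.0, 1.0])
partial = dice_loss([0.9, 0.1, 0.8, 0.2], [1.0, 0.0, 1.0, 0.0])
```

The loss is zero for a perfect match and approaches one when prediction and ground truth do not overlap; the abstract's concern is its behaviour in the regime where the prediction is already close to the ground truth.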