Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research has focused on psychoacoustic quantities or automatic sound recognition, and few attempts have been made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) for automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset, which contains human-annotated sound source labels and perceived annoyance ratings, the DCNN-CaF is trained to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using both loudness and Mel features outperforms variants using only one of them; (2) the proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks; (3) correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model; and (4) generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.
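To make the fusion mechanism concrete, here is a minimal PyTorch sketch of a dual-branch network with cross-attention fusion and two output heads. The layer sizes, pooling choices, and the default of 24 source classes are illustrative assumptions, not the paper's exact DCNN-CaF architecture.

```python
# Minimal sketch of dual-branch cross-attention fusion (illustrative only).
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):          # a, b: (batch, time, dim)
        a2, _ = self.attn_a(a, b, b)  # Mel branch attends to loudness branch
        b2, _ = self.attn_b(b, a, a)  # loudness branch attends to Mel branch
        return torch.cat([a2, b2], dim=-1)

class DualBranchModel(nn.Module):
    def __init__(self, n_sources=24, dim=128):  # n_sources is an assumption
        super().__init__()
        # one CNN branch per input feature (log-Mel spectrogram, loudness)
        self.mel_cnn = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1),
                                     nn.ReLU(), nn.AdaptiveAvgPool2d((32, 1)))
        self.loud_cnn = nn.Sequential(nn.Conv1d(1, dim, 3, padding=1),
                                      nn.ReLU(), nn.AdaptiveAvgPool1d(32))
        self.fusion = CrossAttentionFusion(dim)
        self.ssc_head = nn.Linear(2 * dim, n_sources)  # multi-label sources
        self.arp_head = nn.Linear(2 * dim, 1)          # annoyance rating

    def forward(self, mel, loud):
        # mel: (B, 1, time, n_mels); loud: (B, 1, time)
        a = self.mel_cnn(mel).squeeze(-1).transpose(1, 2)  # (B, 32, dim)
        b = self.loud_cnn(loud).transpose(1, 2)            # (B, 32, dim)
        h = self.fusion(a, b).mean(dim=1)                  # pool over time
        return torch.sigmoid(self.ssc_head(h)), self.arp_head(h)
```

Because each branch attends to the other, the fused representation can weight loudness cues by spectral context and vice versa, which is the intuition behind cross-attention fusion outperforming single-feature variants.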
Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, the Motion Pattern Priors Memory Network. Our method constructs a memory bank from clustered prior knowledge of motion patterns observed in the training-set trajectories. We introduce an addressing mechanism that retrieves the matched pattern and the potential target distributions for each prediction from the memory bank, enabling the identification and retrieval of natural motion patterns exhibited by agents; the retrieved target-prior memory token then guides a diffusion model to generate predictions. Extensive experiments validate the effectiveness of our approach, which achieves state-of-the-art trajectory prediction accuracy. The code will be made publicly available.
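A minimal sketch of the memory-bank idea follows, assuming cluster centroids of training trajectories serve as keys and per-cluster target statistics as values. The cosine-similarity addressing rule and all names here are illustrative assumptions, not the paper's exact mechanism.

```python
# Sketch: build a motion-pattern memory bank and address it (illustrative).
import numpy as np
from sklearn.cluster import KMeans

def build_memory(train_trajs, targets, n_patterns=32):
    """Cluster observed trajectories; store (pattern centroid, mean target)."""
    flat = train_trajs.reshape(len(train_trajs), -1)
    km = KMeans(n_clusters=n_patterns, n_init=10).fit(flat)
    keys = km.cluster_centers_
    values = np.stack([targets[km.labels_ == g].mean(axis=0)
                       for g in range(n_patterns)])
    return keys, values

def address(memory, query_traj):
    """Retrieve the memory token whose pattern best matches the query."""
    keys, values = memory
    q = query_traj.reshape(-1)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
    g = sims.argmax()
    return values[g]  # target prior used to condition the diffusion model
```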
The digital divide is the gap among population sub-groups in accessing and/or using digital technologies. For instance, older people show a lower propensity than younger people to have a broadband connection, use the Internet, and adopt new technologies. Motivated by the analysis of heterogeneity in the use of digital technologies, we build a bipartite network describing the presence of various digital skills in individuals from three European countries: Finland, Italy, and Bulgaria. Bipartite networks provide a useful structure for representing relationships between two disjoint sets of nodes, formally called sending and receiving nodes. The goal is to cluster individuals (sending nodes) based on their digital skills (receiving nodes) for each country. To this end, we employ a Mixture of Latent Trait Analyzers (MLTA) accounting for concomitant variables, which allows us to (i) cluster individuals according to their digital skill profiles and (ii) analyze how socio-economic and demographic characteristics, as well as intergenerational ties, influence individual digitalization. Results show that the type of digitalization depends substantially on age, income, and level of education, while the presence of children in the household seems to play an important role in the digitalization process in Italy and Finland only.
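For reference, a standard formulation of the MLTA likelihood with concomitant variables reads as follows; the notation is illustrative and the paper's exact parameterization may differ. For the binary skill indicators of individual n with covariates x_n:

```latex
% MLTA likelihood with concomitant variables (standard formulation;
% notation illustrative). y_n: binary skill indicators of individual n,
% x_n: covariates, G: number of clusters, z: continuous latent trait.
\[
  p(\mathbf{y}_n \mid \mathbf{x}_n)
  = \sum_{g=1}^{G} \eta_g(\mathbf{x}_n)
    \int \prod_{m=1}^{M} p(y_{nm} \mid \mathbf{z}, g)\,
    \phi(\mathbf{z}; \mathbf{0}, \mathbf{I})\, \mathrm{d}\mathbf{z},
\]
\[
  \operatorname{logit} P(y_{nm} = 1 \mid \mathbf{z}, g)
  = b_{mg} + \mathbf{w}_{mg}^{\top} \mathbf{z},
  \qquad
  \eta_g(\mathbf{x}_n)
  = \frac{\exp(\boldsymbol{\alpha}_g^{\top} \mathbf{x}_n)}
         {\sum_{h=1}^{G} \exp(\boldsymbol{\alpha}_h^{\top} \mathbf{x}_n)}.
\]
```

The concomitant variables enter through the mixing weights, which is what lets socio-economic and demographic characteristics shape cluster membership.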
Photon-counting detection technology has attracted significant X-ray imaging research interest in recent years. Computed Tomography (CT) scanners can benefit from photon-counting detectors, a new technology with the potential to overcome key limitations of conventional CT detectors. Researchers are still studying the effectiveness and sensitivity of semiconductor detector materials in photon-counting detectors for detecting soft-tissue contrasts. This study aimed to characterize the performance of a Cadmium Zinc Telluride (CZT) photon-counting detector in identifying various tissues. The optimal frame rate (frames per second, FPS) of the CZT detector was evaluated by setting the X-ray tube voltage to 25 kV and 35 kV and the tube current to 0.5 mA and 1.0 mA, respectively. Keeping the optimum FPS fixed, the detector energy thresholds were then increased in small steps from 15 keV to 35 keV, and the X-ray tube current was varied from 0.1 mA to 1.0 mA to establish the relationship between the voltage and current of the X-ray source and the counts per second (CPS). The samples, i.e., fat, liver, muscle, paraffin wax, and contrast media, were stacked at six different thickness levels in a stair-step chamber made from Plexiglas. X-ray transmission through the six thicknesses of each tissue sample was also examined at five energy thresholds (21 keV, 25 keV, 29 keV, 31 keV, and 45 keV) to determine the effect on CPS. In this study, 12 frames per second was found to be the optimum FPS based on the spectral response of the X-ray source, and CPS was found to have a linear relationship with X-ray tube current. It was also noted that a sample's thickness affects its X-ray transmission at different energy thresholds. The high sensitivity and linearity of the detector make it suitable for both preclinical and medical applications.
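As a simple illustration of how the reported CPS-versus-tube-current linearity can be checked, the following sketch fits a least-squares line to a current sweep; the numbers are placeholders, not measured data from this study.

```python
# Sketch: verify CPS-vs-current linearity with a linear fit (placeholder data).
import numpy as np

rng = np.random.default_rng(0)
current_mA = np.arange(0.1, 1.01, 0.1)                # tube current sweep
cps = 1e4 * current_mA + rng.normal(0, 50, current_mA.size)  # placeholder CPS

slope, intercept = np.polyfit(current_mA, cps, 1)     # least-squares line
r = np.corrcoef(current_mA, cps)[0, 1]                # linearity check
print(f"slope={slope:.1f} CPS/mA, intercept={intercept:.1f}, r={r:.4f}")
```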
This review presents various image segmentation methods that use complex networks. Image segmentation is one of the key steps in image analysis, as it helps analyze and understand complex images. We first classify complex networks based on how they are used in image segmentation. In computer vision and image processing applications, image segmentation is essential for analyzing complex images with irregular shapes, textures, or overlapping boundaries. Advanced algorithms make use of machine learning, clustering, edge detection, and region-growing techniques. Graph theory principles combined with community-detection-based methods allow for more precise analysis and interpretation of complex images. Hybrid approaches combine multiple techniques for comprehensive, robust segmentation, improving results in computer vision and image processing tasks.
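As a concrete illustration of the community-detection family of methods surveyed here, the following sketch builds a pixel-adjacency graph weighted by intensity similarity and extracts segments as modularity communities. The 4-connectivity, Gaussian weights, and greedy modularity algorithm are illustrative choices, not a specific method from the review.

```python
# Sketch: community-detection-based image segmentation (illustrative).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def segment(image, sigma=10.0):
    """Pixels become nodes; similar neighbours get strong edges;
    communities become segments."""
    h, w = image.shape
    g = nx.Graph()
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):            # 4-connectivity
                ny_, nx_ = y + dy, x + dx
                if ny_ < h and nx_ < w:
                    diff = float(image[y, x]) - float(image[ny_, nx_])
                    weight = np.exp(-(diff ** 2) / (2 * sigma ** 2))
                    g.add_edge((y, x), (ny_, nx_), weight=weight)
    labels = np.zeros((h, w), dtype=int)
    communities = greedy_modularity_communities(g, weight="weight")
    for k, community in enumerate(communities):
        for (y, x) in community:
            labels[y, x] = k
    return labels

labels = segment(np.random.randint(0, 255, (32, 32)).astype(float))
```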
Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or to assume an underlying low-dimensional structure. Based on the idea of transitioning candidate points toward more promising positions, we propose a new method that uses Markov chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin dynamics versions of our algorithm outperform state-of-the-art methods on high-dimensional sequential optimization and reinforcement learning benchmarks.
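The following sketch illustrates the general idea under simple assumptions: draw an approximate posterior sample of the objective via random Fourier features, then move a candidate point toward its high-value regions with Langevin dynamics. The finite-difference gradients and step sizes are illustrative; this is not the paper's exact algorithm.

```python
# Sketch: Langevin moves on an approximate GP posterior sample (illustrative).
import numpy as np

def posterior_sample(X, y, n_feat=200, lengthscale=1.0, noise=1e-2, seed=0):
    """Approximate RBF-GP posterior sample via random Fourier features."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / lengthscale, (n_feat, X.shape[1]))
    b = rng.uniform(0.0, 2 * np.pi, n_feat)
    phi = lambda Z: np.sqrt(2.0 / n_feat) * np.cos(Z @ W.T + b)
    A = phi(X)
    S = A.T @ A + noise * np.eye(n_feat)               # Bayesian linear model
    mean = np.linalg.solve(S, A.T @ y)
    theta = mean + np.linalg.cholesky(noise * np.linalg.inv(S)) \
        @ rng.normal(size=n_feat)
    return lambda Z: phi(Z) @ theta                     # sampled objective

def langevin_maximize(f, x0, steps=200, eta=1e-2, eps=1e-4):
    """Noisy gradient ascent on the posterior sample (finite-diff grads)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(len(x))])
        x += eta * grad + np.sqrt(2 * eta) * np.random.normal(size=x.shape)
    return x
```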
Quasiperiodic systems, which are related to irrational numbers, are space-filling structures without decay or translation invariance. Accurately recovering such systems, especially in non-smooth cases, presents a significant challenge in numerical computation. In this paper, we propose a new algorithm, the finite points recovery (FPR) method, which is applicable to both smooth and non-smooth cases, to address this challenge. The FPR method first establishes a homomorphism between the lower-dimensional definition domain of the quasiperiodic function and a higher-dimensional torus, then recovers the global quasiperiodic system by applying an interpolation technique to finitely many points in the definition domain, without dimensional lifting. Furthermore, we develop accurate and efficient strategies for selecting these finite points according to the arithmetic properties of irrational numbers. The corresponding mathematical theory, convergence analysis, and computational complexity analysis of the point selection are presented. Numerical experiments demonstrate the effectiveness and superiority of the FPR approach in recovering both smooth quasiperiodic functions and piecewise constant Fibonacci quasicrystals, whereas existing spectral methods encounter difficulties in accurately recovering non-smooth quasiperiodic functions.
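The following toy example illustrates the recovery idea: a quasiperiodic function f(x) = F(x, αx) with irrational α is the restriction of a periodic parent function F on a 2-torus to an irrational slice, so finitely many sampled values determine f elsewhere via lookup on the torus. The parent function, sampling, and nearest-point lookup below are illustrative and not the paper's point-selection strategy.

```python
# Sketch: recover a quasiperiodic function from finite samples (illustrative).
import numpy as np

alpha = (np.sqrt(5) - 1) / 2                    # irrational slope
F = lambda u, v: np.cos(2*np.pi*u) + np.cos(2*np.pi*v)  # periodic parent
f = lambda x: F(x, alpha * x)                   # quasiperiodic restriction

# finitely many samples; their torus images are (x mod 1, alpha*x mod 1)
xs = np.linspace(0.0, 200.0, 80001)
u, v = xs % 1.0, (alpha * xs) % 1.0
vals = f(xs)

def recover(x):
    """Nearest-sample lookup on the torus (stand-in for interpolation)."""
    du = np.abs(u - x % 1.0);           du = np.minimum(du, 1 - du)
    dv = np.abs(v - (alpha * x) % 1.0); dv = np.minimum(dv, 1 - dv)
    return vals[np.argmin(du**2 + dv**2)]

print(recover(123.456), f(123.456))             # approximately equal
```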
The present work is devoted to strong approximations of a generalized Ait-Sahalia model arising from mathematical finance. The numerical study of this model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients, and a positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and preserves the positivity of the original model unconditionally, i.e., for any time step-size h > 0. A mean-square convergence rate of order 1/2 is also obtained for the proposed scheme in both the non-critical and general critical cases. Our work is motivated by the need to justify multilevel Monte Carlo (MLMC) simulations for the underlying model, where a mean-square convergence rate is required and the preservation of positivity is desirable, particularly for large discretization time steps. To the best of our knowledge, this is the first paper to propose an unconditionally positivity-preserving explicit scheme with mean-square convergence of order 1/2 for this model. Numerical experiments are provided to confirm the theoretical findings.
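For orientation, the generalized Ait-Sahalia dynamics take the form dX_t = (a_{-1} X_t^{-1} - a_0 + a_1 X_t - a_2 X_t^r) dt + b X_t^ρ dW_t with X_0 > 0. The sketch below simulates this SDE with a naive projected Euler step that floors the state at a small ε; the projection is only a placeholder to keep the simulation well defined and is not the unconditionally positivity-preserving scheme proposed here.

```python
# Sketch: naive projected Euler for the generalized Ait-Sahalia SDE.
# The max(x, eps) projection is a placeholder, NOT the paper's scheme.
import numpy as np

def simulate(x0=1.0, T=1.0, n=1000, am1=1.0, a0=1.0, a1=1.0, a2=1.0,
             r=2.0, rho=1.5, b=0.5, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    h = T / n                                   # time step-size
    x = x0
    for _ in range(n):
        drift = am1 / x - a0 + a1 * x - a2 * x**r
        x = x + drift * h + b * x**rho * np.sqrt(h) * rng.normal()
        x = max(x, eps)    # crude positivity projection (placeholder)
    return x
```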
This work introduces UstanceBR, a multimodal corpus in the Brazilian Portuguese Twitter domain for target-based stance prediction. The corpus comprises 86.8k labelled stances towards selected target topics, together with extensive network information about the users who published these stances on social media. In this article we describe the multimodal data in the corpus and present a number of usage examples in both in-domain and zero-shot stance prediction based on text- and network-related information, which are intended to provide initial baseline results for future studies in the field.
Analysis tools used in research laboratories, for sound synthesis, or by musicians and sound engineers can be rather different. A discussion of the assumptions and limitations of these tools allows us to propose a first tool that is as relevant and versatile as possible for all sound practitioners, with one major aim: one must be able to listen to each element of the analysis, because hearing is the final reference tool. This tool should also be used, in the future, to reinvestigate the definition of sound (or Acoustics) on the basis of recent work on musical instrument modeling, speech production, and loudspeaker design. Audio illustrations will be given. (Paper 6041, presented at the 116th Convention of the Audio Engineering Society, Berlin, 2004.)
Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and temporal reasoning, and counting. Visual reasoning with large language models (LLMs) as controllers can, in principle, address these limitations by decomposing the task and solving subtasks by orchestrating a set of (visual) tools. Recently, these models have achieved strong performance on tasks such as compositional visual question answering, visual grounding, and video temporal reasoning. Nevertheless, in their current form, these models heavily rely on human engineering of in-context examples in the prompt, which are often dataset- and task-specific and require significant labor from highly skilled programmers. In this work, we present a framework that mitigates these issues by introducing spatially and temporally abstract routines and by leveraging a small number of labeled examples to automatically generate in-context examples, thereby avoiding human-created in-context examples. On a number of visual reasoning tasks, we show that our framework leads to consistent gains in performance, makes the LLMs-as-controllers setup more robust, and removes the need for human engineering of in-context examples.
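The following sketch shows the LLMs-as-controllers pattern in miniature: the LLM emits a short program that calls visual tools, and in-context examples are generated automatically from a few labelled instances rather than hand-written. All names (llm, tools, the prompt format) are hypothetical placeholders, not the paper's actual framework API.

```python
# Sketch: LLM-as-controller with auto-generated in-context examples
# (all names and the prompt format are hypothetical placeholders).
def build_prompt(question, in_context_examples):
    """in_context_examples are (question, program) demonstrations generated
    from a few labelled instances, not hand-engineered per dataset."""
    shots = "\n\n".join(in_context_examples)
    return f"{shots}\n\n# Question: {question}\n# Program:"

def answer(image, question, llm, tools, in_context_examples):
    """llm: callable prompt -> program text; tools: dict of visual tools,
    e.g. {'detect_objects': ..., 'count': ...}."""
    program = llm(build_prompt(question, in_context_examples))
    scope = {"image": image, **tools}
    exec(program, scope)           # generated program sets scope["answer"]
    return scope.get("answer")
```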