亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The high feature dimensionality is a challenge in music emotion recognition. There is no common consensus on a relation between audio features and emotion. The MER system uses all available features to recognize emotion; however, this is not an optimal solution since it contains irrelevant data acting as noise. In this paper, we introduce a feature selection approach to eliminate redundant features for MER. We created a Selected Feature Set (SFS) based on the feature selection algorithm (FSA) and benchmarked it by training with two models, Support Vector Regression (SVR) and Random Forest (RF) and comparing them against with using the Complete Feature Set (CFS). The result indicates that the performance of MER has improved for both Random Forest (RF) and Support Vector Regression (SVR) models by using SFS. We found using FSA can improve performance in all scenarios, and it has potential benefits for model efficiency and stability for MER task.

相關內容

特征選擇( Feature Selection )也稱特征子集選擇( Feature Subset Selection , FSS ),或屬性選擇( Attribute Selection )。是指從已有的M個特征(Feature)中選擇N個特征使得系統的特定指標最優化,是從原始特征中選擇出一些最有效特征以降低數據集維度的過程,是提高學習算法性能的一個重要手段,也是模式識別中關鍵的數據預處理步驟。對于一個學習算法來說,好的學習樣本是訓練模型的關鍵。

Speech emotion recognition is a challenging research topic that plays a critical role in human-computer interaction. Multimodal inputs further improve the performance as more emotional information is used. However, existing studies learn all the information in the sample while only a small portion of it is about emotion. The redundant information will become noises and limit the system performance. In this paper, a key-sparse Transformer is proposed for efficient emotion recognition by focusing more on emotion related information. The proposed method is evaluated on the IEMOCAP and LSSED. Experimental results show that the proposed method achieves better performance than the state-of-the-art approaches.

Body actions and head gestures are natural interfaces for interaction in virtual environments. Existing methods for in-place body action recognition often require hardware more than a head-mounted display (HMD), making body action interfaces difficult to be introduced to ordinary virtual reality (VR) users as they usually only possess an HMD. In addition, there lacks a unified solution to recognize in-place body actions and head gestures. This potentially hinders the exploration of the use of in-place body actions and head gestures for novel interaction experiences in virtual environments. We present a unified two-stream 1-D convolutional neural network (CNN) for recognition of body actions when a user performs walking-in-place (WIP) and for recognition of head gestures when a user stands still wearing only an HMD. Compared to previous approaches, our method does not require specialized hardware and/or additional tracking devices other than an HMD and can recognize a significantly larger number of body actions and head gestures than other existing methods. In total, ten in-place body actions and eight head gestures can be recognized with the proposed method, which makes this method a readily available body action interface (head gestures included) for interaction with virtual environments. We demonstrate one utility of the interface through a virtual locomotion task. Results show that the present body action interface is reliable in detecting body actions for the VR locomotion task but is physically demanding compared to a touch controller interface. The present body action interface is promising for new VR experiences and applications, especially for VR fitness applications where workouts are intended.

In recent years, an association is established between faces and voices of celebrities leveraging large scale audio-visual information from YouTube. The availability of large scale audio-visual datasets is instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. Thus, the aim of this paper is to leverage large scale audio-visual information to improve speaker recognition task. To achieve this task, we proposed a two-branch network to learn joint representations of faces and voices in a multimodal system. Afterwards, features are extracted from the two-branch network to train a classifier for speaker recognition. We evaluated our proposed framework on a large scale audio-visual dataset named VoxCeleb$1$. Our results show that addition of facial information improved the performance of speaker recognition. Moreover, our results indicate that there is an overlap between face and voice.

This work presents a rigorous mathematical formulation for topology optimization of a macrostructure undergoing ductile failure. The prediction of ductile solid materials which exhibit dominant plastic deformation is an intriguingly challenging task and plays an extremely important role in various engineering applications. Here, we rely on the phase-field approach to fracture which is a widely adopted framework for modeling and computing the fracture failure phenomena in solids. The first objective is to optimize the topology of the structure in order to minimize its mass, while accounting for structural damage. To do so, the topological phase transition function (between solid and void phases) is introduced, thus resulting in an extension of all the governing equations. Our second objective is to additionally enhance the fracture resistance of the structure. Accordingly, two different formulations are proposed. One requires only the residual force vector of the deformation field as a constraint, while in the second formulation, the residual force vector of the deformation and phase-field fracture simultaneously have been imposed. An incremental minimization principles for a class of gradient-type dissipative materials are used to derive the governing equations. Here, the level-set-based topology optimization is employed to seek an optimal layout with smooth and clear boundaries. Sensitivities are derived using the analytical gradient-based adjoint method to update the level-set surface for both formulations. Here, the evolution of the level-set surface is realized by the reaction-diffusion equation to maximize the strain energy of the structure while a certain volume of design domain is prescribed. Several three-dimensional numerical examples are presented to substantiate our algorithmic developments.

Grid-free Monte Carlo methods based on the \emph{walk on spheres (WoS)} algorithm solve fundamental partial differential equations (PDEs) like the Poisson equation without discretizing the problem domain, nor approximating functions in a finite basis. Such methods hence avoid aliasing in the solution, and evade the many challenges of mesh generation. Yet for problems with complex geometry, practical grid-free methods have been largely limited to basic Dirichlet boundary conditions. This paper introduces the \emph{walk on stars (WoSt)} method, which solves linear elliptic PDEs with arbitrary mixed Neumann and Dirichlet boundary conditions. The key insight is that one can efficiently simulate reflecting Brownian motion (which models Neumann conditions) by replacing the balls used by WoS with \emph{star-shaped} domains; we identify such domains by locating the closest visible point on the geometric silhouette. Overall, WoSt retains many attractive features of other grid-free Monte Carlo methods, such as progressive evaluation, trivial parallel implementation, and logarithmic scaling relative to geometric complexity.

Attention Networks (ATNs) such as Transformers are used in many domains ranging from Natural Language Processing to Autonomous Driving. In this paper, we study the robustness problem of ATNs, a key characteristic where low robustness may cause safety concerns. Specifically, we focus on Sparsemax-based ATNs and reduce the finding of their maximum robustness to a Mixed Integer Quadratically Constrained Programming (MIQCP) problem. We also design two pre-processing heuristics that can be embedded in the MIQCP encoding and substantially accelerate its solving. We then conduct experiments using the application of Land Departure Warning to compare the robustness of Sparsemax-based ATNs against that of the more conventional Multi-Layer-Perceptron (MLP) Neural Networks (NNs). To our surprise, ATNs are not necessarily more robust, leading to profound considerations in selecting appropriate NN architectures for safety-critical domain applications.

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence. It has great fundamental importance and strong industrial needs. Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks, with the help of large amounts of training data and new powerful computation resources. Though recognition accuracy is usually the first concern for new progresses, efficiency is actually rather important and sometimes critical for both academic research and industrial applications. Moreover, insightful views on the opportunities and challenges of efficiency are also highly required for the entire community. While general surveys on the efficiency issue of DNNs have been done from various perspectives, as far as we are aware, scarcely any of them focused on visual recognition systematically, and thus it is unclear which progresses are applicable to it and what else should be concerned. In this paper, we present the review of the recent advances with our suggestions on the new possible directions towards improving the efficiency of DNN-related visual recognition approaches. We investigate not only from the model but also the data point of view (which is not the case in existing surveys), and focus on three most studied data types (images, videos and points). This paper attempts to provide a systematic summary via a comprehensive survey which can serve as a valuable reference and inspire both researchers and practitioners who work on visual recognition problems.

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.

北京阿比特科技有限公司