Speech Emotion Recognition (SER) is a challenging task due to limited data and blurred boundaries of certain emotions. In this paper, we present a comprehensive approach to improve the SER performance throughout the model lifecycle, including pre-training, fine-tuning, and inference stages. To address the data scarcity issue, we utilize a pre-trained model, wav2vec2.0. During fine-tuning, we propose a novel loss function that combines cross-entropy loss with supervised contrastive learning loss to improve the model's discriminative ability. This approach increases the inter-class distances and decreases the intra-class distances, mitigating the issue of blurred boundaries. Finally, to leverage the improved distances, we propose an interpolation method at the inference stage that combines the model prediction with the output from a k-nearest neighbors model. Our experiments on IEMOCAP demonstrate that our proposed methods outperform current state-of-the-art results.
Zero-shot Dialogue State Tracking (DST) addresses the challenge of acquiring and annotating task-oriented dialogues, which can be time consuming and costly. However, DST extends beyond simple slot-filling and requires effective updating strategies for tracking dialogue state as conversations progress. In this paper, we propose ParsingDST, a new In-Context Learning (ICL) method, to introduce additional intricate updating strategies in zero-shot DST. Our approach reformulates the DST task by leveraging powerful Large Language Models (LLMs) and translating the original dialogue text to JSON through semantic parsing as an intermediate state. We also design a novel framework that includes more modules to ensure the effectiveness of updating strategies in the text-to-JSON process. Experimental results demonstrate that our approach outperforms existing zero-shot DST methods on MultiWOZ, exhibiting significant improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to existing ICL methods.
Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized $t$-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. We showcase the broad applicability of our results by developing novel counterfactual density estimation and inference methodology for dyadic data, which can be used for causal inference and program evaluation. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property making our analysis somewhat delicate. Nonetheless our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss inference procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world trade data, for which we make comparisons between observed and counterfactual trade distributions in different years. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.
In the era of digital markets, the challenge for consumers is discerning quality amidst information asymmetry. While traditional markets use brand mechanisms to address this issue, transferring such systems to internet-based P2P markets, where misleading practices like fake ratings are rampant, remains challenging. Current internet platforms strive to counter this through verification algorithms, but these efforts find themselves in a continuous tug-of-war with counterfeit actions. Exploiting the transparency, immutability, and traceability of blockchain technology, this paper introduces a robust reputation voting system grounded in it. Unlike existing blockchain-based reputation systems, our model harnesses an intrinsically economically incentivized approach to bolster agent integrity. We optimize this model to mirror real-world user behavior, preserving the reputation system's foundational sustainability. Through Monte-Carlo simulations, using both uniform and power-law distributions enabled by an innovative inverse transform method, we traverse a broad parameter landscape, replicating real-world complexity. The findings underscore the promise of a sustainable, transparent, and formidable reputation mechanism. Given its structure, our framework can potentially function as a universal, sustainable oracle for offchain-onchain bridging, aiding entities in perpetually cultivating their reputation. Future integration with technologies like Ring Signature and Zero Knowledge Proof could amplify the system's privacy facets, rendering it particularly influential in the ever-evolving digital domain.
We investigate the dimension-parametric complexity of the reachability problem in vector addition systems with states (VASS) and its extension with pushdown stack (pushdown VASS). Up to now, the problem is known to be $\mathcal{F}_k$-hard for VASS of dimension $3k+2$ (the complexity class $\mathcal{F}_k$ corresponds to the $k$th level of the fast-growing hierarchy), and no essentially better bound is known for pushdown VASS. We provide a new construction that improves the lower bound for VASS: $\mathcal{F}_k$-hardness in dimension $2k+3$. Furthermore, building on our new insights we show a new lower bound for pushdown VASS: $\mathcal{F}_k$-hardness in dimension $\frac k 2 + 4$. This dimension-parametric lower bound is strictly stronger than the upper bound for VASS, which suggests that the (still unknown) complexity of the reachability problem in pushdown VASS is higher than in plain VASS (where it is Ackermann-complete).
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose simple and parameter-efficient adaptation procedures for pretrained multimodal networks. In particular, we exploit low-rank adaptation and modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires extremely small number of parameters (e.g., fewer than 0.7% of the total parameters in most experiments). We conduct a series of experiments to highlight the robustness of our proposed method using diverse datasets for RGB-thermal and RGB-Depth semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
The Network Revenue Management (NRM) problem is a well-known challenge in dynamic decision-making under uncertainty. In this problem, fixed resources must be allocated to serve customers over a finite horizon, while customers arrive according to a stochastic process. The typical NRM model assumes that customer arrivals are independent over time. However, in this paper, we explore a more general setting where customer arrivals over different periods can be correlated. We propose a model that assumes the existence of a system state, which determines customer arrivals for the current period. This system state evolves over time according to a time-inhomogeneous Markov chain. We show our model can be used to represent correlation in various settings. To solve the NRM problem under our correlated model, we derive a new linear programming (LP) approximation of the optimal policy. Our approximation provides an upper bound on the total expected value collected by the optimal policy. We use our LP to develop a new bid price policy, which computes bid prices for each system state and time period in a backward induction manner. The decision is then made by comparing the reward of the customer against the associated bid prices. Our policy guarantees to collect at least $1/(1+L)$ fraction of the total reward collected by the optimal policy, where $L$ denotes the maximum number of resources required by a customer. In summary, our work presents a Markovian model for correlated customer arrivals in the NRM problem and provides a new LP approximation for solving the problem under this model. We derive a new bid price policy and provides a theoretical guarantee of the performance of the policy.
Mediation analysis is an important statistical tool in many research fields. Its aim is to investigate the mechanism along the causal pathway between an exposure and an outcome. The joint significance test is widely utilized as a prominent statistical approach for examining mediation effects in practical applications. Nevertheless, the limitation of this mediation testing method stems from its conservative Type I error, which reduces its statistical power and imposes certain constraints on its popularity and utility. The proposed solution to address this gap is the adaptive joint significance test for one mediator, a novel data-adaptive test for mediation effect that exhibits significant advancements compared to traditional joint significance test. The proposed method is designed to be user-friendly, eliminating the need for complicated procedures. We have derived explicit expressions for size and power, ensuring the theoretical validity of our approach. Furthermore, we extend the proposed adaptive joint significance tests for small-scale mediation hypotheses with family-wise error rate (FWER) control. Additionally, a novel adaptive Sobel-type approach is proposed for the estimation of confidence intervals for the mediation effects, demonstrating significant advancements over conventional Sobel's confidence intervals in terms of achieving desirable coverage probabilities. Our mediation testing and confidence intervals procedure is evaluated through comprehensive simulations, and compared with numerous existing approaches. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which could even exacerbate the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, in the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate that the superior effectiveness and robustness of VIB-GSL.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than attention layer used in the existing models. Second, the computations of our model could be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.