Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are often noisy due to the complex annotation process, making it essential to develop strategies for fine-tuning PLMs with such noisy labels. To this end, we introduce an innovative approach for fine-tuning PLMs using noisy labels, which incorporates the guidance of Large Language Models (LLMs) like ChatGPT. This guidance assists in accurately distinguishing between clean and noisy samples and provides supplementary information beyond the noisy labels, thereby boosting the learning process during fine-tuning PLMs. Extensive experiments on synthetic and real-world noisy datasets further demonstrate the superior advantages of our framework over the state-of-the-art baselines.
We propose a two-stage estimation procedure for a copula-based model with semi-competing risks data, where the non-terminal event is subject to dependent censoring by the terminal event, and both events are subject to independent censoring. Under a copula-based model, the marginal survival functions of individual event times are specified by semiparametric transformation models, and the dependence between the bivariate event times is specified by a parametric copula function. For the estimation procedure, in the first stage, the parameters associated with the marginal of the terminal event are estimated only using the corresponding observed outcomes, and in the second stage, the marginal parameters for the non-terminal event time and the copula parameter are estimated via maximizing a pseudo-likelihood function based on the joint distribution of the bivariate event times. We derived the asymptotic properties of the proposed estimator and provided an analytic variance estimator for inference. Through simulation studies, we showed that our approach leads to consistent estimates with less computational cost and more robustness compared to the one-stage procedure developed in Chen (2012), where all parameters were estimated simultaneously. In addition, our approach demonstrates more desirable finite-sample performances over another existing two-stage estimation method proposed in Zhu et al. (2021).
We present 3D Points Splatting Hand Reconstruction (3D-PSHR), a real-time and photo-realistic hand reconstruction approach. We propose a self-adaptive canonical points upsampling strategy to achieve high-resolution hand geometry representation. This is followed by a self-adaptive deformation that deforms the hand from the canonical space to the target pose, adapting to the dynamic changing of canonical points which, in contrast to the common practice of subdividing the MANO model, offers greater flexibility and results in improved geometry fitting. To model texture, we disentangle the appearance color into the intrinsic albedo and pose-aware shading, which are learned through a Context-Attention module. Moreover, our approach allows the geometric and the appearance models to be trained simultaneously in an end-to-end manner. We demonstrate that our method is capable of producing animatable, photorealistic and relightable hand reconstructions using multiple datasets, including monocular videos captured with handheld smartphones and large-scale multi-view videos featuring various hand poses. We also demonstrate that our approach achieves real-time rendering speeds while simultaneously maintaining superior performance compared to existing state-of-the-art methods.
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query. Since datasets in this domain are often gathered from limited video scenes, models tend to overfit to scene-specific factors, which leads to suboptimal performance when encountering new scenes in real-world applications. In a new scene, the fine-grained annotations are often insufficient due to the expensive labor cost, while the coarse-grained video-query pairs are easier to obtain. Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not. Under the UDA setting, we introduce a novel Adversarial Multi-modal Domain Adaptation (AMDA) method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data. Specifically, we tackle the domain gap by utilizing domain discriminators, which help identify valuable scene-related features effective across both domains. Concurrently, we mitigate the semantic gap between different modalities by aligning video-query pairs with related semantics. Furthermore, we employ a mask-reconstruction approach to enhance the understanding of temporal semantics within a scene. Extensive experiments on Charades-STA, ActivityNet Captions, and YouCook2 demonstrate the effectiveness of our proposed method.
We introduce a new class of sequential Monte Carlo methods called nested sampling via sequential Monte Carlo (NS-SMC), which reformulates the essence of the nested sampling method of Skilling (2006) in terms of sequential Monte Carlo techniques. This new framework allows convergence results to be obtained in the setting when Markov chain Monte Carlo (MCMC) is used to produce new samples. An additional benefit is that marginal likelihood (normalizing constant) estimates are unbiased. In contrast to NS, the analysis of NS-SMC does not require the (unrealistic) assumption that the simulated samples be independent. We show that a minor adjustment to our adaptive NS-SMC algorithm recovers the original NS algorithm, which provides insights as to why NS seems to produce accurate estimates despite a typical violation of its assumptions. A numerical study is conducted where the performance of NS-SMC and temperature-annealed SMC is compared on challenging problems. Code for the experiments is made available online at //github.com/LeahPrice/SMC-NS .
In recent years, the rapid advancement and impressive capabilities of Large Language Models (LLMs) have been evident across various domains. This paper explores the application, implications, and potential of LLMs in building energy efficiency and decarbonization studies. The wide-ranging capabilities of LLMs are examined in the context of the building energy field, including intelligent control systems, code generation, data infrastructure, knowledge extraction, and education. Despite the promising potential of LLMs, challenges including complex and expensive computation, data privacy, security and copyright, complexity in fine-tuned LLMs, and self-consistency are discussed. The paper concludes with a call for future research focused on the enhancement of LLMs for domain-specific tasks, multi-modal LLMs, and collaborative research between AI and energy experts.
The reconstruction of Configuration Space (CS) from a limited number of samples plays a vital role in expediting motion planning for random tree algorithms. Traditionally, the evaluation of CS reconstruction is performed through collision checking. However, employing the collision checker as an evaluation measure can be misleading. In particular, a collision checker may exhibit high accuracy even when only a subset of the original CS is reconstructed, limiting the motion planner's ability to find paths comparable to those in the original CS. Additionally, a significant challenge arises when dealing with high-dimensional CSs, as it becomes increasingly difficult, if not impossible, to perform qualitative evaluations when working in dimensions higher than three. In this paper, we introduce a novel approach for representing high-dimensional CSs of manipulator robots in a 2D format. Specifically, we leverage the kinematic chain of manipulator robots and the human ability to perceive colors based on hue. This allows us to construct a visualization comprising a series of pairs of 2D projections. We showcase the efficacy of our method in representing a 7-degree-of-freedom CS of a manipulator robot in a 2D projection. This representation provides qualitative insights into the joint boundaries of the robot and the collision state combinations. From a quantitative perspective, we show that the proposed representation not only captures accuracy but also furnishes additional information, enhancing our ability to compare two different high-dimensional CSs during the deployment phase, beyond what is usually offered by the collision checker. The source code is publicly available on our repository.
Neural ranking models (NRMs) have shown great success in information retrieval (IR). But their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense approach to adversarial attacks against NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness. Then, we define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error for attainable computation. Informed by our theoretical analysis, we design a novel \emph{perturbation-invariant adversarial training} (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss, in which one term encourages the effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PITA compared to existing adversarial defenses.
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervisions for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance in QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.