亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.

相關內容

Irregular repetition slotted Aloha (IRSA) has shown significant advantages as a modern technique for uncoordinated random access with massive number of users due to its capability of achieving theoretically a throughput of $1$ packet per slot. When the receiver has also the multi-packet reception of multi-user (MUD) detection property, by applying successive interference cancellation, IRSA also obtains very low packet loss probabilities at low traffic loads, but is unable in general to achieve a normalized throughput close to the $1$. In this paper, we reconsider the case of IRSA with $k$-MUD receivers and derive the general density evolution equations for the non-asymptotic analysis of the packet loss rate, for arbitrary frame lengths and two variants of the first slot used for transmission. Next, using the potential function, we give new capacity bounds on the capacity of the system, showing the threshold arrival rate for zero decoding error probability. Our numerical results illustrate performance in terms of throughput and average delay for $k$-MUD IRSA with finite memory at the receiver, and also with bounded maximum delay.

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at //decodingtrust.github.io/. Additionally, our dataset can be previewed at //huggingface.co/datasets/AI-Secure/DecodingTrust, and a concise version of our DecodingTrust is accessible at //openreview.net/pdf?id=kaHpo8OZw2.

The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems. While researchers have introduced a multitude of attack strategies targeting voice control systems (VCS), the continual advancements of VCS have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study endeavors to comprehensively assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks. Through extensive experimentation, we evaluate six prominent attack techniques across a collection of voice control interfaces and devices. Contrary to prevailing narratives, our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats. Particularly, our research highlights the ineffectiveness of white-box attacks in black-box scenarios. Furthermore, the adversaries encounter substantial obstacles in obtaining precise gradient estimations during query-based interactions with commercial systems, such as Apple Siri and Samsung Bixby. Meanwhile, we find that current defense strategies are not completely immune to advanced attacks. Our findings contribute valuable insights for enhancing defense mechanisms in VCS. Through this survey, we aim to raise awareness within the academic community about the security concerns of VCS and advocate for continued research in this crucial area.

Topology identification (TI) is a key task for state estimation (SE) in distribution grids, especially the one with high-penetration renewables. The uncertainties, initiated by the time-series behavior of renewables, will almost certainly lead to bad TI results without a proper treatment. These uncertainties are analytically intractable under conventional framework-they are usually jointly spatial-temporal dependent, and hence cannot be simply treated as white noise. For this purpose, a hybrid framework is suggested in this paper to handle these uncertainties in a systematic and theoretical way; in particular, big data analytics are studied to harness the jointly spatial-temporal statistical properties of those uncertainties. With some prior knowledge, a model bank is built first to store the countable typical models of network configurations; therefore, the difference between the SE outputs of each bank model and our observation is capable of being defined as a matrix variate-the so-called random matrix. In order to gain insight into the random matrix, a well-designed metric space is needed. Auto-regression (AR) model, factor analysis (FA), and random matrix theory (RMT) are tied together for the metric space design, followed by jointly temporal-spatial analysis of those matrices which is conducted in a high-dimensional (vector) space. Under the proposed framework, some big data analytics and theoretical results are obtained to improve the TI performance. Our framework is validated using IEEE standard distribution network with some field data in practice.

We present IntrinsicAvatar, a novel approach to recovering the intrinsic properties of clothed human avatars including geometry, albedo, material, and environment lighting from only monocular videos. Recent advancements in human-based neural rendering have enabled high-quality geometry and appearance reconstruction of clothed humans from just monocular videos. However, these methods bake intrinsic properties such as albedo, material, and environment lighting into a single entangled neural representation. On the other hand, only a handful of works tackle the problem of estimating geometry and disentangled appearance properties of clothed humans from monocular videos. They usually achieve limited quality and disentanglement due to approximations of secondary shading effects via learned MLPs. In this work, we propose to model secondary shading effects explicitly via Monte-Carlo ray tracing. We model the rendering process of clothed humans as a volumetric scattering process, and combine ray tracing with body articulation. Our approach can recover high-quality geometry, albedo, material, and lighting properties of clothed humans from a single monocular video, without requiring supervised pre-training using ground truth materials. Furthermore, since we explicitly model the volumetric scattering process and ray tracing, our model naturally generalizes to novel poses, enabling animation of the reconstructed avatar in novel lighting conditions.

We explore computational aspects of maximum likelihood estimation of the mixture proportions of a nonparametric finite mixture model -- a convex optimization problem with old roots in statistics and a key member of the modern data analysis toolkit. Motivated by problems in shape constrained inference, we consider structured variants of this problem with additional convex polyhedral constraints. We propose a new cubic regularized Newton method for this problem and present novel worst-case and local computational guarantees for our algorithm. We extend earlier work by Nesterov and Polyak to the case of a self-concordant objective with polyhedral constraints, such as the ones considered herein. We propose a Frank-Wolfe method to solve the cubic regularized Newton subproblem; and derive efficient solutions for the linear optimization oracles that may be of independent interest. In the particular case of Gaussian mixtures without shape constraints, we derive bounds on how well the finite mixture problem approximates the infinite-dimensional Kiefer-Wolfowitz maximum likelihood estimator. Experiments on synthetic and real datasets suggest that our proposed algorithms exhibit improved runtimes and scalability features over existing benchmarks.

While several long-form VideoQA datasets have been introduced, the length of both videos used to curate questions and sub-clips of clues leveraged to answer those questions have not yet reached the criteria for genuine long-form video understanding. Moreover, their QAs are unduly narrow and modality-biased, lacking a wider view of understanding long-term video content with rich dynamics and complex narratives. To remedy this, we introduce MoVQA, a long-form movie question-answering dataset, and benchmark to assess the diverse cognitive capabilities of multimodal systems rely on multi-level temporal lengths, with considering both video length and clue length. Additionally, to take a step towards human-level understanding in long-form video, versatile and multimodal question-answering is designed from the moviegoer-perspective to assess the model capabilities on various perceptual and cognitive axes.Through analysis involving various baselines reveals a consistent trend: the performance of all methods significantly deteriorate with increasing video and clue length. Meanwhile, our established baseline method has shown some improvements, but there is still ample scope for enhancement on our challenging MoVQA dataset. We expect our MoVQA to provide a new perspective and encourage inspiring works on long-form video understanding research.

Extensive work has been devoted to improving the safety mechanism of Large Language Models (LLMs). However, in specific scenarios, LLMs still generate harmful responses when faced with malicious instructions, a phenomenon referred to as "Jailbreak Attack". In our research, we introduce a novel jailbreak attack method (\textbf{RADIAL}), which consists of two steps: 1) Inherent Response Tendency Analysis: we analyze the inherent affirmation and rejection tendency of LLMs to react to real-world instructions. 2) Real-World Instructions-Driven Jailbreak: based on our analysis, we strategically choose several real-world instructions and embed malicious instructions into them to amplify the LLM's potential to generate harmful responses. On three open-source human-aligned LLMs, our method achieves excellent jailbreak attack performance for both Chinese and English malicious instructions. Besides, we guided detailed ablation experiments and verified the effectiveness of our core idea "Inherent Response Tendency Analysis". Our exploration also exposes the vulnerability of LLMs to being induced into generating more detailed harmful responses in subsequent rounds of dialogue.

Compared with cheap addition operation, multiplication operation is of much higher computation complexity. The widely-used convolutions in deep neural networks are exactly cross-correlation to measure the similarity between input feature and convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input feature as the output response. The influence of this new similarity measure on the optimization of neural network have been thoroughly analyzed. To achieve a better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in convolution layer.

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.

北京阿比特科技有限公司