
In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that results in positive rewards and to provide appropriate supervision for optimization. Outcome supervision provides sparse rewards for final results without identifying error locations, whereas process supervision offers step-wise rewards but requires extensive manual annotation. R$^3$ overcomes these limitations by learning from correct demonstrations. Specifically, R$^3$ progressively slides the start state of reasoning from a demonstration's end to its beginning, facilitating easier model exploration at all stages. Thus, R$^3$ establishes a step-wise curriculum, allowing outcome supervision to offer step-level signals and precisely pinpoint errors. Using Llama2-7B, our method surpasses the RL baseline on eight reasoning tasks by $4.1$ points on average. Notably, in program-based reasoning on GSM8K, it exceeds the baseline by $4.2$ points across three backbone models, and without any extra data, Codellama-7B + R$^3$ performs comparably to larger models or closed-source models.
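The start-state sliding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `reverse_curriculum` and `outcome_reward`, the newline-joined prompt format, and the exact-match reward are all hypothetical, and the RL policy update itself is omitted.

```python
from typing import List, Tuple


def reverse_curriculum(question: str, demo_steps: List[str]) -> List[Tuple[int, str]]:
    """Build (kept_steps, prompt) pairs, ordered from easiest to hardest.

    Hypothetical helper: each stage gives the model the first `kept_steps`
    steps of a correct demonstration and asks it to finish the reasoning.
    """
    stages = []
    n = len(demo_steps)
    # Start close to the demonstration's end (only the last step is missing),
    # then slide the start state back toward the beginning (nothing is given).
    for kept in range(n - 1, -1, -1):
        prefix = "\n".join(demo_steps[:kept])
        prompt = question if kept == 0 else question + "\n" + prefix
        stages.append((kept, prompt))
    return stages


def outcome_reward(final_answer: str, gold_answer: str) -> float:
    """Sparse outcome reward (assumed form): 1.0 only for a correct final answer."""
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0


if __name__ == "__main__":
    demo = ["step 1: 3 + 4 = 7", "step 2: 7 * 2 = 14", "answer: 14"]
    for kept, prompt in reverse_curriculum("What is (3 + 4) * 2?", demo):
        print(f"--- stage with {kept} demonstration step(s) given ---")
        print(prompt)
```

Because every stage is still scored only on the final answer, a failure at a stage whose predecessor succeeded points to the newly exposed step, which is how an outcome reward can act as a step-level signal.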

Related content

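The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability.
April 26, 2024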

In this paper, we set the mathematical foundations of the Dynamical Low-Rank Approximation (DLRA) method for stochastic differential equations. DLRA aims at approximating the solution as a linear combination of a small number of basis vectors with random coefficients (low rank format) with the peculiarity that both the basis vectors and the random coefficients vary in time. While the formulation and properties of DLRA are now well understood for random/parametric equations, the same cannot be said for SDEs and this work aims to fill this gap. We start by rigorously formulating a Dynamically Orthogonal (DO) approximation (an instance of DLRA successfully used in applications) for SDEs, which we then generalize to define a parametrization independent DLRA for SDEs. We show local well-posedness of the DO equations and their equivalence with the DLRA formulation. We also characterize the explosion time of the DO solution by a loss of linear independence of the random coefficients defining the solution expansion and give sufficient conditions for global existence.
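As a notational sketch of the low-rank format described above (the symbols $X_t$, $U^i_t$, $Y^i_t$, and $R$ are assumptions introduced here for illustration, not notation taken from the paper), the approximation can be written as

$$X_t \approx \sum_{i=1}^{R} U^i_t \, Y^i_t,$$

where the $U^i_t$ are deterministic basis vectors and the $Y^i_t$ are scalar random coefficients, both evolving in time. In a DO-type parametrization the basis is typically kept orthonormal, $\langle U^i_t, U^j_t \rangle = \delta_{ij}$, and the abstract ties the explosion time of the DO solution to the coefficients $Y^1_t, \dots, Y^R_t$ losing linear independence.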

This paper presents an innovative enhancement to the Sphere as Prior Generative Adversarial Network (SP-GAN) model, a state-of-the-art GAN designed for point cloud generation. A novel method is introduced for point cloud generation that elevates the structural integrity and overall quality of the generated point clouds by incorporating topological priors into the training process of the generator. Specifically, this work utilizes the K-means algorithm to segment a point cloud from the repository into clusters and extract centroids, which are then used as priors in the generation process of the SP-GAN. Furthermore, the discriminator component of the SP-GAN utilizes the identical point cloud that contributed the centroids, ensuring a coherent and consistent learning environment. This strategic use of centroids as intuitive guides not only boosts the efficiency of global feature learning but also substantially improves the structural coherence and fidelity of the generated point clouds. By applying the K-means algorithm to generate centroids as the prior, the work intuitively and experimentally demonstrates that such a prior enhances the quality of generated point clouds.
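The centroid-extraction step can be illustrated with a plain Lloyd's-algorithm K-means. This is a minimal standard-library sketch, assuming points are 3D tuples. The function name `kmeans_centroids`, the iteration count, and the seeding are hypothetical choices, and how the centroids are fed into the SP-GAN generator is not shown because the abstract does not specify it.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def _sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Cluster a point cloud with Lloyd's algorithm and return the k centroids."""
    rng = random.Random(seed)
    # Initialise the centroids from k distinct points of the cloud.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sqdist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


if __name__ == "__main__":
    rng = random.Random(1)
    cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
    for c in kmeans_centroids(cloud, k=4):
        print([round(v, 3) for v in c])
```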

This paper presents a novel learning approach for Dubins Traveling Salesman Problems (DTSP) with Neighborhood (DTSPN) to quickly produce a tour of a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: initially, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Subsequently, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data is also devised to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL with demonstration schemes, most of which fail to sense all the task points.

In this paper, we propose the Graph-Learning-Dual Graph Convolutional Neural Network (GLDGCN), which extends the classic Graph Convolutional Neural Network (GCN) with a dual convolutional layer and a graph learning layer. We apply GLDGCN to the semi-supervised node classification task. Compared with the baseline methods, we achieve higher classification accuracy on three citation networks (Citeseer, Cora, and Pubmed), and we also analyze and discuss the selection of hyperparameters and network depth. GLDGCN also performs well on the classic social network KarateClub and the new Wiki-CS dataset. Because our algorithm had insufficient ability to process large graphs during the experiments, we also introduce subgraph clustering and stochastic gradient descent methods into GCN and design a semi-supervised node classification algorithm based on the CLustering Graph Convolutional neural Network, which enables GCN to process large graphs and improves its application value. We complete semi-supervised node classification experiments on two classic large graphs, the PPI dataset (more than 50,000 nodes) and the Reddit dataset (more than 200,000 nodes), where the method also performs well.

Decentralized Finance enables many novel applications that were impossible in traditional finance. However, it also introduces new types of vulnerabilities, such as composability bugs. Composability bugs are issues that lead to erroneous behaviors when multiple smart contracts operate together. One typical example is the class of bugs between token contracts and Constant Product Market Makers (CPMM), the most widely used model for Decentralized Exchanges. Since 2022, 23 such exploits have resulted in a total loss of 2.2M USD. BlockSec, a smart contract auditing company, once reported that 138 such exploits occurred in February 2023 alone. We propose CPMM-Exploiter, which automatically detects and generates end-to-end exploits for CPMM composability bugs. Generating such end-to-end exploits is challenging due to the large search space of multiple contracts and the various fees involved in financial services. To tackle this, we investigated real-world exploits of these vulnerabilities and identified that they arise from violations of two safety invariants. Based on this observation, we implemented CPMM-Exploiter, a new grammar-based fuzzer targeting the detection of these bugs. CPMM-Exploiter uses fuzzing to find transactions that break the invariants. It then refines these transactions to make them profitable for the attacker. We evaluated CPMM-Exploiter on two real-world exploit datasets. CPMM-Exploiter obtained recalls of 0.91 and 0.89, respectively, while five baselines achieved maximum recalls of 0.36 and 0.58, respectively. We further evaluated CPMM-Exploiter by running it on the latest blocks of the Ethereum and Binance networks. It successfully generated 18 new exploits, which can result in 12.9K USD profit in total.
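For readers unfamiliar with the constant-product model, the sketch below shows a textbook CPMM swap and a check that the reserve product does not decrease. It is an illustration under stated assumptions, not CPMM-Exploiter itself. The abstract does not state the paper's two safety invariants, so `product_preserved` checks only the generic $x \cdot y \ge k$ property. The function names and the 30 basis-point fee are hypothetical choices.

```python
def swap_out(reserve_in: int, reserve_out: int, amount_in: int, fee_bps: int = 30) -> int:
    """Output amount for a constant-product swap with a fee taken on the input.

    Integer arithmetic with floor division, so rounding favours the pool.
    The default fee_bps = 30 (0.3%) is an assumed value.
    """
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amounts and reserves must be positive")
    amount_in_after_fee = amount_in * (10_000 - fee_bps)
    return (amount_in_after_fee * reserve_out) // (reserve_in * 10_000 + amount_in_after_fee)


def product_preserved(reserve_in: int, reserve_out: int, amount_in: int, amount_out: int) -> bool:
    """Generic CPMM check: the product of the reserves must not shrink after a swap."""
    before = reserve_in * reserve_out
    after = (reserve_in + amount_in) * (reserve_out - amount_out)
    return after >= before


if __name__ == "__main__":
    out = swap_out(1_000, 1_000, 100)
    print("amount out:", out)
    print("honest swap keeps product:", product_preserved(1_000, 1_000, 100, out))
    # A transfer that pays out more than the formula allows breaks the check.
    print("over-paying swap keeps product:", product_preserved(1_000, 1_000, 100, 95))
```

A fuzzer in this spirit would search for transaction sequences after which a check like `product_preserved` fails, for example because a token contract changes pool balances outside the swap path.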

In this paper, we present two novel methods in Network Intrusion Detection Systems (NIDS) using Graph Neural Networks (GNNs). The first approach, Scattering Transform with E-GraphSAGE (STEG), utilizes the scattering transform to conduct multi-resolution analysis of edge feature vectors. This provides a detailed representation that is essential for identifying subtle anomalies in network traffic. The second approach improves node representation by initiating with Node2Vec, diverging from standard methods of using uniform values, thereby capturing a more accurate and holistic network picture. Our methods have shown significant improvements in performance compared to existing state-of-the-art methods in benchmark NIDS datasets.

Text Classification is the most essential and fundamental problem in Natural Language Processing. While numerous recent text classification models applied the sequential deep learning technique, graph neural network-based models can directly deal with complex structured text data and exploit global information. Many real text classification applications can be naturally cast into a graph, which captures words, documents, and corpus global features. In this survey, we bring the coverage of methods up to 2023, including corpus-level and document-level graph neural networks. We discuss each of these methods in detail, dealing with the graph construction mechanisms and the graph-based learning process. In addition to the technical survey, we examine open issues and future directions for text classification using graph neural networks. We also cover datasets, evaluation metrics, and experiment design, and we present a summary of published performance on the publicly available benchmarks. Note that we present a comprehensive comparison between different techniques and identify the pros and cons of various evaluation metrics in this survey.

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
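The additive margin can be illustrated for a single sample as below. This is a minimal standard-library sketch, not the released AMSoftmax code. The function names are hypothetical, and the scale `s = 30.0` and margin `m = 0.35` are assumed defaults that the abstract does not state. The margin is subtracted from the target-class cosine before scaling, so the target must beat the other classes by at least `m` to reach a low loss.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, i.e. the dot product of the L2-normalised vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def am_softmax_loss(cosines: Sequence[float], target: int, s: float = 30.0, m: float = 0.35) -> float:
    """Additive-margin softmax loss for one sample.

    cosines[j] is the cosine between the normalised feature and the
    normalised weight vector of class j. s and m are assumed values.
    """
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over all classes.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    feature = [0.9, 0.1]
    class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    cos = [cosine(feature, w) for w in class_weights]
    print("loss without margin:", am_softmax_loss(cos, target=0, m=0.0))
    print("loss with margin:   ", am_softmax_loss(cos, target=0))
```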
