
Autonomous racing rewards agents that react to opponents' behaviors with agile maneuvers that make progress along the track, while penalizing both over-aggressive and over-conservative agents. Understanding the intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments. Current approaches either oversimplify the discretization of the agents' action space or fail to recognize the long-term effects of actions and become myopic. Our work addresses these two challenges. First, we propose a novel dimension reduction method that encapsulates diverse agent behaviors while conserving the continuity of agent actions. Second, we formulate the two-agent racing game as a regret minimization problem and provide a solution for tractable counterfactual regret minimization with a regret prediction model. Finally, we validate our findings experimentally on scaled autonomous vehicles. We demonstrate that the proposed game-theoretic planner, which characterizes agents in the objective space, significantly improves the win rate against different opponents, and that the improvement transfers to unseen opponents in an unseen environment.
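To make the regret minimization idea concrete, the following is a minimal sketch of plain regret matching, the update rule that counterfactual regret minimization builds on, applied to a small two-player zero-sum matrix game. It is not the planner described in the abstract: the payoff matrix, the function names `regret_matching_strategy` and `solve_matrix_game`, and the iteration count are all illustrative assumptions, and there is no regret prediction model here.

```python
def regret_matching_strategy(cumulative_regret):
    """Map cumulative regrets to a mixed strategy (positive parts, normalized)."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        # No action has positive regret yet: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]


def solve_matrix_game(payoff, iterations=20000):
    """Self-play regret matching on a zero-sum matrix game (hypothetical helper).

    payoff[i][j] is the row player's payoff; the column player receives
    -payoff[i][j]. Returns the time-averaged strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * n_rows, [0.0] * n_cols
    sum_row, sum_col = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        s_row = regret_matching_strategy(regret_row)
        s_col = regret_matching_strategy(regret_col)

        # Expected payoff of each pure action against the opponent's current mix.
        u_row = [sum(payoff[i][j] * s_col[j] for j in range(n_cols))
                 for i in range(n_rows)]
        u_col = [sum(-payoff[i][j] * s_row[i] for i in range(n_rows))
                 for j in range(n_cols)]

        # Expected payoff of the strategy actually played.
        v_row = sum(s_row[i] * u_row[i] for i in range(n_rows))
        v_col = sum(s_col[j] * u_col[j] for j in range(n_cols))

        # Regret of an action: how much better it would have done than the mix.
        for i in range(n_rows):
            regret_row[i] += u_row[i] - v_row
            sum_row[i] += s_row[i]
        for j in range(n_cols):
            regret_col[j] += u_col[j] - v_col
            sum_col[j] += s_col[j]

    avg_row = [s / iterations for s in sum_row]
    avg_col = [s / iterations for s in sum_col]
    return avg_row, avg_col


if __name__ == "__main__":
    # Illustrative 2x2 game whose equilibrium puts probability 0.4 on the
    # first action for both players.
    row, col = solve_matrix_game([[2.0, -1.0], [-1.0, 1.0]])
    print(row, col)
```

In a zero-sum game the averaged strategies, not the final iterates, are what approach an equilibrium, which is why the sketch accumulates `sum_row` and `sum_col`.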

Related content

The Vietnamese labor market has been developing unevenly. The number of university graduates is growing, but so is the unemployment rate. This situation is often caused by the lack of accurate and timely labor market information, which leads to skill mismatches between the worker supply and actual market demand. To build a data monitoring and analytics platform for the labor market, one of the main challenges is automatically detecting occupational skills in labor-related data, such as resumes and job listings. Traditional approaches rely on existing taxonomies and/or large annotated datasets to build Named Entity Recognition (NER) models. They are expensive and require substantial manual effort. In this paper, we propose a practical methodology for skill detection in Vietnamese job listings. Rather than treating the task as NER, we treat it as a ranking problem. We propose a pipeline in which phrases are first extracted and ranked by semantic similarity to their contexts. We then apply a final classification step to detect skill phrases. We collected three datasets and conducted extensive experiments. The results demonstrate that our methodology achieves better performance than a NER model on scarce datasets.

Detection of out-of-distribution samples is one of the critical tasks for real-world applications of computer vision. The advancement of deep learning has enabled us to analyze real-world data that contain unexplained samples, accentuating the need to detect out-of-distribution instances more than before. GAN-based approaches have been widely used to address this problem due to their ability to perform distribution fitting; however, they suffer from training instability and mode collapse. We propose a simple yet efficient reconstruction-based method that avoids adding complexity to compensate for the limitations of GAN models while outperforming them. Unlike previous reconstruction-based works that utilize only the reconstruction error or only the generated samples, our proposed method incorporates both in the detection task. Our model, which we call "Connective Novelty Detection", has two subnetworks: an autoencoder and a binary classifier. The autoencoder learns the representation of the positive class by reconstructing its samples. Then, the model creates negative and connected positive examples using real and generated samples. Negative instances are generated by manipulating the real data, so their distribution is close to that of the positive class, which gives the classifier a more accurate boundary. To make the detection more robust to reconstruction error, connected positive samples are created by combining the real and generated samples. Finally, the binary classifier is trained using connected positive and negative examples. We demonstrate a considerable improvement in novelty detection over state-of-the-art methods on the MNIST and Caltech-256 datasets.

Vehicular edge computing (VEC) has become a promising paradigm for the development of emerging intelligent transportation systems. Nevertheless, limited resources and massive transmission demands pose great challenges to implementing vehicular applications with stringent deadline requirements. This work presents a non-orthogonal multiple access (NOMA) based architecture in VEC, where heterogeneous edge nodes cooperate for real-time task processing. We derive a vehicle-to-infrastructure (V2I) transmission model by considering both intra-edge and inter-edge interference and formulate a cooperative resource optimization (CRO) problem that jointly optimizes task offloading and resource allocation, aiming to maximize the service ratio. Further, we decompose the CRO into two subproblems, namely, task offloading and resource allocation. In particular, the task offloading subproblem is modeled as an exact potential game (EPG), and a multi-agent distributed distributional deep deterministic policy gradient (MAD4PG) is proposed to achieve the Nash equilibrium. The resource allocation subproblem is divided into two independent convex optimization problems, and an optimal solution is obtained using a gradient-based iterative method and the KKT conditions. Finally, we build a simulation model based on real-world vehicle trajectories and give a comprehensive performance evaluation, which demonstrates the superiority of the proposed solutions.

The Smart Grid (SG) is a cornerstone of modern society, providing the energy required to sustain billions of lives and thousands of industries. Unfortunately, as one of the most critical infrastructures of our world, the SG is an attractive target for attackers. The problem is aggravated by the increasing adoption of digitalisation, which further increases the SG's exposure to cyberthreats. Successful exploitation of such exposure can leave entire countries paralysed, which is an unacceptable -- but ultimately inescapable -- risk. This paper aims to mitigate this risk by elucidating the perspective of real practitioners on the cybersecurity of the SG. We interviewed 18 entities, operating in diverse countries in Europe and covering all domains of the SG -- from energy generation to its delivery. Our analysis highlights a stark contrast between (a) research and practice, but also between (b) public and private entities. For instance: some threats appear to be much less dangerous than what is claimed in related papers; some technological paradigms have dubious utility for practitioners, but are actively promoted by the literature; finally, practitioners may either under- or over-estimate their own cybersecurity capabilities. We derive four takeaways that enable future endeavours to improve the overall cybersecurity of the SG. We conjecture that most of the problems are due to improper communication between researchers, practitioners and regulatory bodies -- which, despite sharing a common goal, tend to neglect the viewpoint of the other 'spheres'.

We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions, and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines at runtime.

This paper models categorical data with two or multiple responses, focusing on the interactions between responses. We propose an efficient iterative procedure based on sufficient dimension reduction. We study the theoretical guarantees of the proposed method under the two- and multiple-response models, demonstrating that the proposed estimator is unique and that, with high probability, the proposed method recovers the oracle least squares estimators. For data analysis, we demonstrate that the proposed method is efficient in the multiple-response model and performs better than some existing methods built for multiple-response models. We apply this modeling and the proposed method to an adult dataset and a right heart catheterization dataset and obtain meaningful results.

Motion planning and control are crucial components of robotics applications. Here, spatio-temporal hard constraints like system dynamics and safety boundaries (e.g., obstacles in automated driving) restrict the robot's motions. Direct methods from optimal control solve a constrained optimization problem. However, in many applications finding a proper cost function is inherently difficult because partially conflicting objectives must be weighted against each other. On the other hand, Imitation Learning (IL) methods such as Behavior Cloning (BC) provide an intuitive framework for learning decision-making from offline demonstrations and constitute a promising avenue for planning and control in complex robot applications. Prior work primarily relied on soft-constraint approaches, which use additional auxiliary loss terms describing the constraints. However, catastrophic safety-critical failures might occur in out-of-distribution (OOD) scenarios. This work integrates the flexibility of IL with the hard constraint handling of optimal control. Our approach constitutes a general framework for constrained robotic motion planning and control using offline IL. Hard constraints are integrated into the learning problem in a differentiable manner, via explicit completion and gradient-based correction. Simulated experiments of mobile robot navigation and automated driving provide evidence for the performance of the proposed method.

A good estimate of action costs is key in task planning for human-robot collaboration. The duration of an action depends on the agents' capabilities and on the correlation between actions performed simultaneously by the human and the robot. This paper proposes an approach to learning action costs and the coupling between actions executed concurrently by humans and robots. We leverage the information from past executions to learn the average duration of each action and a synergy coefficient representing the effect of an action performed by the human on the duration of the action performed by the robot (and vice versa). We implement the proposed method in a simulated scenario where both agents can access the same area simultaneously. Safety measures require the robot to slow down when the human is close, which results in a bad synergy between tasks operating in the same area. We show that our approach can learn such bad couplings so that a task planner can leverage this information to find better plans.
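As a rough illustration of learning durations and a coupling term from past executions, the sketch below averages logged durations and forms a ratio-based synergy coefficient. The abstract does not define the coefficient, so everything here is an assumption: the log format, the function name `learn_costs`, the choice of solo executions as the baseline, and the definition of synergy as the ratio of concurrent to solo mean duration (values above 1 indicating a bad coupling).

```python
from collections import defaultdict
from statistics import mean


def learn_costs(executions):
    """Estimate mean robot action durations and a ratio-based synergy term.

    executions: iterable of (robot_action, human_action_or_None, duration).
    The synergy definition used here is an illustrative assumption, not the
    coefficient from the paper.
    """
    solo = defaultdict(list)    # robot action executed with no concurrent human action
    paired = defaultdict(list)  # (robot action, concurrent human action)

    for robot_action, human_action, duration in executions:
        if human_action is None:
            solo[robot_action].append(duration)
        else:
            paired[(robot_action, human_action)].append(duration)

    # Baseline cost: mean duration of each robot action when executed alone.
    avg_duration = {action: mean(d) for action, d in solo.items()}

    # Synergy: mean concurrent duration relative to the solo baseline.
    # Pairs without a solo baseline are skipped rather than guessed.
    synergy = {
        pair: mean(d) / avg_duration[pair[0]]
        for pair, d in paired.items()
        if pair[0] in avg_duration
    }
    return avg_duration, synergy


if __name__ == "__main__":
    # Hypothetical execution log: (robot action, human action, seconds).
    log = [
        ("pick", None, 4.0),
        ("pick", None, 6.0),
        ("pick", "assemble", 10.0),  # shared area, robot slows down
        ("pick", "inspect", 5.0),    # separate area, no slowdown
    ]
    print(learn_costs(log))
```

A task planner could then scale each robot action's baseline cost by the synergy value of whatever human action is scheduled alongside it.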

In the past few decades, artificial intelligence (AI) technology has developed swiftly, changing everyone's daily life and profoundly altering the course of human society. The intention of developing AI is to benefit humans by reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, such as making unreliable decisions in safety-critical scenarios or undermining fairness by inadvertently discriminating against one group. Thus, trustworthy AI, which requires careful consideration to avoid the adverse effects that AI may bring to humans so that humans can fully trust and live in harmony with AI technologies, has attracted immense attention recently. Recent years have witnessed a tremendous amount of research on trustworthy AI. In this paper, we present a comprehensive survey of trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving trustworthy AI. Trustworthy AI is a large and complex area, involving various dimensions. In this work, we focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the accordant and conflicting interactions among different dimensions and discuss potential aspects for trustworthy AI to investigate in the future.

The Q-learning algorithm is known to be affected by the maximization bias, i.e., the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN, which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
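The maximization bias can be seen in a small simulation, sketched below under stated assumptions: every action has true expected value 0, rewards are unit-variance Gaussian noise, and the number of actions, samples, and trials are arbitrary illustrative choices. The single estimator takes the maximum of the sample means; the double estimator picks the best action on one sample set and evaluates it on an independent one. The function names are hypothetical, and this does not implement the self-correcting estimator from the abstract.

```python
import random
from statistics import mean


def sample_means(rng, n_actions, n_samples):
    """Sample mean reward per action; every true mean is 0 (assumed setup)."""
    return [mean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
            for _ in range(n_actions)]


def single_estimator(means_a):
    """Max of noisy sample means, as in the conventional Q-learning target."""
    return max(means_a)


def double_estimator(means_a, means_b):
    """Select the action on set A, evaluate it on independent set B."""
    best = max(range(len(means_a)), key=lambda i: means_a[i])
    return means_b[best]


def estimate_bias(n_actions=10, n_samples=10, trials=2000, seed=0):
    """Average both estimators over many trials; the true maximum is 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        means_a = sample_means(rng, n_actions, n_samples)
        means_b = sample_means(rng, n_actions, n_samples)
        single.append(single_estimator(means_a))
        double.append(double_estimator(means_a, means_b))
    return mean(single), mean(double)


if __name__ == "__main__":
    single_bias, double_bias = estimate_bias()
    print(f"single estimator bias: {single_bias:.3f}")
    print(f"double estimator bias: {double_bias:.3f}")
```

In this setup the single estimator comes out clearly positive even though no action is actually better than the others, while the double estimator stays near zero. With equal true means the double estimator is unbiased; the underestimation mentioned in the abstract arises when the true means differ and selection on set A sometimes picks a suboptimal action.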
