亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks, as well as difficulties due to the endoscopic device such as a limited field of view and challenging lighting conditions. Expert knowledge shaped by years of experience is required for localization within the human body during endoscopic procedures. In this work, we present a deep learning method based on anatomy recognition, that constructs a surgical path in an unsupervised manner from surgical videos, modelling relative location and variations due to different viewing angles. At inference time, the model can map an unseen video's frames on the path and estimate the viewing angle, aiming to provide guidance, for instance, to reach a particular destination. We test the method on a dataset consisting of surgical videos of transsphenoidal adenomectomies, as well as on a synthetic dataset. An online tool that lets researchers upload their surgical videos to obtain anatomy detections and the weights of the trained YOLOv7 model are available at: //surgicalvision.bmic.ethz.ch.

相關內容

The technique of equipping graphs with an equivalence relation, called equality saturation, has recently proved both powerful and practical in program optimisation, particularly for satisfiability modulo theory solvers. We give a categorical semantics to these structures, called e-graphs, in terms of Cartesian categories enriched over a semilattice. We show how this semantics can be generalised to monoidal categories, which opens the door to new applications of e-graph techniques, from algebraic to monoidal theories. Finally, we present a sound and complete combinatorial representation of morphisms in such a category, based on a generalisation of hypergraphs which we call e-hypergraphs. They have the usual advantage that many of their structural equations are absorbed into a general notion of isomorphism.

This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cram\'{e}r-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation.

Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at //github.com/bowen-upenn/MMMA_Rationality.

With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: //github.com/declare-lab/della.

Training recommendation systems (RecSys) faces several challenges as it requires the "data preprocessing" stage to preprocess an ample amount of raw data and feed them to the GPU for training in a seamless manner. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing which incurs substantial deployment cost and power consumption. Our characterization reveals that prior CPU-centric preprocessing is bottlenecked on feature generation and feature normalization operations as it fails to reap out the abundant inter-/intra-feature parallelism in RecSys preprocessing. PreSto is a storage-centric preprocessing system leveraging In-Storage Processing (ISP), which offloads the bottlenecked preprocessing operations to our ISP units. We show that PreSto outperforms the baseline CPU-centric system with a $9.6\times$ speedup in end-to-end preprocessing time, $4.3\times$ enhancement in cost-efficiency, and $11.3\times$ improvement in energyefficiency on average for production-scale RecSys preprocessing.

Diffusion models have emerged as a prominent class of generative models, surpassing previous methods regarding sample quality and training stability. Recent works have shown the advantages of diffusion models in improving reinforcement learning (RL) solutions, including as trajectory planners, expressive policy classes, data synthesizers, etc. This survey aims to provide an overview of the advancements in this emerging field and hopes to inspire new avenues of research. First, we examine several challenges encountered by current RL algorithms. Then, we present a taxonomy of existing methods based on the roles played by diffusion models in RL and explore how the existing challenges are addressed. We further outline successful applications of diffusion models in various RL-related tasks while discussing the limitations of current approaches. Finally, we conclude the survey and offer insights into future research directions, focusing on enhancing model performance and applying diffusion models to broader tasks. We are actively maintaining a GitHub repository for papers and other related resources in applying diffusion models in RL: //github.com/apexrl/Diff4RLSurvey .

This article presents the affordances that Generative Artificial Intelligence can have in disinformation context, one of the major threats to our digitalized society. We present a research framework to generate customized agent-based social networks for disinformation simulations that would enable understanding and evaluation of the phenomena whilst discussing open challenges.

More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed and many aspects of common sense remain untested. Consequently, we do not currently have any reliable way of measuring to what extent existing AI systems have achieved these abilities. This paper surveys the development and uses of AI commonsense benchmarks. We discuss the nature of common sense; the role of common sense in AI; the goals served by constructing commonsense benchmarks; and desirable features of commonsense benchmarks. We analyze the common flaws in benchmarks, and we argue that it is worthwhile to invest the work needed ensure that benchmark examples are consistently high quality. We survey the various methods of constructing commonsense benchmarks. We enumerate 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video based, and 7 simulated physical environments. We discuss the gaps in the existing benchmarks and aspects of commonsense reasoning that are not addressed in any existing benchmark. We conclude with a number of recommendations for future development of commonsense AI benchmarks.

Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulties of distinguishing various instances, we propose a kernel update strategy that enables each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previous published state-of-the-art single-model results of panoptic segmentation on MS COCO test-dev split and semantic segmentation on ADE20K val split with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO with 60%-90% faster inference speeds. Code and models will be released at //github.com/ZwwWayne/K-Net/.

We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs.We validate the utility ofMMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.

北京阿比特科技有限公司