
Click-based interactive segmentation aims to generate target masks from human clicks, which facilitates efficient pixel-level annotation and image editing. In this task, target ambiguity remains a problem that hinders the accuracy and efficiency of segmentation. That is, in scenes with rich context, one click may correspond to multiple potential targets, while most previous interactive segmentors generate only a single mask and fail to deal with target ambiguity. In this paper, we propose a novel interactive segmentation network named PiClick, which yields all potentially reasonable masks and suggests the most plausible one to the user. Specifically, PiClick utilizes a Transformer-based architecture to generate all potential target masks by mutually interactive mask queries. Moreover, a Target Reasoning module is designed in PiClick to automatically suggest the user-desired mask from all candidates, relieving target ambiguity and extra human effort. Extensive experiments on 9 interactive segmentation datasets demonstrate that PiClick performs favorably against previous state-of-the-art methods in terms of segmentation results. Moreover, we show that PiClick effectively reduces the human effort required to annotate and pick the desired masks. To ease usage and inspire future research, we release the source code of PiClick together with a plug-and-play annotation tool at //github.com/cilinyan/PiClick.

Related content

October 13, 2023

Language models are achieving impressive performance on various tasks by aggressively adopting inference-time prompting techniques, such as zero-shot and few-shot prompting. In this work, we introduce EchoPrompt, a simple yet effective approach that prompts the model to rephrase its queries before answering them. EchoPrompt is adapted for both zero-shot and few-shot in-context learning with standard and chain-of-thought prompting. Experimental results show that EchoPrompt yields substantial improvements across all these settings for four families of causal language models. These improvements are observed across various numerical reasoning (e.g. GSM8K, SVAMP), reading comprehension (e.g. DROP), and logical reasoning (e.g. Coin Flipping) tasks. On average, EchoPrompt improves the Zero-shot-CoT performance of code-davinci-002 by 5% in numerical tasks and 13% in reading comprehension tasks. We investigate the factors contributing to EchoPrompt's effectiveness through ablation studies, which reveal that both the original query and the model-generated rephrased version are instrumental in its performance gains. Our empirical results indicate that EchoPrompt is an effective technique that enhances in-context learning performance. We recommend incorporating EchoPrompt into various baseline prompting strategies to achieve performance boosts.
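
The rephrase-then-answer idea can be illustrated with a small prompt builder. This is a minimal sketch, not the authors' code: the function name `build_zero_shot_prompt` and the exact wording of both trigger phrases are assumptions made for illustration, since the abstract does not quote the prompts it uses.

```python
# Hypothetical trigger phrases; the abstract does not state the exact wording.
BASELINE_TRIGGER = "Let's think step by step."
ECHO_TRIGGER = "Let's repeat the question and also think step by step."


def build_zero_shot_prompt(question: str, echo: bool = True) -> str:
    """Build a zero-shot chain-of-thought prompt.

    With echo=True the model is asked to restate the query before
    reasoning about it; with echo=False the plain trigger is used.
    """
    trigger = ECHO_TRIGGER if echo else BASELINE_TRIGGER
    return f"Q: {question}\nA: {trigger}"


if __name__ == "__main__":
    print(build_zero_shot_prompt("A coin is heads up. Ann flips it. Is it still heads up?"))
```

The only difference between the two variants is the trigger appended after "A:", which makes it straightforward to compare them on the same questions.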

We investigate a family of approximate multi-step proximal point methods, accelerated by implicit linear discretizations of gradient flow. The resulting methods are multi-step proximal point methods whose per-update computational cost is similar to that of the proximal point method. We explore several optimization methods where applying an approximate multi-step proximal point method results in improved convergence behavior. We argue that this is the result of lower truncation error in approximating gradient flow.
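
The link between proximal point updates and gradient flow can be made concrete on a one-dimensional quadratic. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the abstract does not name a specific multi-step scheme, so the two-step backward differentiation formula (BDF2) is used here as one example of an implicit linear multi-step discretization, and the test function f(x) = 0.5 * a * x**2 is chosen only because its proximal step has a closed form.

```python
import math


def proximal_point_step(x, h, a):
    """Proximal point step for f(x) = 0.5 * a * x**2.

    Equivalent to implicit Euler on the gradient flow x' = -a * x:
    solve x_new = x - h * a * x_new for x_new.
    """
    return x / (1.0 + h * a)


def bdf2_step(x, x_prev, h, a):
    """Two-step implicit update (BDF2, an assumed example scheme).

    Solves 1.5 * x_new - 2 * x + 0.5 * x_prev = -h * a * x_new.
    This is again a single proximal step, with step size 2h/3,
    applied to the combination (4 * x - x_prev) / 3.
    """
    return (2.0 * x - 0.5 * x_prev) / (1.5 + h * a)


def run(method, x0, h, a, steps):
    """Iterate the chosen method; both start with one implicit Euler step."""
    x_prev, x = x0, proximal_point_step(x0, h, a)
    for _ in range(steps - 1):
        if method == "bdf2":
            x_prev, x = x, bdf2_step(x, x_prev, h, a)
        else:
            x_prev, x = x, proximal_point_step(x, h, a)
    return x


def flow_error(method, x0=1.0, h=0.1, a=1.0, steps=10):
    """Distance to the exact gradient-flow solution x0 * exp(-a * t)."""
    exact = x0 * math.exp(-a * h * steps)
    return abs(run(method, x0, h, a, steps) - exact)
```

Comparing `flow_error("bdf2")` with `flow_error("prox")` shows the two-step update tracking the continuous flow more closely at the same step size, while each update still costs one proximal evaluation.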

Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity text or images in various tasks. In contrast, current speech generative models still struggle with speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that supports multiple speech generation tasks, generating expressive and high-fidelity speech. Specifically, we propose a novel speech codec based on speech vectors and semantic tokens. Speech vectors contain acoustic details contributing to high-fidelity speech reconstruction, while semantic tokens focus on the linguistic content of speech, facilitating language modeling. Based on the proposed speech codec, Vec-Tok Speech leverages an LM to undertake the core of speech generation. Moreover, Byte-Pair Encoding (BPE) is introduced to reduce the token length and bit rate for lower exposure bias and longer context coverage, improving the performance of LMs. Vec-Tok Speech can be used for intra- and cross-lingual zero-shot voice conversion (VC), zero-shot speaking style transfer text-to-speech (TTS), speech-to-speech translation (S2ST), speech denoising, and speaker de-identification and anonymization. Experiments show that Vec-Tok Speech, built on 50k hours of speech, performs better than other SOTA models. Code will be available at //github.com/BakerBunker/VecTok .
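
How BPE shortens a token sequence can be seen on a toy example. The sketch below applies generic byte-pair merging to a list of integer token ids; it is not the Vec-Tok Speech implementation, and the function names, the toy vocabulary size, and the example sequence are all assumptions for illustration.

```python
from collections import Counter


def merge_most_frequent_pair(seq, new_id):
    """Replace the most frequent adjacent pair in seq with new_id.

    Returns (new_sequence, merged_pair), or (seq, None) when no pair
    occurs at least twice.
    """
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return seq, None
    (a, b), count = pairs.most_common(1)[0]
    if count < 2:
        return seq, None
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, (a, b)


def bpe_compress(seq, vocab_size, num_merges):
    """Apply up to num_merges merges; new ids start at vocab_size."""
    seq, merges = list(seq), {}
    for k in range(num_merges):
        seq, pair = merge_most_frequent_pair(seq, vocab_size + k)
        if pair is None:
            break
        merges[vocab_size + k] = pair
    return seq, merges


def bpe_expand(seq, merges):
    """Invert bpe_compress by expanding merged ids, newest first."""
    for new_id in sorted(merges, reverse=True):
        a, b = merges[new_id]
        seq = [t for tok in seq for t in ((a, b) if tok == new_id else (tok,))]
    return seq
```

For the toy sequence `[3, 7, 3, 7, 3, 7, 5]` with a base vocabulary of 8 ids, two merges reduce the length from 7 to 3, and `bpe_expand` recovers the original tokens, so the shortening is lossless.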

Two-scale topology optimization, combined with the design of microstructure families with a broad range of effective material parameters, is increasingly used in fabrication applications to achieve a target deformation behavior for a variety of objects. The main idea of this approach is to optimize the distribution of material properties in the object partitioned into relatively coarse cells, and then replace each cell with microstructure geometry that mimics these material properties. In this paper, we focus on adapting this approach to complex shapes in situations where preserving the shape's surface is important. Our approach extends any regular (i.e. defined on a regular lattice grid) microstructure family to complex shapes by enriching it with individually optimized cut-cell tiles adapted to the geometry of the cut-cell. We propose an automated and robust pipeline based on this approach, and we show that the performance of the regular microstructure family is only minimally affected by our extension while allowing its use on 2D and 3D shapes of high complexity.

We give an adequate, concrete, categorical model for Lambda-S, which is a typed version of a linear-algebraic lambda calculus, extended with measurements. Lambda-S is an extension of first-order lambda calculus unifying two approaches to non-cloning in quantum lambda-calculi: forbidding duplication of variables, and considering all lambda-terms as algebraic linear functions. The type system of Lambda-S has a superposition constructor S such that a type A is considered as the base of a vector space while SA is its span. Our model considers S as the composition of two functors in an adjunction relation between the category of sets and the category of vector spaces over C. The right adjoint is a forgetful functor U, which is hidden in the language, and plays a central role in the computational reasoning.
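
The adjunction described above can be written out as follows. This is a sketch under assumptions: the abstract names only the right adjoint U, so the left adjoint is taken here to be the free-vector-space functor F (which sends a set to the space it spans), and the order of composition S = U ∘ F is assumed rather than stated.

```latex
\[
F : \mathbf{Set} \to \mathbf{Vec}_{\mathbb{C}}, \qquad
U : \mathbf{Vec}_{\mathbb{C}} \to \mathbf{Set}, \qquad
F \dashv U, \qquad
S = U \circ F .
\]
\[
\mathrm{Hom}_{\mathbf{Vec}_{\mathbb{C}}}(F A,\, V) \;\cong\; \mathrm{Hom}_{\mathbf{Set}}(A,\, U V)
\]
```

The second line is the defining property of the adjunction: a linear map out of the span of A is determined by an ordinary function on the base A, which matches the reading of A as a basis and SA as its span.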

Device fingerprinting extracts fingerprints from the hardware characteristics of a device in order to identify it. The primary goal of device fingerprinting is to identify a device accurately and uniquely, which requires the generated fingerprints to be stable enough for long-term tracking of the target device. However, the fingerprints generated by some existing fingerprinting technologies are not stable enough or change frequently, making it impossible to track the target device for a long time. In this paper, we present FPHammer, a novel DRAM-based fingerprinting technique. The device fingerprint generated by our technique is highly stable and can be used to track the device for a long time. We leverage the Rowhammer technique to repeatedly and quickly access a row in DRAM to induce bit flips in its adjacent row. We then construct a physical fingerprint of the device based on the locations of the collected bit flips. The evaluation of the uniqueness and reliability of the physical fingerprint shows that it can be used to distinguish devices with the same hardware and software configuration. The experimental results on device identification demonstrate that the physical fingerprints produced by our technique are linked to the device as a whole rather than just the DRAM module. Even if the device modifies software-level parameters such as the MAC address and IP address, or even reinstalls the operating system, we can accurately identify the target device. This demonstrates that FPHammer can generate stable fingerprints that are not affected by software-layer parameters.
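
Building a fingerprint from bit-flip locations and comparing two fingerprints can be sketched as follows. This is an illustrative assumption, not FPHammer's actual procedure: the abstract does not say how locations are encoded or how fingerprints are matched, so the (row, bit offset) encoding, the Jaccard similarity, and the 0.5 threshold are all hypothetical choices.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical encoding of one bit flip: (DRAM row index, bit offset in row).
FlipLocation = Tuple[int, int]


def make_fingerprint(flips: Iterable[FlipLocation]) -> FrozenSet[FlipLocation]:
    """Collapse observed bit flips into a set of distinct locations."""
    return frozenset(flips)


def jaccard(a: FrozenSet[FlipLocation], b: FrozenSet[FlipLocation]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def same_device(a, b, threshold: float = 0.5) -> bool:
    """Assumed decision rule: match when similarity reaches the threshold."""
    return jaccard(a, b) >= threshold
```

A set-based comparison tolerates a few flips appearing or disappearing between measurements, which is one simple way to express the stability requirement; a real system would need a threshold calibrated on measured data.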

Suitable discretizations through tensor product formulas of popular multidimensional operators (diffusion--advection, for instance) lead to matrices with $d$-dimensional Kronecker sum structure. For evolutionary PDEs containing such operators and integrated in time with exponential integrators, it is of paramount importance to efficiently approximate actions of $\varphi$-functions of this kind of matrix. In this work, we show how to produce directional split approximations of third order with respect to the time step size. They conveniently employ tensor-matrix products (realized with high-performance level 3 BLAS) and allow for the effective practical use of exponential integrators up to order three. The approach has been successfully tested against state-of-the-art techniques on two well-known physical models, namely FitzHugh--Nagumo and Schnakenberg.
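
The reason Kronecker sum structure pairs well with tensor-matrix products can be checked on a small two-dimensional case. The sketch below is not the paper's directional splitting of $\varphi$-functions; it only verifies, in plain Python with hypothetical helper names, the standard identity that the action of $I \otimes A_1 + A_2 \otimes I$ on the column-major vectorization of $U$ equals the vectorization of $A_1 U + U A_2^{T}$, so the large matrix never has to be formed.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def vec(U):
    """Column-major vectorization of U."""
    return [U[i][j] for j in range(len(U[0])) for i in range(len(U))]


def kron_sum_dense_action(A1, A2, U):
    """Form K = I (x) A1 + A2 (x) I explicitly and apply it to vec(U)."""
    n1, n2 = len(A1), len(A2)
    K1, K2 = kron(identity(n2), A1), kron(A2, identity(n1))
    K = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(K1, K2)]
    u = vec(U)
    return [sum(k * x for k, x in zip(row, u)) for row in K]


def kron_sum_tensor_action(A1, A2, U):
    """Same action using only small matrix products: A1 U + U A2^T."""
    left, right = matmul(A1, U), matmul(U, transpose(A2))
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]
```

For an `n1 x n2` array `U`, the dense route stores an `(n1*n2) x (n1*n2)` matrix, while the tensor route needs only the two small factors and two matrix-matrix products, which is the kind of operation level 3 BLAS accelerates.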

Inspired by biological motion generation, central pattern generators (CPGs) are frequently employed in legged robot locomotion control to produce natural gait patterns with low-dimensional control signals. However, their limited adaptability and stability over complex terrains hinder their application. To address this issue, this paper proposes a terrain-adaptive locomotion control method that incorporates a deep reinforcement learning (DRL) framework into the CPG: the CPG model is responsible for generating synchronized signals that provide the basic locomotion gait, while DRL enhances the adaptability of the robot to uneven terrains by adjusting the parameters of the CPG mapping functions. Experiments conducted on a hexapod robot in the Isaac Gym simulation environment demonstrated the superiority of the proposed method in terrain adaptability, convergence rate, and reward design complexity.
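
The division of labor between the CPG and the learned policy can be sketched with a toy signal generator. Everything specific here is an assumption for illustration: the abstract does not give the oscillator model, the gait, or the form of the mapping functions, so the sketch uses six phase signals with a tripod phase pattern and a sinusoidal mapping whose `amplitude` and `bias` stand in for the parameters a DRL policy would adjust.

```python
import math

# Assumed tripod gait: legs 0, 2, 4 move in phase; legs 1, 3, 5 are half a cycle apart.
TRIPOD_OFFSETS = [0.0, math.pi, 0.0, math.pi, 0.0, math.pi]


def cpg_phases(t, frequency_hz, offsets=TRIPOD_OFFSETS):
    """Synchronized phase signal for each leg at time t (radians in [0, 2*pi))."""
    base = 2.0 * math.pi * frequency_hz * t
    return [(base + off) % (2.0 * math.pi) for off in offsets]


def map_to_joint_angles(phases, amplitude, bias):
    """Hypothetical mapping function from phases to joint angle targets.

    In the setting described above, a learned policy would output
    amplitude and bias (possibly per leg) depending on the terrain.
    """
    return [amplitude * math.sin(p) + bias for p in phases]


if __name__ == "__main__":
    angles = map_to_joint_angles(cpg_phases(0.3, 1.0), amplitude=0.4, bias=0.0)
    print([round(a, 3) for a in angles])
```

Keeping the rhythm in the CPG and exposing only a few mapping parameters gives the policy a low-dimensional action space, which is consistent with the motivation stated in the abstract.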

Image captioning is a challenging task involving generating a textual description for an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using a GRU-based attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract features from the image and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder to enable learning to focus on specific image parts. We evaluate our approach using the MSCOCO and Flickr30k datasets and show that it achieves competitive scores compared to state-of-the-art methods. Our proposed framework can bridge the gap between computer vision and natural language and can be extended to specific domains.
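
The attention step that lets the decoder focus on specific image parts can be sketched in plain Python. This is the generic additive (Bahdanau) attention formula, score_i = v . tanh(W_feat h_i + W_state s), followed by a softmax and a weighted sum; it is not the paper's implementation, and the parameter names and the tiny dimensions used in the example are assumptions.

```python
import math


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, state, W_feat, W_state, v):
    """Additive attention over image region features.

    features: list of region feature vectors h_i from the CNN encoder
    state:    current decoder (GRU) hidden state s
    Returns (context_vector, attention_weights).
    """
    proj_state = matvec(W_state, state)
    scores = []
    for h in features:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_feat, h), proj_state)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * h[d] for w, h in zip(weights, features)) for d in range(dim)]
    return context, weights
```

At each decoding step the context vector would be combined with the previous word embedding as input to the GRU, so the weights indicate which image regions inform the next word.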

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It still remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
