
Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this work, we investigate whether hierarchical visual representations truly capture the human-perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDS subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka representations across training setups, we conclude that they do not capture hierarchy any better than standard representations, but they can assist in other aspects such as search efficiency and interpretability. Our benchmark and datasets are open-sourced at //github.com/ethanlshen/HierNet.
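For concreteness, here is a minimal sketch of the two representation families the benchmark evaluates: a hyperbolic (Poincaré ball) distance and Matryoshka-style nested prefixes. The dimensions and random points below are illustrative, not HierNet's actual setup.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball model of hyperbolic space."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

def matryoshka_prefixes(embedding, dims=(8, 16, 32, 64)):
    """Matryoshka-style nesting: each prefix of the vector is itself
    a usable lower-dimensional representation."""
    return {d: embedding[:d] for d in dims}

u = np.random.uniform(-0.1, 0.1, size=64)  # points must lie inside the unit ball
v = np.random.uniform(-0.1, 0.1, size=64)
print(poincare_distance(u, v), list(matryoshka_prefixes(u)))
```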

Related content

The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging because of the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-ranging object scales. To fully validate SAM's performance on medical data, we collected and sorted 53 open-source datasets and built a large medical segmentation dataset with 18 modalities, 84 objects, 125 object-modality paired targets, 1050K 2D images, and 6033K masks. We comprehensively analyzed different models and strategies on the so-called COSMOS 1050K dataset. Our findings mainly include the following: 1) SAM showed remarkable performance on some specific objects but was unstable, imperfect, or even failed totally in other situations. 2) SAM with the large ViT-H showed better overall performance than SAM with the small ViT-B. 3) SAM performed better with manual hints, especially box prompts, than in the Everything mode. 4) SAM could help human annotation with high labeling quality and in less time. 5) SAM was sensitive to randomness in the center point and tight box prompts, and may suffer a serious performance drop. 6) SAM performed better than interactive methods with one or a few points, but is outpaced as the number of points increases. 7) SAM's performance correlated with different factors, including boundary complexity, intensity differences, etc. 8) Fine-tuning SAM on specific medical tasks could improve its average DICE performance by 4.39% and 6.68% for ViT-B and ViT-H, respectively. We hope that this comprehensive report can help researchers explore the potential of SAM applications in MIS, and guide how to appropriately use and develop SAM.
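To illustrate the box-prompt mode that finding 3) favors, here is a minimal sketch using the public segment-anything API. The checkpoint filename is the publicly released ViT-H weights, and the all-zeros array is a stand-in for a medical slice rendered to RGB; neither reflects the paper's evaluation pipeline.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H variant; the checkpoint must be downloaded separately.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for an RGB-rendered slice
predictor.set_image(image)

# A tight box prompt (x0, y0, x1, y1): the prompt type that worked best.
masks, scores, _ = predictor.predict(
    box=np.array([100, 100, 300, 300]),
    multimask_output=False,
)
print(masks.shape, scores)
```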

Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications. However, existing methods face challenges in terms of their effectiveness and training efficiency, especially when it comes to training and integrating multiple retrievers to handle multi-modal queries. In this paper, we propose an end-to-end generative framework for multi-modal knowledge retrieval. Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases, even when trained with limited data. We retrieve knowledge via a two-step process: 1) generating knowledge clues related to the queries, and 2) obtaining the relevant documents by searching databases using the knowledge clues. In particular, we first introduce an object-aware prefix-tuning technique to guide multi-grained visual learning. Then, we align multi-grained visual features into the textual feature space of the LLM, employing the LLM to capture cross-modal interactions. Subsequently, we construct instruction data with a unified format for model training. Finally, we propose a knowledge-guided generation strategy to impose prior constraints on the decoding steps, thereby promoting the generation of distinctive knowledge clues. Through experiments conducted on three benchmarks, we demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
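The two-step process can be summarized in a short sketch. Everything here is a placeholder (the clue generator, the exact-match store, the toy documents), not the paper's implementation; real systems would use constrained decoding and fuzzy search over a document index.

```python
from typing import Callable, Dict

def retrieve(query: str, generate_clue: Callable[[str], str], kb: Dict[str, str]) -> str:
    """Two-step generative retrieval: 1) an LLM produces a short
    'knowledge clue' for the query; 2) the clue keys into a document store."""
    clue = generate_clue(query)            # step 1: clue generation
    return kb.get(clue, "<not found>")     # step 2: lookup by clue

# Toy usage with a stub generator and a one-document store.
kb = {"Eiffel Tower": "Doc 17: The Eiffel Tower is a landmark in Paris ..."}
print(retrieve("What landmark is in this photo?", lambda q: "Eiffel Tower", kb))
```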

Instruction-tuned large language models (LLMs) excel at many tasks and will even provide explanations for their behavior. Since these models are directly accessible to the public, there is a risk that convincing but wrong explanations can lead to unsupported confidence in LLMs. Therefore, interpretability-faithfulness of self-explanations is an important consideration for AI safety. Assessing the interpretability-faithfulness of these explanations, termed self-explanations, is challenging, as the models are too complex for humans to annotate what a correct explanation is. To address this, we propose employing self-consistency checks as a measure of faithfulness. For example, if an LLM says a set of words is important for making a prediction, then it should not be able to make the same prediction without these words. While self-consistency checks are a common approach to faithfulness, they have not previously been applied to LLMs' self-explanations. We apply self-consistency checks to three types of self-explanations: counterfactuals, importance measures, and redactions. Our work demonstrates that faithfulness is both task and model dependent; e.g., for sentiment classification, counterfactual explanations are more faithful for Llama2, importance measures for Mistral, and redaction for Falcon 40B. Finally, our findings are robust to prompt variations.
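The word-importance example in the abstract corresponds to a redaction-style check. Here is a minimal sketch under simplifying assumptions (word-level redaction, a toy classifier standing in for the LLM); the function and variable names are illustrative.

```python
def redaction_check(predict, text: str, important_words: set) -> bool:
    """If the model claimed these words were important, removing them
    should change the prediction; returns True when the explanation
    passes this self-consistency check."""
    redacted = " ".join(w for w in text.split() if w not in important_words)
    return predict(text) != predict(redacted)

# Toy usage: a fake sentiment 'model' keyed on a single word.
predict = lambda t: "positive" if "great" in t else "negative"
print(redaction_check(predict, "a great movie", {"great"}))  # True: faithful here
```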

The common use case of code smells assumes causality: identify a smell, remove it, and by doing so improve the code. We empirically investigate whether smells are fit for this use. We present a list of properties that code smells should have if they indeed cause lower quality. We evaluated the smells in 31,687 Java files from 677 GitHub repositories, covering all repositories with 200+ commits in 2019. We measured the influence of smells on four metrics for quality, productivity, and bug detection efficiency. Of the 151 code smells computed by the CheckStyle smell detector, fewer than 20% were found to be potentially causal, and only a handful are robust. The strongest smells deal with simplicity, defensive programming, and abstraction. Files without the potentially causal smells are 50% more likely to be of high quality. Unfortunately, most smells are not removed, and developers tend to remove the easy ones rather than the effective ones.

Recent advances in deep learning techniques have achieved remarkable performance on several computer vision problems. A notably intuitive technique called Curriculum Learning (CL) has recently been introduced for training deep learning models. Surprisingly, curriculum learning achieves significantly improved results on some tasks but marginal or no improvement on others. Hence, there is still a debate about its adoption as a standard method to train supervised learning models. In this work, we investigate the impact of curriculum learning on crowd counting using the density estimation method. We performed detailed investigations by conducting 112 experiments across six different CL settings and eight different crowd counting models. Our experiments show that curriculum learning improves model learning performance and shortens the convergence time.
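The core CL idea is easy to state in code: present easy samples first and grow the training pool on a pacing schedule. This sketch assumes a per-sample difficulty score is available (for crowd counting that could be, e.g., the ground-truth head count per image); the linear pacing function and names are illustrative, not one of the paper's six settings.

```python
def curriculum_batches(samples, difficulty, epochs):
    """Yield an easiest-first training pool that grows linearly per epoch."""
    ranked = sorted(samples, key=difficulty)
    for epoch in range(1, epochs + 1):
        cutoff = max(1, len(ranked) * epoch // epochs)  # linear pacing function
        yield ranked[:cutoff]                           # train on the easiest slice

for pool in curriculum_batches(list(range(10)), difficulty=lambda s: s, epochs=5):
    print(len(pool))  # 2, 4, 6, 8, 10
```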

Text-based knowledge graph completion (KGC) methods, which leverage textual entity descriptions, are at the research forefront. The efficacy of these models hinges on the quality of the textual data. This study explores whether enriched or more efficient textual descriptions can amplify model performance. Recently, Large Language Models (LLMs) have shown remarkable improvements on NLP tasks, attributed to their sophisticated text generation and conversational capabilities. LLMs assimilate linguistic patterns and integrate knowledge from their training data. Compared to traditional databases like Wikipedia, LLMs provide several advantages, facilitating broader information querying and content augmentation. We hypothesize that LLMs, without fine-tuning, can refine entity descriptions, serving as an auxiliary knowledge source. An in-depth analysis was conducted to verify this hypothesis. We found that (1) without fine-tuning, LLMs have the capability to further improve the quality of entity text descriptions. We validated this through experiments on the FB15K-237 and WN18RR datasets. (2) LLMs exhibit text generation hallucination issues and selectively output words with multiple meanings. This was mitigated by contextualizing prompts to constrain LLM outputs. (3) Larger model sizes do not necessarily guarantee better performance; even a 7B model can achieve optimized results on this comparative task. These findings underscore the untapped potential of large models in text-based KGC, which is a promising direction for further research in KGC. The code and datasets are accessible at //github.com/sjlmg/CP-KGC.
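Finding (2) hinges on contextualized prompting. A minimal sketch of that idea follows; the prompt wording is illustrative and `complete` is a placeholder for any chat-completion call, not the paper's actual prompt or API.

```python
def refine_description(entity: str, description: str, complete) -> str:
    """Ask an LLM to refine an entity description while the prompt's context
    pins down which sense of the entity name is meant (hallucination guard)."""
    prompt = (
        f"Entity: {entity}\n"
        f"Current description: {description}\n"
        "Rewrite the description so it stays factual about this entity, "
        "adds no unrelated senses of the name, and fits in two sentences."
    )
    return complete(prompt)

# Stub usage; a real `complete` would call an LLM.
print(refine_description("Mercury (planet)", "Mercury is the smallest planet.",
                         lambda p: "<refined description>"))
```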

The missing item problem, as introduced by Stoeckl in his work at SODA '23, focuses on continually identifying a missing element $e$ in a stream of elements $\{e_1, \dots, e_{\ell}\}$ from the set $\{1,2,\dots,n\}$, such that $e \neq e_i$ for any $i \in \{1,\dots,\ell\}$. Stoeckl's investigation primarily delves into scenarios with $\ell < n$, providing bounds for (i) the deterministic case, (ii) the static case, where the algorithm may be randomized but the stream is fixed in advance, and (iii) the adversarially robust case, where the algorithm is randomized and each stream element can be chosen depending on earlier algorithm outputs. Building upon this foundation, our paper addresses previously unexplored aspects of the missing item problem. In the first segment, we examine the static setting with a long stream, where the length of the stream $\ell$ is close to or even exceeds the size of the universe $n$. We present an algorithm demonstrating that even when $\ell$ is very close to $n$ (say $\ell = n-1$), polylog($n$) bits of memory suffice to identify the missing item. Additionally, we establish tight bounds of $\tilde{\Theta}(k)$ for the scenario of $\ell = n+k$. The second segment of our work focuses on the {\em adversarially robust setting}. We show a lower bound of $\Omega(\ell)$, up to polylog factors, for pseudo-deterministic error-zero algorithms (where the algorithm reports its errors). Based on Stoeckl's work, we also establish a lower bound for random-start (using randomness only at initialization) error-zero streaming algorithms. In the final segment, we explore streaming algorithms with randomness-on-the-fly, where the random bits that are saved for future use are included in the space cost. For streams of length $\ell = O(\sqrt{n})$, we provide an upper bound of $O(\log n)$. This establishes a gap between randomness-on-the-fly and random-start algorithms.
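As a tiny intuition for why few bits can suffice near $\ell = n-1$, consider the special case where the stream elements are distinct: a single $O(\log n)$-bit XOR accumulator recovers the missing item in one pass. The general static setting treated in the paper allows repeats and is substantially harder; this is only an illustration of the memory regime.

```python
def missing_item(stream, n):
    """Find the one element of {1,...,n} absent from a duplicate-free stream."""
    acc = 0
    for x in stream:            # one pass, only an O(log n)-bit accumulator
        acc ^= x
    for x in range(1, n + 1):   # fold in the full universe
        acc ^= x
    return acc                  # present items cancel in pairs; the missing one remains

print(missing_item([2, 4, 1, 5], n=5))  # -> 3
```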

Graph Neural Networks (GNNs) draw their strength from explicitly modeling the topological information of structured data. However, existing GNNs have limited capability in capturing hierarchical graph representations, which play an important role in graph classification. In this paper, we propose a hierarchical graph capsule network (HGCN) that can jointly learn node embeddings and extract graph hierarchies. Specifically, disentangled graph capsules are established by identifying heterogeneous factors underlying each node, such that their instantiation parameters represent different properties of the same entity. To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (parts) and higher-level capsules (wholes) by explicitly considering the structural information among the parts. Experimental studies demonstrate the effectiveness of HGCN and the contribution of each component.
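To make the "disentangled graph capsules" concrete, here is a minimal sketch that reshapes node embeddings into K small capsule vectors, each intended to capture one latent factor. The sizes are illustrative and this omits HGCN's routing between part and whole capsules.

```python
import torch

def to_capsules(node_embeddings: torch.Tensor, num_capsules: int) -> torch.Tensor:
    """Split each node embedding into `num_capsules` disentangled capsules,
    each a small vector for one latent factor of the node."""
    n, d = node_embeddings.shape
    assert d % num_capsules == 0
    return node_embeddings.view(n, num_capsules, d // num_capsules)

caps = to_capsules(torch.randn(100, 64), num_capsules=8)
print(caps.shape)  # torch.Size([100, 8, 8])
```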

Recommender systems play a fundamental role in web applications by filtering massive information and matching user interests. While many efforts have been devoted to developing more effective models for various scenarios, the exploration of explainability in recommender systems lags behind. Explanations could help improve user experience and discover system defects. In this paper, after formally introducing the elements that are related to model explainability, we propose a novel explainable recommendation model that improves the transparency of the representation learning process. Specifically, to overcome the representation entangling problem in traditional models, we revise traditional graph convolution to discriminate information from different layers. Also, each representation vector is factorized into several segments, where each segment relates to one semantic aspect of the data. Different from previous work, in our model, factor discovery and representation learning are conducted simultaneously, and we are able to handle extra attribute information and knowledge. In this way, the proposed model can learn interpretable and meaningful representations for users and items. Unlike traditional methods that need to make a trade-off between explainability and effectiveness, the performance of our proposed explainable model is not negatively affected after considering explainability. Finally, comprehensive experiments are conducted to validate the performance of our model as well as explanation faithfulness.
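A short sketch of the segment-factorized idea: splitting user and item vectors into named segments lets a dot-product score be attributed per semantic aspect. The aspect names, segment sizes, and random vectors are illustrative assumptions, not the paper's learned factors.

```python
import numpy as np

ASPECTS = ["genre", "price", "brand", "popularity"]  # assumed aspect names

def aspect_scores(user_vec, item_vec, seg_dim=4):
    """Attribute a recommendation score to per-aspect embedding segments."""
    scores = {}
    for i, aspect in enumerate(ASPECTS):
        s = slice(i * seg_dim, (i + 1) * seg_dim)
        scores[aspect] = float(user_vec[s] @ item_vec[s])  # per-aspect contribution
    return scores  # total score is the sum; each term explains one aspect

u, v = np.random.randn(16), np.random.randn(16)
print(aspect_scores(u, v))
```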

Deep learning applies multiple processing layers to learn representations of data with multiple levels of feature extraction. This emerging technique has reshaped the research landscape of face recognition since 2014, launched by the breakthroughs of the DeepFace and DeepID methods. Since then, deep face recognition (FR) techniques, which leverage hierarchical architectures to learn discriminative face representations, have dramatically improved the state-of-the-art performance and fostered numerous successful real-world applications. In this paper, we provide a comprehensive survey of the recent developments in deep FR, covering the broad topics of algorithms, data, and scenes. First, we summarize different network architectures and loss functions proposed in the rapid evolution of deep FR methods. Second, we categorize the related face processing methods into two classes: `one-to-many augmentation' and `many-to-one normalization'. Third, we summarize and compare the commonly used databases for both model training and evaluation. Fourth, we review miscellaneous scenes in deep FR, such as cross-factor, heterogeneous, multiple-media, and industry scenes. Finally, potential deficiencies of the current methods and several future directions are highlighted.
