亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Text reuse is a methodological element of fundamental importance in humanities research: pieces of text that re-appear across different documents, verbatim or paraphrased, provide invaluable information about the historical spread and evolution of ideas. Large modern digitized corpora enable the joint analysis of text collections that span entire centuries and the detection of large-scale patterns, impossible to detect with traditional small-scale analysis. For this opportunity to materialize, it is necessary to develop efficient data science systems that perform the corresponding analysis tasks. In this paper, we share insights from ReceptionReader, a system for analyzing text reuse in large historical corpora. The system is built upon billions of instances of text reuses from large digitized corpora of 18th-century texts. Its main functionality is to perform downstream text reuse analysis tasks, such as finding reuses that stem from a given article or identifying the most reused quotes from a set of documents, with each task expressed as a database query. For the purposes of the paper, we discuss the related design choices including various database normalization levels and query execution frameworks, such as distributed data processing (Apache Spark), indexed row store engine (MariaDB Aria), and compressed column store engine (MariaDB Columnstore). Moreover, we present an extensive evaluation with various metrics of interest (latency, storage size, and computing costs) for varying workloads, and we offer insights from the trade-offs we observed and the choices that emerged as optimal in our setting. In summary, our results show that (1) for the workloads that are most relevant to text-reuse analysis, the MariaDB Aria framework emerges as the overall optimal choice, (2) big data processing (Apache Spark) is irreplaceable for all processing stages of the system's pipeline.

相關內容

The human cognitive system exhibits remarkable flexibility and generalization capabilities, partly due to its ability to form low-dimensional, compositional representations of the environment. In contrast, standard neural network architectures often struggle with abstract reasoning tasks, overfitting, and requiring extensive data for training. This paper investigates the impact of the relational bottleneck -- a mechanism that focuses processing on relations among inputs -- on the learning of factorized representations conducive to compositional coding and the attendant flexibility of processing. We demonstrate that such a bottleneck not only improves generalization and learning efficiency, but also aligns network performance with human-like behavioral biases. Networks trained with the relational bottleneck developed orthogonal representations of feature dimensions latent in the dataset, reflecting the factorized structure thought to underlie human cognitive flexibility. Moreover, the relational network mimics human biases towards regularity without pre-specified symbolic primitives, suggesting that the bottleneck fosters the emergence of abstract representations that confer flexibility akin to symbols.

We present a generalized distance metric that can be used to implement routing strategies and identify routing table entries to reach the root node for a given key, in a DHT (Distributed Hash Table) network based on either Chord, Kademlia, Tapestry, or Pastry. The generalization shows that all the above four DHT algorithms are in fact, the same algorithm but with different parameters in distance representation. We also proposes that nodes can have routing tables of varying sizes based on their memory capabilities but with the fact that each node must have at least two entries, one for the node closest from it, and the other for the node from whom it is closest in each ring components for all the algorithms. Messages will always reach the correct root nodes by following the above rule. We also further observe that in any network, if the distance metric to define the root node in the DHT is same at all the nodes, then the root node for a key will also be the same, irrespective of the size of the routing table at different nodes.

Computing the core decomposition of a graph is a fundamental problem that has recently been studied in the differentially private setting, motivated by practical applications in data mining. In particular, Dhulipala et al. [FOCS 2022] gave the first mechanism for approximate core decomposition in the challenging and practically relevant setting of local differential privacy. One of the main open problems left by their work is whether the accuracy, i.e., the approximation ratio and additive error, of their mechanism can be improved. We show the first lower bounds on the additive error of approximate and exact core decomposition mechanisms in the centralized and local model of differential privacy, respectively. We also give mechanisms for exact and approximate core decomposition in the local model, with almost matching additive error bounds. Our mechanisms are based on a black-box application of continual counting. They also yield improved mechanisms for the approximate densest subgraph problem in the local model.

Elliptic reconstruction property, originally introduced by Makridakis and Nochetto for linear parabolic problems, is a well-known tool to derive optimal a posteriori error estimates. No such results are known for nonlinear and nonsmooth problems such as parabolic variational inequalities (VIs). This article establishes the elliptic reconstruction property for parabolic VIs and derives a posteriori error estimates in $L^{\infty}(0,T;L^{2}(\Omega))$ and $L^{\infty}(0,T;L^{\infty}(\Omega))$, respectively. As an application, the residual-type error estimates are presented.

Uncertainty quantification (UQ) to detect samples with large expected errors (outliers) is applied to reactive molecular potential energy surfaces (PESs). Three methods - Ensembles, Deep Evidential Regression (DER), and Gaussian Mixture Models (GMM) - were applied to the H-transfer reaction between ${\it syn-}$Criegee and vinyl hydroxyperoxide. The results indicate that ensemble models provide the best results for detecting outliers, followed by GMM. For example, from a pool of 1000 structures with the largest uncertainty, the detection quality for outliers is $\sim 90$ \% and $\sim 50$ \%, respectively, if 25 or 1000 structures with large errors are sought. On the contrary, the limitations of the statistical assumptions of DER greatly impacted its prediction capabilities. Finally, a structure-based indicator was found to be correlated with large average error, which may help to rapidly classify new structures into those that provide an advantage for refining the neural network.

Medical image segmentation aims to delineate the anatomical or pathological structures of interest, playing a crucial role in clinical diagnosis. A substantial amount of high-quality annotated data is crucial for constructing high-precision deep segmentation models. However, medical annotation is highly cumbersome and time-consuming, especially for medical videos or 3D volumes, due to the huge labeling space and poor inter-frame consistency. Recently, a fundamental task named Moving Object Segmentation (MOS) has made significant advancements in natural images. Its objective is to delineate moving objects from the background within image sequences, requiring only minimal annotations. In this paper, we propose the first foundation model, named iMOS, for MOS in medical images. Extensive experiments on a large multi-modal medical dataset validate the effectiveness of the proposed iMOS. Specifically, with the annotation of only a small number of images in the sequence, iMOS can achieve satisfactory tracking and segmentation performance of moving objects throughout the entire sequence in bi-directions. We hope that the proposed iMOS can help accelerate the annotation speed of experts, and boost the development of medical foundation models.

Numerical solution of discrete PDEs corresponding to saddle point problems is highly relevant to physical systems such as Stokes flow. However, scaling up numerical solvers for such systems is often met with challenges in efficiency and convergence. Multigrid is an approach with excellent applicability to elliptic problems such as the Stokes equations, and can be a solution to such challenges of scalability and efficiency. The degree of success of such methods, however, is highly contingent on the design of key components of a multigrid scheme, including the hierarchy of discretizations, and the relaxation scheme used. Additionally, in many practical cases, it may be more effective to use a multigrid scheme as a preconditioner to an iterative Krylov subspace solver, as opposed to striving for maximum efficacy of the relaxation scheme in all foreseeable settings. In this paper, we propose an efficient symmetric multigrid preconditioner for the Stokes Equations on a staggered finite-difference discretization. Our contribution is focused on crafting a preconditioner that (a) is symmetric indefinite, matching the property of the Stokes system itself, (b) is appropriate for preconditioning the SQMR iterative scheme, and (c) has the requisite symmetry properties to be used in this context. In addition, our design is efficient in terms of computational cost and facilitates scaling to large domains.

Transformer-based Single Image Deraining (SID) methods have achieved remarkable success, primarily attributed to their robust capability in capturing long-range interactions. However, we've noticed that current methods handle rain-affected and unaffected regions concurrently, overlooking the disparities between these areas, resulting in confusion between rain streaks and background parts, and inabilities to obtain effective interactions, ultimately resulting in suboptimal deraining outcomes. To address the above issue, we introduce the Region Transformer (Regformer), a novel SID method that underlines the importance of independently processing rain-affected and unaffected regions while considering their combined impact for high-quality image reconstruction. The crux of our method is the innovative Region Transformer Block (RTB), which integrates a Region Masked Attention (RMA) mechanism and a Mixed Gate Forward Block (MGFB). Our RTB is used for attention selection of rain-affected and unaffected regions and local modeling of mixed scales. The RMA generates attention maps tailored to these two regions and their interactions, enabling our model to capture comprehensive features essential for rain removal. To better recover high-frequency textures and capture more local details, we develop the MGFB as a compensation module to complete local mixed scale modeling. Extensive experiments demonstrate that our model reaches state-of-the-art performance, significantly improving the image deraining quality. Our code and trained models are publicly available.

We examine the possibility of approximating Maximum Vertex-Disjoint Shortest Paths. In this problem, the input is an edge-weighted (directed or undirected) $n$-vertex graph $G$ along with $k$ terminal pairs $(s_1,t_1),(s_2,t_2),\ldots,(s_k,t_k)$. The task is to connect as many terminal pairs as possible by pairwise vertex-disjoint paths such that each path is a shortest path between the respective terminals. Our work is anchored in the recent breakthrough by Lochet [SODA '21], which demonstrates the polynomial-time solvability of the problem for a fixed value of $k$. Lochet's result implies the existence of a polynomial-time $ck$-approximation for Maximum Vertex-Disjoint Shortest Paths, where $c \leq 1$ is a constant. Our first result suggests that this approximation algorithm is, in a sense, the best we can hope for. More precisely, assuming the gap-ETH, we exclude the existence of an $o(k)$-approximations within $f(k) \cdot $poly($n$) time for any function $f$ that only depends on $k$. Our second result demonstrates the infeasibility of achieving an approximation ratio of $n^{\frac{1}{2}-\varepsilon}$ in polynomial time, unless P = NP. It is not difficult to show that a greedy algorithm selecting a path with the minimum number of arcs results in a $\lceil\sqrt{\ell}\rceil$-approximation, where $\ell$ is the number of edges in all the paths of an optimal solution. Since $\ell \leq n$, this underscores the tightness of the $n^{\frac{1}{2}-\varepsilon}$-inapproximability bound. Additionally, we establish that Maximum Vertex-Disjoint Shortest Paths is fixed-parameter tractable when parameterized by $\ell$ but does not admit a polynomial kernel. Our hardness results hold for undirected graphs with unit weights, while our positive results extend to scenarios where the input graph is directed and features arbitrary (non-negative) edge weights.

Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).

北京阿比特科技有限公司