
In this paper, we introduce a novel approach for generating texture images of infinite resolution using Generative Adversarial Networks (GANs) based on a patch-by-patch paradigm. Existing texture synthesis techniques often rely on generating a large-scale texture in a single forward pass through the generative model, which limits the scalability and flexibility of the generated images. In contrast, the proposed approach trains GAN models on a single texture image to generate relatively small patches that are locally correlated and can be seamlessly concatenated to form a larger image while using a constant GPU memory footprint. Our method learns the local texture structure and is able to generate arbitrary-size textures while maintaining coherence and diversity. The proposed method relies on local padding in the generator to ensure consistency between patches and utilizes spatial stochastic modulation to allow for local variations and diversity within the large-scale image. Experimental results demonstrate superior scalability compared to existing approaches while maintaining the visual coherence of the generated textures.
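To make the patch-by-patch paradigm concrete, below is a minimal sketch of tiling with a shared latent grid, where each patch also reads a border of its neighbours' latent cells so adjacent patches stay consistent. The `generator` callable and the names `PAD` and `PATCH_LATENTS` are illustrative assumptions, not the paper's implementation, and the spatial stochastic modulation is omitted.

```python
# Minimal sketch of patch-by-patch synthesis with local padding, assuming a
# hypothetical pretrained `generator` that maps a latent window to an RGB patch.
import torch

PAD = 1              # latent cells borrowed from each neighbour ("local padding")
PATCH_LATENTS = 8    # latent grid cells per patch side

def synthesize(generator, rows, cols, device="cpu"):
    # One shared latent grid for the whole image keeps neighbouring patches
    # consistent: each patch reads its own cells plus a PAD-wide border.
    grid = torch.randn(1, 64, rows * PATCH_LATENTS + 2 * PAD,
                       cols * PATCH_LATENTS + 2 * PAD, device=device)
    out_rows = []
    for r in range(rows):
        row_patches = []
        for c in range(cols):
            zs = grid[:, :,
                      r * PATCH_LATENTS : (r + 1) * PATCH_LATENTS + 2 * PAD,
                      c * PATCH_LATENTS : (c + 1) * PATCH_LATENTS + 2 * PAD]
            patch = generator(zs)      # (1, 3, H, W); border assumed trimmed inside
            row_patches.append(patch)
        out_rows.append(torch.cat(row_patches, dim=3))   # concatenate along width
    return torch.cat(out_rows, dim=2)                    # concatenate along height
```

Because each generator call touches only one patch-sized latent window, the GPU memory footprint stays constant regardless of `rows` and `cols`.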

Related Content

In this paper, we present a novel training approach called the Homotopy Relaxation Training Algorithm (HRTA), aimed at accelerating training relative to traditional methods. Our algorithm incorporates two key mechanisms: one builds a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function; the other relaxes the homotopy parameter to refine the training process. We have conducted an in-depth analysis of this novel method within the context of the neural tangent kernel (NTK), revealing significantly improved convergence rates. Our experimental results, especially for networks with larger widths, validate the theoretical conclusions. The proposed HRTA also shows potential for extension to other activation functions and deep neural networks.
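As an illustration of the first mechanism, here is a minimal sketch of a homotopy activation that interpolates between the identity (linear) map and ReLU; the linear-warmup schedule for relaxing the homotopy parameter is an illustrative choice, not necessarily the schedule analyzed in the paper.

```python
# A minimal sketch of a homotopy activation connecting linear and ReLU.
import torch

def homotopy_act(x: torch.Tensor, t: float) -> torch.Tensor:
    """t = 0 gives the linear activation, t = 1 gives ReLU."""
    return (1.0 - t) * x + t * torch.relu(x)

# Illustrative relaxation schedule (an assumption, not HRTA's exact rule):
# t = min(1.0, epoch / warmup_epochs)
```

Starting near t = 0 keeps early training in the better-conditioned linear regime, while relaxing t toward 1 recovers the standard ReLU network.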

In this paper, we propose a novel approach aimed at recovering the 3D position of a UUV from UAV imagery in shallow-water environments. By combining UAV and UUV measurements, we show that our method can serve as an accurate and cost-effective alternative to the acoustic sensing methods typically required to obtain ground-truth information in underwater localization problems. Furthermore, our approach allows for a seamless conversion to geo-referenced coordinates, which can be utilized for navigation purposes. To validate our method, we present results on data collected in both a simulation environment and field experiments, demonstrating the ability to successfully recover the UUV position with sub-meter accuracy.
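As one plausible way to realize the geo-referencing step, the sketch below converts a locally estimated UUV offset (east/north metres relative to the UAV's geo-tagged position) to latitude/longitude using a flat-earth approximation; this is an assumption for illustration, not necessarily the conversion used in the paper.

```python
# Flat-earth ENU-offset to geodetic conversion (illustrative assumption).
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius

def enu_to_geodetic(lat0_deg, lon0_deg, east_m, north_m):
    """Shift a geo-tagged reference (lat0, lon0) by a local east/north offset."""
    lat0 = math.radians(lat0_deg)
    dlat = north_m / EARTH_RADIUS_M
    dlon = east_m / (EARTH_RADIUS_M * math.cos(lat0))
    return lat0_deg + math.degrees(dlat), lon0_deg + math.degrees(dlon)
```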

In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details, and is applicable at both initial and intermediate feature levels. SPFormer, trainable end-to-end, exhibits superior performance across various benchmarks. Notably, it achieves significant improvements on the challenging ImageNet benchmark: a 1.4% increase over DeiT-T and a 1.1% increase over DeiT-S. A standout feature of SPFormer is its inherent explainability. The superpixel structure offers a window into the model's internal processes, providing valuable insights that enhance the model's interpretability. This structure also improves SPFormer's robustness, particularly in challenging scenarios such as image rotations and occlusions, demonstrating its adaptability and resilience.
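To illustrate the superpixel representation, here is a minimal sketch that pools pixel features into per-superpixel tokens by averaging, assuming precomputed superpixel labels; SPFormer's actual (possibly soft or learned) assignment may differ.

```python
# Average-pool pixel features into superpixel tokens (illustrative sketch).
import torch

def superpixel_pool(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """feats: (C, H, W); labels: (H, W) integer superpixel ids in [0, K)."""
    C = feats.shape[0]
    K = int(labels.max().item()) + 1
    flat = feats.reshape(C, -1).t()            # (H*W, C) per-pixel features
    idx = labels.reshape(-1).long()            # (H*W,) superpixel id per pixel
    sums = torch.zeros(K, C).index_add_(0, idx, flat)
    counts = torch.zeros(K).index_add_(0, idx, torch.ones_like(idx, dtype=torch.float))
    return sums / counts.clamp(min=1).unsqueeze(1)   # (K, C) tokens for the ViT
```

Each token then corresponds to an irregular, semantically coherent region rather than a fixed square patch.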

This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture while faithfully preserving the distinct visual characteristics of the reference exemplar. Our method leverages a pre-trained diffusion network and uses self-attention mechanisms to gradually align the synthesized texture with the reference, ensuring the retention of the structures in the provided target. Through experimental validation, our approach exhibits exceptional proficiency in handling non-stationary textures, demonstrating significant advancements in texture synthesis compared to existing state-of-the-art techniques. Code is available at https://github.com/xiaorongjun000/Self-Rectification
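The key/value-injection idea behind such attention-based alignment can be sketched as follows: queries come from the synthesized target while keys and values come from the reference exemplar. The shapes and projection matrices below are illustrative assumptions, not the released implementation.

```python
# Cross-image attention sketch: target queries attend to reference keys/values.
import torch
import torch.nn.functional as F

def reference_attention(target_tokens, ref_tokens, Wq, Wk, Wv):
    """target_tokens: (N, C); ref_tokens: (M, C); Wq/Wk/Wv: (C, C) projections."""
    q = target_tokens @ Wq                    # queries from the rough target
    k, v = ref_tokens @ Wk, ref_tokens @ Wv   # keys/values from the exemplar
    attn = F.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v   # target structure re-expressed with reference appearance
```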

This paper presents the design and implementation of a Right Invariant Extended Kalman Filter (RIEKF) for estimating the states of the kinematic base of the Surena V humanoid robot. The state representation of the robot is defined on the Lie group $SE_4(3)$, encompassing the position, velocity, and orientation of the base, as well as the position of the left and right feet. In addition, we incorporated IMU biases as concatenated states within the filter. The prediction step of the RIEKF utilizes IMU equations, while the update step incorporates forward kinematics. To evaluate the performance of the RIEKF, we conducted experiments using the Choreonoid dynamic simulation framework and compared it against a Quaternion-based Extended Kalman Filter (QEKF). The results of the analysis demonstrate that the RIEKF exhibits reduced drift in localization and achieves estimation convergence in a shorter time compared to the QEKF. These findings highlight the effectiveness of the proposed RIEKF for accurate state estimation of the kinematic base in humanoid robotics.
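For intuition, here is a minimal sketch of the IMU-driven prediction for the base states (orientation R, velocity v, position p), with biases already subtracted; the covariance propagation, bias states, and foot positions of the full $SE_4(3)$ filter are omitted for brevity.

```python
# IMU propagation for the base states (standard strapdown equations).
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])

def so3_exp(w):
    """Rodrigues' formula: rotation matrix for a rotation vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    a = w / theta
    A = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * (A @ A)

def predict(R, v, p, gyro, accel, dt):
    """One prediction step from bias-corrected gyro/accelerometer readings."""
    acc_world = R @ accel + GRAVITY
    R_next = R @ so3_exp(gyro * dt)
    v_next = v + acc_world * dt
    p_next = p + v * dt + 0.5 * acc_world * dt ** 2
    return R_next, v_next, p_next
```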

In this paper, we provide a novel enumeration algorithm for the set of all walks of a given length within a directed graph. After a preprocessing step requiring time linear in the size of the graph, our algorithm outputs succinct representations of such walks with worst-case constant delay. We apply these results to the problem of enumerating succinct representations of the strings of a given length from a prefix-closed regular language (a language accepted by a finite automaton in which every state is final).
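For contrast with the constant-delay guarantee, here is a naive recursive enumeration of all length-k walks; it lists walks explicitly rather than via succinct representations and can incur long delays on graphs with dead ends, so it only illustrates the objects being enumerated, not the paper's algorithm.

```python
# Naive enumeration of all walks with k edges in a directed graph.
def walks(adj, k):
    """adj: dict mapping vertex -> list of successors; yields length-k walks."""
    def extend(walk):
        if len(walk) == k + 1:          # k edges means k + 1 vertices
            yield tuple(walk)
            return
        for nxt in adj[walk[-1]]:
            walk.append(nxt)
            yield from extend(walk)
            walk.pop()
    for v in adj:
        yield from extend([v])

# Example: all walks of length 2 in a two-vertex digraph.
print(list(walks({0: [1], 1: [0, 1]}, 2)))
# [(0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
```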

In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations. In high-frequency finance, noise often dominates the signal of the underlying true process, so standard regression procedures cannot be applied directly to noisy high-frequency observations. To handle this issue, we first employ a smoothing method on the observed variables. However, the smoothed variables still contain non-negligible noise. To manage this non-negligible noise and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator from the debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call this scheme the Thresholded dEbiased Nonconvex LASSO (TEN-LASSO) estimator. Furthermore, we establish the concentration properties of the TEN-LASSO estimator and discuss a nonconvex optimization algorithm.
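The final step can be sketched in isolation: entries of the debiased integrated coefficient estimator below a threshold are set to zero to recover sparsity. The hard-threshold rule and the level `tau` are illustrative; the paper derives its level from the concentration properties.

```python
# Hard-thresholding of a debiased coefficient estimate (illustrative rule).
import numpy as np

def threshold(beta_debiased: np.ndarray, tau: float) -> np.ndarray:
    """Zero out entries with magnitude at most tau to enforce sparsity."""
    return np.where(np.abs(beta_debiased) > tau, beta_debiased, 0.0)
```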

In this paper, we introduce $\textbf{GS-SLAM}$, which is the first to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedups in map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians or deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct whole scenes, rather than the static objects synthesized by existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize the camera pose, resulting in reduced runtime and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released soon.
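A minimal sketch of what such an adaptive add/delete rule might look like is given below: Gaussians with negligible opacity are pruned, and pixels whose re-rendered depth disagrees with the input depth are back-projected into new Gaussians. The thresholds, the `backproject` helper, and the simplified per-Gaussian state are assumptions for illustration, not the GS-SLAM code.

```python
# Illustrative add/delete rule for a 3D Gaussian map (simplified state).
import torch

DEPTH_ERR_THRESH = 0.05   # metres; where re-rendered depth disagrees with input
MIN_OPACITY = 0.01        # prune Gaussians that barely contribute

def adapt_map(means, opacities, rendered_depth, observed_depth, backproject):
    # Delete noisy, near-transparent Gaussians.
    keep = opacities > MIN_OPACITY
    means, opacities = means[keep], opacities[keep]
    # Add Gaussians for newly observed or badly explained geometry.
    bad = (rendered_depth - observed_depth).abs() > DEPTH_ERR_THRESH
    new_means = backproject(bad)               # pixel mask -> 3D points
    means = torch.cat([means, new_means], dim=0)
    opacities = torch.cat([opacities,
                           torch.full((new_means.shape[0],), 0.1)], dim=0)
    return means, opacities
```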

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
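A minimal sketch of the decomposition-and-reconstruction idea, under simplifying assumptions: a backbone feature is projected into a small set of shared latent features and recombined with learned weights. The dimensions, and the collapsing of FRN's two relation-modeling modules into a single weighting step, are illustrative, not the authors' implementation.

```python
# Simplified decomposition into latent features and weighted reconstruction.
import torch
import torch.nn as nn

class FeatureDecompositionReconstruction(nn.Module):
    def __init__(self, dim=512, num_latents=9):
        super().__init__()
        # One projection per facial-action-aware latent feature (shared info).
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_latents)])
        # Per-latent importance weights stand in for expression-specific variations.
        self.weights = nn.Linear(dim, num_latents)

    def forward(self, x):                       # x: (B, dim) backbone feature
        latents = torch.stack([p(x) for p in self.proj], dim=1)   # (B, K, dim)
        w = torch.softmax(self.weights(x), dim=-1).unsqueeze(-1)  # (B, K, 1)
        return (w * latents).sum(dim=1)  # reconstructed expression feature
```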

In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning algorithm (MAML). We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, given MAML's model-agnostic property, this work also opens a new research direction of applying meta learning to more speech-related applications.
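For concreteness, here is a minimal first-order MAML sketch where each language is a task: the inner loop adapts a copy of the model on a support batch, and the outer loop accumulates query-set gradients into the shared initialization. `loss_fn`, the batch format, and the first-order approximation are assumptions for brevity, not MetaASR's exact training loop.

```python
# First-order MAML meta-update over a set of per-language tasks.
import copy
import torch

def maml_step(model, tasks, loss_fn, meta_opt, inner_lr=1e-2, inner_steps=1):
    meta_opt.zero_grad()
    for support, query in tasks:          # one (support, query) split per language
        fast = copy.deepcopy(model)       # task-specific copy of the initialization
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):      # inner loop: adapt to the language
            inner_opt.zero_grad()
            loss_fn(fast, support).backward()
            inner_opt.step()
        # Outer loop: first-order approximation copies query gradients back.
        q_loss = loss_fn(fast, query)
        grads = torch.autograd.grad(q_loss, list(fast.parameters()))
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()                       # move the shared initialization
```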
