
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there exists a knowledge gap in how to apply geometric reconstruction and photorealism modeling (novel view synthesis) in a unified framework. To address this gap and promote the development of robust and immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges and requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis rather than treating them as separate tasks, making them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: //xuqianren.github.io/publications/MuSHRoom/.

Related content

In computer vision, 3D reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. Because a single view provides incomplete information, reconstruction from one view must rely on prior knowledge. Multi-view reconstruction (analogous to human binocular vision) is comparatively easier: the cameras are first calibrated, i.e., the relationship between each camera's image coordinate system and the world coordinate system is computed, and the 3D information is then reconstructed from the information contained in multiple 2D images. 3D object reconstruction is a shared scientific problem and core technology across computer-aided geometric design (CAGD), computer graphics (CG), computer animation, computer vision, medical image processing, scientific computing, virtual reality, and digital media creation. There are two main ways to generate a 3D representation of an object in a computer. One is to build a human-controlled 3D geometric model interactively with geometric modeling software; this approach is mature and supported by software such as 3DMAX, Maya, AutoCAD, and UG, which typically represent shapes with curves and surfaces defined by mathematical expressions. The other is to acquire the geometry of a real object through measurement; this is generally called the 3D reconstruction process, i.e., the mathematical process and computing technology of recovering an object's 3D information (shape, etc.) from 2D projections, involving steps such as data acquisition, preprocessing, point cloud registration, and feature analysis.
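
To make the multi-view case concrete, the sketch below triangulates a single 3D point from two calibrated views with the Direct Linear Transform; the camera matrices, image points, and toy scene are illustrative assumptions, not tied to any particular system.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Recover a 3D point from two calibrated views via the
    Direct Linear Transform (DLT).

    P1, P2 : 3x4 camera projection matrices (intrinsics @ extrinsics)
    x1, x2 : matching 2D image points (u, v) in each view
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector with the
    # smallest singular value; dehomogenize to get (X, Y, Z).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy example: two cameras observing the point (0, 0, 5).
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])    # shifted 1 m along x
X_true = np.array([0.0, 0.0, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_point(P1, P2, x1, x2))  # ~[0, 0, 5]
```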

Gait recognition has achieved promising advances in controlled settings, yet it significantly struggles in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often face limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.
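
The HGD, DSE, and DTA modules are specific to HiH and not detailed in this abstract; as a loose illustration of representing gait at multiple spatial resolutions, the sketch below pools a silhouette feature map into horizontal body parts at several granularities. The module name, scales, and tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HierarchicalPartPooling(nn.Module):
    """Pool a silhouette feature map into horizontal body parts at
    several granularities (whole body -> coarse parts -> fine parts).

    A generic sketch inspired by the multi-resolution idea in the
    abstract, not the authors' HGD module.
    """
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales

    def forward(self, feat):           # feat: (B, C, H, W)
        parts = []
        for s in self.scales:
            # Split the height axis into s horizontal strips and
            # average-pool each strip into one C-dim descriptor.
            strips = feat.chunk(s, dim=2)
            parts += [strip.mean(dim=(2, 3)) for strip in strips]
        return torch.stack(parts, dim=1)   # (B, sum(scales), C)

feat = torch.randn(8, 256, 16, 11)          # dummy backbone features
print(HierarchicalPartPooling()(feat).shape)  # torch.Size([8, 7, 256])
```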

In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is at //asam2024.github.io/.
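
ASAM's adversarial examples come from editing images with a stable diffusion model; the hedged sketch below only illustrates the general idea of optimizing a latent so that the decoded image increases the segmentation loss while the ground-truth mask stays fixed. The decoder, segmentation model, loss, and hyperparameters are hypothetical stand-ins, not the SAM or Stable Diffusion APIs.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a frozen image decoder (playing the role of a
# diffusion VAE decoder) and a tiny segmentation model.
decoder = nn.Sequential(nn.ConvTranspose2d(4, 3, 8, stride=8), nn.Sigmoid())
seg_model = nn.Conv2d(3, 1, 3, padding=1)
for p in list(decoder.parameters()) + list(seg_model.parameters()):
    p.requires_grad_(False)

def adversarial_latent(z0, mask, steps=10, lr=0.05):
    """Optimize a latent so the decoded image fools the segmentation
    model while the original mask is kept (a generic sketch of
    adversarial optimization in latent space, not ASAM's exact recipe)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        img = decoder(z)
        loss = -bce(seg_model(img), mask)   # maximize segmentation error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach(), mask        # adversarial image + original mask

z0 = torch.randn(1, 4, 8, 8)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
adv_img, _ = adversarial_latent(z0, mask)
print(adv_img.shape)  # torch.Size([1, 3, 64, 64])
```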

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65cm$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055km/s^3$ (a reduction of $97\%$).
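
As a rough illustration of how distance supervision can be synthesized from a motion capture corpus such as AMASS, the sketch below computes noisy pairwise ranges between assumed sensor-carrying joints; the joint indices, noise level, and array shapes are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def synth_uwb_distances(joint_pos, sensor_joints, noise_std=0.05):
    """Simulate pairwise UWB-like ranges between body-worn sensors from
    motion-capture joint positions.

    joint_pos     : (T, J, 3) joint trajectories in meters
    sensor_joints : indices of the joints assumed to carry a sensor
    noise_std     : assumed ranging noise in meters
    """
    sensors = joint_pos[:, sensor_joints]              # (T, S, 3)
    diff = sensors[:, :, None] - sensors[:, None]      # (T, S, S, 3)
    dist = np.linalg.norm(diff, axis=-1)               # (T, S, S)
    return dist + np.random.normal(0, noise_std, dist.shape)

# Toy usage: 100 frames, 22 joints, six assumed sensor locations.
joints = np.random.rand(100, 22, 3)
ranges = synth_uwb_distances(joints, [0, 7, 8, 15, 20, 21])
print(ranges.shape)  # (100, 6, 6)
```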

News image captioning requires a model to generate an informative caption rich in entities, given the news image and the associated news article. Though Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in addressing various vision-language tasks, our research finds that current MLLMs still have limitations in handling entity information on the news image captioning task. Besides, while MLLMs have the ability to process long inputs, generating high-quality news image captions still requires a trade-off between the sufficiency and the conciseness of the textual input information. To explore the potential of MLLMs and address the problems we discovered, we propose an Entity-Aware Multimodal Alignment based approach for news image captioning. Our approach first aligns the MLLM through a Balance Training Strategy with two extra alignment tasks, an Entity-Aware Sentence Selection task and an Entity Selection task, together with the News Image Captioning task, to enhance its capability in handling multimodal entity information. The aligned MLLM then utilizes the additional entity-related information it explicitly extracts to supplement its textual input while generating news image captions. Our approach achieves better results than all previous models in CIDEr score on the GoodNews dataset (72.33 -> 88.39) and the NYTimes800k dataset (70.83 -> 85.61).
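
As a loose illustration of supplementing the MLLM's textual input with explicitly extracted entity information, the sketch below assembles a prompt from selected article sentences and an entity list; the function name, prompt template, and truncation rule are assumptions, not the paper's implementation.

```python
def build_caption_prompt(article_sentences, selected_ids, entities, max_sents=4):
    """Assemble the text input for caption generation: keep only the
    selected entity-rich sentences and append the explicitly extracted
    entities, trading sufficiency against conciseness."""
    context = " ".join(article_sentences[i] for i in selected_ids[:max_sents])
    entity_hint = "Entities: " + ", ".join(entities)
    return f"{context}\n{entity_hint}\nWrite a news image caption."

# Toy usage with made-up article sentences and entities.
sents = ["The summit opened in Geneva on Monday.",
         "Delegates debated the new climate accord.",
         "Local traffic was rerouted for the event."]
print(build_caption_prompt(sents, selected_ids=[0, 1],
                           entities=["Geneva", "climate accord"]))
```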

Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high-fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized with insufficient descriptions. This paper explores the problem of matching a prompt to a set of relevant adapters, building on recent work that highlights the performance gains of composing adapters. We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords. Stylus outlines a three-stage approach that first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally assembles adapters based on the prompt's keywords by checking how well they fit the prompt. To evaluate Stylus, we developed StylusDocs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is preferred twice as often as the base model by both human and multimodal-model evaluators. See stylus-diffusion.github.io for more.
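
As a minimal illustration of the retrieval stage, the sketch below ranks adapters by cosine similarity between a prompt embedding and pre-computed adapter embeddings; the embedding dimension, adapter names, and data are illustrative assumptions, and the summarization and composition stages of Stylus are not covered.

```python
import numpy as np

def retrieve_adapters(prompt_emb, adapter_embs, adapter_names, k=5):
    """Rank adapters by cosine similarity to the prompt embedding
    (retrieval only; selecting which retrieved adapters to compose is
    a separate step)."""
    a = adapter_embs / np.linalg.norm(adapter_embs, axis=1, keepdims=True)
    p = prompt_emb / np.linalg.norm(prompt_emb)
    scores = a @ p
    top = np.argsort(-scores)[:k]
    return [(adapter_names[i], float(scores[i])) for i in top]

# Toy usage with random 768-d embeddings for three made-up adapters.
names = ["watercolor-style", "cyberpunk-city", "portrait-detailer"]
embs = np.random.randn(3, 768)
print(retrieve_adapters(np.random.randn(768), embs, names, k=2))
```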

The integration of brain-computer interfaces (BCIs) into the realm of smart wheelchair (SW) technology signifies a notable leap forward in enhancing the mobility and autonomy of individuals with physical disabilities. BCIs are a technology that enables direct communication between the brain and external devices. While BCI systems offer remarkable opportunities for enhancing human-computer interaction and providing mobility solutions for individuals with disabilities, they also raise significant concerns regarding security, safety, and privacy that have not been thoroughly addressed by researchers on a large scale. Our research aims to enhance wheelchair control for individuals with physical disabilities by leveraging electroencephalography (EEG) signals for BCIs. We introduce a non-invasive BCI system that utilizes a neuro-signal acquisition headset to capture EEG signals. These signals are obtained from specific brain activities that individuals have been trained to produce, allowing for precise control of the wheelchair. EEG-based BCIs are instrumental in capturing the brain's electrical activity and translating these signals into actionable commands. The primary objective of our study is to demonstrate the system's capability to interpret EEG signals and decode specific thought patterns or mental commands issued by the user. By doing so, it aims to convert these into accurate control commands for the wheelchair. This process includes the recognition of navigational intentions, such as moving forward, backward, or executing turns, specifically tailored for wheelchair operation. Through this innovative approach, we aim to create a seamless interface between the user's cognitive intentions and the wheelchair's movements, enhancing autonomy and mobility for individuals with physical disabilities.
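
As a rough sketch of how EEG windows might be mapped to wheelchair commands, the example below extracts band-power features and trains a linear classifier on random stand-in data; the channel count, sampling rate, frequency bands, and command set are assumptions, not the system described in the abstract.

```python
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression

COMMANDS = ["forward", "backward", "left", "right"]   # assumed command set

def bandpower_features(eeg, fs=250):
    """Per-channel power in the mu (8-12 Hz) and beta (13-30 Hz) bands,
    a common hand-crafted feature for motor-imagery EEG.
    eeg : (channels, samples) array for one time window."""
    f, psd = welch(eeg, fs=fs, nperseg=fs)
    mu = psd[:, (f >= 8) & (f <= 12)].mean(axis=1)
    beta = psd[:, (f >= 13) & (f <= 30)].mean(axis=1)
    return np.concatenate([mu, beta])

# Toy training on random data: 200 labeled windows, 8 channels, 2 s each.
rng = np.random.default_rng(0)
X = np.array([bandpower_features(rng.standard_normal((8, 500))) for _ in range(200)])
y = rng.integers(0, len(COMMANDS), size=200)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(COMMANDS[clf.predict(X[:1])[0]])   # decoded wheelchair command
```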

Taking on arbitrary tasks, as humans do, with a mobile service robot in open-world settings requires a holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to minimize the gap between human and robotic capabilities. The approach adopts fundamental concepts from neuroscience, such as a triplet perception split into recognition, knowledge representation, and knowledge interpretation. A recognition system splits the background and foreground to integrate exchangeable image-based object detectors and SLAM; a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control; and knowledge interpretation methods deploy spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study is used to evaluate the impact of each component on the overall performance for a fetch-and-carry scenario in two simulated environments and one real-world environment.
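
As a loose illustration of a multi-layer knowledge base that offers interfaces for high-level control, the sketch below defines a minimal layered scene representation with a simple query method; all field names and layers are assumptions and far simpler than the model described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SceneObject:
    """One detected foreground object with its pose and history
    (assumed fields for illustration)."""
    label: str
    position: tuple          # (x, y, z) in the map frame
    last_seen: float         # timestamp in seconds

@dataclass
class MultiLayerKnowledgeBase:
    """Minimal layered scene representation: a static background map
    layer, a dynamic object layer, and a semantic layer grouping
    objects by room for high-level control queries."""
    background_map: Dict[str, object] = field(default_factory=dict)
    objects: List[SceneObject] = field(default_factory=list)
    rooms: Dict[str, List[str]] = field(default_factory=dict)

    def query(self, label: str) -> List[SceneObject]:
        # Interface for high-level control: "where did I last see a cup?"
        return [o for o in self.objects if o.label == label]

kb = MultiLayerKnowledgeBase()
kb.objects.append(SceneObject("cup", (1.2, 0.4, 0.9), last_seen=12.5))
print(kb.query("cup"))
```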

Current recommendation systems are significantly affected by a serious issue of temporal data shift, namely the inconsistency between the distribution of historical data and that of online data. Most existing models focus on utilizing updated data, overlooking the transferable, temporal data shift-free information that can be learned from shifting data. We propose the Temporal Invariance of Association theorem, which suggests that, given a fixed search space, the relationship between the data and the data in the search space remains invariant over time. Leveraging this principle, we designed a retrieval-based recommendation system framework that can train a data shift-free relevance network using shifting data, significantly enhancing the predictive performance of the original model in the recommendation system. However, retrieval-based recommendation models face substantial inference time costs when deployed online. To address this, we further designed a distillation framework that distills information from the relevance network into a parameterized module using shifting data. The distilled model can be deployed online alongside the original model, with only a minimal increase in inference time. Extensive experiments on multiple real datasets demonstrate that our framework significantly improves the performance of the original model by utilizing shifting data.
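
As a hedged sketch of the distillation step, the example below trains a small student network to mimic the scores of a frozen retrieval-based relevance network so that no neighbor lookup is needed at serving time; both networks, their shapes, and the training loop are hypothetical stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical teacher: a relevance network that scores a query against
# retrieved neighbors. Hypothetical student: a small MLP on the query alone.
class RelevanceNet(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)
    def forward(self, query, neighbors):              # (B, D), (B, K, D)
        q = query[:, None].expand_as(neighbors)
        return self.score(q, neighbors).mean(dim=1)   # (B, 1)

teacher = RelevanceNet().eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Distillation loop: the student learns to reproduce the teacher's
# retrieval-based relevance score without access to the neighbors.
for _ in range(100):
    query = torch.randn(64, 32)
    neighbors = torch.randn(64, 8, 32)
    with torch.no_grad():
        target = teacher(query, neighbors)
    loss = mse(student(query), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```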

This paper presents a solution to address carbon emission mitigation for end-to-end edge computing systems, including the computing at battery-powered edge devices and servers, as well as the communications between them. We design and implement CarbonCP, a context-adaptive, carbon-aware, and uncertainty-aware AI inference framework built upon conformal prediction theory, which balances operational carbon emissions, end-to-end latency, and battery consumption of edge devices through DNN partitioning under varying system processing contexts and carbon intensity. Our experimental results demonstrate that CarbonCP is effective in substantially reducing operational carbon emissions, by up to 58.8%, while maintaining key user-centric performance metrics with only a 9.9% error rate.
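
Since CarbonCP builds on conformal prediction, the sketch below shows the standard split conformal recipe for turning point predictions (e.g., of the latency or carbon cost of a candidate DNN partition point) into calibrated intervals; the data and the way the interval would be used are illustrative assumptions, not CarbonCP's design.

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, new_pred, alpha=0.1):
    """Split conformal prediction: use calibration residuals to turn a
    point prediction into an interval with roughly (1 - alpha) coverage."""
    residuals = np.abs(cal_true - cal_pred)
    n = len(residuals)
    # Finite-sample corrected quantile of the calibration residuals.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return new_pred - q, new_pred + q

# Toy usage: calibrate on 500 noisy predictions, then bound a new one.
rng = np.random.default_rng(1)
truth = rng.uniform(10, 50, 500)          # e.g., latency in ms
pred = truth + rng.normal(0, 2, 500)      # imperfect predictor
print(conformal_interval(pred, truth, new_pred=30.0))
```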

Autonomic computing investigates how systems can achieve (user) specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to a resource ensemble (e.g., multiple resources within a data center), research into integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale continues to be a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-management of systems can be achieved at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.
