
In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
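The signed volumes mentioned in the abstract above can be illustrated in two dimensions, where the wedge product of two vectors reduces to a 2x2 determinant. The following is a minimal sketch of that geometric quantity only, not the paper's training procedure; the function names `wedge_2d` and `signed_triangle_area` are hypothetical and chosen for illustration.

```python
def wedge_2d(u, v):
    """Signed area of the parallelogram spanned by 2D vectors u and v.

    In 2D the wedge product u ^ v has a single component, equal to the
    determinant of the matrix with rows u and v.
    """
    return u[0] * v[1] - u[1] * v[0]


def signed_triangle_area(a, b, c):
    """Signed area of the triangle with vertices a, b, c (2D points).

    Positive when a -> b -> c is counter-clockwise, negative otherwise.
    """
    ab = (b[0] - a[0], b[1] - a[1])
    ac = (c[0] - a[0], c[1] - a[1])
    return 0.5 * wedge_2d(ab, ac)
```

Swapping the two arguments of `wedge_2d` flips the sign, which is the antisymmetry that makes the volume "signed".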

Related content


Noise · SGD · Stochastic gradient descent · Analysis · Minibatch stochastic

February 1, 2024

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent

Mingze Wang, Lei Wu

from arxiv, 30 pages

In this paper, we provide a theoretical study of noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where noise aligns favorably with the geometry of local landscape. We propose two metrics, derived from analyzing how noise influences the loss and subspace projection dynamics, to quantify the alignment strength. We show that for (over-parameterized) linear models and two-layer nonlinear networks, when measured by these metrics, the alignment can be provably guaranteed under conditions independent of the degree of over-parameterization. To showcase the utility of our noise geometry characterizations, we present a refined analysis of the mechanism by which SGD escapes from sharp minima. We reveal that unlike gradient descent (GD), which escapes along the sharpest directions, SGD tends to escape from flatter directions and cyclical learning rates can exploit this SGD characteristic to navigate more effectively towards flatter regions. Lastly, extensive experiments are provided to support our theoretical findings.

Backward · Learning · Estimation/estimator · Constraint · Forward

February 1, 2024

ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update

Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan

from arxiv, Spotlight @ ICLR 2024, first two authors contribute equally

In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE-based methods impose state-action-level behavior constraint, which is an ideal choice for offline learning. However, they typically perform much worse than current state-of-the-art (SOTA) methods that solely use action-level behavior constraint. After revisiting DICE-based methods, we find there exist two gradient terms when learning the value function using true-gradient update: forward gradient (taken on the current state) and backward gradient (taken on the next state). Using forward gradient bears a large similarity to many offline RL methods, and thus can be regarded as applying action-level constraint. However, directly adding the backward gradient may degenerate or cancel out its effect if these two gradients have conflicting directions. To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods. We conduct thorough theoretical analyses and find that the projected backward gradient brings state-level behavior regularization, which reveals the mystery of DICE-based methods: the value learning objective does try to impose state-action-level constraint, but needs to be used in a corrected way. Through toy examples and extensive experiments on complex offline RL and IL tasks, we demonstrate that DICE-based methods using orthogonal-gradient updates (O-DICE) achieve SOTA performance and great robustness.
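The projection step described in the abstract above can be sketched as ordinary vector projection: remove from the backward gradient its component along the forward gradient, so the remainder is orthogonal to the forward gradient. This is a minimal sketch on plain Python lists; the function name, the `eps` guard, and the handling of a zero forward gradient are assumptions for illustration, not the authors' implementation.

```python
def project_onto_normal_plane(g_backward, g_forward, eps=1e-12):
    """Return the part of g_backward that is orthogonal to g_forward.

    Computes g_b - (<g_b, g_f> / <g_f, g_f>) * g_f.
    If g_forward is (numerically) zero, g_backward is returned unchanged
    (an assumed convention for this sketch).
    """
    dot = sum(b * f for b, f in zip(g_backward, g_forward))
    norm_sq = sum(f * f for f in g_forward)
    if norm_sq < eps:
        return list(g_backward)
    scale = dot / norm_sq
    return [b - scale * f for b, f in zip(g_backward, g_forward)]
```

For example, with a forward gradient of `[1.0, 0.0]` and a conflicting backward gradient of `[-2.0, 3.0]`, the projected backward gradient keeps only the component that does not oppose the forward direction.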

INFORMS · Identifiable · Mathematics · Notability · TOOLS

January 31, 2024

Model-Theoretic Logic for Mathematical Theory of Semantic Information and Communication

Ahmet Faruk Saz, Siheng Xiong, Yashas Malur Saidutta, Faramarz Fekri

In this paper, we propose an advancement to Tarskian model-theoretic semantics, leading to a unified quantitative theory of semantic information and communication. We start with a description of inductive logic and probabilities, which serve as notable tools in the development of the proposed theory. Then, we identify two disparate kinds of uncertainty in semantic communication, physical uncertainty and content uncertainty, present refined interpretations of semantic information measures, and conclude by proposing a new measure for semantic content-information and entropy. Our proposition standardizes semantic information across different universes and systems, hence bringing measurability and comparability into semantic communication. We then proceed to introduce conditional and mutual semantic cont-information measures and point out their utility in formulating practical and optimizable lossless and lossy semantic compression objectives. Finally, we experimentally demonstrate the value of our theoretical propositions.

Minimax · Networking · Neural Networks · Optimizer · Functional

January 30, 2024

Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes

Hyunouk Ko, Xiaoming Huo

In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit a benign overfitting behavior.

Scaling · MoDELS · Distillation · SR · U-Net

January 30, 2024

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

In this paper, we introduce YONOS-SR, a novel stable diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby making the SR problem simpler for the teacher. We then train a student model for a higher magnification scale, using the predictions of the teacher as a target during the training. This process is repeated iteratively until we reach the target scale factor of the final model. The rationale behind our scale distillation is that the teacher aids the student diffusion model training by i) providing a target adapted to the current noise level rather than using the same target coming from ground truth data for all noise levels and ii) providing an accurate target as the teacher has a simpler task to solve. We empirically show that the distilled model significantly outperforms the model trained for high scales directly, specifically with few steps during inference. Having a strong diffusion model that requires only one step allows us to freeze the U-Net and fine-tune the decoder on top of it. We show that the combination of spatially distilled U-Net and fine-tuned decoder outperforms state-of-the-art methods requiring 200 steps with only one single step.

INFORMS · Analysis · Markov · Markov chain · Stationary distribution

January 30, 2024

Age of Actuated Information and Age of Actuation in a Data-Caching Energy Harvesting Actuator

Ali Nikkhah, Anthony Ephremides, Nikolaos Pappas

In this paper, we introduce two metrics, namely, age of actuation (AoA) and age of actuated information (AoAI), within a discrete-time system model that integrates data caching and energy harvesting (EH). AoA evaluates the timeliness of actions irrespective of the age of the information, while AoAI considers the freshness of the utilized data packet. We use Markov Chain analysis to model the system's evolution. Furthermore, we employ three-dimensional Markov Chain analysis to characterize the stationary distributions for AoA and AoAI and calculate their average values. Our findings from the analysis, validated by simulations, show that while AoAI consistently decreases with increased data and energy packet arrival rates, AoA presents a more complex behavior, with potential increases under conditions of limited data or energy resources. These metrics go towards the semantics of information and goal-oriented communications since they consider the timeliness of utilizing the information to perform an action.

Estimation/estimator · Kalman filtering · Analysis · Robustness · Optimizer

January 30, 2024

Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

from arxiv, 20 pages, 21 figures

In this paper, we consider multi-robot localization problems with a focus on cooperative localization and observability analysis of relative pose estimation. For cooperative localization, extra information is available to each robot via the communication network and message passing. If odometry data of a target robot can be transmitted to the ego-robot, then the observability of their relative pose estimation can be achieved by range-only or bearing-only measurements, provided both of their linear velocities are non-zero. If odometry data of a target robot is not directly transmitted but estimated by the ego-robot, then there must be both range and bearing measurements to guarantee the observability of relative pose estimation. For ROS/Gazebo simulations, we consider four different sensing and communication structures in which extended Kalman filtering (EKF) and pose graph optimization (PGO) estimation with different robust loss functions (filtering and smoothing with different batch sizes of sliding window) are compared in terms of estimation accuracy. For hardware experiments, two Turtlebot3 robots equipped with UWB modules are used for real-world inter-robot relative pose estimation, in which both EKF and PGO are applied and compared.

Weight · binary · Linear · Decoding · Optimizer

January 30, 2024

A Family of Low-Complexity Binary Codes with Constant Hamming Weights

Birenjith Sasidharan, Emanuele Viterbo, Son Hoang Dau

from arxiv, Submitted to Designs, Codes and Cryptography

In this paper, we focus on the design of binary constant-weight codes that admit low-complexity encoding and decoding algorithms, and that have size as a power of $2$. We construct a family of $(n=2^\ell, M=2^k, d=2)$ constant-weight codes ${\cal C}[\ell, r]$ parameterized by integers $\ell \geq 3$ and $1 \leq r \leq \lfloor \frac{\ell+3}{4} \rfloor$, by encoding information in the gaps between successive $1$'s of a vector. The code has weight $w = \ell$ and combinatorial dimension $k$ that scales quadratically with $\ell$. The encoding time is linear in the input size $k$, and the decoding time is poly-logarithmic in the input size $n$, discounting the linear time spent on parsing the input. Encoding and decoding algorithms of similar codes known in either information-theoretic or combinatorial literature require computation of large number of binomial coefficients. Our algorithms fully eliminate the need to evaluate binomial coefficients. While the code has a natural price to pay in $k$, it performs fairly well against the information-theoretic upper bound $\lfloor \log_2 {n \choose w} \rfloor$. When $\ell =3$, the code is optimal achieving the upper bound; when $\ell=4$, it is one bit away from the upper bound, and as $\ell$ grows it is order-optimal in the sense that the ratio of $k$ with its upper bound becomes a constant $\frac{11}{16}$ when $r=\lfloor \frac{\ell+3}{4} \rfloor$. With the same or even lower complexity, we derive new codes permitting a wider range of parameters by modifying ${\cal C}[\ell, r]$ in two different ways. The code derived using the first approach has the same blocklength $n=2^\ell$, but weight $w$ is allowed to vary from $\ell-1$ to $1$. In the second approach, the weight remains fixed as $w = \ell$, but the blocklength is reduced to $n=2^\ell - 2^r +1$. For certain selected values of parameters, these modified codes have an optimal $k$.
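The information-theoretic upper bound $\lfloor \log_2 {n \choose w} \rfloor$ quoted in the abstract above is easy to evaluate for small $\ell$. The sketch below computes it for blocklength $n = 2^\ell$ and weight $w = \ell$ using exact integer arithmetic; the helper names are hypothetical, and the code evaluates only the bound, not the paper's encoder or decoder.

```python
from math import comb


def size_upper_bound_bits(n, w):
    """floor(log2(C(n, w))): the largest k such that 2**k <= C(n, w).

    For a positive integer x, floor(log2(x)) equals x.bit_length() - 1,
    which avoids floating-point rounding.
    """
    count = comb(n, w)  # number of binary vectors of length n and weight w
    return count.bit_length() - 1


def bound_for_ell(ell):
    """Upper bound on k for blocklength n = 2**ell and weight w = ell."""
    return size_upper_bound_bits(2 ** ell, ell)
```

For $\ell = 3$ there are ${8 \choose 3} = 56$ weight-3 vectors, so the bound is 5 bits; for $\ell = 4$ there are ${16 \choose 4} = 1820$, so the bound is 10 bits.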

Specialization · Fidelity · Smoothing · Regularization term · Reducible

January 29, 2024

Refined Inverse Rigging: A Balanced Approach to High-fidelity Blendshape Animation

Stevo Racković, Cláudia Soares, Dušan Jakovetić

In this paper, we present an advanced approach to solving the inverse rig problem in blendshape animation, using high-quality corrective blendshapes. Our algorithm introduces novel enhancements in three key areas: ensuring high data fidelity in reconstructed meshes, achieving greater sparsity in weight distributions, and facilitating smoother frame-to-frame transitions. While the incorporation of corrective terms is a known practice, our method differentiates itself by employing a unique combination of $l_1$ norm regularization for sparsity and a temporal smoothness constraint through roughness penalty, focusing on the sum of second differences in consecutive frame weights. A significant innovation in our approach is the temporal decoupling of blendshapes, which permits simultaneous optimization across entire animation sequences. This feature sets our work apart from existing methods and contributes to a more efficient and effective solution. Our algorithm exhibits a marked improvement in maintaining data fidelity and ensuring smooth frame transitions when compared to prior approaches that either lack smoothness regularization or rely solely on linear blendshape models. In addition to superior mesh resemblance and smoothness, our method offers practical benefits, including reduced computational complexity and execution time, achieved through a novel parallelization strategy using clustering methods. Our results not only advance the state of the art in terms of fidelity, sparsity, and smoothness in inverse rigging but also introduce significant efficiency improvements. The source code will be made available upon acceptance of the paper.
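The two regularizers named in the abstract above, an $l_1$ norm for sparsity and a roughness penalty built from second differences of consecutive frame weights, can be sketched as follows. This is an illustration under stated assumptions: the weights are stored as a list of per-frame lists, and the second differences are squared before summing, which the abstract does not specify. The function names are hypothetical and this is not the authors' released code.

```python
def l1_penalty(weights):
    """Sum of absolute blendshape weights over all frames (sparsity term)."""
    return sum(abs(w) for frame in weights for w in frame)


def roughness_penalty(weights):
    """Sum of squared second differences across consecutive frames.

    weights[t][j] is the weight of blendshape j at frame t. The second
    difference at frame t is w[t-1] - 2*w[t] + w[t+1]; squaring it is an
    assumption made for this sketch.
    """
    total = 0.0
    for t in range(1, len(weights) - 1):
        for prev, cur, nxt in zip(weights[t - 1], weights[t], weights[t + 1]):
            d2 = prev - 2.0 * cur + nxt
            total += d2 * d2
    return total
```

A weight that changes linearly over time has zero roughness, while a weight that spikes for a single frame is penalized, which is the behaviour that favours smooth frame-to-frame transitions.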

Attention mechanism · Learned · End-to-end · Networking · Loss function (machine learning)

March 28, 2018

End-to-End Multi-Task Learning with Attention

Shikun Liu, Edward Johns, Andrew J. Davison

from arxiv, submitted to ECCV 2018

In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.