
For convolutional neural networks (CNNs) used for pattern classification, the training loss function is usually applied only to the final output of the network, apart from some regularization constraints on the network parameters. However, as the number of network layers increases, the influence of the loss function on the front layers of the network gradually decreases, and the network parameters tend to fall into local optima. At the same time, the trained network is found to have significant information redundancy in its features at all stages, which reduces the effectiveness of the feature mapping at each stage and is not conducive to driving the subsequent network parameters toward the optimum. Therefore, by designing a loss function that constrains the front-stage features and eliminates their information redundancy, it is possible to obtain a more optimized solution for the network and further improve its classification accuracy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Considering that a CNN has many layers, experimental comparison and analysis show that MFD Loss should act on multiple front layers of the CNN, constraining the output features of each layer and each channel, and it is trained jointly with the classification loss function during network training. Compared with supervised learning using Softmax Loss alone, experiments with several typical CNNs on several commonly used datasets show that the classification performance of Softmax Loss + MFD Loss is significantly better. Moreover, comparison experiments before and after combining MFD Loss with several other typical loss functions verify its good universality.
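As a rough illustration of the idea (not the authors' exact MFD Loss), the sketch below penalizes the off-diagonal entries of the channel correlation matrix at several early stages and adds this penalty to the classification loss; the stage selection, weighting `lam`, and exact decorrelation measure are assumptions for illustration only.

```python
# Hypothetical sketch of a multi-stage feature decorrelation penalty combined
# with the classification loss; the precise MFD Loss is defined in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def decorrelation_penalty(feat: torch.Tensor) -> torch.Tensor:
    """feat: (batch, channels, H, W) feature map from one front stage."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)                 # flatten spatial dimensions
    x = x - x.mean(dim=2, keepdim=True)           # centre each channel
    x = F.normalize(x, dim=2)                     # unit-norm channels
    corr = torch.bmm(x, x.transpose(1, 2))        # (b, c, c) channel correlation
    off_diag = corr - torch.diag_embed(torch.diagonal(corr, dim1=1, dim2=2))
    return off_diag.pow(2).mean()                 # redundancy measure

def total_loss(logits, targets, stage_feats, lam=0.1):
    ce = F.cross_entropy(logits, targets)         # Softmax Loss on final output
    mfd = sum(decorrelation_penalty(f) for f in stage_feats)
    return ce + lam * mfd                         # joint supervision
```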

Related Content

Networking: IFIP International Conferences on Networking. Explanation: international conference on networking. Publisher: IFIP. SIT:

Under a generalised estimating equation analysis approach, approximate design theory is used to determine Bayesian D-optimal designs. For two examples, considering simple exchangeable and exponential decay correlation structures, we compare the efficiency of identified optimal designs to balanced stepped-wedge designs and corresponding stepped-wedge designs determined by optimising using a normal approximation approach. The dependence of the Bayesian D-optimal designs on the assumed correlation structure is explored; for the considered settings, smaller decay in the correlation between outcomes across time periods, along with larger values of the intra-cluster correlation, leads to designs closer to a balanced design being optimal. Unlike for normal data, it is shown that the optimal design need not be centro-symmetric in the binary outcome case. The efficiency of the Bayesian D-optimal design relative to a balanced design can be large, but situations are demonstrated in which the advantages are small. Similarly, the optimal design from a normal approximation approach is often not much less efficient than the Bayesian D-optimal design. Bayesian D-optimal designs can be readily identified for stepped-wedge cluster randomised trials with binary outcome data. In certain circumstances, principally ones with strong time period effects, they will indicate that a design unlikely to have been identified by previous methods may be substantially more efficient. However, they require a larger number of assumptions than existing optimal designs, and in many situations existing theory under a normal approximation will provide an easier means of identifying an efficient design for binary outcome data.
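For intuition, a Bayesian D-optimality criterion of this kind can be sketched as the prior-averaged log-determinant of the design information matrix, with candidate cluster-by-period treatment matrices ranked by that average. In the sketch below, `gee_information` is a placeholder for the working-correlation-specific GEE information matrix, which is model- and paper-specific.

```python
# Hedged sketch of ranking stepped-wedge designs by a Bayesian D-criterion.
import numpy as np

def bayesian_d_criterion(design, prior_draws, gee_information):
    """design: (clusters, periods) 0/1 treatment matrix; prior_draws: parameter draws."""
    logdets = []
    for theta in prior_draws:
        info = gee_information(design, theta)     # GEE information matrix (assumed)
        sign, logdet = np.linalg.slogdet(info)
        logdets.append(logdet if sign > 0 else -np.inf)
    return np.mean(logdets)                       # average over the prior

def best_design(candidates, prior_draws, gee_information):
    scores = [bayesian_d_criterion(d, prior_draws, gee_information) for d in candidates]
    return candidates[int(np.argmax(scores))]
```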

What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
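A common recipe for turning a text-to-image diffusion model into a classifier, sketched below under assumed interfaces (`model.noise_prediction_loss` and `model.num_timesteps` are hypothetical, not a real library API), is to score each candidate label by how well the model denoises the image when conditioned on that label's prompt and to predict the best-scoring class.

```python
# Hedged sketch of zero-shot classification with a text-to-image diffusion model.
import torch

@torch.no_grad()
def generative_classify(model, image, class_names, n_samples=32):
    scores = []
    for name in class_names:
        prompt = f"a photo of a {name}"
        losses = []
        for _ in range(n_samples):
            t = torch.randint(1, model.num_timesteps, (1,))   # random timestep
            noise = torch.randn_like(image)
            # expected denoising error for this (image, prompt) pair
            losses.append(model.noise_prediction_loss(image, prompt, t, noise))
        scores.append(torch.stack(losses).mean())
    return class_names[int(torch.stack(scores).argmin())]     # lowest loss wins
```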

Instrumental variables are widely used in econometrics and epidemiology for identifying and estimating causal effects when an exposure of interest is confounded by unmeasured factors. Despite this popularity, the assumptions invoked to justify the use of instruments differ substantially across the literature. Similarly, statistical approaches for estimating the resulting causal quantities vary considerably, and often rely on strong parametric assumptions. In this work, we compile and organize structural conditions that nonparametrically identify conditional average treatment effects, average treatment effects among the treated, and local average treatment effects, with a focus on identification formulae invoking the conditional Wald estimand. Moreover, we build upon existing work and propose nonparametric efficient estimators of functionals corresponding to marginal and conditional causal contrasts resulting from the various identification paradigms. We illustrate the proposed methods on an observational study examining the effects of operative care on adverse events for cholecystitis patients, and a randomized trial assessing the effects of market participation on political views.
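For concreteness, the standard conditional Wald estimand for a binary instrument $Z$, exposure $A$, outcome $Y$, and covariates $X$ takes the following form (the paper's notation and the causal interpretation attached to it under each set of structural conditions may differ):

$$\psi(x) \;=\; \frac{E[Y \mid Z=1, X=x] - E[Y \mid Z=0, X=x]}{E[A \mid Z=1, X=x] - E[A \mid Z=0, X=x]}.$$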

Although deep neural networks yield high classification accuracy given sufficient training data, their predictions are typically over- or under-confident, i.e., the prediction confidences do not truly reflect the accuracy. Post-hoc calibration tackles this problem by calibrating the prediction confidences without re-training the classification model. However, current approaches assume congruence between test and validation data distributions, limiting their applicability to out-of-distribution scenarios. To address this, we propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration. Our method tailors fine-grained scaling functions to distinct test sets by simulating various domain shifts through data augmentation on the validation set. We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties. A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets. Extensive experimental results on MNIST, CIFAR-10, and TinyImageNet demonstrate the effectiveness of the proposed method.
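The subgroup-wise scaling can be pictured as follows: each sample is assigned to a (predicted class, confidence bin) cell and rescaled by that cell's temperature before the softmax. The sketch below is a minimal illustration of that final step, assuming the temperatures `temps` have already been produced (in the paper, by a regression network trained over meta-sets); it is not the authors' implementation.

```python
# Minimal sketch of per-(class, confidence-bin) temperature scaling.
import torch
import torch.nn.functional as F

def subgroup_temperature_scale(logits, temps, n_bins=3):
    """logits: (N, C); temps: (C, n_bins) tensor of positive temperatures."""
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)                       # confidence and predicted class
    bins = torch.clamp((conf * n_bins).long(), max=n_bins - 1)
    t = temps[pred, bins].unsqueeze(1)                  # per-sample temperature
    return F.softmax(logits / t, dim=1)                 # calibrated probabilities
```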

Composite quantile regression has been used to obtain robust estimators of regression coefficients in linear models with good statistical efficiency. By revealing an intrinsic link between the composite quantile regression loss function and the Wasserstein distance from the residuals to the set of quantiles, we establish a generalization of the composite quantile regression to the multiple-output settings. Theoretical convergence rates of the proposed estimator are derived both under the setting where the additive error possesses only a finite $\ell$-th moment (for $\ell > 2$) and where it exhibits a sub-Weibull tail. In doing so, we develop novel techniques for analyzing the M-estimation problem that involves Wasserstein-distance in the loss. Numerical studies confirm the practical effectiveness of our proposed procedure.
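As background, the classical single-output composite quantile regression estimator minimizes a sum of check losses over a grid of quantile levels; the multiple-output, Wasserstein-based generalization is the paper's contribution and is not reproduced here.

$$\big(\hat b_1,\dots,\hat b_K,\hat\beta\big) \;=\; \arg\min_{b_1,\dots,b_K,\,\beta}\; \sum_{k=1}^{K}\sum_{i=1}^{n} \rho_{\tau_k}\!\big(y_i - b_k - x_i^{\top}\beta\big), \qquad \rho_{\tau}(u) = u\,\big(\tau - \mathbf{1}\{u<0\}\big),$$

with quantile levels $\tau_k = k/(K+1)$, $k=1,\dots,K$.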

Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as finite element methods, which yield solutions over the spatial domain. However, obtaining these solutions is often prohibitively costly, limiting the feasibility of exploring parameters in PDEs. In this paper, we propose an efficient emulator that simultaneously predicts the solutions over the spatial domain, with theoretical justification of its uncertainty quantification. The novelty of the proposed method lies in the incorporation of the mesh node coordinates into the statistical model. In particular, the proposed method segments the mesh nodes into multiple clusters via a Dirichlet process prior and fits Gaussian process models with the same hyperparameters in each of them. Most importantly, by revealing the underlying clustering structures, the proposed method can provide valuable insights into qualitative features of the resulting dynamics that can be used to guide further investigations. Real examples demonstrate that our proposed method has smaller prediction errors than its main competitors, with competitive computation time, and identifies interesting clusters of mesh nodes that possess physical significance, such as satisfying boundary conditions. An R package for the proposed methodology is provided in an open repository.
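A rough analogue of the cluster-then-emulate idea (not the paper's exact model, which shares hyperparameters across clusters and comes with uncertainty-quantification guarantees) is to cluster mesh-node coordinates with a truncated Dirichlet-process mixture and fit a Gaussian-process emulator within each cluster, as sketched below.

```python
# Hedged sketch: Dirichlet-process clustering of mesh nodes + per-cluster GPs.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def cluster_and_emulate(coords, solution, max_clusters=10):
    """coords: (n_nodes, d) mesh coordinates; solution: (n_nodes,) PDE solution values."""
    dpm = BayesianGaussianMixture(
        n_components=max_clusters,                          # truncation level
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    )
    labels = dpm.fit_predict(coords)                        # cluster the mesh nodes
    models = {}
    for k in np.unique(labels):
        idx = labels == k
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
        gp.fit(coords[idx], solution[idx])                  # one GP per cluster
        models[k] = gp
    return labels, models
```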

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at //github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.
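The local-search exploration strategy can be pictured as perturbing samples drawn from a replay buffer in target space, accepting moves by a Metropolis rule on the unnormalized log-density, and writing the improved samples back for off-policy training. The sketch below is an illustration of that idea under these assumptions, not the repository's implementation.

```python
# Hedged sketch of local search over replay-buffer samples.
import torch

def local_search_step(buffer, log_reward, step_size=0.1, n_steps=10):
    """buffer: (B, D) tensor of past samples; log_reward: callable x -> (B,) log-density."""
    x = buffer.clone()
    logr = log_reward(x)
    for _ in range(n_steps):
        proposal = x + step_size * torch.randn_like(x)       # local perturbation
        logr_new = log_reward(proposal)
        accept = torch.log(torch.rand_like(logr)) < (logr_new - logr)  # Metropolis rule
        x = torch.where(accept.unsqueeze(1), proposal, x)
        logr = torch.where(accept, logr_new, logr)
    return x   # improved samples to be added back to the replay buffer
```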

The log-rank test and the Cox proportional hazards model are commonly used to compare time-to-event data in clinical trials, as they are most powerful under proportional hazards. But there is a loss of power if this assumption is violated, which is the case for some new oncology drugs like immunotherapies. We consider a two-stage test procedure, in which the weighting of the log-rank test statistic depends on a pre-test of the proportional hazards assumption. That is, depending on the pre-test, either the log-rank test or an alternative test is used to compare the survival probabilities. We show that, if naively implemented, this can lead to a substantial inflation of the type-I error rate. To address this, we embed the two-stage test in a permutation test framework to maintain the nominal level alpha. We compare the operating characteristics of the two-stage test with the log-rank test and other tests by clinical trial simulations.
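The permutation wrapper can be sketched generically: the entire two-stage procedure (pre-test of proportional hazards, then log-rank or weighted log-rank) is recomputed on each permuted treatment assignment, and the p-value is the proportion of permuted statistics at least as extreme as the observed one. Here `two_stage_statistic` is a placeholder for the paper's pre-test-plus-test procedure.

```python
# Illustrative permutation wrapper for a two-stage test statistic.
import numpy as np

def permutation_pvalue(time, event, group, two_stage_statistic, n_perm=2000, seed=1):
    rng = np.random.default_rng(seed)
    observed = two_stage_statistic(time, event, group)
    count = 0
    for _ in range(n_perm):
        permuted = rng.permutation(group)                # re-randomize treatment labels
        if two_stage_statistic(time, event, permuted) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)                    # permutation p-value
```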

We construct an estimator $\widehat{\Sigma}$ for covariance matrices of unknown, centred random vectors $X$, with the given data consisting of N independent measurements $X_1,...,X_N$ of $X$ and the wanted confidence level. We show that, under minimal assumptions on $X$, the estimator performs with the optimal accuracy with respect to the operator norm. In addition, the estimator is also optimal with respect to direction-dependent accuracy: $\langle \widehat{\Sigma}u,u\rangle$ is an optimal estimator for $\sigma^2(u)=\mathbb{E}\langle X,u\rangle^2$ when $\sigma^2(u)$ is ``large".

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
