Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion. One common way to address this problem is to plot linear slices of the landscape, for example from the initial state of the network to the final state after optimization. On the basis of this analysis, prior work has drawn broader conclusions about the difficulty of the optimization problem. In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. Further, we use linear interpolation to study the role played by individual layers and substructures of the network. We find that certain layers are more sensitive to the choice of initialization, but that the shape of the linear path is not indicative of the changes in test accuracy of the model. Our results cast doubt on the broader intuition that the presence or absence of barriers when interpolating necessarily relates to the success of optimization.
The basic goal of survivable network design is to build cheap networks that guarantee the connectivity of certain pairs of nodes despite the failure of a few edges or nodes. A celebrated result by Jain [Combinatorica'01] provides a 2-approximation for a wide class of these problems. However nothing better is known even for very basic special cases, raising the natural question whether any improved approximation factor is possible at all. In this paper we address one of the most basic problems in this family for which 2 is still the best-known approximation factor, the Forest Augmentation Problem (FAP): given an undirected unweighted graph (that w.l.o.g. is a forest) and a collection of extra edges (links), compute a minimum cardinality subset of links whose addition to the graph makes it 2-edge-connected. Several better-than-2 approximation algorithms are known for the special case where the input graph is a tree, a.k.a. the Tree Augmentation Problem (TAP). Recently this was achieved also for the weighted version of TAP, and for the k-edge-connectivity generalization of TAP. These results heavily exploit the fact that the input graph is connected, a condition that does not hold in FAP. In this paper we breach the 2-approximation barrier for FAP. Our result is based on two main ingredients. First, we describe a reduction to the Path Augmentation Problem (PAP), the special case of FAP where the input graph is a collection of disjoint paths. Our reduction is not approximation preserving, however it is sufficiently accurate to improve on a factor 2 approximation. Second, we present a better-than-2 approximation algorithm for PAP, an open problem on its own. Here we exploit a novel notion of implicit credits which might turn out to be helpful in future related work.
Quantum communications is a promising technology that will play a fundamental role in the design of future networks. In fact, significant efforts are being dedicated by both the quantum physics and the classical communications communities on developing new architectures, solutions, and practical implementations of quantum communication networks (QCNs). Although these efforts led to various advances in today's technologies, there still exists a non-trivial gap between the research efforts of the two communities on designing and optimizing the QCN performance. For instance, most prior works by the classical communications community ignore important quantum physics-based constraints when designing QCNs. For example, many works on entanglement distribution do not account for the decoherence of qubits inside quantum memories and, thus, their designs become impractical since they assume an infinite quantum states' lifetime. In this paper, we introduce a novel framework, dubbed physics-informed QCNs, for designing and analyzing the performance of QCNs, by relying on the quantum physics principles that underly the different QCN components. The need of the proposed approach is then assessed and its fundamental role in designing practical QCNs is analyzed across various open research areas. Moreover, we identify novel physics-informed performance metrics and controls that enable QCNs to leverage the state-of-the-art advancements in quantum technologies to enhance their performance. Finally, we analyze multiple pressing challenges and open research directions in QCNs that must be treated using a physics-informed approach to lead practically viable results. Ultimately, this work attempts to bridge the gap between the classical communications and the quantum physics communities in the area of QCNs to foster the development of future communication networks (6G and beyond, and the quantum Internet).
The number of information systems (IS) studies dealing with explainable artificial intelligence (XAI) is currently exploding as the field demands more transparency about the internal decision logic of machine learning (ML) models. However, most techniques subsumed under XAI provide post-hoc-analytical explanations, which have to be considered with caution as they only use approximations of the underlying ML model. Therefore, our paper investigates a series of intrinsically interpretable ML models and discusses their suitability for the IS community. More specifically, our focus is on advanced extensions of generalized additive models (GAM) in which predictors are modeled independently in a non-linear way to generate shape functions that can capture arbitrary patterns but remain fully interpretable. In our study, we evaluate the prediction qualities of five GAMs as compared to six traditional ML models and assess their visual outputs for model interpretability. On this basis, we investigate their merits and limitations and derive design implications for further improvements.
Conductivity imaging represents one of the most important tasks in medical imaging. In this work we develop a neural network based reconstruction technique for imaging the conductivity from the magnitude of the internal current density. It is achieved by formulating the problem as a relaxed weighted least-gradient problem, and then approximating its minimizer by standard fully connected feedforward neural networks. We derive bounds on two components of the generalization error, i.e., approximation error and statistical error, explicitly in terms of properties of the neural networks (e.g., depth, total number of parameters, and the bound of the network parameters). We illustrate the performance and distinct features of the approach on several numerical experiments. Numerically, it is observed that the approach enjoys remarkable robustness with respect to the presence of data noise.
Momentum methods, including heavy-ball~(HB) and Nesterov's accelerated gradient~(NAG), are widely used in training neural networks for their fast convergence. However, there is a lack of theoretical guarantees for their convergence and acceleration since the optimization landscape of the neural network is non-convex. Nowadays, some works make progress towards understanding the convergence of momentum methods in an over-parameterized regime, where the number of the parameters exceeds that of the training instances. Nonetheless, current results mainly focus on the two-layer neural network, which are far from explaining the remarkable success of the momentum methods in training deep neural networks. Motivated by this, we investigate the convergence of NAG with constant learning rate and momentum parameter in training two architectures of deep linear networks: deep fully-connected linear neural networks and deep linear ResNets. Based on the over-parameterization regime, we first analyze the residual dynamics induced by the training trajectory of NAG for a deep fully-connected linear neural network under the random Gaussian initialization. Our results show that NAG can converge to the global minimum at a $(1 - \mathcal{O}(1/\sqrt{\kappa}))^t$ rate, where $t$ is the iteration number and $\kappa > 1$ is a constant depending on the condition number of the feature matrix. Compared to the $(1 - \mathcal{O}(1/{\kappa}))^t$ rate of GD, NAG achieves an acceleration over GD. To the best of our knowledge, this is the first theoretical guarantee for the convergence of NAG to the global minimum in training deep neural networks. Furthermore, we extend our analysis to deep linear ResNets and derive a similar convergence result.
Most existing works of polar codes focus on the analysis of block error probability. However, in many scenarios, bit error probability is also important for evaluating the performance of channel codes. In this paper, we establish a new framework to analyze the bit error probability of polar codes. Specifically, by revisiting the error event of bit-channel, we first introduce the conditional bit error probability as a metric to evaluate the reliability of bit-channel for both systematic and non-systematic polar codes. Guided by the concept of polar subcode, we then derive an upper bound on the conditional bit error probability of each bit-channel, and accordingly, an upper bound on the bit error probability of polar codes. Based on these, two types of construction metrics aiming at minimizing the bit error probability of polar codes are proposed, which are of linear computational complexity and explicit forms. Simulation results show that the polar codes constructed by the proposed methods can outperform those constructed by the conventional methods.
Factor analysis is often used to assess whether a single univariate latent variable is sufficient to explain most of the covariance among a set of indicators for some underlying construct. When evidence suggests that a single factor is adequate, research often proceeds by using a univariate summary of the indicators in subsequent research. Implicit in such practices is the assumption that it is the underlying latent, rather than the indicators, that is causally efficacious. The assumption that the indicators do not have effects on anything subsequent, and that they are themselves only affected by antecedents through the underlying latent is a strong assumption, effectively imposing a structural interpretation on the latent factor model. In this paper, we show that this structural assumption has empirically testable implications, even though the latent variable itself is unobserved. We develop a statistical test to potentially reject the structural interpretation of a latent factor model. We apply this test to data concerning associations between the Satisfaction-with-Life-Scale and subsequent all-cause mortality, which provides strong evidence against a structural interpretation for a univariate latent underlying the scale. Discussion is given to the implications of this result for the development, evaluation, and use of measures and for the use of factor analysis itself.
It is shown, with two sets of indicators that separately load on two distinct factors, independent of one another conditional on the past, that if it is the case that at least one of the factors causally affects the other, then, in many settings, the process will converge to a factor model in which a single factor will suffice to capture the covariance structure among the indicators. Factor analysis with one wave of data can then not distinguish between factor models with a single factor versus those with two factors that are causally related. Therefore, unless causal relations between factors can be ruled out a priori, alleged empirical evidence from one-wave factor analysis for a single factor still leaves open the possibilities of a single factor or of two factors that causally affect one another. The implications for interpreting the factor structure of psychological scales, such as self-report scales for anxiety and depression, or for happiness and purpose, are discussed. The results are further illustrated through simulations to gain insight into the practical implications of the results in more realistic settings prior to the convergence of the processes. Some further generalizations to an arbitrary number of underlying factors are noted.
As machine learning is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we propose HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework for visual interpretability methods that allows for falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation. To the best of our knowledge, this is the first work of its kind. Using HIVE, we conduct IRB-approved human studies with nearly 1000 participants and evaluate four methods that represent the diversity of computer vision interpretability works: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations engender human trust, even for incorrect predictions, yet are not distinct enough for users to distinguish between correct and incorrect predictions. We open-source HIVE to enable future studies and to encourage more human-centered approaches to interpretability research.
We present a pipelined multiplier with reduced activities and minimized interconnect based on online digit-serial arithmetic. The working precision has been truncated such that $p<n$ bits are used to compute $n$ bits product, resulting in significant savings in area and power. The digit slices follow variable precision according to input, increasing upto $p$ and then decreases according to the error profile. Pipelining has been done to achieve high throughput and low latency which is desirable for compute intensive inner products. Synthesis results of the proposed designs have been presented and compared with the non-pipelined online multiplier, pipelined online multiplier with full working precision and conventional serial-parallel and array multipliers. For $8, 16, 24$ and $32$ bit precision, the proposed low power pipelined design show upto $38\%$ and $44\%$ reduction in power and area respectively compared to the pipelined online multiplier without working precision truncation.