In analyzing large-scale structures, it is necessary to take the fine-scale heterogeneity into account for accurate failure prediction. Resolving fine-scale features in the numerical model drastically increases the number of degrees of freedom, making full fine-scale simulations infeasible, especially when the model needs to be evaluated many times. In this paper, a methodology for fine-scale modeling of large-scale structures is proposed, which combines the variational multiscale method, domain decomposition and model order reduction. To address applications where the assumption of scale separation does not hold, the influence of the fine scale on the coarse scale is modelled directly through an additive split of the displacement field. Possible coarse- and fine-scale solutions are exploited for a representative coarse grid element (RCE) to construct local approximation spaces. The local spaces are designed such that the local contributions of the RCE subdomains can be coupled in a conforming way. As a result, the global system of equations takes the effect of the fine scale on the coarse scale into account, is sparse, and is reduced in size compared to the full-order model. Several numerical experiments show the accuracy and efficiency of the method.
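To make the additive split concrete, a generic variational-multiscale decomposition of the displacement field can be written as below. This is illustrative notation only, not necessarily the paper's exact formulation: the bilinear form $a(\cdot,\cdot)$, load functional $f(\cdot)$ and spaces $V^{c}, V^{f}$ are assumed.

```latex
% Generic additive two-scale split of the displacement field and the
% resulting coarse/fine weak problems (illustrative VMS notation)
u = u^{c} + u^{f}, \qquad
a(u^{c} + u^{f},\, v^{c}) = f(v^{c}) \quad \forall\, v^{c} \in V^{c}, \qquad
a(u^{c} + u^{f},\, v^{f}) = f(v^{f}) \quad \forall\, v^{f} \in V^{f}.
```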
This paper is concerned with a blood flow problem coupled with slow plaque growth at the artery wall. In the model, the micro (fast) system is the Navier-Stokes equation with a periodically applied force, and the macro (slow) system is a fractional reaction equation, which is used to describe the plaque growth with memory effect. We construct an auxiliary temporal periodic problem and an effective time-average equation to approximate the original problem and analyze the approximation error of the corresponding linearized PDE (Stokes) system, where a simple front-tracking technique is used to update the slowly moving boundary. An effective multiscale method is then designed based on the approximate problem and the front-tracking framework. We also present a temporal finite difference scheme with a spatial continuous finite element method and analyze its temporal discretization error. Furthermore, a fast iterative procedure is designed to find the initial value of the temporal periodic problem, and its convergence is analyzed as well. Our front-tracking framework and the iterative procedure for solving the temporal periodic problem make it easy to implement the multiscale method in existing PDE software. The numerical method is implemented by combining the finite element platform COMSOL Multiphysics with the mainstream software MATLAB, which significantly reduces the programming effort and easily handles the fluid-structure interaction, especially moving boundaries with more complex geometries. We present numerical examples of ODEs and a 2-D Navier-Stokes system to demonstrate the effectiveness of the multiscale method. Finally, we present a numerical experiment on the plaque growth problem and discuss the physical implications of the fractional-order parameter.
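As a rough illustration of the fast-slow structure described above (generic notation, not the paper's exact model), the fast flow field $u$ is driven by a $T$-periodic force while the slow variable $\varphi$ evolves through a Caputo fractional derivative that encodes the memory effect; the reaction term $g$ and the small parameter $\epsilon$ separating the time scales are assumptions for illustration.

```latex
% Fast Navier-Stokes flow with a periodically applied force (period T)
\partial_t u + (u\cdot\nabla)u - \nu\Delta u + \nabla p = f(x,t), \qquad f(x,t+T) = f(x,t), \qquad \nabla\cdot u = 0,
% Slow growth with memory, modelled by a Caputo fractional derivative of order alpha
{}^{C}\!D_t^{\alpha}\,\varphi(t) = \epsilon\, g\big(\varphi(t), u(\cdot,t)\big), \qquad 0 < \alpha < 1, \qquad 0 < \epsilon \ll 1.
```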
Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved by involving efficient parameterized Kronecker products. In this paper, we define the parameterization of hypercomplex convolutional layers and introduce the family of parameterized hypercomplex neural networks (PHNNs), which are lightweight and efficient large-scale models. Our method grasps the convolution rules and the filter organization directly from data, without requiring a rigidly predefined domain structure to follow. PHNNs can flexibly operate in any user-defined or tuned domain, from 1D to $n$D, regardless of whether the algebra rules are preset. Such malleability allows processing multidimensional inputs in their natural domain without annexing further dimensions, as is done, instead, in quaternion neural networks for 3D inputs such as color images. As a result, the proposed family of PHNNs operates with $1/n$ of the free parameters of its analog in the real domain. We demonstrate the versatility of this approach across multiple application domains by performing experiments on various image and audio datasets, in which our method outperforms real- and quaternion-valued counterparts. Full code is available at: //github.com/eleGAN23/HyperNets.
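The parameter saving can be sketched as a weight built from a sum of learned Kronecker products. The layer below is a minimal illustration under that assumption; the class name, shapes and initialization are ours, not the repository's API.

```python
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    """Sketch of a parameterized hypercomplex linear layer (assumed form):
    the weight is assembled as W = sum_i A_i (x) S_i, where the small n x n
    "algebra" matrices A_i are learned from data instead of being fixed by a
    predefined hypercomplex rule (e.g. quaternions). Illustrative only."""
    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # Learnable contribution matrices replacing a fixed algebra.
        self.A = nn.Parameter(torch.randn(n, n, n))
        # Per-component weight blocks; parameter count is roughly (in * out) / n.
        self.S = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Assemble the full (out_features, in_features) weight from n Kronecker products.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return x @ W.t() + self.bias

layer = PHMLinear(n=4, in_features=64, out_features=128)
print(layer(torch.randn(8, 64)).shape)   # torch.Size([8, 128])
```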
Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ exists behind local devices for communication. In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate the parameters of all devices under the $M$-estimation framework with fused Lasso regularization, which encourages equal parameter estimates on devices connected in $G$. We provide a general result for our estimator that depends on $G$ and can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along the edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during the training process, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set.
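To fix ideas, a minimal sketch of the penalized joint objective is given below. The notation, helper names and the squared-error example loss are ours and are not taken from the paper; the sketch only shows the generic shape of an $M$-estimation criterion with a fused-Lasso penalty over the edges of $G$.

```python
import numpy as np

def fused_lasso_objective(theta, data, edges, lam, loss):
    """Illustrative joint M-estimation objective (not the paper's code):
    per-device losses plus a fused-lasso penalty that encourages devices
    connected in the graph G to share the same p-dimensional parameter.

    theta : (m, p) array, one parameter vector per device
    data  : list of m per-device datasets
    edges : list of (i, j) index pairs, the edges of G
    lam   : penalty strength
    loss  : callable loss(theta_i, data_i) -> scalar
    """
    fit = sum(loss(theta[i], data[i]) for i in range(theta.shape[0]))
    fuse = sum(np.linalg.norm(theta[i] - theta[j]) for (i, j) in edges)
    return fit + lam * fuse

# Tiny synthetic example: five devices, two underlying distributions, a chain graph.
rng = np.random.default_rng(0)
m, p = 5, 3
data = [rng.normal(size=(20, p)) + k for k in (0, 0, 0, 1, 1)]
sq_loss = lambda th, X: 0.5 * np.sum((X - th) ** 2) / len(X)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(fused_lasso_objective(np.zeros((m, p)), data, edges, lam=0.5, loss=sq_loss))
```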
Heterogeneous big data poses many challenges in machine learning. Its enormous scale, high dimensionality, and inherent uncertainty make almost every aspect of machine learning difficult, from providing enough processing power to maintaining model accuracy to protecting privacy. However, perhaps the most imposing problem is that big data is often interspersed with sensitive personal data. Hence, we propose a privacy-preserving hierarchical fuzzy neural network (PP-HFNN) to address these technical challenges while also alleviating privacy concerns. The network is trained with a two-stage optimization algorithm: the parameters at low levels of the hierarchy are learned with a scheme based on the well-known alternating direction method of multipliers (ADMM), which does not reveal local data to other agents, while coordination at high levels of the hierarchy is handled by the alternating optimization method, which converges very quickly. The entire training procedure is scalable and fast, and it does not suffer from the vanishing-gradient problems that affect back-propagation-based methods. Comprehensive simulations conducted on both regression and classification tasks demonstrate the effectiveness of the proposed model.
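To convey the flavour of the ADMM-based low-level training, here is a generic consensus-ADMM sketch for a simple ridge-regression surrogate. It is not the PP-HFNN algorithm itself, and all names are ours; it only illustrates the key privacy property mentioned above, namely that agents exchange parameter vectors but never raw data.

```python
import numpy as np

def consensus_admm(local_XY, lam=0.1, rho=1.0, iters=200):
    """Generic consensus ADMM for distributed ridge regression (illustrative)."""
    p = local_XY[0][0].shape[1]
    K = len(local_XY)
    w = [np.zeros(p) for _ in range(K)]   # local parameters, stay on each agent
    u = [np.zeros(p) for _ in range(K)]   # scaled dual variables
    z = np.zeros(p)                        # shared consensus parameter
    for _ in range(iters):
        # Local (private) updates: each agent solves a small problem on its own data.
        for k, (X, y) in enumerate(local_XY):
            w[k] = np.linalg.solve(X.T @ X + rho * np.eye(p), X.T @ y + rho * (z - u[k]))
        # Coordination: only current parameter estimates are aggregated, never raw data.
        z = rho * np.sum([w[k] + u[k] for k in range(K)], axis=0) / (lam + K * rho)
        for k in range(K):
            u[k] += w[k] - z
    return z

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])

def make_agent():
    X = rng.normal(size=(50, 3))
    return X, X @ true_w + 0.1 * rng.normal(size=50)

agents = [make_agent() for _ in range(4)]
print(consensus_admm(agents))   # close to [1.0, -2.0, 0.5]
```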
In this paper, we present a multiscale framework for solving the 2D Helmholtz equation in heterogeneous media without scale separation and in the high-frequency regime where the wavenumber $k$ can be large. The main innovation is that our methods achieve a nearly exponential rate of convergence with respect to the computational degrees of freedom, using a coarse grid of mesh size $O(1/k)$, without suffering from the well-known pollution effect. The key idea is a non-overlapping domain decomposition and its associated coarse-fine scale decomposition of the solution space that adapts to the media property and the wavenumber; this decomposition is inspired by the multiscale finite element method (MsFEM). We show that the coarse part is of low complexity, in the sense that it can be approximated with a nearly exponential rate of convergence via local basis functions, owing to the compactness of a restriction operator that maps Helmholtz-harmonic functions to their interpolation residues on edges, while the fine part is local, so that it can be computed efficiently using the local information of the right-hand side. The combination of the two parts yields the overall nearly exponential rate of convergence of our multiscale method. Our method draws many connections to multiscale methods in the literature, on which we comment in detail. We demonstrate the effectiveness of our methods theoretically and numerically; an exponential rate of convergence is consistently observed and confirmed. In addition, we numerically observe the robustness of our methods with respect to high contrast in the media.
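For reference, a generic heterogeneous Helmholtz model problem of the kind targeted here can be written as follows; the coefficients $A$, $V$, $\beta$ and the impedance boundary condition are illustrative notation rather than the paper's exact setup.

```latex
% Heterogeneous Helmholtz equation with wavenumber k (illustrative model problem)
-\nabla\cdot\big(A(x)\nabla u\big) - k^{2}\, V(x)\, u = f \quad \text{in } \Omega, \qquad
A(x)\nabla u\cdot n - \mathrm{i}\,k\,\beta(x)\, u = g \quad \text{on } \partial\Omega.
```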
While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information. Prior work has shown that decomposing the levels of granularity (e.g., word, phrase, or sentence) of input tokens produces substantial improvements, suggesting the possibility of enhancing Transformers via more fine-grained modeling of granularity. In this work, we propose a continuous decomposition of granularity for neural paraphrase generation (C-DNPG). In order to efficiently incorporate granularity into sentence encoding, C-DNPG introduces a granularity-aware attention (GA-Attention) mechanism which extends multi-head self-attention with: 1) a granularity head that automatically infers the hierarchical structure of a sentence by neurally estimating the granularity level of each input token; and 2) two novel attention masks, namely granularity resonance and granularity scope, to efficiently encode granularity into attention. Experiments on two benchmarks, Quora question pairs and Twitter URLs, show that C-DNPG outperforms baseline models by a considerable margin and achieves state-of-the-art results on many metrics. Qualitative analysis reveals that C-DNPG effectively captures fine-grained levels of granularity.
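As a generic illustration of granularity-gated self-attention, and explicitly not the paper's GA-Attention, the sketch below shows the overall shape of the idea: a small head predicts a per-token granularity level, and attention logits are modulated by how close two tokens' levels are. The modulation rule, class name and dimensions are hypothetical stand-ins for the resonance/scope masks.

```python
import torch
import torch.nn as nn

class GranularityGatedAttention(nn.Module):
    """Generic granularity-gated self-attention (illustrative, not the paper's exact
    GA-Attention): a linear head produces a granularity level in (0, 1) per token,
    and pairs of tokens at similar levels are favoured in the attention weights."""
    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.granularity = nn.Linear(d_model, 1)   # per-token granularity head
        self.scale = d_model ** -0.5

    def forward(self, x):                           # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        g = torch.sigmoid(self.granularity(x))      # (batch, seq, 1)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        # Hypothetical mask: closeness of granularity levels gates the attention.
        closeness = 1.0 - (g - g.transpose(-2, -1)).abs()
        attn = torch.softmax(logits + torch.log(closeness + 1e-6), dim=-1)
        return attn @ v

layer = GranularityGatedAttention(d_model=16)
print(layer(torch.randn(2, 5, 16)).shape)   # torch.Size([2, 5, 16])
```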
The increasing size of recently proposed neural networks makes it hard to implement them on embedded devices, where memory, battery and computational power are non-trivial bottlenecks. For this reason, the network compression literature has been thriving in recent years, and a large number of solutions has been published to reduce both the number of operations and the number of parameters involved in the models. Unfortunately, most of these techniques are heuristic methods and usually require at least one re-training step to recover the accuracy. The need for model reduction procedures is also well known in the fields of Verification and Performance Evaluation, where large efforts have been devoted to the definition of quotients that preserve the observable underlying behaviour. In this paper we try to bridge the gap between the most popular and effective network reduction strategies and formal notions, such as lumpability, introduced for the verification and evaluation of Markov chains. Elaborating on lumpability, we propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while completely preserving the exact behaviour. By relaxing the constraints on the exact definition of the quotienting method, we can give a formal explanation of some of the most common reduction techniques.
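The simplest case of data-free, behaviour-preserving neuron reduction can be sketched as below. This is our own illustration of the lumping idea, not the paper's exact algorithm: hidden neurons of a ReLU layer that have identical incoming weights and bias always fire identically, so they can be merged into one neuron whose outgoing weights are the sum of the originals, leaving the network's output unchanged.

```python
import numpy as np

def merge_duplicate_neurons(W1, b1, W2):
    """Lump duplicate hidden neurons of y = W2 @ relu(W1 @ x + b1) exactly."""
    keep, groups = [], {}
    for j in range(W1.shape[0]):
        key = (tuple(np.round(W1[j], 12)), round(float(b1[j]), 12))
        if key in groups:
            groups[key].append(j)      # same incoming weights: same activation
        else:
            groups[key] = [j]
            keep.append(j)
    W1_new, b1_new = W1[keep], b1[keep]
    # Outgoing weights of merged neurons are summed, preserving the output exactly.
    W2_new = np.stack([W2[:, idx].sum(axis=1) for idx in groups.values()], axis=1)
    return W1_new, b1_new, W2_new

# Tiny check: the first two hidden neurons are duplicates; output is unchanged.
W1 = np.array([[1.0, 2.0], [1.0, 2.0], [0.5, -1.0]])
b1 = np.array([0.1, 0.1, 0.0])
W2 = np.array([[1.0, 3.0, -2.0]])
x = np.array([0.3, -0.7])
relu = lambda v: np.maximum(v, 0.0)
W1n, b1n, W2n = merge_duplicate_neurons(W1, b1, W2)
print(W2 @ relu(W1 @ x + b1), W2n @ relu(W1n @ x + b1n))  # identical outputs
```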
Background: Instrumental variables (IVs) can be used to provide evidence as to whether a treatment X has a causal effect on an outcome Y. Even if the instrument Z satisfies the three core IV assumptions of relevance, independence and the exclusion restriction, further assumptions are required to identify the average causal effect (ACE) of X on Y. Sufficient assumptions for this include: homogeneity in the causal effect of X on Y; homogeneity in the association of Z with X; and no effect modification (NEM). Methods: We describe the NO Simultaneous Heterogeneity (NOSH) assumption, which requires the heterogeneity in the X-Y causal effect to be mean independent of (i.e., uncorrelated with) both Z and the heterogeneity in the Z-X association. This holds, for example, if there are no common modifiers of the X-Y effect and the Z-X association, and the X-Y effect is additive linear. We illustrate NOSH using simulations and by re-examining selected published studies. Results: When NOSH holds, the Wald estimand equals the ACE even if both homogeneity assumptions and NEM (which we demonstrate to be special cases of - and therefore stronger than - NOSH) are violated. Conclusions: NOSH is sufficient for identifying the ACE using IVs. Since NOSH is weaker than existing assumptions for ACE identification, identifying the ACE with IVs may be more plausible than previously anticipated.
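For reference, the Wald estimand mentioned in the Results is the usual instrument-ratio quantity for a binary instrument Z (standard definition, not specific to this paper):

```latex
% Wald (ratio) estimand for a binary instrument Z
\beta_{\mathrm{Wald}}
  = \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}
         {\mathbb{E}[X \mid Z=1] - \mathbb{E}[X \mid Z=0]}.
```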
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete, or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted, but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
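The flavour of the inner computation can be sketched as follows; this is an illustration under our own assumptions, not the paper's method. Edges are modelled as independent Bernoulli variables with learnable probabilities, a discrete graph is sampled, and a standard GCN propagation is applied to it.

```python
import torch

def sample_graph_and_convolve(X, edge_logits, W):
    """Sample a graph from learnable Bernoulli edge probabilities and apply
    one GCN propagation X' = D^{-1/2} (A + I) D^{-1/2} X W (illustrative)."""
    probs = torch.sigmoid(edge_logits)              # learnable edge probabilities
    A = torch.bernoulli(probs)                      # sample a discrete graph
    A = torch.triu(A, 1)
    A = A + A.T                                     # symmetric, no self-loops yet
    A_hat = A + torch.eye(A.shape[0])               # add self-loops
    deg = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

n, d, h = 6, 4, 3
X = torch.randn(n, d)
edge_logits = torch.zeros(n, n, requires_grad=True)  # outer-level (graph) variables
W = torch.randn(d, h, requires_grad=True)            # inner-level GCN weights
print(sample_graph_and_convolve(X, edge_logits, W).shape)   # torch.Size([6, 3])
```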
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
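The quoted formula for the effective number of samples translates directly into per-class weights; the short sketch below computes them. The normalization making the weights sum to the number of classes is a common convention and an assumption here, not spelled out in the abstract.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples
    E_n = (1 - beta**n) / (1 - beta), as quoted above."""
    samples_per_class = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num                  # re-weight inversely to E_n
    return weights * len(samples_per_class) / weights.sum()

# Long-tailed example: rare classes receive much larger loss weights.
print(class_balanced_weights([5000, 500, 50, 5], beta=0.999))
```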