Simulation speed matters for neuroscientific research: this includes not only how quickly the simulated model time of a large-scale spiking neuronal network progresses, but also how long it takes to instantiate the network model in computer memory. On the hardware side, acceleration via highly parallel GPUs is being increasingly utilized. On the software side, code generation approaches ensure highly optimized code, at the expense of repeated code regeneration and recompilation after modifications to the network model. Aiming for a greater flexibility with respect to iterative model changes, here we propose a new method for creating network connections interactively, dynamically, and directly in GPU memory through a set of commonly used high-level connection rules. We validate the simulation performance with both consumer and data center GPUs on two neuroscientifically relevant models: a cortical microcircuit of about 77,000 leaky-integrate-and-fire neuron models and 300 million static synapses, and a two-population network recurrently connected using a variety of connection rules. With our proposed ad hoc network instantiation, both network construction and simulation times are comparable or shorter than those obtained with other state-of-the-art simulation technologies, while still meeting the flexibility demands of explorative network modeling.
In a mobile wireless channel, the small-scale multipath fading induces temporal channel fluctuations in the form of peaks and deep fades. The channel capacity degradation with fading severity in the high signal-to-noise ratio (SNR) regime is well known in the wireless communication literature: the probability of deep fades increases significantly with higher fading severity resulting in poor performance. In this paper, we focus on double-fading pinhole channels under perfect CSIT to show a very counter-intuitive result that - higher fading severity enables higher ergodic capacity at sufficiently low SNR. The underlying reason is that at low SNRs, ergodic capacity depends crucially on the probability distribution of channel peaks (simply tail distribution); for the pinhole channel, the tail distribution improves with increased fading severity. This allows a transmitter operating at low SNR to exploit channel peaks more efficiently resulting in a net improvement in achievable spectral efficiency. We derive a new key result quantifying the above dependence for the double-Nakagami-$m$ fading pinhole channel - that is, the ergodic capacity ${C} \propto (m_T m_R)^{-1}$ at low SNR, where $m_T m_R$ is the product of fading (severity) parameters of the two independent Nakagami-$m$ fadings involved.
We present a novel form of Fourier analysis, and associated signal processing concepts, for signals (or data) indexed by edge-weighted directed acyclic graphs (DAGs). This means that our Fourier basis yields an eigendecomposition of a suitable notion of shift and convolution operators that we define. DAGs are the common model to capture causal relationships between data values and in this case our proposed Fourier analysis relates data with its causes under a linearity assumption that we define. The definition of the Fourier transform requires the transitive closure of the weighted DAG for which several forms are possible depending on the interpretation of the edge weights. Examples include level of influence, distance, or pollution distribution. Our framework is different from prior GSP: it is specific to DAGs and leverages, and extends, the classical theory of Moebius inversion from combinatorics. For a prototypical application we consider DAGs modeling dynamic networks in which edges change over time. Specifically, we model the spread of an infection on such a DAG obtained from real-world contact tracing data and learn the infection signal from samples assuming sparsity in the Fourier domain.
Recently, a variety of Neural radiance fields methods have garnered remarkable success in high render speed. However, current accelerating methods is specialized and not compatible for various implicit method, which prevent a real-time composition over different kinds of NeRF works. Since NeRF relies on sampling along rays, it's possible to provide a guidance generally. We propose a general implicit pipeline to rapidly compose NeRF objects. This new method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure.Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. Additionally, it also serves as a previewing plugin for a range of existing NeRF works.
Among the most relevant processes in the Earth system for human habitability are quasi-periodic, ocean-driven multi-year events whose dynamics are currently incompletely characterized by physical models, and hence poorly predictable. This work aims at showing how 1) data-driven, stochastic machine learning approaches provide an affordable yet flexible means to forecast these processes; 2) the associated uncertainty can be properly calibrated with fast ensemble-based approaches. While the methodology introduced and discussed in this work pertains to synoptic scale events, the principle of augmenting incomplete or highly sensitive physical systems with data-driven models to improve predictability is far more general and can be extended to environmental problems of any scale in time or space.
Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g. self- and mutual occlusion and similar textures. Previous works only leverage information from a single RGB image without modeling their physically plausible relation, which leads to inferior reconstruction results. In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction. On one hand, we leverage temporal context to complement insufficient information provided by the single frame, and design a novel temporal framework with a temporal constraint for interacting hand motion smoothness. On the other hand, we further propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments are performed to validate the effectiveness of our proposed framework, which achieves new state-of-the-art performance on public benchmarks.
State-space models (SSMs) are commonly used to model time series data where the observations depend on an unobserved latent process. However, inference on the model parameters of an SSM can be challenging, especially when the likelihood of the data given the parameters is not available in closed-form. One approach is to jointly sample the latent states and model parameters via Markov chain Monte Carlo (MCMC) and/or sequential Monte Carlo approximation. These methods can be inefficient, mixing poorly when there are many highly correlated latent states or parameters, or when there is a high rate of sample impoverishment in the sequential Monte Carlo approximations. We propose a novel block proposal distribution for Metropolis-within-Gibbs sampling on the joint latent state and parameter space. The proposal distribution is informed by a deterministic hidden Markov model (HMM), defined such that the usual theoretical guarantees of MCMC algorithms apply. We discuss how the HMMs are constructed, the generality of the approach arising from the tuning parameters, and how these tuning parameters can be chosen efficiently in practice. We demonstrate that the proposed algorithm using HMM approximations provides an efficient alternative method for fitting state-space models, even for those that exhibit near-chaotic behavior.
The ensemble data assimilation of computational fluid dynamics simulations based on the lattice Boltzmann method (LBM) and the local ensemble transform Kalman filter (LETKF) is implemented and optimized on a GPU supercomputer based on NVIDIA A100 GPUs. To connect the LBM and LETKF parts, data transpose communication is optimized by overlapping computation, file I/O, and communication based on data dependency in each LETKF kernel. In two dimensional forced isotropic turbulence simulations with the ensemble size of $M=64$ and the number of grid points of $N_x=128^2$, the optimized implementation achieved $\times3.80$ speedup from the naive implementation, in which the LETKF part is not parallelized. The main computing kernel of the local problem is the eigenvalue decomposition (EVD) of $M\times M$ real symmetric dense matrices, which is computed by a newly developed batched EVD in $\verb|EigenG|$. The batched EVD in $\verb|EigenG|$ outperforms that in $\verb|cuSOLVER|$, and $\times65.3$ speedup was achieved.
Efficient differential equation solvers have significantly reduced the sampling time of diffusion models (DMs) while retaining high sampling quality. Among these solvers, exponential integrators (EI) have gained prominence by demonstrating state-of-the-art performance. However, existing high-order EI-based sampling algorithms rely on degenerate EI solvers, resulting in inferior error bounds and reduced accuracy in contrast to the theoretically anticipated results under optimal settings. This situation makes the sampling quality extremely vulnerable to seemingly innocuous design choices such as timestep schedules. For example, an inefficient timestep scheduler might necessitate twice the number of steps to achieve a quality comparable to that obtained through carefully optimized timesteps. To address this issue, we reevaluate the design of high-order differential solvers for DMs. Through a thorough order analysis, we reveal that the degeneration of existing high-order EI solvers can be attributed to the absence of essential order conditions. By reformulating the differential equations in DMs and capitalizing on the theory of exponential integrators, we propose refined EI solvers that fulfill all the order conditions, which we designate as Refined Exponential Solver (RES). Utilizing these improved solvers, RES exhibits more favorable error bounds theoretically and achieves superior sampling efficiency and stability in practical applications. For instance, a simple switch from the single-step DPM-Solver++ to our order-satisfied RES solver when Number of Function Evaluations (NFE) $=9$, results in a reduction of numerical defects by $25.2\%$ and FID improvement of $25.4\%$ (16.77 vs 12.51) on a pre-trained ImageNet diffusion model.
This is part II of a two-part paper. Part I presented a universal Birkhoff theory for fast and accurate trajectory optimization. The theory rested on two main hypotheses. In this paper, it is shown that if the computational grid is selected from any one of the Legendre and Chebyshev family of node points, be it Lobatto, Radau or Gauss, then, the resulting collection of trajectory optimization methods satisfy the hypotheses required for the universal Birkhoff theory to hold. All of these grid points can be generated at an $\mathcal{O}(1)$ computational speed. Furthermore, all Birkhoff-generated solutions can be tested for optimality by a joint application of Pontryagin's- and Covector-Mapping Principles, where the latter was developed in Part~I. More importantly, the optimality checks can be performed without resorting to an indirect method or even explicitly producing the full differential-algebraic boundary value problem that results from an application of Pontryagin's Principle. Numerical problems are solved to illustrate all these ideas. The examples are chosen to particularly highlight three practically useful features of Birkhoff methods: (1) bang-bang optimal controls can be produced without suffering any Gibbs phenomenon, (2) discontinuous and even Dirac delta covector trajectories can be well approximated, and (3) extremal solutions over dense grids can be computed in a stable and efficient manner.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.