In this article, we study the problem of testing the independence of two random elements $X$ and $Y$ lying in an infinite-dimensional space ${\cal{H}}$ (specifically, a real separable Hilbert space equipped with the inner product $\langle ., .\rangle_{\cal{H}}$). In the course of this study, a measure of association is proposed based on the sup-norm difference between the joint probability density function of the bivariate random vector $(\langle l_{1}, X \rangle_{\cal{H}}, \langle l_{2}, Y \rangle_{\cal{H}})$ and the product of the marginal probability density functions of the random variables $\langle l_{1}, X \rangle_{\cal{H}}$ and $\langle l_{2}, Y \rangle_{\cal{H}}$, where $l_{1}\in{\cal{H}}$ and $l_{2}\in{\cal{H}}$ are two arbitrary elements. It is established that the proposed measure of association equals zero if and only if the random elements are independent. To test whether $X$ and $Y$ are independent, the sample version of the proposed measure of association, after appropriate normalization, is taken as the test statistic, and its asymptotic distributions under the null hypothesis and under local alternatives are derived. The performance of the new test is investigated on simulated data sets, and its practicability is demonstrated on three real data sets from climatology, biology, and chemistry.
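A concrete plug-in sketch of the proposed measure for one fixed pair of directions $(l_1, l_2)$ can be written as follows, assuming the Hilbert-space elements are represented by finite-dimensional coefficient vectors and using kernel density estimates in place of the unknown densities (a minimal illustration, not the paper's estimator; the function name and grid choice are ours):

```python
import numpy as np
from scipy.stats import gaussian_kde

def projected_dependence(X, Y, l1, l2, grid_size=50):
    """Sup-norm gap between the joint KDE of (<l1,X>, <l2,Y>) and the
    product of the marginal KDEs, for one direction pair (l1, l2).
    X, Y: (n_samples, dim) coefficient representations of the elements."""
    u = X @ l1  # projections <l1, X_i>
    v = Y @ l2  # projections <l2, Y_i>
    joint = gaussian_kde(np.vstack([u, v]))
    fu, fv = gaussian_kde(u), gaussian_kde(v)
    # Evaluate |f_{UV} - f_U f_V| on a grid and take the maximum
    gu = np.linspace(u.min(), u.max(), grid_size)
    gv = np.linspace(v.min(), v.max(), grid_size)
    U, V = np.meshgrid(gu, gv)
    pts = np.vstack([U.ravel(), V.ravel()])
    gap = np.abs(joint(pts) - fu(U.ravel()) * fv(V.ravel()))
    return gap.max()
```

The population measure takes a supremum over all $(l_1, l_2)$ as well; in practice this would be approximated over a finite set of directions.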
Very recently, the first mathematical runtime analyses of the multi-objective evolutionary optimizer NSGA-II have been conducted. We continue this line of research with a first runtime analysis of this algorithm on a benchmark problem consisting of two multimodal objectives. We prove that if the population size $N$ is at least four times the size of the Pareto front, then the NSGA-II with four different ways to select parents and bit-wise mutation optimizes the OneJumpZeroJump benchmark with jump size~$2 \le k \le n/4$ in time $O(N n^k)$. When using fast mutation, a recently proposed heavy-tailed mutation operator, this guarantee improves by a factor of $k^{\Omega(k)}$. Overall, this work shows that the NSGA-II copes with the local optima of the OneJumpZeroJump problem at least as well as the global SEMO algorithm.
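For concreteness, the bi-objective OneJumpZeroJump benchmark with jump size $k$ can be sketched as below; this follows the commonly used formulation (Jump$_k$ applied to OneMax and to ZeroMax), though the exact constants may differ from the paper's definition:

```python
def one_jump_zero_jump(x, k):
    """OneJumpZeroJump_k on a bit string x (list of 0/1), jump size k.
    Returns the pair of objective values (both to be maximized)."""
    n = len(x)
    ones = sum(x)
    zeros = n - ones
    # First objective: Jump_k on OneMax -- reward ones, with a valley
    # of size k just before the all-ones optimum.
    f1 = k + ones if ones <= n - k or ones == n else n - ones
    # Second objective: symmetric Jump_k on ZeroMax.
    f2 = k + zeros if zeros <= n - k or zeros == n else n - zeros
    return f1, f2
```

Solutions with between $k$ and $n-k$ ones form the inner Pareto front; the all-ones and all-zeros strings are the extreme Pareto optima reachable only by jumping across the valleys.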
In this paper, we propose an information geometry approach (IGA) for signal detection (SD) in ultra-massive multiple-input multiple-output (MIMO) systems. We formulate signal detection as computing the marginals of the a posteriori probability distribution of the transmitted symbol vector, from which a maximization of the a posteriori marginals (MPM) can be performed. Using information geometry, we compute approximations of the a posteriori marginals, formulated as an iterative m-projection process between submanifolds with different constraints. Since the direct calculation of the m-projection has exponential complexity, we apply the central limit theorem (CLT) to simplify it, obtaining an approximate solution of the m-projection that is asymptotically accurate. Simulation results demonstrate that the proposed IGA-SD is a promising and efficient method for implementing signal detection in ultra-massive MIMO systems.
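For reference, the m-projection invoked above is the standard notion from information geometry: the projection of a distribution $p$ onto a submanifold $\mathcal{M}$ that minimizes the Kullback--Leibler divergence from $p$,
\[
q^{*} = \mathop{\arg\min}_{q \in \mathcal{M}} D(p \,\|\, q), \qquad D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},
\]
and it is this minimization, exponentially costly when carried out directly over the symbol alphabet, that the CLT-based approximation replaces with an asymptotically accurate closed form.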
In this two-part paper, we investigate channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. In Part I, we revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By exploiting the constant-magnitude property of the entries of the measurement matrix in massive MIMO-OFDM channel estimation, together with an asymptotic analysis, we find that the second-order natural parameters of the distributions on all the auxiliary manifolds are equivalent to each other at each iteration of IGA, and that the first-order natural parameters of these distributions are asymptotically equivalent to each other at the fixed point of IGA. Motivated by these results, we simplify the iterative process of IGA and propose a simplified IGA for massive MIMO-OFDM channel estimation. It is proved that, at the fixed point, the a posteriori mean obtained by the simplified IGA is asymptotically optimal. The simplified IGA allows efficient implementation with the fast Fourier transform (FFT). Simulations confirm that the simplified IGA achieves near-optimal performance with low complexity within a limited number of iterations.
Training students in basic concepts of physics, such as those related to mass, volume, or density, is much more complicated than simply stating the underlying definitions and laws. One reason is that many students hold deeply rooted misconceptions about the behavior of objects, sometimes bordering on magical thinking. Many innovative and promising technologies, in particular Virtual Reality (VR), can be used to enhance student learning. We compared the effectiveness of a serious immersive game for teaching the concept of density under two conditions: a 2D version in an embedded web browser and a 3D immersive game in VR. We also developed a specific questionnaire to assess students' knowledge improvement. Preliminary results show an increase in learning efficiency with VR. Moreover, most students were able to recognize the shortcomings of their initial theories and revise them, indicating an improved understanding of the topic.
We propose a method for estimating a log-concave density on $\mathbb R^d$ from samples, under the assumption that there exists an orthogonal transformation that makes the components of the random vector independent. While log-concave density estimation is hard both computationally and statistically, the independent components assumption alleviates both issues, while still maintaining a large non-parametric class. We prove that under mild conditions, at most $\tilde{\mathcal{O}}(\epsilon^{-4})$ samples (suppressing constants and log factors) suffice for our proposed estimator to be within $\epsilon$ of the original density in squared Hellinger distance. On the computational front, while the usual log-concave maximum likelihood estimate can be obtained via a finite-dimensional convex program, it is slow to compute -- especially in higher dimensions. We demonstrate through numerical experiments that our estimator can be computed efficiently, making it more practical to use.
We establish the unique ergodicity of the Markov chain generated by the stochastic theta method (STM) with $\theta \in [1/2, 1]$ for monotone SODEs, without growth restriction on the coefficients, driven by nondegenerate multiplicative noise. The main ingredient of the argument is the construction of new Lyapunov functions, involving the coefficients, the stepsize, and $\theta$, together with the irreducibility and the strong Feller property of the STM. We also generalize the arguments to the STM and its Galerkin-based full discretizations for a class of monotone SPDEs driven by infinite-dimensional nondegenerate multiplicative trace-class noise. Applying these results to the stochastic Allen--Cahn equation shows that its drift-implicit Euler scheme is uniquely ergodic for any interface thickness, which gives an affirmative answer to a question posed in (J. Cui, J. Hong, and L. Sun, Stochastic Process. Appl. (2021): 55--93). Numerical experiments verify our theoretical results.
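A minimal sketch of one step of the stochastic theta method for a scalar SODE $dX = f(X)\,dt + \sigma(X)\,dW$, with the implicit equation solved by Newton's method; the Allen--Cahn-type drift $f(x) = x - x^3$ below is illustrative of a monotone (one-sided Lipschitz) drift, not the paper's exact setting:

```python
import numpy as np

def stm_step(x, h, theta, dW, drift, drift_prime, sigma):
    """One STM step: solve
        y = x + h*((1-theta)*f(x) + theta*f(y)) + sigma(x)*dW
    for y by Newton's method (scalar case)."""
    rhs = x + h * (1.0 - theta) * drift(x) + sigma(x) * dW
    y = x  # initial Newton guess
    for _ in range(50):
        F = y - h * theta * drift(y) - rhs
        if abs(F) < 1e-12:
            break
        y -= F / (1.0 - h * theta * drift_prime(y))
    return y

# Illustrative Allen-Cahn-type drift with additive noise
f = lambda x: x - x**3
fp = lambda x: 1.0 - 3.0 * x**2
sig = lambda x: 0.5
```

With $\theta = 1$ this is the drift-implicit Euler scheme; for this drift the Newton denominator $1 - h\theta(1 - 3y^2)$ stays positive for $h\theta \le 1$, so the implicit equation has a unique solution at each step.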
We consider the problem of red teaming LLMs on elementary calculations and algebraic tasks to evaluate how various prompting techniques affect the quality of outputs. We present a framework to procedurally generate numerical questions and puzzles, and compare the results with and without the application of several red teaming techniques. Our findings suggest that although structured reasoning and providing worked-out examples slow the deterioration of answer quality, the gpt-3.5-turbo and gpt-4 models are not well suited for elementary calculations and reasoning tasks, even when red teamed.
We develop a new efficient sequential approximate leverage score algorithm, SALSA, using methods from randomized numerical linear algebra (RandNLA) for large matrices. We demonstrate that, with high probability, SALSA's approximations are within a $(1 + O({\varepsilon}))$ factor of the true leverage scores. In addition, we show that the theoretical computational complexity and numerical accuracy of SALSA surpass those of existing approximation methods. These theoretical results are subsequently utilized to develop an efficient algorithm, named LSARMA, for fitting an appropriate ARMA model to large-scale time series data. Our proposed algorithm is, with high probability, guaranteed to find the maximum likelihood estimates of the parameters of the true underlying ARMA model. Furthermore, it has a worst-case running time that significantly improves on those of the state-of-the-art alternatives in big-data regimes. Empirical results on large-scale data strongly support these theoretical results and underscore the efficacy of our new approach.
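SALSA's internals are not reproduced here, but the general idea of sketching-based approximate leverage scores in RandNLA can be illustrated with a simplified routine (a plain Gaussian sketch stands in for the structured sketches typically used in practice; function names are ours):

```python
import numpy as np

def exact_leverage_scores(A):
    """Leverage scores = squared row norms of an orthonormal basis of col(A)."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def approx_leverage_scores(A, sketch_size, rng):
    """Sketch-based approximation: compress A with a Gaussian map S,
    take R from the QR of S @ A, then approximate the scores by the
    squared row norms of A @ R^{-1} (an almost-orthonormal basis)."""
    n, d = A.shape
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    _, R = np.linalg.qr(S @ A)
    B = A @ np.linalg.inv(R)
    return np.sum(B**2, axis=1)
```

Because $S$ is (with high probability) a subspace embedding for the column space of $A$, the approximate scores match the exact ones up to a multiplicative distortion controlled by the sketch size.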
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) missing modalities during training or testing in real-world situations; and 2) limited computational resources that preclude finetuning heavy transformer models. To this end, we propose to utilize prompt learning and mitigate both challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modalities. Extensive experiments show the effectiveness of our prompt learning framework, which improves performance under various missing-modality cases while alleviating the need for heavy model re-training. Code is available.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimation. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
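A small illustration of the $l_1/l_2$ (group) regularization underlying such a joint estimator: its proximal operator shrinks each edge's coefficients across all $K$ tasks jointly, which is what induces a shared sparse union of supports (a generic sketch of group soft-thresholding, not the paper's algorithm):

```python
import numpy as np

def group_soft_threshold(B_stack, lam):
    """Proximal operator of lam * sum_{i,j} ||(B^(1)_ij, ..., B^(K)_ij)||_2.
    B_stack has shape (K, d, d): the K tasks' coefficient matrices stacked.
    Each (i, j) group of K coefficients is shrunk toward zero together,
    so an edge is either kept in all tasks or dropped from all tasks."""
    norms = np.linalg.norm(B_stack, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * B_stack
```

For example, a group with coefficients $(3, 4)$ across $K = 2$ tasks has norm $5$; with $\lambda = 1$ it is scaled by $1 - 1/5 = 0.8$, while a group whose norm is below $\lambda$ is zeroed out entirely.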