We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where at each step a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices (i.e., Hessians without modification) in SQP subproblems. As a trust-region method for constrained optimization, our algorithm must address an infeasibility issue -- the linearized equality constraints and trust-region constraints may lead to infeasible SQP subproblems. In this regard, we propose an adaptive relaxation technique to compute the trial step, consisting of a normal step and a tangential step. To control the lengths of these two steps while ensuring a scale-invariant property, we adaptively decompose the trust-region radius into two segments, based on the proportions of the rescaled feasibility and optimality residuals to the rescaled full KKT residual. The normal step has a closed form, while the tangential step is obtained by solving a trust-region subproblem, to which a solution ensuring the Cauchy reduction is sufficient for our study. We establish a global almost sure convergence guarantee for TR-StoSQP, and illustrate its empirical performance on both a subset of problems in the CUTEst test set and constrained logistic regression problems using data from the LIBSVM collection.
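The radius decomposition described above can be illustrated with a minimal sketch. It assumes the full KKT residual is the Euclidean norm of the stacked feasibility and optimality residuals, and it omits the rescaling step the abstract mentions. The function name `split_radius` and its arguments are hypothetical, not taken from the paper.

```python
import math


def split_radius(delta, feasibility_residual, optimality_residual):
    """Split a trust-region radius into (normal, tangential) budgets.

    Each budget is the radius scaled by that residual's share of the full
    KKT residual, taken here (an assumption) as the Euclidean norm of the
    two residual norms. The paper's rescaling of residuals is not modeled.
    """
    kkt_residual = math.hypot(feasibility_residual, optimality_residual)
    if kkt_residual == 0.0:
        # Both residuals vanish: we are at a KKT point, so no step is needed.
        return 0.0, 0.0
    normal_radius = delta * feasibility_residual / kkt_residual
    tangential_radius = delta * optimality_residual / kkt_residual
    return normal_radius, tangential_radius


# Example: feasibility residual 3, optimality residual 4, radius 1.
normal, tangential = split_radius(1.0, 3.0, 4.0)
print(normal, tangential)
```

Under this assumption the two budgets satisfy normal² + tangential² = delta², so a normal step and a tangential step that are orthogonal and respect their budgets together stay within the full radius. Multiplying both residuals by the same constant leaves the split unchanged.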
We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic variable screening with random projection tools as a viable approach. More specifically, we introduce a new data-driven random projection tailored to the problem at hand and derive a theoretical bound on the gain in expected prediction error over conventional random projections. The variables to enter the projection are screened by accounting for predictor correlation. To reduce the dependence on fine-tuning choices, we aggregate over an ensemble of linear models. A thresholding parameter is introduced to obtain a higher degree of sparsity. Both this parameter and the number of models in the ensemble can be chosen by cross-validation. In extensive simulations, we compare the proposed method with other random projection tools and with classical sparse and dense methods and show that it is competitive in terms of prediction across a variety of scenarios with different sparsity and predictor covariance settings. We also show that the method with cross-validation is able to rank the variables satisfactorily. Finally, we showcase the method on two real data applications.
We propose a PnP algorithm for a camera constrained to two-dimensional motion (applicable, for instance, to many wheeled robotics platforms). Leveraging this assumption allows accuracy and performance improvements over 3D PnP algorithms due to the reduction in search space dimensionality. It also reduces the incidence of ambiguous pose estimates (as, in most cases, the spurious solutions fall outside the plane of movement). Our algorithm finds an approximate solution by solving a polynomial system and refines its prediction iteratively to minimize the reprojection error. The algorithm compares favorably to existing 3D PnP algorithms in terms of accuracy, performance, and robustness to noise.
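To make the reduced search space concrete, the sketch below writes the reprojection error for a camera with only three degrees of freedom. It assumes motion in the x-z plane with yaw about the y axis, a world-to-camera pose convention, and normalized image coordinates. These conventions and the names `project` and `reprojection_error` are illustrative assumptions, not the authors' formulation.

```python
import math


def project(pose, point):
    """Project a 3D world point to normalized image coordinates.

    pose = (tx, tz, theta): translation in the x-z plane and yaw about the
    y axis (assumed convention), so p_cam = R_y(theta) * p_world + (tx, 0, tz).
    """
    tx, tz, theta = pose
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    xc = c * x + s * z + tx
    yc = y  # planar motion leaves the vertical coordinate untouched
    zc = -s * x + c * z + tz
    return xc / zc, yc / zc


def reprojection_error(pose, points, observations):
    """Sum of squared differences between projected and observed points."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points, observations):
        u, v = project(pose, point)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Synthetic check: observations generated from a known planar pose.
points = [(0.0, 0.0, 4.0), (1.0, -0.5, 5.0), (-1.0, 0.5, 6.0), (0.5, 1.0, 3.0)]
true_pose = (0.2, 0.5, 0.1)
observations = [project(true_pose, p) for p in points]
print(reprojection_error(true_pose, points, observations))
print(reprojection_error((0.3, 0.5, 0.1), points, observations))
```

An iterative refinement step would minimize `reprojection_error` over the three pose parameters, instead of the six parameters of a general 3D pose.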
A novel near-field transmission framework is proposed for dynamic metasurface antenna (DMA)-enabled non-orthogonal multiple access (NOMA) networks. The base station (BS) exploits hybrid beamforming to communicate with multiple near users (NUs) and far users (FUs) using the NOMA principle. Based on this framework, two novel beamforming schemes are proposed. 1) For the case where the grouped users are distributed in the same direction, a beam-steering scheme is developed. The beam pattern error (BPE) metric is introduced to characterize the gap between the hybrid beamformers and the desired ideal beamformers, and a two-layer algorithm is proposed to minimize the BPE by optimizing the hybrid beamformers. Then, the optimal power allocation strategy is obtained to maximize the sum achievable rate of the network. 2) For the case of randomly distributed users, a beam-splitting scheme is proposed, where two sub-beamformers are extracted from the single beamformer to serve different users in the same group. An alternating optimization (AO) algorithm is proposed for hybrid beamformer optimization, and the optimal power allocation is also derived. Numerical results validate that: 1) the proposed beamforming schemes exhibit superior performance compared with the existing imperfect-resolution-based beamforming scheme; 2) the communication rate of the proposed transmission framework is sensitive to the imperfect distance knowledge of NUs but not to that of FUs.
Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate long-horizon robotic manipulation tasks with sparse rewards. We evaluate our framework and related intrinsic learning methods in an environment that challenges exploration, and in a complex robotic manipulation task that challenges both exploration and long-horizon reasoning. Results show that IGE-LLMs (i) exhibits notably higher performance than related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined with and complement existing learning methods, highlighting its modularity, (iii) is fairly insensitive to different intrinsic scaling parameters, and (iv) maintains robustness against increased levels of uncertainty and longer horizons.
We consider estimating a matrix from noisy observations coming from an arbitrary additive bi-rotational invariant perturbation. We propose an estimator which is optimal among the class of rectangular rotational invariant estimators and can be applied irrespective of the prior on the signal. For the particular case of Gaussian noise, we prove the optimality of the proposed estimator, and we find an explicit expression for the MMSE in terms of the limiting singular value distribution of the observation matrix. Moreover, we prove a formula linking the asymptotic mutual information and the limit of a log-spherical integral of rectangular matrices. We also provide numerical checks for our results for general bi-rotational invariant noise, as well as Gaussian noise, which match our theoretical predictions.
Few-shot knowledge graph completion (FKGC) aims to query the unseen facts of a relation given its few-shot reference entity pairs. Noise arising from the uncertainty of entities and triples may limit few-shot learning, but existing FKGC works neglect such uncertainty, which leaves them more susceptible to noisy, limited reference samples. In this paper, we propose a novel uncertainty-aware few-shot KG completion framework (UFKGC) that models uncertainty for a better understanding of the limited data by learning representations under a Gaussian distribution. An uncertainty representation is first designed to estimate the uncertainty scope of the entity pairs after transferring feature representations into a Gaussian distribution. Further, to better integrate the neighbors with uncertainty characteristics into entity features, we design an uncertainty-aware relational graph neural network (UR-GNN) to conduct convolution operations between the Gaussian distributions. Then, multiple random samplings are conducted for reference triples within the Gaussian distribution to generate smooth reference representations during the optimization. The final completion score for each query instance is measured by the designed uncertainty optimization to make our approach more robust to noise in few-shot scenarios. Experimental results show that our approach achieves excellent performance on two benchmark datasets compared to its competitors.
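The repeated sampling step can be sketched as follows. The sketch assumes a diagonal Gaussian per reference representation and assumes that the samples are combined by averaging; the abstract does not say how the samples are combined. The function name `sample_reference` and its parameters are hypothetical.

```python
import random


def sample_reference(mean, std, num_samples, rng):
    """Average several draws from a diagonal Gaussian representation.

    Each draw is mean[i] + std[i] * eps with eps ~ N(0, 1), per dimension.
    Averaging the draws is an illustrative assumption for "smoothing".
    """
    dim = len(mean)
    total = [0.0] * dim
    for _ in range(num_samples):
        for i in range(dim):
            eps = rng.gauss(0.0, 1.0)
            total[i] += mean[i] + std[i] * eps
    return [t / num_samples for t in total]


# Example: a 3-dimensional reference representation with per-dimension spread.
rng = random.Random(0)
smooth = sample_reference([0.5, -1.0, 2.0], [0.1, 0.2, 0.05], 16, rng)
print(smooth)
```

With more samples the averaged representation concentrates around the mean, while a small number of samples keeps some of the modeled uncertainty in the reference representation.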
The pursuit of accurate 3D hand pose estimation stands as a keystone for understanding human activity in the realm of egocentric vision. The majority of existing estimation methods still rely on single-view images as input, leading to potential limitations, e.g., limited field-of-view and ambiguity in depth. To address these problems, adding another camera to better capture the shape of hands is a practical direction. However, existing multi-view hand pose estimation methods suffer from two main drawbacks: 1) Requiring multi-view annotations for training, which are expensive. 2) During testing, the model becomes inapplicable if camera parameters/layout are not the same as those used in training. In this paper, we propose a novel Single-to-Dual-view adaptation (S2DHand) solution that adapts a pre-trained single-view estimator to dual views. Compared with existing multi-view training methods, 1) our adaptation process is unsupervised, eliminating the need for multi-view annotation. 2) Moreover, our method can handle arbitrary dual-view pairs with unknown camera parameters, making the model applicable to diverse camera settings. Specifically, S2DHand is built on certain stereo constraints, including pair-wise cross-view consensus and invariance of transformation between both views. These two stereo constraints are used in a complementary manner to generate pseudo-labels, allowing reliable adaptation. Evaluation results reveal that S2DHand achieves significant improvements on arbitrary camera pairs under both in-dataset and cross-dataset settings, and outperforms existing adaptation methods with leading performance. Project page: //github.com/MickeyLLG/S2DHand.
Video stabilization is a longstanding computer vision problem, and pixel-level synthesis solutions, which synthesize full frames, add to the complexity of this task. These techniques aim to stabilize videos by synthesizing full frames while enhancing the stability of the considered video. This intensifies the complexity of the task due to the distinct mix of unique motion profiles and visual content present in each video sequence, making robust generalization with fixed parameters difficult. In our study, we introduce a novel approach to enhance the performance of pixel-level synthesis solutions for video stabilization by adapting these models to individual input video sequences. The proposed adaptation exploits low-level visual cues accessible at test time to improve both the stability and quality of the resulting videos. We highlight the efficacy of our "test-time adaptation" methodology through simple fine-tuning of one of these models, followed by a significant stability gain via the integration of meta-learning techniques. Notably, significant improvement is achieved with only a single adaptation step. The versatility of the proposed algorithm is demonstrated by consistently improving the performance of various pixel-level synthesis models for video stabilization in real-world scenarios.
Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle in recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta-learning. Our approach is based on the insight that good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine scenario-specific learning with model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely the Scenario-specific Sequential Meta learner (or s^2 meta). By doing so, our meta-learner produces a generic initial model by aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state of the art for cold-start problems in online recommendation. The model is deployed in the Guess You Like session on the front page of Mobile Taobao.
Multi-relation Question Answering is a challenging task because it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.