We consider the linear discriminant analysis problem in the high-dimensional settings. In this work, we propose PANDA(PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to tune the parameters. Moreover, we prove that PANDA achieves the optimal convergence rate in terms of both the estimation error and misclassification rate. Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with the existing methods, we observe that our proposed PANDA yields equal or better performance, and requires substantially less effort in parameter tuning.
Continuous monitoring and patient acuity assessments are key aspects of Intensive Care Unit (ICU) practice, but both are limited by time constraints imposed on healthcare providers. Moreover, anticipating clinical trajectories remains imprecise. The objectives of this study are to (1) develop an electronic phenotype of acuity using automated variable retrieval within the electronic health records and (2) describe transitions between acuity states that illustrate the clinical trajectories of ICU patients. We gathered two single-center, longitudinal electronic health record datasets for 51,372 adult ICU patients admitted to the University of Florida Health (UFH) Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acuity status at four-hour intervals for each ICU admission and identify acuity phenotypes using continuous acuity status and k-means clustering approach. 51,073 admissions for 38,749 patients in the UFH GNV dataset and 22,219 admissions for 12,623 patients in the UFH JAX dataset had at least one ICU stay lasting more than four hours. There were three phenotypes: persistently stable, persistently unstable, and transitioning from unstable to stable. For stable patients, approximately 0.7%-1.7% would transition to unstable, 0.02%-0.1% would expire, 1.2%-3.4% would be discharged, and the remaining 96%-97% would remain stable in the ICU every four hours. For unstable patients, approximately 6%-10% would transition to stable, 0.4%-0.5% would expire, and the remaining 89%-93% would remain unstable in the ICU in the next four hours. We developed phenotyping algorithms for patient acuity status every four hours while admitted to the ICU. This approach may be useful in developing prognostic and clinical decision-support tools to aid patients, caregivers, and providers in shared decision-making processes regarding escalation of care and patient values.
Fitts' law has been widely employed as a research method for analyzing tasks within the domain of Human-Computer Interaction (HCI). However, its application to non-computer tasks has remained limited. This study aims to extend the application of Fitts' law to the realm of sports, specifically focusing on squash. Squash is a high-intensity sport that requires quick movements and precise shots. Our research investigates the effectiveness of utilizing Fitts' law to evaluate the task difficulty and effort level associated with executing and responding to various squash shots. By understanding the effort/information rate required for each shot, we can determine which shots are more effective in making the opponent work harder. Additionally, this knowledge can be valuable for coaches in designing training programs. However, since Fitts' law was primarily developed for human-computer interaction, we adapted it to fit the squash scenario. This paper provides an overview of Fitts' law and its relevance to sports, elucidates the motivation driving this investigation, outlines the methodology employed to explore this novel avenue, and presents the obtained results, concluding with key insights. We conducted experiments with different shots and players, collecting data on shot speed, player movement time, and distance traveled. Using this data, we formulated a modified version of Fitts' law specifically for squash. The results provide insights into the difficulty and effectiveness of various shots, offering valuable information for both players and coaches in the sport of squash.
Low rank approximation of a matrix (hereafter LRA) is a highly important area of Numerical Linear and Multilinear Algebra and Data Mining and Analysis. One can operate with LRA at sublinear cost, that is, by using much fewer memory cells and flops than an input matrix has entries, but no sublinear cost algorithm can compute accurate LRA of the worst case input matrices or even of the matrices of small families in our Appendix. Nevertheless we prove that Cross-Approximation celebrated algorithms and even more primitive sublinear cost algorithms output quite accurate LRA for a large subclass of the class of all matrices that admit LRA and in a sense for most of such matrices. Moreover, we accentuate the power of sublinear cost LRA by means of multiplicative pre-processing of an input matrix, and this also reveals a link between C-A algorithms and Randomized and Sketching LRA algorithms. Our tests are in good accordance with our formal study.
In this work, we study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find it interesting that the performance of certain translation direction does not always improve with the increase of its weight in the multi-task optimization objective. Accordingly, scalarization method leads to a multitask trade-off front that deviates from the traditional Pareto front when there exists data imbalance in the training corpus, which poses a great challenge to improve the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and the number of tasks. Finally, we formulate the sample ratio selection problem in MNMT as an optimization problem based on the Double Power Law. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget. We release the code at //github.com/pkunlp-icler/ParetoMNMT for reproduction.
Confidence intervals (CI) for the IPW estimators of the ATT and ATO might not always yield conservative CIs when using the 'robust sandwich variance' estimator. In this manuscript, we identify scenarios where this variance estimator can be employed to derive conservative CIs. Specifically, for the ATT, a conservative CI can be derived when there's a homogeneous treatment effect or the interaction effect surpasses the effect from the covariates alone. For the ATO, conservative CIs can be derived under certain conditions, such as when there are homogeneous treatment effects, when there exists significant treatment-confounder interactions, or when there's a large number of members in the control groups.
The research on Reconfigurable Intelligent Surfaces (RISs) has dominantly been focused on physical-layer aspects and analyses of the achievable adaptation of the wireless propagation environment. Compared to that, questions related to system-level integration of RISs have received less attention. We address this research gap by analyzing the necessary control/signaling operations that are necessary to integrate RIS as a new type of wireless infrastructure element. We build a general model for evaluating the impact of control operations along two dimensions: i) the allocated bandwidth of the control channels (in-band and out-of-band), and ii) the rate selection for the data channel (multiplexing or diversity). Specifically, the second dimension results in two generic transmission schemes, one based on channel estimation and the subsequent optimization of the RIS, while the other is based on sweeping through predefined RIS phase configurations. We analyze the communication performance in multiple setups built along these two dimensions. While necessarily simplified, our analysis reveals the basic trade-offs in RIS-assisted communication and the associated control operations. The main contribution of the paper is a methodology for systematic evaluation of the control overhead in RIS-aided networks, regardless of the specific control schemes used.
We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence, and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in terms of estimation accuracy and gains over computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well known empirical examples with large sample sizes.
Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure the performance of various LLMs on code synthesis. However, these test-cases can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus -- a code synthesis evaluation framework to rigorously benchmark the functional correctness of LLM-synthesized code. EvalPlus augments a given evaluation dataset with large amounts of test-cases newly produced by an automatic test input generator, powered by both LLM- and mutation-based strategies. While EvalPlus is general, we extend the test-cases of the popular HumanEval benchmark by 80x to build HumanEval+. Our extensive evaluation across 26 popular LLMs (e.g., GPT-4 and ChatGPT) demonstrates that HumanEval+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by up-to 19.3-28.9%. We also surprisingly found that test insufficiency can lead to mis-ranking. For example, both WizardCoder-CodeLlama and Phind-CodeLlama now outperform ChatGPT on HumanEval+, while none of them could on HumanEval. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction to improve such programming benchmarks through automated testing. We have open-sourced our tools, enhanced datasets as well as all LLM-generated code at //github.com/evalplus/evalplus to facilitate and accelerate future LLM-for-code research.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweaking existing architectures with refined training strategies, increasing context length, using high-quality training data, and increasing training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses the LLMs architectures and their categorization, training strategies, training datasets, and performance evaluations and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has raised significant attention about how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous amount of related studies, there still lacks a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life circle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.