The COVID-19 pandemic in 2020 has caused sudden shocks in transportation systems, specifically the subway ridership patterns in New York City. Understanding the temporal pattern of subway ridership through statistical models is crucial during such shocks. However, many existing statistical frameworks may not be a good fit to analyze the ridership data sets during the pandemic since some of the modeling assumption might be violated during this time. In this paper, utilizing change point detection procedures, we propose a piece-wise stationary time series model to capture the nonstationary structure of subway ridership. Specifically, the model consists of several independent station based autoregressive integrated moving average (ARIMA) models concatenated together at certain time points. Further, data-driven algorithms are utilized to detect the changes of ridership patterns as well as to estimate the model parameters before and during the COVID-19 pandemic. The data sets of focus are daily ridership of subway stations in New York City for randomly selected stations. Fitting the proposed model to these data sets enhances our understanding of ridership changes during external shocks, both in terms of mean (average) changes as well as the temporal correlations.
Cities often lack up-to-date data analytics to evaluate and implement transport planning interventions to achieve sustainability goals, as traditional data sources are expensive, infrequent, and suffer from data latency. Mobile phone data provide an inexpensive source of geospatial information to capture human mobility at unprecedented geographic and temporal granularity. This paper proposes a method to estimate updated mode of transportation usage in a city, with novel usage of mobile phone application traces to infer previously hard to detect modes, such as bikes and ride-hailing/taxi. By using data fusion and matrix factorisation, we integrate socioeconomic and demographic attributes of the local resident population into the model. We tested the method in a case study of Santiago (Chile), and found that changes from 2012 to 2020 in mode of transportation inferred by the method are coherent with expectations from domain knowledge and the literature, such as ride-hailing trips replacing mass transport.
Semantic place annotation can provide individual semantics, which can be of great help in the field of trajectory data mining. Most existing methods rely on annotated or external data and require retraining following a change of region, thus preventing their large-scale applications. Herein, we propose an unsupervised method denoted as UPAPP for the semantic place annotation of trajectories using spatiotemporal information. The Bayesian Criterion is specifically employed to decompose the spatiotemporal probability of the candidate place into spatial probability, duration probability, and visiting time probability. Spatial information in ROI and POI data is subsequently adopted to calculate the spatial probability. In terms of the temporal probabilities, the Term Frequency Inverse Document Frequency weighting algorithm is used to count the potential visits to different place types in the trajectories, and generates the prior probabilities of the visiting time and duration. The spatiotemporal probability of the candidate place is then combined with the importance of the place category to annotate the visited places. Validation with a trajectory dataset collected by 709 volunteers in Beijing showed that our method achieved an overall and average accuracy of 0.712 and 0.720, respectively, indicating that the visited places can be annotated accurately without any external data.
Recent developments in Artificial Intelligence (AI) have fueled the emergence of human-AI collaboration, a setting where AI is a coequal partner. Especially in clinical decision-making, it has the potential to improve treatment quality by assisting overworked medical professionals. Even though research has started to investigate the utilization of AI for clinical decision-making, its potential benefits do not imply its adoption by medical professionals. While several studies have started to analyze adoption criteria from a technical perspective, research providing a human-centered perspective with a focus on AI's potential for becoming a coequal team member in the decision-making process remains limited. Therefore, in this work, we identify factors for the adoption of human-AI collaboration by conducting a series of semi-structured interviews with experts in the healthcare domain. We identify six relevant adoption factors and highlight existing tensions between them and effective human-AI collaboration.
Approximate-message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing each time the full argument ? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables to give a unified interpretation of these iterations, independent from the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph admit rigorous SE equations, extending the reach of previous proofs, and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.
In this paper, we consider a resilient consensus problem for the multi-agent network where some of the agents are subject to Byzantine attacks and may transmit erroneous state values to their neighbors. In particular, we develop an event-triggered update rule to tackle this problem as well as reduce the communication for each agent. Our approach is based on the mean subsequence reduced (MSR) algorithm with agents being capable to communicate with multi-hop neighbors. Since delays are critical in such an environment, we provide necessary graph conditions for the proposed algorithm to perform well with delays in the communication. We highlight that through multi-hop communication, the network connectivity can be reduced especially in comparison with the common onehop communication case. Lastly, we show the effectiveness of the proposed algorithm by a numerical example.
The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19 related papers in PubMed, has accumulated over 180,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation where an article is assigned with up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid (e.g., accounting for ~18% of total uses) and downstream studies such as network generation. However, it has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LITMC-BERT, a transformer-based multi-label classification method in biomedical literature. It uses a shared transformer backbone for all the labels while also captures label-specific features and the correlations between label pairs. We compare LITMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and only requires ~18% of the inference time than the Binary BERT baseline. The related datasets and models are available via //github.com/ncbi/ml-transformer.
The success of large-scale models in recent years has increased the importance of statistical models with numerous parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that may not be sparse; however, existing results depend on the independent setting of samples. In this study, we analyze a linear regression model with dependent time series data under over-parameterization settings. We consider an estimator via interpolation and developed a theory for excess risk of the estimator under multiple dependence types. This theory can treat infinite-dimensional data without sparsity and handle long-memory processes in a unified manner. Moreover, we bound the risk in our theory via the integrated covariance and nondegeneracy of autocorrelation matrices. The results show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data, while long-memory processes slow the convergence rate. We also present several examples of specific dependent processes that can be applied to our setting.
In randomized experiments, the actual treatments received by some experimental units may differ from their treatment assignments. This non-compliance issue often occurs in clinical trials, social experiments, and the applications of randomized experiments in many other fields. Under certain assumptions, the average treatment effect for the compliers is identifiable and equal to the ratio of the intention-to-treat effects of the potential outcomes to that of the potential treatment received. To improve the estimation efficiency, we propose three model-assisted estimators for the complier average treatment effect in randomized experiments with a binary outcome. We study their asymptotic properties, compare their efficiencies with that of the Wald estimator, and propose the Neyman-type conservative variance estimators to facilitate valid inferences. Moreover, we extend our methods and theory to estimate the multiplicative complier average treatment effect. Our analysis is randomization-based, allowing the working models to be misspecified. Finally, we conduct simulation studies to illustrate the advantages of the model-assisted methods and apply these analysis methods in a randomized experiment to evaluate the effect of academic services or incentives on academic performance.
Automatic text summarization has experienced substantial progress in recent years. With this progress, the question has arisen whether the types of summaries that are typically generated by automatic summarization models align with users' needs. Ter Hoeve et al (2020) answer this question negatively. Amongst others, they recommend focusing on generating summaries with more graphical elements. This is in line with what we know from the psycholinguistics literature about how humans process text. Motivated from these two angles, we propose a new task: summarization with graphical elements, and we verify that these summaries are helpful for a critical mass of people. We collect a high quality human labeled dataset to support research into the task. We present a number of baseline methods that show that the task is interesting and challenging. Hence, with this work we hope to inspire a new line of research within the automatic summarization community.
The Accumulate Protocol ("Accumulate") is an identity-based, Delegated Proof of Stake (DPoS) blockchain designed to power the digital economy through interoperability with Layer-1 blockchains, integration with enterprise tech stacks, and interfacing with the World Wide Web. Accumulate bypasses the trilemma of security, scalability, and decentralization by implementing a chain-of-chains architecture in which digital identities with the ability to manage keys, tokens, data, and other identities are treated as their own independent blockchains. This architecture allows these identities, known as Accumulate Digital Identifiers (ADIs), to be processed and validated in parallel over the Accumulate network. Each ADI also possesses a hierarchical set of keys with different priority levels that allow users to manage their security over time and create complex signature authorization schemes that expand the utility of multi-signature transactions. A two token system provides predictable costs for enterprise users, while anchoring all transactions to Layer-1 blockchains provides enterprise-grade security to everyone.