Researchers have increasingly turned to crowdfunding platforms to gain insights into entrepreneurial activity and dynamics. While previous studies have explored various factors influencing crowdfunding success, such as technology, communication, and marketing strategies, the role of visual elements that can be automatically extracted from images has received less attention. This is surprising, considering that crowdfunding platforms emphasize the importance of attention-grabbing and high-resolution images, and previous research has shown that image characteristics can significantly impact product evaluations. Indeed, we conducted a comprehensive review of empirical articles (n = 202) that utilized Kickstarter data, focusing on the incorporation of visual information in their analyses. Our findings reveal that only 29.70% controlled for the number of images, and less than 12% considered any image details. In this manuscript, we review the literature on image processing and its relevance to the business domain, highlighting two types of visual variables: visual counts (number of pictures and number of videos) and image details. Building upon previous work that discussed the role of color, composition, and figure-ground relationships, we introduce visual scene elements that have not yet been explored in crowdfunding, including the number of faces, the number of concepts depicted, and the ease of identifying those concepts. To demonstrate the predictive value of visual counts and image details, we analyze Kickstarter data. Our results highlight that visual count features are two of the top three predictors of success. Our results also show that simple image detail features such as color matter considerably, and that our proposed measures of visual scene elements can also be useful. We supplement our article with R and Python code that helps authors extract image details (//osf.io/ujnzp/).
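A minimal sketch of what an "image detail" feature extractor can look like, using only NumPy. The feature names and the `image_details` function are illustrative assumptions, not taken from the paper's OSF code; the colorfulness measure is the standard Hasler–Süsstrunk opponent-channel metric.

```python
import numpy as np

def image_details(rgb: np.ndarray) -> dict:
    """Compute simple color features for an H x W x 3 image in 0..255."""
    arr = rgb.astype(float)
    r, g, b = arr[..., 0], arr[..., 1], arr[..., 2]
    rg = r - g                        # red-green opponent channel
    yb = 0.5 * (r + g) - b            # yellow-blue opponent channel
    # Hasler & Suesstrunk (2003) colorfulness metric
    colorfulness = np.sqrt(rg.std() ** 2 + yb.std() ** 2) \
        + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return {"brightness": arr.mean(), "colorfulness": colorfulness}

# A saturated red image scores high on colorfulness; a gray one scores zero.
red_img = np.zeros((64, 64, 3)); red_img[..., 0] = 255
gray_img = np.full((64, 64, 3), 128)
f_red, f_gray = image_details(red_img), image_details(gray_img)
```

Such scalar features can be merged into a campaign-level dataset alongside visual counts (number of pictures, number of videos) before fitting a success model.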
Developing computational models of neural response is crucial for understanding sensory processing and neural computations. Current state-of-the-art neural network methods use temporal filters to handle temporal dependencies, resulting in an unrealistic and inflexible processing paradigm. Meanwhile, these methods target trial-averaged firing rates and fail to capture important features in spike trains. This work presents the temporal conditioning spiking latent variable models (TeCoS-LVM) to simulate the neural response to natural visual stimuli. We use spiking neurons to produce spike outputs that directly match the recorded spike trains. This approach helps to avoid losing information embedded in the original spike trains. We exclude the temporal dimension from the model parameter space and introduce a temporal conditioning operation to allow the model to adaptively explore and exploit temporal dependencies in stimulus sequences in a natural paradigm. We show that TeCoS-LVM models produce more realistic spike activity and fit spike statistics more accurately than powerful alternatives. Additionally, learned TeCoS-LVM models can generalize well to longer time scales. Overall, while remaining computationally tractable, our model effectively captures key features of neural coding systems. It thus provides a useful tool for building accurate predictive computational accounts for various sensory perception circuits.
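To make the idea of "spike outputs that directly match recorded trains" concrete, here is a minimal leaky integrate-and-fire (LIF) simulation. This is a generic textbook spiking neuron, not the TeCoS-LVM architecture; the parameters are illustrative.

```python
import numpy as np

def lif_spikes(current, dt=1.0, tau=20.0, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: returns a binary spike train,
    the same discrete format as a recorded (binned) spike train."""
    v, spikes = 0.0, []
    for i_t in current:
        v += dt / tau * (-v + i_t)   # leaky membrane integration
        if v >= v_th:                # threshold crossing emits a spike
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return np.array(spikes)

# Constant suprathreshold input produces a regular spike train.
train = lif_spikes(np.full(200, 2.0))
```

Because the model output is itself a binary spike train, likelihoods and spike-train statistics (ISI distributions, counts) can be computed on it directly, rather than on a trial-averaged firing rate.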
Thermal sensation is crucial to our comprehension of the world and our ability to interact with it. Therefore, the development of thermal sensation presentation technologies holds significant potential, providing a novel method of interaction. Traditional technologies often leave residual heat in the system or the skin, affecting subsequent presentations. Our study focuses on presenting thermal sensations with low residual heat, especially cold sensations. To mitigate the impact of residual heat in the presentation system, we opted for a non-contact method, and to address the influence of residual heat on the skin, we present thermal sensations without significantly altering skin temperature. Specifically, we integrated two highly responsive and independent heat transfer mechanisms: convection via cold air and radiation via visible light, providing non-contact thermal stimuli. By rapidly alternating between perceptible decreases and imperceptible increases in temperature on the same skin area, we maintained near-constant skin temperature while presenting continuous cold sensations. In our experiments involving 15 participants, we observed that when the cooling rate was -0.2 to -0.24 °C/s and the cooling time ratio was 30-50%, more than 86.67% of the participants perceived only persistent cold without any warmth.
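The alternating-stimulus principle can be sketched numerically: a perceptible cooling phase alternates with an imperceptible heating phase tuned so the net skin-temperature drift is zero. This is a toy energy-balance simulation with illustrative parameters, not a model of the paper's cold-air/visible-light hardware.

```python
import numpy as np

def simulate_skin_temp(cycles=60, steps_per_cycle=100, dt=0.01,
                       cool_rate=-0.22, duty=0.4, t0=33.0):
    """Alternate cooling (fraction `duty` of each cycle) with heating
    chosen to exactly cancel the cooling over a full cycle."""
    heat_rate = -cool_rate * duty / (1.0 - duty)  # balances the cooling
    cool_steps = int(duty * steps_per_cycle)
    temp = [t0]
    for _ in range(cycles):
        for s in range(steps_per_cycle):
            rate = cool_rate if s < cool_steps else heat_rate
            temp.append(temp[-1] + rate * dt)
    return np.array(temp)

temp = simulate_skin_temp()
drift = temp[-1] - temp[0]   # near zero: skin temperature is preserved
```

With a -0.22 °C/s cooling rate and a 40% duty cycle, the skin temperature oscillates within roughly 0.1 °C of its baseline while a cooling phase is presented every cycle.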
Rock skipping is a highly dynamic and relatively complex task that can easily be performed by humans. This project aims to bring rock skipping into a robotic setting, utilizing the lessons we learned in Robotic Manipulation. Specifically, this project implements a system consisting of a robotic arm and a dynamic environment to perform rock skipping in simulation. By varying important parameters such as release velocity, we hope to use our system to gain insight into the most important factors for maximizing the total number of skips. In addition, by implementing the system in simulation, we have a more rigorous and precise testing approach over these varied test parameters. However, this project experienced some limitations due to gripping inefficiencies and problems with release height trajectories, which are further discussed in our report.
We investigate the high-dimensional linear regression problem in the presence of noise correlated with Gaussian covariates. This correlation, known as endogeneity in regression models, often arises from unobserved variables and other factors. It has been a major challenge in causal inference and econometrics. When the covariates are high-dimensional, it has been common to assume sparsity on the true parameters and estimate them using regularization, even under endogeneity. However, when sparsity does not hold, it has not been well understood how to control endogeneity and high dimensionality simultaneously. This study demonstrates that an estimator without regularization can achieve consistency, that is, benign overfitting, under certain assumptions on the covariance matrix. Specifically, our results show that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables satisfy a condition on their eigenvalues. We consider several extensions relaxing these conditions and conduct experiments to support our theoretical findings. As a technical contribution, we utilize the convex Gaussian minimax theorem (CGMT) in our dual problem and extend CGMT itself.
With data outsourcing becoming commonplace, there is a growing need for secure outsourcing of data and machine learning models. Namely, data and model owners (the client) often need their information to remain private and secure against the potentially untrusted computing resource (the server) to whom they want to outsource said data and models. Various approaches to privacy-preserving machine learning (PPML) have been devised, with different techniques and solutions introduced in the past. These solutions often involved one of two compromises: (1) client-server interactions to allow intermediary rounds of decryption and re-encryption of data, or (2) complex architectures for multi-party computation. This paper devises a paradigm using Fully Homomorphic Encryption (FHE) that minimizes architectural complexity and removes client-side involvement during the training and prediction lifecycle of machine learning models. In addition, the paradigm proposed in this work achieves both model security and data security. To remove client-side involvement, the devised paradigm proposes a no-decryption approach that allows the server to handle PPML in its entirety without rounds of decryption and re-encryption. To the best of our knowledge, this paradigm is the first to achieve privacy-preserving decision tree training with no decryption while maintaining a simple client-server architecture.
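The no-decryption idea can be illustrated with a toy homomorphic scheme. The stand-in below is textbook Paillier encryption, which is only additively homomorphic (it is not FHE and not the paper's scheme), but it shows the essential property: the server combines ciphertexts without ever decrypting, and the client decrypts once at the end. The tiny primes are for illustration only and are not secure.

```python
import math, random

def keygen(p=11, q=13):                       # toy primes, NOT secure
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    x = pow(n + 1, lam, n * n)                # generator g = n + 1
    mu = pow((x - 1) // n, -1, n)             # L(x) = (x - 1) // n
    return n, (lam, mu)

def encrypt(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

n, sk = keygen()
# Server side: homomorphic addition is just ciphertext multiplication,
# so the server computes on encrypted data with no decryption rounds.
c_sum = encrypt(n, 5) * encrypt(n, 7) % (n * n)
total = decrypt(n, sk, c_sum)                 # client decrypts once: 12
```

An FHE scheme additionally supports multiplication on ciphertexts, which is what makes server-side training of a full model (such as a decision tree) possible under the same no-decryption principle.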
Deep Learning has already been successfully applied to analyze industrial sensor data in a variety of relevant use cases. However, the opaque nature of many well-performing methods poses a major obstacle for real-world deployment. Explainable AI (XAI), and especially feature attribution techniques, promise to enable insights into how such models form their decisions. But the plain application of such methods often fails to provide truly informative and problem-specific insights to domain experts. In this work, we focus on the specific task of detecting faults in rolling element bearings from vibration signals. We propose a novel and domain-specific feature attribution framework that allows us to evaluate how well the underlying logic of a model corresponds with expert reasoning. Using the framework, we are able to validate the trustworthiness of different well-performing deep learning models and to successfully anticipate their generalization ability. Our methodology demonstrates how signal processing tools can effectively be used to enhance Explainable AI techniques and acts as a template for similar problems.
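One classic signal-processing tool for this task is envelope spectrum analysis: a bearing fault excites a structural resonance with periodic impacts, and demodulating the vibration signal exposes a peak at the fault repetition frequency. The sketch below uses a synthetic amplitude-modulated signal with assumed frequencies; it illustrates the general technique, not the paper's framework.

```python
import numpy as np
from scipy.signal import hilbert

fs = 10_000                        # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
fault_hz = 100                     # simulated fault repetition rate
carrier_hz = 2_000                 # resonance excited by the impacts
signal = (1 + 0.5 * np.sin(2 * np.pi * fault_hz * t)) \
    * np.sin(2 * np.pi * carrier_hz * t)

envelope = np.abs(hilbert(signal))              # demodulate via Hilbert transform
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)
peak_hz = freqs[spectrum.argmax()]              # peak sits at the fault frequency
```

Checking whether a model's feature attributions concentrate energy around such physically meaningful frequency bands is one way to compare model logic against expert reasoning.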
Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and can directly accomplish tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of independent human-in-the-loop systems. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we briefly classify and discuss their applications in natural language processing, computer vision, and other domains. We also outline some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop research and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
Visual Question Answering (VQA) models have so far struggled with counting objects in natural images. We identify a fundamental problem caused by the soft attention used in these models. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by 6.6%.
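The fundamental problem with soft attention for counting can be shown in a few lines: because attention weights are normalized to sum to one, the attended feature is a weighted average, and the average over two duplicate object proposals is identical to the feature of a single object. The count information is destroyed before it reaches the classifier. This is a minimal illustration of the failure mode the abstract identifies, not the proposed counting component.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

feat = np.array([1.0, 2.0])        # feature vector of one detected object

one = np.stack([feat])             # scene with one object proposal
two = np.stack([feat, feat])       # scene with two identical proposals

# Uniform attention logits: weights are [1.0] vs [0.5, 0.5].
att_one = softmax(np.zeros(1)) @ one   # attention-weighted average
att_two = softmax(np.zeros(2)) @ two

# The averages are identical, so downstream layers cannot tell
# "one object" from "two objects".
```

A counting component must therefore operate on the unnormalized proposals (e.g., via their pairwise overlaps) rather than on the normalized attention average.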