ABSTRACT
Over the past decades, massive volumes of Electronic Health Records (EHRs) have accumulated in Intensive Care Units (ICUs) and many other healthcare settings. The rich and comprehensive information recorded presents an exceptional opportunity for patient outcome prediction. Nevertheless, because of the diversity of data modalities, EHRs are heterogeneous, which makes it difficult to leverage information from the various modalities jointly. There is an urgent need to capture the underlying correlations among different modalities. In this paper, we propose a novel framework named Multimodal Fusion Network (MFNet) for ICU patient outcome prediction. First, we incorporate multiple modality-specific encoders to learn different modality representations. Notably, a graph-guided encoder is designed to capture underlying global relationships among medical codes, and a text encoder with a pre-fine-tuning strategy is adopted to extract appropriate text representations. Second, we propose to merge multimodal representations pairwise with a tailored hierarchical fusion mechanism. Experiments conducted on the eICU-CRD dataset show that MFNet achieves superior performance on mortality prediction and Length of Stay (LoS) prediction compared with various representative and state-of-the-art baselines. Moreover, a comprehensive ablation study demonstrates the effectiveness of each component of MFNet.
ABSTRACT
Developing novel predictive models with complex biomedical information is challenging due to various idiosyncrasies related to heterogeneity, standardization or sparseness of the data. We previously introduced a person-centric ontology to organize information about individual patients, and a representation learning framework to extract person-centric knowledge graphs (PKGs) and to train Graph Neural Networks (GNNs). In this paper, we propose a systematic approach to examine the results of GNN models trained with both structured and unstructured information from the MIMIC-III dataset. Through ablation studies on different clinical, demographic, and social data, we show the robustness of this approach in identifying predictive features in PKGs for the task of readmission prediction.
Subjects
Computer Neural Networks , Humans , Patient Readmission
ABSTRACT
The inclusion of patient representatives as study consultants brings diverse perspectives, insights, and experiences to clinical trial design and execution, and their role in the clinical trial development process is being increasingly recognized and valued. The APPETIZE study evaluated the palatability of, and preference for, three potassium binders for treating hyperkalemia in patients with chronic kidney disease. A core aspect of the development of this study was the inclusion of a patient representative during the design stage. Here, I describe the process of patient involvement in the APPETIZE study design (ClinicalTrials.gov Identifier: NCT04566653), the resultant positive impacts, and key learnings. A patient with chronic kidney disease was invited to be a member of the APPETIZE trial design team. This patient representative attended study team meetings and provided invaluable input into protocol development, questionnaire selection, design of patient information sheets and consent forms, and primary manuscript structure. These critical insights resulted in an enhanced trial design and generation of high-quality, patient-relevant data. APPETIZE provides an excellent example of a patient preference study that relied on input from multiple stakeholder groups, including, most notably, the patients themselves. This approach may serve as a model for early and deep patient engagement in the design and interpretation of clinical trials.
ABSTRACT
With the rapid growth and widespread application of electronic health records (EHRs), similar patient retrieval has become an important task for downstream clinical decision support such as diagnostic reference, treatment planning, etc. However, the high dimensionality, large volume, and heterogeneity of EHRs pose challenges to the efficient and accurate retrieval of patients with similar medical conditions to the current case. Several previous studies have attempted to alleviate these issues by using hash coding techniques, improving retrieval efficiency but merely exploring underlying characteristics among instances to preserve retrieval accuracy. In this paper, drug categories of instances recorded in EHRs are regarded as the ground truth to determine the pairwise similarity, and we consider the abundant semantic information within such multi-labels and propose a novel framework named Graph-guided Deep Hashing Networks (GDHN). To capture correlation dependencies among the multi-labels, we first construct a label graph where each node represents a drug category, then a graph convolution network (GCN) is employed to derive the multi-label embedding of each instance. Thus, we can utilize the learned multi-label embeddings to guide the patient hashing process to obtain more informative and discriminative hash codes. Extensive experiments have been conducted on two datasets, including a real-world dataset concerning IgA nephropathy from Peking University First Hospital, and a publicly available dataset from MIMIC-III, compared with traditional hashing methods and state-of-the-art deep hashing methods using three evaluation metrics. The results demonstrate that GDHN outperforms the competitors at different hash code lengths, validating the superiority of our proposal.
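The retrieval step that compact hash codes make cheap can be sketched as follows. This is a minimal illustration and not GDHN itself: the codes are hand-written 8-bit integers rather than learned ones, the names `hamming`, `retrieve`, and `shares_label` are hypothetical, and treating two patients as similar when they share at least one drug category is an assumption about how the multi-label ground truth is applied.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database: dict, k: int) -> list:
    """Return the ids of the k patients whose codes are closest to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),  # ties broken by id
    )
    return [patient_id for patient_id, _ in ranked[:k]]


def shares_label(labels_a: set, labels_b: set) -> bool:
    """Assumed ground truth: similar if at least one drug category is shared."""
    return bool(labels_a & labels_b)


# Toy 8-bit codes (illustrative only); a real system would learn them from EHR features.
codes = {"p1": 0b10110010, "p2": 0b10110011, "p3": 0b01001101, "p4": 0b10010010}
labels = {
    "p1": {"corticosteroid"},
    "p2": {"corticosteroid", "ACE inhibitor"},
    "p3": {"antibiotic"},
    "p4": {"ACE inhibitor"},
}
query_code, query_labels = 0b10110000, {"corticosteroid"}

top2 = retrieve(query_code, codes, k=2)
# Fraction of retrieved patients that count as similar under the assumed ground truth.
precision_at_2 = sum(shares_label(query_labels, labels[p]) for p in top2) / len(top2)
```

Because the comparison is a bitwise XOR followed by a bit count, ranking cost depends on the code length rather than on the dimensionality of the original record.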
Subjects
Benchmarking , Electronic Health Records , Humans , Learning , Semantics
ABSTRACT
Patient representation learning aims to encode meaningful information about the patient's Electronic Health Records (EHR) in the form of a mathematical representation. Recent advances in deep learning have empowered patient representation learning methods with greater representational power, allowing the learned representations to significantly improve the performance of disease prediction models. However, the inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and inexplicability, limit further performance improvements of deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data is missing or insufficient. Although data augmentation techniques can tackle this deficiency, the complex data processing further weakens the explainability of patient representation learning models. To address the above challenges, this paper proposes an Explainable and Augmented Patient Representation Learning for disease prediction (EAPR). EAPR utilizes data augmentation controlled by a confidence interval to enhance patient representation in the presence of limited patient data. Moreover, EAPR proposes to use two-stage gradient backpropagation to address the problem of unexplainable patient representation learning models due to the complex data enhancement process. The experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.
ABSTRACT
BACKGROUND: The role of patient participation and representation during crises, such as the COVID-19 pandemic, has been under-researched. Existing studies paint a pessimistic picture of patient representation during the pandemic. However, there are indications that patient representatives have adapted to the new situation and can contribute to the resilience of healthcare systems. This paper aims to further explore the potential contribution of patient representatives for healthcare system resilience during the COVID-19 pandemic. METHODS: The study used a qualitative approach. We conducted a thematic analysis on the following data: interviews with client council members (n = 32) and representatives from patient organizations (n = 6) and focus groups (n = 2) to investigate patient representation on both the national policy level and organizational level in the Netherlands. RESULTS: We identified the crisis discourse, the dependent position, the diversity of patient perspectives and the layered decision-making structure as themes that help to understand what made patient representation in pandemic times a struggle for national and local patient representatives. The analysis of the subjects these representatives put forward during decision-making shows that their input can play an important role in broadening discussions, challenging decisions, and suggesting alternatives during a crisis. We identified several strategies (e.g., collaborating with other actors, proactively putting subjects on the policy agenda, finding new ways of contacting their 'constituency') used by the patient representatives studied to exert influence despite the difficulties encountered. CONCLUSIONS: The struggle for patient representation during pandemic decision-making is a missed opportunity for resilient healthcare systems as these representatives can play a role in opening up discussions and putting different perspectives to the fore. 
Moreover, the adaptive strategies used by representatives to influence decision-making offer lessons for future representation activities. However, adaptations to the crisis decision-making structure are also needed to enable patient representatives to play their role. PATIENT CONTRIBUTION: We conducted interviews with patient representatives and discussed our preliminary findings with patient representatives during the focus groups. Zorgbelang, a patient organization supporting client councils and enabling and organizing patient participation for organizations and municipalities, was a partner in this research and contributed to the interview guide and to conducting the interviews and focus groups. Additionally, the analysis made by the first author was discussed and refined multiple times with the partners of Zorgbelang, and one of them co-authored this paper.
ABSTRACT
OBJECTIVE: Describe the demographic profile of US participants in Amgen clinical trials over a 10-year period and variations across therapeutic areas, indications, and geographies. METHODS: Cross-sectional retrospective study including participants enrolled (2005-2020) in phase 1-3 trials completed between January 1, 2012 and June 30, 2021. RESULTS: Among 31,619 participants enrolled across 258 trials, one-fifth represented racial minority populations (Asian, 3%; Black or African American, 17%; American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, multiracial, each < 1%); fewer than one-fifth (16%) represented an ethnic minority population (Hispanic or Latino). Compared with census data, representation of racial and ethnic groups varied across US states. Across most therapeutic areas (bone, cardiovascular, hematology/oncology, inflammation, metabolic disorders, neuroscience) except nephrology, participants were predominantly White (72-81%). A similar proportion of males and females were enrolled between 2005 and 2016; male representation was disproportionately higher than female between 2016 and 2020. Across most medical indications, the majority of participants were 18-65 years of age. CONCLUSIONS AND RELEVANCE: While the clinical research community is striving to achieve diversity and proportional representation across clinical trials, certain populations remain underrepresented. Our data provide a baseline assessment of the diversity and representation of US participants in Amgen-sponsored clinical trials and add to a growing body of evidence on the importance of diversity in clinical research. These data provide a foundation for strategies aimed at supporting more equitable and representative research, and a baseline from which to assess the impact of future strategies to advance health equity.
ABSTRACT
The availability of large-scale electronic health record datasets has led to the development of artificial intelligence (AI) methods for clinical risk prediction that help improve patient care. However, existing studies have shown that AI models suffer from severe performance decay after several years of deployment, which might be caused by various temporal dataset shifts. When the shift occurs, we have access to large-scale pre-shift data and small-scale post-shift data that are not enough to train new models in the post-shift environment. In this study, we propose a new method to address the issue. We reweight patients from the pre-shift environment to mitigate the distribution shift between pre- and post-shift environments. Moreover, we adopt a Kullback-Leibler divergence loss to force the models to learn similar patient representations in pre- and post-shift environments. Our experimental results show that our model efficiently mitigates temporal shifts, improving prediction performance.
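The two ingredients named above, reweighting pre-shift patients and a Kullback-Leibler divergence penalty, can be illustrated in a few lines. This is a sketch under stated assumptions rather than the authors' method: the weights are simple density ratios over a hypothetical grouping variable, and the divergence is computed between two small discrete distributions instead of learned patient representations. The function names `kl_divergence` and `group_weights` are illustrative.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as aligned lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def group_weights(pre_groups, post_groups):
    """Density-ratio weight per group: post-shift frequency / pre-shift frequency."""
    pre, post = Counter(pre_groups), Counter(post_groups)
    n_pre, n_post = len(pre_groups), len(post_groups)
    return {g: (post.get(g, 0) / n_post) / (pre[g] / n_pre) for g in pre}


# Hypothetical example: the post-shift cohort is older than the pre-shift cohort.
pre_shift = ["young"] * 8 + ["old"] * 2
post_shift = ["young"] * 2 + ["old"] * 2
weights = group_weights(pre_shift, post_shift)  # up-weights "old" pre-shift patients

# Divergence between the two group distributions before reweighting.
shift_size = kl_divergence([0.8, 0.2], [0.5, 0.5])
```

After reweighting, the pre-shift sample matches the post-shift group frequencies, which is the sense in which the distribution shift is mitigated in this toy setting.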
ABSTRACT
OBJECTIVE: To represent a patient record with both time-invariant and time-varying features as a single vector using an end-to-end deep learning model, and further to predict the kidney failure (KF) status and mortality of heart failure (HF) patients. MATERIALS AND METHODS: The time-invariant EMR data included demographic information and comorbidities, and the time-varying EMR data were lab tests. We used a Transformer encoder module to represent the time-invariant data, and refined a long short-term memory (LSTM) with a Transformer encoder attached to the top to represent the time-varying data, taking the original measured values and their corresponding embedding vectors, masking vectors, and two types of time intervals as inputs. The proposed representations of patients with time-invariant and time-varying data were used to predict KF status (949 out of 5268 HF patients diagnosed with KF) and mortality (463 in-hospital deaths) for HF patients. Comparative experiments were conducted between the proposed model and some representative machine learning models. Ablation experiments were also performed around the time-varying data representation, including replacing the refined LSTM with the standard LSTM, GRU-D and T-LSTM, respectively, and removing the Transformer encoder and the time-varying data representation module, respectively. The visualization of the attention weights of the time-invariant and time-varying features was used to clinically interpret the predictive performance. We used the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the F1-score to evaluate the predictive performance of the models. RESULTS: The proposed model achieved superior performance, with average AUROCs, AUPRCs and F1-scores of 0.960, 0.610 and 0.759 for KF prediction and 0.937, 0.353 and 0.537 for mortality prediction, respectively. 
Predictive performance improved with the addition of time-varying data from longer time periods. The proposed model outperformed the comparison and ablation references in both prediction tasks. CONCLUSIONS: Both time-invariant and time-varying EMR data of patients could be efficiently represented by the proposed unified deep learning model, which shows higher performance in clinical prediction tasks. The approach to time-varying data used in the current study is expected to be applicable to other kinds of time-varying data and other clinical tasks.
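The masking vectors and time intervals mentioned as model inputs can be made concrete with a small sketch. The abstract does not define the two interval types, so the definitions below are assumptions: one interval measures the gap to the previous time step, the other the time elapsed since the variable was last actually observed (in the style of GRU-D). The function name `mask_and_intervals` and the sample lab values are hypothetical.

```python
def mask_and_intervals(times, values):
    """Build a masking vector and two interval vectors for one lab test.

    times  -- measurement times in hours, increasing
    values -- measured values, with None where the test was not taken
    """
    mask = [0 if v is None else 1 for v in values]

    # Assumed interval type 1: gap between consecutive time steps.
    step_gap = [0.0] + [times[i] - times[i - 1] for i in range(1, len(times))]

    # Assumed interval type 2: time since the last observed value.
    since_observed = []
    last_seen = None
    for t, v in zip(times, values):
        since_observed.append(0.0 if last_seen is None else t - last_seen)
        if v is not None:
            last_seen = t
    return mask, step_gap, since_observed


# Hypothetical creatinine series with two missing measurements.
hours = [0, 6, 12, 30]
creatinine = [1.1, None, 1.4, None]
mask, step_gap, since_observed = mask_and_intervals(hours, creatinine)
```

The mask tells the model which entries are real measurements, and the second interval grows while a variable stays unobserved, so the model can discount stale values.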
Subjects
Heart Failure , Machine Learning , Humans , Patients , Comorbidity , Prognosis , Heart Failure/diagnosis
ABSTRACT
BACKGROUND: Nowadays, patients with chronic diseases such as diabetes and hypertension have reached alarming numbers worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches. METHODS: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. In this work, we explore how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the evolution of the patient's health status, which is of paramount importance in the clinical setting. RESULTS: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin-dependent and non-insulin-dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients. 
CONCLUSION: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient's health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.
ABSTRACT
Parliamentary representation of persons living with dementia is important because it has the potential to enhance the lives of affected individuals. However, studies that take a closer look at the parliamentarization of dementia are few and far between. Drawing on recent advances in political science representation theory, this paper shows how parliamentary attention to persons living with dementia takes the form of representative claims. An interpretive analysis of 56 parliamentary documents revealed that German parliamentarians voiced what they considered to be the political wants and needs of persons living with dementia in different debates on a broad range of topics. They also created a political presence for persons living with dementia in representative claims about other groups - such as professional carers. Both types of claims were one-dimensional in nature: parliamentarians reduced persons living with dementia to their impairments, rendering them politically visible as vulnerable persons who need to be protected by the same society that their existence is putting a "burden" on. A more multi-dimensional political presence for persons living with dementia is required. In order to improve their dementia representation work, it is important that parliamentarians engage with persons living with dementia and understand how they wish to be engaged, in parliament or otherwise.
Subjects
Caregivers , Dementia , Humans , Politics , Qualitative Research
ABSTRACT
There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient's data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on autoencoder performance. The source code for generating pathways from health records is provided as open source.
Subjects
Deep Learning , Humans , Machine Learning , Computer Neural Networks
ABSTRACT
BACKGROUND: Patient advocacy organizations (PAOs) have an increasing influence on health policy and biomedical research; therefore, questions about the specific character of their responsibility arise: Can PAOs bear moral responsibility and, if so, to whom are they responsible, for what and on which normative basis? Although the concept of responsibility in healthcare is widely discussed, PAOs in particular have rarely been systematically analyzed as morally responsible agents. The aim of the current paper is to analyze the character of PAOs' responsibility to provide guidance to themselves and to other stakeholders in healthcare. METHODS: Responsibility is presented as a concept with four reference points: (1) The subject, (2) the object, (3) the addressee and (4) the underlying normative standard. This four-point relationship is applied to PAOs and the dimensions of collectivity and prospectivity are analyzed in each reference point. RESULTS: Understood as collectives, PAOs are, in principle, capable of intentionality and able to act and, thus, fulfill one prerequisite for the attribution of moral responsibility. Given their common mission to represent those affected, PAOs can be seen as responsible for patients' representation and advocacy, primarily towards a certain group but secondarily in a broader social context. Various legal and political statements and the bioethical principles of justice, beneficence and empowerment can be used as a normative basis for attributing responsibility to PAOs. CONCLUSIONS: The understanding of responsibility as a four-point relation incorporating collective and forward-looking dimensions helps one to understand the PAOs' roles and responsibilities better. The analysis, thus, provides a basis for the debate about PAOs' contribution and cooperation in the healthcare sector.
Subjects
Ethical Analysis , Patient Advocacy , Beneficence , Humans , Organizations , Social Justice , Social Responsibility
ABSTRACT
BACKGROUND: The secondary use of structured electronic medical record (sEMR) data has become a challenge due to the diversity, sparsity, and high dimensionality of the data representation. Constructing an effective representation for sEMR data is becoming more and more crucial for subsequent data applications. OBJECTIVE: We aimed to apply the embedding technique used in the natural language processing domain for the sEMR data representation and to explore the feasibility and superiority of the embedding-based feature and patient representations in clinical application. METHODS: The entire training corpus consisted of records of 104,752 hospitalized patients with 13,757 medical concepts of disease diagnoses, physical examinations and procedures, laboratory tests, medications, etc. Each medical concept was embedded into a 200-dimensional real number vector using the Skip-gram algorithm with some adaptive changes from shuffling the medical concepts in a record 20 times. The average of vectors for all medical concepts in a patient record represented the patient. For embedding-based feature representation evaluation, we used the cosine similarities among the medical concept vectors to capture the latent clinical associations among the medical concepts. We further conducted a clustering analysis on stroke patients to evaluate and compare the embedding-based patient representations. The Hopkins statistic, Silhouette index (SI), and Davies-Bouldin index were used for the unsupervised evaluation, and the precision, recall, and F1 score were used for the supervised evaluation. RESULTS: The dimension of patient representation was reduced from 13,757 to 200 using the embedding-based representation. The average cosine similarity of the selected disease (subarachnoid hemorrhage) and its 15 clinically relevant medical concepts was 0.973. Stroke patients were clustered into two clusters with the highest SI (0.852). 
Clustering analyses conducted on patients with the embedding representations showed higher applicability (Hopkins statistic 0.931), higher aggregation (SI 0.862), and lower dispersion (Davies-Bouldin index 0.551) than those conducted on patients with reference representation methods. The clustering solutions for patients with the embedding-based representation achieved the highest F1 scores of 0.944 and 0.717 for two clusters. CONCLUSIONS: The feature-level embedding-based representations can reflect the potential clinical associations among medical concepts effectively. The patient-level embedding-based representation is easy to use as continuous input to standard machine learning algorithms and can bring performance improvements. It is expected that the embedding-based representation will be helpful in a wide range of secondary uses of sEMR data.
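The patient-level representation described in the methods (the average of the vectors of all medical concepts in a record) and the cosine similarity used for evaluation can be sketched as follows. The 3-dimensional vectors and concept names are made up for illustration; the study used 200-dimensional Skip-gram embeddings, which are not reproduced here.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors (the patient representation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, not taken from the study.
embedding = {
    "subarachnoid hemorrhage": [1.0, 0.0, 0.0],
    "head CT": [0.0, 1.0, 0.0],
    "nimodipine": [1.0, 0.0, 0.0],
}

patient_a = mean_vector([embedding["subarachnoid hemorrhage"], embedding["head CT"]])
patient_b = mean_vector([embedding["subarachnoid hemorrhage"], embedding["nimodipine"]])
similarity = cosine(patient_a, patient_b)
```

Averaging keeps the patient vector at the embedding dimension regardless of how many concepts a record contains, which is how the representation shrinks from one dimension per concept to a fixed 200.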
ABSTRACT
OBJECTIVES: To measure the accuracy (trueness and precision) of a facial scanner depending on the alignment method and the digitized surface area location. METHODS: Fourteen markers were adhered on a head mannequin and digitized using an industrial scanner (GOM Atos Q 3D 12 M; Carl Zeiss Industrielle Messtechnik GmbH). A control mesh was acquired. Subsequently, the mannequin was digitized using a facial scanner (Arc4; Bellus3D) (n = 30). The control mesh was delineated into 10 areas. Based on the alignment procedures, two groups were created: reference best fit (RBF group) and landmark-based best fit (LA group). The root mean square was used to calculate the discrepancy between the control mesh and each facial scan. A 2-way ANOVA and Tukey pairwise comparison tests were used to compare trueness and precision between the 2 groups across 10 areas (α = .05). RESULTS: Both alignment algorithms (P = .007) and digitized area (P < .001) were significant predictors of trueness with a significant interaction between the two predictors (F (9, 580) =25.13, P < .001). Tukey pairwise comparison showed that there was a significant difference between mean trueness values of RBF (mean=0.53 mm) and LA (mean=0.55 mm) groups. Moreover, a significant difference was detected among the trueness values across surface areas. The A9-area (left tragus area) had the highest and A5-area (right cheek area) had the lowest mean trueness. Both alignment algorithm (P < .001) and digitized surface area (P < .001) were significant predictors of precision with a significant interaction between the two predictors (F (9, 580) =14.34, P < .001). Tukey pairwise comparison showed that there was a significant difference between mean precision values of RBF (mean=0.38 mm) and LA (mean=0.35 mm) groups. Moreover, a significant difference was detected among the precision values across surface areas. 
Comparing the surface areas, A9-area had the highest and A10-area (forehead area) had the lowest mean precision. CONCLUSIONS: Alignment procedures influenced the mean scanning trueness and precision values, but the facial scanner accuracy values obtained were within the clinically acceptable accuracy threshold of less than or equal to 2 mm. Furthermore, the scanning accuracy (for both trueness and precision) depended on the location of the scanned surface area, being more accurate in the middle of the face than on the sides of the face.
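The root mean square discrepancy used in the methods can be sketched as below. The abstract states only that RMS was computed between the control mesh and each scan; how precision was derived from those values is not given, so treating it as each scan's absolute deviation from the group's mean RMS is an assumption. The per-point deviations are invented numbers in millimetres.

```python
import math
from statistics import mean


def rms(deviations):
    """Root mean square of signed point-to-mesh deviations (mm)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical deviations for two repeated scans of one facial area.
scans = [
    [0.5, -0.5, 0.5],
    [0.6, -0.6, 0.6],
]

per_scan_rms = [rms(scan) for scan in scans]

# Trueness: average discrepancy from the control mesh.
trueness = mean(per_scan_rms)

# Precision (assumed definition): average spread of the scans around their own mean.
precision = mean(abs(value - trueness) for value in per_scan_rms)
```

Squaring before averaging means that positive and negative deviations do not cancel, so a scan that sits partly above and partly below the control surface is still penalized.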
Subjects
Dental Impression Technique , Dental Models , Algorithms , Computer-Aided Design , Three-Dimensional Imaging
ABSTRACT
For a patient to be effective as a "patient representative" within a health-related organization, real work is required, not just acceptance of an honorific title. I argue that being most effective as a patient representative requires different types of background knowledge and commitment than being a "patient advocate". Patients need to be cautious about how, when, and where they take on an official role of either "advocate" or "representative" if they truly want to be a positive influence on health outcomes.
Subjects
Patient Advocacy , Humans
ABSTRACT
The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most important, the multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robust. All these results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.
Subjects
Language , Natural Language Processing , Humans
ABSTRACT
OBJECTIVES: Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective. METHODS: We identified studies developing patient representations from EHRs with deep learning methods from MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. After screening 363 articles, 49 papers were included for a comprehensive data collection. RESULTS: Publications developing patient representations almost doubled each year from 2015 until 2019. We noticed a typical workflow starting with feeding raw data, applying deep learning models, and ending with clinical outcome predictions as evaluations of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 out of 49 studies). Recurrent Neural Networks were widely applied as the deep learning architecture (Long short-term memory: 13 studies, Gated recurrent unit: 11 studies). Learning was mainly performed in a supervised manner (30 studies) optimized with cross-entropy loss. Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns of EHR data, and code availability was assured in 20 studies. DISCUSSION & CONCLUSION: The existing predictive models mainly focus on the prediction of single diseases, rather than considering the complex mechanisms of patients from a holistic review. We show the importance and feasibility of learning comprehensive representations of patient EHR data through a systematic review. 
Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will still be devoted to leveraging the richness and potential of available EHR data. Reproducibility and transparency of reported results will hopefully improve. Knowledge distillation and advanced learning techniques will be exploited to assist the capability of learning patient representation further.
Subjects
Deep Learning , Electronic Health Records , Humans , Computer Neural Networks , Prognosis , Reproducibility of Results
ABSTRACT
BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series. However, this lack of statistics leads to a lack of information. In addition, using only maximum and minimum statistics to indicate patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series. METHODS: In this paper, we summarize 14 statistics describing the characteristics of physiological time series, including the central tendency, dispersion tendency, and distribution shape. Then, we evaluate the use of summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performances under different tasks, we use a cross-validation-based genetic algorithm to approximate the optimal statistical combination. RESULTS: By experiments using the EHRs of 6,927 patients, we obtained prediction results based on both single statistics and commonly used combinations of statistics under three clinical prediction tasks. Based on the results of an embedded cross-validation genetic algorithm, we obtained 25 optimal sets of statistical combinations and then tested their prediction results. By comparing the performances of prediction with single statistics and commonly used combinations of statistics with quantitative analyses of the optimal statistical combinations, we found that some statistics play central roles in patient representation and different prediction tasks have certain commonalities. CONCLUSION: Through an in-depth analysis of the results, we found many practical reference points that can provide guidance for subsequent related research. 
Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length of stay prediction tasks, and they also provide information for short-term mortality prediction. Mean and quantiles that reflect the central tendency of physiological time series are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used separately for prediction but can be used as supplementary statistics to improve the overall prediction effect.
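The kinds of summary statistics discussed above can be computed from a single physiological time series as in the sketch below. It covers only a subset chosen for illustration (the abstract does not list all 14 statistics), uses population moments for skewness and excess kurtosis, and relies on the default quartile method of Python's `statistics.quantiles`; the function name `summarize` and the heart-rate values are hypothetical.

```python
import math
from statistics import mean, median, quantiles


def summarize(series):
    """Central tendency, dispersion, and shape statistics for one time series."""
    n = len(series)
    mu = mean(series)
    # Central moments (population form) used for std, skewness, and kurtosis.
    m2 = sum((x - mu) ** 2 for x in series) / n
    m3 = sum((x - mu) ** 3 for x in series) / n
    m4 = sum((x - mu) ** 4 for x in series) / n
    q1, _, q3 = quantiles(series, n=4)
    return {
        # Dispersion tendency
        "min": min(series),
        "max": max(series),
        "range": max(series) - min(series),
        "std": math.sqrt(m2),
        # Central tendency
        "mean": mu,
        "median": median(series),
        "q1": q1,
        "q3": q3,
        # Distribution shape (undefined for a constant series)
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else float("nan"),
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan"),
    }


# Hypothetical heart-rate readings (beats per minute) from one ICU stay.
heart_rate = [72, 75, 71, 90, 110, 84, 78]
features = summarize(heart_rate)
```

Each patient then contributes one fixed-length feature vector per physiological variable, and a search procedure such as the genetic algorithm described above can select which of these entries to keep for a given prediction task.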