ABSTRACT
BACKGROUND: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions. METHODS: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives. RESULTS: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures. CONCLUSIONS: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
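For illustration, the combination space evaluated above can be enumerated directly in R; this is a minimal sketch of ours, not the authors' benchmark code, and the object names are assumptions:

```r
# Enumerate the 31 non-empty combinations of the five omics data types
blocks <- c("mRNA", "miRNA", "methylation", "DNAseq", "CNV")
combos <- unlist(lapply(1:5, function(k) combn(blocks, k, simplify = FALSE)),
                 recursive = FALSE)
length(combos)  # 31 = 2^5 - 1
```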
Subjects
Benchmarking, Genomics, Neoplasms, Humans, Neoplasms/genetics, Neoplasms/mortality, Survival Analysis, Prognosis, Multiomics
ABSTRACT
Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are generated increasingly often for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate for deriving such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and methods that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking the multi-omics structure into account have a slightly better prediction performance. Taking this structure into account can prevent the predictive information in low-dimensional groups, especially clinical variables, from being left unexploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on GitHub.
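A toy sketch of one cross-validation repetition with a concordance evaluation, assuming a data frame `dat` with columns `time`, `status` and covariates (Harrell-type concordance is shown; Uno's version uses inverse-probability-of-censoring time weights instead):

```r
library(survival)
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(dat)))
cidx <- numeric(5)
for (k in 1:5) {
  # Fit on the training folds, predict linear predictors on the test fold
  fit <- coxph(Surv(time, status) ~ ., data = dat[folds != k, ])
  lp  <- predict(fit, newdata = dat[folds == k, ], type = "lp")
  cidx[k] <- concordance(Surv(time, status) ~ lp, data = dat[folds == k, ],
                         reverse = TRUE)$concordance
}
mean(cidx)  # averaged over folds (and, in the study, over repetitions)
```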
Subjects
Benchmarking, Female, Humans, Machine Learning, Male, Neoplasms/genetics, Neoplasms/pathology, Proportional Hazards Models, Survival Analysis
ABSTRACT
Arterial blood pressure is one of the vital signs that must be monitored in anaesthetised patients. Even short episodes of intraoperative hypotension are associated with an increased risk of postoperative organ dysfunction such as acute kidney injury and myocardial injury. Since there is little evidence on whether higher alarm thresholds in patient monitors can help prevent intraoperative hypotension, we analysed blood pressure data before (group 1) and after (group 2) the implementation of altered hypotension alarm settings. The study was conducted as a retrospective observational cohort study in a large surgical centre with 32 operating theatres. Alarm thresholds for the mean arterial pressure (MAP) hypotension alarm were raised from 60 mmHg (before) to 65 mmHg for invasive measurement and 70 mmHg for noninvasive measurement. Blood pressure data from the electronic anaesthesia records of 4222 patients undergoing noncardiac surgery (1982 in group 1 and 2240 in group 2; 406,623 blood pressure values in total) were included. We analysed (A) the proportion of blood pressure measurements below the threshold among all measurements, using quasi-binomial regression, and (B) whether at least one blood pressure measurement below the threshold occurred, using logistic regression. Hypotension was defined as MAP < 65 mmHg. There was no significant difference in the overall proportion of hypotensive episodes before and after the adjustment of the alarm settings (the mean proportion of values below 65 mmHg was 6.05% in group 1 and 5.99% in group 2). The risk of ever experiencing a hypotensive episode during anaesthesia was significantly lower in group 2, with an odds ratio of 0.84 (p = 0.029). In conclusion, higher alarm thresholds do not generally lead to fewer hypotensive episodes perioperatively. There was a slight but significant reduction in the occurrence of intraoperative hypotension in the presence of higher thresholds for blood pressure alarms. However, this reduction only seems to be present in patients with very few hypotensive episodes.
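A hedged sketch of the two regression analyses, assuming a per-patient data frame `df` with columns `n_below` (MAP readings < 65 mmHg), `n_total` and `group` (these names are ours):

```r
# (A) proportion of readings below the threshold: quasi-binomial regression
fitA <- glm(cbind(n_below, n_total - n_below) ~ group,
            family = quasibinomial, data = df)
# (B) at least one reading below the threshold: logistic regression
fitB <- glm(I(n_below > 0) ~ group, family = binomial, data = df)
exp(coef(fitB))  # odds ratios, cf. the reported OR of 0.84 for group 2
```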
Subjects
Arterial Pressure, Hypotension, Humans, Arterial Pressure/physiology, Retrospective Studies, Postoperative Complications/diagnosis, Monitoring, Intraoperative/adverse effects, Hypotension/diagnosis, Hypotension/etiology, Cohort Studies, Blood Pressure
ABSTRACT
BACKGROUND: In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies have focused on feature selection methods for omics data, but to our knowledge, none has compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets, we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics. RESULTS: The results suggested that, first, the chosen number of selected features affects the predictive performance for many, but not all, feature selection methods. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but concurrent selection took more time for some methods. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when only a few selected features were considered. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods. CONCLUSIONS: We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, noting that mRMR is considerably more computationally costly.
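A minimal sketch of one of the recommended strategies, feature selection via the permutation importance of random forests; `train_df` with a factor outcome `y` and the cutoff of 20 features are assumptions of ours:

```r
library(ranger)
# Permutation importance from a random forest, then keep the top-ranked features
rf  <- ranger(y ~ ., data = train_df, importance = "permutation")
imp <- sort(rf$variable.importance, decreasing = TRUE)
selected <- names(imp)[1:20]  # e.g., retain the 20 most important features
```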
Subjects
Benchmarking, Neoplasms, Humans, Neoplasms/genetics, Support Vector Machine
ABSTRACT
BACKGROUND: Spontaneous bacterial peritonitis (SBP) is a serious complication in patients with liver cirrhosis. In recent years, it has been postulated that the rate of multidrug-resistant organisms (MDROs) is increasing, especially in nosocomial SBP patients. The aim of the present work was to investigate this hypothesis and its possible clinical consequences. MATERIALS AND METHODS: One hundred and three culture-positive patients treated between 2007 and 2014 were compared with 81 patients treated between 2015 and 2017 to study the change in microbiological profiles and its clinical consequences. Cirrhosis patients with bacterascites requiring treatment were included as well. RESULTS: The most prevalent Gram-negative bacteria isolated from ascites were Enterobacterales (31.6%); among Gram-positive pathogens, staphylococci (22.8%) were most common. There was a significant increase in MDROs (from 22.3% to 40.7%, P = .048), accompanied by an increased incidence of sepsis (from 21.4% to 37.0%, P = .021), hepatorenal syndrome (from 40.8% to 58.0%, P = .007) and the need for catecholamine therapy (from 21.4% to 38.8%, P = .036). Nosocomial origin correlated with a higher MDRO proportion, more complications and lower antimicrobial susceptibility rates for 12 commonly used antibiotics. MDROs were confirmed as an independent predictor of inpatient mortality and complications in multivariable logistic regression. CONCLUSIONS: The impression from clinical practice that MDROs have increased over the last 11 years was confirmed in our study in Munich, Germany. Nosocomial SBP correlated with significantly higher MDRO rates (nearly 50%) and complication rates. In our opinion, an antibiotic combination with comprehensive coverage should be considered for nosocomial SBP patients in this region.
Subjects
Bacterial Infections/microbiology, Cross Infection/microbiology, Drug Resistance, Multiple, Bacterial, Peritonitis/microbiology, Sepsis/microbiology, Aged, Ascites/epidemiology, Ascites/microbiology, Bacterial Infections/epidemiology, Bacterial Translocation, Catecholamines/therapeutic use, Cross Infection/epidemiology, Enterobacteriaceae Infections/epidemiology, Enterobacteriaceae Infections/microbiology, Enterococcus, Female, Germany/epidemiology, Gram-Negative Bacterial Infections/epidemiology, Gram-Negative Bacterial Infections/microbiology, Gram-Positive Bacterial Infections/epidemiology, Gram-Positive Bacterial Infections/microbiology, Hepatorenal Syndrome/epidemiology, Hospital Mortality, Humans, Liver Cirrhosis/epidemiology, Male, Microbial Sensitivity Tests, Middle Aged, Peritonitis/epidemiology, Renal Replacement Therapy, Respiration, Artificial/statistics & numerical data, Retrospective Studies, Sepsis/epidemiology, Staphylococcal Infections/epidemiology, Staphylococcal Infections/microbiology, Streptococcal Infections/epidemiology, Streptococcal Infections/microbiology, Vasoconstrictor Agents/therapeutic use
ABSTRACT
BACKGROUND: In recent years, more and more multi-omics data have become available, that is, data featuring measurements of several types of omics data for each patient. Using multi-omics data as covariate data in outcome prediction is both promising and challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to capture complex dependency patterns between the outcome and the covariates. Against this background, we developed five candidate random forest variants tailored to multi-omics covariate data. These variants modify the split point selection of random forest to incorporate the block structure of multi-omics data and can be applied to any outcome type for which a random forest variant exists, such as categorical, continuous and survival outcomes. Using 20 publicly available multi-omics data sets with survival outcome, we compared the prediction performance of the block forest variants with alternatives. We also considered the common special case of having clinical covariates and measurements of a single omics data type available. RESULTS: We identified one variant, termed "block forest", that outperformed all other approaches in the comparison study. In particular, it performed significantly better than standard random survival forest (adjusted p-value: 0.027). The two best performing variants have in common that the block choice is randomized in the split point selection procedure. In the case of having clinical covariates and a single omics data type available, the improvements of the variants over random survival forest were larger than in the case of the multi-omics data. The degree of improvement over random survival forest varied strongly across data sets. Moreover, making the inclusion of all clinical covariates mandatory improved the performance. This result should, however, be interpreted with caution, because the level of predictive information contained in clinical covariates depends on the specific application. CONCLUSIONS: The new prediction method block forest for multi-omics data can significantly improve the prediction performance of random forest and outperformed the alternatives in the comparison. Block forest is particularly effective for the special case of using clinical covariates in combination with measurements of a single omics data type.
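A hedged usage sketch based on the blockForest R package; the argument names are as we recall them from that package and should be treated as assumptions, as should the objects `X` (covariate matrix), `time`, `status` and `blocks` (a list of column-index vectors, one per omics block plus one for the clinical covariates):

```r
library(blockForest)
library(survival)
# Block forest on multi-omics blocks with a survival outcome (sketch)
bf <- blockfor(X, Surv(time, status), blocks = blocks,
               block.method = "BlockForest")
bf$forest       # the fitted forest
bf$paramvalues  # tuned block-specific tuning parameter values
```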
Subjects
Machine Learning, Genomics, Humans, Survival Analysis
ABSTRACT
Ideally, prediction rules should be published in such a way that readers can apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014-2015 in the field "medical and health sciences", with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is one way to help ensure that it is applicable; reproducibility is therefore also examined in our survey. The presented prediction rules were applicable in only 2 of the 30 identified papers, while for a further eight prediction rules it was possible to obtain the necessary information by contacting the authors. Various problems, such as nonresponse of the authors, hampered the applicability of the prediction rules in the other cases. Based on our experiences from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data, including the description of the considered studies, and all analysis code are available as supplementary materials.
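A minimal sketch of one way to make a random-forest prediction rule applicable, namely publishing the fitted object itself as a supplementary file; this illustrates the general idea, not a specific recommendation from the survey:

```r
library(ranger)
# Authors: fit the rule and ship the fitted object
rf <- ranger(Species ~ ., data = iris, probability = TRUE)
saveRDS(rf, "prediction_rule.rds")
# Readers: reload the rule and apply it to their own data
rf2 <- readRDS("prediction_rule.rds")
predict(rf2, data = iris[1:3, ])$predictions
```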
Subjects
Biometry/methods, Medicine, Science, Software
ABSTRACT
BACKGROUND: The inclusion of high-dimensional omics data in prediction models has become a well-studied topic in recent decades. However, most of these methods do not account for the possibility that the covariates available in the same dataset are of different types, even though in many scenarios the variables can be structured in blocks of different types, e.g., clinical, transcriptomic, and methylation data. To date, only a few computationally intensive approaches exist that make use of block structures of this kind. RESULTS: In this paper we present priority-Lasso, an intuitive and practical analysis strategy for building Lasso-based prediction models that takes such block structures into account. It requires the definition of a priority order for the blocks of data. Lasso models are fitted successively for every block, and the fitted values of each step are included as an offset in the fit of the next step. We apply priority-Lasso in different settings to an acute myeloid leukemia (AML) dataset consisting of clinical variables, cytogenetics, gene mutations and expression variables, and compare its performance on an independent validation dataset to that of standard Lasso models. CONCLUSION: The results show that priority-Lasso keeps pace with Lasso in terms of prediction accuracy. Variables from blocks with higher priority are favored over variables from blocks with lower priority, which results in easily usable and transportable models for clinical practice.
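A hedged sketch of the priority-Lasso principle with two blocks, written directly with glmnet to make the offset mechanics visible (the prioritylasso R package wraps this logic); `X_clin`, `X_omics` and the outcome `y` are assumed objects of ours:

```r
library(glmnet)
# Block 1 (clinical data, highest priority): ordinary cross-validated Lasso
cv1  <- cv.glmnet(X_clin, y)
off1 <- predict(cv1, newx = X_clin, s = "lambda.min")
# Block 2 (omics data): the fitted values of block 1 enter as an offset,
# so omics variables only explain what the clinical block left unexplained
cv2  <- cv.glmnet(X_omics, y, offset = off1)
# (prediction on new data would pass newoffset = predict(cv1, newx = X_clin_new))
```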
Subjects
Genomics/methods, Software, Humans, Kaplan-Meier Estimate, Leukemia, Myeloid, Acute/genetics, Reproducibility of Results, Risk Factors, Treatment Outcome
ABSTRACT
Motivation: To date, most medical tests derived by applying classification methods to high-dimensional molecular data are hardly used in clinical practice. This is partly because the prediction error observed when applying them to external data is usually much higher than the internal error estimated through within-study validation procedures. We suggest the use of addon normalization and addon batch effect removal techniques in this context to reduce systematic differences between external data and the original dataset, with the aim of improving prediction performance. Results: We evaluate the impact of addon normalization and seven batch effect removal methods on cross-study prediction performance for several common classifiers using a large collection of microarray gene expression datasets, showing that some of these techniques reduce prediction error. Availability and Implementation: All investigated addon methods are implemented in our R package bapred. Contact: hornung@ibe.med.uni-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.
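A toy illustration of the addon principle (this is not the bapred API): the transformation parameters are estimated on the original data only and then re-applied, frozen, to the external data so that both lie on a common scale; `X_train` and `X_ext` are assumed matrices:

```r
# Estimate normalization parameters on the original (training) data only
ctr <- colMeans(X_train)
scl <- apply(X_train, 2, sd)
X_train_n <- scale(X_train, center = ctr, scale = scl)
# "Addon" application: reuse the frozen training parameters on external data
X_ext_n   <- scale(X_ext, center = ctr, scale = scl)
```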
Subjects
Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Predictive Value of Tests, Research Design, Algorithms, Datasets as Topic, Humans, Sequence Analysis, RNA
ABSTRACT
BACKGROUND: In patients undergoing general anaesthesia, intraoperative hypotension occurs frequently and is associated with adverse outcomes such as postoperative acute kidney failure, myocardial infarction or stroke. A history of chronic hypertension renders patients more susceptible to a decrease in blood pressure (BP) after induction of general anaesthesia. As a patient's BP is generally monitored intermittently via an upper arm cuff, there may be a delay in the detection of hypotension by the anaesthetist. OBJECTIVE: The current study investigates whether the presence of continuous BP monitoring leads to improved BP stability. DESIGN: Randomised, controlled and single-centre study. PATIENTS: A total of 160 orthopaedic patients undergoing general anaesthesia with a history of chronic hypertension. INTERVENTION: The patients were randomised to either a study group (n = 77) that received continuous non-invasive BP monitoring in addition to oscillometric intermittent monitoring, or a control group (n = 83) whose BP was monitored intermittently only. The interval for oscillometric measurements in both groups was set to 3 min. After induction of general anaesthesia, oscillometric BP values of the two groups were compared for the first hour of the procedure. Anaesthetists were blinded to the purpose of the study. MAIN OUTCOME MEASURE: BP stability and hypotensive events. RESULTS: There was no difference in baseline BP between the groups. After adjustment for multiple testing, mean arterial BP in the study group was significantly higher than in the control group at 12 and 15 min. Mean ± SD for the study and control group, respectively, were: 12 min, 102 ± 24 vs. 90 ± 26 mmHg (P = 0.039) and 15 min, 102 ± 21 vs. 90 ± 23 mmHg (P = 0.023). Hypotensive readings below a mean pressure of 55 mmHg occurred more often in the control group (25 vs. 7, P = 0.047). CONCLUSION: Continuous monitoring contributes to BP stability in the studied population. TRIAL REGISTRATION: NCT02519101.
Subjects
Anesthesia, General/methods, Blood Pressure Determination/methods, Blood Pressure/physiology, Monitoring, Intraoperative/methods, Orthopedic Procedures/methods, Aged, Anesthesia, General/adverse effects, Anesthesia, General/trends, Blood Pressure Determination/trends, Female, Humans, Male, Middle Aged, Monitoring, Intraoperative/trends, Orthopedic Procedures/adverse effects, Orthopedic Procedures/trends, Prospective Studies
ABSTRACT
BACKGROUND: In the context of high-throughput molecular data analysis, it is common that the observations included in a dataset form distinct groups, for example, because they were measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches that are not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred, available online from CRAN. RESULTS: FAbatch is competitive in many cases and above average in others. In our analyses, the only cases in which it failed to adequately preserve the biological signal were those with extremely outlying batches and those in which the batch effects were very weak compared to the biological signal. CONCLUSIONS: As seen in this paper, the batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions that can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.
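A toy sketch of the location-and-scale component of such an adjustment (the latent-factor cleaning step of FAbatch is omitted here); this is our illustration of the concept, not the bapred implementation:

```r
# Center and scale each feature within each batch
# X: numeric matrix (samples x features), batch: factor of batch labels
adjust_ls <- function(X, batch) {
  for (b in levels(batch)) {
    idx <- batch == b
    X[idx, ] <- scale(X[idx, ])  # per-batch location-and-scale adjustment
  }
  X
}
```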
Subjects
Computational Biology, Datasets as Topic, Humans
ABSTRACT
BACKGROUND: In applications of supervised statistical learning in the biomedical field, it is necessary to assess the prediction error of the respective prediction rules. Often, data preparation steps are performed on the dataset in its entirety before training/test-set-based prediction error estimation by cross-validation (CV), an approach referred to as "incomplete CV". Whether incomplete CV can result in an optimistically biased error estimate depends on the data preparation step under consideration. Several empirical studies have investigated the extent of bias induced by performing preliminary supervised variable selection before CV. To our knowledge, however, the potential bias induced by other data preparation steps has not yet been examined in the literature. In this paper we investigate this bias for two common data preparation steps: normalization and principal component analysis for dimension reduction of the covariate space (PCA). Furthermore, we obtain preliminary results for the following steps: optimization of tuning parameters, variable filtering by variance and imputation of missing values. METHODS: We devise the easily interpretable and general measure CVIIM ("CV Incompleteness Impact Measure") to quantify the extent of bias induced by incomplete CV with respect to a data preparation step of interest. This measure can be used to determine whether a specific data preparation step should, as a general rule, be performed in each CV iteration or whether an incomplete CV procedure would be acceptable in practice. We apply CVIIM to large collections of microarray datasets to answer this question for normalization and PCA. RESULTS: Performing normalization on the entire dataset before CV did not result in a noteworthy optimistic bias in any of the investigated cases. In contrast, when performing PCA before CV, medium to strong underestimates of the prediction error were observed in multiple settings. CONCLUSIONS: While the investigated forms of normalization can be safely performed before CV, PCA has to be performed anew in each CV split to protect against optimistic bias.
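A hedged sketch contrasting the two PCA variants for a single CV split; `X`, `train_idx` and `test_idx` are assumed objects of ours:

```r
# Incomplete CV: PCA on the entire dataset before splitting (optimism risk)
pc_all <- prcomp(X)$x[, 1:10]
# Full CV: PCA re-estimated on the training fold only, then applied to the
# held-out fold, as the paper's conclusion requires
pc_tr <- prcomp(X[train_idx, ])
Z_tr  <- pc_tr$x[, 1:10]
Z_te  <- predict(pc_tr, newdata = X[test_idx, ])[, 1:10]
```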
Subjects
Data Interpretation, Statistical, Principal Component Analysis, Regression Analysis, Selection Bias, Algorithms, Humans, Oligonucleotide Array Sequence Analysis
ABSTRACT
The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for l = 1, ..., nsplits: (1) sample one split problem; (2) sample a single split or a few splits from the split problem sampled in (1) and add this split or these splits to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tractable while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. The results show that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42979-021-00920-1.
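A hedged usage sketch based on the diversityForest R package; we assume the function `divfor()` with an `nsplits` argument as in that package, so treat the exact interface as an assumption:

```r
library(diversityForest)
# Diversity forest with univariable, binary splitting on a toy dataset;
# nsplits controls the size of the candidate split set described above
fit <- divfor(Species ~ ., data = iris, nsplits = 30, num.trees = 500)
```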
ABSTRACT
Lung adenocarcinoma (LUAD) is a common and very lethal cancer. Accurate staging is a prerequisite for its effective diagnosis and treatment. Therefore, improving the accuracy of stage prediction for LUAD patients is of great clinical relevance. Previous works have mainly focused on single genomic data types or on a small number of different omics data types concurrently for generating predictive models. A few of them have considered multi-omics data spanning genome to proteome. We used a publicly available dataset to illustrate the potential of multi-omics data for stage prediction in LUAD. In particular, we investigated the roles of the specific omics data types in the prediction process. We used a self-developed method, Omics-MKL, for stage prediction that combines an existing feature ranking technique, Minimum Redundancy and Maximum Relevance (mRMR), which avoids redundancy among the selected features, with multiple kernel learning (MKL), which applies different kernels to different omics data types. Each of the considered omics data types individually provided useful prediction results. Moreover, using multi-omics data delivered notably better results than using single-omics data. Gene expression and methylation information seem to play vital roles in the staging of LUAD. The Omics-MKL method retained 70 features after the selection process. Of these, 21 (30%) were methylation features and 34 (48.57%) were gene expression features. Moreover, 18 (25.71%) of the selected features are known to be related to LUAD, and 29 (41.43%) to lung cancer in general. Using multi-omics data spanning genome to proteome for predicting the stage of LUAD seems promising because each omics data type may improve the accuracy of the predictions. Here, methylation and gene expression data may play particularly important roles.
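A hedged sketch of the mRMR ranking step using the mRMRe R package (the MKL step is omitted); the objects `stage` and `X` and the exact argument names are assumptions of ours:

```r
library(mRMRe)
# mRMR feature ranking: outcome in column 1, omics features in the rest
dd  <- mRMR.data(data = data.frame(stage = as.numeric(stage), X))
sel <- mRMR.classic(data = dd, target_indices = 1, feature_count = 70)
solutions(sel)  # indices of the 70 retained features, cf. the abstract
```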
Subjects
Adenocarcinoma of Lung/genetics, Lung Neoplasms/genetics, Adenocarcinoma of Lung/pathology, Biomarkers, Tumor/genetics, DNA Copy Number Variations/genetics, DNA Methylation/genetics, Female, Gene Expression/genetics, Gene Expression Profiling/methods, Gene Expression Regulation, Neoplastic/genetics, Genomics/methods, Humans, Lung Neoplasms/pathology, Male, Middle Aged, Prognosis
ABSTRACT
Given that a substantial proportion of the subgroup of COVID-19 patients who face a severe disease course are younger than 60 years, it is critical to understand the disease-specific characteristics of young COVID-19 patients. Risk factors for a severe disease course in young COVID-19 patients, and possible non-linear influences, remain unknown. Data from COVID-19 patients with a recorded clinical outcome in a single hospital in Wuhan, China, collected retrospectively from January 24 to March 27, 2020, were analyzed. Clinical, demographic, treatment and laboratory data were collected from the patients' medical records. Uni- and multivariable analyses using logistic regression and random forest, with the latter allowing the study of non-linear influences, were performed to investigate the clinical characteristics of a severe disease course. A total of 762 young patients (median age 47 years, interquartile range [IQR] 38-55, range 18-60; 55.9% female) were included, as well as 714 elderly patients as a comparison group. Among the young patients, 362 (47.5%) had a severe/critical disease course, and the mean age was statistically significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). The uni- and multivariable analyses suggested that several covariates, such as elevated levels of serum amyloid A (SAA), C-reactive protein (CRP) and lactate dehydrogenase (LDH), and decreased lymphocyte counts, influence disease severity independently of age. Elevated levels of complement C3 (odds ratio [OR] 15.6, 95% CI 2.41-122.3; p = 0.039) are particularly associated with the risk of developing severe COVID-19 specifically in young patients, whereas no such influence seems to exist for elderly patients. Additional analysis suggests that the influence of complement C3 in young patients is independent of age, gender, and comorbidities. Variable importance values and partial dependence plots obtained using random forests delivered additional insights, in particular indicating non-linear influences of risk factors on disease severity. This study identified increased levels of complement C3 as a unique risk factor for adverse outcomes specific to young COVID-19 patients.
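A hedged sketch of how variable importance values and a partial dependence plot can be obtained with a random forest; the data frame `dat`, the factor outcome `severe` and the variable name `C3` are hypothetical stand-ins of ours:

```r
library(randomForest)
# Random forest for disease severity, then importance and partial dependence
rf <- randomForest(severe ~ ., data = dat, importance = TRUE)
importance(rf)                                  # variable importance values
partialPlot(rf, pred.data = dat, x.var = "C3")  # possibly non-linear C3 effect
```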
Subjects
COVID-19/blood, Complement C3/analysis, Adolescent, Adult, Area Under Curve, COVID-19/immunology, China/epidemiology, Female, Humans, Male, Middle Aged, Models, Statistical, Multivariate Analysis, Nonlinear Dynamics, Retrospective Studies, Risk Factors, Severity of Illness Index, Young Adult
ABSTRACT
Background: Kawasaki disease (KD) is the leading cause of acquired heart disease in children. However, distinguishing KD from febrile infections early in the disease course remains difficult. Our goal was to estimate the immune cell composition in KD patients and febrile controls (FC), and to develop a tool for KD diagnosis. Methods: We used a machine-learning algorithm, CIBERSORT, to estimate the proportions of 22 immune cell types based on blood samples from children with KD and FC. Using these immune cell compositions, a diagnostic score for predicting KD was then constructed based on LASSO regression for binary outcomes. Results: In the training set (n = 496), a model was fit which consisted of eight types of immune cells. The area under the curve (AUC) values for diagnosing KD in a held-out test set (n = 212) and an external validation set (n = 36) were 0.80 and 0.77, respectively. The most common cell types in KD blood samples were monocytes, neutrophils, naïve CD4+ T cells, CD8+ T cells, and M0 macrophages. The diagnostic score was highly correlated with genes that had previously been reported as associated with KD, such as interleukins and chemokine receptors, and was enriched in reported pathways, such as the IL-6/JAK/STAT3 and TNFα signaling pathways. Conclusion: Altogether, the diagnostic score for predicting KD could potentially serve as a biomarker. Prospective studies could evaluate how incorporating the diagnostic score into a clinical algorithm would further improve diagnostic accuracy.
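A hedged sketch of the score construction: a LASSO logistic regression on estimated cell proportions; `P` (an n x 22 proportion matrix), `y`, `P_test` and `y_test` are assumed objects of ours:

```r
library(glmnet)
# Diagnostic score from immune-cell proportions via cross-validated LASSO
cvfit <- cv.glmnet(P, y, family = "binomial")
score <- predict(cvfit, newx = P_test, s = "lambda.min", type = "response")
pROC::auc(y_test, as.vector(score))  # cf. the reported AUCs of 0.80 and 0.77
```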
ABSTRACT
OBJECTIVES: Venoarterial extracorporeal life support (ECLS) has emerged as a potentially life-saving treatment option in therapy-refractory cardiocirculatory failure, but the longer-term outcome is poorly defined. Here, we present a comprehensive follow-up analysis covering all major organ systems. METHODS: From February 2012 to December 2016, 180 patients were treated with ECLS for therapy-refractory cardiogenic shock or cardiac arrest. The 30-day survival rate was 43.9%, and the 30-day survivors (n = 79) underwent follow-up analysis with assessment of medium-term survival, quality of life, and neuropsychological, cardiopulmonary and end-organ status. RESULTS: After a median of 1.9 (1.1-3.6) years (182.4 patient-years), 45 of the 79 patients (57.0%) were alive, 35.4% had died and 7.6% were lost to follow-up. Follow-up survival estimates were 78.0% at 1, 61.2% at 3 and 55.1% at 5 years. NYHA class at follow-up was ≤II for 83.3%. The median creatinine was 1.1 (1.0-1.4) mg/dl, and the median bilirubin was 0.8 (0.5-1.0) mg/dl. No patient required dialysis. Overall, 94.4% were free from moderate or severe disability, although 11.1% needed care. Full re-integration into social life was reported by 58.3%, and 39.4% were working. Quality of life was favourable for mental components, but a subset showed deficits in physical aspects. While age was the only peri-implantation parameter that significantly predicted medium-term survival, adverse events and functional status at discharge or at 30 days were strong predictors. CONCLUSIONS: This study demonstrates a positive medium-term outcome with high rates of independence in daily life and self-care, although a subset of 10-20% suffered from sustained impairments. Our results indicate that peri-implantation parameters lack predictive power, but downstream morbidity and functional status at discharge or at 30 days can help identify patients at risk of poor recovery.
Subjects
Extracorporeal Membrane Oxygenation, Extracorporeal Membrane Oxygenation/adverse effects, Heart Arrest/diagnosis, Heart Arrest/therapy, Humans, Quality of Life, Retrospective Studies, Shock, Cardiogenic, Treatment Outcome
ABSTRACT
In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods, and how do we judge whether the chosen method is appropriate for our specific study? As in any science, experiments can be run in statistics to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analyses of health data, who (1) may rely on simulation studies published in the statistical literature to choose their statistical methods and who thus need to understand the criteria for assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to collaborate efficiently with more experienced colleagues or to start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R script available from online supplemental file 1.
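A toy simulation study in the spirit of the paper, though not the paper's own example: synthetic data with a known truth (no group difference) are used to check the empirical type-I error of the t-test under skewed data:

```r
set.seed(1)
# 10 000 simulated datasets: two exponential samples with identical means
reject <- replicate(10000, {
  x <- rexp(20); y <- rexp(20)
  t.test(x, y)$p.value < 0.05
})
mean(reject)  # empirical type-I error rate; compare with the nominal 0.05
```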
Subjects
Learning, Research Design, Computer Simulation, Humans, Researchers
ABSTRACT
BACKGROUND: Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for the prognosis of radio-therapeutic success. The model development process requires two independent discovery and validation data sets. Each of them may contain samples collected in a single center or a collection of samples from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Optimal use of limited data resources for discovery and validation, with respect to the expected success of a study, requires dispassionate, objective decision-making. In this work, we addressed the impact of the choice of single-center and multi-center data as discovery and validation data sets, and assessed how this impact depends on three data characteristics: signal strength, number of informative features and sample size. METHODS: We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatical analysis workflow of batch correction, feature selection and parameter estimation was emulated. For the determination of model quality, four measures were used: false discovery rate, prediction error, chance of successful validation (significant correlation of predicted and true validation data outcome) and model calibration. RESULTS: In agreement with the literature on the generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when the prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to the false discovery rate and the chance of successful validation. CONCLUSIONS: With regard to decision making, this simulation study underlines the importance of defining study aims precisely a priori. Minimization of the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to the false discovery rate and the chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data solely affects the quality of the estimator of the prediction error, which was more precise on multi-center validation data.
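A hedged sketch of generating in silico multi-center data with site-specific shifts, in the spirit of the described simulation; all parameter values are illustrative choices of ours, not the study's settings:

```r
set.seed(1)
n <- 200; p <- 1000
center <- rep(1:4, each = n / 4)        # four centers of equal size
shift  <- rnorm(4, sd = 0.5)            # additive site-specific batch effect
X <- matrix(rnorm(n * p), n, p) + shift[center]
beta <- c(rep(0.5, 10), rep(0, p - 10)) # 10 informative features
y <- drop(X %*% beta) + rnorm(n)        # continuous outcome with known signal
```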
Subjects
Computational Biology/methods, Computer Simulation, Gene Expression Profiling/methods, Multicenter Studies as Topic, Neoplasms/radiotherapy, Humans, Prognosis, Radiation Tolerance/genetics
ABSTRACT
BACKGROUND/AIMS: Spontaneous bacterial peritonitis (SBP) is a life-threatening complication of advanced cirrhosis. By studying the susceptibility of the isolated organisms and analyzing empirical antibiotic therapy combined with clinical outcomes, we aimed to find an improved empirical antibiotic therapy that takes the individual acute-on-chronic liver failure (ACLF) grade into account for patients with or without sepsis. METHODS: Clinical outcomes of 182 patients were assessed retrospectively with multivariable regression analysis. Each of the 223 isolates was individually evaluated regarding susceptibility results and intrinsic resistances. RESULTS: Piperacillin/tazobactam had the highest antimicrobial susceptibility among monotherapies/fixed combinations, but this was still significantly lower than for combination therapies such as meropenem-linezolid (75.3% vs. 98.5%, P < 0.001). Sensitivity of the pathogens to the empirical antibiotic therapy correlated with significantly lower inpatient mortality (18.9% vs. 37.0%, P = 0.018), a shorter inpatient stay (16.3 ± 10.2 vs. 26.4 ± 21.0 days, P = 0.053) and shorter intensive care treatment (2.1 ± 4.5 vs. 7.9 ± 15.4 days, P = 0.016). The largest difference in mortality was observed in patients with ACLF grade 3 (54.5% vs. 73.1% [sensitive vs. non-sensitive]). CONCLUSION: All SBP patients benefited from efficient empirical antibiotic therapy, in terms of reduced inpatient mortality and complications. For SBP patients with ACLF grade 3 without sepsis, combination therapy with meropenem-linezolid may be suitable considering the susceptibility results and the concentration in the peritoneal cavity.