Search | BVS IEC

1.

Large-scale benchmark study of survival prediction methods using multi-omics data.

Herrmann, Moritz; Probst, Philipp; Hornung, Roman; Jurinovic, Vindi; Boulesteix, Anne-Laure.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-32823283

ABSTRACT

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups, especially clinical variables, from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
All analyses are reproducible using R code freely available on Github.

Subjects

Benchmarking , Female , Humans , Machine Learning , Male , Neoplasms/genetics , Neoplasms/pathology , Proportional Hazards Models , Survival Analysis

2.

Do higher alarm thresholds for arterial blood pressure lead to less perioperative hypotension? A retrospective, observational cohort study.

Meidert, Agnes S; Hornung, Roman; Christmann, Tina; Aue, Elisa; Dahal, Chetana; Dolch, Michael E; Briegel, Josef.

J Clin Monit Comput ; 37(1): 275-285, 2023 02.

Article in English | MEDLINE | ID: mdl-35796851

ABSTRACT

Arterial blood pressure is one of the vital signs monitored mandatory in anaesthetised patients. Even short episodes of intraoperative hypotension are associated with increased risk for postoperative organ dysfunction such as acute kidney injury and myocardial injury. Since there is little evidence whether higher alarm thresholds in patient monitors can help prevent intraoperative hypotension, we analysed the blood pressure data before (group 1) and after (group 2) the implementation of altered hypotension alarm settings. The study was conducted as a retrospective observational cohort study in a large surgical centre with 32 operating theatres. Alarm thresholds for hypotension alarm for mean arterial pressure (MAP) were altered from 60 (before) to 65 mmHg for invasive measurement and 70 mmHg for noninvasive measurement. Blood pressure data from electronic anaesthesia records of 4222 patients (1982 and 2240 in group 1 and 2, respectively) with 406,623 blood pressure values undergoing noncardiac surgery were included. We analysed (A) the proportion of blood pressure measurements below the threshold among all measurements by quasi-binomial regression and (B) whether at least one blood pressure measurement below the threshold occurred by logistic regression. Hypotension was defined as MAP < 65 mmHg. There was no significant difference in overall proportions of hypotensive episodes for mean arterial pressure before and after the adjustment of alarm settings (mean proportion of values below 65 mmHg were 6.05% in group 1 and 5.99% in group 2). The risk of ever experiencing a hypotensive episode during anaesthesia was significantly lower in group 2 with an odds ratio of 0.84 (p = 0.029). In conclusion, higher alarm thresholds do not generally lead to less hypotensive episodes perioperatively. There was a slight but significant reduction of the occurrence of intraoperative hypotension in the presence of higher thresholds for blood pressure alarms. 
However, this reduction only seems to be present in patients with very few hypotensive episodes.

Subjects

Arterial Pressure , Hypotension , Humans , Arterial Pressure/physiology , Retrospective Studies , Postoperative Complications/diagnosis , Intraoperative Monitoring/adverse effects , Hypotension/diagnosis , Hypotension/etiology , Cohort Studies , Blood Pressure

3.

Benchmark study of feature selection strategies for multi-omics data.

Li, Yingxia; Mansmann, Ulrich; Du, Shangming; Hornung, Roman.

BMC Bioinformatics ; 23(1): 412, 2022 Oct 05.

Article in English | MEDLINE | ID: mdl-36199022

ABSTRACT

BACKGROUND: In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies focused on feature selection methods for omics data, but to our knowledge, none compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics. RESULTS: The results suggested that, first, the chosen number of selected features affects the predictive performance for many feature selection methods but not all. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but for some methods, concurrent selection took more time. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when considering only a few selected features. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods. 
CONCLUSIONS: We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, where, however, mRMR is considerably more computationally costly.
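The three performance metrics named in this abstract (accuracy, AUC and Brier score) can be illustrated with a minimal Python sketch. It is not the study's code; the function names, the 0.5 classification threshold and the toy data are assumptions made for illustration only.

```python
# Illustrative sketch only: plain-Python versions of accuracy, AUC and the
# Brier score for a binary outcome. Names and toy data are hypothetical.

def accuracy(y_true, p_hat, threshold=0.5):
    """Share of observations whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_hat))
    return hits / len(y_true)

def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def auc(y_true, p_hat):
    """Probability that a random positive is ranked above a random negative
    (ties count one half)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (assumed values).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y, p), auc(y, p), brier_score(y, p))
```

Accuracy depends on the chosen threshold, whereas AUC and the Brier score use the predicted probabilities directly, which is one reason benchmark studies often report all three.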

Subjects

Benchmarking , Neoplasms , Humans , Neoplasms/genetics , Support Vector Machine

4.

Patients with cirrhosis and SBP: Increase in multidrug-resistant organisms and complications.

Li, Hanwei; Wieser, Andreas; Zhang, Jiang; Liss, Ingrid; Markwardt, Daniel; Hornung, Roman; Neumann-Cip, Anna C; Mayerle, Julia; Gerbes, Alexander; Steib, Christian J.

Eur J Clin Invest ; 50(2): e13198, 2020 Feb.

Article in English | MEDLINE | ID: mdl-31886517

ABSTRACT

BACKGROUND: Spontaneous bacterial peritonitis (SBP) is a serious complication in patients with liver cirrhosis. In recent years, it has been postulated that the rate of multidrug-resistant organisms (MDROs) is increasing, especially in nosocomial SBP patients. Aim of the present work was to investigate this hypothesis and its possible clinical consequences. MATERIALS AND METHODS: One hundred and three culture-positive patients between 2007 and 2014 were compared with 81 patients between 2015 and 2017, to study the change of microbiological profiles and their clinical consequences. The cirrhosis patients with bacterascites requiring treatment were included as well. RESULTS: The most prevalent Gram-negative bacteria isolated from ascites were Enterobacterales (31.6%) and in Gram-positive pathogens Staphylococci (22.8%). There was a significant increase in MDROs (22.3% vs 40.7%, P = .048), accompanied by an increased incidence of sepsis (from 21.4% to 37.0%, P = .021), hepatorenal syndrome (from 40.8% to 58.0%, P = .007) and the need of catecholamine therapy (from 21.4% to 38.8%, P = .036). Nosocomial origin correlated with higher MDRO proportion, more complications and lower antimicrobial susceptibility rates in 12 commonly used antibiotics. MDROs were confirmed as an isolated predictor for inpatient mortality and complications in multivariable logistic regression. CONCLUSIONS: The feeling in clinical practice that MDROs have increased in the last 11 years could be confirmed in our study in Munich, Germany. Nosocomial SBP correlated with significantly higher MDRO rates (nearly 50%) and complication rates. In our opinion, an antibiotic combination with comprehensive effect should be taken into account in nosocomial SBP patients in this region.

Subjects

Bacterial Infections/microbiology , Cross Infection/microbiology , Multiple Bacterial Drug Resistance , Peritonitis/microbiology , Sepsis/microbiology , Aged , Ascites/epidemiology , Ascites/microbiology , Bacterial Infections/epidemiology , Bacterial Translocation , Catecholamines/therapeutic use , Cross Infection/epidemiology , Enterobacteriaceae Infections/epidemiology , Enterobacteriaceae Infections/microbiology , Enterococcus , Female , Germany/epidemiology , Gram-Negative Bacterial Infections/epidemiology , Gram-Negative Bacterial Infections/microbiology , Gram-Positive Bacterial Infections/epidemiology , Gram-Positive Bacterial Infections/microbiology , Hepatorenal Syndrome/epidemiology , Hospital Mortality , Humans , Liver Cirrhosis/epidemiology , Male , Microbial Sensitivity Tests , Middle Aged , Peritonitis/epidemiology , Renal Replacement Therapy , Artificial Respiration/statistics & numerical data , Retrospective Studies , Sepsis/epidemiology , Staphylococcal Infections/epidemiology , Staphylococcal Infections/microbiology , Streptococcal Infections/epidemiology , Streptococcal Infections/microbiology , Vasoconstrictor Agents/therapeutic use

5.

Block Forests: random forests for blocks of clinical and omics covariate data.

Hornung, Roman; Wright, Marvin N.

BMC Bioinformatics ; 20(1): 358, 2019 Jun 27.

Article in English | MEDLINE | ID: mdl-31248362

ABSTRACT

BACKGROUND: In the last years more and more multi-omics data are becoming available, that is, data featuring measurements of several types of omics data for each patient. Using multi-omics data as covariate data in outcome prediction is both promising and challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to render complex dependency patterns between the outcome and the covariates. Against this background we developed five candidate random forest variants tailored to multi-omics covariate data. These variants modify the split point selection of random forest to incorporate the block structure of multi-omics data and can be applied to any outcome type for which a random forest variant exists, such as categorical, continuous and survival outcomes. Using 20 publicly available multi-omics data sets with survival outcome we compared the prediction performances of the block forest variants with alternatives. We also considered the common special case of having clinical covariates and measurements of a single omics data type available. RESULTS: We identify one variant termed "block forest" that outperformed all other approaches in the comparison study. In particular, it performed significantly better than standard random survival forest (adjusted p-value: 0.027). The two best performing variants have in common that the block choice is randomized in the split point selection procedure. In the case of having clinical covariates and a single omics data type available, the improvements of the variants over random survival forest were larger than in the case of the multi-omics data. The degrees of improvements over random survival forest varied strongly across data sets. Moreover, considering all clinical covariates mandatorily improved the performance. 
This result should however be interpreted with caution, because the level of predictive information contained in clinical covariates depends on the specific application. CONCLUSIONS: The new prediction method block forest for multi-omics data can significantly improve the prediction performance of random forest and outperformed alternatives in the comparison. Block forest is particularly effective for the special case of using clinical covariates in combination with measurements of a single omics data type.

Subjects

Machine Learning , Genomics , Humans , Survival Analysis

6.

Making complex prediction rules applicable for readers: Current practice in random forest literature and recommendations.

Boulesteix, Anne-Laure; Janitza, Silke; Hornung, Roman; Probst, Philipp; Busen, Hannah; Hapfelmeier, Alexander.

Biom J ; 61(5): 1314-1328, 2019 09.

Article in English | MEDLINE | ID: mdl-30069934

ABSTRACT

Ideally, prediction rules should be published in such a way that readers may apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, this is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014-2015 in the field "medical and health sciences", with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is a possible way to ensure that it is applicable; thus reproducibility is also examined in our survey. The presented prediction rules were applicable in only 2 of 30 identified papers, while for further eight prediction rules it was possible to obtain the necessary information by contacting the authors. Various problems, such as nonresponse of the authors, hampered the applicability of prediction rules in the other cases. Based on our experiences from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data including the description of the considered studies and analysis codes are available as supplementary materials.

Subjects

Biometry/methods , Medicine , Science , Software

7.

Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data.

Klau, Simon; Jurinovic, Vindi; Hornung, Roman; Herold, Tobias; Boulesteix, Anne-Laure.

BMC Bioinformatics ; 19(1): 322, 2018 Sep 12.

Article in English | MEDLINE | ID: mdl-30208855

ABSTRACT

BACKGROUND: The inclusion of high-dimensional omics data in prediction models has become a well-studied topic in the last decades. Although most of these methods do not account for possibly different types of variables in the set of covariates available in the same dataset, there are many such scenarios where the variables can be structured in blocks of different types, e.g., clinical, transcriptomic, and methylation data. To date, there exist a few computationally intensive approaches that make use of block structures of this kind. RESULTS: In this paper we present priority-Lasso, an intuitive and practical analysis strategy for building prediction models based on Lasso that takes such block structures into account. It requires the definition of a priority order of blocks of data. Lasso models are calculated successively for every block and the fitted values of every step are included as an offset in the fit of the next step. We apply priority-Lasso in different settings on an acute myeloid leukemia (AML) dataset consisting of clinical variables, cytogenetics, gene mutations and expression variables, and compare its performance on an independent validation dataset to the performance of standard Lasso models. CONCLUSION: The results show that priority-Lasso is able to keep pace with Lasso in terms of prediction accuracy. Variables of blocks with higher priorities are favored over variables of blocks with lower priority, which results in easily usable and transportable models for clinical practice.
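The successive fitting described in this abstract (one Lasso fit per block in priority order, with the fitted values carried forward as an offset) can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes a continuous outcome with squared-error loss (so that an offset amounts to subtracting it from the outcome), no intercept, and fixed penalty values per block, whereas in practice the penalties would be tuned. All function names are hypothetical.

```python
# Simplified sketch of the priority-Lasso idea for a continuous outcome.
# Assumptions: squared-error loss, no intercept, penalties given by the caller.

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, r, lam, n_iter=200):
    """Minimise (1/2n)*||r - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(r)  # residuals r - X b for the current b (b = 0 at start)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(c * (e + c * beta[j]) for c, e in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [e - c * delta for c, e in zip(col, resid)]
                beta[j] = new
    return beta

def priority_lasso(blocks, y, lams):
    """Fit one Lasso per block in priority order; earlier fits act as offset."""
    offset = [0.0] * len(y)
    coefs = []
    for X, lam in zip(blocks, lams):
        # With squared-error loss, fitting with an offset is equivalent to
        # fitting the outcome minus the offset.
        target = [yi - oi for yi, oi in zip(y, offset)]
        beta = lasso_cd(X, target, lam)
        coefs.append(beta)
        offset = [oi + sum(b * v for b, v in zip(beta, row))
                  for oi, row in zip(offset, X)]
    return coefs, offset

# Toy data (assumed): the high-priority block explains the outcome.
block_clinical = [[1.0], [2.0], [3.0], [4.0]]
block_omics = [[1.0], [-1.0], [1.0], [-1.0]]
y = [2.0, 4.0, 6.0, 8.0]
coefs, fitted = priority_lasso([block_clinical, block_omics], y, [0.75, 0.1])
print(coefs)
```

In the toy example the first block absorbs almost all of the signal, so the lower-priority block only sees what is left over and its coefficient is shrunk to zero, which mirrors the statement that variables from higher-priority blocks are favored.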

Subjects

Genomics/methods , Software , Humans , Kaplan-Meier Estimate , Acute Myeloid Leukemia/genetics , Reproducibility of Results , Risk Factors , Treatment Outcome

8.

Improving cross-study prediction through addon batch effect adjustment or addon normalization.

Hornung, Roman; Causeur, David; Bernau, Christoph; Boulesteix, Anne-Laure.

Bioinformatics ; 33(3): 397-404, 2017 02 01.

Article in English | MEDLINE | ID: mdl-27797760

ABSTRACT

Motivation: To date most medical tests derived by applying classification methods to high-dimensional molecular data are hardly used in clinical practice. This is partly because the prediction error resulting when applying them to external data is usually much higher than internal error as evaluated through within-study validation procedures. We suggest the use of addon normalization and addon batch effect removal techniques in this context to reduce systematic differences between external data and the original dataset with the aim to improve prediction performance. Results: We evaluate the impact of addon normalization and seven batch effect removal methods on cross-study prediction performance for several common classifiers using a large collection of microarray gene expression datasets, showing that some of these techniques reduce prediction error. Availability and Implementation: All investigated addon methods are implemented in our R package bapred. Contact: hornung@ibe.med.uni-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Predictive Value of Tests , Research Design , Algorithms , Datasets as Topic , Humans , RNA Sequence Analysis

9.

The impact of continuous non-invasive arterial blood pressure monitoring on blood pressure stability during general anaesthesia in orthopaedic patients: A randomised trial.

Meidert, Agnes S; Nold, Johanna S; Hornung, Roman; Paulus, Alexander C; Zwißler, Bernhard; Czerner, Stephan.

Eur J Anaesthesiol ; 34(11): 716-722, 2017 11.

Article in English | MEDLINE | ID: mdl-28922340

ABSTRACT

BACKGROUND: In patients undergoing general anaesthesia, intraoperative hypotension occurs frequently and is associated with adverse outcomes such as postoperative acute kidney failure, myocardial infarction or stroke. A history of chronic hypertension renders patients more susceptible to a decrease in blood pressure (BP) after induction of general anaesthesia. As a patient's BP is generally monitored intermittently via an upper arm cuff, there may be a delay in the detection of hypotension by the anaesthetist. OBJECTIVE: The current study investigates whether the presence of continuous BP monitoring leads to improved BP stability. DESIGN: Randomised, controlled and single-centre study. PATIENTS: A total of 160 orthopaedic patients undergoing general anaesthesia with a history of chronic hypertension. INTERVENTION: The patients were randomised to either a study group (n = 77) that received continuous non-invasive BP monitoring in addition to oscillometric intermittent monitoring, or a control group (n = 83) whose BP was monitored intermittently only. The interval for oscillometric measurements in both groups was set to 3 min. After induction of general anaesthesia, oscillometric BP values of the two groups were compared for the first hour of the procedure. Anaesthetists were blinded to the purpose of the study. MAIN OUTCOME MEASURE: BP stability and hypotensive events. RESULTS: There was no difference in baseline BP between the groups. After adjustment for multiple testing, mean arterial BP in the study group was significantly higher than in the control group at 12 and 15 min. Mean ± SD for study and control group, respectively were: 12 min, 102 ± 24 vs. 90 ± 26 mmHg (P = 0.039) and 15 min, 102 ± 21 vs. 90 ± 23 mmHg (P = 0.023). Hypotensive readings below a mean pressure of 55 mmHg occurred more often in the control group (25 vs. 7, P = 0.047). CONCLUSION: Continuous monitoring contributes to BP stability in the studied population. TRIAL REGISTRATION: NCT02519101.

Subjects

General Anesthesia/methods , Blood Pressure Determination/methods , Blood Pressure/physiology , Intraoperative Monitoring/methods , Orthopedic Procedures/methods , Aged , General Anesthesia/adverse effects , General Anesthesia/trends , Blood Pressure Determination/trends , Female , Humans , Male , Middle Aged , Intraoperative Monitoring/trends , Orthopedic Procedures/adverse effects , Orthopedic Procedures/trends , Prospective Studies

10.

Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment.

Hornung, Roman; Boulesteix, Anne-Laure; Causeur, David.

BMC Bioinformatics ; 17: 27, 2016 Jan 12.

Article in English | MEDLINE | ID: mdl-26753519

ABSTRACT

BACKGROUND: In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred available online from CRAN. RESULTS: FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal. CONCLUSIONS: As seen in this paper batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.

Subjects

Computational Biology , Datasets as Topic , Humans

11.

A measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization.

Hornung, Roman; Bernau, Christoph; Truntzer, Caroline; Wilson, Rory; Stadler, Thomas; Boulesteix, Anne-Laure.

BMC Med Res Methodol ; 15: 95, 2015 Nov 04.

Article in English | MEDLINE | ID: mdl-26537575

ABSTRACT

BACKGROUND: In applications of supervised statistical learning in the biomedical field it is necessary to assess the prediction error of the respective prediction rules. Often, data preparation steps are performed on the dataset, in its entirety, before training/test set based prediction error estimation by cross-validation (CV), an approach referred to as "incomplete CV". Whether incomplete CV can result in an optimistically biased error estimate depends on the data preparation step under consideration. Several empirical studies have investigated the extent of bias induced by performing preliminary supervised variable selection before CV. To our knowledge, however, the potential bias induced by other data preparation steps has not yet been examined in the literature. In this paper we investigate this bias for two common data preparation steps: normalization and principal component analysis for dimension reduction of the covariate space (PCA). Furthermore we obtain preliminary results for the following steps: optimization of tuning parameters, variable filtering by variance and imputation of missing values. METHODS: We devise the easily interpretable and general measure CVIIM ("CV Incompleteness Impact Measure") to quantify the extent of bias induced by incomplete CV with respect to a data preparation step of interest. This measure can be used to determine whether a specific data preparation step should, as a general rule, be performed in each CV iteration or whether an incomplete CV procedure would be acceptable in practice. We apply CVIIM to large collections of microarray datasets to answer this question for normalization and PCA. RESULTS: Performing normalization on the entire dataset before CV did not result in a noteworthy optimistic bias in any of the investigated cases. In contrast, when performing PCA before CV, medium to strong underestimates of the prediction error were observed in multiple settings.
CONCLUSIONS: While the investigated forms of normalization can be safely performed before CV, PCA has to be performed anew in each CV split to protect against optimistic bias.
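The distinction drawn in this abstract, between preparing the data once on the whole dataset ("incomplete CV") and repeating the preparation step inside every CV iteration, can be sketched structurally in Python. The sketch is illustrative only: per-feature standardization is used as a simple stand-in for a data preparation step (it is not the microarray normalization or PCA studied in the paper), the nearest-centroid classifier is an arbitrary choice, and all function names are hypothetical.

```python
import random

# Structural sketch of "complete" CV: the data preparation step is fitted on
# the training folds only and then applied to the held-out fold.

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_scaler(rows):
    """Learn per-feature mean and standard deviation (stand-in prep step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return means, sds

def apply_scaler(rows, scaler):
    means, sds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(c) / len(c) for c in zip(*rows)]
    return cents

def predict_centroid(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(x, cents[lab])))

def complete_cv(X, y, k, seed=0):
    """Mean misclassification rate with the prep step redone in each fold."""
    errors = []
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scaler = fit_scaler([X[i] for i in train_idx])  # training data only
        X_tr = apply_scaler([X[i] for i in train_idx], scaler)
        X_te = apply_scaler([X[i] for i in test_idx], scaler)
        model = fit_centroids(X_tr, [y[i] for i in train_idx])
        wrong = sum(predict_centroid(model, x) != y[i]
                    for x, i in zip(X_te, test_idx))
        errors.append(wrong / len(test_idx))
    return sum(errors) / k

# Toy data (assumed): two well-separated classes.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(complete_cv(X, y, k=4))
```

An incomplete CV would call `fit_scaler` once on all of `X` before the loop; comparing the two error estimates for a given preparation step is the kind of contrast that a measure such as CVIIM is designed to quantify.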

Subjects

Statistical Data Interpretation , Principal Component Analysis , Regression Analysis , Selection Bias , Algorithms , Humans , Oligonucleotide Array Sequence Analysis

12.

Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests.

Hornung, Roman.

SN Comput Sci ; 3(1): 1, 2022.

Article in English | MEDLINE | ID: mdl-34723205

ABSTRACT

The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackling practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for l = 1, …, nsplits: (1) sample one split problem; (2) sample a single or few splits from the split problem sampled in (1) and add this or these splits to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. It is seen that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42979-021-00920-1.
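The two-step candidate split sampling described in this abstract can be sketched in Python for the univariable, binary case. The sketch rests on assumptions not stated in the abstract: a "split problem" is taken to be the set of possible thresholds of one feature, exactly one split is drawn per sampled problem, and the best candidate is chosen by weighted Gini impurity. Function names are hypothetical and this is not the author's implementation.

```python
import random

# Sketch of candidate split sampling for one tree node, assuming
# univariable binary splits and a 0/1 outcome.

def split_points(values):
    """Midpoints between consecutive distinct values of one feature."""
    u = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(u, u[1:])]

def sample_candidate_splits(X, nsplits, rng):
    """For l = 1, ..., nsplits: sample a split problem, then one split."""
    n_features = len(X[0])
    candidates = []
    for _ in range(nsplits):
        j = rng.randrange(n_features)          # (1) sample one split problem
        points = split_points([row[j] for row in X])
        if points:                             # constant feature: no splits
            candidates.append((j, rng.choice(points)))  # (2) sample one split
    return candidates

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, candidates):
    """Pick the candidate (feature, threshold) with the lowest weighted Gini."""
    def weighted_gini(split):
        j, t = split
        left = [lab for row, lab in zip(X, y) if row[j] <= t]
        right = [lab for row, lab in zip(X, y) if row[j] > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    return min(candidates, key=weighted_gini)

# Toy node data (assumed): feature 0 is informative, feature 1 is constant.
X = [[1, 5], [2, 5], [3, 5], [4, 5]]
y = [0, 0, 1, 1]
cands = sample_candidate_splits(X, nsplits=20, rng=random.Random(0))
print(cands)
```

Because only nsplits candidates are evaluated per node, the cost of a node no longer grows with the total number of possible splits, which is what makes more complex split procedures (for example bivariable splits) affordable.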

13.

Synergistic Effects of Different Levels of Genomic Data for the Staging of Lung Adenocarcinoma: An Illustrative Study.

Li, Yingxia; Mansmann, Ulrich; Du, Shangming; Hornung, Roman.

Genes (Basel) ; 12(12)2021 11 24.

Article in English | MEDLINE | ID: mdl-34946821

ABSTRACT

Lung adenocarcinoma (LUAD) is a common and very lethal cancer. Accurate staging is a prerequisite for its effective diagnosis and treatment. Therefore, improving the accuracy of the stage prediction of LUAD patients is of great clinical relevance. Previous works have mainly focused on single genomic data information or a small number of different omics data types concurrently for generating predictive models. A few of them have considered multi-omics data from genome to proteome. We used a publicly available dataset to illustrate the potential of multi-omics data for stage prediction in LUAD. In particular, we investigated the roles of the specific omics data types in the prediction process. We used a self-developed method, Omics-MKL, for stage prediction that combines an existing feature ranking technique Minimum Redundancy and Maximum Relevance (mRMR), which avoids redundancy among the selected features, and multiple kernel learning (MKL), applying different kernels for different omics data types. Each of the considered omics data types individually provided useful prediction results. Moreover, using multi-omics data delivered notably better results than using single-omics data. Gene expression and methylation information seem to play vital roles in the staging of LUAD. The Omics-MKL method retained 70 features after the selection process. Of these, 21 (30%) were methylation features and 34 (48.57%) were gene expression features. Moreover, 18 (25.71%) of the selected features are known to be related to LUAD, and 29 (41.43%) to lung cancer in general. Using multi-omics data from genome to proteome for predicting the stage of LUAD seems promising because each omics data type may improve the accuracy of the predictions. Here, methylation and gene expression data may play particularly important roles.

Subjects

Adenocarcinoma of Lung/genetics , Lung Neoplasms/genetics , Adenocarcinoma of Lung/pathology , Tumor Biomarkers/genetics , DNA Copy Number Variations/genetics , DNA Methylation/genetics , Female , Gene Expression/genetics , Gene Expression Profiling/methods , Neoplastic Gene Expression Regulation/genetics , Genomics/methods , Humans , Lung Neoplasms/pathology , Male , Middle Aged , Prognosis

14.

Complement C3 identified as a unique risk factor for disease severity among young COVID-19 patients in Wuhan, China.

Cheng, Weiting; Hornung, Roman; Xu, Kai; Yang, Cai Hong; Li, Jian.

Sci Rep ; 11(1): 7857, 2021 04 12.

Article in English | MEDLINE | ID: mdl-33846344

ABSTRACT

Given that a substantial proportion of the subgroup of COVID-19 patients that face a severe disease course are younger than 60 years, it is critical to understand the disease-specific characteristics of young COVID-19 patients. Risk factors for a severe disease course for young COVID-19 patients and possible non-linear influences remain unknown. Data were analyzed from COVID-19 patients with clinical outcome in a single hospital in Wuhan, China, collected retrospectively from Jan 24th to Mar 27th. Clinical, demographic, treatment and laboratory data were collected from patients' medical records. Uni- and multivariable analysis using logistic regression and random forest, with the latter allowing the study of non-linear influences, were performed to investigate the clinical characteristics of a severe disease course. A total of 762 young patients (median age 47 years, interquartile range [IQR] 38-55, range 18-60; 55.9% female) were included, as well as 714 elderly patients as a comparison group. Among the young patients, 362 (47.5%) had a severe/critical disease course and the mean age was statistically significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). The uni- and multivariable analysis suggested that several covariates such as elevated levels of serum amyloid A (SAA), C-reactive protein (CRP) and lactate dehydrogenase (LDH), and decreased lymphocyte counts influence disease severity independently of age. Elevated levels of complement C3 (odds ratio [OR] 15.6, 95% CI 2.41-122.3; p = 0.039) are particularly associated with the risk of developing severe COVID-19 specifically in young patients, whereas no such influence seems to exist for elderly patients. Additional analysis suggests that the influence of complement C3 in young patients is independent of age, gender, and comorbidities. 
Variable importance values and partial dependence plots obtained using random forests delivered additional insights, in particular indicating non-linear influences of risk factors on disease severity. This study identified increased levels of complement C3 as a unique risk factor for adverse outcomes specific to young COVID-19 patients.

Subjects

COVID-19/blood , Complement C3/analysis , Adolescent , Adult , Area Under Curve , COVID-19/immunology , China/epidemiology , Female , Humans , Male , Middle Aged , Statistical Models , Multivariate Analysis , Nonlinear Dynamics , Retrospective Studies , Risk Factors , Severity of Illness Index , Young Adult

15.

A Diagnostic Model for Kawasaki Disease Based on Immune Cell Characterization From Blood Samples.

Du, Shangming; Mansmann, Ulrich; Geisler, Benjamin P; Li, Yingxia; Hornung, Roman.

Front Pediatr ; 9: 769937, 2021.

Article in English | MEDLINE | ID: mdl-35071130

ABSTRACT

Background: Kawasaki disease (KD) is the leading cause of acquired heart disease in children. However, distinguishing KD from febrile infections early in the disease course remains difficult. Our goal was to estimate the immune cell composition in KD patients and febrile controls (FC), and to develop a tool for KD diagnosis. Methods: We used a machine-learning algorithm, CIBERSORT, to estimate the proportions of 22 immune cell types based on blood samples from children with KD and FC. Using these immune cell compositions, a diagnostic score for predicting KD was then constructed based on LASSO regression for binary outcomes. Results: In the training set (n = 496), a model was fit which consisted of eight types of immune cells. The area under the curve (AUC) values for diagnosing KD in a held-out test set (n = 212) and an external validation set (n = 36) were 0.80 and 0.77, respectively. The most common cell types in KD blood samples were monocytes, neutrophils, CD4+-naïve and CD8+ T cells, and M0 macrophages. The diagnostic score was highly correlated to genes that had been previously reported as associated with KD, such as interleukins and chemokine receptors, and enriched in reported pathways, such as IL-6/JAK/STAT3 and TNFα signaling pathways. Conclusion: Altogether, the diagnostic score for predicting KD could potentially serve as a biomarker. Prospective studies could evaluate how incorporating the diagnostic score into a clinical algorithm would improve diagnostic accuracy further.

16.

Extracorporeal life support in therapy-refractory cardiocirculatory failure: looking beyond 30 days.

Guenther, Sabina P W; Hornung, Roman; Joskowiak, Dominik; Vlachea, Polyxeni; Feil, Katharina; Orban, Martin; Peterss, Sven; Born, Frank; Hausleiter, Jörg; Massberg, Steffen; Hagl, Christian.

Interact Cardiovasc Thorac Surg ; 32(4): 607-615, 2021 04 19.

Article in English | MEDLINE | ID: mdl-33347585

ABSTRACT

OBJECTIVES: Venoarterial extracorporeal life support (ECLS) has emerged as a potentially life-saving treatment option in therapy-refractory cardiocirculatory failure, but longer-term outcome is poorly defined. Here, we present a comprehensive follow-up analysis covering all major organ systems. METHODS: From February 2012 to December 2016, 180 patients were treated with ECLS for therapy-refractory cardiogenic shock or cardiac arrest. The 30-day survival was 43.9%, and 30-day survivors (n = 79) underwent follow-up analysis with the assessment of medium-term survival, quality of life, neuropsychological, cardiopulmonary and end-organ status. RESULTS: After a median of 1.9 (1.1-3.6) years (182.4 patient years), 45 of the 79 patients (57.0%) were alive, 35.4% had died and 7.6% were lost to follow-up. Follow-up survival estimates were 78.0% at 1, 61.2% at 3 and 55.1% at 5 years. NYHA class at follow-up was ≤II for 83.3%. The median creatinine was 1.1 (1.0-1.4) mg/dl, and the median bilirubin was 0.8 (0.5-1.0) mg/dl. No patient required dialysis. Overall, 94.4% were free from moderate or severe disability, although 11.1% needed care. Full re-integration into social life was reported by 58.3%, and 39.4% were working. Quality of life was favourable for mental components, but a subset showed deficits in physical aspects. While age was the only peri-implantation parameter significantly predicting medium-term survival, adverse events and functional status at discharge or 30 days were strong predictors. CONCLUSIONS: This study demonstrates positive medium-term outcome with high rates of independence in daily life and self-care but a subset of 10-20% suffered from sustained impairments. Our results indicate that peri-implantation parameters lack predictive power but downstream morbidity and functional status at discharge or 30 days can help identify patients at risk for poor recovery.

Subjects

Extracorporeal Membrane Oxygenation , Extracorporeal Membrane Oxygenation/adverse effects , Heart Arrest/diagnosis , Heart Arrest/therapy , Humans , Quality of Life , Retrospective Studies , Cardiogenic Shock , Treatment Outcome

17.

Introduction to statistical simulations in health research.

Boulesteix, Anne-Laure; Groenwold, Rolf Hh; Abrahamowicz, Michal; Binder, Harald; Briel, Matthias; Hornung, Roman; Morris, Tim P; Rahnenführer, Jörg; Sauerbrei, Willi.

BMJ Open ; 10(12): e039921, 2020 12 13.

Article in English | MEDLINE | ID: mdl-33318113

ABSTRACT

In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods and how do we judge whether the chosen method is appropriate for our specific study? Like in any science, in statistics, experiments can be run to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analyses of health data, who (1) may rely on simulation studies published in statistical literature to choose their statistical methods and who, thus, need to understand the criteria of assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to efficiently collaborate with more experienced colleagues or start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R-script available from online supplemental file 1.
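The idea of a simulation study described in this abstract (generate synthetic data with known properties, apply competing methods, compare their performance) can be sketched in a few lines. The example below is an illustrative assumption, not the example from the paper or its R script: it compares the sample mean and the sample median as estimators of the center of a normal distribution, and the function name `run_simulation` and all parameter values are hypothetical.

```python
import random
import statistics


def run_simulation(n_reps=2000, n_obs=30, true_mean=0.0, sd=1.0, seed=1):
    """Compare two estimators of a known true mean on synthetic normal data.

    Hypothetical illustration: names and default values are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    estimators = {"mean": statistics.mean, "median": statistics.median}
    errors = {name: [] for name in estimators}

    for _ in range(n_reps):
        # Data-generating mechanism with known properties
        sample = [rng.gauss(true_mean, sd) for _ in range(n_obs)]
        for name, estimator in estimators.items():
            errors[name].append(estimator(sample) - true_mean)

    # Performance measures: bias and mean squared error per estimator
    return {
        name: {
            "bias": statistics.mean(errs),
            "mse": statistics.mean(e * e for e in errs),
        }
        for name, errs in errors.items()
    }


results = run_simulation()
for name, perf in results.items():
    print(name, perf)
```

Because the true mean is known by construction, bias and mean squared error can be computed directly, which is not possible with real data.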

Subjects

Learning , Research Design , Computer Simulation , Humans , Research Personnel

18.

Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study.

Samaga, Daniel; Hornung, Roman; Braselmann, Herbert; Hess, Julia; Zitzelsberger, Horst; Belka, Claus; Boulesteix, Anne-Laure; Unger, Kristian.

Radiat Oncol ; 15(1): 109, 2020 May 14.

Article in English | MEDLINE | ID: mdl-32410693

ABSTRACT

BACKGROUND: Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used for prognosis of radio-therapeutic success. The model development process requires two independent discovery and validation data sets. Each of them may contain samples collected in a single center or a collection of samples from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Optimal use of limited data resources for discovery and validation with respect to the expected success of a study requires dispassionate, objective decision-making. In this work, we addressed the impact of the choice of single-center and multi-center data as discovery and validation data sets, and assessed how this impact depends on the three data characteristics signal strength, number of informative features and sample size. METHODS: We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatical analysis workflow of batch correction, feature selection and parameter estimation was emulated. For the determination of model quality, four measures were used: false discovery rate, prediction error, chance of successful validation (significant correlation of predicted and true validation data outcome) and model calibration. RESULTS: In agreement with literature about generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when the prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation. 
CONCLUSIONS: With regard to decision making, this simulation study underlines the importance of study aims being defined precisely a priori. Minimization of the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data solely affects the quality of the estimator of the prediction error, which was more precise on multi-center validation data.

Subjects

Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Multicenter Studies as Topic , Neoplasms/radiotherapy , Humans , Prognosis , Radiation Tolerance/genetics

19.

Evaluating the best empirical antibiotic therapy in patients with acute-on-chronic liver failure and spontaneous bacterial peritonitis.

Wieser, Andreas; Li, Hanwei; Zhang, Jiang; Liss, Ingrid; Markwardt, Daniel; Hornung, Roman; Suerbaum, Sebastian; Mayerle, Julia; Gerbes, Alexander L; Steib, Christian J.

Dig Liver Dis ; 51(9): 1300-1307, 2019 09.

Article in English | MEDLINE | ID: mdl-30944073

ABSTRACT

BACKGROUND/AIMS: Spontaneous bacterial peritonitis (SBP) is a life-threatening complication of advanced cirrhosis. By studying the susceptibility of isolated organisms and analyzing empirical antibiotic therapy combined with clinical outcomes, we aimed to find an improved empirical antibiotic therapy by considering the individual acute-on-chronic liver failure (ACLF) grade for patients with or without sepsis. METHODS: Clinical outcomes of 182 patients were assessed retrospectively with multivariable regression analysis. Each of the 223 isolates was individually evaluated regarding susceptibility results and intrinsic resistances. RESULTS: Piperacillin/tazobactam had the highest antimicrobial susceptibility among monotherapies/fixed combinations, which was significantly lower than combination therapies such as meropenem-linezolid (75.3% vs. 98.5%, P < 0.001). The sensitivity of pathogens to empirical antibiotic therapy correlated with significantly lower inpatient mortality (18.9% vs. 37.0%, P = 0.018), shorter inpatient stay (16.3 ± 10.2 vs. 26.4 ± 21.0 days, P = 0.053) and shorter intensive care treatment (2.1 ± 4.5 vs. 7.9 ± 15.4 days, P = 0.016). The largest difference of mortality was observed in patients with ACLF grade 3 (54.5% vs. 73.1% [sensitive vs. non-sensitive]). CONCLUSION: All SBP patients benefited from efficient empirical antibiotic therapy, regarding the reduced inpatient mortality and complications. For SBP patients with ACLF grade 3 without sepsis, the combination therapy with meropenem-linezolid may be suitable considering the susceptibility results and the concentration in the peritoneal cavity.

Subjects

Acute-On-Chronic Liver Failure/drug therapy , Anti-Bacterial Agents/administration & dosage , Peritonitis/drug therapy , Acute-On-Chronic Liver Failure/classification , Acute-On-Chronic Liver Failure/mortality , Adult , Aged , Anti-Bacterial Agents/adverse effects , Female , Humans , Length of Stay/statistics & numerical data , Male , Microbial Sensitivity Tests , Middle Aged , Peritonitis/microbiology , Peritonitis/mortality , Retrospective Studies

20.

On the overestimation of random forest's out-of-bag error.

Janitza, Silke; Hornung, Roman.

PLoS One ; 13(8): e0201904, 2018.

Article in English | MEDLINE | ID: mdl-30080866

ABSTRACT

The ensemble method random forests has become a popular classification tool in bioinformatics and related fields. The out-of-bag error is an error estimation technique often used to evaluate the accuracy of a random forest and to select appropriate values for tuning parameters, such as the number of candidate predictors that are randomly drawn for a split, referred to as mtry. However, for binary classification problems with metric predictors it has been shown that the out-of-bag error can overestimate the true prediction error depending on the choices of random forests parameters. Based on simulated and real data this paper aims to identify settings for which this overestimation is likely. It is, moreover, questionable whether the out-of-bag error can be used in classification tasks for selecting tuning parameters like mtry, because the overestimation is seen to depend on the parameter mtry. The simulation-based and real-data based studies with metric predictor variables performed in this paper show that the overestimation is largest in balanced settings and in settings with few observations, a large number of predictor variables, small correlations between predictors and weak effects. There was hardly any impact of the overestimation on tuning parameter selection. However, although the prediction performance of random forests was not substantially affected when using the out-of-bag error for tuning parameter selection in the present studies, one cannot be sure that this applies to all future data. For settings with metric predictor variables it is therefore strongly recommended to use stratified subsampling with sampling fractions that are proportional to the class sizes for both tuning parameter selection and error estimation in random forests. This yielded less biased estimates of the true prediction error. 
In unbalanced settings, in which there is a strong interest in predicting observations from the smaller classes well, sampling the same number of observations from each class is a promising alternative.
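The stratified subsampling recommended in this abstract can be read as drawing, without replacement, the same fraction of observations from every class, so that each subsample preserves the class proportions of the full data. The sketch below illustrates that reading only; it is not the authors' implementation, and the function name `stratified_subsample` and the `fraction` parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=None):
    """Return sorted indices of a subsample drawn without replacement,
    taking the same fraction of observations from every class.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)

    # Group observation indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Per-class sample size proportional to the class size (at least 1)
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


labels = ["a"] * 80 + ["b"] * 20
subset = stratified_subsample(labels, fraction=0.5, seed=0)
print(len(subset), sum(labels[i] == "a" for i in subset))
```

In a random forest, each tree would be grown on one such subsample, and the observations left out of it would serve as that tree's out-of-bag observations.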

Subjects

Algorithms , Computational Biology/methods , Computer Simulation , Humans , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/mortality , Neoplasms/therapy
