ABSTRACT
MOTIVATION: Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrant the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests. RESULTS: We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which may invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing. AVAILABILITY AND IMPLEMENTATION: COMETs are implemented in the comets R package available on CRAN and the pycomets Python library available on GitHub. Source code for reproducing all results is available at https://github.com/LucasKook/comets. All data sets used in this work are openly available.
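To make the idea concrete, the sketch below implements a generalized covariance measure (GCM)-style test of conditional independence with a generic scikit-learn learner. It is a minimal illustration of the class of tests behind COMETs, not the comets/pycomets API; the function name, learner choice, and all settings are illustrative assumptions.

```python
# Minimal sketch of a GCM-style covariance measure test of
# H0: X independent of Y given Z, using a generic supervised learner
# for the two nuisance regressions (univariate X and Y).
# Illustrative re-implementation; not the comets/pycomets API.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def gcm_test(X, Y, Z, learner=None, cv=5):
    """Two-sided GCM-style test based on cross-fitted residuals."""
    learner = learner or RandomForestRegressor(n_estimators=200, random_state=0)
    # Cross-fitted residuals of Y given Z and of X given Z.
    res_y = Y - cross_val_predict(learner, Z, Y, cv=cv)
    res_x = X - cross_val_predict(learner, Z, X, cv=cv)
    prod = res_x * res_y                                     # products of residuals
    T = np.sqrt(len(prod)) * prod.mean() / prod.std(ddof=1)  # approx. N(0, 1) under H0
    return T, 2 * stats.norm.sf(abs(T))                      # two-sided p-value

# Toy example: given Z, X carries no additional information about Y.
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 3))
X = Z[:, 0] + rng.normal(size=500)
Y = np.sin(Z[:, 0]) + rng.normal(size=500)
stat, pval = gcm_test(X, Y, Z)
print(f"GCM statistic {stat:.2f}, p-value {pval:.3f}")
```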
Subjects
Algorithms, Supervised Machine Learning, Humans, Liver Neoplasms/genetics, Computational Biology/methods
ABSTRACT
Comparative simulation studies are workhorse tools for benchmarking statistical methods. As with other empirical studies, the success of simulation studies hinges on the quality of their design, execution, and reporting. If not conducted carefully and transparently, their conclusions may be misleading. In this paper, we discuss various questionable research practices that may impact the validity of simulation studies, some of which cannot be detected or prevented by the current publication process in statistics journals. To illustrate our point, we invent a novel prediction method with no expected performance gain and benchmark it in a preregistered comparative simulation study. We show how easy it is to make the method appear superior to well-established competitor methods if questionable research practices are employed. Finally, we provide concrete suggestions for researchers, reviewers, and other academic stakeholders for improving the methodological quality of comparative simulation studies, such as preregistering simulation protocols, incentivizing neutral simulation studies, and sharing code and data.
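A minimal sketch of the kind of neutral, pre-specified comparative simulation the paper advocates is given below. The data-generating process, competitor methods, number of repetitions, and performance metric are illustrative choices fixed before looking at any results; they are not the study's preregistered protocol.

```python
# Minimal sketch of a neutral comparative simulation: DGP, methods,
# repetitions, and metric are all fixed in advance (illustrative choices only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

def simulate(seed, n=200, p=5):
    """Pre-specified linear data-generating process with Gaussian noise."""
    rng = np.random.default_rng(seed)
    beta = np.arange(1, p + 1) / p
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    return X, y

methods = {
    "mean_only": DummyRegressor(),
    "ols": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

n_reps = 100  # fixed before inspecting any results
results = {name: [] for name in methods}
for rep in range(n_reps):
    X_train, y_train = simulate(seed=rep)
    X_test, y_test = simulate(seed=10_000 + rep)  # independent test data per repetition
    for name, model in methods.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        results[name].append(mean_squared_error(y_test, pred))

for name, errs in results.items():
    print(f"{name:>13}: mean MSE {np.mean(errs):.3f} (SD {np.std(errs, ddof=1):.3f})")
```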
Subjects
Benchmarking, Computer Simulation
ABSTRACT
BACKGROUND: Retinal artery occlusion (RAO) may lead to irreversible blindness. For acute RAO, intravenous thrombolysis (IVT) can be considered as treatment. However, due to the rarity of RAO, data on IVT safety and effectiveness are limited. METHODS: From the multicenter database ThRombolysis for Ischemic Stroke Patients (TRISP), we retrospectively analyzed visual acuity (VA) at baseline and within 3 months in IVT-treated and non-IVT-treated RAO patients. The primary outcome was the difference in VA between baseline and follow-up (∆VA). Secondary outcomes were the rate of visual recovery (defined as an improvement in VA of ⩾ 0.3 logMAR) and safety (symptomatic intracranial hemorrhage (sICH) according to ECASS II criteria, asymptomatic intracranial hemorrhage (ICH), and major extracranial bleeding). Statistical analysis was performed using parametric tests and a linear regression model adjusted for age, sex, and baseline VA. RESULTS: We screened 200 patients with acute RAO and included 47 IVT and 34 non-IVT patients with complete information about recovery of vision. Visual acuity at follow-up improved significantly compared with baseline in IVT patients (∆VA 0.5 ± 0.8, p < 0.001) and non-IVT patients (∆VA 0.40 ± 1.1, p < 0.05). No significant differences in ∆VA or visual recovery rate were found between groups at follow-up. Two asymptomatic ICHs (4%) and one major extracranial bleeding (2%; intraocular bleeding) occurred in the IVT group, while no bleeding events were reported in the non-IVT group. CONCLUSION: Our study provides real-life data from the largest cohort of IVT-treated RAO patients published so far. While there is no evidence for superiority of IVT compared with conservative treatment, bleeding rates were low. A randomized controlled trial and standardized outcome assessments in RAO patients are justified to assess the net benefit of IVT in RAO.
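For illustration only, the sketch below fits an adjusted linear model of the type described in the Methods (∆VA regressed on treatment group, age, sex, and baseline VA) on simulated placeholder data. Column names and the data frame are hypothetical and this is not the TRISP analysis code.

```python
# Minimal sketch of a linear regression of the change in visual acuity,
# adjusted for age, sex, and baseline VA. Data and variable names are
# simulated placeholders, not TRISP data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 80  # toy cohort size
df = pd.DataFrame({
    "ivt": rng.integers(0, 2, n),          # 1 = IVT-treated, 0 = conservative
    "age": rng.normal(70, 10, n).round(),
    "sex": rng.choice(["m", "f"], n),
    "baseline_va": rng.normal(1.3, 0.4, n),
})
df["delta_va"] = 0.4 + 0.05 * df["ivt"] + 0.2 * df["baseline_va"] + rng.normal(0, 0.5, n)

fit = smf.ols("delta_va ~ ivt + age + C(sex) + baseline_va", data=df).fit()
print(fit.summary())
```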
Subjects
Retinal Artery Occlusion, Stroke, Humans, Stroke/drug therapy, Retrospective Studies, Thrombolytic Therapy/adverse effects, Treatment Outcome, Intracranial Hemorrhages/etiology, Retinal Artery Occlusion/drug therapy
ABSTRACT
Technological developments are making it possible to gather large amounts of data in many research fields. Learning analytics (LA)/educational data mining has access to big observational unstructured data captured from educational settings and relies mostly on unsupervised machine learning (ML) algorithms to make sense of such data. Generalized additive models for location, scale, and shape (GAMLSS) are a supervised statistical learning framework that allows modeling all the parameters of the distribution of the response variable with respect to the explanatory variables. This article gives an overview of the power and flexibility of GAMLSS in relation to some ML techniques. GAMLSS' capability to be tailored toward causality via causal regularization is also briefly discussed. The overview is illustrated with a data set from the field of LA. This article is categorized under: Application Areas > Education and Learning; Algorithmic Development > Statistics; Technologies > Machine Learning.
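GAMLSS itself is an R framework; as a minimal Python sketch of the underlying idea, the example below models both the mean and the (log) standard deviation of a Gaussian response as linear functions of a covariate and estimates them by maximum likelihood. The toy data and parameterization are illustrative assumptions only.

```python
# Minimal sketch of the GAMLSS idea: location and scale of a Gaussian response
# are both modeled as functions of a covariate and fitted by maximum likelihood.
# Illustration only; not the GAMLSS R implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.normal(scale=np.exp(0.2 + 0.5 * x))  # heteroscedastic response

def neg_log_lik(theta, x, y):
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * x             # location model
    sigma = np.exp(g0 + g1 * x)  # scale model (log link keeps sigma positive)
    return -norm.logpdf(y, loc=mu, scale=sigma).sum()

fit = minimize(neg_log_lik, x0=np.zeros(4), args=(x, y), method="BFGS")
b0, b1, g0, g1 = fit.x
print(f"location: {b0:.2f} + {b1:.2f} x   log-scale: {g0:.2f} + {g1:.2f} x")
```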
ABSTRACT
BACKGROUND: Despite evolving treatments, functional recovery in patients with large vessel occlusion stroke remains variable and outcome prediction challenging. Can we improve estimation of functional outcome with interpretable deep learning models using clinical and magnetic resonance imaging data? METHODS: In this observational study, we collected data of 222 patients with middle cerebral artery M1 segment occlusion who received mechanical thrombectomy. In a 5-fold cross-validation, we evaluated interpretable deep learning models for predicting functional outcome in terms of the modified Rankin scale at 3 months using clinical variables, diffusion-weighted imaging, perfusion-weighted imaging, and combinations thereof. Based on 50 test patients, we compared model performance with that of 5 experienced stroke neurologists. Prediction performance for ordinal (modified Rankin scale score, 0-6) and binary (modified Rankin scale score, 0-2 versus 3-6) functional outcome was assessed using discrimination and calibration measures such as the area under the receiver operating characteristic curve and accuracy (percentage of correctly classified patients). RESULTS: In the cross-validation, the model based on clinical variables and diffusion-weighted imaging achieved the highest binary prediction performance (area under the receiver operating characteristic curve, 0.766 [0.727-0.803]). Performance of models using clinical variables or diffusion-weighted imaging alone was lower. Adding perfusion-weighted imaging did not improve outcome prediction. On the test set of 50 patients, binary prediction performance of the model (accuracy, 60% [55.4%-64.4%]) and the neurologists (accuracy, 60% [55.8%-64.21%]) was similar when using clinical data. However, models significantly outperformed neurologists when imaging data were provided, alone or in combination with clinical variables (accuracy, 72% [67.8%-76%] versus 64% [59.8%-68.4%] with clinical and imaging data). Prediction performance of neurologists with comparable experience varied strongly. CONCLUSIONS: We hypothesize that early prediction of functional outcome in large vessel occlusion stroke patients may be significantly improved if neurologists are supported by interpretable deep learning models.
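As a hedged illustration of how clinical and imaging inputs can be fused in a single network, the sketch below combines a small 3D CNN branch for an imaging volume with a tabular branch for clinical variables to produce a binary outcome logit. Layer sizes, input shapes, and names are assumptions, not the interpretable architecture evaluated in the study.

```python
# Minimal sketch of a late-fusion network: imaging volume + clinical variables
# -> one logit for binary outcome (e.g. mRS 0-2 vs 3-6). Illustrative only.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, n_clinical: int):
        super().__init__()
        self.image_branch = nn.Sequential(        # tiny 3D CNN for an imaging volume
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.clinical_branch = nn.Sequential(     # small MLP for tabular features
            nn.Linear(n_clinical, 16), nn.ReLU(),
        )
        self.head = nn.Linear(16 + 16, 1)         # fused representation -> logit

    def forward(self, image, clinical):
        features = torch.cat(
            [self.image_branch(image), self.clinical_branch(clinical)], dim=1
        )
        return self.head(features)                # raw logit; pair with BCEWithLogitsLoss

# Toy forward pass: batch of 4, 32^3 volume, 10 clinical variables.
model = FusionNet(n_clinical=10)
logits = model(torch.randn(4, 1, 32, 32, 32), torch.randn(4, 10))
print(logits.shape)  # torch.Size([4, 1])
```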
Subjects
Brain Ischemia, Deep Learning, Ischemic Stroke, Stroke, Humans, Neurologists, Thrombectomy/methods, Stroke/diagnostic imaging, Stroke/surgery, Prognosis, Treatment Outcome, Retrospective Studies, Brain Ischemia/therapy
ABSTRACT
PURPOSE: Carbon dioxide (CO2) increases cerebral perfusion. The effect of CO2 on apnea tolerance, such as after anesthesia induction, is unknown. This study aimed to assess whether cerebral apnea tolerance in obese patients under general anesthesia is better after ventilation with O2/CO2 (95% O2/5% CO2) than with O2/Air (95% O2). METHODS: In this single-center, single-blinded, randomized crossover trial, 30 patients aged 18-65 years with a body mass index > 35 kg/m2 requiring general anesthesia for bariatric surgery underwent two apneas, each preceded by ventilation with either O2/Air or O2/CO2 in random order. After anesthesia induction, intubation, and ventilation with O2/Air or O2/CO2 for 10 min, apnea was maintained until the cerebral tissue oxygenation index (TOI) dropped by a relative 20% from baseline (primary endpoint) or oxygen saturation (SpO2) reached 80% (safety stopping criterion). The intervention was then repeated with the second gas mixture. RESULTS: The safety criterion was reached in all patients before cerebral TOI decreased by 20%. The time until SpO2 dropped to 80% was similar in the two conditions (+6 s with O2/CO2; 95% CI: -7 to 19 s; p = 0.37). Cerebral TOI and PaO2 were higher after O2/CO2 (+1.5%; 95% CI: 0.3 to 2.6; p = 0.02 and +0.6 kPa; 95% CI: 0.1 to 1.1; p = 0.02). CONCLUSION: O2/CO2 improves cerebral TOI and PaO2 in anesthetized bariatric patients. Better apnea tolerance could not be confirmed.
Subjects
Apnea, Carbon Dioxide, Humans, Cross-Over Studies, Oxygen, Obesity
ABSTRACT
In many medical applications, interpretable models with high prediction performance are sought. Often, those models are required to handle semistructured data, such as combinations of tabular and image data. We show how to apply deep transformation models (DTMs) for distributional regression that fulfill these requirements. DTMs allow the data analyst to specify (deep) neural networks for different input modalities, making them applicable to various research questions. Like statistical models, DTMs can provide interpretable effect estimates while achieving the state-of-the-art prediction performance of deep neural networks. In addition, the construction of ensembles of DTMs that retain model structure and interpretability allows quantifying epistemic and aleatoric uncertainty. In this study, we compare several DTMs, including baseline-adjusted models, trained on a semistructured data set of 407 stroke patients with the aim of predicting ordinal functional outcome three months after stroke. We follow statistical principles of model-building to achieve an adequate trade-off between interpretability and flexibility while assessing the relative importance of the involved data modalities. We evaluate the models for an ordinal and a dichotomized version of the outcome as used in clinical practice. We show that both tabular clinical and brain imaging data are useful for functional outcome prediction, although models based on tabular data alone outperform those based on imaging data alone. There is no substantial evidence for improved prediction when combining both data modalities. Overall, we highlight that DTMs provide a powerful, interpretable approach to analyzing semistructured data and that they have the potential to support clinical decision-making.
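As a minimal sketch of the transformation-model idea for an ordinal outcome, the example below fits a cumulative-logit (proportional-odds) model with learnable ordered cutpoints and a neural-network shift on tabular features. All sizes and names are illustrative assumptions; this is not the DTM implementation used in the study.

```python
# Minimal sketch of a transformation-style ordinal model: a proportional-odds
# model with free, ordered cutpoints and a neural-network shift term.
# Illustrative only; sizes and training details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrdinalShiftModel(nn.Module):
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.raw_cutpoints = nn.Parameter(torch.zeros(n_classes - 1))  # unconstrained
        self.shift = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 1))

    def cutpoints(self):
        # First cutpoint free, remaining increments positive -> ordered cutpoints.
        increments = F.softplus(self.raw_cutpoints[1:])
        return torch.cat(
            [self.raw_cutpoints[:1], self.raw_cutpoints[:1] + torch.cumsum(increments, 0)]
        )

    def log_likelihood(self, x, y):
        theta = self.cutpoints()                    # (K-1,)
        shift = self.shift(x)                       # (n, 1)
        cdf = torch.sigmoid(theta - shift)          # (n, K-1): P(Y <= k | x)
        upper = torch.cat([cdf, torch.ones_like(shift)], dim=1)
        lower = torch.cat([torch.zeros_like(shift), cdf], dim=1)
        probs = (upper - lower).clamp_min(1e-12)    # (n, K): P(Y = k | x)
        return torch.log(probs.gather(1, y.unsqueeze(1))).sum()

# Toy usage: 7 ordinal levels (e.g. mRS 0-6), 12 tabular features.
model = OrdinalShiftModel(n_features=12, n_classes=7)
x, y = torch.randn(32, 12), torch.randint(0, 7, (32,))
loss = -model.log_likelihood(x, y)
loss.backward()
print(float(loss))
```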
Subjects
Ischemic Stroke, Stroke, Humans, Neural Networks (Computer), Prognosis
ABSTRACT
Numerous software tools exist for data-independent acquisition (DIA) analysis of clinical samples, necessitating their comprehensive benchmarking. We present a benchmark dataset comprising real-world inter-patient heterogeneity, which we use for in-depth benchmarking of DIA data analysis workflows for clinical settings. Combining spectral libraries, DIA software, sparsity reduction, normalization, and statistical tests results in 1428 distinct data analysis workflows, which we evaluate based on their ability to correctly identify differentially abundant proteins. From our dataset, we derive bootstrap datasets of varying sample sizes and use the whole range of bootstrap datasets to robustly evaluate each workflow. We find that all DIA software suites benefit from using a gas-phase fractionated spectral library, irrespective of the library refinement used. Gas-phase fractionation-based libraries perform best against two out of three reference protein lists. Among all investigated statistical tests, non-parametric permutation-based tests consistently perform best.
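To illustrate one evaluation step of such a benchmark, the sketch below bootstraps samples, tests each protein for differential abundance with a permutation test, applies Benjamini-Hochberg adjustment, and scores the calls against a reference list. The data, reference list, and settings are simulated placeholders, not the benchmark dataset or any of the 1428 workflows.

```python
# Minimal sketch of scoring one downstream analysis step: permutation tests per
# protein on a bootstrap sample, BH adjustment, and comparison to a reference
# list of truly differential proteins. All data are simulated placeholders.
import numpy as np
from scipy.stats import permutation_test
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_prot, n_per_group = 300, 12
truth = np.zeros(n_prot, dtype=bool)
truth[:30] = True  # hypothetical reference list: first 30 proteins differ
group_a = rng.normal(0, 1, size=(n_prot, n_per_group))
group_b = rng.normal(truth[:, None] * 1.5, 1, size=(n_prot, n_per_group))

# Bootstrap the samples (columns) to mimic varying cohort composition.
boot_a = group_a[:, rng.integers(0, n_per_group, n_per_group)]
boot_b = group_b[:, rng.integers(0, n_per_group, n_per_group)]

def mean_diff(x, y, axis):
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

pvals = np.array([
    permutation_test((boot_a[i], boot_b[i]), mean_diff,
                     permutation_type="independent",
                     n_resamples=999, vectorized=True).pvalue
    for i in range(n_prot)
])
called = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]
tp = np.sum(called & truth)
print(f"recall {tp / truth.sum():.2f}, precision {tp / max(called.sum(), 1):.2f}")
```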
Subjects
Benchmarking, Proteomics, Humans, Proteome/analysis, Proteomics/methods, Software, Workflow
ABSTRACT
Prediction models often fail if training and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and in general requires strong assumptions on the data-generating process (DGP). In a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhäusler et al. (J R Stat Soc Ser B 83(2):215-246, 2021; doi:10.1111/rssb.12398), protect against distributional shifts in the test data by employing causal regularization. However, so far anchor regression has only been used with a squared-error loss, which is inapplicable to common response types such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression, which generalizes the method to potentially censored responses with at least an ordered sample space. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios, we demonstrate the extent to which OOD generalization is possible.
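For context, the sketch below implements the classical squared-error anchor regression of Rothenhäusler et al. via the usual data-transformation trick: with P_A the projection onto the anchors, minimizing ||(I - P_A)(y - Xb)||^2 + gamma ||P_A(y - Xb)||^2 reduces to ordinary least squares on W y and W X with W = I - (1 - sqrt(gamma)) P_A. It illustrates only the squared-error baseline that the distributional extension generalizes; the toy data-generating process is made up.

```python
# Minimal sketch of classical (squared-error) anchor regression via the
# data-transformation trick. Illustrates the baseline only, not the
# distributional extension; toy data are made up.
import numpy as np

def anchor_regression(X, y, A, gamma):
    P = A @ np.linalg.pinv(A.T @ A) @ A.T            # projection onto anchor space
    W = np.eye(len(y)) - (1 - np.sqrt(gamma)) * P    # causal-regularization transform
    beta, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 1000
A = rng.normal(size=(n, 1))                          # anchor (exogenous) variable
H = rng.normal(size=n) + A[:, 0]                     # hidden confounder shifted by the anchor
X = np.column_stack([H + rng.normal(size=n), rng.normal(size=n)])
y = 1.5 * X[:, 0] + H + rng.normal(size=n)

for gamma in (1.0, 10.0, 100.0):                     # gamma = 1 recovers ordinary least squares
    print(gamma, anchor_regression(X, y, A, gamma).round(2))
```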
ABSTRACT
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research. A variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance, in combination with the enzymatic constraints, on the performance of three search engines. Three open-source search engines (MyriMatch, X! Tandem, and MSGF+) were evaluated with respect to their suitability for semi- and non-specific searches as well as the importance of accurate fragment mass spectra in non-specific peptide searches. We then performed a semispecific reanalysis of the published NCI-60 deep proteome data applying the best-suited parameters. Semi- and non-specific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra, while this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably among the three search engines for semi- and non-specific peptide-spectrum matching. Semispecific reanalysis of the NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing or likely alternative translation start sites, some of which were ubiquitously present in all cell lines of the reanalyzed panel. Highly accurate MS2 fragment data in combination with modern open-source search algorithms enable the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining in proteomic datasets.
ABSTRACT
The obligate intracellular bacterium Chlamydia trachomatis replicates in a cytosolic vacuole in human epithelial cells. Infection of human cells with C. trachomatis causes substantial changes to many host cell signalling pathways, but the molecular basis of this influence is not well understood. Studies of gene transcription in the infected cell have shown altered transcription of many host cell genes, indicating a transcriptional response of the host cell to the infection. Here we describe that infection of HeLa cells with C. trachomatis, as well as infection of murine cells with Chlamydia muridarum, substantially inhibits protein synthesis in the infected host cell. This inhibition was accompanied by changes to the ribosomal profile of the infected cell, indicative of a block of translation initiation, most likely as part of a stress response. The Chlamydia protease-like activity factor (CPAF) also reduced protein synthesis in uninfected cells, although CPAF-deficient C. trachomatis showed no defect in this respect. Analysis of polysomal mRNA as a proxy for actively translated mRNA identified a number of biological processes differentially affected by chlamydial infection. Mapping of differentially regulated genes onto a protein interaction network identified nodes of up- and down-regulated networks during chlamydial infection. Proteomic analysis of protein synthesis further suggested translational regulation of host cell functions by chlamydial infection. These results demonstrate reprogramming of the host cell during chlamydial infection through the alteration of protein synthesis.
Subjects
Chlamydia trachomatis/pathogenicity, Animals, Endopeptidases/metabolism, HeLa Cells, Host-Pathogen Interactions, Humans, Mice, Protein Biosynthesis/physiology, Proteomics/methods, Messenger RNA/metabolism, Signal Transduction/physiology
ABSTRACT
Copper (Cu) is a bioelement essential for a myriad of enzymatic reactions but becomes cytotoxic when present in high concentrations. Whereas Cu toxicity is usually assumed to originate from the metal's ability to enhance lipid peroxidation, the role of oxidative stress has remained uncertain, since no antioxidant therapy has ever been effective. Here we show that Cu overload induces cell death independently of the metal's ability to oxidize the intracellular milieu. In fact, cells do not lose control of their thiol homeostasis until shortly before the onset of cell death, nor do they trigger a consistent antioxidant response. As expected, glutathione (GSH) protects the cell from Cu-mediated cytotoxicity but, surprisingly, does so fully independently of its reactive thiol. Moreover, the oxidation state of extracellular Cu is irrelevant, as cells accumulate the metal as cuprous ions. We provide evidence that cell death is driven by the interaction of cuprous ions with proteins, which impairs protein folding and promotes aggregation. Consequently, cells mostly react to Cu by mounting a heat shock response and trying to restore protein homeostasis. The protective role of GSH is based on the binding of cuprous ions, thus preventing the metal from interacting with proteins. Due to its high intracellular content, GSH is depleted only near the Cu entry site, and hence Cu can interact with proteins and cause aggregation and cytotoxicity immediately below the plasma membrane.
Subjects
Cell Death, Copper/toxicity, Fibroblasts/drug effects, Glutathione/pharmacology, Neoplasms/prevention & control, Oxidative Stress, Protein Folding, Animals, Biomarkers/chemistry, Biomarkers/metabolism, Cultured Cells, Fibroblasts/metabolism, Fibroblasts/pathology, Gene Expression Profiling, Humans, Lipid Peroxidation, Mice, Neoplasms/metabolism, Neoplasms/pathology, Protein Aggregates/drug effects, Reactive Oxygen Species/metabolism
ABSTRACT
BACKGROUND: Renal oncocytomas (ROs) are benign epithelial tumors of the kidney, whereas chromophobe renal cell carcinomas (chRCCs) are malignant renal tumors. The latter constitute 5-7% of renal neoplasms. ROs and chRCCs show pronounced molecular and histological similarities, which renders their differentiation demanding. We aimed at differential proteome profiling of ROs and early-stage chRCCs in order to better understand distinguishing protein patterns. METHODS: We employed formalin-fixed, paraffin-embedded samples (six RO cases, six chRCC cases) together with isotopic triplex dimethylation and a pooled reference standard to enable cohort-wide quantitative comparison. For lysosomal-associated membrane protein 1 (LAMP1) and integrin alpha-V (ITGAV), we performed corroborative immunohistochemistry (IHC) in an extended cohort of 42 RO cases and 31 chRCC cases. RESULTS: At a 1% false discovery rate, we identified > 3900 proteins, of which > 2400 were consistently quantified in at least four RO and four chRCC cases. The proteomic expression profiling discriminated ROs from chRCCs and highlighted established features such as the accumulation of mitochondrial proteins in ROs, while emphasizing the accumulation of endo-lysosomal proteins in chRCCs. In line with the proteomic data, IHC showed enrichment of LAMP1 in chRCCs and of ITGAV in ROs. CONCLUSION: We present one of the first differential proteome profiling studies of ROs and chRCCs and highlight the differential abundance of LAMP1 and ITGAV in these renal tumors.