Results 1 - 20 of 114
1.
Diabetologia ; 67(5): 885-894, 2024 May.
Article in English | MEDLINE | ID: mdl-38374450

ABSTRACT

AIMS/HYPOTHESIS: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value. METHODS: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell's C statistic. RESULTS: Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0-11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3-11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]).
A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance. CONCLUSIONS/INTERPRETATION: Using machine learning approaches, we show that insulin requirement risk can be modestly well predicted by predominantly clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification. DATA AVAILABILITY: Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch .


Subjects
Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/metabolism, Prospective Studies, C-Peptide, Proteomics, Insulin/therapeutic use, Biomarkers, Machine Learning, Cholesterol
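
The abstract above evaluates its models with Harrell's C statistic. As a rough illustration of what that number measures, the following is a minimal sketch in plain Python, not the study's code. The function name `harrell_c` and the toy data are hypothetical, and tied follow-up times are simply treated as not comparable.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that are ranked concordantly.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when i also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study): years until insulin or censoring.
times = [2.0, 5.0, 3.5, 8.0, 6.0]
events = [1, 0, 1, 0, 1]            # 1 = insulin started, 0 = censored
risk = [0.9, 0.3, 0.7, 0.1, 0.05]   # higher = earlier insulin need predicted
c_index = harrell_c(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.71 to 0.74 should be read.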
2.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37951587

ABSTRACT

MOTIVATION: In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provide an insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. RESULTS: We propose an approach for integrating multiple sources of such prior information into penalized regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the R package transreg (https://github.com/lcsb-bds/transreg, https://cran.r-project.org/package=transreg).


Subjects
Software, Computer Simulation, Implementation Science
3.
Proc Natl Acad Sci U S A ; 118(49)2021 12 07.
Article in English | MEDLINE | ID: mdl-34873056

ABSTRACT

Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.


Subjects
Antitumor Drug Screening Assays/methods, Gene Expression Profiling/methods, Animals, Antineoplastic Agents/therapeutic use, Pharmacological Biomarkers/metabolism, Tumor Cell Line/drug effects, Deep Learning, Animal Disease Models, Forecasting/methods, Heterografts, Humans, Theoretical Models
4.
BMC Bioinformatics ; 24(1): 172, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101151

ABSTRACT

BACKGROUND: High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, providing complementary data not on the samples, but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R-package ecpc originally accommodated various and possibly multiple co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, were handled by adaptive discretisation, potentially inefficiently modelling and losing information. As continuous co-data such as external p values or correlations often arise in practice, more generic co-data models are needed. RESULTS: Here, we present an extension to the method and software for generic co-data models, particularly for continuous co-data. At the basis lies a classical linear regression model, regressing prior variance weights on the co-data. Co-data variables are then estimated with empirical Bayes moment estimation. After placing the estimation procedure in the classical regression framework, extension to generalised additive and shape constrained co-data models is straightforward. Besides, we show how ridge penalties may be transformed to elastic net penalties. In simulation studies we first compare various co-data models for continuous co-data from the extension to the original method. Secondly, we compare variable selection performance to other variable selection methods. The extension is faster than the original method and shows improved prediction and variable selection performance for non-linear co-data relations. Moreover, we demonstrate use of the package in several genomics examples throughout the paper. 
CONCLUSIONS: The R-package ecpc accommodates linear, generalised additive and shape constrained additive co-data models for the purpose of improved high-dimensional prediction and variable selection. The extended version of the package as presented here (version number 3.1.1 and higher) is available on ( https://cran.r-project.org/web/packages/ecpc/ ).


Subjects
Genomics, Software, Bayes Theorem, Computer Simulation, Linear Models
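
The abstract above describes ridge regression in which each variable gets its own penalty, informed by co-data. The sketch below illustrates only that core idea for a linear model; it is not the ecpc algorithm. The penalties here are set by hand from hypothetical prior variance weights rather than estimated by empirical Bayes, and the names `adaptive_ridge`, `solve_linear_system`, `prior_weights` and `global_penalty` are assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adaptive_ridge(x_rows, y, penalties):
    """Minimise ||y - X b||^2 + sum_j penalties[j] * b_j^2.

    The solution satisfies (X'X + diag(penalties)) b = X'y.
    """
    p = len(penalties)
    xtx = [[sum(row[j] * row[k] for row in x_rows) for k in range(p)]
           for j in range(p)]
    for j in range(p):
        xtx[j][j] += penalties[j]
    xty = [sum(row[j] * yi for row, yi in zip(x_rows, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data (hypothetical): both variables have a true effect of 1.
x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 1.0, 2.0]

# Assumption: co-data suggest variable 0 is important (large prior variance
# weight) and variable 1 is not, so variable 1 is penalised more heavily.
prior_weights = [10.0, 0.1]
global_penalty = 1.0
penalties = [global_penalty / w for w in prior_weights]

beta = adaptive_ridge(x_rows, y, penalties)
print(beta)
```

With equal true effects, the heavily penalised variable is shrunk towards zero while the lightly penalised one keeps most of its effect, which is the behaviour that informative co-data are meant to exploit.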
5.
Eur Radiol ; 33(4): 2850-2860, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36460924

ABSTRACT

OBJECTIVES: To externally validate a pre-treatment MR-based radiomics model predictive of locoregional control in oropharyngeal squamous cell carcinoma (OPSCC) and to assess the impact of differences between datasets on the predictive performance. METHODS: Radiomic features, as defined in our previously published radiomics model, were extracted from the primary tumor volumes of 157 OPSCC patients in a different institute. The developed radiomics model was validated using this cohort. Additionally, the influence of parameters such as patient subgroups, MRI acquisition, and post-processing steps on prediction performance was investigated. For this analysis, matched subgroups (based on human papillomavirus (HPV) status of the tumor, T-stage, and tumor subsite) and a subgroup with only patients with 4-mm slice thickness were studied. The influence of harmonization techniques (ComBat harmonization, quantile normalization) and the impact of feature stability across observers and centers were also studied. Model performances were assessed by area under the curve (AUC), sensitivity, and specificity. RESULTS: Performance of the published model (AUC/sensitivity/specificity: 0.74/0.75/0.60) drops when applied on the validation cohort (AUC/sensitivity/specificity: 0.64/0.68/0.60). The performance of the full validation cohort improves slightly when the model is validated using a patient group with comparable HPV status of the tumor (AUC/sensitivity/specificity: 0.68/0.74/0.60), using patients acquired with a slice thickness of 4 mm (AUC/sensitivity/specificity: 0.67/0.73/0.57), or when quantile harmonization was performed (AUC/sensitivity/specificity: 0.66/0.69/0.60). CONCLUSION: The previously published model shows its generalizability and can be applied on data acquired from different vendors and protocols. Harmonization techniques and subgroup definition influence performance of predictive radiomics models.
KEY POINTS: • Radiomics, a noninvasive quantitative image analysis technique, can support the radiologist by enhancing diagnostic accuracy and/or treatment decision-making. • A previously published model shows its generalizability and could be applied on data acquired from different vendors and protocols.


Subjects
Oropharyngeal Neoplasms, Papillomavirus Infections, Humans, Magnetic Resonance Imaging/methods, Sensitivity and Specificity, Oropharyngeal Neoplasms/diagnostic imaging, Retrospective Studies
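
The abstract above reports each model as an AUC/sensitivity/specificity triple. For readers less familiar with these metrics, here is a small sketch of how they can be computed from labels and model scores. The function names, the 0.5 threshold and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores contribute 0.5, which matches the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = event, 0 = no event.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.5]

model_auc = auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(model_auc, sens, spec)
```

Unlike the AUC, sensitivity and specificity depend on the chosen threshold, so a threshold fixed on the development cohort can behave differently on an external cohort even when the ranking of patients is similar.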
6.
Neuroradiology ; 65(4): 855-863, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36574026

ABSTRACT

PURPOSE: Patients with vanishing white matter (VWM) experience unremitting chronic neurological decline and stress-provoked episodes of rapid, partially reversible decline. Cerebral white matter abnormalities are progressive, without improvement, and are therefore unlikely to be related to the episodes. We determined which radiological findings are related to episodic decline. METHODS: MRI scans of VWM patients were retrospectively analyzed. Patients were grouped into A (never episodes) and B (episodes). Signal abnormalities outside the cerebral white matter were rated as absent, mild, or severe. A sum score was developed with abnormalities only seen in group B. The temporal relationship between signal abnormalities and episodes was determined by subdividing scans into those made before, less than 3 months after, and more than 3 months after onset of an episode. RESULTS: Five hundred forty-three examinations of 298 patients were analyzed. Mild and severe signal abnormalities in the caudate nucleus, putamen, globus pallidus, thalamus, midbrain, medulla oblongata, and severe signal abnormalities in the pons were only seen in group B. The sum score, constructed with these abnormalities, depended on the timing of the scan (χ2(2, 400) = 22.8; p < .001): it was least often abnormal before, most often abnormal with the highest value shortly after, and lower longer than 3 months after an episode. CONCLUSION: In VWM, signal abnormalities in brainstem, thalamus, and basal ganglia are related to episodic decline and can improve. Knowledge of the natural MRI history in VWM is important for clinical interpretation of MRI findings and crucial in therapy trials.


Subjects
Leukoencephalopathies, White Matter, Leukoencephalopathies/diagnostic imaging, Leukoencephalopathies/pathology, White Matter/diagnostic imaging, White Matter/pathology, Humans, Male, Female, Newborn Infant, Infant, Preschool Child, Child, Adolescent, Young Adult, Adult, Middle Aged, Magnetic Resonance Imaging
7.
Biostatistics ; 22(4): 723-737, 2021 10 13.
Article in English | MEDLINE | ID: mdl-31886488

ABSTRACT

In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) p-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.


Subjects
Genomics, Neoplasms, Bayes Theorem, Humans, Logistic Models, Regression Analysis
8.
Bioinformatics ; 37(14): 2012-2016, 2021 08 04.
Article in English | MEDLINE | ID: mdl-32437519

ABSTRACT

MOTIVATION: Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques. RESULTS: Here, we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. It has a mixing parameter to weight between ridge and lasso regularization. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability. AVAILABILITY AND IMPLEMENTATION: The R package starnet is available on GitHub (https://github.com/rauschenberger/starnet) and CRAN (https://CRAN.R-project.org/package=starnet).


Subjects
Machine Learning, Software, Humans, Regression Analysis
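
The abstract above refers to the elastic net's mixing parameter and to combining several mixings by stacking. The sketch below shows the penalty in one common parameterisation (alpha = 1 gives lasso, alpha = 0 gives ridge) and one way a stacked combination of linear models can stay interpretable: a weighted sum of linear models collapses into a single coefficient vector. This is an illustration under those assumptions, not the starnet implementation; the function names, base coefficients and stacking weights are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def stack_coefficients(base_betas, weights):
    """Collapse a weighted combination of linear models into one coefficient vector."""
    p = len(base_betas[0])
    return [sum(w * beta[j] for w, beta in zip(weights, base_betas))
            for j in range(p)]


beta = [1.0, -2.0, 0.0]
lasso_penalty = elastic_net_penalty(beta, lam=0.5, alpha=1.0)
ridge_penalty = elastic_net_penalty(beta, lam=0.5, alpha=0.0)
mixed_penalty = elastic_net_penalty(beta, lam=0.5, alpha=0.5)

# Hypothetical base models: a dense ridge-like fit and a sparse lasso-like fit.
ridge_like = [1.0, 0.5, 0.2]
lasso_like = [1.5, 0.0, 0.0]
stacked = stack_coefficients([ridge_like, lasso_like], weights=[0.25, 0.75])
print(lasso_penalty, ridge_penalty, mixed_penalty, stacked)
```

Because the stacked result is still one coefficient per feature, the sign and size of each effect can be read off directly, which is the kind of interpretability the abstract emphasises.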
9.
Biom J ; 64(7): 1289-1306, 2022 10.
Article in English | MEDLINE | ID: mdl-35730912

ABSTRACT

The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.


Subjects
Algorithms, Research Design, Bayes Theorem, Normal Distribution
10.
Int J Mol Sci ; 23(19)2022 Sep 22.
Article in English | MEDLINE | ID: mdl-36232432

ABSTRACT

Patients with inflammatory bowel disease (IBD) produce enhanced immunoglobulin A (IgA) against the microbiota compared to healthy individuals, which has been correlated with disease severity. Since IgA complexes can potently activate myeloid cells via the IgA receptor FcαRI (CD89), excessive IgA production may contribute to IBD pathology. However, the cellular mechanisms that contribute to dysregulated IgA production in IBD are poorly understood. Here, we demonstrate that intestinal FcαRI-expressing myeloid cells (i.e., monocytes and neutrophils) are in close contact with B lymphocytes in the lamina propria of IBD patients. Furthermore, stimulation of FcαRI on monocytes triggered production of cytokines and chemokines that regulate B-cell differentiation and migration, including interleukin-6 (IL6), interleukin-10 (IL10), tumour necrosis factor-α (TNFα), a proliferation-inducing ligand (APRIL), and chemokine ligand-20 (CCL20). In vitro, these cytokines promoted IgA isotype switching in human B cells. Moreover, when naïve B lymphocytes were cultured in vitro in the presence of FcαRI-stimulated monocytes, enhanced IgA isotype switching was observed compared to B cells that were cultured with non-stimulated monocytes. Taken together, FcαRI-activated monocytes produced a cocktail of cytokines, as well as chemokines, that stimulated IgA switching in B cells, and close contact between B cells and myeloid cells was observed in the colons of IBD patients. As such, we hypothesize that, in IBD, IgA complexes activate myeloid cells, which in turn can result in excessive IgA production, likely contributing to disease pathology. Interrupting this loop may, therefore, represent a novel therapeutic strategy.


Subjects
Inflammatory Bowel Diseases, Interleukin-10, B-Lymphocytes, Cytokines, Humans, Immunoglobulin A, Immunoglobulin Class Switching, Immunoglobulin Isotypes, Interleukin-6, Ligands, Monocytes, Tumor Necrosis Factor-alpha
11.
Stat Med ; 40(26): 5910-5925, 2021 11 20.
Article in English | MEDLINE | ID: mdl-34438466

ABSTRACT

Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data are high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies. We use multiple and various co-data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi-group ridge penalties for generalized linear and Cox models. Available group adaptive methods primarily target settings with few groups, and therefore likely overfit for non-informative, correlated or many groups, and do not account for known structure on group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework as any type of shrinkage can be used on the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co-data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.


Subjects
Genomics, Bayes Theorem, Humans, Proportional Hazards Models
12.
J Pathol ; 250(3): 288-298, 2020 03.
Article in English | MEDLINE | ID: mdl-31784980

ABSTRACT

Screening to detect colorectal cancer (CRC) in an early or premalignant state is an effective method to reduce CRC mortality rates. Current stool-based screening tests, e.g. fecal immunochemical test (FIT), have a suboptimal sensitivity for colorectal adenomas and difficulty distinguishing adenomas at high risk of progressing to cancer from those at lower risk. We aimed to identify stool protein biomarker panels that can be used for the early detection of high-risk adenomas and CRC. Proteomics data (LC-MS/MS) were collected on stool samples from adenoma (n = 71) and CRC patients (n = 81) as well as controls (n = 129). Colorectal adenoma tissue samples were characterized by low-coverage whole-genome sequencing to determine their risk of progression based on specific DNA copy number changes. Proteomics data were used for logistic regression modeling to establish protein biomarker panels. In total, 15 of the adenomas (21.1%) were defined as high risk of progressing to cancer. A protein panel, consisting of haptoglobin (Hp), LAMP1, SYNE2, and ANXA6, was identified for the detection of high-risk adenomas (sensitivity of 53% at specificity of 95%). Two panels, one consisting of Hp and LRG1 and one of Hp, LRG1, RBP4, and FN1, were identified for high-risk adenomas and CRCs detection (sensitivity of 66% and 62%, respectively, at specificity of 95%). Validation of Hp as a biomarker for high-risk adenomas and CRCs was performed using an antibody-based assay in FIT samples from a subset of individuals from the discovery series (n = 158) and an independent validation series (n = 795). Hp protein was significantly more abundant in high-risk adenoma FIT samples compared to controls in the discovery (p = 0.036) and the validation series (p = 9e-5). We conclude that Hp, LAMP1, SYNE2, LRG1, RBP4, FN1, and ANXA6 may be of value as stool biomarkers for early detection of high-risk adenomas and CRCs. © 2019 Authors. 
Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.


Subjects
Adenoma/diagnosis, Tumor Biomarkers/metabolism, Colorectal Neoplasms/diagnosis, Early Detection of Cancer/methods, Feces, Adenoma/metabolism, Liquid Chromatography, Colorectal Neoplasms/metabolism, Disease Progression, Humans, Proteomics, Sensitivity and Specificity, Tandem Mass Spectrometry
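
The abstract above quotes panel performance as sensitivity at a fixed specificity of 95%. One simple way to obtain such a figure is to set the decision threshold from the control scores and then count the cases above it, as sketched below. This is an illustrative assumption, not the procedure used in the study; the function name and the toy scores are hypothetical.

```python
import math


def sensitivity_at_specificity(control_scores, case_scores, target_specificity=0.95):
    """Return (threshold, sensitivity, specificity) for a score-based test.

    A sample is called positive when its score is strictly above the
    threshold. The threshold is the smallest control score at which at
    least `target_specificity` of the controls are called negative.
    """
    ranked = sorted(control_scores)
    n_controls = len(ranked)
    # Number of controls that must fall at or below the threshold.
    needed = max(1, math.ceil(target_specificity * n_controls - 1e-9))
    threshold = ranked[needed - 1]
    specificity = sum(1 for s in control_scores if s <= threshold) / n_controls
    sensitivity = sum(1 for s in case_scores if s > threshold) / len(case_scores)
    return threshold, sensitivity, specificity


# Toy panel scores (hypothetical): 20 controls and 8 cases.
controls = list(range(20))
cases = [19, 25, 10, 30, 5, 18, 22, 3]
threshold, sensitivity, specificity = sensitivity_at_specificity(controls, cases)
print(threshold, sensitivity, specificity)
```

Fixing specificity first keeps the false-positive rate comparable across panels, so the reported sensitivities can be compared directly.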
13.
Biom J ; 63(2): 289-304, 2021 02.
Article in English | MEDLINE | ID: mdl-33155717

ABSTRACT

In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model in both a simulated setting and in the publicly available Genomics of Drug Sensitivity in Cancer data.


Subjects
Genomics, Pharmaceutical Preparations, Bayes Theorem, Normal Distribution, Precision Medicine
14.
Bioinformatics ; 35(14): i510-i519, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510654

ABSTRACT

MOTIVATION: Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. RESULTS: We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. AVAILABILITY AND IMPLEMENTATION: PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Agents, Neoplasms, Animals, Antineoplastic Agents/pharmacology, Biological Phenomena, Animal Disease Models, Forecasting, Humans, Neoplasms/drug therapy, Software
15.
Eur Radiol ; 30(11): 6311-6321, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32500196

ABSTRACT

OBJECTIVES: Head and neck squamous cell carcinoma (HNSCC) shows a remarkable heterogeneity between tumors, which may be captured by a variety of quantitative features extracted from diagnostic images, termed radiomics. The aim of this study was to develop and validate MRI-based radiomic prognostic models in oral and oropharyngeal cancer. MATERIALS AND METHODS: Native T1-weighted images of four independent, retrospective (2005-2013), patient cohorts (n = 102, n = 76, n = 89, and n = 56) were used to delineate primary tumors, and to extract 545 quantitative features from. Subsequently, redundancy filtering and factor analysis were performed to handle collinearity in the data. Next, radiomic prognostic models were trained and validated to predict overall survival (OS) and relapse-free survival (RFS). Radiomic features were compared to and combined with prognostic models based on standard clinical parameters. Performance was assessed by integrated area under the curve (iAUC). RESULTS: In oral cancer, the radiomic model showed an iAUC of 0.69 (OS) and 0.70 (RFS) in the validation cohort, whereas the iAUC in the oropharyngeal cancer validation cohort was 0.71 (OS) and 0.74 (RFS). By integration of radiomic and clinical variables, the most accurate models were defined (iAUC oral cavity, 0.72 (OS) and 0.74 (RFS); iAUC oropharynx, 0.81 (OS) and 0.78 (RFS)), and these combined models outperformed prognostic models based on standard clinical variables only (p < 0.001). CONCLUSIONS: MRI radiomics is feasible in HNSCC despite the known variability in MRI vendors and acquisition protocols, and radiomic features added information to prognostic models based on clinical parameters. KEY POINTS: • MRI radiomics can predict overall survival and relapse-free survival in oral and HPV-negative oropharyngeal cancer. • MRI radiomics provides additional prognostic information to known clinical variables, with the best performance of the combined models. 
• Variation in MRI vendors and acquisition protocols did not influence performance of radiomic prognostic models.


Subjects
Head and Neck Neoplasms/diagnostic imaging, Magnetic Resonance Imaging, Local Neoplasm Recurrence/diagnostic imaging, Radiometry, Squamous Cell Carcinoma of Head and Neck/diagnostic imaging, Aged, Area Under Curve, Biomarkers, Comorbidity, Disease-Free Survival, Factor Analysis, Female, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Observer Variation, Prognosis, Reproducibility of Results, Retrospective Studies, Treatment Outcome
16.
BMC Biol ; 17(1): 50, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31234833

ABSTRACT

BACKGROUND: Identification of imprinted genes, demonstrating a consistent preference towards the paternal or maternal allelic expression, is important for the understanding of gene expression regulation during embryonic development and of the molecular basis of developmental disorders with a parent-of-origin effect. Combining allelic analysis of RNA-Seq data with phased genotypes in family trios provides a powerful method to detect parent-of-origin biases in gene expression. RESULTS: We report findings in 296 family trios from two large studies: 165 lymphoblastoid cell lines from the 1000 Genomes Project and 131 blood samples from the Genome of the Netherlands (GoNL) participants. Based on parental haplotypes, we identified > 2.8 million transcribed heterozygous SNVs phased for parental origin and developed a robust statistical framework for measuring allelic expression. We identified a total of 45 imprinted genes and one imprinted unannotated transcript, including multiple imprinted transcripts showing incomplete parental expression bias that was located adjacent to strongly imprinted genes. For example, PXDC1, a gene which lies adjacent to the paternally expressed gene FAM50B, shows a 2:1 paternal expression bias. Other imprinted genes had promoter regions that coincide with sites of parentally biased DNA methylation identified in the blood from uniparental disomy (UPD) samples, thus providing independent validation of our results. Using the stranded nature of the RNA-Seq data in lymphoblastoid cell lines, we identified multiple loci with overlapping sense/antisense transcripts, of which one is expressed paternally and the other maternally. Using a sliding window approach, we searched for imprinted expression across the entire genome, identifying a novel imprinted putative lncRNA in 13q21.2. Overall, we identified 7 transcripts showing parental bias in gene expression which were not reported in 4 other recent RNA-Seq studies of imprinting. 
CONCLUSIONS: Our methods and data provide a robust and high-resolution map of imprinted gene expression in the human genome.


Subjects
Alleles, Gene Expression/genetics, Genomic Imprinting/genetics, Haplotypes/genetics, Blood Chemical Analysis, Cell Line, Humans, RNA Sequence Analysis
17.
Int J Cancer ; 144(2): 372-379, 2019 01 15.
Article in English | MEDLINE | ID: mdl-30192375

ABSTRACT

Offering self-sampling for HPV testing improves the effectiveness of current cervical screening programs by increasing population coverage. Molecular markers directly applicable on self-samples are needed to stratify HPV-positive women at risk of cervical cancer (so-called triage) and to avoid over-referral and overtreatment. Deregulated microRNAs (miRNAs) have been implicated in the development of cervical cancer, and represent potential triage markers. However, it is unknown whether deregulated miRNA expression is reflected in self-samples. Our study is the first to establish genome-wide miRNA profiles in HPV-positive self-samples to identify miRNAs that can predict the presence of CIN3 and cervical cancer in self-samples. Small RNA sequencing (sRNA-Seq) was conducted to determine genome-wide miRNA expression profiles in 74 HPV-positive self-samples of women with and without cervical precancer (CIN3). The optimal miRNA marker panel for CIN3 detection was determined by GRridge, a penalized method on logistic regression. Six miRNAs were validated by qPCR in 191 independent HPV-positive self-samples. Classification of sRNA-Seq data yielded a 9-miRNA marker panel with a combined area under the curve (AUC) of 0.89 for CIN3 detection. Validation by qPCR resulted in a combined AUC of 0.78 for CIN3+ detection. Our study shows that deregulated miRNA expression associated with CIN3 and cervical cancer development can be detected by sRNA-Seq in HPV-positive self-samples. Validation by qPCR indicates that miRNA expression analysis offers a promising novel molecular triage strategy for CIN3 and cervical cancer detection applicable to self-samples.


Subjects
Direct-To-Consumer Screening and Testing/methods, Early Detection of Cancer/methods, Papillomavirus Infections/diagnosis, Uterine Cervical Dysplasia/diagnosis, Uterine Cervical Neoplasms/diagnosis, Adult, Female, Genome-Wide Association Study, Humans, MicroRNAs/analysis, Middle Aged, Sensitivity and Specificity, Uterine Cervical Neoplasms/genetics, Uterine Cervical Neoplasms/virology, Vaginal Smears/methods, Uterine Cervical Dysplasia/genetics, Uterine Cervical Dysplasia/virology
18.
Bioinformatics ; 33(10): 1572-1574, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28073760

ABSTRACT

SUMMARY: Our aim is to improve omics-based prediction and feature selection using multiple sources of auxiliary information, termed co-data. Adaptive group-regularized ridge regression (GRridge) was proposed to achieve this by estimating additional group-based penalty parameters through an empirical Bayes method at a low computational cost. We illustrate the GRridge method and software on RNA sequencing datasets. The method boosts the performance of ordinary ridge regression and outperforms other classifiers. Post-hoc feature selection maintains the predictive ability of the classifier with far fewer markers. AVAILABILITY AND IMPLEMENTATION: GRridge is an R package that includes a vignette. It is freely available at https://bioconductor.org/packages/GRridge/. All information and R scripts used in this study, including those on retrieval and processing of the co-data, are available from http://github.com/markvdwiel/GRridgeCodata. CONTACT: mark.vdwiel@vumc.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
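
The idea of group-based penalty parameters can be illustrated with a minimal sketch. It is not the GRridge package or its API: it uses a linear rather than logistic ridge, and the group multipliers are set by hand here, whereas the abstract describes estimating them by an empirical Bayes method. The function names `solve` and `group_ridge`, the group labels and the toy data are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def group_ridge(X, y, groups, lam, multipliers):
    """Ridge fit in which feature j is penalised by lam * multipliers[groups[j]].

    Assumption: the multipliers are given; they are not estimated from the data here.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for j in range(p):
        xtx[j][j] += lam * multipliers[groups[j]]  # group-specific penalty on the diagonal
    return solve(xtx, xty)


# Toy data: two orthogonal features with the same true effect.
X = [[1, 1], [1, -1], [1, 1], [1, -1]]
y = [2, 0, 2, 0]
groups = ["favoured", "other"]

ordinary = group_ridge(X, y, groups, lam=4.0, multipliers={"favoured": 1.0, "other": 1.0})
grouped = group_ridge(X, y, groups, lam=4.0, multipliers={"favoured": 0.25, "other": 4.0})
print("ordinary ridge:", ordinary)
print("group-regularized ridge:", grouped)
```

With equal multipliers both coefficients are shrunk by the same amount. With a smaller multiplier for the group that the co-data favours, that feature is shrunk less and the other feature more, which is the mechanism by which co-data can change prediction and later feature selection.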


Subjects
Genomics/methods, Genetic Models, RNA Sequence Analysis/methods, Software, Bayes Theorem, Early Detection of Cancer/methods, Female, Humans, Molecular Sequence Annotation, Uterine Cervical Neoplasms/diagnosis, Uterine Cervical Neoplasms/genetics
19.
BMC Bioinformatics ; 18(1): 584, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29281963

ABSTRACT

BACKGROUND: Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. RESULTS: Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables with co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available on the variables of the primary data but that does not use its response labels. These moderated sampling probabilities are learned from the data at hand, in a manner inspired by empirical Bayes. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. CONCLUSION: The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
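
The replacement of uniform sampling by moderated sampling probabilities can be sketched as follows. This is only an illustration of weighted candidate-variable sampling, not the CoRF implementation. The per-variable scores are hypothetical and set by hand, whereas the abstract describes learning the probabilities from the data. The function names are assumptions as well.

```python
import random


def sampling_probabilities(codata_scores):
    """Normalise positive per-variable scores into sampling probabilities.

    Assumption: the scores are given; in the abstract they are learned from co-data.
    """
    total = sum(codata_scores)
    return [s / total for s in codata_scores]


def draw_candidates(probs, mtry, rng):
    """Draw mtry distinct variable indices, each draw proportional to the remaining probs."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(mtry):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


# Toy example: variable 0 is supported by the (hypothetical) co-data, the others are not.
probs = sampling_probabilities([8.0, 1.0, 1.0])
rng = random.Random(0)
print("candidates at one split:", draw_candidates(probs, mtry=2, rng=rng))
```

In a standard Random Forest every variable is equally likely to become a split candidate. Here a variable with strong co-data support is offered to the trees more often, so it has more opportunities to be used.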


Subjects
Algorithms, Databases as Topic, Bayes Theorem, Humans, Neoplasms/genetics, ROC Curve, Time Factors
20.
Genome Res ; 24(12): 2022-32, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25236618

ABSTRACT

Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.
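
A combined correction for GC content and mappability, together with a blacklist filter, can be sketched roughly as below. This is not the QDNAseq pipeline: as a simplifying assumption, each bin's read count is divided by the median count of bins with similar GC content and mappability, which stands in for whatever fitted correction the pipeline actually uses. The field names, step sizes and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def correct_counts(bins, gc_step=0.05, map_step=0.1):
    """Return corrected read-count ratios for all non-blacklisted bins.

    Each bin is a dict with keys "count", "gc", "mappability" and "blacklisted".
    Bins are grouped into strata of similar GC content and mappability, and each
    count is divided by the median count of its stratum.
    """
    def stratum(b):
        return (round(b["gc"] / gc_step), round(b["mappability"] / map_step))

    usable = [b for b in bins if not b["blacklisted"]]  # drop problematic regions
    counts_by_stratum = defaultdict(list)
    for b in usable:
        counts_by_stratum[stratum(b)].append(b["count"])

    corrected = []
    for b in usable:
        expected = median(counts_by_stratum[stratum(b)])
        corrected.append(b["count"] / expected if expected > 0 else float("nan"))
    return corrected


# Toy bins: two GC strata with different baseline depth and one blacklisted bin.
bins = [
    {"count": 100, "gc": 0.40, "mappability": 1.0, "blacklisted": False},
    {"count": 110, "gc": 0.40, "mappability": 1.0, "blacklisted": False},
    {"count": 90, "gc": 0.40, "mappability": 1.0, "blacklisted": False},
    {"count": 50, "gc": 0.60, "mappability": 1.0, "blacklisted": False},
    {"count": 50, "gc": 0.60, "mappability": 1.0, "blacklisted": False},
    {"count": 100, "gc": 0.60, "mappability": 1.0, "blacklisted": False},
    {"count": 1000, "gc": 0.40, "mappability": 1.0, "blacklisted": True},
]
print(correct_counts(bins))
```

After correction, the baseline bins of both GC strata sit near a ratio of 1, so the last usable bin stands out as a relative gain even though its raw count equals that of an ordinary bin in the other stratum.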


Subjects
Computational Biology, DNA Copy Number Variations, Human Genome, High-Throughput Nucleotide Sequencing, Algorithms, Base Composition, Tumor Cell Line, Comparative Genomic Hybridization, Computational Biology/methods, Genomics/methods, Humans, Neoplasms/genetics, Software