ABSTRACT
The application of deep learning to spatial transcriptomics (ST) can reveal relationships between gene expression and tissue architecture. Prior work has demonstrated that inferring gene expression from tissue histomorphology can discern these spatial molecular markers to enable population-scale studies, reducing the fiscal barriers associated with large-scale spatial profiling. However, while most improvements in algorithmic performance have focused on improving model architectures, little is known about how the quality of tissue preparation and imaging can affect deep learning model training for spatial inference from morphology and its potential for widespread clinical adoption. Prior studies for ST inference from histology typically utilize manually stained frozen sections with imaging on non-clinical grade scanners. Training such models on ST cohorts is also costly. We hypothesize that adopting tissue processing and imaging practices that mirror standards for clinical implementation (permanent sections, automated tissue staining, and clinical grade scanning) can significantly improve model performance. An enhanced specimen processing and imaging protocol was developed for deep learning-based ST inference from morphology. This protocol featured the Visium CytAssist assay to permit automated hematoxylin and eosin staining (e.g., Leica Bond), 40×-resolution imaging, and joining of multiple patients' tissue sections per capture area prior to ST profiling. Using a cohort of 13 patients with pathologic T stage III colorectal cancer, we compared the performance of models trained on slides prepared using enhanced versus traditional (i.e., manual staining and low-resolution imaging) protocols. Leveraging Inceptionv3 neural networks, we predicted gene expression across serial, histologically-matched tissue sections using whole slide images (WSI) from both protocols.
Data Shapley was used to quantify and compare marginal performance gains on a patient-by-patient basis attributed to using the enhanced protocol versus the actual costs of spatial profiling. Findings indicate that training and validating on WSI acquired through the enhanced protocol as opposed to the traditional method resulted in improved performance at lower fiscal cost. In the realm of ST, the enhancement of deep learning architectures frequently captures the spotlight; however, the significance of specimen processing and imaging is often understated. This research, informed through a game-theoretic lens, underscores the substantial impact that specimen preparation/imaging can have on spatial transcriptomic inference from morphology. It is essential to integrate such optimized processing protocols to facilitate the identification of prognostic markers at a larger scale.
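To make the patient-by-patient valuation idea concrete, the following is a minimal sketch of an exact Shapley computation, assuming a made-up utility function in place of real model retraining. The patient names, the gain values, and the `toy_utility` function are all hypothetical illustrations and are not taken from the study; practical Data Shapley implementations typically approximate the average with Monte Carlo sampling because retraining for every ordering is infeasible.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values: each player's marginal gain averaged over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        previous = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            current = utility(coalition)
            totals[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_utility(coalition):
    """Hypothetical stand-in for 'validation performance when training on these patients'."""
    gains = {"patient_A": 0.10, "patient_B": 0.05, "patient_C": 0.02}
    score = sum(gains[p] for p in coalition)
    # Assumed interaction: patients A and B carry partly redundant information.
    if "patient_A" in coalition and "patient_B" in coalition:
        score -= 0.04
    return score


values = shapley_values(["patient_A", "patient_B", "patient_C"], toy_utility)
for patient, value in values.items():
    print(f"{patient}: {value:.3f}")
```

In this toy setting the redundancy penalty is split evenly between patients A and B, and the values sum to the utility of the full cohort. Dividing each value by a per-patient profiling cost would give one possible performance-per-cost comparison.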
Subject(s)
Deep Learning; Transcriptome; Humans; Gene Expression Profiling/methods; Image Processing, Computer-Assisted/methods; Algorithms; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Colorectal Neoplasms/diagnostic imaging
ABSTRACT
Over 150,000 Americans are diagnosed with colorectal cancer (CRC) every year, and annually >50,000 individuals are estimated to die of CRC, necessitating improvements in screening, prognostication, disease management, and therapeutic options. CRC tumors are removed en bloc with surrounding vasculature and lymphatics. Examination of regional lymph nodes at the time of surgical resection is essential for prognostication. Developing alternative approaches to indirectly assess recurrence risk would have utility in cases where lymph node yield is incomplete or inadequate. Spatially dependent, immune cell-specific (e.g., tumor-infiltrating lymphocytes), proteomic, and transcriptomic expression patterns inside and around the tumor (the tumor immune microenvironment) can predict nodal/distant metastasis and probe the coordinated immune response from the primary tumor site. The comprehensive characterization of tumor-infiltrating lymphocytes and other immune infiltrates is possible using highly multiplexed spatial omics technologies, such as the GeoMx Digital Spatial Profiler. In this study, machine learning and differential co-expression analyses helped identify biomarkers from Digital Spatial Profiler-assayed protein expression patterns inside, at the invasive margin, and away from the tumor, associated with extracellular matrix remodeling (e.g., granzyme B and fibronectin), immune suppression (e.g., forkhead box P3), exhaustion and cytotoxicity (e.g., CD8), programmed death ligand 1-expressing dendritic cells, and neutrophil proliferation, among other concomitant alterations. Further investigation of these biomarkers may reveal independent risk factors of CRC metastasis that can be formulated into low-cost, widely available assays.
Subject(s)
Colonic Neoplasms; Colorectal Neoplasms; Humans; Proteomics; Colorectal Neoplasms/metabolism; Biomarkers/metabolism; Lymph Nodes; Colonic Neoplasms/pathology; Lymphocytes, Tumor-Infiltrating; Tumor Microenvironment; Biomarkers, Tumor/metabolism
ABSTRACT
Intraoperative margin analysis is crucial for the successful removal of cutaneous squamous cell carcinomas (cSCC). Artificial intelligence (AI) technologies have previously demonstrated potential for facilitating rapid and complete tumour removal using intraoperative margin assessment for basal cell carcinoma. However, the varied morphologies of cSCC present challenges for AI margin assessment. The aim of this study was to develop and evaluate the accuracy of an AI algorithm for real-time histologic margin analysis of cSCC. To do this, a retrospective cohort study was conducted using frozen cSCC section slides. These slides were scanned and annotated, delineating benign tissue structures, inflammation and tumour to develop an AI algorithm for real-time margin analysis. A convolutional neural network workflow was used to extract histomorphological features predictive of cSCC. This algorithm demonstrated proof of concept for identifying cSCC with high accuracy, highlighting the potential for integration of AI into the surgical workflow. Incorporation of AI algorithms may improve efficiency and completeness of real-time margin assessment for cSCC removal, particularly in cases of moderately and poorly differentiated tumours/neoplasms. Further algorithmic improvement incorporating surrounding tissue context is necessary to remain sensitive to the unique epidermal landscape of well-differentiated tumours, and to map tumours to their original anatomical position/orientation.
Subject(s)
Carcinoma, Basal Cell; Carcinoma, Squamous Cell; Deep Learning; Skin Neoplasms; Humans; Carcinoma, Squamous Cell/pathology; Mohs Surgery; Skin Neoplasms/pathology; Retrospective Studies; Frozen Sections; Artificial Intelligence; Carcinoma, Basal Cell/pathology
ABSTRACT
MicroRNAs (miRNAs) in extracellular vesicles and particles (EVPs) in maternal circulation during pregnancy and in human milk postpartum are hypothesized to facilitate maternal-offspring communication via epigenetic regulation. However, factors influencing maternal EVP miRNA profiles during these two critical developmental windows remain largely unknown. In a pilot study of 54 mother-child dyads in the New Hampshire Birth Cohort Study, we profiled 798 EVP miRNAs, using the NanoString nCounter platform, in paired maternal second-trimester plasma and mature (6-week) milk samples. In adjusted models, total EVP miRNA counts were lower for plasma samples collected in the afternoon compared with the morning (p = 0.024). Infant age at sample collection was inversely associated with total miRNA counts in human milk EVPs (p = 0.040). Milk EVP miRNA counts were also lower among participants who were multiparous after delivery (p = 0.047), had a pre-pregnancy BMI > 25 kg/m² (p = 0.037), or delivered their baby via cesarean section (p = 0.021). In post hoc analyses, we also identified 22 specific EVP miRNAs that were lower among participants who delivered their baby via cesarean section (Q < 0.05). Target genes of delivery mode-associated miRNAs were over-represented in pathways related to satiety signaling in infants (e.g., CCKR signaling) and mammary gland development and lactation (e.g., FGF signaling, EGF receptor signaling). In conclusion, we identified several key factors that may influence maternal EVP miRNA composition during two critical developmental windows, which should be considered in future studies investigating EVP miRNA roles in maternal and child health.
Subject(s)
Extracellular Vesicles; MicroRNAs; Infant; Humans; Pregnancy; Female; MicroRNAs/metabolism; Milk, Human/metabolism; Cesarean Section; Cohort Studies; Epigenesis, Genetic; Pilot Projects; Postpartum Period; Extracellular Vesicles/genetics; Extracellular Vesicles/metabolism
ABSTRACT
Stratifying breast cancer into specific molecular or histologic subtypes aids in therapeutic decision-making and predicting outcomes; however, these subtypes may not be as distinct as previously thought. Patients with luminal-like, estrogen receptor (ER)-expressing tumors have better prognosis than patients with more aggressive, triple-negative or basal-like tumors. There is, however, a subset of luminal-like tumors that express lower levels of ER, which exhibit more basal-like features. We have found that breast tumors expressing lower levels of ER, traditionally considered to be luminal-like, represent a distinct subset of breast cancer characterized by the emergence of basal-like features. Lineage tracing of low-ER tumors in the MMTV-PyMT mouse mammary tumor model revealed that basal marker-expressing cells arose from normal luminal epithelial cells, suggesting that luminal-to-basal plasticity is responsible for the evolution and emergence of basal-like characteristics. This plasticity allows tumor cells to gain a new lumino-basal phenotype, thus leading to intratumoral lumino-basal heterogeneity. Single-cell RNA sequencing revealed SOX10 as a potential driver for this plasticity, which is known among breast tumors to be almost exclusively expressed in triple-negative breast cancer (TNBC) and was also found to be highly expressed in low-ER tumors. These findings suggest that basal-like tumors may result from the evolutionary progression of luminal tumors with low ER expression.
Subject(s)
Mammary Neoplasms, Animal; Receptors, Estrogen; Animals; Mice; Phenotype; Gene Expression; Disease Models, Animal
ABSTRACT
Noninvasive detection of aberrant DNA methylation could provide invaluable biomarkers for earlier detection of triple-negative breast cancer (TNBC) which could help clinicians with easier and more efficient treatment options. We evaluated genome-wide DNA methylation data derived from TNBC and normal breast tissues, peripheral blood of TNBC cases and controls and reference samples of sorted blood and mammary cells. Differentially methylated regions (DMRs) between TNBC and normal breast tissues were stringently selected, verified and externally validated. A machine-learning algorithm was applied to select the top DMRs, which then were evaluated on plasma-derived circulating cell-free DNA (cfDNA) samples of TNBC patients and healthy controls. We identified 23 DMRs accounting for the methylation profile of blood cells and reference mammary cells and then selected six top DMRs for cfDNA analysis. We quantified un-/methylated copies of these DMRs by droplet digital PCR analysis in a plasma test set from TNBC patients and healthy controls and confirmed our findings obtained on tissues. Differential cfDNA methylation was confirmed in an independent validation set of plasma samples. A methylation score combining signatures of the top three DMRs overlapping with the SPAG6, LINC10606 and TBCD/ZNF750 genes had the best capability to discriminate TNBC patients from controls (AUC = 0.78 in the test set and AUC = 0.74 in validation set). Our findings demonstrate the usefulness of cfDNA-based methylation signatures as noninvasive liquid biopsy markers for the diagnosis of TNBC.
Subject(s)
Cell-Free Nucleic Acids; Triple Negative Breast Neoplasms; Humans; DNA Methylation; Triple Negative Breast Neoplasms/diagnosis; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/pathology; Biomarkers, Tumor/genetics; DNA; Cell-Free Nucleic Acids/genetics; Genetic Markers; Liquid Biopsy; Microtubule-Associated Proteins/genetics; Transcription Factors/genetics; Tumor Suppressor Proteins/genetics
ABSTRACT
BACKGROUND: We developed a deep learning algorithm to evaluate defecatory patterns to identify dyssynergic defecation using 3-dimensional high definition anal manometry (3D-HDAM). AIMS: We developed a 3D-HDAM deep learning algorithm to evaluate for dyssynergia. METHODS: Spatial-temporal data were extracted from consecutive 3D-HDAM studies performed between 2018 and 2020 at Dartmouth-Hitchcock Health. The technical procedure and gold standard definition of dyssynergia were based on the London consensus, adapted to the needs of 3D-HDAM technology. Three machine learning models were generated: (1) traditional machine learning informed by conventional anorectal function metrics, (2) deep learning, and (3) a hybrid approach. Diagnostic accuracy was evaluated using bootstrap sampling to calculate area-under-the-curve (AUC). To evaluate overfitting, models were validated by adding 502 simulated defecation maneuvers with diagnostic ambiguity. RESULTS: 302 3D-HDAM studies representing 1208 simulated defecation maneuvers were included (average age 55.2 years; 80.5% women). The deep learning model had comparable diagnostic accuracy [AUC 0.91 (95% confidence interval 0.89-0.93)] to traditional [AUC 0.93 (0.92-0.95)] and hybrid [AUC 0.96 (0.94-0.97)] predictive models in training cohorts. However, the deep learning model handled ambiguous tests more cautiously than other models; the deep learning model was more likely to designate an ambiguous test as inconclusive [odds ratio 4.21 (2.78-6.38)] versus traditional/hybrid approaches. CONCLUSIONS: Deep learning is capable of considering complex spatial-temporal information on 3D-HDAM technology. Future studies are needed to evaluate the clinical context of these preliminary findings.
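As an illustration of how bootstrap sampling can yield an AUC with a confidence interval, here is a minimal sketch using only the Python standard library. The labels and scores are invented toy data, the function names are hypothetical, and the percentile interval shown is one common choice; the abstract does not state which bootstrap variant the authors used.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Toy data (assumed): 1 = dyssynergia, 0 = normal; scores are model outputs.
labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.55, 0.70, 0.30]
point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC {point_estimate:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

With repeated maneuvers per patient, as in this study design, resampling at the patient level rather than the maneuver level would be the more cautious choice, since maneuvers from one patient are not independent.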
Subject(s)
Deep Learning; Defecation; Humans; Female; Middle Aged; Male; Manometry/methods; Anal Canal; Ataxia; Constipation/diagnosis
ABSTRACT
Cytomegalovirus (CMV) is a highly prevalent human herpes virus that exerts a strong influence on immune repertoire which may influence cancer risk. We have tested whether CMV immunoglobulin G (IgG) serostatus is associated with immune cell proportions (n = 132 population controls), human papillomavirus (HPV) co-infection and head and neck cancer risk (n = 184 cancer cases and 188 controls) and patient survival. CMV status was not associated with the proportion of Natural Killer cells, B cells or the neutrophil-to-lymphocyte ratio. However, CD8+ T cells increased with increasing categories of IgG titers (P = 1.7 × 10⁻¹⁰), and titers were inversely associated with the CD4:CD8 ratio (P = 5.6 × 10⁻⁵). Despite these differences in T cell proportions, CMV was not associated with HPV16 co-infection. CMV seropositivity was similar in cases (52%) and controls (47%) and was not associated with patient survival (hazard ratio [HR] 1.14, 95% confidence interval [CI]: 0.70 to 1.86). However, those patients with the highest titers had the worst survival (HR 1.91, 95% CI: 1.13 to 3.23). Tumor-based data from The Cancer Genome Atlas demonstrated that the presence of CMV transcripts was associated with worse patient survival (HR 1.79, 95% CI: 0.96 to 2.78). These findings confirm that a history of CMV infection alters T cell proportions, but this does not translate to HPV16 co-infection or head and neck cancer risk. Our data suggest that high titers and active CMV virus in the tumor environment may confer worse survival.
Subject(s)
Coinfection; Cytomegalovirus Infections; Head and Neck Neoplasms; CD8-Positive T-Lymphocytes; Coinfection/complications; Cytomegalovirus; Cytomegalovirus Infections/complications; Humans; Immunoglobulin G
ABSTRACT
Prior candidate gene studies have shown tumor suppressor DNA methylation in breast milk related with history of breast biopsy, an established risk factor for breast cancer. To further establish the utility of breast milk as a tissue-specific biospecimen for investigations of breast carcinogenesis, we measured genome-wide DNA methylation in breast milk from women with and without a diagnosis of breast cancer in two independent cohorts. DNA methylation was assessed using Illumina HumanMethylation450k in 87 breast milk samples. Through an epigenome-wide association study we explored CpG sites associated with a breast cancer diagnosis in the prospectively collected milk samples from the breast that would develop cancer compared with women without a diagnosis of breast cancer using linear mixed effects models adjusted for history of breast biopsy, age, RefFreeCellMix cell estimates, time of delivery, array chip and subject as random effect. We identified 58 differentially methylated CpG sites associated with a subsequent breast cancer diagnosis (q-value <0.05). Nearly all CpG sites associated with a breast cancer diagnosis were hypomethylated in cases compared with controls and were enriched for CpG islands. In addition, inferred repeat element methylation was lower in breast milk DNA from cases compared to controls, and cases exhibited increased estimated epigenetic mitotic tick rate as well as DNA methylation age compared with controls. Breast milk has utility as a biospecimen for prospective assessment of disease risk, for understanding the underlying molecular basis of breast cancer risk factors and improving primary and secondary prevention of breast cancer.
Subject(s)
Breast Neoplasms/diagnosis; DNA Methylation; Epigenesis, Genetic; Gene Expression Regulation, Neoplastic; Milk, Human/chemistry; Adolescent; Adult; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Case-Control Studies; Female; Humans; Middle Aged; Prognosis; Prospective Studies; Young Adult
ABSTRACT
BACKGROUND: Cellular compositions of solid tumor microenvironments are heterogeneous, varying across patients and tumor types. High-resolution profiling of the tumor microenvironment cell composition is crucial to understanding its biological and clinical implications. Previously, tumor microenvironment gene expression and DNA methylation-based deconvolution approaches have been shown to deconvolve major cell types. However, existing methods lack accuracy and specificity to tumor type and include limited identification of individual cell types. RESULTS: We employed a novel tumor-type-specific hierarchical model using DNA methylation data to deconvolve the tumor microenvironment with high resolution, accuracy, and specificity. The deconvolution algorithm is named HiTIMED. Seventeen cell types from three major tumor microenvironment components can be profiled (tumor, immune, angiogenic) by HiTIMED, and it provides tumor-type-specific models for twenty carcinoma types. We demonstrate the prognostic significance of cell types that other tumor microenvironment deconvolution methods do not capture. CONCLUSION: We developed HiTIMED, a DNA methylation-based algorithm, to estimate cell proportions in the tumor microenvironment with high resolution and accuracy. HiTIMED deconvolution is amenable to archival biospecimens, providing high-resolution profiles that enable the study of clinical and biological implications of variation and composition of the tumor microenvironment.
Subject(s)
DNA Methylation; Neoplasms; Humans; DNA Methylation/genetics; Tumor Microenvironment; Algorithms; Neoplasms/genetics; Epigenesis, Genetic
ABSTRACT
Stem cell maturation is a fundamental, yet poorly understood aspect of human development. We devised a DNA methylation signature deeply reminiscent of embryonic stem cells (a fetal cell origin signature, FCO) to interrogate the evolving character of multiple human tissues. The cell fraction displaying this FCO signature was highly dependent upon developmental stage (fetal versus adult), and in leukocytes, it described a dynamic transition during the first 5 years of life. Significant individual variation in the FCO signature of leukocytes was evident at birth, in childhood, and throughout adult life. The genes characterizing the signature included transcription factors and proteins intimately involved in embryonic development. We defined and applied a DNA methylation signature common among human fetal hematopoietic progenitor cells and have shown that this signature traces the lineage of cells and informs the study of stem cell heterogeneity in humans under homeostatic conditions.
Subject(s)
Cell Lineage; DNA Methylation; Embryonic Stem Cells/metabolism; Gene Expression Regulation, Developmental; Adult; Child; Embryonic Stem Cells/cytology; Hematopoietic Stem Cells/cytology; Hematopoietic Stem Cells/metabolism; Humans; Infant, Newborn
ABSTRACT
Non-alcoholic steatohepatitis (NASH) is a fatty liver disease characterized by accumulation of fat in hepatocytes with concurrent inflammation and is associated with morbidity, cirrhosis and liver failure. After extraction of a liver core biopsy, tissue sections are stained with hematoxylin and eosin (H&E) to grade NASH activity, and stained with trichrome to stage fibrosis. Methods to computationally transform one stain into another on digital whole slide images (WSI) can lessen the need for additional physical staining besides H&E, reducing personnel, equipment, and time costs. Generative adversarial networks (GAN) have shown promise for virtual staining of tissue. We conducted a large-scale validation study of the viability of GANs for H&E to trichrome conversion on WSI (n = 574). Pathologists were largely unable to distinguish real images from virtual/synthetic images given a set of twelve Turing Tests. We report high correlation between staging of real and virtual stains ([Formula: see text]; 95% CI: 0.84-0.88). Stages assigned to both virtual and real stains correlated similarly with a number of clinical biomarkers and progression to End Stage Liver Disease (Hazard Ratio HR = 2.06, 95% CI: 1.36-3.12, p < 0.001 for real stains; HR = 2.02, 95% CI: 1.40-2.92, p < 0.001 for virtual stains). Our results demonstrate that virtual trichrome technologies may offer a software solution that can be employed in the clinical setting as a diagnostic decision aid.
Subject(s)
Azo Compounds; Coloring Agents; Eosine Yellowish-(YS); Image Interpretation, Computer-Assisted; Liver Cirrhosis/diagnosis; Liver/pathology; Methyl Green; Microscopy; Neural Networks, Computer; Non-alcoholic Fatty Liver Disease/diagnosis; Staining and Labeling; Adolescent; Adult; Aged; Aged, 80 and over; Biopsy; Child; Clinical Decision-Making; Decision Support Techniques; Female; Hematoxylin; Humans; Liver Cirrhosis/pathology; Male; Middle Aged; Non-alcoholic Fatty Liver Disease/pathology; Predictive Value of Tests; Reproducibility of Results; Retrospective Studies; Severity of Illness Index; Software; Young Adult
ABSTRACT
BACKGROUND: DNA methylation (DNAm) is an epigenetic regulator of gene expression programs that can be altered by environmental exposures, aging, and in pathogenesis. Traditional analyses that associate DNAm alterations with phenotypes suffer from multiple hypothesis testing and multi-collinearity due to the high-dimensional, continuous, interacting and non-linear nature of the data. Deep learning analyses have shown much promise to study disease heterogeneity. DNAm deep learning approaches have not yet been formalized into user-friendly frameworks for execution, training, and interpreting models. Here, we describe MethylNet, a DNAm deep learning method that can construct embeddings, make predictions, generate new data, and uncover unknown heterogeneity with minimal user supervision. RESULTS: The results of our experiments indicate that MethylNet can study cellular differences, grasp higher order information of cancer sub-types, estimate age and capture factors associated with smoking in concordance with known differences. CONCLUSION: The ability of MethylNet to capture nonlinear interactions presents an opportunity for further study of unknown disease, cellular heterogeneity and aging processes.
Subject(s)
DNA Methylation; Deep Learning; User-Computer Interface; Aging/genetics; CpG Islands; Humans; Neoplasms/genetics; Neoplasms/pathology
ABSTRACT
Asbestos describes a group of naturally occurring fibrous silicate mineral compounds that have been associated with a number of respiratory maladies, including mesothelioma and lung cancer. In addition, based primarily on epidemiologic studies, asbestos has been implicated as a risk factor for laryngeal and pharyngeal squamous cell carcinoma (SCC). The main objective of this work was to strengthen existing evidence via empirical demonstration of persistent asbestos fibers embedded in the tissue surrounding laryngeal and pharyngeal SCC, thus providing a more definitive biological link between exposure and disease. Six human papillomavirus (HPV)-negative laryngeal (n = 4) and pharyngeal (n = 2) SCC cases with a history of working in an asbestos-exposed occupation were selected from a large population-based case-control study of head and neck cancer. A laryngeal SCC case with no history of occupational asbestos exposure was included as a control. Tissue cores were obtained from adjacent nonneoplastic tissue in tumor blocks from the initial primary tumor resection, and mineral fiber analysis was performed using a scanning electron microscope equipped with an energy dispersive X-ray analyzer (EDXA). Chrysotile asbestos fiber bundles were identified in 3 of the 6 evaluated cases with a history of occupational asbestos exposure. All three cases had tumors originating in the larynx. In addition, a wollastonite fiber of unclear significance was identified in one of the HPV-negative pharyngeal SCC cases. No mineral fibers were identified in adjacent tissue of the case without occupational exposure. The presence of asbestos fibers in the epithelial tissue surrounding laryngeal SCC in cases with a history of occupational asbestos exposure adds a key line of physical evidence implicating asbestos as an etiologic factor.
Subject(s)
Asbestos, Serpentine/adverse effects; Laryngeal Neoplasms/etiology; Occupational Exposure/adverse effects; Squamous Cell Carcinoma of Head and Neck/etiology; Aged; Asbestos, Serpentine/analysis; Case-Control Studies; Epithelial Cells/chemistry; Epithelial Cells/ultrastructure; Humans; Laryngeal Neoplasms/chemistry; Laryngeal Neoplasms/ultrastructure; Larynx/chemistry; Larynx/ultrastructure; Male; Middle Aged; Mineral Fibers/adverse effects; Mineral Fibers/analysis; Risk Assessment; Risk Factors; Squamous Cell Carcinoma of Head and Neck/chemistry; Squamous Cell Carcinoma of Head and Neck/ultrastructure
ABSTRACT
SUMMARY: Performing highly parallelized preprocessing of methylation array data using Python can accelerate data preparation for downstream methylation analyses, including large-scale production-ready machine learning pipelines. We present a highly reproducible, scalable pipeline (PyMethylProcess) that can be quickly set up and deployed through Docker and pip. AVAILABILITY AND IMPLEMENTATION: Project Home Page: https://github.com/Christensen-Lab-Dartmouth/PyMethylProcess. Available on PyPI (pymethylprocess), Docker (joshualevy44/pymethylprocess). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
DNA Methylation; Workflow; Computational Biology; Machine Learning; Software
ABSTRACT
BACKGROUND: BRCA1-mutated cancers exhibit deficient homologous recombination (HR) DNA repair, resulting in extensive copy number alterations and genome instability. HR deficiency can also arise in tumors without a BRCA1 mutation. Compared with other breast tumors, HR-deficient, BRCA1-like tumors exhibit worse prognosis but selective chemotherapeutic sensitivity. Presently, patients with triple negative breast cancer (TNBC) who do not respond to hormone endocrine-targeting therapy are given cytotoxic chemotherapy. However, more recent evidence showed a similar genomic profile between BRCA1-deficient TNBCs and hormone-receptor-positive tumors. Characterization of the somatic alterations of BRCA1-like hormone-receptor-positive breast tumors as a group, which is currently lacking, can potentially help develop biomarkers for identifying additional patients who might respond to chemotherapy. METHODS: We retrained and validated a copy-number-based support vector machine (SVM) classifier to identify HR-deficient, BRCA1-like breast tumors. We applied this classifier to The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) breast tumors. We assessed mutational profiles and proliferative capacity by covariate-adjusted linear models and identified differentially methylated regions using DMRcate in BRCA1-like hormone-receptor-positive tumors. RESULTS: Of the breast tumors in TCGA and METABRIC, 22% (651/2925) were BRCA1-like. Stratifying on hormone-receptor status, 13% (302/2405) receptor-positive and 69% (288/417) triple-negative tumors were BRCA1-like. Among the hormone-receptor-positive subgroup, BRCA1-like tumors showed significantly increased mutational burden and proliferative capacity (both P < 0.05). Genome-scale DNA methylation analysis of BRCA1-like tumors identified 202 differentially methylated gene regions, including hypermethylated BRCA1. 
Individually significant CpGs were enriched for enhancer regions (P < 0.05). The hypermethylated gene sets were enriched for DNA and chromatin conformation (all Bonferroni P < 0.05). CONCLUSIONS: To provide insights into alternative classification and potential therapeutic targeting strategies of BRCA1-like hormone-receptor-positive tumors, we developed and applied a novel copy number classifier to identify BRCA1-like hormone-receptor-positive tumors and their characteristic somatic alteration profiles.
Subject(s)
BRCA1 Protein/genetics; Breast Neoplasms/genetics; DNA Copy Number Variations/genetics; Epigenomics/methods; Support Vector Machine; Adult; Aged; Breast/pathology; Breast Neoplasms/mortality; Breast Neoplasms/pathology; CpG Islands/genetics; DNA Methylation/genetics; Datasets as Topic; Female; Homologous Recombination/genetics; Humans; Middle Aged; Promoter Regions, Genetic/genetics; Receptors, Estrogen/metabolism; Receptors, Progesterone/metabolism; Survival Analysis
ABSTRACT
Recent advances in cell-type deconvolution approaches are adding to our understanding of the biology underlying disease development and progression. DNA methylation (DNAm) can be used as a biomarker of cell types, and through deconvolution approaches, to infer underlying cell type proportions. Cell-type deconvolution algorithms have two main categories: reference-based and reference-free. Reference-based algorithms are supervised methods that determine the underlying composition of cell types within a sample by leveraging differentially methylated regions (DMRs) specific to cell type, identified from DNAm measures of purified cell populations. Reference-free algorithms are unsupervised methods for use when cell-type specific DMRs are not available, allowing scientists to estimate putative cellular proportions or control for potential confounding from cell type. Reference-based deconvolution is typically applied to blood samples and has potentiated our understanding of the relation between immune profiles and disease by allowing estimation of immune cell proportions from archival DNA. Bioinformatic analyses using DNAm to infer immune cell proportions, part of a new field known as Immunomethylomics, provide a new direction for consideration in epigenome-wide association studies (EWAS).
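The reference-based idea can be sketched as a constrained regression: a mixed sample's methylation values are modeled as a non-negative combination of cell-type reference profiles, and the weights are read as proportions. The example below is a minimal illustration under stated assumptions: the reference matrix is invented, the function name `deconvolve` is hypothetical, and plain projected gradient descent stands in for the quadratic programming solvers that published reference-based methods typically use.

```python
def deconvolve(reference, sample, n_iter=2000):
    """Estimate cell-type proportions by non-negative least squares.

    reference: rows are CpG sites, columns are mean beta values per purified cell type.
    sample:    beta values of the mixed sample at the same CpG sites.
    """
    n_cpgs = len(reference)
    n_types = len(reference[0])
    # Step size from the squared Frobenius norm, an upper bound on the
    # largest eigenvalue of R^T R, which keeps gradient descent stable.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [
            sum(reference[i][k] * w[k] for k in range(n_types)) - sample[i]
            for i in range(n_cpgs)
        ]
        grad = [
            sum(reference[i][k] * resid[i] for i in range(n_cpgs))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    total = sum(w)
    return [v / total for v in w] if total > 0 else w


# Assumed reference: 4 cell-type-specific CpGs x 2 cell types (values are made up).
reference = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.7],
]
true_props = [0.7, 0.3]
# Simulate a noise-free mixed sample from the known proportions.
sample = [sum(r * p for r, p in zip(row, true_props)) for row in reference]
estimated = deconvolve(reference, sample)
print([round(p, 3) for p in estimated])
```

On noise-free synthetic data the estimate recovers the simulated proportions; with real array data, accuracy depends heavily on how well the chosen DMRs discriminate the cell types.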
Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Computer Simulation , DNA Methylation/genetics , DNA Methylation/physiology , Genome-Wide Association Study , Humans
ABSTRACT
BACKGROUND: Differentiated cells that arise from stem cells in early development contain DNA methylation features that provide a memory trace of their fetal cell origin (FCO). The FCO signature was developed to estimate the proportion of cells in a mixture of cell types that are of fetal origin and are reminiscent of embryonic stem cell lineage. Here we implemented the FCO signature estimation method to compare the fraction of cells with the FCO signature in tumor tissues and their corresponding nontumor normal tissues. METHODS: We applied our FCO algorithm to discovery data sets obtained from The Cancer Genome Atlas (TCGA) and replication data sets obtained from the Gene Expression Omnibus (GEO) data repository. Wilcoxon rank sum tests, linear regression models with adjustments for potential confounders, and non-parametric randomization-based tests were used to compare FCO proportions between tumor tissues and nontumor normal tissues. P-values < 0.05 were considered statistically significant. RESULTS: Across 20 different tumor types we observed a consistently lower FCO signature in tumor tissues compared with nontumor normal tissues, with 18 observed to have significantly lower FCO fractions in tumor tissue (total n = 6,795 tumor, n = 922 nontumor, P < 0.05). We replicated our findings in 15 tumor types using data from independent subjects in 15 publicly available data sets (total n = 740 tumor, n = 424 nontumor, P < 0.05). CONCLUSIONS: The results suggest that cancer development itself is substantially devoid of recapitulation of normal embryologic processes. Our results emphasize the distinction between DNA methylation in normal, tightly regulated, stem-cell-driven differentiation and cancer stem cell reprogramming that involves altered methylation in the service of great cell heterogeneity and plasticity.
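The Wilcoxon rank sum comparison named in the METHODS can be sketched as follows. This is a minimal standard-library version using midranks for ties and a normal approximation without continuity correction, so it is not necessarily how the authors ran the test. The FCO fractions are hypothetical values, not data from the study, and the function name `wilcoxon_rank_sum` is an assumption.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided P-value).
    """
    n_x, n_y = len(x), len(y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both groups must be non-empty")
    n = n_x + n_y
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        midrank = (i + 1 + j) / 2.0    # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    mean_u = n_x * n_y / 2.0
    var_u = n_x * n_y / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 0.0, 1.0           # all values tied: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, z, p


# Hypothetical FCO fractions for one tumor type (not data from the study).
tumor_fco = [0.10, 0.12, 0.08, 0.15, 0.11]
normal_fco = [0.20, 0.25, 0.22, 0.30, 0.27]
u, z, p = wilcoxon_rank_sum(tumor_fco, normal_fco)
print(u, round(z, 3), p < 0.05)
```

With samples this small an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.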
Subject(s)
DNA Methylation/genetics , Human Embryonic Stem Cells/metabolism , Neoplasms/genetics , Neoplastic Stem Cells/metabolism , Adult , Algorithms , Cell Plasticity , Cellular Reprogramming/genetics , CpG Islands , Epigenesis, Genetic , Female , Genetic Heterogeneity , Genetic Loci , Humans , Linear Models , Male , Neoplasms/pathology , Pregnancy , Statistics, Nonparametric , Transcriptome
ABSTRACT
BACKGROUND AND AIMS: Serrated polyps (SPs) and conventional high-risk adenomas (HRAs) derive from two distinct biological pathways but can also occur synchronously. Adults with synchronous SPs and adenomas have been shown to be a high-risk group and may have a unique risk factor profile that differs from that of adults with conventional HRAs alone. We used the population-based New Hampshire Colonoscopy Registry (NHCR) to examine the risk profile of individuals with synchronous conventional HRAs and SPs. METHODS: Our study population included 20,281 first-time screening colonoscopies from asymptomatic NHCR participants 40 years or older between 2004 and 2015. Exams were categorized by findings: (1) normal, (2) HRA only (adenomas ≥ 1 cm, villous, high-grade dysplasia, multiple adenomas (> 2), and adenocarcinoma), (3) clinically significant SP (CSSP) only (any hyperplastic polyp ≥ 1 cm, sessile serrated adenomas/polyps, or traditional serrated adenomas), and (4) synchronous HRA + CSSP. Risk factors examined included the exposure of interest, smoking (never, past, and current/pack years), as well as age, sex, alcohol, education, and family history of colorectal cancer (CRC). Multivariable unconditional logistic regression tested the relation of risk factors with having synchronous HRA + CSSP versus having a normal exam or HRA alone. RESULTS: Among NHCR participants with 18,354 screening colonoscopies (with complete smoking, sex, and bowel preparation data, and adequate preparation) there were 16,495 normal; 1,309 HRA alone; 461 CSSP alone; and 89 synchronous HRA + CSSP. Current smoking was associated with an almost threefold increased risk for HRA or CSSP, and a more than eightfold increased risk for synchronous HRA + CSSP (aOR = 8.66; 95% CI: 4.73-15.86) compared to normal exams. Adults with synchronous HRA + CSSP were threefold more likely to be current smokers than those with HRA alone (aOR = 3.27; 95% CI: 1.74-6.16). CONCLUSIONS: Our data suggest that current smokers may be at a higher risk for synchronous CSSP + HRA even when compared to having HRA alone.
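For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from a 2×2 table. It is a simplification: the aORs reported above come from multivariable logistic regression with covariate adjustment, which this sketch does not reproduce. The counts are hypothetical, since the abstract does not report smoker counts by group, and the function name `odds_ratio_wald_ci` is an assumption.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald confidence interval on the log scale.

    Returns (odds_ratio, lower_bound, upper_bound). All four cells must be > 0.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the study):
# 20 current smokers and 80 non-smokers among cases,
# 10 current smokers and 90 non-smokers among controls.
crude_or, ci_low, ci_high = odds_ratio_wald_ci(20, 80, 10, 90)
print(round(crude_or, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1, as both intervals reported in the RESULTS do, indicates an association at the 5% significance level.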
Subject(s)
Adenoma/epidemiology , Colonic Polyps/epidemiology , Colorectal Neoplasms/epidemiology , Neoplasms, Multiple Primary/epidemiology , Smoking/epidemiology , Adenoma/diagnostic imaging , Adenoma/etiology , Adenoma/pathology , Aged , Cohort Studies , Colon/diagnostic imaging , Colon/pathology , Colonic Polyps/diagnostic imaging , Colonic Polyps/etiology , Colonic Polyps/pathology , Colonoscopy/statistics & numerical data , Colorectal Neoplasms/diagnostic imaging , Colorectal Neoplasms/etiology , Colorectal Neoplasms/pathology , Female , Humans , Male , Mass Screening/statistics & numerical data , Middle Aged , Neoplasms, Multiple Primary/diagnostic imaging , Neoplasms, Multiple Primary/etiology , Neoplasms, Multiple Primary/pathology , New Hampshire/epidemiology , Registries/statistics & numerical data , Risk Factors , Smoking/adverse effects
ABSTRACT
Bidirectional gene promoters affect the transcription of two genes, leading to the hypothesis that they should exhibit protection against genetic or epigenetic changes in cancer. Therefore, they provide an excellent opportunity to learn about promoter susceptibility to somatic alteration in tumors. We tested this hypothesis using data from genome-scale DNA methylation (14 cancer types), simple somatic mutation (10 cancer types), and copy number variation profiling (14 cancer types). For DNA methylation, the difference in rank differential methylation between tumor and matched tumor-adjacent normal samples based on promoter type was tested by the Wilcoxon rank sum test. Logistic regression was used to compare differences in simple somatic mutations. For copy number alteration, a mixed effects logistic regression model was used. The change in methylation between non-diseased tissues and their tumor counterparts was significantly greater in single-gene promoters than in bidirectional promoters across all 14 cancer types examined. Similarly, the extent of copy number alteration was greater in single-gene promoters than in bidirectional promoters for all 14 cancer types. In contrast, among the 10 cancer types with available simple somatic mutation data, bidirectional promoters were slightly more susceptible to mutation. These results suggest that selective pressures related to specific functional impacts during carcinogenesis drive the susceptibility of promoter regions to somatic alteration.