ABSTRACT
BACKGROUND: Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performance of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers were published recently and have been widely used to process massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers. RESULTS: Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletions with better performance and more efficient use of computing resources, and both Manta and MELT demonstrated relatively good precision in calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, performed better in identifying long duplications because they employ the read-depth approach. Finally, we verified the genotypes inferred by each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance for deletions and insertions. CONCLUSIONS: Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.
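The accuracy comparison described above can be illustrated with a minimal sketch of how calls might be scored against a truth set. The abstract does not state the matching rule, so the 50% reciprocal-overlap threshold, the function names, and the toy coordinates below are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end); hypothetical half-open coordinates
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall(calls: List[Interval], truth: List[Interval],
                     min_ro: float = 0.5) -> Tuple[float, float]:
    """Score calls against a truth set using a reciprocal-overlap threshold."""
    # A call is a true positive if it matches at least one truth interval.
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls
    )
    # A truth interval is recovered if at least one call matches it.
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_ro for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy deletions, not taken from the study.
truth_deletions = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
called_deletions = [("chr1", 1050, 2100), ("chr1", 7000, 7500), ("chr2", 250, 950)]
precision, recall = precision_recall(called_deletions, truth_deletions)
```

In this toy case two of the three calls match a truth deletion and two of the three truth deletions are recovered, so precision and recall are both 2/3.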
Subject(s)
DNA Copy Number Variations , Genomics , Humans , Whole Genome Sequencing , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Human Genome , Genomic Structural Variation
ABSTRACT
Alterations in DNA methylation play an important pathophysiological role in the development and progression of colorectal cancer. We comprehensively profiled DNA methylation alterations in 165 Korean patients with colorectal cancer (CRC) and conducted an in-depth investigation of cancer-specific methylation patterns. Our analysis of the tumor samples revealed a significant presence of hypomethylated probes, primarily within gene body regions; few hypermethylated sites were observed, and these were mostly enriched in promoter-like and CpG island regions. The CpG Island Methylator Phenotype-High (CIMP-H) subgroup exhibited notable enrichment of microsatellite instability-high (MSI-H) tumors. Additionally, our findings indicated a significant correlation between methylation of the MLH1 gene and MSI-H status. Furthermore, we found that CIMP-H tumors tended to occur in the right side of the colon and were slightly more prevalent among older patients. Through our methylome profile analysis, we verified the methylation patterns and clinical characteristics of Korean patients with CRC. This dataset lays a strong foundation for exploring novel molecular insights and potential therapeutic targets for the treatment of CRC. [BMB Reports 2024; 57(2): 110-115].
Subject(s)
Colorectal Neoplasms , DNA Methylation , Humans , DNA Methylation/genetics , Microsatellite Instability , Mutation , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Republic of Korea , CpG Islands/genetics , Phenotype
ABSTRACT
Aberrant DNA methylation plays a critical role in the development and progression of colorectal cancer (CRC), which has high incidence and mortality rates in Korea. Various CRC-associated methylation markers for cancer diagnosis and prognosis have been developed; however, they have not been validated for Korean patients owing to the lack of comprehensive clinical and methylome data. Here, we obtained reliable methylation profiles for 228 tumor, 103 adjacent normal, and two unmatched normal colon tissues from Korean patients with CRC using an Illumina Infinium EPIC array; the data were corrected for biological and experimental biases. A comparative methylome analysis confirmed the previous findings that hypermethylated positions in the tumor were highly enriched in CpG islands and in promoter, 5' untranslated, and first exon regions. In contrast, hypomethylated positions were enriched in open-sea regions considerably distant from CpG islands. After applying a CpG island methylator phenotype (CIMP) to the methylome data of tumor samples to stratify the CRC patients, we corroborated the previously established clinicopathological finding that tumors with high CIMP signatures were significantly enriched in the right colon. The results showed a higher prevalence of microsatellite instability status and MLH1 methylation in tumors with high CIMP signatures than in those with low or non-CIMP signatures. Therefore, our methylome analysis and dataset provide insights into applying CRC-associated methylation markers for cancer diagnosis and prognosis in Korean patients. [BMB Reports 2024; 57(3): 161-166].
Subject(s)
Colorectal Neoplasms , Epigenome , Humans , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , DNA Methylation/genetics , CpG Islands/genetics , Phenotype , Republic of Korea
ABSTRACT
Every year, 11% of infants are born preterm, with significant health consequences, and the vaginal microbiome is a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operating characteristic (AUROC) curve scores of 0.69 and 0.87 for predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translating microbiome data into clinically relevant predictive models and for better understanding preterm birth.
Subject(s)
Crowdsourcing , Microbiota , Premature Birth , Pregnancy , Female , Newborn Infant , Humans , Phylogeny , Vagina , Microbiota/genetics
ABSTRACT
DNA methylation regulates gene expression and contributes to tumorigenesis in the early stages of cancer. In colorectal cancer (CRC), CpG island methylator phenotype (CIMP) is recognized as a distinct subset that is associated with specific molecular and clinical features. In this study, we investigated the genome-wide DNA methylation patterns among patients with CRC. The methylation data of 1 unmatched normal, 142 adjacent normal, and 294 tumor samples were analyzed. We identified 40,003 differentially methylated positions, including 6,933 (79.8%) hypermethylated and 16,145 (51.6%) hypomethylated probes in genic regions. Hypermethylated probes were predominantly found in promoter-like regions, CpG islands, and N shore sites; hypomethylated probes were enriched in open-sea regions. CRC tumors were categorized into three CIMP subgroups, with 90 (30.6%) in the CIMP-high (CIMP-H), 115 (39.1%) in the CIMP-low (CIMP-L), and 89 (30.3%) in the non-CIMP group. The CIMP-H group was associated with microsatellite instability-high tumors, hypermethylation of MLH1, older age, and right-sided tumors. Our results showed that genome-wide methylation analyses classified patients with CRC into three subgroups according to CIMP levels, with clinical and molecular features consistent with previous data. [BMB Reports 2023; 56(10): 563-568].
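The subgroup percentages reported above follow directly from the counts and the 294 tumor samples. The short check below recomputes them; the variable names are illustrative, and only the counts come from the abstract.

```python
# Counts per CIMP subgroup as reported in the abstract.
cimp_counts = {"CIMP-H": 90, "CIMP-L": 115, "non-CIMP": 89}

# The three subgroups should add up to the 294 tumor samples.
total_tumors = sum(cimp_counts.values())

# Share of each subgroup, rounded to one decimal place as in the abstract.
cimp_percent = {
    group: round(100 * count / total_tumors, 1)
    for group, count in cimp_counts.items()
}
```

The recomputed shares (30.6%, 39.1%, and 30.3%) agree with the reported values.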
Subject(s)
Colorectal Neoplasms , DNA Methylation , Humans , DNA Methylation/genetics , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , CpG Islands/genetics , Phenotype , Genetic Epigenesis/genetics , Republic of Korea
ABSTRACT
Aberrant DNA methylation plays a pivotal role in the onset and progression of colorectal cancer (CRC), a disease with high incidence and mortality rates in Korea. Several CRC-associated diagnostic and prognostic methylation markers have been identified; however, due to a lack of comprehensive clinical and methylome data, these markers have not been validated in the Korean population. Therefore, in this study, we aimed to obtain the CRC methylation profile using 172 tumors and 128 adjacent normal colon tissues from Korean patients with CRC. Based on the comparative methylome analysis, we found that hypermethylated positions in the tumor were predominantly concentrated in CpG islands and promoter regions, whereas hypomethylated positions were largely found in the open-sea region, notably distant from the CpG islands. In addition, we stratified patients by applying the CpG island methylator phenotype (CIMP) to the tumor methylome data. This stratification validated previous clinicopathological implications, as tumors with high CIMP signatures were significantly correlated with the proximal colon, higher prevalence of microsatellite instability status, and MLH1 promoter methylation. In conclusion, our extensive methylome analysis and the accompanying dataset offer valuable insights into the use of CRC-associated methylation markers in Korean patients, potentially improving CRC diagnosis and prognosis. Furthermore, this study serves as a solid foundation for further investigations into personalized and ethnicity-specific CRC treatments. [BMB Reports 2023; 56(10): 569-574].
Subject(s)
Colorectal Neoplasms , DNA Methylation , Humans , DNA Methylation/genetics , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , CpG Islands/genetics , Republic of Korea , Phenotype
ABSTRACT
Globally, about 11% of infants are born preterm every year, defined as birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants, we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
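For readers unfamiliar with a bootstrapped AUROC, the sketch below shows one common way to compute it. It is not the challenge's scoring code: the pairwise AUROC formulation, the percentile interval, the function names, and the toy labels and scores are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(labels: Sequence[int], scores: Sequence[float],
                    n_resamples: int = 1000, seed: int = 0) -> Tuple[float, float, float]:
    """Return the mean and the 2.5th/97.5th percentiles of AUROC over resamples."""
    rng = random.Random(seed)
    n = len(labels)
    values: List[float] = []
    while len(values) < n_resamples:
        # Draw n samples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        values.append(auroc(ys, [scores[i] for i in idx]))
    values.sort()
    low = values[int(0.025 * (len(values) - 1))]
    high = values[int(0.975 * (len(values) - 1))]
    return sum(values) / len(values), low, high


# Toy data: 1 = preterm, 0 = term; scores are hypothetical model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
point_estimate = auroc(toy_labels, toy_scores)
mean_auc, ci_low, ci_high = bootstrap_auroc(toy_labels, toy_scores, n_resamples=200)
```

On the toy data, 15 of the 16 positive-negative pairs are ranked correctly, so the point estimate is 0.9375; the bootstrap summary depends on the random seed.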
ABSTRACT
BACKGROUND: Genome-wide studies of DNA methylation across the epigenetic landscape provide insights into the heterogeneity of pluripotent embryonic stem cells (ESCs). As they differentiate into embryonic somatic and germ cells, ESCs exhibit varying degrees of pluripotency, and the epigenetic changes occurring in this process have emerged as important factors explaining stem cell pluripotency. RESULTS: Here, using paired scBS-seq and scRNA-seq data from mice, we constructed a machine learning model that predicts the degree of pluripotency of mouse ESCs. Since the biological activities of non-CpG markers have yet to be clarified, we tested the predictive power of CpG markers, non-CpG markers, and a combination thereof in the model. Through rigorous performance evaluation with both internal and external validation, we found that a model using both CpG and non-CpG markers predicted the pluripotency of ESCs with the highest performance (0.956 AUC, external test). The prediction model consisted of 16 CpG and 33 non-CpG markers. The CpG markers and most of the non-CpG markers targeted depletions of methylation and were indicative of cell pluripotency, whereas only a few non-CpG markers reflected accumulations of methylation. Additionally, we confirmed that pluripotency differs between individual developmental stages, such as E3.5 and E6.5, as well as between induced mouse pluripotent stem cells (iPSCs) and somatic cells. CONCLUSIONS: In this study, we investigated CpG and non-CpG methylation in relation to mouse stem cell pluripotency and developed a model based on these markers that successfully predicts the pluripotency of mouse ESCs.
Subject(s)
CpG Islands , DNA Methylation , Pluripotent Stem Cells/metabolism , Animals , Genetic Epigenesis , Epigenomics , Mice , Mouse Embryonic Stem Cells/metabolism
ABSTRACT
DNA methylation patterns have been shown to change throughout the normal aging process. Several studies have found epigenetic aging markers using age predictors, but these studies only focused on blood-specific or tissue-common methylation patterns. Here, we constructed nine tissue-specific age prediction models using methylation array data from normal samples. The constructed models predict the chronological age with good performance (mean absolute error of 5.11 years on average) and show better performance in the independent test than previous multi-tissue age predictors. We also compared tissue-common and tissue-specific aging markers and found that they had different characteristics. Firstly, the tissue-common group tended to contain more positive aging markers with methylation values that increased during the aging process, whereas the tissue-specific group tended to contain more negative aging markers. Secondly, many of the tissue-common markers were located in Cytosine-phosphate-Guanine (CpG) island regions, whereas the tissue-specific markers were located in CpG shore regions. Lastly, the tissue-common CpG markers tended to be located in more evolutionarily conserved regions. In conclusion, our prediction models identified CpG markers that capture both tissue-common and tissue-specific characteristics during the aging process.
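The performance figure quoted above is a mean absolute error (MAE) between chronological and predicted age. A minimal sketch of that metric follows; the function name and the toy ages are illustrative and are not drawn from the study.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float],
                        predicted_ages: Sequence[float]) -> float:
    """Average absolute difference, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


# Toy values for four hypothetical samples.
chronological = [34.0, 51.0, 62.0, 45.0]
predicted = [38.5, 47.0, 70.0, 44.0]
mae = mean_absolute_error(chronological, predicted)
```

Here the absolute errors are 4.5, 4.0, 8.0, and 1.0 years, giving an MAE of 4.375 years.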
Subject(s)
Age Factors , DNA Methylation/genetics , Forecasting/methods , Adult , Aged , Biomarkers , CpG Islands/genetics , Genetic Databases , Genetic Epigenesis/genetics , Epigenomics , Female , Humans , Male , Middle Aged , Organ Specificity/genetics
ABSTRACT
BACKGROUND: The survival of patients with breast cancer is highly variable, ranging from a few months to more than 15 years. In recent studies, gene expression profiling of tumors has been used as a promising means of predicting prognosis. METHODS: In this study, we used gene expression datasets of tumors to identify prognostic factors in breast cancer. We conducted log-rank tests and used unsupervised clustering methods to find reciprocally expressed gene sets associated with worse survival rates. Prognosis prediction scores were determined as the ratio of gene expressions. RESULTS: Four prognosis prediction gene set modules were constructed. The four prognostic gene sets predicted worse survival rates in three independent gene expression datasets. In addition, we found that patients with poor-prognosis disease, i.e., triple-negative, HER2-enriched, TP53-mutated, and high-grade tumors, had higher prognosis prediction scores than those with other types of breast cancer. CONCLUSIONS: Based on a gene expression analysis, we suggest that our well-defined scoring method for predicting survival outcome may be useful for developing prognostic factors in breast cancer.
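The abstract states only that scores were "the ratio of gene expressions" between reciprocally expressed gene sets. One possible reading is sketched below: the mean expression of a set that is higher in poor-prognosis tumors divided by the mean expression of the opposing set. The averaging step, the function name, and the gene labels are hypothetical and should not be taken as the authors' exact definition.

```python
from statistics import mean
from typing import Dict, Sequence


def prognosis_score(expression: Dict[str, float],
                    up_genes: Sequence[str],
                    down_genes: Sequence[str]) -> float:
    """Ratio of mean expression in the 'up' set to mean expression in the 'down' set."""
    up = mean(expression[g] for g in up_genes)
    down = mean(expression[g] for g in down_genes)
    if down <= 0:
        raise ValueError("mean expression of the down set must be positive")
    return up / down


# Hypothetical expression values for one tumor sample; gene labels are placeholders.
sample_expression = {"GENE_A": 8.0, "GENE_B": 6.0, "GENE_C": 2.0, "GENE_D": 5.0}
score = prognosis_score(sample_expression,
                        up_genes=["GENE_A", "GENE_B"],
                        down_genes=["GENE_C", "GENE_D"])
```

With these placeholder values the two set means are 7.0 and 3.5, so the score is 2.0; under this reading, a higher score would indicate a less favorable expression pattern.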