ABSTRACT
Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly through the detection of genomic variants, owing to its ultra-high throughput and increasing affordability. However, the flood of rich cancer genomics data has created significant challenges in exploring it and translating it into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: raw sequence data pre-processing and quality control (QC), sequence alignment, variant calling, and annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is the multidimensional QC of sequencing data: QC not only reports varied sequence data characteristics but also reveals deviations in diverse features, and it is essential for a meaningful and successful study. However, monitoring of QC metrics in specific steps, including alignment and variant calling, is neglected in certain pipelines such as the 'Best Practices Workflows' in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.
Subject(s)
Genome , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Quality Control , Computational Biology/methods , Humans , Molecular Sequence Annotation , Software
ABSTRACT
Internal tandem duplication (ITD) of FMS-like tyrosine kinase 3 (FLT3-ITD) constitutes an independent indicator of poor prognosis in acute myeloid leukaemia (AML). AML with FLT3-ITD usually presents with poor treatment outcomes, high recurrence rate and short overall survival. Currently, polymerase chain reaction and capillary electrophoresis are widely adopted for the clinical detection of FLT3-ITD, whereas the length and mutation frequency of ITD are evaluated using fragment analysis. With the development of sequencing technology and the high incidence of FLT3-ITD mutations, a multitude of bioinformatics tools and pipelines have been developed to detect FLT3-ITD using next-generation sequencing data. However, systematic comparison and evaluation of the methods or software have not been performed. In this study, we provided a comprehensive review of the principles, functionality and limitations of the existing methods for detecting FLT3-ITD. We further compared the qualitative and quantitative detection capabilities of six representative tools using simulated and biological data. Our results will provide practical guidance for researchers and clinicians to select the appropriate FLT3-ITD detection tools and highlight the direction of future developments in this field. Availability: A Docker image with several programs pre-installed is available at https://github.com/niu-lab/docker-flt3-itd to facilitate the application of FLT3-ITD detection tools.
Subject(s)
Tumor Biomarkers/genetics , Computational Biology/methods , Gene Duplication , Myeloid Leukemia/genetics , Tandem Repeat Sequences/genetics , fms-Like Tyrosine Kinase 3/genetics , Acute Disease , High-Throughput Nucleotide Sequencing/methods , Humans , Myeloid Leukemia/diagnosis , Mutation
ABSTRACT
MOTIVATION: Microsatellite instability (MSI) is a promising biomarker for cancer prognosis and chemosensitivity. Techniques are rapidly evolving for the detection of MSI from tumor-normal paired or tumor-only sequencing data. However, tumor tissues are often insufficient, unavailable, or otherwise difficult to procure. Increasing clinical evidence indicates the enormous potential of plasma circulating cell-free DNA (cfDNA) technology as a noninvasive MSI detection approach. RESULTS: We developed MSIsensor-ct, a bioinformatics tool based on a machine learning protocol, dedicated to detecting MSI status using cfDNA sequencing data with a potentially stable MSIscore threshold of 20%. Evaluation of MSIsensor-ct on independent testing datasets with various levels of circulating tumor DNA (ctDNA) and sequencing depth showed 100% accuracy within the limit of detection (LOD) of 0.05% ctDNA content. MSIsensor-ct requires only BAM files as input, rendering it user-friendly and readily integrated into next-generation sequencing (NGS) analysis pipelines. AVAILABILITY: MSIsensor-ct is freely available at https://github.com/niu-lab/MSIsensor-ct. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
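The 20% MSIscore threshold mentioned above can be illustrated with a minimal sketch. It assumes that the score is the percentage of covered microsatellite sites called unstable and that a score at or above the threshold is reported as MSI-high; both the function names and the boundary handling are illustrative assumptions, not the MSIsensor-ct implementation.

```python
def msi_score(site_calls):
    """Return the percentage of covered sites called unstable.

    site_calls is a list of booleans, one per covered microsatellite site
    (True = unstable). This per-site representation is an assumption.
    """
    if not site_calls:
        raise ValueError("no covered microsatellite sites")
    unstable = sum(1 for is_unstable in site_calls if is_unstable)
    return 100.0 * unstable / len(site_calls)


def msi_status(score, threshold=20.0):
    # Assumption: scores at or above the threshold are labelled MSI-high.
    return "MSI-H" if score >= threshold else "MSS"


if __name__ == "__main__":
    calls = [True] * 6 + [False] * 14  # 6 unstable sites out of 20
    score = msi_score(calls)
    print(score, msi_status(score))
```

In this toy input, 6 of 20 sites are unstable, so the score is 30% and the sample falls above the 20% cut-off.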
Subject(s)
Circulating Tumor DNA/genetics , Machine Learning , Microsatellite Instability , Neoplasms/genetics , Software , Circulating Tumor DNA/blood , Computational Biology/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Limit of Detection , Microsatellite Repeats , Neoplasms/blood , Neoplasms/diagnosis , Neoplasms/pathology , DNA Sequence Analysis
ABSTRACT
BACKGROUND: The mortality rate of hepatocellular carcinoma (HCC) remains high worldwide despite surgery and chemotherapy. Immunotherapy is a promising treatment for the rapidly expanding HCC spectrum. Therefore, it is necessary to further explore the immune-related characteristics of the tumour microenvironment (TME), which plays a vital role in tumour initiation and progression. METHODS: In this research, 866 immune-related differentially expressed genes (DEGs) were identified by integrating the DEGs of samples from The Cancer Genome Atlas (TCGA)-HCC dataset and the immune-related genes from databases (InnateDB; ImmPort). Afterwards, 144 candidate prognostic genes were defined through weighted gene co-expression network analysis (WGCNA). RESULTS: Seven immune-related prognostic DEGs were identified using the L1-penalized least absolute shrinkage and selection operator (LASSO) Cox proportional hazards (PH) model, and the ImmuneRiskScore model was constructed on this basis. The prognostic index of the ImmuneRiskScore model was then validated in the relevant dataset. Patients were divided into high- and low-risk groups according to the ImmuneRiskScore. Differences in the immune cell infiltration of patients with different ImmuneRiskScore values were clarified, and the correlation of immune cell infiltration with immunotherapy biomarkers was further explored. CONCLUSION: The ImmuneRiskScore of HCC could be a prognostic marker and can reflect the immune characteristics of the TME. Furthermore, it provides a potential biomarker for predicting the response to immunotherapy in HCC patients.
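A risk score derived from a penalized Cox model is typically a weighted sum of gene expression values, which is then used to stratify patients. The sketch below assumes that form and assumes a median cut-off for the high- and low-risk groups; the abstract does not state the cut-off, and the gene names, coefficients and patient values are hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    The median cut-off is an assumption made for illustration.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical coefficients and expression values, not taken from the study.
    coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
    patients = {
        "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
        "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
        "P3": {"GENE_A": 2.0, "GENE_B": 0.0},
    }
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    print(scores)
    print(split_by_median(scores))
```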
Subject(s)
Tumor Biomarkers/metabolism , Hepatocellular Carcinoma/immunology , Liver Neoplasms/immunology , Tumor Microenvironment/immunology , Hepatocellular Carcinoma/mortality , Hepatocellular Carcinoma/pathology , Humans , Liver Neoplasms/mortality , Liver Neoplasms/pathology , Prognosis , Survival Analysis
ABSTRACT
BACKGROUND: Rectal mucosal melanoma (RMM) is a rare and highly aggressive disease with a poor prognosis. Because RMM is rare, few studies have focused on its genetic mechanism. This retrospective study aimed to analyze the genetic spectrum and prognosis of RMM in China and lay a foundation for targeted therapy. METHODS: Thirty-six patients with primary RMM from Peking University Cancer Hospital were enrolled in this study. Next-generation sequencing (NGS) data from the tumor samples were processed with the TruSight™ Oncology 500 (TSO500) Docker pipeline to detect genomic variants. Univariate and multivariate Cox proportional hazards analyses were then performed to evaluate the correlations of the variants with overall survival (OS), along with Kaplan-Meier analysis and the log-rank test to determine their significance. RESULTS: BRAF mutations, NRG1 deletions and mitotic index were significant prognostic factors in the univariate analysis. Multivariable analysis of the OS-related prognostic factors in primary RMM patients revealed two significant alterations: BRAF mutations [HR 7.732 (95%CI: 1.735-34.456), P = 0.007] and NRG1 deletions [HR 14.976 (95%CI: 2.305-97.300), P = 0.005]. CONCLUSIONS: This is the first study to describe genetic alterations exclusively in Chinese patients with RMM. We confirmed that the genetic alterations of RMM differ from those of cutaneous melanoma (CM). Our study indicates that BRAF and NRG1 alterations were correlated with a poor prognosis of RMM and may be potential therapeutic targets for RMM treatment.
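As a consistency check on the reported hazard ratios, a two-sided Wald P value can be recovered from a hazard ratio and its 95% confidence interval. The sketch below assumes the interval is symmetric on the log scale and that a Wald test was used; the abstract does not state which test produced the P values, so this is an illustration rather than the authors' procedure.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the z statistic and two-sided P value from an HR and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    for label, hr, lo, hi in [
        ("BRAF mutations", 7.732, 1.735, 34.456),
        ("NRG1 deletions", 14.976, 2.305, 97.300),
    ]:
        z, p = wald_p_from_ci(hr, lo, hi)
        print(f"{label}: z = {z:.2f}, P = {p:.3f}")
```

Under these assumptions the recovered P values round to 0.007 and 0.005, in agreement with the figures reported in the abstract.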
Subject(s)
Melanoma/genetics , Rectal Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Asian People/genetics , China/epidemiology , DNA Mutational Analysis , Female , Humans , Intestinal Mucosa/pathology , Male , Melanoma/mortality , Melanoma/pathology , Middle Aged , Mitotic Index , Neuregulin-1/genetics , Prognosis , Proto-Oncogene Proteins B-raf/genetics , Rectal Neoplasms/mortality , Rectal Neoplasms/pathology , Rectum/pathology , Retrospective Studies
ABSTRACT
Studying the regulatory mechanisms that drive nitrogen-use efficiency (NUE) in crops is important for sustainable agriculture and environmental protection. In this study, we generated a high-quality genome assembly for the high-NUE wheat cultivar Kenong 9204 and systematically analyzed genes related to nitrogen uptake and metabolism. By comparative analyses, we found that the high-affinity nitrate transporter gene family had expanded in Triticeae. Further studies showed that subsequent functional differentiation endowed the expanded family members with saline inducibility, providing a genetic basis for improving the adaptability of wheat to nitrogen deficiency in various habitats. To explore the genetic and molecular mechanisms of high NUE, we compared genomic and transcriptomic data from the high-NUE cultivar Kenong 9204 (KN9204) and the low-NUE cultivar Jing 411 and quantified their nitrogen accumulation under high- and low-nitrogen conditions. Compared with Jing 411, KN9204 absorbed significantly more nitrogen at the reproductive stage after shooting and accumulated it in the shoots and seeds. Transcriptome data analysis revealed that nitrogen deficiency clearly suppressed the expression of genes related to cell division in the young spike of Jing 411, whereas this suppression of gene expression was much lower in KN9204. In addition, KN9204 maintained relatively high expression of NPF genes for a longer time than Jing 411 during seed maturity. Physiological and transcriptome data revealed that KN9204 was more tolerant of nitrogen deficiency than Jing 411, especially at the reproductive stage. The high NUE of KN9204 is an integrated effect controlled at different levels. Taken together, our data provide new insights into the molecular mechanisms of NUE and important gene resources for improving wheat cultivars with a higher NUE trait.
Subject(s)
Nitrogen , Triticum , Gene Expression Profiling , Genomics , Nitrogen/metabolism , Transcriptome/genetics , Triticum/genetics , Triticum/metabolism
ABSTRACT
Next-generation sequencing (NGS) has drastically enhanced human cancer research, but diverse sequencing strategies, complicated open-source software, and the identification of massive numbers of mutations have limited the clinical application of NGS. Here, we first presented GPyFlow, a lightweight tool that flexibly customizes, executes, and shares workflows. We then introduced DIVIS, a customizable pipeline based on GPyFlow that integrates read preprocessing, alignment, variant detection, and annotation of whole-genome sequencing, whole-exome sequencing, and gene-panel sequencing. By default, DIVIS screens variants from multiple callers and generates a standard variant-detection format list containing caller evidence for each sample, which is compatible with advanced analyses. Lastly, DIVIS generates a statistical report, including command lines, parameters, quality-control indicators, and mutation summary. DIVIS substantially facilitates complex cancer genome sequencing analyses by means of a single powerful and easy-to-use command. The DIVIS code is freely available at https://github.com/niu-lab/DIVIS, and the docker image can be downloaded from https://hub.docker.com/repository/docker/sunshinerain/divis.
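The idea of screening variants from multiple callers while keeping per-variant caller evidence can be sketched as follows. The caller names, the variant tuple layout and the "at least two callers" rule are assumptions chosen for illustration; they are not taken from the DIVIS code.

```python
from collections import defaultdict


def merge_calls(calls_by_caller, min_callers=2):
    """Merge variant calls and keep those supported by enough callers.

    calls_by_caller maps a caller name to a list of variants, each written
    as a (chrom, pos, ref, alt) tuple. Returns a dict that maps each kept
    variant to the sorted list of callers that reported it.
    """
    evidence = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            evidence[variant].add(caller)
    return {
        variant: sorted(callers)
        for variant, callers in sorted(evidence.items())
        if len(callers) >= min_callers
    }


if __name__ == "__main__":
    # Hypothetical calls from three hypothetical callers.
    calls = {
        "caller_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
        "caller_b": [("chr1", 100, "A", "T")],
        "caller_c": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")],
    }
    for variant, callers in merge_calls(calls).items():
        print(variant, callers)
```

Setting `min_callers=1` keeps the union of all calls, which preserves the caller evidence without filtering.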
ABSTRACT
RATIONALE: Gallbladder carcinoma is a malignant biliary tract tumor characterized by poor prognosis. Recent advances in genomic medicine have identified a few novel germline mutations that contribute to an increased risk of gallbladder carcinoma. RAD52 is a crucial human deoxyribonucleic acid (DNA) repair gene involved in maintaining genomic stability and preventing tumor occurrence. PATIENT CONCERNS: A 57-year-old man was hospitalized for space-occupying lesions in the gallbladder. DIAGNOSIS: A diagnosis of gallbladder adenocarcinoma was made based on computed tomography, B-ultrasound, blood tests, and postoperative pathology. INTERVENTIONS: Next-generation sequencing using a 599-gene panel and Sanger sequencing were performed to validate the mutation in the proband and his family members, respectively. OUTCOMES: A novel, potentially pathogenic heterozygous germline RAD52 missense mutation (c.276T>A: p.N92K) was identified in the patient. Sanger sequencing revealed that this variation was not observed in unaffected family members. LESSONS: We identified a novel heterozygous germline RAD52 missense mutation in a patient with gallbladder carcinoma. Our results add to the current body of knowledge and provide new insights into genetic counseling and targeted therapeutic strategies for patients with gallbladder carcinoma.
Subject(s)
Biliary Tract Neoplasms/genetics , Rad52 DNA Repair and Recombination Protein/genetics , Genetic Predisposition to Disease , Germ-Line Mutation , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Missense Mutation
ABSTRACT
With the increasing incidence of colorectal cancer (CRC) and continued difficulty in treating it using immunotherapy, there is an urgent need to identify an effective immune-related biomarker associated with the survival and prognosis of patients with this disease. DNA methylation plays an essential role in maintaining cellular function, and changes in methylation patterns may contribute to the development of autoimmunity, aging, and cancer. In this study, we aimed to identify a novel immune-related methylated signature to aid in predicting the prognosis of patients with CRC. We investigated DNA methylation patterns in patients with stage II/III CRC using datasets from The Cancer Genome Atlas (TCGA). Overall, 182 patients were randomly divided into training (n = 127) and test (n = 55) groups. In the training group, five immune-related methylated CG sites (cg11621464, cg13565656, cg18976437, cg20505223, and cg20528583) were identified, and CG site-based risk scores were calculated using univariate Cox proportional hazards regression in patients with stage II/III CRC. Multivariate Cox regression analysis indicated that the methylated signature was independent of other clinical parameters. The Kaplan-Meier analysis results showed that CG site-based risk scores could significantly help distinguish between high- and low-risk patients in both the training (P = 0.000296) and test (P = 0.022) groups. The areas under the receiver operating characteristic curve in the training and test groups were estimated to be 0.771 and 0.724, respectively, for prognosis prediction. Finally, stratified analysis results suggested the remarkable prognostic value of CG site-based risk scores in CRC subtypes. We identified five methylated CG sites that could be used as an efficient overall survival (OS)-related biomarker for stage II/III CRC patients.
ABSTRACT
Brain science accelerates the study of intelligence and behavior, contributes fundamental insights into human cognition, and offers prospective treatments for brain disease. Faced with the challenges posed by imaging technologies and deep learning computational models, big data and high-performance computing (HPC) play essential roles in studying brain function, brain diseases, and large-scale brain models or connectomes. We review the driving forces behind big data and HPC methods applied to brain science, including deep learning, powerful data analysis capabilities, and computational performance solutions, each of which can be used to improve diagnostic accuracy and research output. This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible, by improving data standardization and sharing, and by providing new neuromorphic insights.
Subject(s)
Big Data , Brain/physiology , Computational Biology/methods , Behavior/physiology , Cognition/physiology , Humans , Intelligence/physiology , Theoretical Models , Prospective Studies
ABSTRACT
The accelerating growth of public microbial genomic data imposes a substantial burden on the research community that uses such resources. Building databases of non-redundant reference sequences from massive microbial genomic data based on clustering analysis is essential. However, existing clustering algorithms perform poorly on long genomic sequences. In this article, we present Gclust, a parallel program for clustering complete or draft genomic sequences, where clustering is accelerated with a novel parallelization strategy and a fast sequence comparison algorithm using sparse suffix arrays (SSAs). Moreover, genome identity measures between two sequences are calculated based on their maximal exact matches (MEMs). In this paper, we demonstrate the high speed and clustering quality of Gclust by examining four genome sequence datasets. Gclust is freely available for non-commercial use at https://github.com/niu-lab/gclust. We also introduce a web server for clustering user-uploaded genomes at http://niulab.scgrid.cn/gclust.
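Building a non-redundant reference set by clustering can be sketched with a simple greedy scheme in which each sequence joins the first representative it resembles closely enough, or otherwise becomes a new representative. This sketch is not the Gclust algorithm: it is serial, it assumes a greedy longest-first order, and it swaps in shared k-mer content as a stand-in for the MEM-based identity described above. All names and the threshold are hypothetical.

```python
def kmer_set(seq, k=4):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identity(a, b, k=4):
    """Stand-in identity: shared k-mers over the smaller k-mer set (not MEM-based)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def greedy_cluster(seqs, threshold=0.9, k=4):
    """Cluster sequences greedily, longest first; return {representative: members}."""
    clusters = {}
    order = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in order:
        for rep in clusters:
            if identity(seqs[rep], seq, k) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy sequences for illustration only.
    genomes = {
        "g1": "ACGTACGTTAGC",
        "g2": "ACGTACGTTAG",
        "g3": "TTTTGGGGCCCC",
    }
    print(greedy_cluster(genomes))
```

Here `g2` is a prefix of `g1` and joins its cluster, while `g3` shares no 4-mers with `g1` and becomes its own representative.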