Pesquisa | Prevenção e Controle de Câncer

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data.

Anilkumar Sithara, Anjana; Maripuri, Devi Priyanka; Moorthy, Keerthika; Amirtha Ganesh, Sai Sruthi; Philip, Philge; Banerjee, Shayantan; Sudhakar, Malvika; Raman, Karthik.

NAR Genom Bioinform ; 4(3): lqac053, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-35899080

RESUMO

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input and outputs insightful statistics. Our iCOMIC toolkit pipeline featuring many independent workflows is embedded in the popular Snakemake workflow management system. It can analyze whole-genome and transcriptome data and is characterized by a user-friendly GUI that offers several advantages, including minimal execution steps and eliminating the need for complex command-line arguments. Notably, we have integrated algorithms developed in-house to predict pathogenicity among cancer-causing mutations and differentiate between tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against Genome In A Bottle benchmark dataset (NA12878) and got the highest F1 score of 0.971 and 0.988 for indels and SNPs, respectively, using the BWA MEM-GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r = 0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, significantly ameliorating complex data analysis pipelines.

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Banerjee, Shayantan; Raman, Karthik; Ravindran, Balaraman.

Cancers (Basel) ; 13(10)2021 May 14.

Artigo em Inglês | MEDLINE | ID: mdl-34068918

RESUMO

Identifying cancer-causing mutations from sequenced cancer genomes hold much promise for targeted therapy and precision medicine. "Driver" mutations are primarily responsible for cancer progression, while "passengers" are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5' and 3' from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA