Search | VHL Regional Portal (Portal Regional de la BVS)

On the identifiability of the isoform deconvolution problem: application to select the proper fragment length in an RNA-seq library.

Ferrer-Bonsoms, Juan A; Morales, Xabier; Afshar, Pegah T; Wong, Wing H; Rubio, Angel.

Bioinformatics ; 38(6): 1491-1496, 2022 03 04.

Article in English | MEDLINE | ID: mdl-34978563

ABSTRACT

MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known whether gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if it is possible to obtain both the structure and concentration of its transcripts uniquely. Here, we present a method to determine the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available on GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome, Transcriptome, Humans, RNA-Seq, Gene Library, Protein Isoforms/genetics, Software

A universal SNP and small-indel variant caller using deep neural networks.

Poplin, Ryan; Chang, Pi-Chuan; Alexander, David; Schwartz, Scott; Colthurst, Thomas; Ku, Alexander; Newburger, Dan; Dijamco, Jojo; Nguyen, Nam; Afshar, Pegah T; Gross, Sam S; Dorfman, Lizzie; McLean, Cory Y; DePristo, Mark A.

Nat Biotechnol ; 36(10): 983-987, 2018 11.

Article in English | MEDLINE | ID: mdl-30247488

ABSTRACT

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.

Subject(s)

Human Genome, Mammals/genetics, Neural Networks (Computer), Single Nucleotide Polymorphism, Animals, DNA Mutational Analysis, Genomics, Genotype, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, DNA Sequence Analysis, Software

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.

Sahraeian, Sayed Mohammad Ebrahim; Mohiyuddin, Marghoob; Sebra, Robert; Tilgner, Hagen; Afshar, Pegah T; Au, Kin Fai; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing Hung; Snyder, Michael P; Schadt, Eric; Lam, Hugo Y K.

Nat Commun ; 8(1): 59, 2017 07 05.

Article in English | MEDLINE | ID: mdl-28680106

ABSTRACT

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.

Subject(s)

Embryonic Stem Cells, Transcriptome, Base Sequence, Cell Line, Humans
