Search | BVS Nicaragua

1.

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.

Pardo-Palacios, Francisco J; Wang, Dingjie; Reese, Fairlie; Diekhans, Mark; Carbonell-Sala, Sílvia; Williams, Brian; Loveland, Jane E; De María, Maite; Adams, Matthew S; Balderrama-Gutierrez, Gabriela; Behera, Amit K; Gonzalez Martinez, Jose M; Hunt, Toby; Lagarde, Julien; Liang, Cindy E; Li, Haoran; Meade, Marcus Jerryd; Moraga Amador, David A; Prjibelski, Andrey D; Birol, Inanc; Bostan, Hamed; Brooks, Ashley M; Çelik, Muhammed Hasan; Chen, Ying; Du, Mei R M; Felton, Colette; Göke, Jonathan; Hafezqorani, Saber; Herwig, Ralf; Kawaji, Hideya; Lee, Joseph; Li, Jian-Liang; Lienhard, Matthias; Mikheenko, Alla; Mulligan, Dennis; Nip, Ka Ming; Pertea, Mihaela; Ritchie, Matthew E; Sim, Andre D; Tang, Alison D; Wan, Yuk Kei; Wang, Changqing; Wong, Brandon Y; Yang, Chen; Barnes, If; Berry, Andrew E; Capella-Gutierrez, Salvador; Cousineau, Alyssa; Dhillon, Namrita; Fernandez-Gonzalez, Jose M.

Nat Methods ; 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

2.

TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD.

Bryce-Smith, Sam; Brown, Anna-Leigh; Mehta, Puja R; Mattedi, Francesca; Mikheenko, Alla; Barattucci, Simone; Zanovello, Matteo; Dattilo, Dario; Yome, Matthew; Hill, Sarah E; Qi, Yue A; Wilkins, Oscar G; Sun, Kai; Ryadnov, Eugeni; Wan, Yixuan; Vargas, Jose Norberto S; Birsa, Nicol; Raj, Towfique; Humphrey, Jack; Keuss, Matthew; Ward, Michael; Secrier, Maria; Fratta, Pietro.

bioRxiv ; 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38313254

ABSTRACT

Nuclear depletion and cytoplasmic aggregation of the RNA-binding protein TDP-43 is the hallmark of ALS, occurring in over 97% of cases. A key consequence of TDP-43 nuclear loss is the de-repression of cryptic exons. Whilst TDP-43-regulated cryptic splicing is increasingly well catalogued, cryptic alternative polyadenylation (APA) events, which define the 3' end of last exons, have been largely overlooked, especially when not associated with novel upstream splice junctions. We developed a novel bioinformatic approach to reliably identify distinct APA event types: alternative last exons (ALE), 3'UTR extensions (3'Ext) and intronic polyadenylation (IPA) events. We identified novel neuronal cryptic APA sites induced by TDP-43 loss of function by systematically applying our pipeline to a compendium of publicly available and in-house datasets. We find that TDP-43 binding sites and target motifs are enriched at these cryptic events and that TDP-43 can have both repressive and enhancing action on APA. Importantly, all categories of cryptic APA can also be identified in ALS and FTD post-mortem brain regions with TDP-43 proteinopathy, underlining their potential disease relevance. RNA-seq and Ribo-seq analyses indicate that distinct cryptic APA categories have different downstream effects on transcript and translation. Intriguingly, cryptic 3'Exts occur in multiple transcription factors, such as ELK1, SIX3, and TLX1, and lead to an increase in wild-type protein levels and function. Finally, we show that an increase in RNA stability leading to a higher cytoplasmic localisation underlies these observations. In summary, we demonstrate that TDP-43 nuclear depletion induces a novel category of cryptic RNA processing events and we expand the palette of TDP-43 loss consequences by showing this can also lead to an increase in normal protein translation.

3.

PolyGR and polyPR knock-in mice reveal a conserved neuroprotective extracellular matrix signature in C9orf72 ALS/FTD neurons.

Milioto, Carmelo; Carcolé, Mireia; Giblin, Ashling; Coneys, Rachel; Attrebi, Olivia; Ahmed, Mhoriam; Harris, Samuel S; Lee, Byung Il; Yang, Mengke; Ellingford, Robert A; Nirujogi, Raja S; Biggs, Daniel; Salomonsson, Sally; Zanovello, Matteo; de Oliveira, Paula; Katona, Eszter; Glaria, Idoia; Mikheenko, Alla; Geary, Bethany; Udine, Evan; Vaizoglu, Deniz; Anoar, Sharifah; Jotangiya, Khrisha; Crowley, Gerard; Smeeth, Demelza M; Adams, Mirjam L; Niccoli, Teresa; Rademakers, Rosa; van Blitterswijk, Marka; Devoy, Anny; Hong, Soyon; Partridge, Linda; Coyne, Alyssa N; Fratta, Pietro; Alessi, Dario R; Davies, Ben; Busche, Marc Aurel; Greensmith, Linda; Fisher, Elizabeth M C; Isaacs, Adrian M.

Nat Neurosci ; 27(4): 643-655, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38424324

ABSTRACT

Dipeptide repeat proteins are a major pathogenic feature of C9orf72 amyotrophic lateral sclerosis (C9ALS)/frontotemporal dementia (FTD) pathology, but their physiological impact has yet to be fully determined. Here we generated C9orf72 dipeptide repeat knock-in mouse models characterized by expression of 400 codon-optimized polyGR or polyPR repeats, and heterozygous C9orf72 reduction. (GR)400 and (PR)400 knock-in mice recapitulate key features of C9ALS/FTD, including cortical neuronal hyperexcitability, age-dependent spinal motor neuron loss and progressive motor dysfunction. Quantitative proteomics revealed an increase in extracellular matrix (ECM) proteins in (GR)400 and (PR)400 spinal cord, with the collagen COL6A1 the most increased protein. TGF-β1 was one of the top predicted regulators of this ECM signature and polyGR expression in human induced pluripotent stem cell neurons was sufficient to induce TGF-β1 followed by COL6A1. Knockdown of TGF-β1 or COL6A1 orthologues in polyGR model Drosophila exacerbated neurodegeneration, while expression of TGF-β1 or COL6A1 in induced pluripotent stem cell-derived motor neurons of patients with C9ALS/FTD protected against glutamate-induced cell death. Altogether, our findings reveal a neuroprotective and conserved ECM signature in C9ALS/FTD.

Subject(s)

Amyotrophic Lateral Sclerosis; Frontotemporal Dementia; Induced Pluripotent Stem Cells; Animals; Humans; Mice; Frontotemporal Dementia/pathology; Amyotrophic Lateral Sclerosis/metabolism; Transforming Growth Factor beta1; C9orf72 Protein/genetics; C9orf72 Protein/metabolism; Induced Pluripotent Stem Cells/metabolism; Motor Neurons/metabolism; Drosophila; Extracellular Matrix/metabolism; Dipeptides/metabolism; DNA Repeat Expansion/genetics

4.

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.

Pardo-Palacios, Francisco J; Wang, Dingjie; Reese, Fairlie; Diekhans, Mark; Carbonell-Sala, Sílvia; Williams, Brian; Loveland, Jane E; De María, Maite; Adams, Matthew S; Balderrama-Gutierrez, Gabriela; Behera, Amit K; Gonzalez, Jose M; Hunt, Toby; Lagarde, Julien; Liang, Cindy E; Li, Haoran; Jerryd Meade, Marcus; Moraga Amador, David A; Prjibelski, Andrey D; Birol, Inanc; Bostan, Hamed; Brooks, Ashley M; Hasan Çelik, Muhammed; Chen, Ying; Du, Mei R M; Felton, Colette; Göke, Jonathan; Hafezqorani, Saber; Herwig, Ralf; Kawaji, Hideya; Lee, Joseph; Liang Li, Jian; Lienhard, Matthias; Mikheenko, Alla; Mulligan, Dennis; Ming Nip, Ka; Pertea, Mihaela; Ritchie, Matthew E; Sim, Andre D; Tang, Alison D; Kei Wan, Yuk; Wang, Changqing; Wong, Brandon Y; Yang, Chen; Barnes, If; Berry, Andrew; Capella, Salvador; Dhillon, Namrita; Fernandez-Gonzalez, Jose M; Ferrández-Peral, Luis.

bioRxiv ; 2023 Jul 27.

Article in English | MEDLINE | ID: mdl-37546854

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

5.

WebQUAST: online evaluation of genome assemblies.

Mikheenko, Alla; Saveliev, Vladislav; Hirsch, Pascal; Gurevich, Alexey.

Nucleic Acids Res ; 51(W1): W601-W606, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37194696

ABSTRACT

Selecting a proper genome assembly is key for downstream analysis in genomics studies. However, the availability of many genome assembly tools and the huge variety of their running parameters make this task challenging. The existing online evaluation tools are limited to specific taxa or provide just a one-sided view of the assembly quality. We present WebQUAST, a web server for multifaceted quality assessment and comparison of genome assemblies based on the state-of-the-art QUAST tool. The server is freely available at https://www.ccb.uni-saarland.de/quast/. WebQUAST can handle an unlimited number of genome assemblies and evaluate them against a user-provided or pre-loaded reference genome or in a completely reference-free fashion. We demonstrate key WebQUAST features in three common evaluation scenarios: assembly of an unknown species, a model organism, and a close variant of it.

Subject(s)

Genomics; Software; Genome; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Internet

6.

Accurate isoform discovery with IsoQuant using long reads.

Prjibelski, Andrey D; Mikheenko, Alla; Joglekar, Anoushka; Smetanin, Alexander; Jarroux, Julien; Lapidus, Alla L; Tilgner, Hagen U.

Nat Biotechnol ; 41(7): 915-918, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36593406

ABSTRACT

Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.
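
The abstract above mentions intron graphs without defining them. One common way to picture such a structure is a directed graph whose vertices are introns and whose edges link introns that appear consecutively in at least one read. The sketch below builds that graph from per-read intron chains. It is an illustration under that assumption only, not IsoQuant's actual implementation; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def build_intron_graph(read_intron_chains):
    """Build a directed intron graph from per-read intron chains.

    read_intron_chains: iterable of lists of (start, end) tuples, one list
    per read, with introns in genomic order (hypothetical input layout).
    Returns {intron: {next_intron: number_of_supporting_reads}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for chain in read_intron_chains:
        for intron in chain:
            graph[intron]  # register the vertex even if it has no outgoing edge
        for current, following in zip(chain, chain[1:]):
            graph[current][following] += 1  # one more read supports this edge
    return {node: dict(successors) for node, successors in graph.items()}


# Three toy reads: two agree on the second intron, one uses a shifted acceptor.
reads = [
    [(100, 200), (300, 400)],
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (350, 400)],
]
graph = build_intron_graph(reads)
for intron, successors in graph.items():
    print(intron, "->", successors)
```

In this picture a candidate transcript is a path through the graph, and edge counts give a simple measure of read support.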

Subject(s)

High-Throughput Nucleotide Sequencing; RNA; Protein Isoforms/genetics; Sequence Analysis, RNA; Genome; Sequence Analysis, DNA

7.

Fast and accurate mapping of long reads to complete genome assemblies with VerityMap.

Bzikadze, Andrey V; Mikheenko, Alla; Pevzner, Pavel A.

Genome Res ; 32(11-12): 2107-2118, 2022.

Article in English | MEDLINE | ID: mdl-36379716

ABSTRACT

Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.

Subject(s)

Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA; Repetitive Sequences, Nucleic Acid; Diploidy

8.

NPvis: An Interactive Visualizer of Peptidic Natural Product-MS/MS Matches.

Kunyavskaya, Olga; Mikheenko, Alla; Gurevich, Alexey.

Metabolites ; 12(8), 2022 Jul 29.

Article in English | MEDLINE | ID: mdl-36005578

ABSTRACT

Peptidic natural products (PNPs) represent a medically important class of secondary metabolites that includes antibiotics, anti-inflammatory and antitumor agents. Advances in tandem mass spectra (MS/MS) acquisition and in silico database search methods have enabled high-throughput PNP discovery. However, the resulting spectra annotations are often error-prone and their validation remains a bottleneck. Here, we present NPvis, a visualizer suitable for the evaluation of PNP-MS/MS matches. The tool interactively maps annotated spectrum peaks to the corresponding PNP fragments and allows researchers to assess the match correctness. NPvis accounts for the wide chemical diversity of PNPs that prevents the use of the existing proteomics visualizers. Moreover, NPvis works even if the exact chemical structure of the matching PNP is unknown. The tool is available online and as a standalone application. We hope that it will benefit the community by streamlining PNP data analysis and validation.

9.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Mc Cartney, Ann M; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A; Miga, Karen H; Mikheenko, Alla; Paten, Benedict; Shumate, Alaina; Soto, Daniela C; Sovic, Ivan; Wood, Jonathan M D; Zook, Justin M; Phillippy, Adam M; Rhie, Arang.

Nat Methods ; 19(6): 687-695, 2022 06.

Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
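
Assembly quality values are conventionally Phred-scaled: QV = -10 * log10(e), where e is the estimated per-base error probability. The abstract above does not define its QV, so the sketch below assumes that convention and converts the two reported values (70.2 and 73.9) into error rates. The function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(error_rate)


# QV figures quoted in the abstract (assumed to be Phred-scaled).
before = qv_to_error_rate(70.2)
after = qv_to_error_rate(73.9)
print(f"QV 70.2: about one error per {1 / before:,.0f} bases")
print(f"QV 73.9: about one error per {1 / after:,.0f} bases")
```

Under this assumption, QV 70.2 corresponds to roughly one error per 10 million bases and QV 73.9 to roughly one per 25 million.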

Subject(s)

High-Throughput Nucleotide Sequencing; Nanopores; Female; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Humans; Pregnancy; Sequence Analysis, DNA/methods; Telomere/genetics

10.

Single-nuclei isoform RNA sequencing unlocks barcoded exon connectivity in frozen brain tissue.

Hardwick, Simon A; Hu, Wen; Joglekar, Anoushka; Fan, Li; Collier, Paul G; Foord, Careen; Balacco, Jennifer; Lanjewar, Samantha; Sampson, Maureen McGuirk; Koopmans, Frank; Prjibelski, Andrey D; Mikheenko, Alla; Belchikov, Natan; Jarroux, Julien; Lucas, Anne Bergstrom; Palkovits, Miklós; Luo, Wenjie; Milner, Teresa A; Ndhlovu, Lishomwa C; Smit, August B; Trojanowski, John Q; Lee, Virginia M Y; Fedrigo, Olivier; Sloan, Steven A; Tombácz, Dóra; Ross, M Elizabeth; Jarvis, Erich; Boldogkoi, Zsolt; Gan, Li; Tilgner, Hagen U.

Nat Biotechnol ; 40(7): 1082-1092, 2022 07.

Article in English | MEDLINE | ID: mdl-35256815

ABSTRACT

Single-nuclei RNA sequencing characterizes cell types at the gene level. However, compared to single-cell approaches, many single-nuclei cDNAs are purely intronic, lack barcodes and hinder the study of isoforms. Here we present single-nuclei isoform RNA sequencing (SnISOr-Seq). Using microfluidics, PCR-based artifact removal, target enrichment and long-read sequencing, SnISOr-Seq increased barcoded, exon-spanning long reads 7.5-fold compared to naive long-read single-nuclei sequencing. We applied SnISOr-Seq to adult human frontal cortex and found that exons associated with autism exhibit coordinated and highly cell-type-specific inclusion. We found two distinct combination patterns: those distinguishing neural cell types, enriched in TSS-exon, exon-polyadenylation-site and non-adjacent exon pairs, and those with multiple configurations within one cell type, enriched in adjacent exon pairs. Finally, we observed that human-specific exons are almost as tightly coordinated as conserved exons, implying that coordination can be rapidly established during evolution. SnISOr-Seq enables cell-type-specific long-read isoform analysis in human brain and in any frozen or hard-to-dissociate sample.

Subject(s)

Brain; RNA; Alternative Splicing/genetics; Brain/metabolism; Exons/genetics; Humans; Protein Isoforms/genetics; RNA/genetics; Sequence Analysis, RNA

11.

Complete genomic and epigenetic maps of human centromeres.

Altemose, Nicolas; Logsdon, Glennis A; Bzikadze, Andrey V; Sidhwani, Pragya; Langley, Sasha A; Caldas, Gina V; Hoyt, Savannah J; Uralsky, Lev; Ryabov, Fedor D; Shew, Colin J; Sauria, Michael E G; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A; Dvorkina, Tatiana; Kunyavskaya, Olga; Vollger, Mitchell R; Rhie, Arang; McCartney, Ann M; Asri, Mobin; Lorig-Roach, Ryan; Shafin, Kishwar; Lucas, Julian K; Aganezov, Sergey; Olson, Daniel; de Lima, Leonardo Gomes; Potapova, Tamara; Hartley, Gabrielle A; Haukness, Marina; Kerpedjiev, Peter; Gusev, Fedor; Tigyi, Kristof; Brooks, Shelise; Young, Alice; Nurk, Sergey; Koren, Sergey; Salama, Sofie R; Paten, Benedict; Rogaev, Evgeny I; Streets, Aaron; Karpen, Gary H; Dernburg, Abby F; Sullivan, Beth A; Straight, Aaron F; Wheeler, Travis J; Gerton, Jennifer L; Eichler, Evan E; Phillippy, Adam M; Timp, Winston.

Science ; 376(6588): eabl4178, 2022 04.

Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Subject(s)

Centromere/genetics; Chromosome Mapping; Epigenesis, Genetic; Genome, Human; Evolution, Molecular; Genomics; Humans; Repetitive Sequences, Nucleic Acid

12.

The complete sequence of a human genome.

Nurk, Sergey; Koren, Sergey; Rhie, Arang; Rautiainen, Mikko; Bzikadze, Andrey V; Mikheenko, Alla; Vollger, Mitchell R; Altemose, Nicolas; Uralsky, Lev; Gershman, Ariel; Aganezov, Sergey; Hoyt, Savannah J; Diekhans, Mark; Logsdon, Glennis A; Alonge, Michael; Antonarakis, Stylianos E; Borchers, Matthew; Bouffard, Gerard G; Brooks, Shelise Y; Caldas, Gina V; Chen, Nae-Chyun; Cheng, Haoyu; Chin, Chen-Shan; Chow, William; de Lima, Leonardo G; Dishuck, Philip C; Durbin, Richard; Dvorkina, Tatiana; Fiddes, Ian T; Formenti, Giulio; Fulton, Robert S; Fungtammasan, Arkarachai; Garrison, Erik; Grady, Patrick G S; Graves-Lindsay, Tina A; Hall, Ira M; Hansen, Nancy F; Hartley, Gabrielle A; Haukness, Marina; Howe, Kerstin; Hunkapiller, Michael W; Jain, Chirag; Jain, Miten; Jarvis, Erich D; Kerpedjiev, Peter; Kirsche, Melanie; Kolmogorov, Mikhail; Korlach, Jonas; Kremitzki, Milinn; Li, Heng.

Science ; 376(6588): 44-53, 2022 04.

Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Subject(s)

Genome, Human; Human Genome Project; Sequence Analysis, DNA/standards; Cell Line; Chromosomes, Artificial, Bacterial/genetics; Chromosomes, Human/genetics; Humans; Reference Values

13.

Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns.

Mikheenko, Alla; Prjibelski, Andrey D; Joglekar, Anoushka; Tilgner, Hagen U.

Genome Res ; 32(4): 726-737, 2022 04.

Article in English | MEDLINE | ID: mdl-35301264

ABSTRACT

Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon-intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error-induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.
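
The abstract above names two splice-site motifs: GUNNGU donors and NAGNAG acceptors, where N stands for any nucleotide. As a minimal sketch of how such motifs can be located in a sequence, the snippet below scans for both with regular expressions. It is not the paper's alignment-correction algorithm; the function name and the choice to report overlapping hits are assumptions made for illustration.

```python
import re

# Lookahead groups let overlapping motif occurrences be reported.
# Patterns are written for DNA; RNA input is converted (U -> T) first.
MOTIFS = {
    "NAGNAG": re.compile(r"(?=([ACGT]AG[ACGT]AG))"),  # tandem acceptor
    "GUNNGU": re.compile(r"(?=(GT[ACGT]{2}GT))"),     # tandem donor
}


def find_motif(seq: str, name: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each motif occurrence."""
    dna = seq.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in MOTIFS[name].finditer(dna)]


print(find_motif("ttttcagcaggt", "NAGNAG"))
print(find_motif("AAGUAAGUCC", "GUNNGU"))
```

Positions found this way only flag where two candidate splice sites sit a few bases apart; deciding which site a given read actually supports still requires the alignment itself.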

Subject(s)

Nanopores; DNA, Complementary; High-Throughput Nucleotide Sequencing/methods; RNA; Sequence Analysis, DNA/methods; Technology

14.

The structure, function and evolution of a complete human chromosome 8.

Logsdon, Glennis A; Vollger, Mitchell R; Hsieh, PingHsun; Mao, Yafei; Liskovykh, Mikhail A; Koren, Sergey; Nurk, Sergey; Mercuri, Ludovica; Dishuck, Philip C; Rhie, Arang; de Lima, Leonardo G; Dvorkina, Tatiana; Porubsky, David; Harvey, William T; Mikheenko, Alla; Bzikadze, Andrey V; Kremitzki, Milinn; Graves-Lindsay, Tina A; Jain, Chirag; Hoekzema, Kendra; Murali, Shwetha C; Munson, Katherine M; Baker, Carl; Sorensen, Melanie; Lewis, Alexandra M; Surti, Urvashi; Gerton, Jennifer L; Larionov, Vladimir; Ventura, Mario; Miga, Karen H; Phillippy, Adam M; Eichler, Evan E.

Nature ; 593(7857): 101-107, 2021 05.

Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.

Subject(s)

Chromosomes, Human, Pair 8/chemistry; Chromosomes, Human, Pair 8/genetics; Evolution, Molecular; Animals; Cell Line; Centromere/chemistry; Centromere/genetics; Centromere/metabolism; Chromosomes, Human, Pair 8/physiology; DNA Methylation; DNA, Satellite/genetics; Epigenesis, Genetic; Female; Humans; Macaca mulatta/genetics; Male; Minisatellite Repeats/genetics; Pan troglodytes/genetics; Phylogeny; Pongo abelii/genetics; Telomere/chemistry; Telomere/genetics; Telomere/metabolism

15.

TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats.

Mikheenko, Alla; Bzikadze, Andrey V; Gurevich, Alexey; Miga, Karen H; Pevzner, Pavel A.

Bioinformatics ; 36(Suppl_1): i75-i83, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657355

ABSTRACT

MOTIVATION: Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there are no tools for their quality assessment. Moreover, since the mapping of error-prone reads to ETRs remains an open problem, it is not clear how to polish draft ETR assemblies. RESULTS: To address these problems, we developed the TandemTools software that includes the TandemMapper tool for mapping reads to ETRs and the TandemQUAST tool for polishing ETR assemblies and their quality assessment. We demonstrate that TandemTools not only reveals errors in ETR assemblies but also improves the recently generated assemblies of human centromeres. AVAILABILITY AND IMPLEMENTATION: https://github.com/ablab/TandemTools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

High-Throughput Nucleotide Sequencing; Software; Eukaryota; Humans; Sequence Analysis, DNA; Tandem Repeat Sequences

16.

Extending rnaSPAdes functionality for hybrid transcriptome assembly.

Prjibelski, Andrey D; Puglia, Giuseppe D; Antipov, Dmitry; Bushmanova, Elena; Giordano, Daniela; Mikheenko, Alla; Vitale, Domenico; Lapidus, Alla.

BMC Bioinformatics ; 21(Suppl 12): 302, 2020 Jul 24.

Article in English | MEDLINE | ID: mdl-32703149

ABSTRACT

BACKGROUND: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged possibility of generating long RNA reads, for example with PacBio and Oxford Nanopore platforms, may dramatically improve the assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. RESULTS: In this work we present a novel method that makes it possible to perform high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information obtained from long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data. CONCLUSION: To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.

Subject(s)

Algorithms; Transcriptome/genetics; Databases, Genetic; Humans; MCF-7 Cells; Nanopores; RNA-Seq; Reproducibility of Results

17.

MetaMiner: A Scalable Peptidogenomics Approach for Discovery of Ribosomal Peptide Natural Products with Blind Modifications from Microbial Communities.

Cao, Liu; Gurevich, Alexey; Alexander, Kelsey L; Naman, C Benjamin; Leão, Tiago; Glukhov, Evgenia; Luzzatto-Knaan, Tal; Vargas, Fernando; Quinn, Robby; Bouslimani, Amina; Nothias, Louis Felix; Singh, Nitin K; Sanders, Jon G; Benitez, Rodolfo A S; Thompson, Luke R; Hamid, Md-Nafiz; Morton, James T; Mikheenko, Alla; Shlemov, Alexander; Korobeynikov, Anton; Friedberg, Iddo; Knight, Rob; Venkateswaran, Kasthuri; Gerwick, William H; Gerwick, Lena; Dorrestein, Pieter C; Pevzner, Pavel A; Mohimani, Hosein.

Cell Syst ; 9(6): 600-608.e4, 2019 12 18.

Article in English | MEDLINE | ID: mdl-31629686

ABSTRACT

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are an important class of natural products that contain antibiotics and a variety of other bioactive compounds. The existing methods for discovery of RiPPs by combining genome mining and computational mass spectrometry are limited to discovering specific classes of RiPPs from small datasets, and these methods fail to handle unknown post-translational modifications. Here, we present MetaMiner, a software tool for addressing these challenges that is compatible with large-scale screening platforms for natural product discovery. After searching millions of spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure against just eight genomic and metagenomic datasets, MetaMiner discovered 31 known and seven unknown RiPPs from diverse microbial communities, including human microbiome and lichen microbiome, and microorganisms isolated from the International Space Station.

Subject(s)

Computational Biology/methods , Microbiota/genetics , Post-Translational Protein Processing/genetics , Genomics/methods , Humans , Peptides/chemistry , Ribosomes/genetics , Software

18.

Assembly Graph Browser: interactive visualization of assembly graphs.

Mikheenko, Alla; Kolmogorov, Mikhail.

Bioinformatics ; 35(18): 3476-3478, 2019 Sep 15.

Article in English | MEDLINE | ID: mdl-30715194

ABSTRACT

SUMMARY: Currently, most genome assembly projects focus on contigs and scaffolds rather than assembly graphs that provide a more comprehensive representation of an assembly. Since interactive visualization of large assembly graphs remains an open problem, we developed an Assembly Graph Browser (AGB) tool that visualizes large assembly graphs, extending the functionality of previously developed visualization approaches. Assembly Graph Browser includes a number of novel functions including repeat analysis, construction of the contracted assembly graphs (i.e. the graphs obtained by collapsing a selected set of edges) and a new approach to visualizing large assembly graphs. AVAILABILITY AND IMPLEMENTATION: http://www.github.com/almiheenko/AGB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
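
The contracted assembly graph mentioned above (a graph obtained by collapsing a selected set of edges) can be illustrated with a minimal sketch. This is not AGB's implementation: the edge-list representation, the `contract_edges` name, and the union-find approach are assumptions chosen only to show the idea.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical representation: each edge is (edge_id, source_node, target_node).
Edge = Tuple[str, Hashable, Hashable]


def contract_edges(edges: List[Edge], to_collapse: Iterable[str]) -> List[Edge]:
    """Collapse the selected edges by merging their endpoints into one node.

    Returns the remaining edges with endpoints renamed to the merged nodes.
    Edges whose endpoints end up merged are kept as self-loops.
    """
    collapse = set(to_collapse)
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two endpoints of every edge selected for collapsing.
    for edge_id, source, target in edges:
        if edge_id in collapse:
            parent[find(source)] = find(target)

    # Keep the other edges, rewired to the representative nodes.
    return [
        (edge_id, find(source), find(target))
        for edge_id, source, target in edges
        if edge_id not in collapse
    ]


if __name__ == "__main__":
    graph = [("e1", "A", "B"), ("e2", "B", "C"), ("e3", "C", "D")]
    # Collapsing e2 merges nodes B and C, so e1 now leads directly into e3.
    print(contract_edges(graph, ["e2"]))
```

Union-find keeps the merge step close to linear in the number of collapsed edges, which matters for large graphs; a real tool would also need to carry edge sequences, coverage, and orientation, which this sketch omits.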

Subject(s)

Software

19.

Dereplication of microbial metabolites through database search of mass spectra.

Mohimani, Hosein; Gurevich, Alexey; Shlemov, Alexander; Mikheenko, Alla; Korobeynikov, Anton; Cao, Liu; Shcherbin, Egor; Nothias, Louis-Felix; Dorrestein, Pieter C; Pevzner, Pavel A.

Nat Commun ; 9(1): 4035, 2018 Oct 02.

Article in English | MEDLINE | ID: mdl-30279420

ABSTRACT

Natural products have traditionally been rich sources for drug discovery. In order to clear the road toward the discovery of unknown natural products, biologists need dereplication strategies that identify known ones. Here we report DEREPLICATOR+, an algorithm that improves on the previous approaches for identifying peptidic natural products, and extends them for identification of polyketides, terpenes, benzenoids, alkaloids, flavonoids, and other classes of natural products. We show that DEREPLICATOR+ can search all spectra in the recently launched Global Natural Products Social molecular network and identify an order of magnitude more natural products than previous dereplication efforts. We further demonstrate that DEREPLICATOR+ enables cross-validation of genome-mining and peptidogenomics/glycogenomics results.

Subject(s)

Biological Products/analysis , Drug Discovery/methods , Mass Spectrometry , Actinomyces/chemistry , Algorithms , Cyanobacteria/chemistry , Genomics , Macrolides/analysis , Software

20.

Versatile genome assembly evaluation with QUAST-LG.

Mikheenko, Alla; Prjibelski, Andrey; Saveliev, Vladislav; Antipov, Dmitry; Gurevich, Alexey.

Bioinformatics ; 34(13): i142-i150, 2018 Jul 01.

Article in English | MEDLINE | ID: mdl-29949969

ABSTRACT

Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference. Availability and implementation: http://cab.spbu.ru/software/quast-lg. Supplementary information: Supplementary data are available at Bioinformatics online.
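
The abstract does not list the quality metrics it computes. As an illustration of the kind of metric used in assembly evaluation, the sketch below computes N50, a widely used contiguity measure: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The `n50` function name and the plain list-of-lengths input are assumptions for this example, not QUAST-LG's interface.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Contigs are taken from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")

    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    # Unreachable for non-empty input: the loop always returns.
    raise AssertionError("N50 not found")


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
```

Comparing such a value for a real assembly against the same value for an idealized assembly is one simple way to express "how close to the optimum" an assembler gets, although the upper bound construction itself is specific to the paper and is not reproduced here.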

Subject(s)

High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Animals , Genomics/methods , Humans , Saccharomyces cerevisiae/genetics
