Search | BVS CLAP/SMR-OPS/OMS

1.

Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

Shumate, Alaina; Wong, Brandon; Pertea, Geo; Pertea, Mihaela.

PLoS Comput Biol ; 18(6): e1009730, 2022 06.

Article in English | MEDLINE | ID: mdl-35648784

ABSTRACT

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.

Subject(s)

High-Throughput Nucleotide Sequencing , Transcriptome , Algorithms , Animals , Exons , Humans , Mice , Sequence Analysis, DNA , Sequence Analysis, RNA , Software , Transcriptome/genetics

2.

Genome sequence of the progenitor of the wheat D genome Aegilops tauschii.

Luo, Ming-Cheng; Gu, Yong Q; Puiu, Daniela; Wang, Hao; Twardziok, Sven O; Deal, Karin R; Huo, Naxin; Zhu, Tingting; Wang, Le; Wang, Yi; McGuire, Patrick E; Liu, Shuyang; Long, Hai; Ramasamy, Ramesh K; Rodriguez, Juan C; Van, Sonny L; Yuan, Luxia; Wang, Zhenzhong; Xia, Zhiqiang; Xiao, Lichan; Anderson, Olin D; Ouyang, Shuhong; Liang, Yong; Zimin, Aleksey V; Pertea, Geo; Qi, Peng; Bennetzen, Jeffrey L; Dai, Xiongtao; Dawson, Matthew W; Müller, Hans-Georg; Kugler, Karl; Rivarola-Duarte, Lorena; Spannagl, Manuel; Mayer, Klaus F X; Lu, Fu-Hao; Bevan, Michael W; Leroy, Philippe; Li, Pingchuan; You, Frank M; Sun, Qixin; Liu, Zhiyong; Lyons, Eric; Wicker, Thomas; Salzberg, Steven L; Devos, Katrien M; Dvorák, Jan.

Nature ; 551(7681): 498-502, 2017 11 23.

Article in English | MEDLINE | ID: mdl-29143815

ABSTRACT

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.

Subject(s)

Genome, Plant , Phylogeny , Poaceae/genetics , Triticum/genetics , Chromosome Mapping , Diploidy , Evolution, Molecular , Gene Duplication , Genes, Plant/genetics , Genomics/standards , Poaceae/classification , Recombination, Genetic/genetics , Sequence Analysis, DNA/standards , Triticum/classification

3.

Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing.

Read, Andrew C; Moscou, Matthew J; Zimin, Aleksey V; Pertea, Geo; Meyer, Rachel S; Purugganan, Michael D; Leach, Jan E; Triplett, Lindsay R; Salzberg, Steven L; Bogdanove, Adam J.

PLoS Genet ; 16(1): e1008571, 2020 01.

Article in English | MEDLINE | ID: mdl-31986137

ABSTRACT

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety.

Subject(s)

Disease Resistance , NLR Proteins/genetics , Oryza/genetics , Plant Proteins/genetics , Molecular Sequence Annotation , NLR Proteins/chemistry , NLR Proteins/metabolism , Nanopore Sequencing/methods , Oryza/immunology , Plant Proteins/chemistry , Plant Proteins/metabolism , Whole Genome Sequencing/methods , Zinc Fingers

4.

TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets.

Varabyou, Ales; Pertea, Geo; Pockrandt, Christopher; Pertea, Mihaela.

Bioinformatics ; 37(20): 3650-3651, 2021 Oct 25.

Article in English | MEDLINE | ID: mdl-33964128

ABSTRACT

SUMMARY: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input. AVAILABILITY AND IMPLEMENTATION: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.

Applying Rapid Whole-Genome Sequencing To Predict Phenotypic Antimicrobial Susceptibility Testing Results among Carbapenem-Resistant Klebsiella pneumoniae Clinical Isolates.

Tamma, Pranita D; Fan, Yunfan; Bergman, Yehudit; Pertea, Geo; Kazmi, Abida Q; Lewis, Shawna; Carroll, Karen C; Schatz, Michael C; Timp, Winston; Simner, Patricia J.

Antimicrob Agents Chemother ; 63(1), 2019 01.

Article in English | MEDLINE | ID: mdl-30373801

ABSTRACT

Standard antimicrobial susceptibility testing (AST) approaches lead to delays in the selection of optimal antimicrobial therapy. Here, we sought to determine the accuracy of antimicrobial resistance (AMR) determinants identified by Nanopore whole-genome sequencing in predicting AST results. Using a cohort of 40 clinical isolates (21 carbapenemase-producing carbapenem-resistant Klebsiella pneumoniae, 10 non-carbapenemase-producing carbapenem-resistant K. pneumoniae, and 9 carbapenem-susceptible K. pneumoniae isolates), three separate sequencing and analysis pipelines were performed, as follows: (i) a real-time Nanopore analysis approach identifying acquired AMR genes, (ii) an assembly-based Nanopore approach identifying acquired AMR genes and chromosomal mutations, and (iii) an approach using short-read correction of Nanopore assemblies. The short-read correction of Nanopore assemblies served as the reference standard to determine the accuracy of Nanopore sequencing results. With the real-time analysis approach, full annotation of acquired AMR genes occurred within 8 h from subcultured isolates. Assemblies sufficient for full resistance gene and single-nucleotide polymorphism annotation were available within 14 h from subcultured isolates. The overall agreement of genotypic results and anticipated AST results for the 40 K. pneumoniae isolates was 77% (range, 30% to 100%) and 92% (range, 80% to 100%) for the real-time approach and the assembly approach, respectively. Evaluating the patients contributing the 40 isolates, the real-time approach and assembly approach could shorten the median time to effective antibiotic therapy by 20 h and 26 h, respectively, compared to standard AST. Nanopore sequencing offers a rapid approach to both accurately identify resistance mechanisms and to predict AST results for K. pneumoniae isolates. Bioinformatics improvements enabling real-time alignment, coupled with rapid extraction and library preparation, will further enhance the accuracy and workflow of the Nanopore real-time approach.

Subject(s)

Bacterial Proteins/genetics , Drug Resistance, Multiple, Bacterial/genetics , Genome, Bacterial , Klebsiella pneumoniae/genetics , Phenotype , Whole Genome Sequencing/methods , beta-Lactamases/genetics , Anti-Bacterial Agents/metabolism , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/metabolism , Carbapenems/metabolism , Carbapenems/pharmacology , Cohort Studies , Computational Biology/methods , Gene Expression , Gene Library , Humans , Klebsiella Infections/drug therapy , Klebsiella Infections/microbiology , Klebsiella pneumoniae/drug effects , Klebsiella pneumoniae/enzymology , Klebsiella pneumoniae/isolation & purification , Microbial Sensitivity Tests , Polymorphism, Single Nucleotide , Whole Genome Sequencing/instrumentation , beta-Lactamases/metabolism

6.

Analysis of gene expression in the postmortem brain of neurotypical Black Americans reveals contributions of genetic ancestry.

Benjamin, Kynon J M; Chen, Qiang; Eagles, Nicholas J; Huuki-Myers, Louise A; Collado-Torres, Leonardo; Stolz, Joshua M; Pertea, Geo; Shin, Joo Heon; Paquola, Apuã C M; Hyde, Thomas M; Kleinman, Joel E; Jaffe, Andrew E; Han, Shizhong; Weinberger, Daniel R.

Nat Neurosci ; 27(6): 1064-1074, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38769152

ABSTRACT

Ancestral differences in genomic variation affect the regulation of gene expression; however, most gene expression studies have been limited to European ancestry samples or adjusted to identify ancestry-independent associations. Here, we instead examined the impact of genetic ancestry on gene expression and DNA methylation in the postmortem brain tissue of admixed Black American neurotypical individuals to identify ancestry-dependent and ancestry-independent contributions. Ancestry-associated differentially expressed genes (DEGs), transcripts and gene networks, while notably not implicating neurons, are enriched for genes related to the immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson disease and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for the heritability of diverse immune-related traits but depletion for psychiatric-related traits. We also compared Black and non-Hispanic white Americans, confirming most ancestry-associated DEGs. Our results delineate the extent to which genetic ancestry affects differences in gene expression in the human brain and the implications for brain illness risk.

Subject(s)

Black or African American , Brain , DNA Methylation , Humans , Black or African American/genetics , Brain/metabolism , Female , Male , White People/genetics , Autopsy , Gene Expression/genetics , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Alzheimer Disease/ethnology , Aged , Middle Aged

7.

Sex affects transcriptional associations with schizophrenia across the dorsolateral prefrontal cortex, hippocampus, and caudate nucleus.

Benjamin, Kynon J M; Arora, Ria; Feltrin, Arthur S; Pertea, Geo; Giles, Hunter H; Stolz, Joshua M; D'Ignazio, Laura; Collado-Torres, Leonardo; Shin, Joo Heon; Ulrich, William S; Hyde, Thomas M; Kleinman, Joel E; Weinberger, Daniel R; Paquola, Apuã C M; Erwin, Jennifer A.

Nat Commun ; 15(1): 3980, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730231

ABSTRACT

Schizophrenia is a complex neuropsychiatric disorder with sexually dimorphic features, including differential symptomatology, drug responsiveness, and male incidence rate. Prior large-scale transcriptome analyses for sex differences in schizophrenia have focused on the prefrontal cortex. Analyzing BrainSeq Consortium data (caudate nucleus: n = 399, dorsolateral prefrontal cortex: n = 377, and hippocampus: n = 394), we identified 831 unique genes that exhibit sex differences across brain regions, enriched for immune-related pathways. We observed X-chromosome dosage reduction in the hippocampus of male individuals with schizophrenia. Our sex interaction model revealed 148 junctions dysregulated in a sex-specific manner in schizophrenia. Sex-specific schizophrenia analysis identified dozens of differentially expressed genes, notably enriched in immune-related pathways. Finally, our sex-interacting expression quantitative trait loci analysis revealed 704 unique genes, nine associated with schizophrenia risk. These findings emphasize the importance of sex-informed analysis of sexually dimorphic traits, inform personalized therapeutic strategies in schizophrenia, and highlight the need for increased female samples for schizophrenia analyses.

Subject(s)

Caudate Nucleus , Dorsolateral Prefrontal Cortex , Hippocampus , Quantitative Trait Loci , Schizophrenia , Sex Characteristics , Humans , Schizophrenia/genetics , Schizophrenia/metabolism , Female , Male , Hippocampus/metabolism , Caudate Nucleus/metabolism , Dorsolateral Prefrontal Cortex/metabolism , Adult , Transcriptome , Gene Expression Profiling , Sex Factors , Chromosomes, Human, X/genetics , Prefrontal Cortex/metabolism

8.

Systems biology dissection of PTSD and MDD across brain regions, cell types, and blood.

Daskalakis, Nikolaos P; Iatrou, Artemis; Chatzinakos, Chris; Jajoo, Aarti; Snijders, Clara; Wylie, Dennis; DiPietro, Christopher P; Tsatsani, Ioulia; Chen, Chia-Yen; Pernia, Cameron D; Soliva-Estruch, Marina; Arasappan, Dhivya; Bharadwaj, Rahul A; Collado-Torres, Leonardo; Wuchty, Stefan; Alvarez, Victor E; Dammer, Eric B; Deep-Soboslay, Amy; Duong, Duc M; Eagles, Nick; Huber, Bertrand R; Huuki, Louise; Holstein, Vincent L; Logue, Mark W; Lugenbühl, Justina F; Maihofer, Adam X; Miller, Mark W; Nievergelt, Caroline M; Pertea, Geo; Ross, Deanna; Sendi, Mohammad S E; Sun, Benjamin B; Tao, Ran; Tooke, James; Wolf, Erika J; Zeier, Zane; Berretta, Sabina; Champagne, Frances A; Hyde, Thomas; Seyfried, Nicholas T; Shin, Joo Heon; Weinberger, Daniel R; Nemeroff, Charles B; Kleinman, Joel E; Ressler, Kerry J.

Science ; 384(6698): eadh3707, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781393

ABSTRACT

The molecular pathology of stress-related disorders remains elusive. Our brain multiregion, multiomic study of posttraumatic stress disorder (PTSD) and major depressive disorder (MDD) included the central nucleus of the amygdala, hippocampal dentate gyrus, and medial prefrontal cortex (mPFC). Genes and exons within the mPFC carried most disease signals replicated across two independent cohorts. Pathways pointed to immune function, neuronal and synaptic regulation, and stress hormones. Multiomic factor and gene network analyses provided the underlying genomic structure. Single nucleus RNA sequencing in dorsolateral PFC revealed dysregulated (stress-related) signals in neuronal and non-neuronal cell types. Analyses of brain-blood intersections in >50,000 UK Biobank participants were conducted along with fine-mapping of the results of PTSD and MDD genome-wide association studies to distinguish risk from disease processes. Our data suggest shared and distinct molecular pathology in both disorders and propose potential therapeutic targets and biomarkers.

Subject(s)

Brain , Depressive Disorder, Major , Genetic Loci , Stress Disorders, Post-Traumatic , Female , Humans , Male , Amygdala/metabolism , Biomarkers/metabolism , Brain/metabolism , Depressive Disorder, Major/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Neurons/metabolism , Prefrontal Cortex/metabolism , Stress Disorders, Post-Traumatic/genetics , Systems Biology , Single-Cell Gene Expression Analysis , Chromosome Mapping

9.

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V; Aslam, Luqman; Beal, Kathryn; Blomberg, Le Ann; Bouffard, Pascal; Burt, David W; Crasta, Oswald; Crooijmans, Richard P M A; Cooper, Kristal; Coulombe, Roger A; De, Supriyo; Delany, Mary E; Dodgson, Jerry B; Dong, Jennifer J; Evans, Clive; Frederickson, Karin M; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A M; Harkins, Tim T; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R; Payne, William S; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali.

PLoS Biol ; 8(9), 2010 Sep 07.

Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

Subject(s)

Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Species Specificity

10.

Genetic and environmental contributions to ancestry differences in gene expression in the human brain.

Benjamin, Kynon J M; Chen, Qiang; Eagles, Nicholas J; Huuki-Myers, Louise A; Collado-Torres, Leonardo; Stolz, Joshua M; Pertea, Geo; Shin, Joo Heon; Paquola, Apuã C M; Hyde, Thomas M; Kleinman, Joel E; Jaffe, Andrew E; Han, Shizhong; Weinberger, Daniel R.

bioRxiv ; 2023 Oct 05.

Article in English | MEDLINE | ID: mdl-37034760

ABSTRACT

Ancestral differences in genomic variation are determining factors in gene regulation; however, most gene expression studies have been limited to European ancestry samples or adjusted for ancestry to identify ancestry-independent associations. We instead examined the impact of genetic ancestry on gene expression and DNA methylation (DNAm) in admixed African/Black American neurotypical individuals to untangle effects of genetic and environmental factors. Ancestry-associated differentially expressed genes (DEGs), transcripts, and gene networks, while notably not implicating neurons, are enriched for genes related to immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson's disease, and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for heritability of diverse immune-related traits but depletion for psychiatric-related traits. The cell-type enrichments and direction of effects vary by brain region. These DEGs are less evolutionarily constrained and are largely explained by genetic variations; roughly 15% are predicted by DNAm variation implicating environmental exposures. We also compared Black and White Americans, confirming most of these ancestry-associated DEGs. Our results highlight how environment and genetic background affect genetic ancestry differences in gene expression in the human brain and affect risk for brain illness.

11.

Detection of lineage-specific evolutionary changes among primate species.

Pertea, Mihaela; Pertea, Geo M; Salzberg, Steven L.

BMC Bioinformatics ; 12: 274, 2011 Jul 04.

Article in English | MEDLINE | ID: mdl-21726447

ABSTRACT

BACKGROUND: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. RESULTS: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. CONCLUSIONS: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.

Subject(s)

Phylogeny , Primates/genetics , Software , Animals , Biological Evolution , Genome , Genome, Human , Humans

12.

GFF Utilities: GffRead and GffCompare.

Pertea, Geo; Pertea, Mihaela.

F1000Res ; 9, 2020.

Article in English | MEDLINE | ID: mdl-32489650

ABSTRACT

GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).
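
For readers unfamiliar with the format named in this abstract, the following is a minimal sketch of how one GFF3 feature line breaks down into its nine tab-separated columns. It is not code from GffRead or GffCompare: the function name `parse_gff3_line` and the sample record are hypothetical, and the sketch assumes GFF3-style `key=value` attributes (GTF writes attributes as `key "value";` and would need a different split).

```python
# Minimal sketch: parse one GFF3 feature line into a dictionary.
# parse_gff3_line and the sample record are hypothetical illustrations,
# not part of GffRead or GffCompare.

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def parse_gff3_line(line):
    """Split a GFF3 feature line into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields[:8]))
    # Coordinates are 1-based and inclusive in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs (GFF3 style only).
    attributes = {}
    for pair in fields[8].split(";"):
        pair = pair.strip()
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    record["attributes"] = attributes
    return record


# Made-up example record for illustration.
example = "chr1\texample\ttranscript\t1000\t5000\t.\t+\t.\tID=tx1;Parent=gene1"
parsed = parse_gff3_line(example)
print(parsed["type"], parsed["start"], parsed["end"], parsed["attributes"]["ID"])
```

A dictionary per feature keeps the sketch short; real annotation files also contain comment lines starting with `#`, which a caller would need to skip before calling the function.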

Subject(s)

Computational Biology , Genomics , Software , Genome , Molecular Sequence Annotation

13.

Elevated expression of protein biosynthesis genes in liver and muscle of hibernating black bears (Ursus americanus).

Fedorov, Vadim B; Goropashnaya, Anna V; Tøien, Øivind; Stewart, Nathan C; Gracey, Andrew Y; Chang, Celia; Qin, Shizhen; Pertea, Geo; Quackenbush, John; Showe, Louise C; Showe, Michael K; Boyer, Bert B; Barnes, Brian M.

Physiol Genomics ; 37(2): 108-18, 2009 04 10.

Article in English | MEDLINE | ID: mdl-19240299

ABSTRACT

We conducted a large-scale gene expression screen using the 3,200 cDNA probe microarray developed specifically for Ursus americanus to detect expression differences in liver and skeletal muscle that occur during winter hibernation compared with animals sampled during summer. The expression of 12 genes, including RNA binding protein motif 3 (Rbm3), that are mostly involved in protein biosynthesis, was induced during hibernation in both liver and muscle. The Gene Ontology and Gene Set Enrichment analysis consistently showed a highly significant enrichment of the protein biosynthesis category by overexpressed genes in both liver and skeletal muscle during hibernation. Coordinated induction in transcriptional level of genes involved in protein biosynthesis is a distinctive feature of the transcriptome in hibernating black bears. This finding implies induction of translation and suggests an adaptive mechanism that contributes to a unique ability to reduce muscle atrophy over prolonged periods of immobility during hibernation. Comparing expression profiles in bears to small mammalian hibernators shows a general trend during hibernation of transcriptional changes that include induction of genes involved in lipid metabolism and carbohydrate synthesis as well as depression of genes involved in the urea cycle and detoxification function in liver.

Subject(s)

Gene Expression Profiling , Hibernation/genetics , Liver/metabolism , Muscle, Skeletal/metabolism , Protein Biosynthesis/genetics , Ursidae/genetics , Animals , Basal Metabolism , Body Temperature , Gene Library , Genomics/methods , Male , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Ursidae/metabolism , Ursidae/physiology

14.

Transcriptome assembly from long-read RNA-seq alignments with StringTie2.

Kovaka, Sam; Zimin, Aleksey V; Pertea, Geo M; Razaghi, Roham; Salzberg, Steven L; Pertea, Mihaela.

Genome Biol ; 20(1): 278, 2019 12 16.

Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.

Subject(s)

Genetic Techniques , Genomics/methods , Transcriptome , Animals , Arabidopsis , Humans , Sequence Analysis, RNA , Software , Zea mays

15.

The TIGR Maize Database.

Chan, Agnes P; Pertea, Geo; Cheung, Foo; Lee, Dan; Zheng, Li; Whitelaw, Cathy; Pontaroli, Ana C; SanMiguel, Phillip; Yuan, Yinan; Bennetzen, Jeffrey; Barbazuk, William Brad; Quackenbush, John; Rabinowicz, Pablo D.

Nucleic Acids Res ; 34(Database issue): D771-6, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381977

ABSTRACT

Maize is a staple crop of the grass family and also an excellent model for plant genetics. Owing to the large size and repetitiveness of its genome, we previously investigated two approaches to accelerate gene discovery and genome analysis in maize: methylation filtration and high C(0)t selection. These techniques allow the construction of gene-enriched genomic libraries by minimizing repeat sequences due to either their methylation status or their copy number, yielding a 7-fold enrichment in genic sequences relative to a random genomic library. Approximately 900,000 gene-enriched reads from maize were generated and clustered into Assembled Zea mays (AZM) sequences. Here we report the current AZM release, which consists of approximately 298 Mb representing 243,807 sequence assemblies and singletons. In order to provide a repository of publicly available maize genomic sequences, we have created the TIGR Maize Database (http://maize.tigr.org). In this resource, we have assembled and annotated the AZMs and used available sequenced markers to anchor AZMs to maize chromosomes. We have constructed a maize repeat database and generated draft sequence assemblies of 287 maize bacterial artificial chromosome (BAC) clone sequences, which we annotated along with 172 additional publicly available BAC clones. All sequences, assemblies and annotations are available at the project website via web interfaces and FTP downloads.

Subject(s)

Databases, Nucleic Acid , Genome, Plant , Zea mays/genetics , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Genes, Plant , Genomic Library , Genomics , Internet , Repetitive Sequences, Nucleic Acid , User-Computer Interface

16.

CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise.

Pertea, Mihaela; Shumate, Alaina; Pertea, Geo; Varabyou, Ales; Breitwieser, Florian P; Chang, Yu-Chi; Madugundu, Anil K; Pandey, Akhilesh; Salzberg, Steven L.

Genome Biol ; 19(1): 208, 2018 11 28.

Article in English | MEDLINE | ID: mdl-30486838

ABSTRACT

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.

Subject(s)

Genetic Databases , RNA Sequence Analysis , Genetic Transcription , Amino Acid Sequence , Animals , Female , Humans , Introns , Male

17.

Global comparative analysis of ESTs from the southern cattle tick, Rhipicephalus (Boophilus) microplus.

Wang, Minghua; Guerrero, Felix D; Pertea, Geo; Nene, Vishvanath M.

BMC Genomics ; 8: 368, 2007 Oct 12.

Article in English | MEDLINE | ID: mdl-17935616

ABSTRACT

BACKGROUND: The southern cattle tick, Rhipicephalus (Boophilus) microplus, is an economically important parasite of cattle and can transmit several pathogenic microorganisms to its cattle host during the feeding process. Understanding the biology and genomics of R. microplus is critical to developing novel methods for controlling these ticks. RESULTS: We present a global comparative genomic analysis of a gene index of R. microplus comprised of 13,643 unique transcripts assembled from 42,512 expressed sequence tags (ESTs), a significant fraction of the complement of R. microplus genes. The source material for these ESTs consisted of polyA RNA from various tissues, lifestages, and strains of R. microplus, including larvae exposed to heat, cold, host odor, and acaricide. Functional annotation using RPS-Blast analysis identified conserved protein domains in the conceptually translated gene index and assigned GO terms to those database transcripts which had informative BlastX hits. Blast Score Ratio and SimiTri analysis compared the conceptual transcriptome of the R. microplus database to other eukaryotic proteomes and EST databases, including those from 3 ticks. The most abundant protein domains in BmiGI were also analyzed by SimiTri methodology. CONCLUSION: These results indicate that a large fraction of BmiGI entries have no homologs in other sequenced genomes. Analysis with the PartiGene annotation pipeline showed 64% of the members of BmiGI could not be assigned GO annotation, thus minimal information is available about a significant fraction of the tick genome. This highlights the important insights in tick biology which are likely to result from a tick genome sequencing project. Global comparative analysis identified some tick genes with unexpected phylogenetic relationships which detailed analysis attributed to gene losses in some members of the animal kingdom. Some tick genes were identified which had close orthologues to mammalian genes. Members of this group would likely be poor choices as targets for development of novel tick control technology.

Subject(s)

Expressed Sequence Tags , Genomics , Rhipicephalus/genetics , Animals , Cattle , Gene Library , Phylogeny

18.

First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (Formerly Scedosporium prolificans).

Luo, Ruibang; Zimin, Aleksey; Workman, Rachael; Fan, Yunfan; Pertea, Geo; Grossman, Nina; Wear, Maggie P; Jia, Bei; Miller, Heather; Casadevall, Arturo; Timp, Winston; Zhang, Sean X; Salzberg, Steven L.

G3 (Bethesda) ; 7(11): 3831-3836, 2017 11 06.

Article in English | MEDLINE | ID: mdl-28963165

ABSTRACT

Here we describe the sequencing and assembly of the pathogenic fungus Lomentospora prolificans using a combination of short, highly accurate Illumina reads and additional coverage in very long Oxford Nanopore reads. The resulting assembly is highly contiguous, containing a total of 37,627,092 bp with over 98% of the sequence in just 26 scaffolds. Annotation identified 8896 protein-coding genes. Pulsed-field gel analysis suggests that this organism contains at least 7 and possibly 11 chromosomes, the two longest of which have sizes corresponding closely to the sizes of the longest scaffolds, at 6.6 and 5.7 Mb.

Subject(s)

Fungal Genome , Molecular Sequence Annotation , Scedosporium/genetics , Fungal Proteins/genetics , Whole Genome Sequencing

19.

The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

Neale, David B; McGuire, Patrick E; Wheeler, Nicholas C; Stevens, Kristian A; Crepeau, Marc W; Cardeno, Charis; Zimin, Aleksey V; Puiu, Daniela; Pertea, Geo M; Sezen, U Uzay; Casola, Claudio; Koralewski, Tomasz E; Paul, Robin; Gonzalez-Ibeas, Daniel; Zaman, Sumaira; Cronn, Richard; Yandell, Mark; Holt, Carson; Langley, Charles H; Yorke, James A; Salzberg, Steven L; Wegrzyn, Jill L.

G3 (Bethesda) ; 7(9): 3157-3167, 2017 09 07.

Article in English | MEDLINE | ID: mdl-28751502

ABSTRACT

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.

Subject(s)

Plant Genome , Photosynthesis/genetics , Pinaceae/genetics , Pinaceae/metabolism , Pseudotsuga/genetics , Pseudotsuga/metabolism , Whole Genome Sequencing , Biological Adaptation/genetics , Computational Biology , Molecular Evolution , Gene Duplication , Gene Regulatory Networks , Genomics , Molecular Sequence Annotation , Multigene Family , Phylogeny , Pinaceae/classification , Proteomics/methods , Pseudotsuga/classification , Nucleic Acid Repetitive Sequences

20.

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.

Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M; Leek, Jeffrey T; Salzberg, Steven L.

Nat Protoc ; 11(9): 1650-67, 2016 09.

Article in English | MEDLINE | ID: mdl-27560171

ABSTRACT

High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

Subject(s)

Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Software , Statistics as Topic/methods , Molecular Sequence Annotation , Messenger RNA/genetics , Messenger RNA/metabolism , User-Computer Interface
