Search | BVS Enfermagem Research Portal

1.

Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

Shumate, Alaina; Wong, Brandon; Pertea, Geo; Pertea, Mihaela.

PLoS Comput Biol ; 18(6): e1009730, 2022 06.

Article in English | MEDLINE | ID: mdl-35648784

ABSTRACT

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.

Subjects

High-Throughput Nucleotide Sequencing , Transcriptome , Algorithms , Animals , Exons , Humans , Mice , Sequence Analysis, DNA , Sequence Analysis, RNA , Software , Transcriptome/genetics
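The StringTie abstract measures accuracy as the number of correctly assembled transcripts and as precision, and notes that long-read errors often mis-identify splice sites. A minimal sketch of how such transcript-level figures are commonly computed, assuming a transcript counts as correct only when its full intron chain exactly matches a reference transcript. This matching rule is a simplification, not necessarily the paper's exact evaluation protocol, and all names below are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (chromosome, strand, ordered intron coordinates).
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def intron_chain(chrom: str, strand: str,
                 exons: Iterable[Tuple[int, int]]) -> IntronChain:
    """Derive the intron chain from 1-based, inclusive exon coordinates."""
    ex = sorted(exons)
    # Each intron spans the gap between two consecutive exons.
    introns = tuple((ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1))
    return (chrom, strand, introns)


def sensitivity_precision(reference: List[IntronChain],
                          assembled: List[IntronChain]) -> Tuple[float, float]:
    """Transcript-level sensitivity and precision under exact intron-chain matching.

    Single-exon transcripts (empty intron chains) are dropped, and duplicate
    chains are collapsed, so each distinct structure counts once.
    """
    ref_set = {c for c in reference if c[2]}
    asm_set = {c for c in assembled if c[2]}
    matched = len(ref_set & asm_set)
    sensitivity = matched / len(ref_set) if ref_set else 0.0
    precision = matched / len(asm_set) if asm_set else 0.0
    return sensitivity, precision


ref = [intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
       intron_chain("chr1", "+", [(100, 200), (500, 600)])]
asm = [intron_chain("chr1", "+", [(90, 200), (300, 400), (500, 650)]),  # same introns as ref[0]
       intron_chain("chr1", "+", [(100, 210), (500, 600)])]             # one shifted splice site
print(sensitivity_precision(ref, asm))  # (0.5, 0.5)
```

The second assembled transcript fails only because one splice site is off by ten bases, which is why splice-site errors in long reads hurt both measures under this kind of strict criterion.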

2.

Genome sequence of the progenitor of the wheat D genome Aegilops tauschii.

Luo, Ming-Cheng; Gu, Yong Q; Puiu, Daniela; Wang, Hao; Twardziok, Sven O; Deal, Karin R; Huo, Naxin; Zhu, Tingting; Wang, Le; Wang, Yi; McGuire, Patrick E; Liu, Shuyang; Long, Hai; Ramasamy, Ramesh K; Rodriguez, Juan C; Van, Sonny L; Yuan, Luxia; Wang, Zhenzhong; Xia, Zhiqiang; Xiao, Lichan; Anderson, Olin D; Ouyang, Shuhong; Liang, Yong; Zimin, Aleksey V; Pertea, Geo; Qi, Peng; Bennetzen, Jeffrey L; Dai, Xiongtao; Dawson, Matthew W; Müller, Hans-Georg; Kugler, Karl; Rivarola-Duarte, Lorena; Spannagl, Manuel; Mayer, Klaus F X; Lu, Fu-Hao; Bevan, Michael W; Leroy, Philippe; Li, Pingchuan; You, Frank M; Sun, Qixin; Liu, Zhiyong; Lyons, Eric; Wicker, Thomas; Salzberg, Steven L; Devos, Katrien M; Dvorák, Jan.

Nature ; 551(7681): 498-502, 2017 11 23.

Article in English | MEDLINE | ID: mdl-29143815

ABSTRACT

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.

Subjects

Genome, Plant , Phylogeny , Poaceae/genetics , Triticum/genetics , Chromosome Mapping , Diploidy , Evolution, Molecular , Gene Duplication , Genes, Plant/genetics , Genomics/standards , Poaceae/classification , Recombination, Genetic/genetics , Sequence Analysis, DNA/standards , Triticum/classification

3.

Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing.

Read, Andrew C; Moscou, Matthew J; Zimin, Aleksey V; Pertea, Geo; Meyer, Rachel S; Purugganan, Michael D; Leach, Jan E; Triplett, Lindsay R; Salzberg, Steven L; Bogdanove, Adam J.

PLoS Genet ; 16(1): e1008571, 2020 01.

Article in English | MEDLINE | ID: mdl-31986137

ABSTRACT

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety.

Subjects

Disease Resistance , NLR Proteins/genetics , Oryza/genetics , Plant Proteins/genetics , Molecular Sequence Annotation , NLR Proteins/chemistry , NLR Proteins/metabolism , Nanopore Sequencing/methods , Oryza/immunology , Plant Proteins/chemistry , Plant Proteins/metabolism , Whole Genome Sequencing/methods , Zinc Fingers

4.

TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets.

Varabyou, Ales; Pertea, Geo; Pockrandt, Christopher; Pertea, Mihaela.

Bioinformatics ; 37(20): 3650-3651, 2021 Oct 25.

Article in English | MEDLINE | ID: mdl-33964128

ABSTRACT

SUMMARY: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input. AVAILABILITY AND IMPLEMENTATION: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.

Applying Rapid Whole-Genome Sequencing To Predict Phenotypic Antimicrobial Susceptibility Testing Results among Carbapenem-Resistant Klebsiella pneumoniae Clinical Isolates.

Tamma, Pranita D; Fan, Yunfan; Bergman, Yehudit; Pertea, Geo; Kazmi, Abida Q; Lewis, Shawna; Carroll, Karen C; Schatz, Michael C; Timp, Winston; Simner, Patricia J.

Antimicrob Agents Chemother ; 63(1), 2019 01.

Article in English | MEDLINE | ID: mdl-30373801

ABSTRACT

Standard antimicrobial susceptibility testing (AST) approaches lead to delays in the selection of optimal antimicrobial therapy. Here, we sought to determine the accuracy of antimicrobial resistance (AMR) determinants identified by Nanopore whole-genome sequencing in predicting AST results. Using a cohort of 40 clinical isolates (21 carbapenemase-producing carbapenem-resistant Klebsiella pneumoniae, 10 non-carbapenemase-producing carbapenem-resistant K. pneumoniae, and 9 carbapenem-susceptible K. pneumoniae isolates), three separate sequencing and analysis pipelines were performed, as follows: (i) a real-time Nanopore analysis approach identifying acquired AMR genes, (ii) an assembly-based Nanopore approach identifying acquired AMR genes and chromosomal mutations, and (iii) an approach using short-read correction of Nanopore assemblies. The short-read correction of Nanopore assemblies served as the reference standard to determine the accuracy of Nanopore sequencing results. With the real-time analysis approach, full annotation of acquired AMR genes occurred within 8 h from subcultured isolates. Assemblies sufficient for full resistance gene and single-nucleotide polymorphism annotation were available within 14 h from subcultured isolates. The overall agreement of genotypic results and anticipated AST results for the 40 K. pneumoniae isolates was 77% (range, 30% to 100%) and 92% (range, 80% to 100%) for the real-time approach and the assembly approach, respectively. Evaluating the patients contributing the 40 isolates, the real-time approach and assembly approach could shorten the median time to effective antibiotic therapy by 20 h and 26 h, respectively, compared to standard AST. Nanopore sequencing offers a rapid approach to both accurately identify resistance mechanisms and to predict AST results for K. pneumoniae isolates. Bioinformatics improvements enabling real-time alignment, coupled with rapid extraction and library preparation, will further enhance the accuracy and workflow of the Nanopore real-time approach.

Subjects

Bacterial Proteins/genetics , Drug Resistance, Multiple, Bacterial/genetics , Genome, Bacterial , Klebsiella pneumoniae/genetics , Phenotype , Whole Genome Sequencing/methods , beta-Lactamases/genetics , Anti-Bacterial Agents/metabolism , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/metabolism , Carbapenems/metabolism , Carbapenems/pharmacology , Cohort Studies , Computational Biology/methods , Gene Expression , Gene Library , Humans , Klebsiella Infections/drug therapy , Klebsiella Infections/microbiology , Klebsiella pneumoniae/drug effects , Klebsiella pneumoniae/enzymology , Klebsiella pneumoniae/isolation & purification , Microbial Sensitivity Tests , Polymorphism, Single Nucleotide , Whole Genome Sequencing/instrumentation , beta-Lactamases/metabolism

6.

Analysis of gene expression in the postmortem brain of neurotypical Black Americans reveals contributions of genetic ancestry.

Benjamin, Kynon J M; Chen, Qiang; Eagles, Nicholas J; Huuki-Myers, Louise A; Collado-Torres, Leonardo; Stolz, Joshua M; Pertea, Geo; Shin, Joo Heon; Paquola, Apuã C M; Hyde, Thomas M; Kleinman, Joel E; Jaffe, Andrew E; Han, Shizhong; Weinberger, Daniel R.

Nat Neurosci ; 27(6): 1064-1074, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38769152

ABSTRACT

Ancestral differences in genomic variation affect the regulation of gene expression; however, most gene expression studies have been limited to European ancestry samples or adjusted to identify ancestry-independent associations. Here, we instead examined the impact of genetic ancestry on gene expression and DNA methylation in the postmortem brain tissue of admixed Black American neurotypical individuals to identify ancestry-dependent and ancestry-independent contributions. Ancestry-associated differentially expressed genes (DEGs), transcripts and gene networks, while notably not implicating neurons, are enriched for genes related to the immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson disease and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for the heritability of diverse immune-related traits but depletion for psychiatric-related traits. We also compared Black and non-Hispanic white Americans, confirming most ancestry-associated DEGs. Our results delineate the extent to which genetic ancestry affects differences in gene expression in the human brain and the implications for brain illness risk.

Subjects

Black or African American , Brain , DNA Methylation , Humans , Black or African American/genetics , Brain/metabolism , Female , Male , White People/genetics , Autopsy , Gene Expression/genetics , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Alzheimer Disease/ethnology , Aged , Middle Aged

7.

Sex affects transcriptional associations with schizophrenia across the dorsolateral prefrontal cortex, hippocampus, and caudate nucleus.

Benjamin, Kynon J M; Arora, Ria; Feltrin, Arthur S; Pertea, Geo; Giles, Hunter H; Stolz, Joshua M; D'Ignazio, Laura; Collado-Torres, Leonardo; Shin, Joo Heon; Ulrich, William S; Hyde, Thomas M; Kleinman, Joel E; Weinberger, Daniel R; Paquola, Apuã C M; Erwin, Jennifer A.

Nat Commun ; 15(1): 3980, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730231

ABSTRACT

Schizophrenia is a complex neuropsychiatric disorder with sexually dimorphic features, including differential symptomatology, drug responsiveness, and male incidence rate. Prior large-scale transcriptome analyses for sex differences in schizophrenia have focused on the prefrontal cortex. Analyzing BrainSeq Consortium data (caudate nucleus: n = 399, dorsolateral prefrontal cortex: n = 377, and hippocampus: n = 394), we identified 831 unique genes that exhibit sex differences across brain regions, enriched for immune-related pathways. We observed X-chromosome dosage reduction in the hippocampus of male individuals with schizophrenia. Our sex interaction model revealed 148 junctions dysregulated in a sex-specific manner in schizophrenia. Sex-specific schizophrenia analysis identified dozens of differentially expressed genes, notably enriched in immune-related pathways. Finally, our sex-interacting expression quantitative trait loci analysis revealed 704 unique genes, nine associated with schizophrenia risk. These findings emphasize the importance of sex-informed analysis of sexually dimorphic traits, inform personalized therapeutic strategies in schizophrenia, and highlight the need for increased female samples for schizophrenia analyses.

Subjects

Caudate Nucleus , Dorsolateral Prefrontal Cortex , Hippocampus , Quantitative Trait Loci , Schizophrenia , Sex Characteristics , Humans , Schizophrenia/genetics , Schizophrenia/metabolism , Female , Male , Hippocampus/metabolism , Caudate Nucleus/metabolism , Dorsolateral Prefrontal Cortex/metabolism , Adult , Transcriptome , Gene Expression Profiling , Sex Factors , Chromosomes, Human, X/genetics , Prefrontal Cortex/metabolism

8.

Systems biology dissection of PTSD and MDD across brain regions, cell types, and blood.

Daskalakis, Nikolaos P; Iatrou, Artemis; Chatzinakos, Chris; Jajoo, Aarti; Snijders, Clara; Wylie, Dennis; DiPietro, Christopher P; Tsatsani, Ioulia; Chen, Chia-Yen; Pernia, Cameron D; Soliva-Estruch, Marina; Arasappan, Dhivya; Bharadwaj, Rahul A; Collado-Torres, Leonardo; Wuchty, Stefan; Alvarez, Victor E; Dammer, Eric B; Deep-Soboslay, Amy; Duong, Duc M; Eagles, Nick; Huber, Bertrand R; Huuki, Louise; Holstein, Vincent L; Logue, Mark W; Lugenbühl, Justina F; Maihofer, Adam X; Miller, Mark W; Nievergelt, Caroline M; Pertea, Geo; Ross, Deanna; Sendi, Mohammad S E; Sun, Benjamin B; Tao, Ran; Tooke, James; Wolf, Erika J; Zeier, Zane; Berretta, Sabina; Champagne, Frances A; Hyde, Thomas; Seyfried, Nicholas T; Shin, Joo Heon; Weinberger, Daniel R; Nemeroff, Charles B; Kleinman, Joel E; Ressler, Kerry J.

Science ; 384(6698): eadh3707, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781393

ABSTRACT

The molecular pathology of stress-related disorders remains elusive. Our brain multiregion, multiomic study of posttraumatic stress disorder (PTSD) and major depressive disorder (MDD) included the central nucleus of the amygdala, hippocampal dentate gyrus, and medial prefrontal cortex (mPFC). Genes and exons within the mPFC carried most disease signals replicated across two independent cohorts. Pathways pointed to immune function, neuronal and synaptic regulation, and stress hormones. Multiomic factor and gene network analyses provided the underlying genomic structure. Single nucleus RNA sequencing in dorsolateral PFC revealed dysregulated (stress-related) signals in neuronal and non-neuronal cell types. Analyses of brain-blood intersections in >50,000 UK Biobank participants were conducted along with fine-mapping of the results of PTSD and MDD genome-wide association studies to distinguish risk from disease processes. Our data suggest shared and distinct molecular pathology in both disorders and propose potential therapeutic targets and biomarkers.

Subjects

Brain , Depressive Disorder, Major , Genetic Loci , Stress Disorders, Post-Traumatic , Female , Humans , Male , Amygdala/metabolism , Biomarkers/metabolism , Brain/metabolism , Depressive Disorder, Major/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Neurons/metabolism , Prefrontal Cortex/metabolism , Stress Disorders, Post-Traumatic/genetics , Systems Biology , Single-Cell Gene Expression Analysis , Chromosome Mapping

9.

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

Dalloul, Rami A; Long, Julie A; Zimin, Aleksey V; Aslam, Luqman; Beal, Kathryn; Blomberg, Le Ann; Bouffard, Pascal; Burt, David W; Crasta, Oswald; Crooijmans, Richard P M A; Cooper, Kristal; Coulombe, Roger A; De, Supriyo; Delany, Mary E; Dodgson, Jerry B; Dong, Jennifer J; Evans, Clive; Frederickson, Karin M; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A M; Harkins, Tim T; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R; Payne, William S; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali.

PLoS Biol ; 8(9), 2010 Sep 07.

Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

Subjects

Genome , Turkeys/genetics , Animals , Base Sequence , Chromosome Mapping , DNA/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Species Specificity

10.

Genetic and environmental contributions to ancestry differences in gene expression in the human brain.

Benjamin, Kynon J M; Chen, Qiang; Eagles, Nicholas J; Huuki-Myers, Louise A; Collado-Torres, Leonardo; Stolz, Joshua M; Pertea, Geo; Shin, Joo Heon; Paquola, Apuã C M; Hyde, Thomas M; Kleinman, Joel E; Jaffe, Andrew E; Han, Shizhong; Weinberger, Daniel R.

bioRxiv ; 2023 Oct 05.

Article in English | MEDLINE | ID: mdl-37034760

ABSTRACT

Ancestral differences in genomic variation are determining factors in gene regulation; however, most gene expression studies have been limited to European ancestry samples or adjusted for ancestry to identify ancestry-independent associations. We instead examined the impact of genetic ancestry on gene expression and DNA methylation (DNAm) in admixed African/Black American neurotypical individuals to untangle effects of genetic and environmental factors. Ancestry-associated differentially expressed genes (DEGs), transcripts, and gene networks, while notably not implicating neurons, are enriched for genes related to immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson's disease, and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for heritability of diverse immune-related traits but depletion for psychiatric-related traits. The cell-type enrichments and direction of effects vary by brain region. These DEGs are less evolutionarily constrained and are largely explained by genetic variations; roughly 15% are predicted by DNAm variation implicating environmental exposures. We also compared Black and White Americans, confirming most of these ancestry-associated DEGs. Our results highlight how environment and genetic background affect genetic ancestry differences in gene expression in the human brain and affect risk for brain illness.

11.

Detection of lineage-specific evolutionary changes among primate species.

Pertea, Mihaela; Pertea, Geo M; Salzberg, Steven L.

BMC Bioinformatics ; 12: 274, 2011 Jul 04.

Article in English | MEDLINE | ID: mdl-21726447

ABSTRACT

BACKGROUND: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. RESULTS: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. CONCLUSIONS: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.

Subjects

Phylogeny , Primates/genetics , Software , Animals , Biological Evolution , Genome , Genome, Human , Humans

12.

GFF Utilities: GffRead and GffCompare.

Pertea, Geo; Pertea, Mihaela.

F1000Res ; 9, 2020.

Article in English | MEDLINE | ID: mdl-32489650

ABSTRACT

GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).

Subjects

Computational Biology , Genomics , Software , Genome , Molecular Sequence Annotation
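GTF records, as described in the GFF Utilities abstract, are single lines of nine tab-separated columns (sequence name, source, feature type, start, end, score, strand, frame, attributes) with 1-based, inclusive coordinates. A minimal Python sketch of reading one record, assuming GTF2-style attributes of the form `key "value";`. GFF3 writes attributes as `key=value;` and is not handled here; the function name is hypothetical and this is not GffRead's C++ code.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: str    # kept as text; "." means no score
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one GTF line (9 tab-separated columns) into a record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs: Dict[str, str] = {}
    # Attributes look like: gene_id "G1"; transcript_id "T1";
    # Simplification: a ';' inside a quoted value would break this split,
    # and repeated keys (e.g. several "tag" entries) keep only the last value.
    for field in cols[8].split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], attrs)


line = 'chr1\tStringTie\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";'
rec = parse_gtf_line(line)
print(rec.feature, rec.attributes["transcript_id"], rec.end - rec.start + 1)  # exon T1 359
```

Because both ends are inclusive, the exon length is `end - start + 1`, a common off-by-one pitfall when converting to 0-based, half-open formats such as BED.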

13.

Elevated expression of protein biosynthesis genes in liver and muscle of hibernating black bears (Ursus americanus).

Fedorov, Vadim B; Goropashnaya, Anna V; Tøien, Øivind; Stewart, Nathan C; Gracey, Andrew Y; Chang, Celia; Qin, Shizhen; Pertea, Geo; Quackenbush, John; Showe, Louise C; Showe, Michael K; Boyer, Bert B; Barnes, Brian M.

Physiol Genomics ; 37(2): 108-18, 2009 04 10.

Article in English | MEDLINE | ID: mdl-19240299

ABSTRACT

We conducted a large-scale gene expression screen using the 3,200 cDNA probe microarray developed specifically for Ursus americanus to detect expression differences in liver and skeletal muscle that occur during winter hibernation compared with animals sampled during summer. The expression of 12 genes, including RNA binding protein motif 3 (Rbm3), that are mostly involved in protein biosynthesis, was induced during hibernation in both liver and muscle. The Gene Ontology and Gene Set Enrichment analysis consistently showed a highly significant enrichment of the protein biosynthesis category by overexpressed genes in both liver and skeletal muscle during hibernation. Coordinated induction in transcriptional level of genes involved in protein biosynthesis is a distinctive feature of the transcriptome in hibernating black bears. This finding implies induction of translation and suggests an adaptive mechanism that contributes to a unique ability to reduce muscle atrophy over prolonged periods of immobility during hibernation. Comparing expression profiles in bears to small mammalian hibernators shows a general trend during hibernation of transcriptional changes that include induction of genes involved in lipid metabolism and carbohydrate synthesis as well as depression of genes involved in the urea cycle and detoxification function in liver.

Subjects

Gene Expression Profiling , Hibernation/genetics , Liver/metabolism , Muscle, Skeletal/metabolism , Protein Biosynthesis/genetics , Ursidae/genetics , Animals , Basal Metabolism , Body Temperature , Gene Library , Genomics/methods , Male , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Ursidae/metabolism , Ursidae/physiology

14.

Transcriptome assembly from long-read RNA-seq alignments with StringTie2.

Kovaka, Sam; Zimin, Aleksey V; Pertea, Geo M; Razaghi, Roham; Salzberg, Steven L; Pertea, Mihaela.

Genome Biol ; 20(1): 278, 2019 12 16.

Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.

Subjects

Genetic Techniques , Genomics/methods , Transcriptome , Animals , Arabidopsis , Humans , Sequence Analysis, RNA , Software , Zea mays

15.

The TIGR Maize Database.

Chan, Agnes P; Pertea, Geo; Cheung, Foo; Lee, Dan; Zheng, Li; Whitelaw, Cathy; Pontaroli, Ana C; SanMiguel, Phillip; Yuan, Yinan; Bennetzen, Jeffrey; Barbazuk, William Brad; Quackenbush, John; Rabinowicz, Pablo D.

Nucleic Acids Res ; 34(Database issue): D771-6, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381977

ABSTRACT

Maize is a staple crop of the grass family and also an excellent model for plant genetics. Owing to the large size and repetitiveness of its genome, we previously investigated two approaches to accelerate gene discovery and genome analysis in maize: methylation filtration and high C(0)t selection. These techniques allow the construction of gene-enriched genomic libraries by minimizing repeat sequences due to either their methylation status or their copy number, yielding a 7-fold enrichment in genic sequences relative to a random genomic library. Approximately 900,000 gene-enriched reads from maize were generated and clustered into Assembled Zea mays (AZM) sequences. Here we report the current AZM release, which consists of approximately 298 Mb representing 243,807 sequence assemblies and singletons. In order to provide a repository of publicly available maize genomic sequences, we have created the TIGR Maize Database (http://maize.tigr.org). In this resource, we have assembled and annotated the AZMs and used available sequenced markers to anchor AZMs to maize chromosomes. We have constructed a maize repeat database and generated draft sequence assemblies of 287 maize bacterial artificial chromosome (BAC) clone sequences, which we annotated along with 172 additional publicly available BAC clones. All sequences, assemblies and annotations are available at the project website via web interfaces and FTP downloads.

Subjects

Databases, Nucleic Acid , Genome, Plant , Zea mays/genetics , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Genes, Plant , Genomic Library , Genomics , Internet , Repetitive Sequences, Nucleic Acid , User-Computer Interface

16.

CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise.

Pertea, Mihaela; Shumate, Alaina; Pertea, Geo; Varabyou, Ales; Breitwieser, Florian P; Chang, Yu-Chi; Madugundu, Anil K; Pandey, Akhilesh; Salzberg, Steven L.

Genome Biol ; 19(1): 208, 2018 11 28.

Article in English | MEDLINE | ID: mdl-30486838

ABSTRACT

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.

Subjects

Genetic Databases , RNA Sequence Analysis , Genetic Transcription , Amino Acid Sequence , Animals , Female , Humans , Introns , Male

17.

Global comparative analysis of ESTs from the southern cattle tick, Rhipicephalus (Boophilus) microplus.

Wang, Minghua; Guerrero, Felix D; Pertea, Geo; Nene, Vishvanath M.

BMC Genomics ; 8: 368, 2007 Oct 12.

Article in English | MEDLINE | ID: mdl-17935616

ABSTRACT

BACKGROUND: The southern cattle tick, Rhipicephalus (Boophilus) microplus, is an economically important parasite of cattle and can transmit several pathogenic microorganisms to its cattle host during the feeding process. Understanding the biology and genomics of R. microplus is critical to developing novel methods for controlling these ticks. RESULTS: We present a global comparative genomic analysis of a gene index of R. microplus (BmiGI) composed of 13,643 unique transcripts assembled from 42,512 expressed sequence tags (ESTs), a significant fraction of the complement of R. microplus genes. The source material for these ESTs consisted of polyA RNA from various tissues, life stages, and strains of R. microplus, including larvae exposed to heat, cold, host odor, and acaricide. Functional annotation using RPS-Blast analysis identified conserved protein domains in the conceptually translated gene index and assigned GO terms to those database transcripts that had informative BlastX hits. Blast Score Ratio and SimiTri analyses compared the conceptual transcriptome of the R. microplus database to other eukaryotic proteomes and EST databases, including those from 3 ticks. The most abundant protein domains in BmiGI were also analyzed by SimiTri methodology. CONCLUSION: These results indicate that a large fraction of BmiGI entries have no homologs in other sequenced genomes. Analysis with the PartiGene annotation pipeline showed that 64% of the members of BmiGI could not be assigned a GO annotation; thus, minimal information is available about a significant fraction of the tick genome. This highlights the important insights into tick biology that are likely to result from a tick genome sequencing project. Global comparative analysis identified some tick genes with unexpected phylogenetic relationships, which detailed analysis attributed to gene losses in some members of the animal kingdom. Some tick genes were identified that had close orthologues among mammalian genes. Members of this group would likely be poor choices as targets for the development of novel tick control technology.

Subjects

Expressed Sequence Tags , Genomics , Rhipicephalus/genetics , Animals , Cattle , Gene Library , Phylogeny

18.

First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (Formerly Scedosporium prolificans).

Luo, Ruibang; Zimin, Aleksey; Workman, Rachael; Fan, Yunfan; Pertea, Geo; Grossman, Nina; Wear, Maggie P; Jia, Bei; Miller, Heather; Casadevall, Arturo; Timp, Winston; Zhang, Sean X; Salzberg, Steven L.

G3 (Bethesda) ; 7(11): 3831-3836, 2017 11 06.

Article in English | MEDLINE | ID: mdl-28963165

ABSTRACT

Here we describe the sequencing and assembly of the pathogenic fungus Lomentospora prolificans using a combination of short, highly accurate Illumina reads and additional coverage in very long Oxford Nanopore reads. The resulting assembly is highly contiguous, containing a total of 37,627,092 bp with over 98% of the sequence in just 26 scaffolds. Annotation identified 8896 protein-coding genes. Pulsed-field gel analysis suggests that this organism contains at least 7 and possibly 11 chromosomes, the two longest of which have sizes corresponding closely to the sizes of the longest scaffolds, at 6.6 and 5.7 Mb.

Subjects

Fungal Genome , Molecular Sequence Annotation , Scedosporium/genetics , Fungal Proteins/genetics , Whole Genome Sequencing

19.

The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

Neale, David B; McGuire, Patrick E; Wheeler, Nicholas C; Stevens, Kristian A; Crepeau, Marc W; Cardeno, Charis; Zimin, Aleksey V; Puiu, Daniela; Pertea, Geo M; Sezen, U Uzay; Casola, Claudio; Koralewski, Tomasz E; Paul, Robin; Gonzalez-Ibeas, Daniel; Zaman, Sumaira; Cronn, Richard; Yandell, Mark; Holt, Carson; Langley, Charles H; Yorke, James A; Salzberg, Steven L; Wegrzyn, Jill L.

G3 (Bethesda) ; 7(9): 3157-3167, 2017 09 07.

Article in English | MEDLINE | ID: mdl-28751502

ABSTRACT

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are partly responsible for the higher-quality reference genome, but it may also reflect a slightly lower exact-repeat content in Douglas-fir than in pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers, which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences were observed in the size of the NDH-complex gene family and in genes underlying the functional basis of shade tolerance/intolerance. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.

Subjects

Plant Genome , Photosynthesis/genetics , Pinaceae/genetics , Pinaceae/metabolism , Pseudotsuga/genetics , Pseudotsuga/metabolism , Whole Genome Sequencing , Biological Adaptation/genetics , Computational Biology , Molecular Evolution , Gene Duplication , Gene Regulatory Networks , Genomics , Molecular Sequence Annotation , Multigene Family , Phylogeny , Pinaceae/classification , Proteomics/methods , Pseudotsuga/classification , Repetitive Nucleic Acid Sequences

20.

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.

Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M; Leek, Jeffrey T; Salzberg, Steven L.

Nat Protoc ; 11(9): 1650-67, 2016 09.

Article in English | MEDLINE | ID: mdl-27560171

ABSTRACT

High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample, and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the available computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

Subjects

Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Software , Statistics as Topic/methods , Molecular Sequence Annotation , Messenger RNA/genetics , Messenger RNA/metabolism , User-Computer Interface
