1.
Nat Commun ; 15(1): 3980, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730231

ABSTRACT

Schizophrenia is a complex neuropsychiatric disorder with sexually dimorphic features, including differential symptomatology, drug responsiveness, and male incidence rate. Prior large-scale transcriptome analyses for sex differences in schizophrenia have focused on the prefrontal cortex. Analyzing BrainSeq Consortium data (caudate nucleus: n = 399, dorsolateral prefrontal cortex: n = 377, and hippocampus: n = 394), we identified 831 unique genes that exhibit sex differences across brain regions, enriched for immune-related pathways. We observed X-chromosome dosage reduction in the hippocampus of male individuals with schizophrenia. Our sex interaction model revealed 148 junctions dysregulated in a sex-specific manner in schizophrenia. Sex-specific schizophrenia analysis identified dozens of differentially expressed genes, notably enriched in immune-related pathways. Finally, our sex-interacting expression quantitative trait loci analysis revealed 704 unique genes, nine associated with schizophrenia risk. These findings emphasize the importance of sex-informed analysis of sexually dimorphic traits, inform personalized therapeutic strategies in schizophrenia, and highlight the need for increased female samples for schizophrenia analyses.


Subject(s)
Caudate Nucleus, Dorsolateral Prefrontal Cortex, Hippocampus, Quantitative Trait Loci, Schizophrenia, Sex Characteristics, Humans, Schizophrenia/genetics, Schizophrenia/metabolism, Female, Male, Hippocampus/metabolism, Caudate Nucleus/metabolism, Dorsolateral Prefrontal Cortex/metabolism, Adult, Transcriptome, Gene Expression Profiling, Sex Factors, Human X Chromosomes/genetics, Prefrontal Cortex/metabolism
2.
Science ; 384(6698): eadh3707, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781393

ABSTRACT

The molecular pathology of stress-related disorders remains elusive. Our brain multiregion, multiomic study of posttraumatic stress disorder (PTSD) and major depressive disorder (MDD) included the central nucleus of the amygdala, hippocampal dentate gyrus, and medial prefrontal cortex (mPFC). Genes and exons within the mPFC carried most disease signals replicated across two independent cohorts. Pathways pointed to immune function, neuronal and synaptic regulation, and stress hormones. Multiomic factor and gene network analyses provided the underlying genomic structure. Single nucleus RNA sequencing in dorsolateral PFC revealed dysregulated (stress-related) signals in neuronal and non-neuronal cell types. Analyses of brain-blood intersections in >50,000 UK Biobank participants were conducted along with fine-mapping of the results of PTSD and MDD genome-wide association studies to distinguish risk from disease processes. Our data suggest shared and distinct molecular pathology in both disorders and propose potential therapeutic targets and biomarkers.


Subject(s)
Brain, Major Depressive Disorder, Genetic Loci, Post-Traumatic Stress Disorders, Female, Humans, Male, Amygdala/metabolism, Biomarkers/metabolism, Brain/metabolism, Major Depressive Disorder/genetics, Gene Regulatory Networks, Genome-Wide Association Study, Neurons/metabolism, Prefrontal Cortex/metabolism, Post-Traumatic Stress Disorders/genetics, Systems Biology, Single-Cell Gene Expression Analysis, Chromosome Mapping
3.
Nat Neurosci ; 27(6): 1064-1074, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38769152

ABSTRACT

Ancestral differences in genomic variation affect the regulation of gene expression; however, most gene expression studies have been limited to European ancestry samples or adjusted to identify ancestry-independent associations. Here, we instead examined the impact of genetic ancestry on gene expression and DNA methylation in the postmortem brain tissue of admixed Black American neurotypical individuals to identify ancestry-dependent and ancestry-independent contributions. Ancestry-associated differentially expressed genes (DEGs), transcripts and gene networks, while notably not implicating neurons, are enriched for genes related to the immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson disease and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for the heritability of diverse immune-related traits but depletion for psychiatric-related traits. We also compared Black and non-Hispanic white Americans, confirming most ancestry-associated DEGs. Our results delineate the extent to which genetic ancestry affects differences in gene expression in the human brain and the implications for brain illness risk.


Subject(s)
Black or African American, Brain, DNA Methylation, Humans, Black or African American/genetics, Brain/metabolism, Female, Male, White People/genetics, Autopsy, Gene Expression/genetics, Alzheimer Disease/genetics, Alzheimer Disease/metabolism, Alzheimer Disease/ethnology, Aged, Middle Aged
4.
bioRxiv ; 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37034760

ABSTRACT

Ancestral differences in genomic variation are determining factors in gene regulation; however, most gene expression studies have been limited to European ancestry samples or adjusted for ancestry to identify ancestry-independent associations. We instead examined the impact of genetic ancestry on gene expression and DNA methylation (DNAm) in admixed African/Black American neurotypical individuals to untangle effects of genetic and environmental factors. Ancestry-associated differentially expressed genes (DEGs), transcripts, and gene networks, while notably not implicating neurons, are enriched for genes related to immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson's disease, and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for heritability of diverse immune-related traits but depletion for psychiatric-related traits. The cell-type enrichments and direction of effects vary by brain region. These DEGs are less evolutionarily constrained and are largely explained by genetic variations; roughly 15% are predicted by DNAm variation implicating environmental exposures. We also compared Black and White Americans, confirming most of these ancestry-associated DEGs. Our results highlight how environment and genetic background affect genetic ancestry differences in gene expression in the human brain and affect risk for brain illness.

5.
PLoS Comput Biol ; 18(6): e1009730, 2022 06.
Article in English | MEDLINE | ID: mdl-35648784

ABSTRACT

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.


Subject(s)
High-Throughput Nucleotide Sequencing, Transcriptome, Algorithms, Animals, Exons, Humans, Mice, DNA Sequence Analysis, RNA Sequence Analysis, Software, Transcriptome/genetics
6.
Bioinformatics ; 37(20): 3650-3651, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33964128

ABSTRACT

SUMMARY: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input. AVAILABILITY AND IMPLEMENTATION: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-32489650

ABSTRACT

GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).
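As a rough illustration of what a GTF record looks like, the sketch below splits one line into the nine tab-separated columns of the format and reads the `key "value";` attribute pairs. It is not taken from GffRead or GffCompare; the function name `parse_gtf_line` and the example record are hypothetical, and the attribute parsing assumes values contain no semicolons.

```python
# Minimal sketch of reading one GTF record (illustrative only; not GffRead code).
# Assumption: attribute values do not themselves contain ';'.

def parse_gtf_line(line):
    """Split one GTF record into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes are written as: key "value"; key "value";
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Hypothetical record, made up for this example.
EXAMPLE = 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";'
record = parse_gtf_line(EXAMPLE)
print(record["feature"], record["start"], record["end"], record["attributes"])
```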


Subject(s)
Computational Biology, Genomics, Software, Genome, Molecular Sequence Annotation
8.
PLoS Genet ; 16(1): e1008571, 2020 01.
Article in English | MEDLINE | ID: mdl-31986137

ABSTRACT

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety.


Subject(s)
Disease Resistance, NLR Proteins/genetics, Oryza/genetics, Plant Proteins/genetics, Molecular Sequence Annotation, NLR Proteins/chemistry, NLR Proteins/metabolism, Nanopore Sequencing/methods, Oryza/immunology, Plant Proteins/chemistry, Plant Proteins/metabolism, Whole Genome Sequencing/methods, Zinc Fingers
9.
Genome Biol ; 20(1): 278, 2019 12 16.
Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


Subject(s)
Genetic Techniques, Genomics/methods, Transcriptome, Animals, Arabidopsis, Humans, RNA Sequence Analysis, Software, Zea mays
10.
Article in English | MEDLINE | ID: mdl-30373801

ABSTRACT

Standard antimicrobial susceptibility testing (AST) approaches lead to delays in the selection of optimal antimicrobial therapy. Here, we sought to determine the accuracy of antimicrobial resistance (AMR) determinants identified by Nanopore whole-genome sequencing in predicting AST results. Using a cohort of 40 clinical isolates (21 carbapenemase-producing carbapenem-resistant Klebsiella pneumoniae, 10 non-carbapenemase-producing carbapenem-resistant K. pneumoniae, and 9 carbapenem-susceptible K. pneumoniae isolates), three separate sequencing and analysis pipelines were performed, as follows: (i) a real-time Nanopore analysis approach identifying acquired AMR genes, (ii) an assembly-based Nanopore approach identifying acquired AMR genes and chromosomal mutations, and (iii) an approach using short-read correction of Nanopore assemblies. The short-read correction of Nanopore assemblies served as the reference standard to determine the accuracy of Nanopore sequencing results. With the real-time analysis approach, full annotation of acquired AMR genes occurred within 8 h from subcultured isolates. Assemblies sufficient for full resistance gene and single-nucleotide polymorphism annotation were available within 14 h from subcultured isolates. The overall agreement of genotypic results and anticipated AST results for the 40 K. pneumoniae isolates was 77% (range, 30% to 100%) and 92% (range, 80% to 100%) for the real-time approach and the assembly approach, respectively. Evaluating the patients contributing the 40 isolates, the real-time approach and assembly approach could shorten the median time to effective antibiotic therapy by 20 h and 26 h, respectively, compared to standard AST. Nanopore sequencing offers a rapid approach to both accurately identify resistance mechanisms and to predict AST results for K. pneumoniae isolates. 
Bioinformatics improvements enabling real-time alignment, coupled with rapid extraction and library preparation, will further enhance the accuracy and workflow of the Nanopore real-time approach.


Subject(s)
Bacterial Proteins/genetics, Multiple Bacterial Drug Resistance/genetics, Bacterial Genome, Klebsiella pneumoniae/genetics, Phenotype, Whole Genome Sequencing/methods, beta-Lactamases/genetics, Anti-Bacterial Agents/metabolism, Anti-Bacterial Agents/pharmacology, Bacterial Proteins/metabolism, Carbapenems/metabolism, Carbapenems/pharmacology, Cohort Studies, Computational Biology/methods, Gene Expression, Gene Library, Humans, Klebsiella Infections/drug therapy, Klebsiella Infections/microbiology, Klebsiella pneumoniae/drug effects, Klebsiella pneumoniae/enzymology, Klebsiella pneumoniae/isolation & purification, Microbial Sensitivity Tests, Single Nucleotide Polymorphism, Whole Genome Sequencing/instrumentation, beta-Lactamases/metabolism
11.
Genome Biol ; 19(1): 208, 2018 11 28.
Article in English | MEDLINE | ID: mdl-30486838

ABSTRACT

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess .


Subject(s)
Genetic Databases, RNA Sequence Analysis, Genetic Transcription, Amino Acid Sequence, Animals, Female, Humans, Introns, Male
12.
Nature ; 551(7681): 498-502, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29143815

ABSTRACT

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.


Subject(s)
Plant Genome, Phylogeny, Poaceae/genetics, Triticum/genetics, Chromosome Mapping, Diploidy, Molecular Evolution, Gene Duplication, Plant Genes/genetics, Genomics/standards, Poaceae/classification, Genetic Recombination/genetics, DNA Sequence Analysis/standards, Triticum/classification
13.
G3 (Bethesda) ; 7(11): 3831-3836, 2017 11 06.
Article in English | MEDLINE | ID: mdl-28963165

ABSTRACT

Here we describe the sequencing and assembly of the pathogenic fungus Lomentospora prolificans using a combination of short, highly accurate Illumina reads and additional coverage in very long Oxford Nanopore reads. The resulting assembly is highly contiguous, containing a total of 37,627,092 bp with over 98% of the sequence in just 26 scaffolds. Annotation identified 8896 protein-coding genes. Pulsed-field gel analysis suggests that this organism contains at least 7 and possibly 11 chromosomes, the two longest of which have sizes corresponding closely to the sizes of the longest scaffolds, at 6.6 and 5.7 Mb.


Subject(s)
Fungal Genome, Molecular Sequence Annotation, Scedosporium/genetics, Fungal Proteins/genetics, Whole Genome Sequencing
14.
G3 (Bethesda) ; 7(9): 3157-3167, 2017 09 07.
Article in English | MEDLINE | ID: mdl-28751502

ABSTRACT

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.
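The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer account for at least half of the total assembly size. A minimal sketch of that calculation is shown here; the function name `n50` and the toy lengths are assumptions for illustration and are not data from the Douglas-fir assembly.

```python
# Minimal sketch of the standard N50 statistic (illustrative; toy data only).

def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the assembly size is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, made up for this example.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

With these toy lengths the total is 20 bp, and the two longest contigs (6 + 5 = 11 bp) already cover half of it, so the function reports 5.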


Subject(s)
Plant Genome, Photosynthesis/genetics, Pinaceae/genetics, Pinaceae/metabolism, Pseudotsuga/genetics, Pseudotsuga/metabolism, Whole Genome Sequencing, Biological Adaptation/genetics, Computational Biology, Molecular Evolution, Gene Duplication, Gene Regulatory Networks, Genomics, Molecular Sequence Annotation, Multigene Family, Phylogeny, Pinaceae/classification, Proteomics/methods, Pseudotsuga/classification, Repetitive Nucleic Acid Sequences
15.
Nat Protoc ; 11(9): 1650-67, 2016 09.
Article in English | MEDLINE | ID: mdl-27560171

ABSTRACT

High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.


Subject(s)
Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Software, Statistics as Topic/methods, Molecular Sequence Annotation, Messenger RNA/genetics, Messenger RNA/metabolism, User-Computer Interface
17.
Nat Biotechnol ; 33(3): 290-5, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25690850

ABSTRACT

Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
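The two percentage figures in this abstract can be checked from the transcript counts it reports. The short sketch below does that arithmetic; the helper name `percent_increase` is an assumption introduced for this example, and the counts are the ones quoted above.

```python
# Arithmetic check of the relative increases quoted in the abstract.
# The helper name is hypothetical; the counts come from the abstract itself.

def percent_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Human blood data: 10,990 (StringTie) vs 7,187 (Cufflinks) transcripts.
real_data = percent_increase(10_990, 7_187)
# Simulated data: 7,559 (StringTie) vs 6,310 (Cufflinks) transcripts.
simulated = percent_increase(7_559, 6_310)

print(f"real data: {real_data:.1f}%  simulated: {simulated:.1f}%")
```

Rounded to whole percentages, the two values agree with the 53% and 20% stated in the abstract.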


Subject(s)
RNA Sequence Analysis/methods, Software, Transcriptome/genetics, Algorithms, HEK293 Cells, Humans, Messenger RNA/genetics, Messenger RNA/metabolism
18.
Genome Biol ; 14(4): R36, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23618408

ABSTRACT

TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.


Subject(s)
Gene Duplication, Gene Fusion, Insertional Mutagenesis, Sequence Alignment/methods, Software, Humans, Sensitivity and Specificity, RNA Sequence Analysis/methods, Transcriptome
19.
Nat Protoc ; 7(3): 562-78, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22383036

ABSTRACT

Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.


Subject(s)
Complementary DNA/genetics, Gene Expression Profiling/methods, Genetic Association Studies/methods, Genomics/methods, DNA Sequence Analysis/methods, Software
20.
BMC Bioinformatics ; 12: 274, 2011 Jul 04.
Article in English | MEDLINE | ID: mdl-21726447

ABSTRACT

BACKGROUND: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. RESULTS: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. CONCLUSIONS: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.


Subject(s)
Phylogeny, Primates/genetics, Software, Animals, Biological Evolution, Genome, Human Genome, Humans