Results 1 - 19 of 19
1.
BMC Genet ; 21(1): 25, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-32138667

ABSTRACT

BACKGROUND: POLG, located on nuclear chromosome 15, encodes the DNA polymerase γ (Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol γ is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF. RESULTS: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y, that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed ClinVar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z. CONCLUSIONS: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analyses show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding.
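The idea of an open reading frame in an alternative frame that starts at a near-cognate CUG codon can be illustrated with a small scan. This is only a sketch, not the PhyloCSF or synplot2 analysis the abstract describes: the function name `orfs_in_frame`, the `min_codons` threshold, and the toy sequence are hypothetical, and a real analysis would work on aligned transcript sequences.

```python
# Toy scan for ORFs in a chosen reading frame, allowing CTG (CUG) as a start.
# Hypothetical helper for illustration; not the method used in the paper.
STOPS = {"TAA", "TAG", "TGA"}

def orfs_in_frame(dna, frame, starts=("ATG", "CTG"), min_codons=10):
    """Return (start, end) pairs for ORFs in the given frame (0, 1 or 2).

    start is the index of the first base of the start codon; end is the
    index just past the stop codon. An ORF is kept only if it has at least
    min_codons codons, counting the start codon but not the stop.
    """
    seq = dna.upper().replace("U", "T")
    found = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in starts:
            start = i                      # open a candidate ORF
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                found.append((start, i + 3))
            start = None                   # close it and keep scanning
    return found

# Toy sequence: a CTG-initiated ORF sits in frame 1, none in frame 0.
toy = "A" + "CTG" + "GCC" * 3 + "TAA" + "GG"
frame1 = orfs_in_frame(toy, frame=1, min_codons=3)
frame0 = orfs_in_frame(toy, frame=0, min_codons=3)
```

Scanning each frame separately makes the overlap explicit: the same nucleotides are read as different codons depending on the frame offset.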

2.
Genome Res ; 29(12): 2073-2087, 2019 12.
Article in English | MEDLINE | ID: mdl-31537640

ABSTRACT

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.

3.
Mol Biol Evol ; 36(10): 2328-2339, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31220870

ABSTRACT

Because of the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being "synonymous," these codons are not equally used. Selective pressures are thought to drive the choice among synonymous codons within a genome, while GC content, which is typically attributed to mutational drift, is the major determinant of variation across species. Here, we find that in addition to GC content, interspecies codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding and show that domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain of life with high accuracy. We then wondered whether the inclusion of codon autocorrelation patterns, which reflect the nonrandom distribution of codon occurrences throughout a transcript, might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific, and surprisingly, are unrelated to tRNA reusage, in contrast to previous reports. Instead, our results suggest that codon autocorrelation patterns are a by-product of codon optimality throughout a sequence, where highly expressed genes display autocorrelated "optimal" codons, whereas lowly expressed genes display autocorrelated "nonoptimal" codons.


Subjects
Archaea/genetics; Bacteria/genetics; Eukaryota/genetics; Arginine/genetics; Base Composition; Humans; Molecular Sequence Annotation; RNA, Transfer/metabolism
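To make the notion of arginine codon usage concrete, the relative frequency of the six arginine codons in a coding sequence can be computed as below. This is a minimal sketch of the kind of feature the abstract refers to, not the authors' classifier; the function name `arginine_codon_usage` and the toy sequence are hypothetical.

```python
from collections import Counter

# The six arginine codons of the standard genetic code (RNA alphabet).
ARG_CODONS = ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG")

def arginine_codon_usage(cds):
    """Return the fraction of arginine codons contributed by each synonym.

    cds is an in-frame coding sequence (DNA or RNA). Trailing bases that
    do not form a complete codon are ignored.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    codons = (rna[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in ARG_CODONS}
    return {c: counts[c] / total for c in ARG_CODONS}

# Toy CDS: Met, Arg (CGT), Arg (CGT), Arg (AGA), stop.
usage = arginine_codon_usage("ATGCGTCGTAGATAA")
```

A vector like `usage`, computed per sequence, is one plausible input for a downstream classifier; how the published method builds and weights its features is not specified in the abstract.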
4.
Development ; 146(6), 2019 03 28.
Article in English | MEDLINE | ID: mdl-30923056

ABSTRACT

Cell type specification during early nervous system development in Drosophila melanogaster requires precise regulation of gene expression in time and space. Resolving the programs driving neurogenesis has been a major challenge owing to the complexity and rapidity with which distinct cell populations arise. To resolve the cell type-specific gene expression dynamics in early nervous system development, we have sequenced the transcriptomes of purified neurogenic cell types across consecutive time points covering crucial events in neurogenesis. The resulting gene expression atlas comprises a detailed resource of global transcriptome dynamics that permits systematic analysis of how cells in the nervous system acquire distinct fates. We resolve known gene expression dynamics and uncover novel expression signatures for hundreds of genes among diverse neurogenic cell types, most of which remain unstudied. We also identified a set of conserved long noncoding RNAs (lncRNAs) that are regulated in a tissue-specific manner and exhibit spatiotemporal expression during neurogenesis with exquisite specificity. lncRNA expression is highly dynamic and demarcates specific subpopulations within neurogenic cell types. Our spatiotemporal transcriptome atlas provides a comprehensive resource for investigating the function of coding genes and noncoding RNAs during crucial stages of early neurogenesis.


Subjects
Drosophila melanogaster/genetics; Gene Expression Regulation, Developmental; Nervous System/embryology; Neurogenesis/genetics; RNA, Long Noncoding/genetics; Animals; Cell Lineage; Drosophila melanogaster/metabolism; Flow Cytometry; Gene Expression Profiling; Gene Regulatory Networks; In Situ Hybridization, Fluorescence; Neuroglia/physiology; Phylogeny; Transcriptome
5.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

7.
Nucleic Acids Res ; 46(14): 7070-7084, 2018 08 21.
Article in English | MEDLINE | ID: mdl-29982784

ABSTRACT

Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.


Subjects
Genes; Antibodies; DNA Copy Number Variations; Genetic Variation; Genome, Human; Humans; Molecular Sequence Annotation; Proteins/genetics; Proteins/immunology; Proteins/metabolism; Pseudogenes
8.
J Biol Chem ; 293(12): 4434-4444, 2018 03 23.
Article in English | MEDLINE | ID: mdl-29386352

ABSTRACT

Although stop codon readthrough is used extensively by viruses to expand their gene expression, verified instances of mammalian readthrough have only recently been uncovered by systems biology and comparative genomics approaches. Previously, our analysis of conserved protein coding signatures that extend beyond annotated stop codons predicted stop codon readthrough of several mammalian genes, all of which have been validated experimentally. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Extending on the identification of this readthrough motif, we here investigated stop codon readthrough, using tissue culture reporter assays, for all previously untested human genes containing UGA_CUAG. The readthrough efficiency of the annotated stop codon for the sequence encoding vitamin D receptor (VDR) was 6.7%. It was the highest of those tested but all showed notable levels of readthrough. The VDR is a member of the nuclear receptor superfamily of ligand-inducible transcription factors, and it binds its major ligand, calcitriol, via its C-terminal ligand-binding domain. Readthrough of the annotated VDR mRNA results in a 67 amino acid-long C-terminal extension that generates a VDR proteoform named VDRx. VDRx may form homodimers and heterodimers with VDR but, compared with VDR, VDRx displayed a reduced transcriptional response to calcitriol even in the presence of its partner retinoid X receptor.


Subjects
Calcitriol/pharmacology; Calcium Channel Agonists/pharmacology; Codon, Terminator; Gene Expression Regulation/drug effects; Protein Biosynthesis; RNA, Messenger/metabolism; Receptors, Calcitriol/genetics; HEK293 Cells; HeLa Cells; Humans; Open Reading Frames; RNA, Messenger/genetics; Receptors, Calcitriol/biosynthesis
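The UGA_CUAG readthrough motif described in the abstract above lends itself to a simple sequence check. The sketch below, under stated assumptions, finds the first in-frame stop codon of an mRNA, tests whether it is UGA immediately followed by CUAG, and counts the sense codons up to the next in-frame stop. The function name `readthrough_extension`, its return format, and the toy sequence are hypothetical and are not taken from the paper.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def readthrough_extension(mrna, cds_start):
    """Inspect the first in-frame stop codon downstream of cds_start.

    Returns None if no in-frame stop is found. Otherwise returns a dict with
    the stop position, whether the stop is UGA followed by CUAG, and the
    number of sense codons between that stop and the next in-frame stop
    (None if the sequence ends before a second stop).
    """
    rna = mrna.upper().replace("T", "U")
    stop_pos = None
    for i in range(cds_start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            stop_pos = i
            break
    if stop_pos is None:
        return None
    has_motif = rna[stop_pos:stop_pos + 7] == "UGACUAG"
    extension = None
    sense = 0
    for j in range(stop_pos + 3, len(rna) - 2, 3):
        if rna[j:j + 3] in STOP_CODONS:
            extension = sense          # second stop reached
            break
        sense += 1
    return {"stop_pos": stop_pos, "has_motif": has_motif,
            "extension_codons": extension}

# Toy mRNA: AUG GCC UGA CUA GGC AAA UAA GG
hit = readthrough_extension("AUGGCCUGACUAGGCAAAUAAGG", cds_start=0)
miss = readthrough_extension("AUGGCCUAAGGG", cds_start=0)
```

A motif match only flags a candidate; the readthrough efficiencies quoted in the abstract came from reporter assays, which a sequence scan cannot replace.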
9.
Mol Biol Evol ; 33(12): 3108-3132, 2016 12.
Article in English | MEDLINE | ID: mdl-27604222

ABSTRACT

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.


Subjects
Anopheles/genetics; Anopheles/metabolism; Codon, Terminator; Peptide Chain Termination, Translational; Animals; Biological Evolution; Codon; Drosophila melanogaster; Evolution, Molecular; Genomics; Open Reading Frames; Phylogeny; Protein Biosynthesis; Ribosomes/genetics; Ribosomes/metabolism
10.
Anal Chem ; 88(7): 3967-75, 2016 Apr 05.
Article in English | MEDLINE | ID: mdl-27010111

ABSTRACT

Computational, genomic, and proteomic approaches have been used to discover nonannotated protein-coding small open reading frames (smORFs). Some novel smORFs have crucial biological roles in cells and organisms, which motivates the search for additional smORFs. Proteomic smORF discovery methods are advantageous because they detect smORF-encoded polypeptides (SEPs) to validate smORF translation and SEP stability. Because SEPs are shorter and less abundant than average proteins, SEP detection using proteomics faces unique challenges. Here, we optimize several steps in the SEP discovery workflow to improve SEP isolation and identification. These changes have led to the detection of several new human SEPs (novel human genes), improved confidence in the SEP assignments, and enabled quantification of SEPs under different cellular conditions. These improvements will allow faster detection and characterization of new SEPs and smORFs.


Subjects
Open Reading Frames/genetics; Peptides/analysis; Peptides/genetics; HEK293 Cells; Humans; K562 Cells; Peptides/isolation & purification; Tumor Cells, Cultured
11.
Genome Biol ; 16: 38, 2015 Feb 17.
Article in English | MEDLINE | ID: mdl-25853568

ABSTRACT

BACKGROUND: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. RESULTS: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. CONCLUSIONS: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.


Subjects
Evolution, Molecular; Genome, Viral; Open Reading Frames/genetics; Viruses/genetics; Codon/genetics; Conserved Sequence; Ebolavirus/genetics; Hepatitis B virus/genetics; Humans; Lassa virus/genetics; MicroRNAs/genetics; Phylogeny; Poliovirus/genetics; Sequence Alignment; Silent Mutation/genetics; West Nile virus/genetics
12.
Science ; 347(6217): 1258522, 2015 Jan 02.
Article in English | MEDLINE | ID: mdl-25554792

ABSTRACT

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.


Subjects
Anopheles/genetics; Evolution, Molecular; Genome, Insect; Insect Vectors/genetics; Malaria/transmission; Animals; Anopheles/classification; Base Sequence; Chromosomes, Insect/genetics; Drosophila/genetics; Humans; Insect Vectors/classification; Molecular Sequence Data; Phylogeny; Sequence Alignment
13.
Nucleic Acids Res ; 42(14): 8928-38, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25013167

ABSTRACT

Stop codon readthrough is used extensively by viruses to expand their gene expression. Until recent discoveries in Drosophila, only a very limited number of readthrough cases in chromosomal genes had been reported. Analysis of conserved protein coding signatures that extend beyond annotated stop codons identified potential stop codon readthrough of four mammalian genes. Here we use a modified targeted bioinformatic approach to identify a further three mammalian readthrough candidates. All seven genes were tested experimentally using reporter constructs transfected into HEK-293T cells. Four displayed efficient stop codon readthrough, and these have UGA immediately followed by CUAG. Comparative genomic analysis revealed that in the four readthrough candidates containing UGA-CUAG, this motif is conserved not only in mammals but throughout vertebrates with the first six of the seven nucleotides being universally conserved. The importance of the CUAG motif was confirmed using a systematic mutagenesis approach. One gene, OPRL1, encoding an opiate receptor, displayed extremely efficient levels of readthrough (∼31%) in HEK-293T cells. Signals both 5' and 3' of the OPRL1 stop codon contribute to this high level of readthrough. The sequence UGA-CUA alone can support 1.5% readthrough, underlining its importance.


Subjects
Codon, Terminator; Protein Biosynthesis; Aminoglycosides/pharmacology; Animals; Anti-Bacterial Agents/pharmacology; Aquaporin 4/genetics; Conserved Sequence; HEK293 Cells; Humans; Mitogen-Activated Protein Kinase 10/genetics; Nucleotide Motifs; Phylogeny; Protein Biosynthesis/drug effects; Receptors, Opioid/genetics; Receptors, Opioid, kappa/genetics
14.
J Proteome Res ; 13(3): 1757-65, 2014 Mar 07.
Article in English | MEDLINE | ID: mdl-24490786

ABSTRACT

The existence of nonannotated protein-coding human short open reading frames (sORFs) has been revealed through the direct detection of their sORF-encoded polypeptide (SEP) products. The discovery of novel SEPs increases the size of the genome and the proteome and provides insights into the molecular biology of mammalian cells, such as the prevalent usage of non-AUG start codons. Through modifications of the existing SEP-discovery workflow, we discover an additional 195 SEPs in K562 cells and extend this methodology to identify novel human SEPs in additional cell lines and human tissue for a final tally of 237 new SEPs. These results continue to expand the human genome and proteome and demonstrate that SEPs are a ubiquitous class of nonannotated polypeptides that require further investigation.


Subjects
Breast Neoplasms/chemistry; Genome, Human; Open Reading Frames; Peptides/analysis; Proteome/analysis; Breast Neoplasms/genetics; Cell Line; Chromatography, Liquid; Codon, Initiator/chemistry; Codon, Initiator/genetics; Female; Humans; K562 Cells; Peptides/chemistry; Protein Biosynthesis; Proteome/chemistry; Tandem Mass Spectrometry
15.
PLoS One ; 8(3): e59450, 2013.
Article in English | MEDLINE | ID: mdl-23544069

ABSTRACT

Recent analysis of genomic signatures in mammals, flies, and worms indicates that functional translational stop codon readthrough is considerably more abundant in metazoa than previously recognized, but this analysis provides only limited clues about the function or mechanism of readthrough. If an mRNA known to be read through in one species is also read through in another, perhaps these questions can be studied in a simpler setting. With this end in mind, we have investigated whether some of the readthrough genes in human, fly, and worm also exhibit readthrough when expressed in S. cerevisiae. We found that readthrough was highest in a gene with a post-stop hexamer known to trigger readthrough, while other metazoan readthrough genes exhibit borderline readthrough in S. cerevisiae.


Subjects
Codon, Terminator/genetics; Open Reading Frames/genetics; Saccharomyces cerevisiae/genetics; Animals; Humans; Nucleic Acid Conformation; RNA, Fungal/chemistry; RNA, Fungal/genetics
16.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subjects
Evolution, Molecular; Genome, Human/genetics; Genome/genetics; Mammals/genetics; Animals; Disease; Exons/genetics; Genomics; Health; Humans; Molecular Sequence Annotation; Phylogeny; RNA/classification; RNA/genetics; Selection, Genetic/genetics; Sequence Alignment; Sequence Analysis, DNA
17.
Genome Res ; 21(12): 2096-113, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21994247

ABSTRACT

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence. We report an expanded set of 283 readthrough candidates, including 16 double-readthrough candidates; these were manually curated to rule out alternatives such as A-to-I editing, alternative splicing, dicistronic translation, and selenocysteine incorporation. We report experimental evidence of translation using GFP tagging and mass spectrometry for several readthrough regions. We find that the set of readthrough candidates differs from other genes in length, composition, conservation, stop codon context, and in some cases, conserved stem-loops, providing clues about readthrough regulation and potential mechanisms. Lastly, we expand our studies beyond Drosophila and find evidence of abundant readthrough in several other insect species and one crustacean, and several readthrough candidates in nematode and human, suggesting that functionally important translational stop codon readthrough is significantly more prevalent in Metazoa than previously recognized.


Subjects
Codon, Terminator/physiology; Genes, Insect/physiology; Open Reading Frames/physiology; Protein Biosynthesis/physiology; Animals; Drosophila Proteins/biosynthesis; Drosophila Proteins/genetics; Drosophila melanogaster; Humans
18.
Bioinformatics ; 27(13): i275-82, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685081

ABSTRACT

MOTIVATION: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. RESULTS: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. AVAILABILITY AND IMPLEMENTATION: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF. CONTACT: mlin@mit.edu; manoli@mit.edu.


Subjects
Drosophila melanogaster/genetics; Genomics/methods; Open Reading Frames; Sequence Alignment/methods; Animals; Base Sequence; Drosophila/classification; Drosophila/genetics; Gene Expression Profiling; Mammals/genetics; Schizosaccharomyces/genetics
19.
Science ; 330(6012): 1787-97, 2010 Dec 24.
Article in English | MEDLINE | ID: mdl-21177974

ABSTRACT

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.


Subjects
Chromatin; Drosophila melanogaster/genetics; Gene Regulatory Networks; Genome, Insect; Molecular Sequence Annotation; Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; Computational Biology/methods; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/growth & development; Drosophila melanogaster/metabolism; Epigenesis, Genetic; Gene Expression Regulation; Genes, Insect; Genomics/methods; Histones/metabolism; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic; RNA, Small Untranslated/genetics; RNA, Small Untranslated/metabolism; Transcription Factors/metabolism; Transcription, Genetic