2.
Genome Res ; 27(3): 491-499, 2017 03.
Article in English | MEDLINE | ID: mdl-28100584

ABSTRACT

Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
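The network idea in this abstract can be illustrated with a minimal sketch. It is not the UMI-tools implementation: it assumes that all UMIs come from reads at a single mapping position, that UMIs have equal length, and that any two UMIs differing by at most one base belong to the same original molecule. The function names and the example reads are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umis, max_dist=1):
    """Group UMIs seen at one mapping position into connected components.

    Assumption (illustrative only): UMIs within `max_dist` mismatches of
    each other are sequencing-error copies of the same molecule.
    """
    counts = Counter(umis)
    nodes = list(counts)
    # Undirected graph: an edge joins two UMIs that differ by <= max_dist bases.
    neighbours = {u: set() for u in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) <= max_dist:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Depth-first traversal, starting from the most abundant UMIs.
    seen, clusters = set(), []
    for start in sorted(nodes, key=lambda u: (-counts[u], u)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical reads: two abundant UMIs, each with a one-mismatch neighbour.
reads = ["ATTG"] * 10 + ["ATTC"] * 1 + ["GGCA"] * 7 + ["GGCT"] * 2
clusters = cluster_umis(reads)
print(len(set(reads)), "distinct UMIs ->", len(clusters), "estimated molecules")
```

Counting each distinct UMI as a molecule would report four molecules here, whereas grouping by connected component reports two. Treating every component as one molecule is the simplest possible rule and can under-count when true UMIs happen to be neighbours.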


Subjects
Sequence Analysis, DNA/standards; Software; Humans; Sequence Analysis, DNA/methods
3.
Nat Rev Genet ; 15(2): 121-32, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24434847

ABSTRACT

Sequencing technologies have placed a wide range of genomic analyses within the capabilities of many laboratories. However, sequencing costs often set limits to the amount of sequences that can be generated and, consequently, the biological outcomes that can be achieved from an experimental design. In this Review, we discuss the issue of sequencing depth in the design of next-generation sequencing experiments. We review current guidelines and precedents on the issue of coverage, as well as their underlying considerations, for four major study designs, which include de novo genome sequencing, genome resequencing, transcriptome sequencing and genomic location analyses (for example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) and chromosome conformation capture (3C)).
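As a rough illustration of how sequencing depth relates to the number of reads, the sketch below uses the textbook approximation that mean coverage equals reads times read length divided by genome size, with per-base depth treated as Poisson distributed. This approximation is an assumption added here, not a guideline taken from the Review, and the function names and example figures are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected depth per base: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_covered(coverage: float, min_depth: int = 1) -> float:
    """P(depth >= min_depth) if per-base depth is Poisson(coverage).

    Assumption: reads are placed uniformly at random, which real
    libraries only approximate (GC bias, repeats, mappability).
    """
    p_below = sum(
        math.exp(-coverage) * coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


# Hypothetical design: 3 Gb genome, 150 bp reads, 30x target depth.
n = reads_needed(30, 150, 3_000_000_000)
print(n, "reads give", mean_coverage(n, 150, 3_000_000_000), "x mean coverage")
print("fraction of bases with depth >= 10:", fraction_covered(30, min_depth=10))
```

The Poisson model tends to be optimistic, so designs based on it usually need a safety margin.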


Subjects
Chromatin Immunoprecipitation/methods; Gene Expression Profiling/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Animals; Guidelines as Topic; Humans
4.
Hum Mol Genet ; 26(3): 552-566, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28096185

ABSTRACT

While induced pluripotent stem cell (iPSC) technologies enable the study of inaccessible patient cell types, cellular heterogeneity can confound the comparison of gene expression profiles between iPSC-derived cell lines. Here, we purified iPSC-derived human dopaminergic neurons (DaNs) using the intracellular marker, tyrosine hydroxylase. Once purified, the transcriptomic profiles of iPSC-derived DaNs appear remarkably similar to profiles obtained from mature post-mortem DaNs. Comparison of the profiles of purified iPSC-derived DaNs derived from Parkinson's disease (PD) patients carrying LRRK2 G2019S variants to controls identified significant functional convergence amongst differentially-expressed (DE) genes. The PD LRRK2-G2019S associated profile was positively matched with expression changes induced by the Parkinsonian neurotoxin rotenone and opposed by those induced by clioquinol, a compound with demonstrated therapeutic efficacy in multiple PD models. No functional convergence amongst DE genes was observed following a similar comparison using non-purified iPSC-derived DaN-containing populations, with cellular heterogeneity appearing a greater confound than genotypic background.


Subjects
Induced Pluripotent Stem Cells/drug effects; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics; Parkinson Disease/drug therapy; Transcriptome/genetics; Autopsy; Cells, Cultured; Clioquinol/administration & dosage; Dopamine/genetics; Dopaminergic Neurons/drug effects; Dopaminergic Neurons/metabolism; Dopaminergic Neurons/pathology; Gene Expression Profiling/methods; Gene Expression Regulation/drug effects; Humans; Induced Pluripotent Stem Cells/metabolism; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/biosynthesis; Mutation; Parkinson Disease/genetics; Parkinson Disease/pathology; Rotenone/metabolism; Rotenone/toxicity; Transcriptome/drug effects
5.
Blood ; 128(7): e10-9, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27381906

ABSTRACT

Long noncoding RNAs (lncRNAs) are potentially important regulators of cell differentiation and development, but little is known about their roles in B lymphocytes. Using RNA-seq and de novo transcript assembly, we identified 4516 lncRNAs expressed in 11 stages of B-cell development and activation. Most of these lncRNAs have not been previously detected, even in the closely related T-cell lineage. Comparison with lncRNAs previously described in human B cells identified 185 mouse lncRNAs that have human orthologs. Using chromatin immunoprecipitation-seq, we classified 20% of the lncRNAs as either enhancer-associated (eRNA) or promoter-associated RNAs. We identified 126 eRNAs whose expression closely correlated with the nearest coding gene, thereby indicating the likely location of numerous enhancers active in the B-cell lineage. Furthermore, using this catalog of newly discovered lncRNAs, we show that PAX5, a transcription factor required to specify the B-cell lineage, bound to and regulated the expression of 109 lncRNAs in pro-B and mature B cells and 184 lncRNAs in acute lymphoblastic leukemia.


Subjects
B-Lymphocytes/immunology; Lymphocyte Activation/genetics; RNA, Long Noncoding/metabolism; Animals; Cell Transformation, Neoplastic/genetics; Cell Transformation, Neoplastic/pathology; Chromatin/metabolism; Enhancer Elements, Genetic/genetics; Female; Gene Expression Regulation; Genetic Loci; Humans; Mice, Inbred C57BL; Open Reading Frames/genetics; PAX5 Transcription Factor/metabolism; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/pathology; Promoter Regions, Genetic/genetics; RNA, Long Noncoding/genetics
6.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subjects
Evolution, Molecular; Genetic Speciation; Genome/genetics; Gorilla gorilla/genetics; Animals; Female; Gene Expression Regulation; Genetic Variation/genetics; Genomics; Humans; Macaca mulatta/genetics; Molecular Sequence Data; Pan troglodytes/genetics; Phylogeny; Pongo/genetics; Proteins/genetics; Sequence Alignment; Species Specificity; Transcription, Genetic
7.
Genome Res ; 24(12): 1918-31, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25224068

ABSTRACT

Promiscuous gene expression (PGE) by thymic epithelial cells (TEC) is essential for generating a diverse T cell antigen receptor repertoire tolerant to self-antigens, and thus for avoiding autoimmunity. Nevertheless, the extent and nature of this unusual expression program within TEC populations and single cells are unknown. Using deep transcriptome sequencing of carefully identified mouse TEC subpopulations, we discovered a program of PGE that is common between medullary (m) and cortical TEC, further elaborated in mTEC, and completed in mature mTEC expressing the autoimmune regulator gene (Aire). TEC populations are capable of expressing up to 19,293 protein-coding genes, the highest number of genes known to be expressed in any cell type. Remarkably, in mouse mTEC, Aire expression alone positively regulates 3980 tissue-restricted genes. Notably, the tissue specificities of these genes include known targets of autoimmunity in human AIRE deficiency. Led by the observation that genes induced by Aire expression are generally characterized by a repressive chromatin state in somatic tissues, we found these genes to be strongly associated with H3K27me3 marks in mTEC. Our findings are consistent with AIRE targeting and inducing the promiscuous expression of genes previously epigenetically silenced by Polycomb group proteins. Comparison of the transcriptomes of 174 single mTEC indicates that genes induced by Aire expression are transcribed stochastically at low cell frequency. Furthermore, when present, Aire expression-dependent transcript levels were 16-fold higher, on average, in individual TEC than in the mTEC population.


Subjects
Autoantigens/genetics; Epithelial Cells/metabolism; Gene Silencing; Polycomb-Group Proteins/genetics; Thymus Gland/cytology; Thymus Gland/metabolism; Transcription Factors/genetics; Acetylation; Animals; Autoantigens/immunology; Chromatin/genetics; Chromatin/metabolism; Cluster Analysis; Computational Biology; Gene Expression; Gene Expression Profiling; Gene Expression Regulation; Gene Order; Gene Targeting; Genetic Loci; Genetic Vectors/genetics; Genomics/methods; Histones/metabolism; Mice; Mice, Transgenic; Organ Specificity/genetics; Polycomb-Group Proteins/metabolism; Signal Transduction; Single-Cell Analysis; Thymus Gland/immunology; Transcription Factors/metabolism; Transcriptome; AIRE Protein
8.
Nature ; 477(7364): 289-94, 2011 Sep 14.
Article in English | MEDLINE | ID: mdl-21921910

ABSTRACT

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.


Subjects
Gene Expression Regulation/genetics; Genetic Variation/genetics; Genome/genetics; Mice, Inbred Strains/genetics; Mice/genetics; Phenotype; Alleles; Animals; Animals, Laboratory/genetics; Genomics; Mice/classification; Mice, Inbred C57BL/genetics; Phylogeny; Quantitative Trait Loci/genetics
9.
Nature ; 469(7331): 529-33, 2011 Jan 27.
Article in English | MEDLINE | ID: mdl-21270892

ABSTRACT

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.


Subjects
Genetic Variation; Genome/genetics; Pongo abelii/genetics; Pongo pygmaeus/genetics; Animals; Centromere/genetics; Cerebrosides/metabolism; Chromosomes; Evolution, Molecular; Female; Gene Rearrangement/genetics; Genetic Speciation; Genetics, Population; Humans; Male; Phylogeny; Population Density; Population Dynamics; Species Specificity
10.
Nature ; 477(7366): 587-91, 2011 Aug 31.
Article in English | MEDLINE | ID: mdl-21881562

ABSTRACT

The evolution of the amniotic egg was one of the great evolutionary innovations in the history of life, freeing vertebrates from an obligatory connection to water and thus permitting the conquest of terrestrial environments. Among amniotes, genome sequences are available for mammals and birds, but not for non-avian reptiles. Here we report the genome sequence of the North American green anole lizard, Anolis carolinensis. We find that A. carolinensis microchromosomes are highly syntenic with chicken microchromosomes, yet do not exhibit the high GC and low repeat content that are characteristic of avian microchromosomes. Also, A. carolinensis mobile elements are very young and diverse, more so than in any other sequenced amniote genome. The GC content of this lizard genome is also unusual in its homogeneity, unlike the regionally variable GC content found in mammals and birds. We describe and assign sequence to the previously unknown A. carolinensis X chromosome. Comparative gene analysis shows that amniote egg proteins have evolved significantly more rapidly than other proteins. An anole phylogeny resolves basal branches to illuminate the history of their repeated adaptive radiations.


Subjects
Birds/genetics; Evolution, Molecular; Genome/genetics; Lizards/genetics; Mammals/genetics; Animals; Chickens/genetics; GC Rich Sequence/genetics; Genomics; Humans; Molecular Sequence Data; Phylogeny; Synteny/genetics; X Chromosome/genetics
11.
Nature ; 464(7289): 757-62, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20360741

ABSTRACT

The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.


Subjects
Finches/genetics; Genome/genetics; 3' Untranslated Regions/genetics; Animals; Auditory Perception/genetics; Brain/physiology; Chickens/genetics; Evolution, Molecular; Female; Finches/physiology; Gene Duplication; Gene Regulatory Networks/genetics; Male; MicroRNAs/genetics; Models, Animal; Multigene Family/genetics; Retroelements/genetics; Sex Chromosomes/genetics; Terminal Repeat Sequences/genetics; Transcription, Genetic/genetics; Vocalization, Animal/physiology
12.
Nucleic Acids Res ; 42(Database issue): D222-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24288371

ABSTRACT

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.


Subjects
Databases, Protein; Sequence Alignment; Sequence Analysis, Protein; Internet; Intrinsically Disordered Proteins/chemistry; Protein Conformation; Proteins/chemistry; Proteins/classification; Proteins/genetics; Proteome/chemistry; Sequence Analysis, DNA
13.
Bioinformatics ; 30(9): 1290-1, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24395753

ABSTRACT

Computational genomics seeks to draw biological inferences from genomic datasets, often by integrating and contextualizing next-generation sequencing data. CGAT provides an extensive suite of tools designed to assist in the analysis of genome scale data from a range of standard file formats. The toolkit enables filtering, comparison, conversion, summarization and annotation of genomic intervals, gene sets and sequences. The tools can both be run from the Unix command line and installed into visual workflow builders, such as Galaxy.


Subjects
Genomics/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing; Software; Workflow
14.
Bioinformatics ; 29(16): 2046-8, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23782611

ABSTRACT

MOTIVATION: A common question in genomic analysis is whether two sets of genomic intervals overlap significantly. This question arises, for example, when interpreting ChIP-Seq or RNA-Seq data in functional terms. Because genome organization is complex, answering this question is non-trivial. SUMMARY: We present Genomic Association Test (GAT), a tool for estimating the significance of overlap between multiple sets of genomic intervals. GAT implements a null model that the two sets of intervals are placed independently of one another, but allows each set's density to depend on external variables, for example, isochore structure or chromosome identity. GAT estimates statistical significance based on simulation and controls for multiple tests using the false discovery rate. AVAILABILITY: GAT's source code, documentation and tutorials are available at http://code.google.com/p/genomic-association-tester.
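The simulation-based test described in this abstract can be sketched in a much reduced form. The code below is not GAT's implementation: it assumes a single contiguous workspace, places simulated segments uniformly at random (so it ignores isochores, chromosome identity and other density covariates), does not apply any false discovery rate correction, and uses hypothetical function names and coordinates.

```python
import random


def overlap_bp(segments, annotations):
    """Bases of `segments` covered by `annotations` (half-open intervals).

    Assumption: intervals within `annotations` do not overlap one another;
    bases shared by two overlapping segments are counted once per segment.
    """
    total = 0
    for s_start, s_end in segments:
        for a_start, a_end in annotations:
            total += max(0, min(s_end, a_end) - max(s_start, a_start))
    return total


def simulate_overlap_pvalue(segments, annotations, workspace_size,
                            n_sim=1000, seed=0):
    """Empirical one-sided p-value for the observed overlap.

    Null model (illustrative): each segment keeps its length but is
    placed uniformly at random inside [0, workspace_size).
    """
    rng = random.Random(seed)
    observed = overlap_bp(segments, annotations)
    lengths = [end - start for start, end in segments]
    hits = 0
    for _ in range(n_sim):
        simulated = []
        for length in lengths:
            start = rng.randint(0, workspace_size - length)
            simulated.append((start, start + length))
        if overlap_bp(simulated, annotations) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical example: two 100 bp segments lying inside two annotations.
annotations = [(1000, 1200), (5000, 5300)]
segments = [(1050, 1150), (5100, 5200)]
observed, p_value = simulate_overlap_pvalue(segments, annotations, 10_000)
print("observed overlap (bp):", observed, "empirical p-value:", p_value)
```

The smallest p-value this procedure can report is 1 / (n_sim + 1), so the number of simulations sets the resolution of the test.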


Subjects
Genomics/methods; Software; Binding Sites; Chromatin Immunoprecipitation; Computer Simulation; Deoxyribonuclease I; Sequence Analysis, DNA; Transcription Factors/metabolism
15.
BMC Cancer ; 14: 977, 2014 Dec 18.
Article in English | MEDLINE | ID: mdl-25519703

ABSTRACT

BACKGROUND: Although chemotherapy for prostate cancer (PCa) can improve patient survival, some tumours are chemo-resistant. Tumour molecular profiles may help identify the mechanisms of drug action and identify potential prognostic biomarkers. We performed in vivo transcriptome profiling of pre- and post-treatment prostatic biopsies from patients with advanced hormone-naive prostate cancer treated with docetaxel chemotherapy and androgen deprivation therapy (ADT) with an aim to identify the mechanisms of drug action and identify prognostic biomarkers. METHODS: RNA sequencing (RNA-Seq) was performed on biopsies from four patients before and ~22 weeks after docetaxel and ADT initiation. Gene fusion products and differentially-regulated genes between treatment pairs were identified using TopHat and pathway enrichment analyses undertaken. Publicly available datasets were interrogated to perform survival analyses on the gene signatures identified using cBioportal. RESULTS: A number of genomic rearrangements were identified including the TMPRSS2/ERG fusion and 3 novel gene fusions involving the ETS family of transcription factors in patients, both pre and post chemotherapy. In total, gene expression analyses showed differential expression of at least 2 fold in 575 genes in post-chemotherapy biopsies. Of these, pathway analyses identified a panel of 7 genes (ADAM7, FAM72B, BUB1B, CCNB1, CCNB2, TTK, CDK1), including a cell cycle-related geneset, that were differentially-regulated following treatment with docetaxel and ADT. Using cBioportal to interrogate the MSKCC-Prostate Oncogenome Project dataset we observed a statistically-significant reduction in disease-free survival of patients with tumours exhibiting alterations in gene expression of the above panel of 7 genes (p = 0.015).
CONCLUSIONS: Here we report on the first "real-time" in vivo RNA-Seq-based transcriptome analysis of clinical PCa from pre- and post-treatment TRUSS-guided biopsies of patients treated with docetaxel chemotherapy plus ADT. We identify a chemotherapy-driven PCa transcriptome profile which includes the down-regulation of important positive regulators of cell cycle progression. A 7 gene signature biomarker panel has also been identified in high-risk prostate cancer patients to be of prognostic value. Future prospective study is warranted to evaluate the clinical value of this panel.


Subjects
Gene Expression Profiling; Prostatic Neoplasms/genetics; Prostatic Neoplasms/mortality; Transcriptome; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Biopsy; Computational Biology; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Male; Neoplasm Grading; Neoplasm Staging; Prognosis; Prostatic Neoplasms/pathology; Prostatic Neoplasms/therapy
16.
Nature ; 453(7192): 175-83, 2008 May 08.
Article in English | MEDLINE | ID: mdl-18464734

ABSTRACT

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.


Subjects
Evolution, Molecular; Genome/genetics; Platypus/genetics; Animals; Base Composition; Dentition; Female; Genomic Imprinting/genetics; Humans; Immunity/genetics; Male; Mammals/genetics; MicroRNAs/genetics; Milk Proteins/genetics; Phylogeny; Platypus/immunology; Platypus/physiology; Receptors, Odorant/genetics; Repetitive Sequences, Nucleic Acid/genetics; Reptiles/genetics; Sequence Analysis, DNA; Spermatozoa/metabolism; Venoms/genetics; Zona Pellucida/metabolism
17.
Nucleic Acids Res ; 40(Database issue): D290-301, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22127870

ABSTRACT

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.


Subjects
Databases, Protein; Proteins/classification; Encyclopedias as Topic; Internet; Protein Structure, Tertiary; Sequence Homology, Amino Acid
18.
BMC Genomics ; 14: 95, 2013 Feb 12.
Article in English | MEDLINE | ID: mdl-23402223

ABSTRACT

BACKGROUND: A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin's (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2-3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris. RESULTS: 13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin's finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins. CONCLUSIONS: These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin's finches.


Subjects
Evolution, Molecular; Genomics; Passeriformes/genetics; Adaptation, Physiological; Animals; Genetics, Population; Models, Genetic; Passeriformes/physiology; Sequence Homology, Nucleic Acid
19.
Genome Res ; 20(10): 1352-60, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20736230

ABSTRACT

Initially thought to play a restricted role in calcium homeostasis, the pleiotropic actions of vitamin D in biology and their clinical significance are only now becoming apparent. However, the mode of action of vitamin D, through its cognate nuclear vitamin D receptor (VDR), and its contribution to diverse disorders, remain poorly understood. We determined VDR binding throughout the human genome using chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq). After calcitriol stimulation, we identified 2776 genomic positions occupied by the VDR and 229 genes with significant changes in expression in response to vitamin D. VDR binding sites were significantly enriched near autoimmune and cancer associated genes identified from genome-wide association (GWA) studies. Notable genes with VDR binding included IRF8, associated with MS, and PTPN2 associated with Crohn's disease and T1D. Furthermore, a number of single nucleotide polymorphism associations from GWA were located directly within VDR binding intervals, for example, rs13385731 associated with SLE and rs947474 associated with T1D. We also observed significant enrichment of VDR intervals within regions of positive selection among individuals of Asian and European descent. ChIP-seq determination of transcription factor binding, in combination with GWA data, provides a powerful approach to further understanding the molecular bases of complex diseases.


Subjects
Autoimmune Diseases/genetics; Chromatin Immunoprecipitation; Evolution, Molecular; Genome-Wide Association Study; Receptors, Calcitriol/metabolism; Vitamin D/metabolism; Binding Sites; Crohn Disease/genetics; Diabetes Mellitus, Type 1/genetics; Humans; Interferon Regulatory Factors/genetics; Interferon Regulatory Factors/metabolism; Multiple Sclerosis/genetics; Protein Binding; Protein Tyrosine Phosphatase, Non-Receptor Type 2/genetics; Protein Tyrosine Phosphatase, Non-Receptor Type 2/metabolism; Sequence Analysis, DNA/methods
20.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subjects
Evolution, Molecular; Genome/genetics; Genomics; Opossums/genetics; Animals; Base Composition; Conserved Sequence/genetics; DNA Transposable Elements/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Protein Biosynthesis; Synteny/genetics; X Chromosome Inactivation/genetics