2.
Genome Res ; 27(3): 491-499, 2017 03.
Article in English | MEDLINE | ID: mdl-28100584

ABSTRACT

Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
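
The abstract does not spell out the network-based methods, so the following is only a minimal sketch of the general idea, not the UMI-tools implementation. It assumes that UMIs observed at one mapping position are joined by an edge when they differ at no more than one base, and that each connected component is counted as a single original molecule. The function names and the example counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umi_counts: Counter, max_dist: int = 1) -> list:
    """Group UMIs into connected components of the Hamming-distance graph.

    Assumption for illustration: two UMIs are linked when they differ at
    no more than `max_dist` positions; each component is one molecule.
    """
    umis = list(umi_counts)
    neighbours = {u: set() for u in umis}
    for a, b in combinations(umis, 2):
        if hamming(a, b) <= max_dist:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    # Start each traversal from the most abundant unvisited UMI.
    for umi in sorted(umis, key=lambda u: -umi_counts[u]):
        if umi in seen:
            continue
        component, stack = set(), [umi]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical read counts per UMI at a single genomic position.
reads = Counter({"ATTG": 120, "ATTC": 3, "ATGG": 2, "CCGA": 45})
clusters = cluster_umis(reads)
print(len(clusters), "molecules inferred from", len(reads), "distinct UMIs")
```

Counting every distinct UMI would report four molecules here, whereas grouping near-identical UMIs reports two, which is the kind of correction the abstract argues for.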


Subject(s)
Sequence Analysis, DNA/standards; Software; Humans; Sequence Analysis, DNA/methods
3.
Nat Rev Genet ; 15(2): 121-32, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24434847

ABSTRACT

Sequencing technologies have placed a wide range of genomic analyses within the capabilities of many laboratories. However, sequencing costs often set limits to the amount of sequences that can be generated and, consequently, the biological outcomes that can be achieved from an experimental design. In this Review, we discuss the issue of sequencing depth in the design of next-generation sequencing experiments. We review current guidelines and precedents on the issue of coverage, as well as their underlying considerations, for four major study designs, which include de novo genome sequencing, genome resequencing, transcriptome sequencing and genomic location analyses (for example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) and chromosome conformation capture (3C)).
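
As a rough illustration of how depth trades off against cost, the sketch below uses the standard idealisation that mean coverage equals reads times read length divided by genome size, with reads placed uniformly at random (the Lander-Waterman model). This is an assumption for illustration and is not taken from the Review's guidelines; the function names and example values are hypothetical.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G under uniform random read placement."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the fraction of bases with zero reads: exp(-C)."""
    return math.exp(-coverage)


# Hypothetical example: 150 bp reads on a 3 Gb genome at 30x mean depth.
n = reads_for_coverage(30, 150, 3_000_000_000)
print(n, expected_coverage(n, 150, 3_000_000_000))
```

Real libraries deviate from this model because of GC bias, repeats and duplicate reads, so the figures should be read as lower bounds on the sequencing required.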


Subject(s)
Chromatin Immunoprecipitation/methods; Gene Expression Profiling/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Animals; Guidelines as Topic; Humans
4.
Hum Mol Genet ; 26(3): 552-566, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28096185

ABSTRACT

While induced pluripotent stem cell (iPSC) technologies enable the study of inaccessible patient cell types, cellular heterogeneity can confound the comparison of gene expression profiles between iPSC-derived cell lines. Here, we purified iPSC-derived human dopaminergic neurons (DaNs) using the intracellular marker, tyrosine hydroxylase. Once purified, the transcriptomic profiles of iPSC-derived DaNs appear remarkably similar to profiles obtained from mature post-mortem DaNs. Comparison of the profiles of purified iPSC-derived DaNs derived from Parkinson's disease (PD) patients carrying LRRK2 G2019S variants to controls identified significant functional convergence amongst differentially-expressed (DE) genes. The PD LRRK2-G2019S associated profile was positively matched with expression changes induced by the Parkinsonian neurotoxin rotenone and opposed by those induced by clioquinol, a compound with demonstrated therapeutic efficacy in multiple PD models. No functional convergence amongst DE genes was observed following a similar comparison using non-purified iPSC-derived DaN-containing populations, with cellular heterogeneity appearing a greater confound than genotypic background.


Subject(s)
Induced Pluripotent Stem Cells/drug effects; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics; Parkinson Disease/drug therapy; Transcriptome/genetics; Autopsy; Cells, Cultured; Clioquinol/administration & dosage; Dopamine/genetics; Dopaminergic Neurons/drug effects; Dopaminergic Neurons/metabolism; Dopaminergic Neurons/pathology; Gene Expression Profiling/methods; Gene Expression Regulation/drug effects; Humans; Induced Pluripotent Stem Cells/metabolism; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/biosynthesis; Mutation; Parkinson Disease/genetics; Parkinson Disease/pathology; Rotenone/metabolism; Rotenone/toxicity; Transcriptome/drug effects
5.
Blood ; 128(7): e10-9, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27381906

ABSTRACT

Long noncoding RNAs (lncRNAs) are potentially important regulators of cell differentiation and development, but little is known about their roles in B lymphocytes. Using RNA-seq and de novo transcript assembly, we identified 4516 lncRNAs expressed in 11 stages of B-cell development and activation. Most of these lncRNAs have not been previously detected, even in the closely related T-cell lineage. Comparison with lncRNAs previously described in human B cells identified 185 mouse lncRNAs that have human orthologs. Using chromatin immunoprecipitation-seq, we classified 20% of the lncRNAs as either enhancer-associated (eRNA) or promoter-associated RNAs. We identified 126 eRNAs whose expression closely correlated with the nearest coding gene, thereby indicating the likely location of numerous enhancers active in the B-cell lineage. Furthermore, using this catalog of newly discovered lncRNAs, we show that PAX5, a transcription factor required to specify the B-cell lineage, bound to and regulated the expression of 109 lncRNAs in pro-B and mature B cells and 184 lncRNAs in acute lymphoblastic leukemia.


Subject(s)
B-Lymphocytes/immunology; Lymphocyte Activation/genetics; RNA, Long Noncoding/metabolism; Animals; Cell Transformation, Neoplastic/genetics; Cell Transformation, Neoplastic/pathology; Chromatin/metabolism; Enhancer Elements, Genetic/genetics; Female; Gene Expression Regulation; Genetic Loci; Humans; Mice, Inbred C57BL; Open Reading Frames/genetics; PAX5 Transcription Factor/metabolism; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/pathology; Promoter Regions, Genetic/genetics; RNA, Long Noncoding/genetics
6.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subject(s)
Evolution, Molecular; Genetic Speciation; Genome/genetics; Gorilla gorilla/genetics; Animals; Female; Gene Expression Regulation; Genetic Variation/genetics; Genomics; Humans; Macaca mulatta/genetics; Molecular Sequence Data; Pan troglodytes/genetics; Phylogeny; Pongo/genetics; Proteins/genetics; Sequence Alignment; Species Specificity; Transcription, Genetic
7.
Genome Res ; 24(12): 1918-31, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25224068

ABSTRACT

Promiscuous gene expression (PGE) by thymic epithelial cells (TEC) is essential for generating a diverse T cell antigen receptor repertoire tolerant to self-antigens, and thus for avoiding autoimmunity. Nevertheless, the extent and nature of this unusual expression program within TEC populations and single cells are unknown. Using deep transcriptome sequencing of carefully identified mouse TEC subpopulations, we discovered a program of PGE that is common between medullary (m) and cortical TEC, further elaborated in mTEC, and completed in mature mTEC expressing the autoimmune regulator gene (Aire). TEC populations are capable of expressing up to 19,293 protein-coding genes, the highest number of genes known to be expressed in any cell type. Remarkably, in mouse mTEC, Aire expression alone positively regulates 3980 tissue-restricted genes. Notably, the tissue specificities of these genes include known targets of autoimmunity in human AIRE deficiency. Led by the observation that genes induced by Aire expression are generally characterized by a repressive chromatin state in somatic tissues, we found these genes to be strongly associated with H3K27me3 marks in mTEC. Our findings are consistent with AIRE targeting and inducing the promiscuous expression of genes previously epigenetically silenced by Polycomb group proteins. Comparison of the transcriptomes of 174 single mTEC indicates that genes induced by Aire expression are transcribed stochastically at low cell frequency. Furthermore, when present, Aire expression-dependent transcript levels were 16-fold higher, on average, in individual TEC than in the mTEC population.


Subject(s)
Autoantigens/genetics; Epithelial Cells/metabolism; Gene Silencing; Polycomb-Group Proteins/genetics; Thymus Gland/cytology; Thymus Gland/metabolism; Transcription Factors/genetics; Acetylation; Animals; Autoantigens/immunology; Chromatin/genetics; Chromatin/metabolism; Cluster Analysis; Computational Biology; Gene Expression; Gene Expression Profiling; Gene Expression Regulation; Gene Order; Gene Targeting; Genetic Loci; Genetic Vectors/genetics; Genomics/methods; Histones/metabolism; Mice; Mice, Transgenic; Organ Specificity/genetics; Polycomb-Group Proteins/metabolism; Signal Transduction; Single-Cell Analysis; Thymus Gland/immunology; Transcription Factors/metabolism; Transcriptome; AIRE Protein
8.
Nature ; 477(7364): 289-94, 2011 Sep 14.
Article in English | MEDLINE | ID: mdl-21921910

ABSTRACT

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.


Subject(s)
Gene Expression Regulation/genetics; Genetic Variation/genetics; Genome/genetics; Mice, Inbred Strains/genetics; Mice/genetics; Phenotype; Alleles; Animals; Animals, Laboratory/genetics; Genomics; Mice/classification; Mice, Inbred C57BL/genetics; Phylogeny; Quantitative Trait Loci/genetics
9.
Nature ; 477(7366): 587-91, 2011 Aug 31.
Article in English | MEDLINE | ID: mdl-21881562

ABSTRACT

The evolution of the amniotic egg was one of the great evolutionary innovations in the history of life, freeing vertebrates from an obligatory connection to water and thus permitting the conquest of terrestrial environments. Among amniotes, genome sequences are available for mammals and birds, but not for non-avian reptiles. Here we report the genome sequence of the North American green anole lizard, Anolis carolinensis. We find that A. carolinensis microchromosomes are highly syntenic with chicken microchromosomes, yet do not exhibit the high GC and low repeat content that are characteristic of avian microchromosomes. Also, A. carolinensis mobile elements are very young and diverse, more so than in any other sequenced amniote genome. The GC content of this lizard genome is also unusual in its homogeneity, unlike the regionally variable GC content found in mammals and birds. We describe and assign sequence to the previously unknown A. carolinensis X chromosome. Comparative gene analysis shows that amniote egg proteins have evolved significantly more rapidly than other proteins. An anole phylogeny resolves basal branches to illuminate the history of their repeated adaptive radiations.


Subject(s)
Birds/genetics; Evolution, Molecular; Genome/genetics; Lizards/genetics; Mammals/genetics; Animals; Chickens/genetics; GC Rich Sequence/genetics; Genomics; Humans; Molecular Sequence Data; Phylogeny; Synteny/genetics; X Chromosome/genetics
10.
Nature ; 469(7331): 529-33, 2011 Jan 27.
Article in English | MEDLINE | ID: mdl-21270892

ABSTRACT

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.


Subject(s)
Genetic Variation; Genome/genetics; Pongo abelii/genetics; Pongo pygmaeus/genetics; Animals; Centromere/genetics; Cerebrosides/metabolism; Chromosomes; Evolution, Molecular; Female; Gene Rearrangement/genetics; Genetic Speciation; Genetics, Population; Humans; Male; Phylogeny; Population Density; Population Dynamics; Species Specificity
11.
Nature ; 464(7289): 757-62, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20360741

ABSTRACT

The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.


Subject(s)
Finches/genetics; Genome/genetics; 3' Untranslated Regions/genetics; Animals; Auditory Perception/genetics; Brain/physiology; Chickens/genetics; Evolution, Molecular; Female; Finches/physiology; Gene Duplication; Gene Regulatory Networks/genetics; Male; MicroRNAs/genetics; Models, Animal; Multigene Family/genetics; Retroelements/genetics; Sex Chromosomes/genetics; Terminal Repeat Sequences/genetics; Transcription, Genetic/genetics; Vocalization, Animal/physiology
12.
Nucleic Acids Res ; 42(Database issue): D222-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24288371

ABSTRACT

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.


Subject(s)
Databases, Protein; Sequence Alignment; Sequence Analysis, Protein; Internet; Intrinsically Disordered Proteins/chemistry; Protein Conformation; Proteins/chemistry; Proteins/classification; Proteins/genetics; Proteome/chemistry; Sequence Analysis, DNA
13.
Bioinformatics ; 30(9): 1290-1, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24395753

ABSTRACT

Computational genomics seeks to draw biological inferences from genomic datasets, often by integrating and contextualizing next-generation sequencing data. CGAT provides an extensive suite of tools designed to assist in the analysis of genome scale data from a range of standard file formats. The toolkit enables filtering, comparison, conversion, summarization and annotation of genomic intervals, gene sets and sequences. The tools can both be run from the Unix command line and installed into visual workflow builders, such as Galaxy.


Subject(s)
Genomics/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing; Software; Workflow
14.
Bioinformatics ; 29(16): 2046-8, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23782611

ABSTRACT

MOTIVATION: A common question in genomic analysis is whether two sets of genomic intervals overlap significantly. This question arises, for example, when interpreting ChIP-Seq or RNA-Seq data in functional terms. Because genome organization is complex, answering this question is non-trivial. SUMMARY: We present Genomic Association Test (GAT), a tool for estimating the significance of overlap between multiple sets of genomic intervals. GAT implements a null model that the two sets of intervals are placed independently of one another, but allows each set's density to depend on external variables, for example, isochore structure or chromosome identity. GAT estimates statistical significance based on simulation and controls for multiple tests using the false discovery rate. AVAILABILITY: GAT's source code, documentation and tutorials are available at http://code.google.com/p/genomic-association-tester.
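
The simulation idea can be illustrated with a much-simplified sketch. It is not GAT's code or interface: it assumes a single linear workspace, places each segment uniformly at random while preserving its length, ignores the external variables (isochores, chromosome identity) that GAT conditions on, and omits the false discovery rate step. All names and coordinates are hypothetical.

```python
import random


def count_overlapping(segments, annotations):
    """Number of segments that overlap at least one annotation interval."""
    return sum(
        any(s < a_end and a_start < e for a_start, a_end in annotations)
        for s, e in segments
    )


def simulate_overlap(segments, annotations, workspace_size,
                     n_samples=1000, seed=0):
    """Empirical p-value for the observed overlap under random placement."""
    rng = random.Random(seed)
    observed = count_overlapping(segments, annotations)
    as_extreme = 0
    for _ in range(n_samples):
        shuffled = []
        for s, e in segments:
            length = e - s
            # Uniform placement that keeps the segment inside the workspace.
            start = rng.randint(0, workspace_size - length)
            shuffled.append((start, start + length))
        if count_overlapping(shuffled, annotations) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    p_value = (as_extreme + 1) / (n_samples + 1)
    return observed, p_value


# Hypothetical half-open intervals on a 100 kb workspace.
annotations = [(1000, 1100), (5000, 5100)]
segments = [(1010, 1030), (1050, 1070), (5020, 5040)]
observed, p_value = simulate_overlap(segments, annotations, 100_000)
print(observed, p_value)
```

Because all three segments fall inside the small annotated regions, random placements almost never match the observed overlap and the estimated p-value sits at its floor of 1 / (n_samples + 1).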


Subject(s)
Genomics/methods; Software; Binding Sites; Chromatin Immunoprecipitation; Computer Simulation; Deoxyribonuclease I; Sequence Analysis, DNA; Transcription Factors/metabolism
15.
BMC Cancer ; 14: 977, 2014 Dec 18.
Article in English | MEDLINE | ID: mdl-25519703

ABSTRACT

BACKGROUND: Although chemotherapy for prostate cancer (PCa) can improve patient survival, some tumours are chemo-resistant. Tumour molecular profiles may help identify the mechanisms of drug action and identify potential prognostic biomarkers. We performed in vivo transcriptome profiling of pre- and post-treatment prostatic biopsies from patients with advanced hormone-naive prostate cancer treated with docetaxel chemotherapy and androgen deprivation therapy (ADT) with an aim to identify the mechanisms of drug action and identify prognostic biomarkers. METHODS: RNA sequencing (RNA-Seq) was performed on biopsies from four patients before and ~22 weeks after docetaxel and ADT initiation. Gene fusion products and differentially-regulated genes between treatment pairs were identified using TopHat and pathway enrichment analyses undertaken. Publically available datasets were interrogated to perform survival analyses on the gene signatures identified using cBioportal. RESULTS: A number of genomic rearrangements were identified including the TMPRSS2/ERG fusion and 3 novel gene fusions involving the ETS family of transcription factors in patients, both pre and post chemotherapy. In total, gene expression analyses showed differential expression of at least 2 fold in 575 genes in post-chemotherapy biopsies. Of these, pathway analyses identified a panel of 7 genes (ADAM7, FAM72B, BUB1B, CCNB1, CCNB2, TTK, CDK1), including a cell cycle-related geneset, that were differentially-regulated following treatment with docetaxel and ADT. Using cBioportal to interrogate the MSKCC-Prostate Oncogenome Project dataset we observed a statistically-significant reduction in disease-free survival of patients with tumours exhibiting alterations in gene expression of the above panel of 7 genes (p = 0.015). 
CONCLUSIONS: Here we report on the first "real-time" in vivo RNA-Seq-based transcriptome analysis of clinical PCa from pre- and post-treatment TRUSS-guided biopsies of patients treated with docetaxel chemotherapy plus ADT. We identify a chemotherapy-driven PCa transcriptome profile which includes the down-regulation of important positive regulators of cell cycle progression. A 7 gene signature biomarker panel has also been identified in high-risk prostate cancer patients to be of prognostic value. Future prospective study is warranted to evaluate the clinical value of this panel.


Subject(s)
Gene Expression Profiling; Prostatic Neoplasms/genetics; Prostatic Neoplasms/mortality; Transcriptome; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Biopsy; Computational Biology; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Male; Neoplasm Grading; Neoplasm Staging; Prognosis; Prostatic Neoplasms/pathology; Prostatic Neoplasms/therapy
16.
Nature ; 453(7192): 175-83, 2008 May 08.
Article in English | MEDLINE | ID: mdl-18464734

ABSTRACT

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.


Subject(s)
Evolution, Molecular; Genome/genetics; Platypus/genetics; Animals; Base Composition; Dentition; Female; Genomic Imprinting/genetics; Humans; Immunity/genetics; Male; Mammals/genetics; MicroRNAs/genetics; Milk Proteins/genetics; Phylogeny; Platypus/immunology; Platypus/physiology; Receptors, Odorant/genetics; Repetitive Sequences, Nucleic Acid/genetics; Reptiles/genetics; Sequence Analysis, DNA; Spermatozoa/metabolism; Venoms/genetics; Zona Pellucida/metabolism
17.
Nucleic Acids Res ; 40(Database issue): D290-301, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22127870

ABSTRACT

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.


Subject(s)
Databases, Protein; Proteins/classification; Encyclopedias as Topic; Internet; Protein Structure, Tertiary; Sequence Homology, Amino Acid
18.
BMC Genomics ; 14: 95, 2013 Feb 12.
Article in English | MEDLINE | ID: mdl-23402223

ABSTRACT

BACKGROUND: A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin's (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2-3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris. RESULTS: 13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin's finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins. CONCLUSIONS: These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin's finches.


Subject(s)
Evolution, Molecular; Genomics; Passeriformes/genetics; Adaptation, Physiological; Animals; Genetics, Population; Models, Genetic; Passeriformes/physiology; Sequence Homology, Nucleic Acid
19.
Genome Res ; 20(10): 1352-60, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20736230

ABSTRACT

Initially thought to play a restricted role in calcium homeostasis, the pleiotropic actions of vitamin D in biology and their clinical significance are only now becoming apparent. However, the mode of action of vitamin D, through its cognate nuclear vitamin D receptor (VDR), and its contribution to diverse disorders, remain poorly understood. We determined VDR binding throughout the human genome using chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq). After calcitriol stimulation, we identified 2776 genomic positions occupied by the VDR and 229 genes with significant changes in expression in response to vitamin D. VDR binding sites were significantly enriched near autoimmune and cancer associated genes identified from genome-wide association (GWA) studies. Notable genes with VDR binding included IRF8, associated with MS, and PTPN2 associated with Crohn's disease and T1D. Furthermore, a number of single nucleotide polymorphism associations from GWA were located directly within VDR binding intervals, for example, rs13385731 associated with SLE and rs947474 associated with T1D. We also observed significant enrichment of VDR intervals within regions of positive selection among individuals of Asian and European descent. ChIP-seq determination of transcription factor binding, in combination with GWA data, provides a powerful approach to further understanding the molecular bases of complex diseases.


Subject(s)
Autoimmune Diseases/genetics; Chromatin Immunoprecipitation; Evolution, Molecular; Genome-Wide Association Study; Receptors, Calcitriol/metabolism; Vitamin D/metabolism; Binding Sites; Crohn Disease/genetics; Diabetes Mellitus, Type 1/genetics; Humans; Interferon Regulatory Factors/genetics; Interferon Regulatory Factors/metabolism; Multiple Sclerosis/genetics; Protein Binding; Protein Tyrosine Phosphatase, Non-Receptor Type 2/genetics; Protein Tyrosine Phosphatase, Non-Receptor Type 2/metabolism; Sequence Analysis, DNA/methods
20.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subject(s)
Evolution, Molecular; Genome/genetics; Genomics; Opossums/genetics; Animals; Base Composition; Conserved Sequence/genetics; DNA Transposable Elements/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Protein Biosynthesis; Synteny/genetics; X Chromosome Inactivation/genetics