1.
Nucleic Acids Res ; 40(Database issue): D290-301, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22127870

ABSTRACT

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.


Subjects
Databases, Protein; Proteins/classification; Encyclopedias as Topic; Internet; Protein Structure, Tertiary; Sequence Homology, Amino Acid
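
The abstract of entry 1 mentions family-specific, manually curated gathering thresholds. As a rough illustration only, the idea can be sketched as a per-family bit-score cutoff applied to search hits. The family names, scores, and the `significant_hits` helper below are hypothetical and are not taken from Pfam; real Pfam families carry separate sequence-level and domain-level thresholds, which this sketch collapses into one value.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str       # family identifier (hypothetical, not a real Pfam accession)
    bit_score: float  # bit score reported by a profile HMM search


# Hypothetical per-family gathering thresholds in bits. In Pfam these are
# chosen by curators for each family; the numbers here are made up.
GATHERING_THRESHOLDS = {"FAM_A": 25.0, "FAM_B": 30.5}


def significant_hits(hits, thresholds):
    """Keep hits whose bit score meets or exceeds their own family's threshold."""
    return [
        h for h in hits
        if h.family in thresholds and h.bit_score >= thresholds[h.family]
    ]


hits = [Hit("FAM_A", 27.1), Hit("FAM_A", 19.4), Hit("FAM_B", 29.9), Hit("FAM_B", 41.0)]
kept = significant_hits(hits, GATHERING_THRESHOLDS)
```

The point of a per-family cutoff is that a score of 29.9 can be rejected for one family while 27.1 is accepted for another, which a single global cutoff cannot express.
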
2.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subjects
Databases, Protein; Protein Structure, Tertiary; Proteins/classification; Proteins/physiology; Sequence Analysis, Protein; Software; Terminology as Topic; User-Computer Interface
3.
Nucleic Acids Res ; 38(Database issue): D211-22, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19920124

ABSTRACT

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).


Subjects
Computational Biology/methods; Databases, Nucleic Acid; Databases, Protein; Amino Acid Sequence; Animals; Computational Biology/trends; Genome, Archaeal; Genome, Fungal; Humans; Information Storage and Retrieval/methods; Internet; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment; Sequence Homology, Amino Acid; Software
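
The abstract of entry 3 attributes part of HMMER3's sensitivity to routine use of the forward algorithm. A minimal sketch of that algorithm on a toy two-state HMM is given below, next to a Viterbi-style best-path score for comparison. This is not HMMER's implementation: the states, probabilities, and function names are invented for illustration, and a real profile HMM has a richer architecture and works in log space.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Total probability of obs, summed over every possible state path."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi_probability(obs, states, start_p, trans_p, emit_p):
    """Probability of the single most likely state path, for comparison."""
    best = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        best = {
            s: emit_p[s][symbol] * max(best[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return max(best.values())


# Toy model; every number below is made up for illustration.
states = ("match", "background")
start_p = {"match": 0.5, "background": 0.5}
trans_p = {
    "match": {"match": 0.8, "background": 0.2},
    "background": {"match": 0.3, "background": 0.7},
}
emit_p = {
    "match": {"A": 0.9, "G": 0.1},
    "background": {"A": 0.5, "G": 0.5},
}
obs = "AAG"

total = forward_probability(obs, states, start_p, trans_p, emit_p)
best = viterbi_probability(obs, states, start_p, trans_p, emit_p)
```

Because the forward score sums over all paths, it is never smaller than the best-path score; evidence spread across many plausible alignments still counts toward the total.
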
4.
PLoS Genet ; 5(3): e1000436, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19325871

ABSTRACT

Numerous genetic association studies have implicated the KIAA0319 gene on human chromosome 6p22 in dyslexia susceptibility. The causative variant(s) remains unknown but may modulate gene expression, given that (1) a dyslexia-associated haplotype has been implicated in the reduced expression of KIAA0319, and (2) the strongest association has been found for the region spanning exon 1 of KIAA0319. Here, we test the hypothesis that variant(s) responsible for reduced KIAA0319 expression resides on the risk haplotype close to the gene's transcription start site. We identified seven single-nucleotide polymorphisms on the risk haplotype immediately upstream of KIAA0319 and determined that three of these are strongly associated with multiple reading-related traits. Using luciferase-expressing constructs containing the KIAA0319 upstream region, we characterized the minimal promoter and additional putative transcriptional regulator regions. This revealed that the minor allele of rs9461045, which shows the strongest association with dyslexia in our sample (max p-value = 0.0001), confers reduced luciferase expression in both neuronal and non-neuronal cell lines. Additionally, we found that the presence of this rs9461045 dyslexia-associated allele creates a nuclear protein-binding site, likely for the transcriptional silencer OCT-1. Knocking down OCT-1 expression in the neuronal cell line SHSY5Y using an siRNA restores KIAA0319 expression from the risk haplotype to nearly that seen from the non-risk haplotype. Our study thus pinpoints a common variant as altering the function of a dyslexia candidate gene and provides an illustrative example of the strategic approach needed to dissect the molecular basis of complex genetic traits.


Subjects
Dyslexia/genetics; Gene Expression Regulation; Nerve Tissue Proteins/genetics; Polymorphism, Single Nucleotide; Binding Sites; Cell Line; Down-Regulation/genetics; Haplotypes; Humans; Neurons; Octamer Transcription Factor-1/genetics; Promoter Regions, Genetic
5.
PLoS Genet ; 5(10): e1000688, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19834558

ABSTRACT

There are two main classes of natural killer (NK) cell receptors in mammals, the killer cell immunoglobulin-like receptors (KIR) and the structurally unrelated killer cell lectin-like receptors (KLR). While KIR represent the most diverse group of NK receptors in all primates studied to date, including humans, apes, and Old and New World monkeys, KLR represent the functional equivalent in rodents. Here, we report a first digression from this rule in lemurs, where the KLR (CD94/NKG2) rather than KIR constitute the most diverse group of NK cell receptors. We demonstrate that natural selection contributed to such diversification in lemurs and particularly targeted KLR residues interacting with the peptide presented by MHC class I ligands. We further show that lemurs lack a strict ortholog or functional equivalent of MHC-E, the ligands of non-polymorphic KLR in "higher" primates. Our data support the existence of a hitherto unknown system of polymorphic and diverse NK cell receptors in primates and of combinatorial diversity as a novel mechanism to increase NK cell receptor repertoire.


Subjects
Evolution, Molecular; NK Cell Lectin-Like Receptor Subfamily D/genetics; NK Cell Lectin-Like Receptor Subfamily D/immunology; Polymorphism, Genetic; Strepsirhini/genetics; Strepsirhini/immunology; Animals; Cell Line; Histocompatibility Antigens/genetics; Histocompatibility Antigens/immunology; Humans; Mice; Models, Molecular; NK Cell Lectin-Like Receptor Subfamily D/chemistry; Phylogeny; Protein Structure, Quaternary
6.
BMC Genomics ; 12: 421, 2011 Aug 19.
Article in English | MEDLINE | ID: mdl-21854592

ABSTRACT

BACKGROUND: The major histocompatibility complex (MHC) is a group of genes with a variety of roles in the innate and adaptive immune responses. MHC genes form a genetically linked cluster in eutherian mammals, an organization that is thought to confer functional and evolutionary advantages to the immune system. The tammar wallaby (Macropus eugenii), an Australian marsupial, provides a unique model for understanding MHC gene evolution, as many of its antigen presenting genes are not linked to the MHC, but are scattered around the genome. RESULTS: Here we describe the 'core' tammar wallaby MHC region on chromosome 2q by ordering and sequencing 33 BAC clones, covering over 4.5 MB and containing 129 genes. When compared to the MHC region of the South American opossum, eutherian mammals and non-mammals, the wallaby MHC has a novel gene organization. The wallaby has undergone an expansion of MHC class II genes, which are separated into two clusters by the class III genes. The antigen processing genes have undergone duplication, resulting in two copies of TAP1 and three copies of TAP2. Notably, Kangaroo Endogenous Retroviral Elements are present within the region and may have contributed to the genomic instability. CONCLUSIONS: The wallaby MHC has been extensively remodeled since the American and Australian marsupials last shared a common ancestor. The instability is characterized by the movement of antigen presenting genes away from the core MHC, most likely via the presence and activity of retroviral elements. We propose that the movement of class II genes away from the ancestral class II region has allowed this gene family to expand and diversify in the wallaby. The duplication of TAP genes in the wallaby MHC makes this species a unique model organism for studying the relationship between MHC gene organization and function.


Subjects
Evolution, Molecular; Genomic Instability; Macropodidae/genetics; Major Histocompatibility Complex/genetics; Multigene Family; Amino Acid Sequence; Animals; Chromosomes, Artificial, Bacterial/genetics; Expressed Sequence Tags; Gene Duplication; Genes, MHC Class II; Macropodidae/immunology; Male; Molecular Sequence Data; Phylogeny; Physical Chromosome Mapping; Sequence Alignment; Sequence Analysis, DNA
7.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1148-52, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944204

ABSTRACT

Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.


Subjects
Databases, Genetic; Proteins/analysis; Genomics; Protein Structure, Tertiary
8.
Nucleic Acids Res ; 36(Database issue): D281-8, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18039703

ABSTRACT

Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).


Subjects
Databases, Protein; Protein Structure, Tertiary; Proteins/classification; Animals; Genomics; Internet; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; User-Computer Interface
9.
BMC Genomics ; 10: 310, 2009 Jul 14.
Article in English | MEDLINE | ID: mdl-19602235

ABSTRACT

BACKGROUND: MHC class I antigens are encoded by a rapidly evolving gene family comprising classical and non-classical genes that are found in all vertebrates and involved in diverse immune functions. However, there is a fundamental difference between the organization of class I genes in mammals and non-mammals. Non-mammals have a single classical gene responsible for antigen presentation, which is linked to the antigen processing genes, including TAP. This organization allows co-evolution of advantageous class Ia/TAP haplotypes. In contrast, mammals have multiple classical genes within the MHC, which are separated from the antigen processing genes by class III genes. It has been hypothesized that separation of classical class I genes from antigen processing genes in mammals allowed them to duplicate. We investigated this hypothesis by characterizing the class I genes of the tammar wallaby, a model marsupial that has a novel MHC organization, with class I genes located within the MHC and 10 other chromosomal locations. RESULTS: Sequence analysis of 14 BACs containing 15 class I genes revealed that nine class I genes, including one to three classical class I, are not linked to the MHC but are scattered throughout the genome. Kangaroo Endogenous Retroviruses (KERVs) were identified flanking the MHC un-linked class I. The wallaby MHC contains four non-classical class I, interspersed with antigen processing genes. Clear orthologs of non-classical class I are conserved in distant marsupial lineages. CONCLUSION: We demonstrate that classical class I genes are not linked to antigen processing genes in the wallaby and provide evidence that retroviral elements were involved in their movement. The presence of retroviral elements most likely facilitated the formation of recombination hotspots and subsequent diversification of class I genes. The classical class I have moved away from antigen processing genes in eutherian mammals and the wallaby independently, but both lineages appear to have benefited from this loss of linkage by increasing the number of classical genes, perhaps enabling response to a wider range of pathogens. The discovery of non-classical orthologs between distantly related marsupial species is unusual for the rapidly evolving class I genes and may indicate an important marsupial specific function.


Subjects
Genes, MHC Class I; Genetic Linkage; Macropodidae/genetics; Animals; Base Sequence; Chromosomes, Artificial, Bacterial; Conserved Sequence; Endogenous Retroviruses/genetics; Evolution, Molecular; Gene Expression Profiling; Molecular Sequence Data; Phylogeny; Polymorphism, Genetic; Promoter Regions, Genetic; Sequence Analysis, DNA; Species Specificity
10.
BMC Struct Biol ; 9: 75, 2009 Dec 17.
Article in English | MEDLINE | ID: mdl-20017931

ABSTRACT

BACKGROUND: Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable. RESULTS: We have solved the crystal structure of the gene-product of locus Spy_2152 from S. pyogenes, (PDB:2fu2), and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides. CONCLUSIONS: Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.


Subjects
Bacteriocins/chemistry; Streptococcus pyogenes/chemistry; Amino Acid Sequence; Crystallography, X-Ray; Models, Molecular; Molecular Sequence Data; Protein Conformation; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein
11.
PLoS Genet ; 2(5): e73, 2006 May.
Article in English | MEDLINE | ID: mdl-16699593

ABSTRACT

The innate and adaptive immune systems of vertebrates possess complementary, but intertwined functions within immune responses. Receptors of the mammalian innate immune system play an essential role in the detection of infected or transformed cells and are vital for the initiation and regulation of a full adaptive immune response. The genes for several of these receptors are clustered within the leukocyte receptor complex (LRC). The purpose of this study was to carry out a detailed analysis of the chicken (Gallus gallus domesticus) LRC. Bacterial artificial chromosomes containing genes related to mammalian leukocyte immunoglobulin-like receptors were identified in a chicken genomic library and shown to map to a single microchromosome. Sequencing revealed 103 chicken immunoglobulin-like receptor (CHIR) loci (22 inhibitory, 25 activating, 15 bifunctional, and 41 pseudogenes). A very complex splicing pattern was found using transcript analyses and seven hypervariable regions were detected in the external CHIR domains. Phylogenetic and genomic analysis showed that CHIR genes evolved mainly by block duplications from an ancestral inhibitory receptor locus, with transformation into activating receptors occurring more than once. Evolutionary selection pressure has led not only to an exceptional expansion of the CHIR cluster but also to a dramatic diversification of CHIR loci and haplotypes. This indicates that CHIRs have the potential to complement the adaptive immune system in fighting pathogens.


Subjects
Immunoglobulins/genetics; Leukocytes/metabolism; Alternative Splicing; Animals; Chickens; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Evolution, Molecular; Gene Library; Genome; Haplotypes; Phylogeny; Protein Structure, Tertiary; Receptors, Immunologic/metabolism
12.
PLoS Genet ; 2(1): e9, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16440057

ABSTRACT

The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.


Subjects
Evolution, Molecular; Haplotypes/genetics; Major Histocompatibility Complex; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Cloning, Molecular; Genetic Variation; HLA-DR Antigens/genetics; Humans; Polymorphism, Genetic; Polymorphism, Single Nucleotide; Recombination, Genetic; Sequence Analysis, DNA
13.
BMC Genomics ; 7: 281, 2006 Nov 02.
Article in English | MEDLINE | ID: mdl-17081307

ABSTRACT

BACKGROUND: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution. RESULTS: Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 elements that are conserved. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, representing approximately a 4-fold increase compared to the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised. CONCLUSION: Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3 have remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.


Subjects
Evolution, Molecular; Macropodidae/genetics; Major Histocompatibility Complex/genetics; Animals; Anura/genetics; Chromosome Mapping/methods; Conserved Sequence/genetics; Databases, Genetic; Fishes/genetics; Humans; Models, Genetic; Molecular Sequence Data; Opossums/genetics; Phylogeny; Sequence Analysis, DNA; Zebrafish/genetics
14.
BMC Genomics ; 7: 209, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16911775

ABSTRACT

BACKGROUND: Killer Immunoglobulin-like Receptors (KIR) are essential immuno-surveillance molecules. They are expressed on natural killer and T cells, and interact with human leukocyte antigens. KIR genes are highly polymorphic and contribute vital variability to our immune system. Numerous KIR genes, belonging to five distinct lineages, have been identified in all primates examined thus far and shown to be rapidly evolving. Since few KIR remain orthologous between species, with only one of them, KIR2DL4, shown to be common to human, apes and monkeys, the evolution of the KIR gene family in primates remains unclear. RESULTS: Using comparative analyses, we have identified the ancestral KIR lineage (provisionally named KIR3DL0) in primates. We show KIR3DL0 to be highly conserved with the identification of orthologues in human (Homo sapiens), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), rhesus monkey (Macaca mulatta) and common marmoset (Callithrix jacchus). We predict KIR3DL0 to encode a functional molecule in all primates by demonstrating expression in human, chimpanzee and rhesus monkey. Using the rhesus monkey as a model, we further show the expression profile to be typical of KIR by quantitative measurement of KIR3DL0 from an enriched population of natural killer cells. CONCLUSION: One reason why KIR3DL0 may have escaped discovery for so long is that, in human, it maps in between two related leukocyte immunoglobulin-like receptor clusters outside the known KIR gene cluster on Chromosome 19. Based on genomic, cDNA, expression and phylogenetic data, we report a novel lineage of immunoglobulin receptors belonging to the KIR family, which is highly conserved throughout 50 million years of primate evolution.


Subjects
Primates/genetics; Receptors, Immunologic/genetics; Amino Acid Sequence; Animals; Cells, Cultured; Chromosome Mapping; Chromosomes, Human, Pair 19/genetics; DNA, Complementary/chemistry; DNA, Complementary/genetics; Evolution, Molecular; Gene Expression/genetics; Genome, Human/genetics; Gorilla gorilla/genetics; Humans; Killer Cells, Natural/metabolism; Macaca mulatta/genetics; Molecular Sequence Data; Pan troglodytes/genetics; Phylogeny; RNA, Messenger/genetics; RNA, Messenger/metabolism; Receptors, KIR; Receptors, KIR2DL4; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, DNA; Sequence Homology, Amino Acid
15.
Database (Oxford) ; 2013: bat023, 2013.
Article in English | MEDLINE | ID: mdl-23603847

ABSTRACT

It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ∼38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized. Database URL: http://pfam.sanger.ac.uk/


Subjects
Databases, Protein; Proteins/classification; Proteome; Sequence Homology, Amino Acid; Escherichia coli/chemistry; Humans; Protein Structure, Tertiary; Proteins/chemistry; Saccharomyces cerevisiae/chemistry
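
The abstract of entry 15 contrasts two measures: the share of proteins with at least one family match (about 90%) and the share of residues that fall inside a match (under 45%). The sketch below shows one way the two numbers could be computed so that overlapping matches are not double counted. The `coverage` function, the protein IDs, and the coordinates are hypothetical; the paper's actual pipeline is not described in the abstract.

```python
def coverage(proteins):
    """Return (sequence coverage, residue coverage).

    proteins maps an ID to (length, regions), where regions is a list of
    (start, end) domain matches using 1-based inclusive coordinates.
    """
    matched = covered = total = 0
    for length, regions in proteins.values():
        total += length
        if regions:
            matched += 1
        last_end = 0  # highest residue position already counted
        for start, end in sorted(regions):
            start = max(start, last_end + 1)  # skip residues already counted
            if end >= start:
                covered += end - start + 1
                last_end = end
    return matched / len(proteins), covered / total


# Made-up example: two of three proteins have a match, but most residues do not.
proteins = {
    "P1": (100, [(1, 40), (30, 60)]),  # overlapping matches cover residues 1-60
    "P2": (200, [(50, 99)]),
    "P3": (100, []),
}
seq_cov, res_cov = coverage(proteins)
```

A protein counts toward sequence coverage as soon as it has one match, however short, which is why the two measures can diverge as widely as the abstract reports.
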
16.
Database (Oxford) ; 2013: bat011, 2013.
Article in English | MEDLINE | ID: mdl-23589541

ABSTRACT

Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.


Subjects
Genome/genetics; Gorilla gorilla/genetics; Gorilla gorilla/immunology; Major Histocompatibility Complex/genetics; Sequence Analysis, DNA; Animals; Base Sequence; Chromosome Mapping; Humans; Multigene Family/genetics; Pan troglodytes/genetics
17.
PLoS One ; 7(5): e35575, 2012.
Article in English | MEDLINE | ID: mdl-22615736

ABSTRACT

We have identified a new bacterial protein domain that we hypothesise binds to peptidoglycan. This domain is called the YARHG domain after the most highly conserved sequence-segment. The domain is found in the extracellular space and is likely to be composed of four alpha-helices. The domain is found associated with protein kinase domains, suggesting it is associated with signalling in some bacteria. The domain is also found associated with three different families of peptidases. The large number of different domains that are found associated with YARHG suggests that it is a useful functional module that nature has recombined multiple times.


Subjects
Bacterial Proteins/chemistry; Protein Structure, Tertiary; Amino Acid Sequence; Bacterial Proteins/metabolism; Bacterial Proteins/physiology; Molecular Sequence Data; Peptidoglycan/metabolism; Protein Binding; Sequence Homology, Amino Acid; Signal Transduction
18.
Mol Ecol Resour ; 9(1): 346-9, 2009 Jan.
Article in English | MEDLINE | ID: mdl-21564646

ABSTRACT

The major histocompatibility complex (MHC) contains genes which play a key role in immune response and mate choice, and are therefore of functional importance to molecular ecologists. Here we describe the design of 10 MHC Class I-associated microsatellite loci from the tammar wallaby. All 10 loci are highly polymorphic, with the expected heterozygosity ranging from 0.547 to 0.919. Six loci successfully cross-amplify in other macropodid species. These microsatellites will serve as useful tools for studying the level of MHC diversity, the impact of selection on genetic variation and the unique structure of the tammar wallaby MHC.
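
The expected heterozygosity quoted in the abstract of entry 18 is conventionally computed per locus as one minus the sum of squared allele frequencies. The sketch below uses that textbook form; the abstract does not say which estimator the authors applied (a small-sample correction is common), and the allele frequencies are invented for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """H_e = 1 - sum(p_i ** 2) for the allele frequencies p_i at one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Made-up locus with three alleles.
example = expected_heterozygosity([0.5, 0.3, 0.2])
```

Values near 1 require many alleles at similar frequencies, so a locus at the upper end of the reported range would need well over ten roughly equally common alleles under this formula.
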

19.
Curr Protoc Bioinformatics ; Chapter 2: 2.5.1-2.5.17, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18819075

ABSTRACT

Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit contains detailed information on how to access and utilize the information present in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally are presented.


Subjects
Computational Biology/methods; Databases, Protein; Protein Structure, Tertiary; Databases, Nucleic Acid; Internet; Markov Chains; Proteins/analysis; Proteins/chemistry; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; User-Computer Interface
20.
BMC Med Genomics ; 1: 19, 2008 May 30.
Article in English | MEDLINE | ID: mdl-18513384

ABSTRACT

BACKGROUND: The major histocompatibility complex (MHC) is essential for human immunity and is highly associated with common diseases, including cancer. While the genetics of the MHC has been studied intensively for many decades, very little is known about the epigenetics of this most polymorphic and disease-associated region of the genome. METHODS: To facilitate comprehensive epigenetic analyses of this region, we have generated a genomic tiling array of 2 Kb resolution covering the entire 4 Mb MHC region. The array has been designed to be compatible with chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), array comparative genomic hybridization (aCGH) and expression profiling, including of non-coding RNAs. The array comprises 7832 features, consisting of two replicates of both forward and reverse strands of MHC amplicons and appropriate controls. RESULTS: Using MeDIP, we demonstrate the application of the MHC array for DNA methylation profiling and the identification of tissue-specific differentially methylated regions (tDMRs). Based on the analysis of two tissues and two cell types, we identified 90 tDMRs within the MHC and describe their characterisation. CONCLUSION: A tiling array covering the MHC region was developed and validated. Its successful application for DNA methylation profiling indicates that this array represents a useful tool for molecular analyses of the MHC in the context of medical genomics.
