1.
N Engl J Med ; 388(17): 1559-1571, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37043637

ABSTRACT

BACKGROUND: Pediatric disorders include a range of highly penetrant, genetically heterogeneous conditions amenable to genomewide diagnostic approaches. Finding a molecular diagnosis is challenging but can have profound lifelong benefits. METHODS: We conducted a large-scale sequencing study involving more than 13,500 families with probands with severe, probably monogenic, difficult-to-diagnose developmental disorders from 24 regional genetics services in the United Kingdom and Ireland. Standardized phenotypic data were collected, and exome sequencing and microarray analyses were performed to investigate novel genetic causes. We developed an iterative variant analysis pipeline and reported candidate variants to clinical teams for validation and diagnostic interpretation to inform communication with families. Multiple regression analyses were performed to evaluate factors affecting the probability of diagnosis. RESULTS: A total of 13,449 probands were included in the analyses. On average, we reported 1.0 candidate variant per parent-offspring trio and 2.5 variants per singleton proband. Using clinical and computational approaches to variant classification, we made a diagnosis in approximately 41% of probands (5502 of 13,449). Of 3599 probands in trios who received a diagnosis by clinical assertion, approximately 76% had a pathogenic de novo variant. Another 22% of probands (2997 of 13,449) had variants of uncertain significance in genes that were strongly linked to monogenic developmental disorders. Recruitment in a parent-offspring trio had the largest effect on the probability of diagnosis (odds ratio, 4.70; 95% confidence interval [CI], 4.16 to 5.31). 
Probands were less likely to receive a diagnosis if they were born extremely prematurely (i.e., 22 to 27 weeks' gestation; odds ratio, 0.39; 95% CI, 0.22 to 0.68), had in utero exposure to antiepileptic medications (odds ratio, 0.44; 95% CI, 0.29 to 0.67), had mothers with diabetes (odds ratio, 0.52; 95% CI, 0.41 to 0.67), or were of African ancestry (odds ratio, 0.51; 95% CI, 0.31 to 0.78). CONCLUSIONS: Among probands with severe, probably monogenic, difficult-to-diagnose developmental disorders, multimodal analysis of genomewide data had good diagnostic power, even after previous attempts at diagnosis. (Funded by the Health Innovation Challenge Fund and Wellcome Sanger Institute.).
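
The headline figures in this abstract can be re-derived from the counts it reports. The Python sketch below is an illustrative check, not part of the study's analysis. It recomputes the two proportions, and it relies on the assumption that the reported odds-ratio interval is of the usual kind that is symmetric on the log scale, so the geometric mean of the two bounds should land close to the point estimate. The function names are hypothetical.

```python
import math

def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

def ci_geometric_midpoint(lower: float, upper: float) -> float:
    """Geometric mean of two CI bounds (the midpoint on the log scale)."""
    return math.sqrt(lower * upper)

diagnosed = percent(5502, 13449)   # probands who received a diagnosis
uncertain = percent(2997, 13449)   # probands with variants of uncertain significance
trio_or_mid = ci_geometric_midpoint(4.16, 5.31)  # compare with the reported OR of 4.70

print(diagnosed, uncertain, round(trio_or_mid, 2))
```

Both percentages agree with the rounded values quoted in the abstract ("approximately 41%" and "22%").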


Subject(s)
Genomics , Rare Diseases , Child , Humans , Exome , Ireland/epidemiology , United Kingdom/epidemiology , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Rare Diseases/genetics , Oligonucleotide Array Sequence Analysis , Genetic Association Studies , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Congenital Abnormalities/diagnosis , Congenital Abnormalities/genetics , Growth Disorders/diagnosis , Growth Disorders/genetics , Facies , Child Behavior Disorders/diagnosis , Child Behavior Disorders/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics
2.
Nature ; 586(7831): 757-762, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33057194

ABSTRACT

De novo mutations in protein-coding genes are a well-established cause of developmental disorders [1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations [1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
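
The idea of gene-specific enrichment can be illustrated with a much simpler stand-in than the simulation-based test the authors describe. The Python sketch below assumes that the number of de novo mutations seen in a gene follows a Poisson distribution whose mean is the count expected under a null mutation model, and asks how surprising the observed count is. The expected and observed counts, the number of genes tested, and the Bonferroni correction are all illustrative assumptions, not values or methods taken from the paper.

```python
import math

def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 minus the lower tail, so very small tail probabilities
    are only approximate because of floating-point cancellation.
    """
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)

def is_enriched(observed: int, expected: float, n_genes_tested: int,
                alpha: float = 0.05) -> bool:
    """One-sided test for an excess of de novo mutations, Bonferroni-corrected."""
    return poisson_upper_tail(observed, expected) < alpha / n_genes_tested

# Hypothetical gene: 0.5 de novo mutations expected, 12 observed, 20,000 genes tested.
p_value = poisson_upper_tail(12, 0.5)
print(p_value, is_enriched(12, 0.5, 20000))
```

A simulation-based test replaces the closed-form Poisson tail with an empirical null distribution, which lets it account for features that a single expected count cannot capture.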


Subject(s)
DNA Mutational Analysis , Data Analysis , Databases, Genetic , Datasets as Topic , Delivery of Health Care/statistics & numerical data , Developmental Disabilities/genetics , Genetic Diseases, Inborn/genetics , Cohort Studies , DNA Copy Number Variations/genetics , Developmental Disabilities/diagnosis , Europe , Female , Genetic Diseases, Inborn/diagnosis , Germ-Line Mutation/genetics , Haploinsufficiency/genetics , Humans , Male , Mutation, Missense/genetics , Penetrance , Perinatal Death , Sample Size
3.
Am J Hum Genet ; 108(11): 2186-2194, 2021 Nov 04.
Article in English | MEDLINE | ID: mdl-34626536

ABSTRACT

Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases.
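
To make "split-read clusters characteristic of SV breakpoints" concrete, the Python sketch below counts soft-clipped reads that pile up at the same reference coordinate. It is a simplified illustration of the general idea and not InDelible's implementation: the function names, the `min_reads` threshold, and the example alignments are all hypothetical, and hard clips, mapping quality, and strand are ignored.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def clip_sites(pos: int, cigar: str) -> list[int]:
    """Reference coordinates at which a read is soft-clipped.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    sites = []
    if ops and ops[0][1] == "S":
        sites.append(pos)  # clipped on the left: site is the first aligned base
    if len(ops) > 1 and ops[-1][1] == "S":
        aligned_len = sum(n for n, op in ops if op in REF_CONSUMING)
        sites.append(pos + aligned_len)  # first base after the aligned segment
    return sites

def candidate_breakpoints(reads, min_reads=3):
    """Return (chrom, site) pairs supported by at least min_reads clipped reads."""
    support = Counter(
        (chrom, site)
        for chrom, pos, cigar in reads
        for site in clip_sites(pos, cigar)
    )
    return sorted(key for key, count in support.items() if count >= min_reads)

# Made-up alignments: three reads are clipped at chr1:150, one is fully aligned.
reads = [
    ("chr1", 100, "50M25S"),
    ("chr1", 120, "30M45S"),
    ("chr1", 150, "20S55M"),
    ("chr1", 10, "75M"),
]
print(candidate_breakpoints(reads))
```

Reads clipped on the right and reads clipped on the left agree on the same coordinate here because the site is defined as the first base on the far side of the junction in both cases.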


Subject(s)
Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Exome Sequencing/methods , Child , Female , Humans , Male , Methyl-CpG-Binding Protein 2/genetics
4.
Am J Hum Genet ; 108(6): 1083-1094, 2021 Jun 03.
Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.


Subject(s)
5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing
5.
Prenat Diagn ; 42(6): 736-743, 2022 May.
Article in English | MEDLINE | ID: mdl-35411553

ABSTRACT

OBJECTIVE: To investigate the detection of pathogenic variants using exome sequencing in an international cohort of fetuses with central nervous system (CNS) anomalies. METHODS: We reviewed trio exome sequencing (ES) results for two previously reported unselected cohorts (Prenatal Assessment of Genomes and Exomes (PAGE) and CUIMC) to identify fetuses with CNS anomalies with unremarkable karyotypes and chromosomal microarrays. Variants were classified according to ACMG guidelines and association of pathogenic variants with specific types of CNS anomalies explored. RESULTS: ES was performed in 268 pregnancies with a CNS anomaly identified using prenatal ultrasound. Of those with an isolated, single, CNS anomaly, 7/97 (7.2%) had a likely pathogenic/pathogenic (LP/P) variant. This includes 3/23 (13%) fetuses with isolated mild ventriculomegaly and 3/10 (30%) fetuses with isolated agenesis of the corpus callosum. Where there were multiple anomalies within the CNS, 12/63 (19%) had LP/P variants. Of the 108 cases with CNS and other organ system anomalies, 18 (16.7%) had LP/P findings. CONCLUSION: ES is an important tool in the prenatal evaluation of fetuses with any CNS anomaly. The rate of LP/P variants tends to be highest in fetuses with multiple CNS anomalies and multisystem anomalies, however, ES may also be of benefit for isolated CNS anomalies.


Subject(s)
Exome , Nervous System Malformations , Female , Fetus/abnormalities , Fetus/diagnostic imaging , Humans , Nervous System Malformations/diagnostic imaging , Nervous System Malformations/genetics , Pregnancy , Prenatal Diagnosis/methods , Ultrasonography, Prenatal/methods , Exome Sequencing/methods
6.
Genet Med ; 23(3): 571-575, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33149276

ABSTRACT

PURPOSE: Automated variant filtering is an essential part of diagnostic genome-wide sequencing but may generate false negative results. We sought to investigate whether some previously identified pathogenic variants may be being routinely excluded by standard variant filtering pipelines. METHODS: We evaluated variants that were previously classified as pathogenic or likely pathogenic in ClinVar in known developmental disorder genes using exome sequence data from the Deciphering Developmental Disorders (DDD) study. RESULTS: Of these ClinVar pathogenic variants, 3.6% were identified among 13,462 DDD probands, and 1134/1352 (83.9%) had already been independently communicated to clinicians using DDD variant filtering pipelines as plausibly pathogenic. The remaining 218 variants failed consequence, inheritance, or other automated variant filters. Following clinical review of these additional variants, we were able to identify 112 variants in 107 (0.8%) DDD probands as potential diagnoses. CONCLUSION: Lower minor allele frequency (<0.0005%) and higher gold star review status in ClinVar (>1 star) are good predictors of a previously identified variant being plausibly diagnostic for developmental disorders. However, around half of previously identified pathogenic variants excluded by automated variant filtering did not appear to be disease-causing, underlining the continued need for clinical evaluation of candidate variants as part of the diagnostic process.
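
The two predictors named in the conclusion can be turned into a simple shortlist rule. The Python sketch below is one possible reading, not the DDD pipeline: it assumes the two criteria are combined with a logical AND, and the record type, field names, and example variants are hypothetical. Only the two thresholds (<0.0005% minor allele frequency and more than one ClinVar star) come from the abstract.

```python
from dataclasses import dataclass

@dataclass
class ClinVarVariant:
    variant_id: str
    maf_percent: float   # minor allele frequency, expressed in percent
    review_stars: int    # ClinVar gold-star review status

MAX_MAF_PERCENT = 0.0005  # threshold quoted in the abstract (<0.0005%)
MIN_STARS = 2             # ">1 star" in the abstract

def prioritise_for_review(variants):
    """Keep variants that meet both predictors (assumed AND combination)."""
    return [
        v for v in variants
        if v.maf_percent < MAX_MAF_PERCENT and v.review_stars >= MIN_STARS
    ]

# Hypothetical records, for illustration only.
variants = [
    ClinVarVariant("var_a", 0.0001, 2),  # rare and well reviewed
    ClinVarVariant("var_b", 0.0100, 3),  # well reviewed but too common
    ClinVarVariant("var_c", 0.0002, 1),  # rare but only one star
]
shortlist = [v.variant_id for v in prioritise_for_review(variants)]
print(shortlist)
```

As the abstract stresses, a rule like this only orders the queue; roughly half of the filtered-out ClinVar variants were judged not to be disease-causing on clinical review, so the shortlist is an input to evaluation rather than a diagnosis.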


Subject(s)
Databases, Genetic , Exome , Gene Frequency , Humans , Exome Sequencing
7.
Lancet ; 393(10173): 747-757, 2019 Feb 23.
Article in English | MEDLINE | ID: mdl-30712880

ABSTRACT

BACKGROUND: Fetal structural anomalies, which are detected by ultrasonography, have a range of genetic causes, including chromosomal aneuploidy, copy number variations (CNVs; which are detectable by chromosomal microarrays), and pathogenic sequence variants in developmental genes. Testing for aneuploidy and CNVs is routine during the investigation of fetal structural anomalies, but there is little information on the clinical usefulness of genome-wide next-generation sequencing in the prenatal setting. We therefore aimed to evaluate the proportion of fetuses with structural abnormalities that had identifiable variants in genes associated with developmental disorders when assessed with whole-exome sequencing (WES). METHODS: In this prospective cohort study, two groups in Birmingham and London recruited patients from 34 fetal medicine units in England and Scotland. We used whole-exome sequencing (WES) to evaluate the presence of genetic variants in developmental disorder genes (diagnostic genetic variants) in a cohort of fetuses with structural anomalies and samples from their parents, after exclusion of aneuploidy and large CNVs. Women were eligible for inclusion if they were undergoing invasive testing for identified nuchal translucency or structural anomalies in their fetus, as detected by ultrasound after 11 weeks of gestation. The partners of these women also had to consent to participate. Sequencing results were interpreted with a targeted virtual gene panel for developmental disorders that comprised 1628 genes. Genetic results related to fetal structural anomaly phenotypes were then validated and reported postnatally. The primary endpoint, which was assessed in all fetuses, was the detection of diagnostic genetic variants considered to have caused the fetal developmental anomaly. FINDINGS: The cohort was recruited between Oct 22, 2014, and June 29, 2017, and clinical data were collected until March 31, 2018. 
After exclusion of fetuses with aneuploidy and CNVs, 610 fetuses with structural anomalies and 1202 matched parental samples (analysed as 596 fetus-parental trios, including two sets of twins, and 14 fetus-parent dyads) were analysed by WES. After bioinformatic filtering and prioritisation according to allele frequency and effect on protein and inheritance pattern, 321 genetic variants (representing 255 potential diagnoses) were selected as potentially pathogenic genetic variants (diagnostic genetic variants), and these variants were reviewed by a multidisciplinary clinical review panel. A diagnostic genetic variant was identified in 52 (8·5%; 95% CI 6·4-11·0) of 610 fetuses assessed and an additional 24 (3·9%) fetuses had a variant of uncertain significance that had potential clinical usefulness. Detection of diagnostic genetic variants enabled us to distinguish between syndromic and non-syndromic fetal anomalies (eg, congenital heart disease only vs a syndrome with congenital heart disease and learning disability). Diagnostic genetic variants were present in 22 (15·4%) of 143 fetuses with multisystem anomalies (ie, more than one fetal structural anomaly), nine (11·1%) of 81 fetuses with cardiac anomalies, and ten (15·4%) of 65 fetuses with skeletal anomalies; these phenotypes were most commonly associated with diagnostic variants. However, diagnostic genetic variants were least common in fetuses with isolated increased nuchal translucency (≥4·0 mm) in the first trimester (in three [3·2%] of 93 fetuses). INTERPRETATION: WES facilitates genetic diagnosis of fetal structural anomalies, which enables more accurate predictions of fetal prognosis and risk of recurrence in future pregnancies. However, the overall detection of diagnostic genetic variants in a prospectively ascertained cohort with a broad range of fetal structural anomalies is lower than that suggested by previous smaller-scale studies of fewer phenotypes. 
WES improved the identification of genetic disorders in fetuses with structural abnormalities; however, before clinical implementation, careful consideration should be given to case selection to maximise clinical usefulness. FUNDING: UK Department of Health and Social Care and The Wellcome Trust.
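
The primary result (52 of 610 fetuses, 8·5%) comes with a 95% confidence interval, and the Python sketch below shows one common way such an interval can be computed from the raw counts. The abstract does not say which method the authors used, so the Wilson score interval here is an assumption chosen for illustration, and its bounds should be expected to differ slightly from the published 6·4-11·0.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_interval(52, 610)
print(f"{100 * 52 / 610:.1f}% ({100 * low:.1f}% to {100 * high:.1f}%)")
```

The same function applied to the smaller subgroups (for example three of 93 fetuses with isolated increased nuchal translucency) gives much wider intervals, which is worth keeping in mind when comparing phenotype-specific yields.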


Subject(s)
Abnormal Karyotype/statistics & numerical data , Congenital Abnormalities/genetics , Exome Sequencing/statistics & numerical data , Fetal Development/genetics , Fetus/abnormalities , Abnormal Karyotype/embryology , Abortion, Eugenic/statistics & numerical data , Abortion, Spontaneous/epidemiology , Congenital Abnormalities/diagnosis , Congenital Abnormalities/epidemiology , DNA Copy Number Variations/genetics , Female , Fetus/diagnostic imaging , Humans , Infant, Newborn , Live Birth/epidemiology , Male , Nuchal Translucency Measurement , Parents , Perinatal Death/etiology , Pregnancy , Prospective Studies , Stillbirth/epidemiology , Exome Sequencing/methods
8.
Genet Med ; 21(5): 1065-1073, 2019 May.
Article in English | MEDLINE | ID: mdl-30293990

ABSTRACT

PURPOSE: To determine the diagnostic yield of combined exome sequencing (ES) and autopsy in fetuses/neonates with prenatally identified structural anomalies resulting in termination of pregnancy, intrauterine, neonatal, or early infant death. METHODS: ES was undertaken in 27 proband/parent trios following full autopsy. Candidate pathogenic variants were classified by a multidisciplinary clinical review panel using American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: A genetic diagnosis was established in ten cases (37%). Pathogenic/likely pathogenic variants were identified in nine different genes including four de novo autosomal dominant, three homozygous autosomal recessive, two compound heterozygous autosomal recessive, and one X-linked. KMT2D variants (associated with Kabuki syndrome postnatally) occurred in two cases. Pathogenic variants were identified in 5/13 (38%) cases with multisystem anomalies, in 2/4 (50%) cases with fetal akinesia deformation sequence, and in 1/4 (25%) cases each with cardiac and brain anomalies and hydrops fetalis. No pathogenic variants were detected in fetuses with genitourinary (1), skeletal (1), or abdominal (1) abnormalities. CONCLUSION: This cohort demonstrates the clinical utility of molecular autopsy with ES to identify an underlying genetic cause in structurally abnormal fetuses/neonates. These molecular findings provided parents with an explanation of the developmental abnormality, delineated the recurrence risks, and assisted the management of subsequent pregnancies.


Subject(s)
Congenital Abnormalities/genetics , Fetal Diseases/genetics , Prenatal Diagnosis/methods , Autopsy/methods , Cohort Studies , Congenital Abnormalities/diagnosis , Exome/genetics , Female , Fetal Diseases/diagnosis , Fetus/diagnostic imaging , Humans , Infant, Newborn , Male , Pregnancy , Exome Sequencing/methods
9.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subject(s)
Databases, Protein , Proteins/classification , Proteome/chemistry , Sequence Alignment , Sequence Analysis, Protein , Molecular Sequence Annotation
10.
Nucleic Acids Res ; 43(Database issue): D130-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392425

ABSTRACT

The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Genomics , Internet , Molecular Sequence Annotation , Nucleic Acid Conformation , Nucleotide Motifs , RNA, Long Noncoding/chemistry , RNA, Untranslated/classification , Software
11.
Nucleic Acids Res ; 42(Database issue): D222-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24288371

ABSTRACT

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.


Subject(s)
Databases, Protein , Sequence Alignment , Sequence Analysis, Protein , Internet , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, DNA
12.
Nucleic Acids Res ; 41(Database issue): D226-32, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23125362

ABSTRACT

The Rfam database (available via the website at http://rfam.sanger.ac.uk and through our mirror at http://rfam.janelia.org) is a collection of non-coding RNA families, primarily RNAs with a conserved RNA secondary structure, including both RNA genes and mRNA cis-regulatory elements. Each family is represented by a multiple sequence alignment, predicted secondary structure and covariance model. Here we discuss updates to the database in the latest release, Rfam 11.0, including the introduction of genome-based alignments for large families, the introduction of the Rfam Biomart as well as other user interface improvements. Rfam is available under the Creative Commons Zero license.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , RNA, Untranslated/classification , Base Sequence , Genomics , Internet , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA, Untranslated/genetics , Sequence Alignment , User-Computer Interface
13.
BMC Bioinformatics ; 15: 196, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938123

ABSTRACT

BACKGROUND: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. RESULTS: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. CONCLUSIONS: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.


Subject(s)
Glycoside Hydrolases/chemistry , Bacteroides/chemistry , Bacteroides/enzymology , Biocatalysis , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Models, Molecular , Phylogeny , Protein Structure, Tertiary , Structural Homology, Protein
14.
BMC Bioinformatics ; 15: 1, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24383880

ABSTRACT

BACKGROUND: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation. RESULTS: The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit and from size-exclusion chromatography, the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other. CONCLUSIONS: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a "mini-zincin". There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than being an ancestral state, phylogenetic analysis suggests that the mini-zincins are derived from larger proteins.
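
A motif written in this notation can be searched for directly. The Python sketch below is a minimal illustration that treats every `X` (or `x`) in the motif as "any residue" and every other letter as a literal; the function names and the example sequence are hypothetical, and the sequence is made up rather than taken from Acel_2062. This kind of exact pattern matching is far cruder than the profile-based sequence analysis typically used to predict protein function.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'HEXXHXXGXXD' into a regex string.

    'X' or 'x' matches any single residue; other letters match themselves.
    """
    return "".join("." if ch in "Xx" else re.escape(ch) for ch in motif)

def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in lookahead.finditer(sequence)]

# Made-up sequence containing one occurrence of the motif; not a real protein.
toy_sequence = "MKTAHEAAHLLGYYDPQ"
print(find_motif(toy_sequence, "HEXXHXXGXXD"))
```

The lookahead wrapper is a design choice: it reports overlapping occurrences that a plain `finditer` over the pattern would skip.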


Subject(s)
Bacterial Proteins/chemistry , Metalloproteases/chemistry , Zinc/chemistry , Actinomycetales/chemistry , Actinomycetales/enzymology , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/metabolism , Dimerization , Metalloproteases/metabolism , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Subunits , Sequence Alignment , Zinc/metabolism
15.
BMC Bioinformatics ; 15: 112, 2014 Apr 17.
Article in English | MEDLINE | ID: mdl-24742328

ABSTRACT

BACKGROUND: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. RESULTS: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one on the N- and one on the C-terminal. A PSI-BLAST search found over 150 full length and over 90 half size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases, the C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases, however these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. CONCLUSIONS: Structural and sequence analyses of the BT_1012 protein identifies it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively.


Subject(s)
Bacterial Proteins/chemistry , Glycoside Hydrolases/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacteroides/enzymology , Computational Biology , Gastrointestinal Tract/microbiology , Genomics , Glycoside Hydrolases/genetics , Humans , Protein Structure, Tertiary
16.
Nucleic Acids Res ; 40(Database issue): D290-301, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22127870

ABSTRACT

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.


Subject(s)
Databases, Protein; Proteins/classification; Encyclopedias as Topic; Internet; Protein Structure, Tertiary; Sequence Homology, Amino Acid
17.
Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22123736

ABSTRACT

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and it now supplies greater information content, enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.


Subject(s)
Databases, Protein; Molecular Sequence Annotation; Vocabulary, Controlled; Molecular Sequence Annotation/standards
18.
Sci Rep ; 14(1): 8708, 2024 04 15.
Article in English | MEDLINE | ID: mdl-38622173

ABSTRACT

Recent work has revealed an important role for rare, incompletely penetrant inherited coding variants in neurodevelopmental disorders (NDDs). Additionally, we have previously shown that common variants contribute to risk for rare NDDs. Here, we investigate whether common variants exert their effects by modifying gene expression, using multi-cis-expression quantitative trait loci (cis-eQTL) prediction models. We first performed a transcriptome-wide association study for NDDs using 6987 probands from the Deciphering Developmental Disorders (DDD) study and 9720 controls, and found one gene, RAB2A, that passed multiple testing correction (p = 6.7 × 10⁻⁷). We then investigated whether cis-eQTLs modify the penetrance of putatively damaging, rare coding variants inherited by NDD probands from their unaffected parents in a set of 1700 trios. We found no evidence that unaffected parents transmitting putatively damaging coding variants had higher genetically-predicted expression of the variant-harboring gene than their child. In probands carrying putatively damaging variants in constrained genes, the genetically-predicted expression of these genes in blood was lower than in controls (p = 2.7 × 10⁻³). However, results for proband-control comparisons were inconsistent across different sets of genes, variant filters and tissues. We find limited evidence that common cis-eQTLs modify penetrance of rare coding variants in a large cohort of NDD probands.


Subject(s)
Neurodevelopmental Disorders; Polymorphism, Single Nucleotide; Child; Humans; Penetrance; Quantitative Trait Loci/genetics; Neurodevelopmental Disorders/genetics; Transcriptome
19.
BMC Bioinformatics ; 14: 327, 2013 Nov 19.
Article in English | MEDLINE | ID: mdl-24246060

ABSTRACT

BACKGROUND: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found. RESULTS: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains, BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C], and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ], have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role. CONCLUSIONS: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).


Subject(s)
Bacterial Proteins/chemistry; Nucleocytoplasmic Transport Proteins/chemistry; Peptide Mapping/methods; Bacteroides/chemistry; Campylobacter jejuni/chemistry; Catalytic Domain; Crystallography, X-Ray; Ligands; Protein Folding; Protein Multimerization; Protein Structure, Tertiary; Ruminococcus/chemistry
20.
BMC Bioinformatics ; 14: 265, 2013 Sep 03.
Article in English | MEDLINE | ID: mdl-24004689

ABSTRACT

BACKGROUND: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology. RESULTS: We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria. CONCLUSIONS: Based on our analyses, we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.


Subject(s)
Bacterial Proteins/chemistry; Databases, Protein; Legionella pneumophila/chemistry; Amino Acid Sequence; Bacterial Proteins/genetics; Legionella pneumophila/genetics; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein