1.
N Engl J Med ; 388(17): 1559-1571, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37043637

ABSTRACT

BACKGROUND: Pediatric disorders include a range of highly penetrant, genetically heterogeneous conditions amenable to genomewide diagnostic approaches. Finding a molecular diagnosis is challenging but can have profound lifelong benefits. METHODS: We conducted a large-scale sequencing study involving more than 13,500 families with probands with severe, probably monogenic, difficult-to-diagnose developmental disorders from 24 regional genetics services in the United Kingdom and Ireland. Standardized phenotypic data were collected, and exome sequencing and microarray analyses were performed to investigate novel genetic causes. We developed an iterative variant analysis pipeline and reported candidate variants to clinical teams for validation and diagnostic interpretation to inform communication with families. Multiple regression analyses were performed to evaluate factors affecting the probability of diagnosis. RESULTS: A total of 13,449 probands were included in the analyses. On average, we reported 1.0 candidate variant per parent-offspring trio and 2.5 variants per singleton proband. Using clinical and computational approaches to variant classification, we made a diagnosis in approximately 41% of probands (5502 of 13,449). Of 3599 probands in trios who received a diagnosis by clinical assertion, approximately 76% had a pathogenic de novo variant. Another 22% of probands (2997 of 13,449) had variants of uncertain significance in genes that were strongly linked to monogenic developmental disorders. Recruitment in a parent-offspring trio had the largest effect on the probability of diagnosis (odds ratio, 4.70; 95% confidence interval [CI], 4.16 to 5.31). 
Probands were less likely to receive a diagnosis if they were born extremely prematurely (i.e., 22 to 27 weeks' gestation; odds ratio, 0.39; 95% CI, 0.22 to 0.68), had in utero exposure to antiepileptic medications (odds ratio, 0.44; 95% CI, 0.29 to 0.67), had mothers with diabetes (odds ratio, 0.52; 95% CI, 0.41 to 0.67), or were of African ancestry (odds ratio, 0.51; 95% CI, 0.31 to 0.78). CONCLUSIONS: Among probands with severe, probably monogenic, difficult-to-diagnose developmental disorders, multimodal analysis of genomewide data had good diagnostic power, even after previous attempts at diagnosis. (Funded by the Health Innovation Challenge Fund and Wellcome Sanger Institute.).
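The headline proportions follow directly from the counts given in the abstract. A quick Python check (the variable names are illustrative, not taken from the study):

```python
# Recompute the diagnostic proportions from the raw counts reported in the abstract.
total_probands = 13449
diagnosed = 5502          # diagnosis by clinical or computational classification
vus_in_dd_genes = 2997    # variants of uncertain significance in strongly linked genes

def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

diagnosed_pct = percent(diagnosed, total_probands)
vus_pct = percent(vus_in_dd_genes, total_probands)
print(f"diagnosed: {diagnosed_pct}%  VUS in DD genes: {vus_pct}%")
```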


Subjects
Genomics , Rare Diseases , Child , Humans , Exome , Ireland/epidemiology , United Kingdom/epidemiology , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Rare Diseases/genetics , Oligonucleotide Array Sequence Analysis , Genetic Association Studies , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Congenital Abnormalities/diagnosis , Congenital Abnormalities/genetics , Growth Disorders/diagnosis , Growth Disorders/genetics , Facies , Child Behavior Disorders/diagnosis , Child Behavior Disorders/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics
2.
Nature ; 586(7831): 757-762, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33057194

ABSTRACT

De novo mutations in protein-coding genes are a well-established cause of developmental disorders [1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations [1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
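The abstract describes the enrichment test only as simulation-based. As a rough illustration of the general idea, and not the published method, the Python sketch below estimates a Monte Carlo p-value for a single gene, assuming de novo counts follow a Poisson distribution with mean 2 × trios × per-gene mutation rate. The single per-gene rate and the observed count are hypothetical; realistic null models stratify rates by variant consequence.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1

def simulated_enrichment_p(observed: int, mutation_rate: float, n_trios: int,
                           n_sims: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo p-value for seeing >= `observed` de novo mutations in one gene.

    Under this simplified null, each child carries two newly transmitted haploid
    genomes, so the expected count is 2 * n_trios * mutation_rate.
    """
    rng = random.Random(seed)
    expected = 2 * n_trios * mutation_rate
    at_least_as_extreme = sum(
        sample_poisson(expected, rng) >= observed for _ in range(n_sims)
    )
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (at_least_as_extreme + 1) / (n_sims + 1)

# Hypothetical gene: rate 1e-5 per haploid genome, 6 observed de novo mutations,
# using the 31,058 trios reported in the abstract.
p_value = simulated_enrichment_p(observed=6, mutation_rate=1e-5, n_trios=31_058)
print(p_value)
```

With a fixed seed the estimate is reproducible; the add-one correction means the smallest reportable p-value is 1/(n_sims + 1), so detecting genome-wide significance would need far more simulations or an analytic tail.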


Subjects
DNA Mutational Analysis , Data Analysis , Databases, Genetic , Datasets as Topic , Delivery of Health Care/statistics & numerical data , Developmental Disabilities/genetics , Genetic Diseases, Inborn/genetics , Cohort Studies , DNA Copy Number Variations/genetics , Developmental Disabilities/diagnosis , Europe , Female , Genetic Diseases, Inborn/diagnosis , Germ-Line Mutation/genetics , Haploinsufficiency/genetics , Humans , Male , Mutation, Missense/genetics , Penetrance , Perinatal Death , Sample Size
3.
Am J Hum Genet ; 108(11): 2186-2194, 2021 Nov 04.
Article in English | MEDLINE | ID: mdl-34626536

ABSTRACT

Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases.
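The abstract says only that InDelible searches short-read data for split-read clusters characteristic of SV breakpoints. The Python sketch below illustrates that clustering idea in general terms, not InDelible's actual algorithm: it groups hypothetical clip-point coordinates (chromosome, position) that lie close together and keeps well-supported groups. The `max_gap` and `min_support` values are assumptions, not InDelible parameters.

```python
from typing import List, Tuple

Breakpoint = Tuple[str, int]          # (chromosome, clip position of one split read)
Cluster = Tuple[str, int, int, int]   # (chromosome, first position, last position, reads)

def cluster_split_reads(breakpoints: List[Breakpoint], max_gap: int = 5,
                        min_support: int = 3) -> List[Cluster]:
    """Group clip positions lying within `max_gap` bp of the previous one on the same
    chromosome; keep groups supported by at least `min_support` reads."""
    clusters: List[Cluster] = []

    def keep_if_supported(members: List[Breakpoint]) -> None:
        if len(members) >= min_support:
            clusters.append((members[0][0], members[0][1], members[-1][1], len(members)))

    group: List[Breakpoint] = []
    for chrom, pos in sorted(breakpoints):
        # Start a new group on a chromosome change or a gap larger than max_gap.
        if group and (chrom != group[-1][0] or pos - group[-1][1] > max_gap):
            keep_if_supported(group)
            group = []
        group.append((chrom, pos))
    keep_if_supported(group)
    return clusters

# Hypothetical clip points: two tight clusters and one isolated read.
reads = [("chr1", 100), ("chr1", 101), ("chr1", 101), ("chr1", 102), ("chr1", 500),
         ("chr2", 900), ("chr2", 901), ("chr2", 902), ("chr2", 903)]
print(cluster_split_reads(reads))
```

A real caller would also compare the clipped sequence between reads and against the reference; position alone is used here to keep the sketch short.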


Subjects
Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Exome Sequencing/methods , Child , Female , Humans , Male , Methyl-CpG-Binding Protein 2/genetics
4.
Am J Hum Genet ; 108(6): 1083-1094, 2021 Jun 03.
Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.
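The coverage figure is a simple proportion: the share of 5' UTR positions whose read depth meets some cutoff. The abstract does not state what counts as "well covered", so the 20× default in this Python sketch is an assumption, as are the example depths.

```python
from typing import Iterable

def fraction_well_covered(depths: Iterable[int], min_depth: int = 20) -> float:
    """Share of positions whose read depth is at least `min_depth` (assumed cutoff)."""
    values = list(depths)
    if not values:
        return 0.0
    return sum(d >= min_depth for d in values) / len(values)

# Hypothetical per-base depths across one 5' UTR.
utr_depths = [0, 5, 20, 30, 19, 25]
print(fraction_well_covered(utr_depths))
```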


Subjects
5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing
5.
Genet Med ; 23(3): 571-575, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33149276

ABSTRACT

PURPOSE: Automated variant filtering is an essential part of diagnostic genome-wide sequencing but may generate false negative results. We sought to investigate whether some previously identified pathogenic variants may be routinely excluded by standard variant filtering pipelines. METHODS: We evaluated variants that were previously classified as pathogenic or likely pathogenic in ClinVar in known developmental disorder genes using exome sequence data from the Deciphering Developmental Disorders (DDD) study. RESULTS: Of these ClinVar pathogenic variants, 3.6% were identified among 13,462 DDD probands, and 1134/1352 (83.9%) had already been independently communicated to clinicians using DDD variant filtering pipelines as plausibly pathogenic. The remaining 218 variants failed consequence, inheritance, or other automated variant filters. Following clinical review of these additional variants, we were able to identify 112 variants in 107 (0.8%) DDD probands as potential diagnoses. CONCLUSION: Lower minor allele frequency (<0.0005%) and higher gold star review status in ClinVar (>1 star) are good predictors of a previously identified variant being plausibly diagnostic for developmental disorders. However, around half of previously identified pathogenic variants excluded by automated variant filtering did not appear to be disease-causing, underlining the continued need for clinical evaluation of candidate variants as part of the diagnostic process.
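The two predictors named in the conclusion can be written as a simple screen. Note the unit: 0.0005% corresponds to an allele-frequency fraction of 0.0005/100 = 5 × 10⁻⁶. The Python sketch below is hypothetical (the class, field names and variant IDs are illustrative); the study reports these as predictors rather than a hard filter, and flagged candidates would still need clinical review.

```python
from dataclasses import dataclass

# 0.0005% written as a fraction: 0.0005 / 100 = 5e-6.
MAF_CUTOFF = 5e-6
MIN_REVIEW_STARS = 2   # "more than 1 star", i.e. at least 2 of ClinVar's 0-4 stars

@dataclass
class ClinVarVariant:
    variant_id: str        # hypothetical identifier
    maf: float             # minor allele frequency as a fraction, not a percentage
    review_stars: int      # ClinVar review status expressed as gold stars (0-4)

def flag_as_plausibly_diagnostic(v: ClinVarVariant) -> bool:
    """Hypothetical screen combining the two predictors named in the abstract."""
    return v.maf < MAF_CUTOFF and v.review_stars >= MIN_REVIEW_STARS

candidates = [
    ClinVarVariant("var_1", maf=1e-6, review_stars=2),
    ClinVarVariant("var_2", maf=1e-6, review_stars=1),
    ClinVarVariant("var_3", maf=2e-5, review_stars=3),
]
flagged = [v.variant_id for v in candidates if flag_as_plausibly_diagnostic(v)]
print(flagged)
```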


Subjects
Databases, Genetic , Exome , Gene Frequency , Humans , Exome Sequencing
6.
Lancet ; 393(10173): 747-757, 2019 Feb 23.
Article in English | MEDLINE | ID: mdl-30712880

ABSTRACT

BACKGROUND: Fetal structural anomalies, which are detected by ultrasonography, have a range of genetic causes, including chromosomal aneuploidy, copy number variations (CNVs; which are detectable by chromosomal microarrays), and pathogenic sequence variants in developmental genes. Testing for aneuploidy and CNVs is routine during the investigation of fetal structural anomalies, but there is little information on the clinical usefulness of genome-wide next-generation sequencing in the prenatal setting. We therefore aimed to evaluate the proportion of fetuses with structural abnormalities that had identifiable variants in genes associated with developmental disorders when assessed with whole-exome sequencing (WES). METHODS: In this prospective cohort study, two groups in Birmingham and London recruited patients from 34 fetal medicine units in England and Scotland. We used WES to evaluate the presence of genetic variants in developmental disorder genes (diagnostic genetic variants) in a cohort of fetuses with structural anomalies and samples from their parents, after exclusion of aneuploidy and large CNVs. Women were eligible for inclusion if they were undergoing invasive testing for identified nuchal translucency or structural anomalies in their fetus, as detected by ultrasound after 11 weeks of gestation. The partners of these women also had to consent to participate. Sequencing results were interpreted with a targeted virtual gene panel for developmental disorders that comprised 1628 genes. Genetic results related to fetal structural anomaly phenotypes were then validated and reported postnatally. The primary endpoint, which was assessed in all fetuses, was the detection of diagnostic genetic variants considered to have caused the fetal developmental anomaly. FINDINGS: The cohort was recruited between Oct 22, 2014, and June 29, 2017, and clinical data were collected until March 31, 2018.
After exclusion of fetuses with aneuploidy and CNVs, 610 fetuses with structural anomalies and 1202 matched parental samples (analysed as 596 fetus-parental trios, including two sets of twins, and 14 fetus-parent dyads) were analysed by WES. After bioinformatic filtering and prioritisation according to allele frequency and effect on protein and inheritance pattern, 321 genetic variants (representing 255 potential diagnoses) were selected as potentially pathogenic genetic variants (diagnostic genetic variants), and these variants were reviewed by a multidisciplinary clinical review panel. A diagnostic genetic variant was identified in 52 (8·5%; 95% CI 6·4-11·0) of 610 fetuses assessed and an additional 24 (3·9%) fetuses had a variant of uncertain significance that had potential clinical usefulness. Detection of diagnostic genetic variants enabled us to distinguish between syndromic and non-syndromic fetal anomalies (eg, congenital heart disease only vs a syndrome with congenital heart disease and learning disability). Diagnostic genetic variants were present in 22 (15·4%) of 143 fetuses with multisystem anomalies (ie, more than one fetal structural anomaly), nine (11·1%) of 81 fetuses with cardiac anomalies, and ten (15·4%) of 65 fetuses with skeletal anomalies; these phenotypes were most commonly associated with diagnostic variants. However, diagnostic genetic variants were least common in fetuses with isolated increased nuchal translucency (≥4·0 mm) in the first trimester (in three [3·2%] of 93 fetuses). INTERPRETATION: WES facilitates genetic diagnosis of fetal structural anomalies, which enables more accurate predictions of fetal prognosis and risk of recurrence in future pregnancies. However, the overall detection of diagnostic genetic variants in a prospectively ascertained cohort with a broad range of fetal structural anomalies is lower than that suggested by previous smaller-scale studies of fewer phenotypes. 
WES improved the identification of genetic disorders in fetuses with structural abnormalities; however, before clinical implementation, careful consideration should be given to case selection to maximise clinical usefulness. FUNDING: UK Department of Health and Social Care and The Wellcome Trust.
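The reported yield can be checked approximately from the counts (52 of 610 fetuses). The abstract does not say which interval method was used; the Python sketch below assumes an exact Clopper-Pearson interval, computed with only the standard library by bisection on the binomial cumulative distribution.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def _solve_increasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on (0, 1) for the p at which f(p) == target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= k) rises to alpha/2.
    lower = 0.0 if k == 0 else _solve_increasing(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) falls to alpha/2.
    upper = 1.0 if k == n else _solve_increasing(lambda p: 1.0 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

estimate = 52 / 610
low, high = clopper_pearson(52, 610)
print(f"{estimate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The exact interval is slightly conservative compared with Wilson or normal-approximation intervals, which is one reason published bounds can differ in the first decimal place depending on the method.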


Subjects
Abnormal Karyotype/statistics & numerical data , Congenital Abnormalities/genetics , Exome Sequencing/statistics & numerical data , Fetal Development/genetics , Fetus/abnormalities , Abnormal Karyotype/embryology , Abortion, Eugenic/statistics & numerical data , Abortion, Spontaneous/epidemiology , Congenital Abnormalities/diagnosis , Congenital Abnormalities/epidemiology , DNA Copy Number Variations/genetics , Female , Fetus/diagnostic imaging , Humans , Infant, Newborn , Live Birth/epidemiology , Male , Nuchal Translucency Measurement , Parents , Perinatal Death/etiology , Pregnancy , Prospective Studies , Stillbirth/epidemiology , Exome Sequencing/methods
7.
Genet Med ; 21(5): 1065-1073, 2019 May.
Article in English | MEDLINE | ID: mdl-30293990

ABSTRACT

PURPOSE: To determine the diagnostic yield of combined exome sequencing (ES) and autopsy in fetuses/neonates with prenatally identified structural anomalies resulting in termination of pregnancy, intrauterine, neonatal, or early infant death. METHODS: ES was undertaken in 27 proband/parent trios following full autopsy. Candidate pathogenic variants were classified by a multidisciplinary clinical review panel using American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: A genetic diagnosis was established in ten cases (37%). Pathogenic/likely pathogenic variants were identified in nine different genes including four de novo autosomal dominant, three homozygous autosomal recessive, two compound heterozygous autosomal recessive, and one X-linked. KMT2D variants (associated with Kabuki syndrome postnatally) occurred in two cases. Pathogenic variants were identified in 5/13 (38%) cases with multisystem anomalies, in 2/4 (50%) cases with fetal akinesia deformation sequence, and in 1/4 (25%) cases each with cardiac and brain anomalies and hydrops fetalis. No pathogenic variants were detected in fetuses with genitourinary (1), skeletal (1), or abdominal (1) abnormalities. CONCLUSION: This cohort demonstrates the clinical utility of molecular autopsy with ES to identify an underlying genetic cause in structurally abnormal fetuses/neonates. These molecular findings provided parents with an explanation of the developmental abnormality, delineated the recurrence risks, and assisted the management of subsequent pregnancies.


Subjects
Congenital Abnormalities/genetics , Fetal Diseases/genetics , Prenatal Diagnosis/methods , Autopsy/methods , Cohort Studies , Congenital Abnormalities/diagnosis , Exome/genetics , Female , Fetal Diseases/diagnosis , Fetus/diagnostic imaging , Humans , Infant, Newborn , Male , Pregnancy , Exome Sequencing/methods
8.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subjects
Databases, Protein , Proteins/classification , Proteome/chemistry , Sequence Alignment , Sequence Analysis, Protein , Molecular Sequence Annotation
9.
Nucleic Acids Res ; 43(Database issue): D130-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392425

ABSTRACT

The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.


Subjects
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Genomics , Internet , Molecular Sequence Annotation , Nucleic Acid Conformation , Nucleotide Motifs , RNA, Long Noncoding/chemistry , RNA, Untranslated/classification , Software
10.
Nucleic Acids Res ; 42(Database issue): D222-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24288371

ABSTRACT

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.


Subjects
Databases, Protein , Sequence Alignment , Sequence Analysis, Protein , Internet , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, DNA
11.
BMC Bioinformatics ; 15: 1, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24383880

ABSTRACT

BACKGROUND: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation. RESULTS: The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit and from size-exclusion chromatography, the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other. CONCLUSIONS: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a "mini-zincin". There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than being an ancestral state, phylogenetic analysis suggests that the mini-zincins are derived from larger proteins.
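Sequence motifs such as HEXXHXXGXXD, and the GxGYxYP domain signature described in a later entry, use X to mean "any residue", so they map directly onto regular expressions. A minimal Python sketch; the example sequences are made up for illustration and are not the Acel_2062 or BT2193 sequences.

```python
import re
from typing import List, Tuple

def motif_to_regex(motif: str) -> str:
    """Translate a motif in which X or x means 'any residue' into a regular expression."""
    return "".join("." if ch in "Xx" else re.escape(ch) for ch in motif)

def find_motif(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for every match, overlapping ones included."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

print(find_motif("MKTAHEALHSIGMTDLLK", "HEXXHXXGXXD"))
print(find_motif("AAGQGYAYPAA", "GxGYxYP"))
```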


Subjects
Bacterial Proteins/chemistry , Metalloproteases/chemistry , Zinc/chemistry , Actinomycetales/chemistry , Actinomycetales/enzymology , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/metabolism , Dimerization , Metalloproteases/metabolism , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Subunits , Sequence Alignment , Zinc/metabolism
12.
BMC Bioinformatics ; 15: 196, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938123

ABSTRACT

BACKGROUND: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. RESULTS: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. CONCLUSIONS: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.


Subjects
Glycoside Hydrolases/chemistry , Bacteroides/chemistry , Bacteroides/enzymology , Biocatalysis , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Models, Molecular , Phylogeny , Protein Structure, Tertiary , Structural Homology, Protein
13.
BMC Bioinformatics ; 15: 112, 2014 Apr 17.
Article in English | MEDLINE | ID: mdl-24742328

ABSTRACT

BACKGROUND: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. RESULTS: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N terminus and one at the C terminus. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases; however, such domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and its possible functional implications. CONCLUSIONS: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.


Subjects
Bacterial Proteins/chemistry , Glycoside Hydrolases/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacteroides/enzymology , Computational Biology , Gastrointestinal Tract/microbiology , Genomics , Glycoside Hydrolases/genetics , Humans , Protein Structure, Tertiary
14.
Nucleic Acids Res ; 40(Database issue): D290-301, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22127870

ABSTRACT

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
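In Pfam, each family's gathering threshold is recorded on its GA line as a pair of bit-score cutoffs, one for the whole sequence and one for each domain, and HMMER can apply them with its `--cut_ga` option. The Python sketch below only illustrates that inclusion rule; the threshold values are hypothetical, since each family has its own curated cutoffs.

```python
from dataclasses import dataclass

@dataclass
class GatheringThreshold:
    sequence_bits: float   # whole-sequence bit-score cutoff
    domain_bits: float     # per-domain bit-score cutoff

def passes_gathering(seq_score: float, domain_score: float, ga: GatheringThreshold) -> bool:
    """A hit counts as a family member only if both its sequence score and its
    domain score reach the family-specific gathering threshold."""
    return seq_score >= ga.sequence_bits and domain_score >= ga.domain_bits

family_ga = GatheringThreshold(sequence_bits=25.0, domain_bits=25.0)  # hypothetical values
print(passes_gathering(30.2, 27.5, family_ga))
```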


Subjects
Databases, Protein , Proteins/classification , Encyclopedias as Topic , Internet , Protein Structure, Tertiary , Sequence Homology, Amino Acid
15.
Sci Rep ; 14(1): 8708, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622173

ABSTRACT

Recent work has revealed an important role for rare, incompletely penetrant inherited coding variants in neurodevelopmental disorders (NDDs). Additionally, we have previously shown that common variants contribute to risk for rare NDDs. Here, we investigate whether common variants exert their effects by modifying gene expression, using multi-cis-expression quantitative trait loci (cis-eQTL) prediction models. We first performed a transcriptome-wide association study for NDDs using 6987 probands from the Deciphering Developmental Disorders (DDD) study and 9720 controls, and found one gene, RAB2A, that passed multiple testing correction (p = 6.7 × 10⁻⁷). We then investigated whether cis-eQTLs modify the penetrance of putatively damaging, rare coding variants inherited by NDD probands from their unaffected parents in a set of 1700 trios. We found no evidence that unaffected parents transmitting putatively damaging coding variants had higher genetically-predicted expression of the variant-harboring gene than their child. In probands carrying putatively damaging variants in constrained genes, the genetically-predicted expression of these genes in blood was lower than in controls (p = 2.7 × 10⁻³). However, results for proband-control comparisons were inconsistent across different sets of genes, variant filters and tissues. We find limited evidence that common cis-eQTLs modify penetrance of rare coding variants in a large cohort of NDD probands.
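In a transcriptome-wide association study, a gene's genetically predicted expression is typically a weighted sum of allele dosages at its cis-eQTL variants, with the weights taken from a pre-trained prediction model. A minimal Python sketch with hypothetical variant IDs and weights; the abstract does not describe the specific prediction models used.

```python
from typing import Dict

def predicted_expression(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    Variants in the model but absent from the genotype data are skipped here; real
    pipelines usually also report how many model variants were matched.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)

model_weights = {"variant_a": 0.5, "variant_b": -0.2}   # hypothetical cis-eQTL weights
individual = {"variant_a": 2, "variant_b": 1}           # hypothetical genotype dosages
score = predicted_expression(model_weights, individual)
print(score)
```

Comparing such scores between a transmitting parent and the child, as the study describes, amounts to evaluating the same weights on each person's dosages.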


Subjects
Neurodevelopmental Disorders , Polymorphism, Single Nucleotide , Child , Humans , Penetrance , Quantitative Trait Loci/genetics , Neurodevelopmental Disorders/genetics , Transcriptome
16.
Nat Genet ; 56(10): 2046-2053, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39313616

ABSTRACT

Autosomal recessive coding variants are well-known causes of rare disorders. We quantified the contribution of these variants to developmental disorders in a large, ancestrally diverse cohort comprising 29,745 trios, of whom 20.4% had genetically inferred non-European ancestries. The estimated fraction of patients attributable to exome-wide autosomal recessive coding variants ranged from ~2-19% across genetically inferred ancestry groups and was significantly correlated with average autozygosity. Established autosomal recessive developmental disorder-associated (ARDD) genes explained 84.0% of the total autosomal recessive coding burden, and 34.4% of the burden in these established genes was explained by variants not already reported as pathogenic in ClinVar. Statistical analyses identified two novel ARDD genes: KBTBD2 and ZDHHC16. This study expands our understanding of the genetic architecture of developmental disorders across diverse genetically inferred ancestry groups and suggests that improving strategies for interpreting missense variants in known ARDD genes may help diagnose more patients than discovering the remaining genes.


Subjects
Developmental Disabilities; Genes, Recessive; Humans; Developmental Disabilities/genetics; Female; Male; Exome/genetics; Genetic Predisposition to Disease; Genetic Variation; Acyltransferases/genetics; Cohort Studies; Mutation, Missense
17.
BMC Bioinformatics ; 14: 327, 2013 Nov 19.
Article in English | MEDLINE | ID: mdl-24246060

ABSTRACT

BACKGROUND: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse, and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, and include both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found. RESULTS: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of a representative of each of these domains have been solved by X-ray crystallography: BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ]. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to propose a hypothesis concerning their role. CONCLUSIONS: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined, either in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinases), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).


Subjects
Bacterial Proteins/chemistry; Nucleocytoplasmic Transport Proteins/chemistry; Peptide Mapping/methods; Bacteroides/chemistry; Campylobacter jejuni/chemistry; Catalytic Domain; Crystallography, X-Ray; Ligands; Protein Folding; Protein Multimerization; Protein Structure, Tertiary; Ruminococcus/chemistry
18.
BMC Bioinformatics ; 14: 341, 2013 Nov 26.
Article in English | MEDLINE | ID: mdl-24274019

ABSTRACT

BACKGROUND: A novel, highly conserved protein domain, DUF162 [Pfam:PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence and structure, and on recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family. RESULTS: The Joint Center for Structural Genomics (JCSG) solved the first crystal structure [PDB:2G40] from the LUD domain family: the LutC protein of Deinococcus radiodurans, encoded by ORF DR_1909. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome. CONCLUSIONS: We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans, are discussed.


Subjects
Bacterial Proteins/metabolism; Deinococcus/chemistry; Deinococcus/metabolism; Lactic Acid/metabolism; Amino Acid Sequence; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Crystallography, X-Ray; Deinococcus/genetics; Humans; Microbiota/radiation effects; Molecular Sequence Data; Protein Structure, Tertiary
19.
BMC Bioinformatics ; 14: 265, 2013 Sep 03.
Article in English | MEDLINE | ID: mdl-24004689

ABSTRACT

BACKGROUND: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis, we hope to provide insight into novel biology. RESULTS: We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria. CONCLUSIONS: Based on our analyses, we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity to peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.


Subjects
Bacterial Proteins/chemistry; Databases, Protein; Legionella pneumophila/chemistry; Amino Acid Sequence; Bacterial Proteins/genetics; Legionella pneumophila/genetics; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein
20.
Genet Med Open ; 1(1): 100836, 2023.
Article in English | MEDLINE | ID: mdl-39346101

ABSTRACT

Purpose: Structural mosaicism has previously been implicated in developmental disorders. We aimed to identify rare mosaic chromosomal alterations (MCAs) in probands with severe undiagnosed developmental disorders. Methods: We identified MCAs in genotyping array data from 12,530 probands in the Deciphering Developmental Disorders study using the mosaic chromosomal alterations caller MoChA. Results: We found 61 MCAs in 57 probands; many of these were tissue specific. In 23 of 26 (88.5%) cases in which the MCA was detected in saliva and blood was also available for analysis, the MCA could not be detected in blood. The MCAs included 20 polysomies, comprising either 1 arm of a chromosome or a whole chromosome, for which we were able to determine the timing of the error (25% mitosis, 40% meiosis I, and 35% meiosis II). Only 2 of the 57 probands (3.5%) in whom we found MCAs had another likely genetic diagnosis identified by exome sequencing, despite an overall diagnostic yield of ∼40% across the cohort. Conclusion: Our results show that identification of MCAs provides candidate diagnoses for previously undiagnosed patients with developmental disorders, potentially explaining ∼0.45% of cases in the Deciphering Developmental Disorders study. Nearly 90% of these MCAs would have remained undetected had only DNA from blood been analyzed.
