ABSTRACT
BACKGROUND: Approximately 4-8% of the world's population has a rare disease. Rare diseases are often difficult to diagnose, and many have no approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches, but it is limited by the difficulty of interpreting the pathogenicity of novel variants and of communicating known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. RESULTS: This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function reported in the literature on the SLC6A8 gene, which is associated with X-linked Creatine Transporter Deficiency (CTD), were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and their pathogenicity assignments were compared with impact algorithm predictions. Of the pathogenic variants found in PubMed articles, 24% were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. CONCLUSIONS: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants.
Developing text mining workflows that use natural language processing to identify diseases, genes, and variants, combined with impact prediction algorithms and integrated with details on clinical phenotypes and functional assessments, may be a promising approach to scaling literature mining of variants and assigning correct pathogenicity. The curated variant list created by this effort includes contextual details that can support future variant curation efforts for rare diseases.
Subjects
Creatine , Rare Diseases , Humans , Rare Diseases/genetics , Introns , Algorithms , Nucleotides
ABSTRACT
In hemophilia A and B, analysis of the F8 and F9 gene variants enables carrier and prenatal diagnosis and prediction of the risk of developing inhibitors. The PedNet Registry prospectively collects clinical, genetic, and phenotypic data on more than 2000 children with hemophilia. The genetic reports of F8/F9 gene variants were uniformly classified according to Human Genome Variation Society nomenclature and reevaluated using international population- and disease-specific databases, a literature survey and, where applicable, computational predictive programs. We report 88 novel variants in the F8 and F9 genes: 80 fulfilling criteria for Class 5 (pathogenic), six for Class 4 (likely pathogenic), and two for Class 3 (variant of unknown significance) of the American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines, together with information on the respective phenotype and inhibitor formation. The study highlights the need to reevaluate and update earlier genetic reports in hemophilia, both locally and in variant databases, in light of changed nomenclature and new guidelines.
Subjects
Factor IX/genetics , Factor VIII/genetics , Genetic Variation , Guidelines as Topic , Hemophilia A/diagnosis , Hemophilia A/genetics , Hemophilia B/diagnosis , Hemophilia B/genetics , Prenatal Diagnosis , Registries , Societies, Scientific , Alternative Splicing/genetics , Female , Genetic Predisposition to Disease , Humans , Mutation, Missense/genetics , Phenotype , Pregnancy
ABSTRACT
The last decade has shown that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the cause of disease in only 35% of fALS cases, compared with ~70% in Europe and North America. Whole exome sequencing yielded a discovery rate of 42% (53/127). Whole genome analyses of 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well-established fALS gene variants, solidifying the concept of incomplete penetrance in ALS. Genome-wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease-associated candidates points to a significant enrichment for cell cycle- and division-related genes. Within this network, literature text mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS-related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).
Subjects
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Internet , Phenotype , Turkey , Whole Genome Sequencing
ABSTRACT
The domestic dog serves as an excellent model for investigating the genetic basis of disease. More than 400 heritable traits analogous to human diseases have been described in dogs. To further canine medical genetics research, we established the Dog Biomedical Variant Database Consortium (DBVDC) and present a comprehensive list of functionally annotated genome variants identified by whole genome sequencing of 582 dogs from 126 breeds and eight wolves. The genomes used in the study have a minimum coverage of 10× and an average coverage of ~24×. In total, we identified 23 133 692 single-nucleotide variants (SNVs) and 10 048 038 short indels, 93% of which were previously undescribed. On average, each individual dog genome carried ~4.1 million single-nucleotide and ~1.4 million short-indel variants relative to the reference genome assembly. About 2% of the variants were located in coding regions of annotated genes and loci. Variant effect classification showed 247 141 SNVs and 99 562 short indels having moderate or high impact on 11 267 protein-coding genes. On average, each genome contained heterozygous loss-of-function variants in 30 potentially embryonic lethal genes and 97 genes associated with developmental disorders. More than 50 inherited disorders and traits have been unravelled using the DBVDC variant catalogue, enabling genetic testing for breeding and diagnostics. This resource of annotated variants and their corresponding genotype frequencies is a highly useful tool for identifying potential causative variants for rare inherited disorders in dogs.
Subjects
Dogs/genetics , Whole Genome Sequencing , Wolves/genetics , Animals , Disease Models, Animal , Genes, Lethal , Phylogeny
ABSTRACT
BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used, as they massively reduce the number of putative variants in a single step. In practice, variant filtering is often done either by using all variants from the variant database (the absence-approach, which assumes that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allele frequency > 1% (the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach that we named the quantile-based approach. This approach applies variable rather than static frequency thresholds, and these thresholds are calculated from prior knowledge of disease prevalence, inheritance models, database size, and database characteristics. RESULTS: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, the quantile-based approach deals more appropriately than the 1%-approach with the variable allele frequencies of disease-causing alleles in variant databases, and thus allows better control of the number of false positives. We also introduce an alternative application of variant databases and the quantile-based approach: when disease-causing variants in variant databases deviated substantially from the theoretical expectations calculated with the quantile-based approach, their genotype-phenotype association had to be reconsidered in 12 out of 13 cases.
CONCLUSIONS: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided.
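The quantile-based idea can be illustrated with a minimal sketch. This is not the authors' R calculator: it assumes Hardy-Weinberg equilibrium, a single causal allele with complete penetrance, and database individuals sampled independently of disease status, so that the count of a disease allele among the 2N database chromosomes follows a binomial distribution whose upper quantile serves as the filtering threshold. The function names (`max_disease_allele_freq`, `binomial_quantile`, `count_threshold`, `passes_filter`) are hypothetical.

```python
import math


def max_disease_allele_freq(prevalence: float, model: str) -> float:
    """Upper bound on the frequency of a single causal allele (hypothetical helper).

    Assumes Hardy-Weinberg equilibrium, complete penetrance and no allelic
    heterogeneity: recessive -> q^2 = prevalence; dominant -> 2q ~= prevalence.
    """
    if model == "recessive":
        return math.sqrt(prevalence)
    if model == "dominant":
        return prevalence / 2.0
    raise ValueError(f"unknown inheritance model: {model}")


def binomial_quantile(n: int, p: float, quantile: float) -> int:
    """Smallest k with P(X <= k) >= quantile for X ~ Binomial(n, p), 0 < p < 1.

    PMF terms are computed in log space with lgamma to avoid underflow for
    large database sizes.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= quantile:
            return k
    return n  # floating-point rounding kept the CDF just below the quantile


def count_threshold(prevalence: float, model: str, n_individuals: int,
                    quantile: float = 0.999) -> int:
    """Largest allele count a disease-causing variant is expected to reach in a
    database of n_individuals diploid genomes (2 * n_individuals chromosomes)."""
    q = max_disease_allele_freq(prevalence, model)
    return binomial_quantile(2 * n_individuals, q, quantile)


def passes_filter(observed_count: int, threshold: int) -> bool:
    """Keep a candidate variant only if its database count does not exceed the threshold."""
    return observed_count <= threshold
```

For example, `count_threshold(1e-4, "recessive", 500)` gives the count above which a candidate for a recessive disease with prevalence 1/10,000 would be discarded in a 500-person database. Unlike a fixed 1% cut-off, the threshold grows with database size and depends on prevalence and inheritance model, which is the property the abstract attributes to the quantile-based approach.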
Subjects
Databases, Genetic , Genetic Association Studies , Alleles , Congenital Abnormalities/genetics , Congenital Abnormalities/pathology , Gene Frequency , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Brazilians are highly admixed, with ancestry from Europe, Africa, the Americas, and Asia, yet they remain underrepresented in genomic databanks. Here we present a collection of exomic variants from 609 elderly Brazilians in a census-based cohort (SABE609) with comprehensive phenotyping. Variants were deposited in ABraOM (Online Archive of Brazilian Mutations), a web-based public database. Population-representative phenotype and genotype repositories are essential for variant interpretation through allele frequency filtering; because elderly individuals are less likely to harbor pathogenic mutations for early- and adult-onset diseases, such variant databases are of great interest. Of the more than 2.3 million variants in the present cohort, 1,282,008 were high-confidence calls. Importantly, 207,621 variants were absent from major public databases. We found 9,791 potential loss-of-function variants, with about 300 mutations per individual. Pathogenic variants in clinically relevant genes (ACMG) were observed in 1.15% of the individuals and were correlated with clinical phenotype. We estimated the incidence of prevalent recessive disorders from heterozygote frequencies and concluded that such estimates rely on appropriate pathogenicity assertion. These observations illustrate the relevance of collecting demographic data from diverse, poorly characterized populations. Census-based datasets of aged individuals with comprehensive phenotyping are an invaluable resource for improving the understanding of variant pathogenicity.
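Estimating incidence from heterozygote frequency rests on Hardy-Weinberg reasoning, and a minimal sketch makes the dependence on pathogenicity assertion explicit. It assumes Hardy-Weinberg equilibrium, complete penetrance, and that carriers of distinct pathogenic alleles in the same gene can be pooled; the `CarrierCount` record and `estimate_recessive_incidence` function are hypothetical and not part of ABraOM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CarrierCount:
    variant: str        # illustrative variant label
    heterozygotes: int  # number of heterozygous carriers observed in the cohort
    pathogenic: bool    # whether the variant is asserted pathogenic


def estimate_recessive_incidence(counts: List[CarrierCount], n_individuals: int) -> float:
    """Expected incidence of an autosomal recessive disorder (hypothetical helper).

    Each heterozygote carries one copy, so the frequency of allele i is
    het_i / (2 * n_individuals). Only alleles asserted pathogenic are summed
    into the total disease-allele frequency q; under Hardy-Weinberg
    equilibrium the expected fraction of biallelic (affected) individuals is q**2.
    """
    carriers = sum(c.heterozygotes for c in counts if c.pathogenic)
    q = carriers / (2 * n_individuals)
    return q ** 2
```

Because incidence scales with the square of q, a single misclassified frequent variant can inflate the estimate several-fold: in a hypothetical cohort of 1,000 people, 10 carriers of pathogenic alleles give 2.5e-5, but wrongly counting a benign variant with 20 carriers raises it to 2.25e-4.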
Subjects
Exome , Genetics, Population , Aged , Aged, 80 and over , Aging , Alleles , Brazil , Cohort Studies , Computational Biology , Databases, Genetic , Ethnicity , Female , Gene Frequency , Genetic Variation , Genotype , Heterozygote , Humans , Incidence , Male , Middle Aged , Mutation , Phenotype
ABSTRACT
Serotonin type 3 (5-HT3) receptors are ligand-gated ion channels formed by five subunits (5-HT3A-E), which are encoded by the HTR3A, HTR3B, HTR3C, HTR3D, and HTR3E genes. Functional receptors are pentameric complexes of diverse composition. Different receptor subtypes confer a predisposition to nausea and vomiting during chemotherapy, during pregnancy, and following surgery. In addition, different subtypes contribute to neurogastroenterologic disorders such as irritable bowel syndrome (IBS) and eating disorders, as well as to comorbid psychiatric conditions. 5-HT3 receptor antagonists are established treatments for emesis and IBS and are beneficial in the treatment of psychiatric diseases. Several case-control and pharmacogenetic studies have demonstrated an association between HTR3 variants and psychiatric and neurogastroenterologic phenotypes. Recently, the potential of these variants as predictors of nausea and vomiting and in the treatment of psychiatric disorders has become evident. This information is now available in the serotonin receptor 3 HTR3 gene allelic variant database (www.htr3.uni-hd.de), which contains five sub-databases, one for each of the five serotonin receptor genes HTR3A-E. Information on HTR3 variants, their functional relevance, associated phenotypes, and pharmacogenetic data such as drug response and side effects is available. This central information pool should help clinicians and scientists evaluate their findings and use the relevant information in subsequent genotype-phenotype correlation studies and pharmacogenetic approaches.
Subjects
Alleles , Databases, Nucleic Acid , Genetic Variation , Receptors, Serotonin, 5-HT3/genetics , Computational Biology/methods , Gastrointestinal Diseases/drug therapy , Gastrointestinal Diseases/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Mental Disorders/drug therapy , Mental Disorders/genetics , Pharmacogenetics/methods , Search Engine , Serotonin 5-HT3 Receptor Agonists/pharmacology , Serotonin 5-HT3 Receptor Agonists/therapeutic use , Serotonin 5-HT3 Receptor Antagonists/pharmacology , Serotonin 5-HT3 Receptor Antagonists/therapeutic use , Treatment Outcome , User-Computer Interface
ABSTRACT
A recent review identified 60 common inherited renal diseases caused by DNA variants in 132 different genes. These diseases can be diagnosed with DNA sequencing, but each gene probably also has around a thousand normal variants. Many more normal variants have been characterised by individual laboratories than are reported in the literature or found in publicly accessible collections. At present, testing laboratories must assess each novel change they identify for pathogenicity, even when this has already been done elsewhere, and distinguishing normal from disease-associated variants has become a particular issue with the recent surge in exome sequencing and gene discovery projects. The Human Variome Project recommends establishing gene-specific DNA variant databases to facilitate the sharing of DNA variants and of decisions about likely disease causation. Databases improve diagnostic accuracy and testing efficiency, and reduce costs. They also help with genotype-phenotype correlations and predictive algorithms. The Human Variome Project advocates databases that use standardised descriptions, are up to date, include clinical information, and are freely available. Currently, the genes affected in the most common inherited renal diseases correspond to 350 different variant databases, many of which are incomplete or have insufficient clinical detail for genotype-phenotype correlations. Assistance is needed from nephrologists to maximise the usefulness of these databases for the diagnosis and management of inherited renal disease.
Subjects
Databases, Nucleic Acid/standards , Kidney Diseases/genetics , Genetic Predisposition to Disease/genetics , Humans , Mutation
ABSTRACT
BACKGROUND: A population-specific variant database of inborn errors of metabolism (IEMs) is essential for precise genetic diagnosis and disease prevention. Here we present a systematic review of clinically relevant variants in 13 IEM genes reported among Chinese patients. METHODS: The following electronic databases were systematically searched for the 13 IEM genes: PubMed-NCBI, China National Knowledge Infrastructure, and Wanfang. Patient data were extracted from eligible articles and recorded in an Excel spreadsheet on a case-by-case basis. RESULTS: A total of 218 articles (93 published in English and 125 in Chinese) were retrieved. After variant annotation and deduplication, 575 unique patients (241 from articles published in Chinese) were included in the population-specific variant database. Of these, 231 (40.17%) were identified by newborn screening and 344 (59.83%) by symptomatic presentation. Biallelic variants were observed in 525/575 (91.3%). Of the 581 unique variants identified, 83 (14.28%) were described ≥ 3 times and 97 (16.69%) were not recorded in ClinVar or HGMD. Four variants were reclassified as benign, and dozens of variants with unclear interpretation warrant further research. CONCLUSION: This review provides a unique resource on the well-characterized diseases and causative variants accumulated in the Chinese population and is a preliminary step toward building a Chinese genetic variation database of IEMs.
Subjects
East Asian People , Metabolism, Inborn Errors , Humans , Infant, Newborn , China , Genetic Variation
ABSTRACT
We developed a highly contiguous chromosome-level reference genome for North American bison to provide a platform for evaluating the conservation, ecological, evolutionary, and population genomics of this species. The assembly, generated from an F1 hybrid between a North American bison dam and a domestic cattle bull, exceeds other published bison genome assemblies in completeness and contiguity. To demonstrate its utility for genome-wide variant frequency estimation, we compiled a genomic variant database consisting of 3 true albino bison and 44 bison with wild-type pelage color. By examining genomic variants fixed in the albino cohort and absent in the controls, we identified a nonsynonymous single nucleotide polymorphism (SNP) on chromosome 29 in exon 3 of the tyrosinase gene (c.1114C>T). A TaqMan SNP Genotyping Assay was developed to genotype this SNP in a total of 283 animals across 29 herds. This assay confirmed that the variant was homozygous only in the 7 true albino bison included in this study. In addition, the only heterozygous animals identified were 2 dams with wild-type pelage color that had albino offspring. We therefore propose that this new high-quality bison genome assembly and incipient variant database provide a highly robust and informative resource for genomics investigations of this iconic North American species.
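The case-control screen described above (variants fixed in the albino cohort and absent in the controls) amounts to a simple genotype filter. A minimal sketch, assuming genotypes are encoded per animal as alternate-allele counts (0, 1, 2) with `None` for missing calls; the data layout and the function name are hypothetical, not the study's pipeline.

```python
from typing import Dict, List, Optional

# Per variant ID: one genotype per animal, coded as the number of alternate
# alleles (0, 1 or 2); None marks a missing call. Layout is an assumption.
Genotypes = Dict[str, List[Optional[int]]]


def fixed_in_cases_absent_in_controls(cases: Genotypes, controls: Genotypes) -> List[str]:
    """Return variant IDs that are homozygous-alternate in every called case and
    carry no alternate allele in any called control (hypothetical helper)."""
    hits = []
    for variant_id, case_gts in cases.items():
        case_called = [g for g in case_gts if g is not None]
        control_called = [g for g in controls.get(variant_id, []) if g is not None]
        if not case_called or not control_called:
            continue  # not evaluable: require at least one call in each group
        if all(g == 2 for g in case_called) and all(g == 0 for g in control_called):
            hits.append(variant_id)
    return hits
```

Variants lacking any control call are skipped rather than treated as absent, a conservative choice made for this sketch; with only 3 cases, many variants may pass such a filter by chance, which is why follow-up genotyping in a larger set of animals, as with the TaqMan assay, is needed.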
Subjects
Bison , Animals , Cattle , Bison/genetics , Genome , Chromosomes , Mutation , North America
ABSTRACT
Hereditary xanthinuria is a rare autosomal recessive disease caused by missense and loss-of-function variants in the xanthine dehydrogenase (XDH) or molybdenum cofactor sulfurase (MOCOS) genes. The aim of this study was to uncover variants underlying risk for xanthinuria in dogs. Affected dogs included two Manchester Terriers, three Cavalier King Charles Spaniels, an English Cocker Spaniel, a Dachshund, and a mixed-breed dog. Four putative causal variants were discovered: an XDH c.654G>A splice site variant that results in skipping of exon 8 (mixed-breed dog), a MOCOS c.232G>T splice site variant that results in skipping of exon 2 (Manchester Terriers), a MOCOS p.Leu46Pro missense variant (Dachshund), and a MOCOS p.Ala128Glyfs*30 frameshift variant that results in a premature stop codon (Cavalier King Charles Spaniels and English Cocker Spaniel). The two splice site variants suggest that the skipped regions are critical to the respective enzyme's function, though protein misfolding is an alternative explanation for the loss of function. The MOCOS p.Leu46Pro variant has not been previously reported in human or other animal cases and provides novel evidence that this residue is critical to MOCOS function. All variants were present in the homozygous state in affected dogs, indicating an autosomal recessive mode of inheritance. Allele frequencies of these variants in breed-specific populations ranged from 0 to 0.18. In conclusion, multiple diverse variants appear to be responsible for hereditary xanthinuria in dogs.
ABSTRACT
Aims: Genomic studies play a major role in characterizing variation between and within populations and in identifying causal relationships between genotypes and phenotypes. Analyses using databases such as gnomAD can provide insight into allele frequencies in large populations. Such frequencies have been reported for several countries and ethnic groups, but no such dataset yet exists for the Czech population. Patients and Methods: Whole-exome sequencing (WES) data from 222 individuals from the Czech Republic were analyzed with The Genome Analysis Toolkit best practices pipeline. These data were annotated with the ANNOVAR tool, and allele frequencies were computed. Results: We developed a database containing 300,111 variants in 17,512 genes. It is accessible through a simple web query at prot2hg.com/variantbrowser. Gene-based analyses identified the genes that are most tolerant to variation in our population. In addition, allele frequencies in our population were compared with those in the gnomAD database, and groups of variants that are frequent in our population but ultra-rare in gnomAD as a whole were identified. Conclusion: This tool should be useful for detecting local variants in Czech patients with neurogenetic diseases.
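The comparison of local allele frequencies with gnomAD can be illustrated with a minimal sketch. The cut-offs (local allele frequency of at least 1%, gnomAD allele frequency below 0.01%) and the dictionary inputs are illustrative assumptions; the abstract does not state which thresholds defined "frequent" and "ultra-rare", and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Allele frequency in a diploid cohort: alternate alleles / (2 * individuals)."""
    return alt_allele_count / (2 * n_individuals)


def locally_enriched(local_counts: Dict[str, int], n_individuals: int,
                     gnomad_af: Dict[str, float],
                     min_local_af: float = 0.01,
                     max_gnomad_af: float = 1e-4) -> List[Tuple[str, float, float]]:
    """Variants common in the local cohort but ultra-rare (or absent) in gnomAD.

    Variants missing from gnomad_af are treated as AF = 0. The thresholds are
    illustrative defaults, not values taken from the study. Results are sorted
    by local allele frequency, highest first.
    """
    hits = []
    for variant_id, alt_count in local_counts.items():
        local_af = allele_frequency(alt_count, n_individuals)
        ref_af = gnomad_af.get(variant_id, 0.0)
        if local_af >= min_local_af and ref_af < max_gnomad_af:
            hits.append((variant_id, local_af, ref_af))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

With 222 exomes (444 chromosomes), a 1% local frequency corresponds to only 5 alternate alleles, so in a cohort of this size such flags are best read as candidates for local founder variants rather than as precise frequency estimates.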
Subjects
Databases, Genetic , Nervous System Diseases/genetics , Neurodegenerative Diseases/genetics , Adult , Alleles , Czech Republic , Female , Gene Frequency/genetics , Genetic Variation/genetics , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Phenotype , Sequence Analysis, DNA/methods , Exome Sequencing/methods
ABSTRACT
Atypical hemolytic uremic syndrome (aHUS) and C3 glomerulopathy (C3G) are associated with loss of regulation of the alternative pathway of complement and its resulting overactivation. Because both are rare diseases, the genetic variants leading to aHUS and C3G were previously analysed in relatively small numbers of patients. To improve this analysis, data were pooled from six centres. A total of 610 rare variants for aHUS and 82 for C3G in 13 genes are presented in an interactive database. Using allele frequency comparisons with the Exome Aggregation Consortium as the reference population, patients with aHUS showed significantly more protein-altering ultrarare variants (allele frequency <0.01%) in five genes: CFH, CFI, CD46, C3, and DGKE. In patients with C3G, the corresponding association was found only for C3 and CFH. Protein structure analyses of these five proteins showed distinct differences in the positioning of these variants in C3 and FH. For aHUS, variants clustered at the C-terminus of FH, implicating changes in the binding of FH to host cell surfaces. For C3G, variants clustered at the N-terminal C3b binding site of FH, implicating changes in the fluid-phase regulation of C3b. We discuss the utility of the web database as a patient resource for clinicians.
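A gene-level burden comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's statistics: the consequence labels are a hypothetical set, "ultrarare" is applied as reference allele frequency < 1e-4 (0.01%), alleles are treated as independent, and a one-sided Fisher's exact test on qualifying versus other allele counts stands in for whatever test the authors used. All function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Illustrative set of protein-altering consequence labels (assumption)
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel", "splice_site"}


def is_ultrarare_protein_altering(consequence: str, ref_af: float,
                                  max_af: float = 1e-4) -> bool:
    """True for protein-altering variants with reference allele frequency < 0.01%."""
    return consequence in PROTEIN_ALTERING and ref_af < max_af


def qualifying_allele_count(variants: Iterable[Tuple[str, str, float, int]], gene: str,
                            max_af: float = 1e-4) -> int:
    """Sum alternate-allele counts of ultrarare protein-altering variants in `gene`.

    `variants` holds (gene, consequence, reference_af, alt_allele_count) tuples.
    """
    return sum(n_alt for g, cons, af, n_alt in variants
               if g == gene and is_ultrarare_protein_altering(cons, af, max_af))


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P-value that row 1 is enriched in column 1.

    Table [[a, b], [c, d]] = [[patient qualifying alleles, other patient alleles],
                              [reference qualifying alleles, other reference alleles]].
    Exact integer arithmetic via math.comb avoids overflow for large reference sets.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / math.comb(n, row1)
```

Per gene, one would count qualifying alleles among the 2 × (number of patients) patient alleles and among the reference alleles, then call `fisher_greater`; a full analysis would also correct for testing 13 genes.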
Subjects
Atypical Hemolytic Uremic Syndrome , Complement C3/genetics , Atypical Hemolytic Uremic Syndrome/genetics , Atypical Hemolytic Uremic Syndrome/immunology , Atypical Hemolytic Uremic Syndrome/physiopathology , Gene Frequency , Genetic Predisposition to Disease , Hereditary Complement Deficiency Diseases/diagnosis , Hereditary Complement Deficiency Diseases/immunology , Humans , Mutation
ABSTRACT
Recent years have seen a boom in the application of next-generation sequencing technology to the study of human disorders, including Autism Spectrum Disorder (ASD), where the focus has been on identifying rare, possibly causative genomic variants in individuals with ASD. Because of the high genetic heterogeneity of ASD, a large number of subjects is needed to establish evidence that a variant or gene is associated with ASD, so aggregating data across cohorts and studies is necessary. However, methodological inconsistencies and subject overlap across studies complicate data aggregation. Here we present VariCarta, a web-based database developed to address these challenges by collecting, reconciling, and consistently cataloging literature-derived genomic variants found in ASD subjects using ongoing semi-manual curation. Careful manual curation combined with a robust data import pipeline rectifies errors, converts variants into a standardized format, identifies and harmonizes cohort overlaps, and documents data provenance. Harmonization is especially important because it prevents potential double counting of variants, which can inflate gene-based evidence for ASD association. The database currently contains 170,416 variant events from 10,893 subjects, collected across 61 publications, and reconciles 16,202 variants that have been reported in the literature multiple times. VariCarta is freely accessible at http://varicarta.msl.ubc.ca. Autism Res 2019, 12: 1728-1736. © 2019 International Society for Autism Research, Wiley Periodicals, Inc. LAY SUMMARY: The search for genetic factors underlying Autism Spectrum Disorder (ASD) has yielded numerous studies reporting potentially causative genomic variants found in individuals with ASD. However, methodological differences and subject overlap across studies complicate the assembly of these data, diminishing their utility and accessibility.
We developed VariCarta, a web-based database that aggregates carefully curated, annotated, and harmonized literature-derived variants identified in individuals with ASD using ongoing semi-manual curation.
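The double-counting problem that harmonization addresses can be shown with a minimal sketch: if the same subject appears in two publications under different identifiers, naively counting reported events inflates gene-level evidence. We assume a hypothetical alias table mapping (publication, reported subject ID) to a canonical subject, and variants already normalized to a (chrom, pos, ref, alt) key; neither reflects VariCarta's actual schema, and all names and example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # normalized (chrom, pos, ref, alt)


class VariantEvent(NamedTuple):
    publication: str
    subject_id: str  # subject ID as reported in the publication
    gene: str
    variant: Variant


def canonical_subject(ev: VariantEvent, subject_alias: Dict[Tuple[str, str], str]) -> str:
    """Map a reported subject to a canonical ID; unmapped subjects stay publication-scoped."""
    return subject_alias.get((ev.publication, ev.subject_id), f"{ev.publication}:{ev.subject_id}")


def unique_events(events: Iterable[VariantEvent],
                  subject_alias: Dict[Tuple[str, str], str]) -> Set[Tuple[str, Variant]]:
    """Collapse repeated reports of the same variant in the same canonical subject."""
    return {(canonical_subject(ev, subject_alias), ev.variant) for ev in events}


def subjects_per_gene(events: Iterable[VariantEvent],
                      subject_alias: Dict[Tuple[str, str], str]) -> Dict[str, int]:
    """Count distinct canonical subjects with at least one variant in each gene."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for ev in events:
        seen[ev.gene].add(canonical_subject(ev, subject_alias))
    return {gene: len(subjects) for gene, subjects in seen.items()}
```

Without the alias table, a subject reported in two publications counts twice toward its gene; with it, the gene-level count reflects distinct individuals, which is the inflation the harmonization step is meant to prevent.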
Subjects
Autism Spectrum Disorder/genetics , Databases, Genetic/statistics & numerical data , Female , Genomics , High-Throughput Nucleotide Sequencing , Humans , Male
ABSTRACT
BACKGROUND: The ability to discover genetic variants in a patient runs far ahead of the ability to interpret them. Databases with accurate descriptions of the causal relationship between variants and phenotype are valuable because they are critical tools in clinical genetic diagnostics. Here, we introduce a comprehensive, global genotype-phenotype database focusing on rare diseases. METHODS: This database (CentoMD®) is a browser-based tool that provides access to a comprehensive, independently curated system applying stringent high-quality criteria, and to a rapidly growing repository of genetic and human phenotype ontology (HPO)-based clinical information. Its main goals are to aid the evaluation of genetic variants, to enhance the validity of the genetic analytical workflow, to increase the quality of genetic diagnoses, and to improve the evaluation of treatment options for patients with hereditary diseases. The database software correlates clinical information from consented patients and probands of different geographical backgrounds with a large dataset of genetic variants and, when available, biomarker information. An automated follow-up tool informs all users whenever a variant classification has changed. These features, fully embedded in a CLIA/CAP-accredited quality management system, ensure appropriate data quality and enhanced patient safety. RESULTS: More than 100,000 genetically screened individuals are documented in the database, amounting to more than 470 million variant detections. Approximately 57% of the clinically relevant and uncertain variants in the database are novel. Notably, 3% of the genetic variants identified and previously reported in the literature as associated with a particular rare disease were reclassified, based on internal evidence, as clinically irrelevant.
CONCLUSIONS: The database offers a comprehensive summary of the clinical validity and causality of detected gene variants with their associated phenotypes, and is a valuable tool for identifying new disease genes through the correlation of novel genetic variants with specific, well-defined phenotypes.
ABSTRACT
Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation from other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well-documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules, or regulators of calcium homeostasis. However, for others, function is less clear, and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation from other health problems, the proteins they encode, and current knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation from other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses had been reported by 23 May 2017.
Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and speculate on the possibility of novel treatments and prevention strategies for AI.
ABSTRACT
Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change.