Results 1 - 20 of 46
1.
Cell ; 182(2): 463-480.e30, 2020 07 23.
Article in English | MEDLINE | ID: mdl-32533916

ABSTRACT

Although base editors are widely used to install targeted point mutations, the factors that determine base editing outcomes are not well understood. We characterized sequence-activity relationships of 11 cytosine and adenine base editors (CBEs and ABEs) on 38,538 genomically integrated targets in mammalian cells and used the resulting outcomes to train BE-Hive, a machine learning model that accurately predicts base editing genotypic outcomes (R ≈ 0.9) and efficiency (R ≈ 0.7). We corrected 3,388 disease-associated SNVs with ≥90% precision, including 675 alleles with bystander nucleotides that BE-Hive correctly predicted would not be edited. We discovered determinants of previously unpredictable C-to-G, or C-to-A editing and used these discoveries to correct coding sequences of 174 pathogenic transversion SNVs with ≥90% precision. Finally, we used insights from BE-Hive to engineer novel CBE variants that modulate editing outcomes. These discoveries illuminate base editing, enable editing at previously intractable targets, and provide new base editors with improved editing capabilities.


Subject(s)
Gene Editing/methods , Machine Learning , Animals , Gene Library , Humans , Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Point Mutation , RNA, Guide, Kinetoplastida/metabolism
2.
Am J Hum Genet ; 110(6): 940-949, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37236177

ABSTRACT

While pathogenic variants can significantly increase disease risk, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare missense variants collectively. Here, we introduce REGatta, a method to estimate clinical risk from variants in smaller segments of individual genes. We first define these regions by using the density of pathogenic diagnostic reports and then calculate the relative risk in each region by using over 200,000 exome sequences in the UK Biobank. We apply this method in 13 genes with established roles across several monogenic disorders. In genes with no significant difference at the gene level, this approach significantly separates disease risk for individuals with rare missense variants at higher or lower risk (BRCA2 regional model OR = 1.46 [1.12, 1.79], p = 0.0036 vs. BRCA2 gene model OR = 0.96 [0.85, 1.07] p = 0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare our method with existing methods and the use of protein domains (Pfam) as regions and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors and are potentially useful for improving risk assessment for genes associated with monogenic diseases.


Subject(s)
Breast Neoplasms , Genetic Predisposition to Disease , Humans , Female , BRCA2 Protein/genetics , Mutation, Missense , Sequence Analysis, DNA , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cohort Studies
3.
Nature ; 567(7746): E1-E2, 2019 03.
Article in English | MEDLINE | ID: mdl-30765887

ABSTRACT

In this Article, a data processing error affected Fig. 3e and Extended Data Table 2; these errors have been corrected online.

4.
Nature ; 563(7733): 646-651, 2018 11.
Article in English | MEDLINE | ID: mdl-30405244

ABSTRACT

Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5-11% of Cas9 guide RNAs targeting the human genome are 'precise-50', yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky-Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing.


Subject(s)
CRISPR-Cas Systems/genetics , Gene Editing/methods , Gene Editing/standards , Hermanski-Pudlak Syndrome/genetics , Machine Learning , Menkes Kinky Hair Syndrome/genetics , Templates, Genetic , Alleles , Base Sequence , CRISPR-Associated Protein 9/metabolism , DNA Repair/genetics , Fibroblasts/metabolism , Fibroblasts/pathology , HCT116 Cells , HEK293 Cells , Hermanski-Pudlak Syndrome/pathology , Humans , K562 Cells , Menkes Kinky Hair Syndrome/pathology , Reproducibility of Results , Substrate Specificity
5.
Genet Med ; 25(1): 16-26, 2023 01.
Article in English | MEDLINE | ID: mdl-36305854

ABSTRACT

PURPOSE: This study aimed to explore whether evidence of pathogenicity from prior variant classifications in ClinVar could be used to inform variant interpretation using the American College of Medical Genetics and Genomics/Association for Molecular Pathology clinical guidelines. METHODS: We identified distinct single-nucleotide variants (SNVs) that are either similar in location or in functional consequence to pathogenic variants in ClinVar and analyzed evidence in support of pathogenicity using 3 interpretation criteria. RESULTS: Thousands of variants, including many in clinically actionable disease genes (American College of Medical Genetics and Genomics secondary findings v3.0), have evidence of pathogenicity from existing variant classifications, accounting for 2.5% of nonsynonymous SNVs within ClinVar. Notably, there are many variants with uncertain or conflicting classifications that cause the same amino acid substitution as other pathogenic variants (PS1, N = 323), variants that are predicted to cause different amino acid substitutions in the same codon as pathogenic variants (PM5, N = 7692), and loss-of-function variants that are present in genes in which many loss-of-function variants are classified as pathogenic (PVS1, N = 3635). Most of these variants have similar computational predictions of pathogenicity and splicing effect as their associated pathogenic variants. CONCLUSION: Broadly, for >1.4 million SNVs exome wide, information from previously classified variants could be used to provide evidence of pathogenicity. We have developed a pipeline to identify variants meeting these criteria that may inform interpretation efforts.


Subject(s)
Genetic Testing , Genomics , Humans , Exome , RNA Splicing , Pathology, Molecular , Genetic Variation/genetics
6.
PLoS Comput Biol ; 17(1): e1008605, 2021 01.
Article in English | MEDLINE | ID: mdl-33417623

ABSTRACT

Restoring gene function by the induced skipping of deleterious exons has been shown to be effective for treating genetic disorders. However, many of the clinically successful therapies for exon skipping are transient oligonucleotide-based treatments that require frequent dosing. CRISPR-Cas9 based genome editing that causes exon skipping is a promising therapeutic modality that may offer permanent alleviation of genetic disease. We show that machine learning can select Cas9 guide RNAs that disrupt splice acceptors and cause the skipping of targeted exons. We experimentally measured the exon skipping frequencies of a diverse genome-integrated library of 791 splice sequences targeted by 1,063 guide RNAs in mouse embryonic stem cells. We found that our method, SkipGuide, is able to identify effective guide RNAs with a precision of 0.68 (50% threshold predicted exon skipping frequency) and 0.93 (70% threshold predicted exon skipping frequency). We anticipate that SkipGuide will be useful for selecting guide RNA candidates for evaluation of CRISPR-Cas9-mediated exon skipping therapy.


Subject(s)
CRISPR-Cas Systems/genetics , Gene Editing/methods , Genetic Therapy/methods , Machine Learning , RNA, Guide, Kinetoplastida/genetics , Animals , Cells, Cultured , Embryonic Stem Cells , Exons , Gene Library , Humans , Mice
7.
Nature ; 524(7564): 225-9, 2015 Aug 13.
Article in English | MEDLINE | ID: mdl-26123021

ABSTRACT

Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.


Subject(s)
Disease/genetics , Genomics , Mutation, Missense/genetics , Suppression, Genetic/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Animals , Evolution, Molecular , Genome, Human/genetics , Humans , Immediate-Early Proteins/genetics , Microcephaly/genetics , Microtubule-Associated Proteins , Phenotype , Proteins/genetics , Sequence Alignment , Tumor Suppressor Proteins/genetics
8.
Genet Med ; 20(9): 936-941, 2018 09.
Article in English | MEDLINE | ID: mdl-29388949

ABSTRACT

PURPOSE: Over 150,000 variants have been reported to cause Mendelian disease in the medical literature. It is still difficult to leverage this knowledge base in clinical practice, as many reports lack strong statistical evidence or may include false associations. Clinical laboratories assess whether these variants (along with newly observed variants that are adjacent to these published ones) underlie clinical disorders. METHODS: We investigated whether citation data, including journal impact factor and the number of cited variants (NCV) in each gene with published disease associations, can be used to improve variant assessment. RESULTS: Surprisingly, we found that impact factor is not predictive of pathogenicity, but the NCV score for each gene can provide statistical support for prediction of pathogenicity. When this gene-level citation metric is combined with variant-level evolutionary conservation and structural features, classification accuracy reaches 89.5%. Further, variants identified in clinical exome sequencing cases have higher NCVs than do simulated rare variants from the Exome Aggregation Consortium database within the same set of genes and functional consequences (P < 2.22 × 10⁻¹⁶). CONCLUSION: Aggregate citation data can complement existing variant-based predictive algorithms, and can boost their performance without the need to access and review large numbers of papers. The NCV is a slow-growing metric of scientific knowledge about each gene's association with disease.


Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , Algorithms , Databases, Genetic , Forecasting , Genetic Variation , Humans , Journal Impact Factor
9.
PLoS Genet ; 11(8): e1005436, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26317225

ABSTRACT

Population bottlenecks followed by re-expansions have been common throughout history of many populations. The response of alleles under selection to such demographic perturbations has been a subject of great interest in population genetics. On the basis of theoretical analysis and computer simulations, we suggest that this response qualitatively depends on dominance. The number of dominant or additive deleterious alleles per haploid genome is expected to be slightly increased following the bottleneck and re-expansion. In contrast, the number of completely or partially recessive alleles should be sharply reduced. Changes of population size expose differences between recessive and additive selection, potentially providing insight into the prevalence of dominance in natural populations. Specifically, we use a simple statistic, [Formula: see text], where x_i represents the derived allele frequency, to compare the number of mutations in different populations, and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. We also provide empirical evidence showing that gene sets associated with autosomal recessive disease in humans may have a BR indicative of recessive selection. Together, these theoretical predictions and empirical observations show that complex demographic history may facilitate rather than impede inference of parameters of natural selection.


Subject(s)
Gene Frequency/genetics , Genes, Dominant/genetics , Genetics, Population/statistics & numerical data , Population Dynamics/statistics & numerical data , Animals , Biological Evolution , Computer Simulation , Humans , Models, Genetic , Models, Statistical , Selection, Genetic
10.
Hum Mutat ; 36(10): 998-1003, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26378430

ABSTRACT

Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations.


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Rare Diseases/genetics , Algorithms , Databases, Genetic , Exome , False Positive Reactions , Genetic Variation , Humans , Phenotype , Web Browser
11.
Genome Res ; 22(3): 421-8, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22147367

ABSTRACT

There is an emerging consensus that when investigators obtain genomic data from research participants, they may incur an ethical responsibility to inform at-risk individuals about clinically significant variants discovered during the course of their research. With whole-exome sequencing becoming commonplace and the falling costs of full-genome sequencing, there will be an increasingly large number of variants identified in research participants that may be of sufficient clinical relevance to share. An explicit approach to triaging and communicating these results has yet to be developed, and even the magnitude of the task is uncertain. To develop an estimate of the number of variants that might qualify for disclosure, we apply recently published recommendations for the return of results to a defined and representative set of variants and then extrapolate these estimates to genome scale. We find that the total number of variants meeting the threshold for recommended disclosure ranges from 3955-12,579 (3.79%-12.06%, 95% CI) in the most conservative estimate to 6998-17,189 (6.69%-16.48%, 95% CI) in an estimate including variants with variable disease expressivity. Additionally, if the growth rate from the previous 4 yr continues, we estimate that the total number of disease-associated variants will grow 37% over the next 4 yr.


Subject(s)
Disclosure/ethics , Genetic Privacy/ethics , Disclosure/legislation & jurisprudence , Ethics, Research , Genetic Counseling/ethics , Genetic Counseling/legislation & jurisprudence , Genetic Privacy/legislation & jurisprudence , Genetic Research/ethics , Genetic Research/legislation & jurisprudence , Genetic Variation , Genome-Wide Association Study , Humans , United States
12.
Nat Genet ; 56(5): 925-937, 2024 May.
Article in English | MEDLINE | ID: mdl-38658794

ABSTRACT

CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confounds the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.


Subject(s)
CRISPR-Cas Systems , Gene Editing , Genotype , Phenotype , RNA, Guide, CRISPR-Cas Systems , Humans , Gene Editing/methods , RNA, Guide, CRISPR-Cas Systems/genetics , Bayes Theorem , Receptors, LDL/genetics , HEK293 Cells
13.
Hum Mutat ; 34(9): 1216-20, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23818451

ABSTRACT

It is now affordable to order clinically interpreted whole-genome sequence reports from clinical laboratories. One major component of these reports is derived from the knowledge base of previously identified pathogenic variants, including research articles, locus-specific, and other databases. While over 150,000 such pathogenic variants have been identified, many of these were originally discovered in small cohort studies of affected individuals, so their applicability to asymptomatic populations is unclear. We analyzed the prevalence of a large set of pathogenic variants from the medical and scientific literature in a large set of asymptomatic individuals (N = 1,092) and found 8.5% of these pathogenic variants in at least one individual. In the average individual in the 1000 Genomes Project, previously identified pathogenic variants occur on average 294 times (σ = 25.5) in homozygous form and 942 times (σ = 68.2) in heterozygous form. We also find that many of these pathogenic variants are frequently occurring: there are 3,744 variants with minor allele frequency (MAF) ≥ 0.01 (4.6%) and 2,837 variants with MAF ≥ 0.05 (3.5%). This indicates that many of these variants may be erroneous findings or have lower penetrance than previously expected.


Subject(s)
Gene Frequency , Genetic Variation , Sequence Analysis, DNA , Databases, Genetic , Genome, Human , Genotype , Heterozygote , Homozygote , Humans , Incidental Findings , Penetrance
14.
medRxiv ; 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36711752

ABSTRACT

While pathogenic variants significantly increase disease risk in many genes, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare germline missense variants collectively. Here we introduce REGatta, a method to improve the estimation of clinical risk in gene segments. We define gene regions using the density of pathogenic diagnostic reports, and then calculate the relative risk in each of these regions using 109,581 exome sequences from women in the UK Biobank. We apply this method in seven established breast cancer genes, and identify regions in each gene with statistically significant differences in breast cancer incidence for rare missense carriers. Even in genes with no significant difference at the gene level, this approach significantly separates rare missense variant carriers at higher or lower risk (BRCA2 regional model OR=1.46 [1.12, 1.79], p=0.0036 vs. BRCA2 gene model OR=0.96 [0.85, 1.07] p=0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare with existing methods and the use of protein domains (Pfam) as regions, and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors which can potentially be used to improve risk assessment and clinical management.

15.
medRxiv ; 2023 Jan 07.
Article in English | MEDLINE | ID: mdl-36711907

ABSTRACT

Deep mutational scanning assays enable the functional assessment of variants in high throughput. Phenotypic measurements from these assays are broadly concordant with clinical outcomes but are prone to noise at the individual variant level. We develop a framework to exploit related measurements within and across experimental assays to jointly estimate variant impact. Drawing from a large corpus of deep mutational scanning data, we collectively estimate the mean functional effect per AA residue position within each gene, normalize observed functional effects by substitution type, and make estimates for individual allelic variants with a pipeline called FUSE (Functional Substitution Estimation). FUSE improves the correlation of functional screening datasets covering the same variants, better separates estimated functional impacts for known pathogenic and benign variants (ClinVar BRCA1, p=2.24×10⁻⁵¹), and increases the number of variants for which predictions can be made (2,741 to 10,347) by inferring additional variant effects for substitutions not experimentally screened. For UK Biobank patients who carry a rare variant in TP53, FUSE significantly improves the separation of patients who develop cancer syndromes from those without cancer (p=1.77×10⁻⁶). These approaches promise to improve estimates of variant impact and broaden the utility of screening data generated from functional assays.

16.
Nat Commun ; 14(1): 2230, 2023 04 19.
Article in English | MEDLINE | ID: mdl-37076482

ABSTRACT

Despite the increasing use of genomic sequencing in clinical practice, the interpretation of rare genetic variants remains challenging even in well-studied disease genes, resulting in many patients with Variants of Uncertain Significance (VUSs). Computational Variant Effect Predictors (VEPs) provide valuable evidence in variant assessment, but they are prone to misclassifying benign variants, contributing to false positives. Here, we develop Deciphering Mutations in Actionable Genes (DeMAG), a supervised classifier for missense variants trained using extensive diagnostic data available in 59 actionable disease genes (American College of Medical Genetics and Genomics Secondary Findings v2.0, ACMG SF v2.0). DeMAG improves performance over existing VEPs by reaching balanced specificity (82%) and sensitivity (94%) on clinical data, and includes a novel epistatic feature, the 'partners score', which leverages evolutionary and structural partnerships of residues. The 'partners score' provides a general framework for modeling epistatic interactions, integrating both clinical and functional information. We provide our tool and predictions for all missense variants in 316 clinically actionable disease genes (demag.org) to facilitate the interpretation of variants and improve clinical decision-making.


Subject(s)
Genomics , Mutation, Missense , Humans , United States , Genomics/methods , Genetic Variation , Genetic Testing/methods
17.
medRxiv ; 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37732177

ABSTRACT

CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.

18.
bioRxiv ; 2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36711952

ABSTRACT

Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we have substantially improved the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.

19.
Cell Genom ; 3(5): 100304, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37228746

ABSTRACT

Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we substantially improve the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of the genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.

20.
Bioinformatics ; 27(6): 891-3, 2011 Mar 15.
Article in English | MEDLINE | ID: mdl-21258063

ABSTRACT

SUMMARY: Accurate annotations of genomic variants are necessary to achieve full-genome clinical interpretations that are scientifically sound and medically relevant. Many disease associations, especially those reported before the completion of the HGP, are limited in applicability because of potential inconsistencies with our current standards for genomic coordinates, nomenclature and gene structure. In an effort to validate and link variants from the medical genetics literature to an unambiguous reference for each variant, we developed a software pipeline and reviewed 68 641 single amino acid mutations from Online Mendelian Inheritance in Man (OMIM), Human Gene Mutation Database (HGMD) and dbSNP. The frequency of unresolved mutation annotations varied widely among the databases, ranging from 4 to 23%. A taxonomy of primary causes for unresolved mutations was produced. AVAILABILITY: This program is freely available from the web site (http://safegene.hms.harvard.edu/aa2nt/).


Subject(s)
Computational Biology/methods , Databases, Genetic , Electronic Data Processing/methods , Software , Algorithms , Amino Acid Substitution , Codon , Humans , Molecular Sequence Annotation , Point Mutation , Sequence Analysis, DNA/methods