Results 1 - 20 of 37
1.
Am J Hum Genet ; 109(12): 2163-2177, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36413997

ABSTRACT

Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify the use of computational predictors as "supporting" level of evidence for pathogenicity or benignity using criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (supporting, moderate, strong, very strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying moderate and several reached strong evidence levels. One tool reached very strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and future assessment of computational methods for clinical interpretation.
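
As an illustration of the calibration idea in this abstract, the sketch below estimates a local positive predictive value in a score window, converts it to a likelihood ratio, and maps that ratio to an evidence strength. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names, the fixed window half-width, and the toy data are hypothetical, and the odds cut-offs (350, about 18.7, about 4.33, about 2.08) are the commonly cited values from the earlier Bayesian reading of ACMG/AMP rather than thresholds reported here. Only the pathogenic (PP3) direction is shown.

```python
from bisect import bisect_left, bisect_right

# Assumed odds-of-pathogenicity cut-offs: very strong = 350, and each
# weaker level is the square root of the level above it.
STRENGTH_CUTOFFS = [
    ("very strong", 350.0),
    ("strong", 350.0 ** 0.5),        # about 18.7
    ("moderate", 350.0 ** 0.25),     # about 4.33
    ("supporting", 350.0 ** 0.125),  # about 2.08
]


def local_ppv(scores, labels, query, half_width):
    """Fraction of pathogenic variants (label 1) among calibration variants
    whose score lies in [query - half_width, query + half_width].
    Returns None when the window contains no variants."""
    pairs = sorted(zip(scores, labels))
    sorted_scores = [s for s, _ in pairs]
    lo = bisect_left(sorted_scores, query - half_width)
    hi = bisect_right(sorted_scores, query + half_width)
    window = pairs[lo:hi]
    if not window:
        return None
    return sum(label for _, label in window) / len(window)


def likelihood_ratio(ppv, prior):
    """Posterior odds divided by prior odds; prior is the pathogenic
    fraction of the calibration set (must be strictly between 0 and 1)."""
    if ppv >= 1.0:
        return float("inf")
    return (ppv / (1.0 - ppv)) / (prior / (1.0 - prior))


def evidence_strength(lr):
    """Return the strongest level whose cut-off the likelihood ratio meets."""
    for name, cutoff in STRENGTH_CUTOFFS:
        if lr >= cutoff:
            return name
    return "indeterminate"
```

A real calibration would also need confidence bounds on the local estimate and a prior that reflects the clinical setting; both are omitted here for brevity.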


Subject(s)
Calibration , Humans , Consensus , Educational Status , Virulence
2.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
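
The maximum F-measure named in the METHODS can be sketched as a sweep over every observed EPCR value used as a threshold. This is only an illustrative reading: the abstract does not say how predictions were pooled across families or how ties were handled, and the function name and input layout below are hypothetical.

```python
def max_f_measure(predictions, causal):
    """Best F1 over all EPCR thresholds.

    predictions: list of (variant_id, epcr) pairs from one model.
    causal: set of variant ids known to be causal.
    A variant is called causal when its EPCR is >= the threshold.
    """
    best = 0.0
    for threshold in sorted({epcr for _, epcr in predictions}):
        called = {v for v, epcr in predictions if epcr >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue  # precision and recall are both zero, F1 is zero
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        f1 = 2 * precision * recall / (precision + recall)
        best = max(best, f1)
    return best
```

Because recall is computed against the full causal set, a model that omits a causal variant entirely is penalized at every threshold.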


Subject(s)
Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype
3.
Clin Gastroenterol Hepatol ; 21(10): 2578-2587.e11, 2023 Sep.
Article in English | MEDLINE | ID: mdl-36610497

ABSTRACT

BACKGROUND & AIMS: Genetic variants affecting liver disease risk vary among racial and ethnic groups. Hispanics/Latinos in the United States have a high prevalence of PNPLA3 I148M, which increases liver disease risk, and a low prevalence of HSD17B13 predicted loss-of-function (pLoF) variants, which reduce risk. Less is known about the prevalence of liver disease-associated variants among Hispanic/Latino subpopulations defined by country of origin and genetic ancestry. We evaluated the prevalence of HSD17B13 pLoF variants and PNPLA3 I148M, and their associations with quantitative liver phenotypes in Hispanic/Latino participants from an electronic health record-linked biobank in New York City. METHODS: This study included 8739 adult Hispanic/Latino participants of the BioMe biobank with genotyping and exome sequencing data. We estimated the prevalence of Hispanic/Latino individuals harboring HSD17B13 and PNPLA3 variants, stratified by genetic ancestry, and performed association analyses between variants and liver enzymes and Fibrosis-4 (FIB-4) scores. RESULTS: Individuals with ancestry from Ecuador and Mexico had the lowest frequency of HSD17B13 pLoF variants (10%/7%) and the highest frequency of PNPLA3 I148M (54%/65%). These ancestry groups had the highest outpatient alanine aminotransferase (ALT) and aspartate aminotransferase (AST) levels, and the largest proportion of individuals with a FIB-4 score greater than 2.67. HSD17B13 pLoF variants were associated with reduced ALT level (P = .002), AST level (P < .001), and FIB-4 score (P = .045). PNPLA3 I148M was associated with increased ALT level, AST level, and FIB-4 score (P < .001 for all). HSD17B13 pLoF variants mitigated the increase in ALT conferred by PNPLA3 I148M (P = .006). CONCLUSIONS: Variation in HSD17B13 and PNPLA3 variants across genetic ancestry groups may contribute to differential risk for liver fibrosis among Hispanic/Latino individuals.


Subject(s)
Liver Cirrhosis , Non-alcoholic Fatty Liver Disease , Humans , Genetic Predisposition to Disease , Hispanic or Latino/genetics , Liver Cirrhosis/enzymology , Liver Cirrhosis/genetics , Non-alcoholic Fatty Liver Disease/enzymology , Non-alcoholic Fatty Liver Disease/genetics , Polymorphism, Single Nucleotide
4.
BMC Med Inform Decis Mak ; 23(1): 2, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36609379

ABSTRACT

BACKGROUND: Low back pain (LBP) is a common condition made up of a variety of anatomic and clinical subtypes. Lumbar disc herniation (LDH) and lumbar spinal stenosis (LSS) are two subtypes highly associated with LBP. Patients with LDH/LSS are often started with non-surgical treatments and if those are not effective then go on to have decompression surgery. However, recommendation of surgery is complicated as the outcome may depend on the patient's health characteristics. We developed a deep learning (DL) model to predict decompression surgery for patients with LDH/LSS. MATERIALS AND METHOD: We used datasets of 8387 and 8620 patients from a prospective study that collected data from four healthcare systems to predict early (within 2 months) and late surgery (within 12 months after a 2 month gap), respectively. We developed a DL model to use patients' demographics, diagnosis and procedure codes, drug names, and diagnostic imaging reports to predict surgery. For each prediction task, we evaluated the model's performance using classical and generalizability evaluation. For classical evaluation, we split the data into training (80%) and testing (20%). For generalizability evaluation, we split the data based on the healthcare system. We used the area under the curve (AUC) to assess performance for each evaluation. We compared results to a benchmark model (i.e. LASSO logistic regression). RESULTS: For classical performance, the DL model outperformed the benchmark model for early surgery with an AUC of 0.725 compared to 0.597. For late surgery, the DL model outperformed the benchmark model with an AUC of 0.655 compared to 0.635. For generalizability performance, the DL model outperformed the benchmark model for early surgery. For late surgery, the benchmark model outperformed the DL model. CONCLUSIONS: For early surgery, the DL model was preferred for classical and generalizability evaluation. 
However, for late surgery, the benchmark and DL model had comparable performance. Depending on the prediction task, the balance of performance may shift between DL and a conventional ML method. As a result, thorough assessment is needed to quantify the value of DL, a relatively computationally expensive, time-consuming and less interpretable method.
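
The two evaluation schemes described above differ only in how the test set is chosen; the AUC itself is computed the same way. The sketch below shows a rank-based AUC and a hold-out-by-healthcare-system split. It assumes records are dictionaries with a hypothetical "system" key; the study's actual data layout and splitting code are not given in the abstract.

```python
def auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def split_by_system(records, held_out_system):
    """Generalizability split: train on every healthcare system except one,
    test on the held-out system. 'system' is a hypothetical field name."""
    train = [r for r in records if r["system"] != held_out_system]
    test = [r for r in records if r["system"] == held_out_system]
    return train, test
```

The pairwise AUC is quadratic in the number of patients, which is fine for a sketch; a sort-based rank formula would be the usual choice at the data set sizes reported here.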


Subject(s)
Deep Learning , Intervertebral Disc Displacement , Low Back Pain , Spinal Stenosis , Humans , Decompression, Surgical/adverse effects , Decompression, Surgical/methods , Prospective Studies , Lumbar Vertebrae/surgery , Low Back Pain/diagnosis , Low Back Pain/surgery , Low Back Pain/complications , Intervertebral Disc Displacement/surgery , Spinal Stenosis/surgery , Treatment Outcome , Retrospective Studies
5.
BMC Public Health ; 20(1): 46, 2020 Jan 13.
Article in English | MEDLINE | ID: mdl-31931781

ABSTRACT

BACKGROUND: The increasing adoption of electronic health record (EHR) systems enables automated, large-scale, and meaningful analysis of regional population health. We explored how EHR systems could inform surveillance of trauma-related emergency department visits arising from seasonal, holiday-related, and rare environmental events. METHODS: We analyzed temporal variation in diagnosis codes over 24 years of trauma visit data at the three hospitals in the University of Washington Medicine system in Seattle, Washington, USA. We identified seasons and days in which specific codes and categories of codes were statistically enriched, meaning that a significantly greater than average proportion of trauma visits included a given diagnosis code during that time period. RESULTS: We confirmed known seasonal patterns in emergency department visits for trauma. As expected, cold weather-related incidents (e.g. frostbite, snowboarding injury) were enriched in the winter, whereas fair weather-related incidents (e.g. bug bites, boating accidents, bicycle accidents) were enriched in the spring and summer. Our analysis of specific days of the year found that holidays were enriched for alcohol poisoning, assaults, and firework accidents. We also detected one-time regional events such as the 2001 Nisqually earthquake and the 2006 Hanukkah Eve Windstorm. CONCLUSIONS: Though EHR systems were developed to prioritize operational rather than analytic priorities and have consequent limitations for surveillance, our EHR enrichment analysis nonetheless re-identified expected temporal population health patterns. EHRs are potentially a valuable source of information to inform public health policy, both in retrospective analysis and in a surveillance capacity.
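
One simple way to test whether a diagnosis code is "enriched" in a time period, in the sense defined in the METHODS, is a one-sided two-proportion z-test comparing visits inside the period with visits outside it. The abstract does not state which test the authors used, so the sketch below is an assumption, and the function name and arguments are hypothetical.

```python
import math


def enrichment_z_test(hits_in_period, visits_in_period, hits_total, visits_total):
    """Compare the share of visits carrying a code inside a period with the
    share outside it. Returns (ratio of proportions, z statistic, one-sided
    p-value for 'higher inside the period').
    Assumes the period is a strict subset of all visits and that the code
    occurs at least once outside the period."""
    hits_out = hits_total - hits_in_period
    visits_out = visits_total - visits_in_period
    p_in = hits_in_period / visits_in_period
    p_out = hits_out / visits_out
    pooled = hits_total / visits_total
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_in_period + 1 / visits_out))
    z = (p_in - p_out) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal
    return p_in / p_out, z, p_value
```

Scanning many codes across many days would also call for a multiple-testing correction, and rare codes would be better served by an exact test than by this normal approximation.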


Subject(s)
Electronic Health Records , Emergency Service, Hospital/statistics & numerical data , Poisoning/epidemiology , Population Surveillance/methods , Wounds and Injuries/epidemiology , Holidays , Humans , Poisoning/therapy , Seasons , Washington/epidemiology , Weather , Wounds and Injuries/therapy
6.
Hum Mutat ; 40(9): 1495-1506, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. Although different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.


Subject(s)
Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability
7.
Hum Mutat ; 40(9): 1546-1556, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly-interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.


Subject(s)
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Models, Genetic , Ovarian Neoplasms/genetics
8.
Hum Mutat ; 40(9): 1519-1529, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to .61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
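
The comparison reported above rests on Pearson's correlation, computed both between each method and the measured activity and between pairs of methods. A minimal sketch of that statistic follows; the function name is hypothetical and the challenge's own assessment code is not described in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ValueError if either sequence has zero variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)
```

Calling `pearson_r(method_a, measured)` and `pearson_r(method_a, method_b)` on the same variants gives the two kinds of correlation the abstract contrasts.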


Subject(s)
Acetylglucosaminidase/metabolism , Computational Biology/methods , Mutation, Missense , Acetylglucosaminidase/genetics , Humans , Models, Genetic , Regression Analysis
9.
Hum Mutat ; 40(9): 1612-1622, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subject(s)
Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing
10.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.


Subject(s)
Disease/genetics , Mutation, Missense/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
11.
Annu Rev Public Health ; 39: 95-112, 2018 Apr 01.
Article in English | MEDLINE | ID: mdl-29261408

ABSTRACT

The digital world is generating data at a staggering and still increasing rate. While these "big data" have unlocked novel opportunities to understand public health, they hold still greater potential for research and practice. This review explores several key issues that have arisen around big data. First, we propose a taxonomy of sources of big data to clarify terminology and identify threads common across some subtypes of big data. Next, we consider common public health research and practice uses for big data, including surveillance, hypothesis-generating research, and causal inference, while exploring the role that machine learning may play in each use. We then consider the ethical implications of the big data revolution with particular emphasis on maintaining appropriate care for privacy in a world in which technology is rapidly changing social norms regarding the need for (and even the meaning of) privacy. Finally, we make suggestions regarding structuring teams and training to succeed in working with big data in research and practice.


Subject(s)
Big Data , Machine Learning , Public Health , Research/organization & administration , Causality , Confidentiality/standards , Humans , Population Surveillance/methods , Research/standards , Terminology as Topic
12.
Bioinformatics ; 33(14): i389-i398, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28882004

ABSTRACT

MOTIVATION: Loss-of-function genetic variants are frequently associated with severe clinical phenotypes, yet many are present in the genomes of healthy individuals. The available methods to assess the impact of these variants rely primarily upon evolutionary conservation with little to no consideration of the structural and functional implications for the protein. They further do not provide information to the user regarding specific molecular alterations potentially causative of disease. RESULTS: To address this, we investigate protein features underlying loss-of-function genetic variation and develop a machine learning method, MutPred-LOF, for the discrimination of pathogenic and tolerated variants that can also generate hypotheses on specific molecular events disrupted by the variant. We investigate a large set of human variants derived from the Human Gene Mutation Database, ClinVar and the Exome Aggregation Consortium. Our prediction method shows an area under the Receiver Operating Characteristic curve of 0.85 for all loss-of-function variants and 0.75 for proteins in which both pathogenic and neutral variants have been observed. We applied MutPred-LOF to a set of 1142 de novo variants from neurodevelopmental disorders and find enrichment of pathogenic variants in affected individuals. Overall, our results highlight the potential of computational tools to elucidate causal mechanisms underlying loss of protein function in loss-of-function variants. AVAILABILITY AND IMPLEMENTATION: http://mutpred.mutdb.org. CONTACT: predrag@indiana.edu.


Subject(s)
Loss of Function Mutation , Machine Learning , Proteins/genetics , Sequence Analysis, Protein/methods , Software , Computational Biology/methods , Humans , Protein Conformation , Proteins/metabolism , Proteins/physiology
13.
Hum Mutat ; 38(9): 1092-1108, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28508593

ABSTRACT

The steady advances in machine learning and accumulation of biomedical data have contributed to the development of numerous computational models that assess the impact of missense variants. Different methods, however, operationalize impact differently. Two common tasks in this context are the prediction of the pathogenicity of variants and the prediction of their effects on a protein's function. These are related but distinct problems, and it is unclear whether methods developed for one are optimized for the other. The Critical Assessment of Genome Interpretation (CAGI) experiment provides a means to address this question empirically. To this end, we participated in various protein-specific challenges in CAGI with two objectives in mind. First, to compare the performance of methods in the MutPred family with the state-of-the-art. Second and more importantly, to investigate the applicability of general-purpose pathogenicity predictors to the classification of specific function-altering variants without additional training or calibration. We find that our pathogenicity predictors performed competitively with other methods, outputting score distributions in agreement with experimental outcomes. Overall, we conclude that binary classifiers learned from disease-causing mutations are capable of modeling important aspects of the underlying biology and the alteration of protein function resulting from mutations.


Subject(s)
Computational Biology/methods , Mutation, Missense , Proteins/genetics , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning
14.
Hum Mutat ; 38(9): 1182-1192, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28634997

ABSTRACT

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.


Subject(s)
Bipolar Disorder/genetics , Crohn Disease/genetics , Exome Sequencing/methods , Precision Medicine/methods , Warfarin/therapeutic use , Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Humans , Information Dissemination , Pharmacogenomic Variants , Phenotype , Warfarin/pharmacology
15.
PLoS Comput Biol ; 12(8): e1005091, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27564311

ABSTRACT

Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies were however limited in scope to either individual molecular functions or were concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to assess probabilistically function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption.


Subject(s)
Amino Acid Sequence/genetics , Disease/genetics , Models, Statistical , Mutation/genetics , Amino Acid Substitution/genetics , Computational Biology , Computer Simulation , Humans , Models, Molecular , Protein Binding
16.
Int J Mass Spectrom ; 368: 6-14, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-26023288

ABSTRACT

Cross sections for 61 palmitoylated peptides and 73 cysteine-unmodified peptides are determined and used together with a previously obtained tryptic peptide library to derive a set of intrinsic size parameters (ISPs) for the palmitoyl (Pal) group (1.26 ± 0.04), carboxyamidomethyl (Am) group (0.92 ± 0.04), and the 20 amino acid residues to assess the influence of Pal- and Am-modification on cysteine and other amino acid residues. These values highlight the influence of the intrinsic hydrophobic and hydrophilic nature of these modifications on the overall cross sections. As a part of this analysis, we find that ISPs derived from a database of a modifier on one amino acid residue (CysPal) can be applied on the same modification group on different amino acid residues (SerPal and TyrPal). Using these ISP values, we are able to calculate peptide cross sections to within ± 2% of experimental values for 83% of Pal-modified peptide ions and 63% of Am-modified peptide ions. We propose that modification groups should be treated as individual contribution factors, instead of treating the combination of the particular group and the amino acid residue they are on as a whole when considering their effects on the peptide ion mobility features.

17.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496501

ABSTRACT

Purpose: To investigate the number of rare missense variants observed in human genome sequences by ACMG/AMP PP3/BP4 evidence strength, following the calibrated PP3/BP4 computational recommendations. Methods: Missense variants from the genome sequences of 300 probands from the Rare Genomes Project with suspected rare disease were analyzed using computational prediction tools able to reach PP3_Strong and BP4_Moderate evidence strengths (BayesDel, MutPred2, REVEL, and VEST4). The numbers of variants at each evidence strength were analyzed across disease-associated genes and genome-wide. Results: From a median of 75.5 rare (≤1% allele frequency) missense variants in disease-associated genes per proband, a median of one reached PP3_Strong, 3-5 PP3_Moderate, and 3-5 PP3_Supporting. Most were allocated BP4 evidence (median 41-49 per proband) or were indeterminate (median 17.5-19 per proband). Extending the analysis to all protein-coding genes genome-wide, the number of PP3_Strong variants increased approximately 2.6-fold compared to disease-associated genes, with a median per proband of 1-3 PP3_Strong, 8-16 PP3_Moderate, and 10-17 PP3_Supporting. Conclusion: A small number of variants per proband reached PP3_Strong and PP3_Moderate in 3,424 disease-associated genes, and though not the intended use of the recommendations, also genome-wide. Use of PP3/BP4 evidence as recommended from calibrated computational prediction tools in the clinical diagnostic laboratory is unlikely to inappropriately contribute to the classification of an excessive number of variants as Pathogenic or Likely Pathogenic by ACMG/AMP rules.

18.
bioRxiv ; 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38895200

ABSTRACT

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
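
One way to make the "high-specificity regime" concrete is to report the best sensitivity a predictor achieves while its specificity stays at or above a chosen floor. The sketch below assumes that reading; the abstract does not state which metric the assessment used. The function name `sensitivity_at_specificity` and the label convention (1 = pathogenic, 0 = benign, higher score = more pathogenic) are illustrative choices.

```python
def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over thresholds whose specificity >= min_specificity.

    A variant is called pathogenic when its score is >= the threshold.
    Assumes at least one pathogenic (1) and one benign (0) label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    # Every distinct observed score is a candidate threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        specificity = 1 - fp / negatives
        if specificity >= min_specificity:
            best = max(best, tp / positives)
    return best
```

The high-sensitivity regime is the mirror image: fix a sensitivity floor and report the best specificity. Two predictors with similar overall ranking performance can differ under either constraint, which fits the abstract's observation that relative performance changes between regimes.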

19.
Bioinformatics ; 28(11): 1527-9, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22495752

ABSTRACT

MOTIVATION: Gene clusters are arrangements of functionally related genes on a chromosome. In bacteria, it is expected that evolutionary pressures would conserve these arrangements due to the functional advantages they provide. Visualization of conserved gene clusters across multiple genomes provides key insights into their evolutionary histories. Therefore, a software tool that enables visualization and functional analyses of gene clusters would be a great asset to the biological research community. RESULTS: We have developed GeneclusterViz, a Java-based tool that allows for the visualization, exploration and downstream analyses of conserved gene clusters across multiple genomes. GeneclusterViz combines an easy-to-use exploration interface for gene clusters with a host of other analysis features such as multiple sequence alignments, phylogenetic analyses and integration with the KEGG pathway database. AVAILABILITY: http://biohealth.snu.ac.kr/GeneclusterViz/; http://microbial.informatics.indiana.edu/GeneclusterViz/


Subject(s)
Alphaproteobacteria/classification, Alphaproteobacteria/genetics, Multigene Family, Phylogeny, Software, Cluster Analysis, Escherichia coli/genetics, Genome, Sequence Alignment

20.
Pac Symp Biocomput ; 28: 323-334, 2023.
Article in English | MEDLINE | ID: mdl-36540988

ABSTRACT

The accurate interpretation of genetic variants is essential for clinical actionability. However, a majority of variants remain of uncertain significance. Multiplexed assays of variant effects (MAVEs) can help provide functional evidence for variants of uncertain significance (VUS) at the scale of entire genes. Although the systematic prioritization of genes for such assays has been of great interest from the clinical perspective, existing strategies have rarely emphasized this motivation. Here, we propose three objectives for quantifying the importance of genes, each satisfying a specific clinical goal: (1) Movability scores to prioritize genes with the most VUS moving to non-VUS categories, (2) Correction scores to prioritize genes with the most pathogenic and/or benign variants that could be reclassified, and (3) Uncertainty scores to prioritize genes with VUS for which variant pathogenicity predictors used in clinical classification exhibit the greatest uncertainty. We demonstrate that existing approaches are sub-optimal when considering these explicit clinical objectives. We also propose a combined weighted score that optimizes the three objectives simultaneously and finds optimal weights to improve over existing approaches. Our strategy generally results in better performance than existing knowledge-driven and data-driven strategies and yields gene sets that are clinically relevant. Our work has implications for systematic efforts that aim to iterate between predictor development, experimentation and translation to the clinic.
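
The combined weighted score could look like the sketch below. The abstract does not give its functional form, so the linear combination is an assumption. The weights, the gene names, the requirement that each per-gene score is already scaled to [0, 1], and the function names `combined_score` and `rank_genes` are likewise hypothetical.

```python
def combined_score(scores, weights):
    """Weighted sum of a gene's objective scores (assumed linear form)."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_genes(gene_scores, weights):
    """Return gene names ordered from highest to lowest combined score."""
    return sorted(
        gene_scores,
        key=lambda gene: combined_score(gene_scores[gene], weights),
        reverse=True,
    )


# Hypothetical weights and scores, for illustration only.
weights = {"movability": 0.5, "correction": 0.3, "uncertainty": 0.2}
gene_scores = {
    "GENE_A": {"movability": 0.9, "correction": 0.1, "uncertainty": 0.2},
    "GENE_B": {"movability": 0.2, "correction": 0.9, "uncertainty": 0.9},
    "GENE_C": {"movability": 0.1, "correction": 0.1, "uncertainty": 0.1},
}
ranking = rank_genes(gene_scores, weights)
```

In this toy input GENE_B ranks ahead of GENE_A even though GENE_A has the highest movability, because the other two objectives also contribute. Finding good weights, as the abstract describes, would require an evaluation criterion that is not specified here.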


Subject(s)
Genetic Predisposition to Disease, Genetic Testing, Humans, Genetic Testing/methods, Genetic Variation, Computational Biology/methods