Results 1 - 8 of 8
1.
Am J Hum Genet ; 102(3): 415-426, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29455857

ABSTRACT

The spatial distribution of genetic variation within proteins is shaped by evolutionary constraint and provides insight into the functional importance of protein regions and the potential pathogenicity of protein alterations. Here, we comprehensively evaluate the 3D spatial patterns of human germline and somatic variation in 6,604 experimentally derived protein structures and 33,144 computationally derived homology models covering 77% of all human proteins. Using a systematic approach, we quantify differences in the spatial distributions of neutral germline variants, disease-causing germline variants, and recurrent somatic variants. Neutral missense variants exhibit a general trend toward spatial dispersion, which is driven by constraint on core residues. In contrast, germline disease-causing variants are generally clustered in protein structures and form clusters more frequently than recurrent somatic variants identified from tumor sequencing. In total, we identify 215 proteins with significant spatial constraints on the distribution of disease-causing missense variants in experimentally derived protein structures, only 65 (30%) of which have been previously reported. This analysis identifies many clusters not detectable from sequence information alone; only 12% of proteins with significant clustering in 3D were identified from similar analyses of linear protein sequence. Furthermore, spatial analyses of mutations in homology-based structural models are highly correlated with those from experimentally derived structures, supporting the use of computationally derived models. Our approach highlights significant differences in the spatial constraints on different classes of mutations in protein structure and identifies regions of potential function within individual proteins.
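As a rough illustration of what testing for spatial clustering of variants in a structure can look like, the sketch below runs a permutation test on mean pairwise distance between variant residues. This is not the statistic used in the study, which the abstract does not specify; the function names, the choice of mean pairwise distance, and the toy coordinates are all assumptions made for the example.

```python
import math
import random


def mean_pairwise_distance(coords):
    """Average Euclidean distance over all unordered pairs of 3D points."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(coords[i], coords[j])
    return total / (n * (n - 1) / 2)


def clustering_p_value(residue_coords, variant_idx, n_perm=2000, seed=0):
    """One-sided permutation p-value for spatial clustering.

    Asks how often a random set of residues of the same size is at least
    as tightly packed as the observed variant residues.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([residue_coords[i] for i in variant_idx])
    k = len(variant_idx)
    indices = range(len(residue_coords))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(indices, k)
        if mean_pairwise_distance([residue_coords[i] for i in sample]) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical 50-residue chain laid out on a line, 3.8 units apart.
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(50)]
p_clustered = clustering_p_value(toy_chain, [0, 1, 2, 3, 4])
p_dispersed = clustering_p_value(toy_chain, [0, 12, 25, 37, 49])
```

A small p-value for a variant set suggests clustering; reversing the inequality would give the analogous test for dispersion.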


Subject(s)
Missense Mutation/genetics, Proteins/chemistry, Proteins/genetics, Amino Acid Sequence, Cluster Analysis, Humans, Molecular Models
2.
BMC Bioinformatics ; 19(1): 18, 2018 01 23.
Article in English | MEDLINE | ID: mdl-29361909

ABSTRACT

BACKGROUND: Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease. RESULTS: To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP). We curated pathogenic and neutral RTEL1 variants from the literature and public databases. We then used homology modeling to construct a 3D structural model of RTEL1 and mapped known variants into this structure. We next developed a pathogenicity prediction algorithm based on proximity to known disease causing and neutral variants and evaluated its performance with leave-one-out cross-validation. We further validated our predictions with segregation analyses, telomere lengths, and mutagenesis data from the homologous XPD protein. Our algorithm for classifying RTEL1 VUS based on spatial proximity to pathogenic and neutral variation accurately distinguished 7 known pathogenic from 29 neutral variants (ROC AUC = 0.85) in the N-terminal domains of RTEL1. Pathogenic proximity scores were also significantly correlated with effects on ATPase activity (Pearson r = -0.65, p = 0.0004) in XPD, a related helicase. Applying the algorithm to 13 VUS identified from sequencing of RTEL1 from patients predicted five out of six disease-segregating VUS to be pathogenic. We provide structural hypotheses regarding how these mutations may disrupt RTEL1 ATPase and helicase function. 
CONCLUSIONS: Spatial analysis of missense variation accurately classified candidate VUS in RTEL1 and suggests how such variants cause disease. Incorporating spatial proximity analyses into other pathogenicity prediction tools may improve accuracy for other genes and genetic diseases.
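The workflow described above (score a variant by its proximity to known pathogenic and neutral sites, then evaluate with leave-one-out cross-validation and ROC AUC) can be sketched as follows. The exponential distance weighting, the `scale` parameter, and all function names are assumptions for illustration; the abstract does not give the actual scoring formula.

```python
import math


def proximity_score(query, pathogenic, neutral, scale=10.0):
    """Mean exp(-d/scale) weight to pathogenic sites minus that to neutral sites.

    Positive values mean the query sits closer to pathogenic variation.
    """
    def mean_weight(points):
        if not points:
            return 0.0
        return sum(math.exp(-math.dist(query, p) / scale) for p in points) / len(points)

    return mean_weight(pathogenic) - mean_weight(neutral)


def leave_one_out_scores(pathogenic, neutral, scale=10.0):
    """Score each labeled variant with itself removed from its reference set."""
    scored = []
    for i, p in enumerate(pathogenic):
        rest = pathogenic[:i] + pathogenic[i + 1:]
        scored.append((proximity_score(p, rest, neutral, scale), 1))
    for i, q in enumerate(neutral):
        rest = neutral[:i] + neutral[i + 1:]
        scored.append((proximity_score(q, pathogenic, rest, scale), 0))
    return scored


def roc_auc(scored):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coordinates: pathogenic sites near the origin, neutral sites far away.
toy_pathogenic = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
toy_neutral = [(30, 0, 0), (31, 0, 0), (30, 1, 0), (31, 1, 1)]
toy_auc = roc_auc(leave_one_out_scores(toy_pathogenic, toy_neutral))
```

Removing each variant from its own reference set before scoring is what keeps the evaluation from rewarding a variant for being close to itself.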


Subject(s)
Algorithms, DNA Helicases/genetics, Interstitial Lung Diseases/pathology, Spatial Analysis, Area Under Curve, DNA Helicases/chemistry, DNA Helicases/metabolism, Humans, Interstitial Lung Diseases/genetics, Missense Mutation, Tertiary Protein Structure, ROC Curve
3.
bioRxiv ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39149406

ABSTRACT

Effective diagnosis and treatment of rare genetic disorders requires the interpretation of a patient's genetic variants of unknown significance (VUSs). Today, clinical decision-making is primarily guided by gene-phenotype association databases and DNA-based scoring methods. Our web-accessible variant analysis pipeline, VUStruct, supplements these established approaches by deeply analyzing the downstream molecular impact of variation in context of 3D protein structure. VUStruct's growing impact is fueled by the co-proliferation of protein 3D structural models, gene sequencing, compute power, and artificial intelligence. Contextualizing VUSs in protein 3D structural models also illuminates longitudinal genomics studies and biochemical bench research focused on VUS, and we created VUStruct for clinicians and researchers alike. We now introduce VUStruct to the broad scientific community as a mature, web-facing, extensible, High Performance Computing (HPC) software pipeline. VUStruct maps missense variants onto automatically selected protein structures and launches a broad range of analyses. These include energy-based assessments of protein folding and stability, pathogenicity prediction through spatial clustering analysis, and machine learning (ML) predictors of binding surface disruptions and nearby post-translational modification sites. The pipeline also considers the entire input set of VUS and identifies genes potentially involved in digenic disease. VUStruct's utility in clinical rare disease genome interpretation has been demonstrated through its analysis of over 175 Undiagnosed Disease Network (UDN) Patient cases. VUStruct-leveraged hypotheses have often informed clinicians in their consideration of additional patient testing, and we report here details from two cases where VUStruct was key to their solution. 
We also note successes with academic research collaborators, for whom VUStruct has informed research directions in both computational genomics and wet lab studies.

4.
Circ Genom Precis Med ; 14(2): e003304, 2021 04.
Article in English | MEDLINE | ID: mdl-33651632

ABSTRACT

BACKGROUND: There is considerable interest in whether genetic data can be used to improve standard cardiovascular disease risk calculators, as the latter are routinely used in clinical practice to manage preventative treatment. METHODS: Using the UK Biobank resource, we developed our own polygenic risk score for coronary artery disease (CAD). We used an additional 60 000 UK Biobank individuals to develop an integrated risk tool (IRT) that combined our polygenic risk score with established risk tools (either the American Heart Association/American College of Cardiology pooled cohort equations [PCE] or UK QRISK3), and we tested our IRT in an additional, independent set of 186 451 UK Biobank individuals. RESULTS: The novel CAD polygenic risk score shows superior predictive power for CAD events, compared with other published polygenic risk scores, and is largely uncorrelated with PCE and QRISK3. When combined with PCE into an IRT, it has superior predictive accuracy. Overall, 10.4% of incident CAD cases were misclassified as low risk by PCE and correctly classified as high risk by the IRT, compared with 4.4% misclassified by the IRT and correctly classified by PCE. The overall net reclassification improvement for the IRT was 5.9% (95% CI, 4.7-7.0). When individuals were stratified into age-by-sex subgroups, the improvement was larger for all subgroups (range, 8.3%-15.4%), with the best performance in 40- to 54-year-old men (15.4% [95% CI, 11.6-19.3]). Comparable results were found using a different risk tool (QRISK3) and also a broader definition of cardiovascular disease. Use of the IRT is estimated to avoid up to 12 000 deaths in the United States over a 5-year period. CONCLUSIONS: An IRT that includes polygenic risk outperforms current risk stratification tools and offers greater opportunity for early interventions. Given the plummeting costs of genetic tests, future iterations of CAD risk tools would be enhanced with the addition of a person's polygenic risk.
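The reclassification figures above follow the logic of a two-category net reclassification improvement: among cases, 10.4% moved correctly from low to high risk and 4.4% moved the other way, a net gain of 6.0 percentage points. The sketch below shows one common way to compute such an NRI from binary high-risk flags. The function name, the input layout, and the toy data are hypothetical, and the study's exact thresholds and procedure are not given in the abstract.

```python
def net_reclassification_improvement(old_high, new_high, event):
    """Two-category NRI from parallel lists of booleans.

    old_high / new_high: whether each person is flagged high risk by the
    old and new tool. event: whether the person went on to have the event.
    Returns (event component, non-event component, overall NRI).
    """
    def net_up(pairs):
        ups = sum(1 for old, new in pairs if new and not old)
        downs = sum(1 for old, new in pairs if old and not new)
        return (ups - downs) / len(pairs)

    cases = [(o, n) for o, n, e in zip(old_high, new_high, event) if e]
    controls = [(o, n) for o, n, e in zip(old_high, new_high, event) if not e]
    nri_events = net_up(cases)          # moving up is correct for cases
    nri_nonevents = -net_up(controls)   # moving down is correct for non-cases
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical cohort: 10 cases followed by 10 non-cases.
T, F = True, False
toy_event = [T] * 10 + [F] * 10
toy_old = [F, F, F, T, T, T, T, T, T, T] + [T, T, F, F, F, F, F, F, F, F]
toy_new = [T, T, F, F, T, T, T, T, T, T] + [F, F, F, F, F, F, F, F, F, F]
toy_nri = net_reclassification_improvement(toy_old, toy_new, toy_event)
```

In the toy cohort two cases move up and one moves down, while two non-cases move down, so both components are positive.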


Subject(s)
Coronary Artery Disease/diagnosis, Adult, Aged, Coronary Artery Disease/epidemiology, Coronary Artery Disease/genetics, Genetic Databases, Female, Genetic Predisposition to Disease, Humans, Incidence, Male, Middle Aged, Proportional Hazards Models, Risk Factors
5.
Am J Cardiol ; 148: 157-164, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33675770

ABSTRACT

The American College of Cardiology / American Heart Association pooled cohort equations tool (ASCVD-PCE) is currently recommended to assess 10-year risk for atherosclerotic cardiovascular disease (ASCVD). ASCVD-PCE does not currently include genetic risk factors. Polygenic risk scores (PRSs) have been shown to offer a powerful new approach to measuring genetic risk for common diseases, including ASCVD, and to enhance risk prediction when combined with ASCVD-PCE. Most work to date, including the assessment of tools, has focused on performance in individuals of European ancestries. Here we present evidence for the clinical validation of a new integrated risk tool (IRT), ASCVD-IRT, which combines ASCVD-PCE with PRS to predict 10-year risk of ASCVD across diverse ethnicity and ancestry groups. We demonstrate improved predictive performance of ASCVD-IRT over ASCVD-PCE, not only in individuals of self-reported White ethnicities (net reclassification improvement [NRI]; with 95% confidence interval = 2.7% [1.1 to 4.2]) but also Black / African American / Black Caribbean / Black African (NRI = 2.5% [0.6-4.3]) and South Asian (Indian, Bangladeshi or Pakistani) ethnicities (NRI = 8.7% [3.1 to 14.4]). NRI confidence intervals were wider and included zero for ethnicities with smaller sample sizes, including Hispanic (NRI = 7.5% [-1.4 to 16.5]), but PRS effect sizes in these ethnicities were significant and of comparable size to those seen in individuals of White ethnicities. Comparable results were obtained when individuals were analyzed by genetically inferred ancestry. Together, these results validate the performance of ASCVD-IRT in multiple ethnicities and ancestries, and favor their generalization to all ethnicities and ancestries.


Subject(s)
Atherosclerosis/epidemiology, Genetic Predisposition to Disease, Heart Disease Risk Factors, Adult, Aged, Western Asia, Asian People, Atherosclerosis/ethnology, Atherosclerosis/genetics, Black People, Cohort Studies, Female, Humans, Male, Middle Aged, Reproducibility of Results, White People
6.
Cell Metab ; 25(4): 838-855.e15, 2017 Apr 04.
Article in English | MEDLINE | ID: mdl-28380376

ABSTRACT

Sirtuins are NAD+-dependent protein deacylases that regulate several aspects of metabolism and aging. In contrast to the other mammalian sirtuins, the primary enzymatic activity of mitochondrial sirtuin 4 (SIRT4) and its overall role in metabolic control have remained enigmatic. Using a combination of phylogenetics, structural biology, and enzymology, we show that SIRT4 removes three acyl moieties from lysine residues: methylglutaryl (MG)-, hydroxymethylglutaryl (HMG)-, and 3-methylglutaconyl (MGc)-lysine. The metabolites leading to these post-translational modifications are intermediates in leucine oxidation, and we show a primary role for SIRT4 in controlling this pathway in mice. Furthermore, we find that dysregulated leucine metabolism in SIRT4KO mice leads to elevated basal and stimulated insulin secretion, which progressively develops into glucose intolerance and insulin resistance. These findings identify a robust enzymatic activity for SIRT4, uncover a mechanism controlling branched-chain amino acid flux, and position SIRT4 as a crucial player maintaining insulin secretion and glucose homeostasis during aging.


Subject(s)
Amidohydrolases/metabolism, Insulin/metabolism, Leucine/metabolism, Lysine/metabolism, Mitochondrial Proteins/metabolism, Sirtuins/metabolism, Amino Acid Sequence, Animals, Carbon-Carbon Ligases/metabolism, Glucose/metabolism, HEK293 Cells, Homeostasis, Humans, Insulin Resistance, Insulin Secretion, Metabolic Flux Analysis, Inbred C57BL Mice, Knockout Mice, Mitochondrial Proteins/chemistry, Molecular Models, Phylogeny, Sirtuins/chemistry
7.
Article in English | MEDLINE | ID: mdl-25541630

ABSTRACT

Rarely occurring genetic variants are hypothesized to influence human diseases, but statistically associating these rare variants with disease is challenging due to a lack of statistical power in most feasibly sized datasets. Several statistical tests have been developed to either collapse multiple rare variants from a genomic region into a single variable (presence/absence) or to tally the number of rare alleles within a region, relating the burden of rare alleles to disease risk. Both of these approaches, however, rely on user specification of a genomic region, usually an entire gene, to generate the collapsed or burden variables. Recent studies indicate that most risk variants for common diseases are found within regulatory regions, not genes. To capture the effect of rare alleles within non-genic regulatory regions for burden tests, we contrast a simple sliding window approach with a knowledge-guided k-medoids clustering method to group rare variants into statistically powerful, biologically meaningful windows. We apply these methods to detect genomic regions that alter expression of nearby genes.
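To make the two kinds of summary variable concrete, the sketch below slides fixed-size windows along a region and, for each sample, computes both a collapsed indicator (any rare allele present) and a burden count (number of rare alleles). Window size, step, function names, and the toy genotypes are assumptions for illustration; the k-medoids alternative mentioned above is not shown.

```python
def sliding_windows(start, end, size, step):
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    lo = start
    while lo < end:
        yield lo, min(lo + size, end)
        lo += step


def window_burden(positions, genotypes, lo, hi):
    """Per-sample (collapsed, burden) values for rare variants in [lo, hi).

    positions[v] is the coordinate of variant v; genotypes[s][v] is the
    rare-allele count (0, 1, or 2) carried by sample s at variant v.
    """
    cols = [v for v, pos in enumerate(positions) if lo <= pos < hi]
    result = []
    for row in genotypes:
        burden = sum(row[v] for v in cols)
        result.append((int(burden > 0), burden))
    return result


# Hypothetical data: three rare variants, three samples.
toy_positions = [100, 150, 900]
toy_genotypes = [
    [1, 0, 0],
    [0, 0, 2],
    [1, 1, 0],
]
toy_windows = list(sliding_windows(0, 1000, size=500, step=250))
toy_first = window_burden(toy_positions, toy_genotypes, *toy_windows[0])
```

Each window's per-sample values could then be used as the predictor in whatever association test is chosen for the phenotype.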

8.
Database (Oxford) ; 2013: bat056, 2013.
Article in English | MEDLINE | ID: mdl-23894185

ABSTRACT

Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
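For readers unfamiliar with the NCList (nested containment list) idea, the following in-memory sketch shows its core: intervals fully contained in another interval are stored in that interval's sublist, so within any one list both starts and ends increase and a binary search can find the first candidate overlap. This is a simplified Python illustration, not the MySQL implementation the abstract describes; the class and function names and the toy annotations are hypothetical.

```python
class NCNode:
    """One closed interval [start, end] plus the intervals nested inside it."""

    def __init__(self, start, end, label):
        self.start, self.end, self.label = start, end, label
        self.children = []


def build_nclist(intervals):
    """Build a nested containment list from (start, end, label) tuples."""
    top = []
    stack = []  # chain of currently open ancestors
    # Sort by start ascending, end descending, so containers precede contents.
    for start, end, label in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = NCNode(start, end, label)
        # Discard ancestors that end before this interval does.
        while stack and stack[-1].end < end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def query(nodes, qs, qe, out=None):
    """Collect labels of all intervals overlapping the closed range [qs, qe]."""
    if out is None:
        out = []
    # Ends are ascending within a list: binary search for first end >= qs.
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end < qs:
            lo = mid + 1
        else:
            hi = mid
    for node in nodes[lo:]:
        if node.start > qe:
            break  # starts are ascending, so nothing further can overlap
        out.append(node.label)
        query(node.children, qs, qe, out)
    return out


# Hypothetical annotations on one chromosome.
toy_index = build_nclist([
    (100, 500, "geneA"),
    (120, 200, "exon1"),
    (300, 400, "exon2"),
    (350, 350, "site"),
    (450, 900, "geneB"),
])
toy_hits = query(toy_index, 360, 460)
```

Because a sublist is only visited when its parent overlaps the query, whole groups of nested annotations are skipped at once.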


Subject(s)
Database Management Systems, Genomics, Information Storage and Retrieval, Algorithms, Genetic Databases, Search Engine