Results 1 - 20 of 139
1.
Sci Rep ; 14(1): 10514, 2024 05 07.
Article in English | MEDLINE | ID: mdl-38714721

ABSTRACT

Adverse pregnancy outcomes (APOs) affect a large proportion of pregnancies and represent an important cause of morbidity and mortality worldwide. Yet the pathophysiology of APOs is poorly understood, limiting our ability to prevent and treat these conditions. To search for genetic markers of maternal risk for four APOs, we performed multi-ancestry genome-wide association studies (GWAS) for pregnancy loss, gestational length, gestational diabetes, and preeclampsia. We clustered participants by their genetic ancestry and focused our analyses on three sub-cohorts with the largest sample sizes: European, African, and Admixed American. Association tests were carried out separately for each sub-cohort and then meta-analyzed together. Two novel loci were significantly associated with an increased risk of pregnancy loss: a cluster of SNPs located downstream of the TRMU gene (top SNP: rs142795512), and the SNP rs62021480 near RGMA. In the GWAS of gestational length we identified two new variants, rs2550487 and rs58548906 near WFDC1 and AC005052.1, respectively. Lastly, three new loci were significantly associated with gestational diabetes (top SNPs: rs72956265, rs10890563, rs79596863), located on or near ZBTB20, GUCY1A2, and RPL7P20, respectively. Fourteen loci previously correlated with preterm birth, gestational diabetes, and preeclampsia were found to be associated with these outcomes as well.


Subject(s)
Gestational Diabetes , Genome-Wide Association Study , Single Nucleotide Polymorphism , Pregnancy Outcome , Humans , Pregnancy , Female , Pregnancy Outcome/genetics , Gestational Diabetes/genetics , Adult , Pre-Eclampsia/genetics , Genetic Predisposition to Disease , Parity/genetics
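The abstract above says association tests were run per sub-cohort and then meta-analyzed, but it does not name the meta-analysis method. As a minimal sketch, assuming a fixed-effect inverse-variance model (an assumption, not confirmed by the abstract), per-cohort effect sizes for one SNP could be combined as follows. The function name and all input numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Combine per-cohort effect estimates with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical log-odds ratios and standard errors for one SNP
# in three ancestry sub-cohorts (illustrative values only).
beta, se, p = fixed_effect_meta([0.20, 0.25, 0.15], [0.05, 0.10, 0.08])
print(f"beta={beta:.3f} se={se:.3f} p={p:.2e}")
```

Cohorts with smaller standard errors dominate the pooled estimate, which is why the largest sub-cohort usually drives a meta-analysis of this kind.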
2.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
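The 20% figure in the abstract above is a ratio of observed to expected missense variation per region. The sketch below only illustrates that ratio and threshold; it does not reproduce the null mutational model or the way the authors delimit regions. The function name, region names, and counts are hypothetical.

```python
def flag_constrained_regions(regions, threshold=0.2):
    """Return (name, ratio) for regions whose observed/expected ratio is below threshold."""
    flagged = []
    for name, observed, expected in regions:
        ratio = observed / expected
        if ratio < threshold:
            flagged.append((name, ratio))
    return flagged


# Hypothetical regions: (name, observed missense count, expected missense count).
regions = [
    ("region_1", 3, 30.0),   # ratio 0.10, below the threshold
    ("region_2", 25, 28.0),  # close to expectation
    ("region_3", 10, 50.0),  # ratio 0.20, not strictly below the threshold
]
for name, ratio in flag_constrained_regions(regions):
    print(name, round(ratio, 2))
```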

3.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496501

ABSTRACT

Purpose: To investigate the number of rare missense variants observed in human genome sequences by ACMG/AMP PP3/BP4 evidence strength, following the calibrated PP3/BP4 computational recommendations. Methods: Missense variants from the genome sequences of 300 probands from the Rare Genomes Project with suspected rare disease were analyzed using computational prediction tools able to reach PP3_Strong and BP4_Moderate evidence strengths (BayesDel, MutPred2, REVEL, and VEST4). The numbers of variants at each evidence strength were analyzed across disease-associated genes and genome-wide. Results: From a median of 75.5 rare (≤1% allele frequency) missense variants in disease-associated genes per proband, a median of one reached PP3_Strong, 3-5 PP3_Moderate, and 3-5 PP3_Supporting. Most were allocated BP4 evidence (median 41-49 per proband) or were indeterminate (median 17.5-19 per proband). Extending the analysis to all protein-coding genes genome-wide, the number of PP3_Strong variants increased approximately 2.6-fold compared to disease-associated genes, with a median per proband of 1-3 PP3_Strong, 8-16 PP3_Moderate, and 10-17 PP3_Supporting. Conclusion: A small number of variants per proband reached PP3_Strong and PP3_Moderate in 3,424 disease-associated genes, and though not the intended use of the recommendations, also genome-wide. Use of PP3/BP4 evidence as recommended from calibrated computational prediction tools in the clinical diagnostic laboratory is unlikely to inappropriately contribute to the classification of an excessive number of variants as Pathogenic or Likely Pathogenic by ACMG/AMP rules.

4.
Bioinform Adv ; 4(1): vbae043, 2024.
Article in English | MEDLINE | ID: mdl-38545087

ABSTRACT

We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software. Availability and implementation: https://pypi.org/project/cafaeval.
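To illustrate why topological sorting matters when targets come from a directed acyclic graph, the sketch below propagates prediction scores from each term to its ancestors so that the predicted subgraph stays consistent. It is not the cafaeval API and uses plain dictionaries rather than the matrix computation the abstract mentions; the function name and the toy ontology are hypothetical. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def propagate_scores(parents, scores):
    """Give every ancestor at least the score of its highest-scoring descendant."""
    # TopologicalSorter takes {node: predecessors}. Listing each term's
    # children as its predecessors makes static_order() yield children first.
    children = {term: set() for term in parents}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(term)

    propagated = dict(scores)
    for term in TopologicalSorter(children).static_order():
        for parent in parents.get(term, ()):
            propagated[parent] = max(
                propagated.get(parent, 0.0), propagated.get(term, 0.0)
            )
    return propagated


# Hypothetical toy ontology: each term maps to its direct parents.
parents = {
    "root": [],
    "binding": ["root"],
    "dna_binding": ["binding"],
    "catalysis": ["root"],
}
print(propagate_scores(parents, {"dna_binding": 0.8, "catalysis": 0.3}))
```

Processing children before parents means each term's score is final by the time it is pushed upward, so one pass over the graph is enough.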

5.
Hum Genet ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38170232

ABSTRACT

Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.

6.
Nucleic Acids Res ; 51(19): 10162-10175, 2023 10 27.
Article in English | MEDLINE | ID: mdl-37739408

ABSTRACT

Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.


Subject(s)
Bacteria , Bacteria/cytology , Bacteria/genetics , Factual Databases , Microbiota , Phylogeny , 16S Ribosomal RNA/genetics , Bacterial Physiological Phenomena
7.
Genetics ; 225(2), 2023 10 04.
Article in English | MEDLINE | ID: mdl-37602697

ABSTRACT

Adverse pregnancy outcomes (APOs) are major risk factors for women's health during pregnancy and even in the years after pregnancy. Due to the heterogeneity of APOs, only a few genetic associations have been identified. In this report, we conducted genome-wide association studies (GWASs) of 479 traits that are possibly related to APOs using a large and racially diverse study, Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b). To display extensive results, we developed a web-based tool GnuMoM2b (https://gnumom2b.cumcobgyn.org/) for searching, visualizing, and sharing results from a GWAS of 479 pregnancy traits as well as phenome-wide association studies of more than 17 million single nucleotide polymorphisms. The genetic results from 3 ancestries (Europeans, Africans, and Admixed Americans) and meta-analyses are populated in GnuMoM2b. In conclusion, GnuMoM2b is a valuable resource for extraction of pregnancy-related genetic results and shows the potential to facilitate meaningful discoveries.


Subject(s)
Genome-Wide Association Study , Phenomics , Pregnancy , Female , Humans , Genome-Wide Association Study/methods , Phenotype , Risk Factors , Single Nucleotide Polymorphism
8.
medRxiv ; 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37333377

ABSTRACT

Adverse pregnancy outcomes (APOs) are major risk factors for women's health during pregnancy and even in the years after pregnancy. Due to the heterogeneity of APOs, only a few genetic associations have been identified. In this report, we conducted genome-wide association studies (GWAS) of 479 traits that are possibly related to APOs using a large and racially diverse study, Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b). To display the extensive results, we developed a web-based tool GnuMoM2b (https://gnumom2b.cumcobgyn.org/) for searching, visualizing, and sharing results from GWAS of 479 pregnancy traits as well as phenome-wide association studies (PheWAS) of more than 17 million single nucleotide polymorphisms (SNPs). The genetic results from three ancestries (Europeans, Africans, and Admixed Americans) and meta-analyses are populated in GnuMoM2b. In conclusion, GnuMoM2b is a valuable resource for extraction of pregnancy-related genetic results and shows the potential to facilitate meaningful discoveries.

9.
Front Artif Intell ; 6: 1029943, 2023.
Article in English | MEDLINE | ID: mdl-37035530

ABSTRACT

We consider the problem of active feature elicitation in which, given some examples with all the features (say, the full Electronic Health Record), and many examples with some of the features (say, demographics), the goal is to identify the set of examples on which more information (say, lab tests) needs to be collected. The observation is that some sets of features may be more expensive, personal or cumbersome to collect. We propose a classifier-independent, similarity metric-independent, general active learning approach which identifies examples that are dissimilar to the ones with the full set of data and acquires the complete set of features for these examples. Motivated by four real clinical tasks, our extensive evaluation demonstrates the effectiveness of this approach. To demonstrate the generalization capabilities of the proposed approach, we consider different divergence metrics and classifiers and present consistent results across the domains.

10.
Proteomes ; 11(1), 2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36810564

ABSTRACT

Staphylococcus aureus is one of the major community-acquired human pathogens, and its growing multidrug resistance poses a major threat of more prevalent infections in humans. A variety of virulence factors and toxic proteins are secreted during infection via the general secretory (Sec) pathway, which requires an N-terminal signal peptide to be cleaved from the N-terminus of the protein. This N-terminal signal peptide is recognized and processed by a type I signal peptidase (SPase). SPase-mediated signal peptide processing is the crucial step in the pathogenicity of S. aureus. In the present study, SPase-mediated N-terminal protein processing and its cleavage specificity were evaluated using a combination of N-terminal amidination bottom-up and top-down proteomics-based mass spectrometry approaches. Secretory proteins were found to be cleaved by SPase, specifically and non-specifically, on both sides of the normal SPase cleavage site. To a lesser extent, non-specific cleavages occur at relatively small residues next to the -1, +1, and +2 positions relative to the original SPase cleavage site. Additional random cleavages in the middle and near the C-terminus of some protein sequences were also observed. This additional processing could be part of some stress conditions and unknown signal peptidase mechanisms.

11.
Pac Symp Biocomput ; 28: 209-220, 2023.
Article in English | MEDLINE | ID: mdl-36540978

ABSTRACT

Racial and ethnic disparities in adverse pregnancy outcomes (APOs) have been well-documented in the United States, but the extent to which the disparities are present in high-risk subgroups has not been studied. To address this problem, we first applied association rule mining to the clinical data derived from the prospective nuMoM2b study cohort to identify subgroups at increased risk of developing four APOs (gestational diabetes, hypertension acquired during pregnancy, preeclampsia, and preterm birth). We then quantified racial/ethnic disparities within the cohort as well as within high-risk subgroups to assess potential effects of risk-reduction strategies. We identify significant differences in distributions of major risk factors across racial/ethnic groups and find surprising heterogeneity in APO prevalence across these populations, both in the cohort and in its high-risk subgroups. Our results suggest that risk-reducing strategies that simultaneously reduce disparities may require targeting of high-risk subgroups with considerations for the population context.


Subject(s)
Pregnancy Outcome , Premature Birth , Pregnancy , Female , Newborn Infant , Humans , United States , Premature Birth/epidemiology , Premature Birth/etiology , Prospective Studies , Computational Biology , Risk Factors
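Association rule mining, as used in the abstract above, scores rules of the form "risk factors -> outcome" with a few standard metrics. The sketch below computes support, confidence, and lift for a single rule. It is a generic illustration, not the study's pipeline; the function name, feature labels, and records are hypothetical, and it assumes the antecedent and the consequent each occur at least once.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent in r)
    n_both = sum(1 for r in records if antecedent <= r and consequent in r)
    support = n_both / n                # share of records matching the whole rule
    confidence = n_both / n_antecedent  # outcome rate inside the subgroup
    lift = confidence / (n_consequent / n)  # subgroup rate relative to the cohort rate
    return support, confidence, lift


# Hypothetical participants, each described by a set of binary attributes.
records = [
    {"bmi_ge_30", "age_ge_35", "gdm"},
    {"bmi_ge_30", "gdm"},
    {"bmi_ge_30"},
    {"age_ge_35"},
    {"gdm"},
    set(),
    set(),
    set(),
]
support, confidence, lift = rule_metrics(records, {"bmi_ge_30"}, "gdm")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 marks a subgroup in which the outcome is more common than in the cohort as a whole, which is the sense of "high-risk subgroup" here.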
12.
Pac Symp Biocomput ; 28: 311-322, 2023.
Article in English | MEDLINE | ID: mdl-36540987

ABSTRACT

Data biases are a known impediment to the development of trustworthy machine learning models and their application to many biomedical problems. When biased data is suspected, the assumption that the labeled data is representative of the population must be relaxed and methods that exploit typically representative unlabeled data must be developed. To mitigate the adverse effects of unrepresentative data, we consider a binary semi-supervised setting and focus on identifying whether the labeled data is biased and to what extent. We assume that the class-conditional distributions were generated by a family of component distributions represented at different proportions in labeled and unlabeled data. We also assume that the training data can be transformed to and subsequently modeled by a nested mixture of multivariate Gaussian distributions. We then develop a multi-sample expectation-maximization algorithm that learns all individual and shared parameters of the model from the combined data. Using these parameters, we develop a statistical test for the presence of the general form of bias in labeled data and estimate the level of this bias by computing the distance between corresponding class-conditional distributions in labeled and unlabeled data. We first study the new methods on synthetic data to understand their behavior and then apply them to real-world biomedical data to provide evidence that the bias estimation procedure is both possible and effective.


Subject(s)
Algorithms , Computational Biology , Humans , Computational Biology/methods , Machine Learning , Supervised Machine Learning
13.
Pac Symp Biocomput ; 28: 323-334, 2023.
Article in English | MEDLINE | ID: mdl-36540988

ABSTRACT

The accurate interpretation of genetic variants is essential for clinical actionability. However, a majority of variants remain of uncertain significance. Multiplexed assays of variant effects (MAVEs) can help provide functional evidence for variants of uncertain significance (VUS) at the scale of entire genes. Although the systematic prioritization of genes for such assays has been of great interest from the clinical perspective, existing strategies have rarely emphasized this motivation. Here, we propose three objectives for quantifying the importance of genes, each satisfying a specific clinical goal: (1) Movability scores to prioritize genes with the most VUS moving to non-VUS categories, (2) Correction scores to prioritize genes with the most pathogenic and/or benign variants that could be reclassified, and (3) Uncertainty scores to prioritize genes with VUS for which variant pathogenicity predictors used in clinical classification exhibit the greatest uncertainty. We demonstrate that existing approaches are sub-optimal when considering these explicit clinical objectives. We also propose a combined weighted score that optimizes the three objectives simultaneously and finds optimal weights to improve over existing approaches. Our strategy generally results in better performance than existing knowledge-driven and data-driven strategies and yields gene sets that are clinically relevant. Our work has implications for systematic efforts that aim to iterate between predictor development, experimentation and translation to the clinic.


Subject(s)
Genetic Predisposition to Disease , Genetic Testing , Humans , Genetic Testing/methods , Genetic Variation , Computational Biology/methods
14.
Pac Symp Biocomput ; 28: 359-370, 2023.
Article in English | MEDLINE | ID: mdl-36540991

ABSTRACT

We consider the problem of modeling gestational diabetes in a clinical study and develop a domain expert-guided probabilistic model that is both interpretable and explainable. Specifically, we construct a probabilistic model based on causal independence (Noisy-Or) from a carefully chosen set of features. We validate the efficacy of the model on the clinical study and demonstrate the importance of the features and the causal independence model.


Subject(s)
Gestational Diabetes , Pregnancy , Female , Humans , Computational Biology , Statistical Models , Causality
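A Noisy-Or model, the causal-independence form named in the abstract above, assumes each active risk factor can trigger the outcome independently, so the outcome is absent only if every active factor fails to trigger it. The sketch below shows the standard Noisy-Or formula with an optional leak term. The feature names and probabilities are hypothetical and are not taken from the study.

```python
def noisy_or(active_causes, cause_probs, leak=0.0):
    """P(outcome = 1) under a Noisy-Or model with an optional leak term."""
    prob_no_outcome = 1.0 - leak
    for cause in active_causes:
        # Each active cause independently fails to trigger the outcome
        # with probability 1 - cause_probs[cause].
        prob_no_outcome *= 1.0 - cause_probs[cause]
    return 1.0 - prob_no_outcome


# Hypothetical per-factor trigger probabilities (illustrative only).
cause_probs = {"high_bmi": 0.10, "family_history": 0.08, "age_over_35": 0.05}
risk = noisy_or(["high_bmi", "family_history"], cause_probs, leak=0.02)
print(f"{risk:.4f}")
```

Because each factor contributes one parameter, the model stays readable for a domain expert: the number of parameters grows linearly with the number of features rather than exponentially.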
15.
Am J Hum Genet ; 109(12): 2163-2177, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36413997

ABSTRACT

Recommendations from the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) for interpreting sequence variants specify the use of computational predictors as "supporting" level of evidence for pathogenicity or benignity using criteria PP3 and BP4, respectively. However, score intervals defined by tool developers, and ACMG/AMP recommendations that require the consensus of multiple predictors, lack quantitative support. Previously, we described a probabilistic framework that quantified the strengths of evidence (supporting, moderate, strong, very strong) within ACMG/AMP recommendations. We have extended this framework to computational predictors and introduce a new standard that converts a tool's scores to PP3 and BP4 evidence strengths. Our approach is based on estimating the local positive predictive value and can calibrate any computational tool or other continuous-scale evidence on any variant type. We estimate thresholds (score intervals) corresponding to each strength of evidence for pathogenicity and benignity for thirteen missense variant interpretation tools, using carefully assembled independent data sets. Most tools achieved supporting evidence level for both pathogenic and benign classification using newly established thresholds. Multiple tools reached score thresholds justifying moderate and several reached strong evidence levels. One tool reached very strong evidence level for benign classification on some variants. Based on these findings, we provide recommendations for evidence-based revisions of the PP3 and BP4 ACMG/AMP criteria using individual tools and future assessment of computational methods for clinical interpretation.


Subject(s)
Calibration , Humans , Consensus , Educational Status , Virulence
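The abstract above bases calibration on a local positive predictive value, that is, the share of pathogenic variants among variants with similar tool scores. The sketch below is a deliberately naive windowed estimate of that idea. It ignores any correction for class proportions and any confidence bounds, so it should not be read as the published procedure. The function name, window width, and labeled scores are hypothetical.

```python
def local_ppv(labeled_scores, query_score, half_width=0.05):
    """Fraction of pathogenic variants among those scoring near query_score."""
    nearby = [
        is_pathogenic
        for score, is_pathogenic in labeled_scores
        if abs(score - query_score) <= half_width
    ]
    if not nearby:
        return None  # no labeled variants close enough to estimate from
    return sum(nearby) / len(nearby)


# Hypothetical (tool score, is_pathogenic) pairs for labeled variants.
labeled = [
    (0.10, False), (0.12, False),
    (0.50, True), (0.52, False),
    (0.90, True), (0.93, True), (0.95, True),
]
for query in (0.50, 0.92):
    print(query, local_ppv(labeled, query))
```

Score intervals for each evidence strength would then correspond to ranges of scores where such a local estimate stays above a chosen level.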
17.
JAMA Netw Open ; 5(8): e2229158, 2022 08 01.
Article in English | MEDLINE | ID: mdl-36040739

ABSTRACT

Importance: Polygenic risk scores (PRS) for type 2 diabetes (T2D) can improve risk prediction for gestational diabetes (GD), yet the strength of the association between genetic and lifestyle risk factors has not been quantified. Objective: To assess the association of PRS and physical activity in existing GD risk models and identify patient subgroups who may receive the most benefits from a PRS or physical activity intervention. Design, Settings, and Participants: The Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be cohort was established to study individuals without previous pregnancy lasting at least 20 weeks (nulliparous) and to elucidate factors associated with adverse pregnancy outcomes. A subcohort of 3533 participants with European ancestry was used for risk assessment and performance evaluation. Participants were enrolled from October 5, 2010, to December 3, 2013, and underwent genotyping between February 19, 2019, and February 28, 2020. Data were analyzed from September 15, 2020, to November 10, 2021. Exposures: Self-reported total physical activity in early pregnancy was quantified as metabolic equivalents of task (METs). Polygenic risk scores were calculated for T2D using contributions of 84 single nucleotide variants, weighted by their association in the Diabetes Genetics Replication and Meta-analysis Consortium data. Main Outcomes and Measures: Estimation of the development of GD from clinical, genetic, and environmental variables collected in early pregnancy, assessed using measures of model discrimination. Odds ratios and positive likelihood ratios were used to evaluate the association of PRS and physical activity with GD risk. Results: A total of 3533 women were included in this analysis (mean [SD] age, 28.6 [4.9] years). In high-risk population subgroups (body mass index ≥25 or aged ≥35 years), individuals with high PRS (top 25th percentile) or low activity levels (METs <450) had increased odds of a GD diagnosis of 25% to 75%. Compared with the general population, participants with both high PRS and low activity levels had higher odds of a GD diagnosis (odds ratio, 3.4 [95% CI, 2.3-5.3]), whereas participants with low PRS and high METs had significantly reduced risk of a GD diagnosis (odds ratio, 0.5 [95% CI, 0.3-0.9]; P = .01). Conclusions and Relevance: In this cohort study, the addition of PRS was associated with the stratified risk of GD diagnosis among high-risk patient subgroups, suggesting the benefits of targeted PRS ascertainment to encourage early intervention.


Subject(s)
Type 2 Diabetes Mellitus , Gestational Diabetes , Adult , Cohort Studies , Type 2 Diabetes Mellitus/epidemiology , Type 2 Diabetes Mellitus/genetics , Gestational Diabetes/epidemiology , Gestational Diabetes/genetics , Exercise , Female , Genetic Predisposition to Disease , Humans , Pregnancy
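A polygenic risk score of the kind described in the abstract above is commonly computed as a weighted sum of risk-allele counts, with one weight per variant. The sketch below shows that sum for a single participant. The variant identifiers, weights, and dosages are hypothetical; the study's 84 variants and their consortium-derived weights are not reproduced here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    return sum(weights[variant] * dosage for variant, dosage in dosages.items())


# Hypothetical per-variant weights (for example, log odds ratios from a
# reference study) and one participant's risk-allele dosages.
weights = {"snv_a": 0.12, "snv_b": 0.05, "snv_c": 0.30}
dosages = {"snv_a": 2, "snv_b": 0, "snv_c": 1}
print(round(polygenic_risk_score(dosages, weights), 2))
```

Scores computed this way are only comparable within one cohort, which is why a "high PRS" group is defined by a percentile cutoff rather than by an absolute value.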
18.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subject(s)
Genomics , Proteins , Base Sequence , Computational Biology , Genome , Molecular Sequence Annotation
19.
Cell Syst ; 13(6): 435-437, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35709700

ABSTRACT

Identifying homologous proteins with divergent amino acid sequences can add to our understanding of protein evolution, structure, and function. A new study reports the development of a deep-network-based method to identify 6.8 million new Pfam members, a dramatic singular increase that exceeds a decade of accumulation using traditional approaches.


Subject(s)
Proteins , Amino Acid Sequence , Protein Databases , Tertiary Protein Structure , Proteins/chemistry , Amino Acid Sequence Homology
20.
Hum Genet ; 141(10): 1595-1613, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34549350

ABSTRACT

Whole-exome and whole-genome sequencing studies in autism spectrum disorder (ASD) have identified hundreds of thousands of exonic variants. Only a handful of them, primarily loss-of-function variants, have been shown to increase the risk for ASD, while the contributory roles of other variants, including most missense variants, remain unknown. New approaches that combine tissue-specific molecular profiles with patients' genetic data can thus play an important role in elucidating the functional impact of exonic variation and improve understanding of ASD pathogenesis. Here, we integrate spatio-temporal gene co-expression networks from the developing human brain and protein-protein interaction networks to first reach accurate prioritization of ASD risk genes based on their connectivity patterns with previously known high-confidence ASD risk genes. We subsequently integrate these gene scores with variant pathogenicity predictions to further prioritize individual exonic variants based on the positive-unlabeled learning framework with gene- and variant-score calibration. We demonstrate that this approach discriminates among variants between cases and controls at the high end of the prediction range. Finally, we experimentally validate our top-scoring de novo mutation NP_001243143.1:p.Phe309Ser in the sodium/potassium-transporting ATPase ATP1A3 to disrupt protein binding with different partners.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/metabolism , Autism Spectrum Disorder/genetics , Autistic Disorder/genetics , Genetic Predisposition to Disease , Humans , Mutation , Potassium/metabolism , Sodium/metabolism , Sodium-Potassium-Exchanging ATPase/genetics