1.
Bioinform Adv ; 4(1): vbae043, 2024.
Article in English | MEDLINE | ID: mdl-38545087

ABSTRACT

We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software. Availability and implementation: https://pypi.org/project/cafaeval.
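The role of topological sorting in this kind of evaluation can be illustrated with a minimal sketch. It is not CAFA-evaluator's code: the function name `propagate_scores`, the toy ontology, and the choice to propagate the maximum descendant score to each ancestor are assumptions made for illustration. The idea is that a prediction is consistent with a directed acyclic graph only if every ancestor of a predicted term scores at least as high as the term itself.

```python
from graphlib import TopologicalSorter


def propagate_scores(parents, scores):
    """Return scores in which each term scores at least as high as its descendants.

    parents: dict mapping each term to an iterable of its direct parents.
    scores:  dict mapping some terms to a predicted score in [0, 1].
    """
    # Invert the parent map so that each term lists its children.
    children = {term: set() for term in parents}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(term)

    # TopologicalSorter expects node -> predecessors. Passing node -> children
    # makes every child come out before its parents.
    order = TopologicalSorter(children).static_order()

    out = {term: scores.get(term, 0.0) for term in children}
    for term in order:
        for parent in parents.get(term, ()):
            out[parent] = max(out[parent], out[term])
    return out


# Hypothetical four-term ontology: C is a child of A; A and B are children of root.
toy_parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"]}
toy_scores = {"C": 0.9, "A": 0.4, "B": 0.2}
consistent = propagate_scores(toy_parents, toy_scores)
```

Because children are visited before their parents, a single pass is enough: by the time a term is processed, its own score already reflects all of its descendants.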

2.
Pac Symp Biocomput ; 28: 209-220, 2023.
Article in English | MEDLINE | ID: mdl-36540978

ABSTRACT

Racial and ethnic disparities in adverse pregnancy outcomes (APOs) have been well-documented in the United States, but the extent to which the disparities are present in high-risk subgroups has not been studied. To address this problem, we first applied association rule mining to the clinical data derived from the prospective nuMoM2b study cohort to identify subgroups at increased risk of developing four APOs (gestational diabetes, hypertension acquired during pregnancy, preeclampsia, and preterm birth). We then quantified racial/ethnic disparities within the cohort as well as within high-risk subgroups to assess potential effects of risk-reduction strategies. We identify significant differences in distributions of major risk factors across racial/ethnic groups and find surprising heterogeneity in APO prevalence across these populations, both in the cohort and in its high-risk subgroups. Our results suggest that risk-reducing strategies that simultaneously reduce disparities may require targeting of high-risk subgroups with considerations for the population context.


Subject(s)
Pregnancy Outcome; Premature Birth; Pregnancy; Female; Infant, Newborn; Humans; United States; Premature Birth/epidemiology; Premature Birth/etiology; Prospective Studies; Computational Biology; Risk Factors
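
As a rough illustration of the association rule mining mentioned in the abstract above, the sketch below computes the three standard quantities used to rank a rule "antecedent implies outcome": support, confidence, and lift. The function name `rule_metrics`, the feature labels, and the four toy records are invented for this example and are not taken from the nuMoM2b data or the authors' pipeline.

```python
def rule_metrics(records, antecedent, outcome):
    """Compute support, confidence, and lift for the rule antecedent -> outcome.

    records:    list of sets, each holding the items present for one participant.
    antecedent: set of items defining the subgroup.
    outcome:    single item whose risk in the subgroup is of interest.
    """
    n = len(records)
    n_antecedent = sum(antecedent <= record for record in records)
    n_outcome = sum(outcome in record for record in records)
    n_both = sum(antecedent <= record and outcome in record for record in records)

    support = n_both / n
    # Confidence is the outcome rate inside the subgroup.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift compares that rate with the outcome rate in the whole cohort.
    lift = confidence / (n_outcome / n) if n_outcome else 0.0
    return support, confidence, lift


# Made-up records; the labels are hypothetical.
toy_records = [
    {"high_bmi", "gestational_diabetes"},
    {"high_bmi"},
    {"gestational_diabetes"},
    set(),
]
support, confidence, lift = rule_metrics(
    toy_records, {"high_bmi"}, "gestational_diabetes"
)
```

A lift above 1 marks a subgroup in which the outcome is more common than in the cohort as a whole, which is one way to read "increased risk" for a mined subgroup.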
3.
JAMA Netw Open ; 5(8): e2229158, 2022 08 01.
Article in English | MEDLINE | ID: mdl-36040739

ABSTRACT

Importance: Polygenic risk scores (PRS) for type 2 diabetes (T2D) can improve risk prediction for gestational diabetes (GD), yet the strength of the association between genetic and lifestyle risk factors has not been quantified. Objective: To assess the association of PRS and physical activity in existing GD risk models and identify patient subgroups who may receive the most benefits from a PRS or physical activity intervention. Design, Settings, and Participants: The Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be cohort was established to study individuals without previous pregnancy lasting at least 20 weeks (nulliparous) and to elucidate factors associated with adverse pregnancy outcomes. A subcohort of 3533 participants with European ancestry was used for risk assessment and performance evaluation. Participants were enrolled from October 5, 2010, to December 3, 2013, and underwent genotyping between February 19, 2019, and February 28, 2020. Data were analyzed from September 15, 2020, to November 10, 2021. Exposures: Self-reported total physical activity in early pregnancy was quantified as metabolic equivalents of task (METs). Polygenic risk scores were calculated for T2D using contributions of 84 single nucleotide variants, weighted by their association in the Diabetes Genetics Replication and Meta-analysis Consortium data. Main Outcomes and Measures: Estimation of the development of GD from clinical, genetic, and environmental variables collected in early pregnancy, assessed using measures of model discrimination. Odds ratios and positive likelihood ratios were used to evaluate the association of PRS and physical activity with GD risk. Results: A total of 3533 women were included in this analysis (mean [SD] age, 28.6 [4.9] years). In high-risk population subgroups (body mass index ≥25 or aged ≥35 years), individuals with high PRS (top 25th percentile) or low activity levels (METs <450) had increased odds of a GD diagnosis of 25% to 75%. Compared with the general population, participants with both high PRS and low activity levels had higher odds of a GD diagnosis (odds ratio, 3.4 [95% CI, 2.3-5.3]), whereas participants with low PRS and high METs had significantly reduced risk of a GD diagnosis (odds ratio, 0.5 [95% CI, 0.3-0.9]; P = .01). Conclusions and Relevance: In this cohort study, the addition of PRS was associated with the stratified risk of GD diagnosis among high-risk patient subgroups, suggesting the benefits of targeted PRS ascertainment to encourage early intervention.


Subject(s)
Diabetes Mellitus, Type 2; Diabetes, Gestational; Adult; Cohort Studies; Diabetes Mellitus, Type 2/epidemiology; Diabetes Mellitus, Type 2/genetics; Diabetes, Gestational/epidemiology; Diabetes, Gestational/genetics; Exercise; Female; Genetic Predisposition to Disease; Humans; Pregnancy
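
Two quantities named in the abstract above, a weighted polygenic risk score and an odds ratio, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the study's analysis code: the score is assumed to be a plain weighted sum of risk-allele counts, the odds ratio is the unadjusted ratio from a 2x2 table, and every number below is made up.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Hypothetical three-variant score; the study itself used 84 variants.
toy_score = polygenic_risk_score([0, 1, 2], [0.5, 0.25, 0.125])

# Hypothetical counts: 20 cases and 80 controls among the exposed,
# 10 cases and 90 controls among the unexposed.
toy_or = odds_ratio(20, 80, 10, 90)
```

The confidence intervals reported in the abstract would need additional steps that this sketch leaves out.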
4.
Bioinform Adv ; 2(1): vbac057, 2022.
Article in English | MEDLINE | ID: mdl-36699361

ABSTRACT

Motivation: Experimental biologists, biocurators, and computational biologists all play a role in characterizing a protein's function. The discovery of protein function in the laboratory by experimental scientists is the foundation of our knowledge about proteins. Experimental findings are compiled in knowledgebases by biocurators to provide standardized, readily accessible, and computationally amenable information. Computational biologists train their methods using these data to predict protein function and guide subsequent experiments. To understand the state of affairs in this ecosystem, centered here around protein function prediction, we surveyed scientists from these three constituent communities. Results: We show that the three communities have common but also idiosyncratic perspectives on the field. Most strikingly, experimentalists rarely use state-of-the-art prediction software, but when presented with predictions, report many to be surprising and useful. Ontologies appear to be highly valued by biocurators, less so by experimentalists and computational biologists, yet controlled vocabularies bridge the communities and simplify the prediction task. Additionally, many software tools are not readily accessible and the predictions presented to the users can be broad and uninformative. We conclude that to meet both the social and technical challenges in the field, a more productive and meaningful interaction between members of the core communities is necessary. Availability and implementation: Data cannot be shared for ethical/privacy reasons. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

5.
Pac Symp Biocomput ; 24: 124-135, 2019.
Article in English | MEDLINE | ID: mdl-30864316

ABSTRACT

Accurately estimating performance accuracy of machine learning classifiers is of fundamental importance in biomedical research with potentially societal consequences upon the deployment of best-performing tools in everyday life. Although classification has been extensively studied over the past decades, there remain understudied problems when the training data violate the main statistical assumptions relied upon for accurate learning and model characterization. This particularly holds true in the open world setting where observations of a phenomenon generally guarantee its presence but the absence of such evidence cannot be interpreted as the evidence of its absence. Learning from such data is often referred to as positive-unlabeled learning, a form of semi-supervised learning where all labeled data belong to one (say, positive) class. To improve the best practices in the field, we here study the quality of estimated performance in positive-unlabeled learning in the biomedical domain. We provide evidence that such estimates can be wildly inaccurate, depending on the fraction of positive examples in the unlabeled data and the fraction of negative examples mislabeled as positives in the labeled data. We then present correction methods for four such measures and demonstrate that the knowledge or accurate estimates of class priors in the unlabeled data and noise in the labeled data are sufficient for the recovery of true classification performance. We provide theoretical support as well as empirical evidence for the efficacy of the new performance estimation methods.


Subject(s)
Classification/methods; Machine Learning; Algorithms; Computational Biology/methods; Computer Simulation; Humans; Machine Learning/statistics & numerical data; Models, Statistical
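
The reason a class prior and a noise rate suffice to recover true performance, as the abstract above states, can be seen in a small sketch for two measures. It is not the paper's derivation or notation. It assumes the unlabeled set is a mixture with a fraction `alpha` of positives, the labeled set is a mixture with a fraction `eta` of mislabeled negatives, and rates measured on each set are the matching mixtures of the true rates. The names `alpha`, `eta`, and `correct_rates` are chosen here for illustration.

```python
def correct_rates(tpr_pu, fpr_pu, alpha, eta):
    """Recover the true (TPR, FPR) from rates measured on positive-unlabeled data.

    Assumed mixture model:
        tpr_pu = (1 - eta) * tpr + eta * fpr        (labeled set treated as positives)
        fpr_pu = alpha * tpr + (1 - alpha) * fpr    (unlabeled set treated as negatives)
    Solving this 2x2 linear system gives the expressions below.
    """
    det = 1.0 - alpha - eta
    if det == 0:
        raise ValueError("alpha + eta must differ from 1 for the system to be solvable")
    tpr = ((1.0 - alpha) * tpr_pu - eta * fpr_pu) / det
    fpr = ((1.0 - eta) * fpr_pu - alpha * tpr_pu) / det
    return tpr, fpr


# Round trip on made-up numbers: build the observed rates from known true rates,
# then check that the correction recovers them.
true_tpr, true_fpr, alpha, eta = 0.8, 0.1, 0.2, 0.1
observed_tpr = (1 - eta) * true_tpr + eta * true_fpr
observed_fpr = alpha * true_tpr + (1 - alpha) * true_fpr
recovered_tpr, recovered_fpr = correct_rates(observed_tpr, observed_fpr, alpha, eta)
```

In this example the uncorrected false positive rate is 0.24 against a true value of 0.1, which shows how far an uncorrected estimate can drift even with a modest share of positives in the unlabeled data.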
...