Correlation Analysis of Variables From the Atherosclerosis Risk in Communities Study.
Mandal, Meisha; Levy, Josh; Ives, Cataia; Hwang, Stephen; Zhou, Yi-Hui; Motsinger-Reif, Alison; Pan, Huaqin; Huggins, Wayne; Hamilton, Carol; Wright, Fred; Edwards, Stephen.
Affiliation
  • Mandal M; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Levy J; Levy Informatics, Chapel Hill, NC, United States.
  • Ives C; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Hwang S; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Zhou YH; Department of Statistics, North Carolina State University, Raleigh, NC, United States.
  • Motsinger-Reif A; Bioinformatics Research Center and Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States.
  • Pan H; Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, United States.
  • Huggins W; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Hamilton C; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Wright F; GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, United States.
  • Edwards S; Department of Statistics, North Carolina State University, Raleigh, NC, United States.
Front Pharmacol ; 13: 883433, 2022.
Article in English | MEDLINE | ID: mdl-35899108
ABSTRACT
The need to test chemicals in a timely and cost-effective manner has driven the development of new alternative methods (NAMs) that utilize in silico and in vitro approaches for toxicity prediction. There is a wealth of existing data from human studies that can aid in understanding the ability of NAMs to support chemical safety assessment. This study aims to streamline the integration of data from existing human cohorts by programmatically identifying related variables within each study. Study variables from the Atherosclerosis Risk in Communities (ARIC) study were clustered based on their correlation within the study. The quality of the clusters was evaluated via a combination of manual review and natural language processing (NLP). We identified 391 clusters including 3,285 variables. Manual review of the clusters containing more than one variable determined that human reviewers considered 95% of the clusters related to some degree. To evaluate potential bias in the human reviewers, clusters were also scored via NLP, which showed a high concordance with the human classification. Clusters were further consolidated into cluster groups using the Louvain community finding algorithm. Manual review of the cluster groups confirmed that clusters within a group were more related than clusters from different groups. Our data-driven approach can facilitate data harmonization and curation efforts by providing human annotators with groups of related variables reflecting the themes present in the data. Reviewing groups of related variables should increase efficiency of the human review, and the number of variables reviewed can be reduced by focusing curator attention on variable groups whose theme is relevant for the topic being studied.
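The correlation-based clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure, threshold, or clustering procedure was used, so everything here is an assumption: Pearson correlation, a hypothetical cutoff of 0.8, and clusters defined as connected components of the graph linking variables with |r| at or above the cutoff. The function names and the toy data are invented for illustration and are not ARIC values. The later consolidation of clusters into cluster groups with the Louvain algorithm is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant variable: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_clusters(data, threshold=0.8):
    """Group variables into connected components of the graph |r| >= threshold.

    `data` maps a variable name to its list of observations.
    The threshold is an assumed value, not one taken from the study.
    """
    parent = {name: name for name in data}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Link every pair of variables whose absolute correlation passes the cutoff.
    for a, b in combinations(data, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            parent[find(a)] = find(b)

    # Collect variables by the root of their component.
    clusters = {}
    for name in data:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data, not ARIC values.
toy = {
    "weight_kg": [60, 72, 85, 90, 100],
    "bmi": [21, 24, 28, 30, 33],
    "systolic_bp": [130, 110, 125, 118, 140],
    "diastolic_bp": [85, 70, 80, 76, 92],
}
clusters = correlation_clusters(toy)
print(clusters)
```

In this toy example the two body-size variables fall into one cluster and the two blood-pressure variables into another, which is the kind of themed grouping a curator could then review as a unit.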
Full text: 1 Database: MEDLINE Language: En Year of publication: 2022 Document type: Article Country of affiliation: United States