Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease.
Liem, David A; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, John H; Wang, Wei; Ping, Peipei; Han, JiaWei.
Affiliation
  • Liem DA; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Murali S; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Sigdel D; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Shi Y; NIH BD2K Program Centers of Excellence for Big Data Computing-KnowEng Center, Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois.
  • Wang X; NIH BD2K Program Centers of Excellence for Big Data Computing-KnowEng Center, Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois.
  • Shen J; NIH BD2K Program Centers of Excellence for Big Data Computing-KnowEng Center, Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois.
  • Choi H; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Caufield JH; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Wang W; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Heart Big Data to Knowledge Center, Department of Computer Science, Scalable Analytics Institute, Henry Samueli School of Engineering and Applied Science, University of California, Los Angeles, California.
  • Ping P; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Departments of Physiology, Medicine/Cardiology, and Bioinformatics, David Geffen School of Medicine, University of California, Los Angeles, California.
  • Han J; NIH BD2K Program Centers of Excellence for Big Data Computing-Heart BD2K Center, Heart Big Data to Knowledge Center, Department of Computer Science, Scalable Analytics Institute, Henry Samueli School of Engineering and Applied Science, University of California, Los Angeles, California.
Am J Physiol Heart Circ Physiol; 315(4): H910-H924, 2018-10-01.
Article in English | MEDLINE | ID: mdl-29775406
ABSTRACT
Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. Using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely, ischemic heart disease, cardiomyopathies, cerebrovascular accident, congenital heart disease, arrhythmias, and valve disease, anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the 6 groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-Aware Semantic Online Analytical Processing was applied to semantically rank the association of proteins to each CVD and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein was visualized as a six-dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins, whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all 6 CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely, insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters predominantly associated with a targeted CVD; analyses of these proteins revealed unexpected insights underlying the key ECM-related molecular pathogenesis of each CVD, including virus assembly and release in arrhythmias. 
NEW & NOTEWORTHY The present study is the first application of a text-mining algorithm to characterize the relationships of 709 extracellular matrix-related proteins with 6 categories of cardiovascular disease described in 1,099,254 abstracts. Our analysis informed unexpected extracellular matrix functions, pathways, and molecular relationships implicated in the six cardiovascular diseases.
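The abstract describes each protein as a six-dimensional vector of disease association scores, from which shared and disease-specific protein groups were derived. The following is a minimal sketch of that idea, not the authors' pipeline: the protein names and scores are invented for illustration, and the choice of Euclidean distance with average linkage is an assumption, since the abstract does not state which metric or linkage was used.

```python
from itertools import combinations
from math import dist

# Order of the six CVD categories in each score vector (abbreviations are ours).
CVDS = ["IHD", "CM", "CVA", "CHD", "ARR", "VD"]

# Hypothetical association scores; these values are NOT from the study.
scores = {
    "PROT_A": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    "PROT_B": [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
    "PROT_C": [0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
    "PROT_D": [0.0, 0.1, 0.0, 0.0, 0.8, 0.0],
    "PROT_E": [0.7, 0.0, 0.0, 0.0, 0.0, 0.0],
}


def shared_across_all(scores, threshold=0.0):
    """Return proteins whose score exceeds the threshold for every CVD."""
    return sorted(
        name for name, vec in scores.items() if all(s > threshold for s in vec)
    )


def average_linkage(a, b, scores):
    """Mean Euclidean distance between all cross-cluster protein pairs."""
    total = sum(dist(scores[x], scores[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(scores, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in sorted(scores)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], scores),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


shared = shared_across_all(scores)
clusters = agglomerate(scores, n_clusters=3)
print("Associated with all six CVDs:", shared)
print("Clusters:", clusters)
```

In this toy data, the proteins with nonzero scores for every disease group together, while those scoring highly for a single category form their own clusters, which mirrors the shared versus disease-specific distinction the abstract reports.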

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Pattern Recognition, Automated / Cardiovascular Diseases / Extracellular Matrix Proteins / Extracellular Matrix / Data Mining / Machine Learning Type of study: Prognostic_studies Limits: Humans Language: En Journal: Am J Physiol Heart Circ Physiol Journal subject: CARDIOLOGY / PHYSIOLOGY Year of publication: 2018 Document type: Article
