Literature mining discerns latent disease-gene relationships.
Rai, Priyadarshini; Jain, Atishay; Kumar, Shivani; Sharma, Divya; Jha, Neha; Chawla, Smriti; Raj, Abhijit; Gupta, Apoorva; Poonia, Sarita; Majumdar, Angshul; Chakraborty, Tanmoy; Ahuja, Gaurav; Sengupta, Debarka.
Affiliation
  • Rai P; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Jain A; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Kumar S; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Sharma D; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Jha N; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Chawla S; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Raj A; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Gupta A; Department of Biotechnology, Delhi Technological University, Shahbad Daulatpur, Delhi 110042, India.
  • Poonia S; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
  • Majumdar A; IAI, TCG CREST, Kolkata 700091, India.
  • Chakraborty T; Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi 110016, India.
  • Ahuja G; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi 110016, India.
  • Sengupta D; Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla Phase III, New Delhi 110020, India.
Bioinformatics; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194
ABSTRACT
MOTIVATION:

Dysregulation of a gene's function, due either to mutations or to impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is a daunting task, primarily because of genetic pleiotropy and the lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is not yet at our fingertips when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates.

RESULTS:

To circumvent this, we mined ∼18 million PubMed abstracts published up to May 2019 and automatically selected ∼4.5 million of them that describe the roles of particular genes in disease pathogenesis. Further, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT) language model, from the field of natural language processing, to learn vector representations of entities such as genes, diseases, tissues, and cell types in such a way that their relationships are preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions.

AVAILABILITY AND IMPLEMENTATION:

PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.
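
ILLUSTRATIVE SKETCH:

The abstract states that entities are embedded so that their relationships are preserved in a vector space. One common way to query such a space is to rank candidate genes by cosine similarity to a disease vector. The sketch below shows that idea only; it is not the authors' method or the Pathomap API. The function names, the gene labels (GENE_A, GENE_B, GENE_C) and the three-dimensional vectors are hypothetical placeholders, not values from the paper or its models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs, top_k=3):
    """Return the top_k (gene, score) pairs, most similar first."""
    scored = [
        (gene, cosine_similarity(disease_vec, vec))
        for gene, vec in gene_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real model would produce
# vectors with hundreds of dimensions.
toy_gene_vecs = {
    "GENE_A": [0.9, 0.1, 0.0],
    "GENE_B": [0.1, 0.8, 0.3],
    "GENE_C": [0.7, 0.2, 0.1],
}
toy_disease_vec = [1.0, 0.1, 0.0]

ranking = rank_genes(toy_disease_vec, toy_gene_vecs, top_k=2)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting, genes whose vectors point in nearly the same direction as the disease vector rank first. With real embeddings, a high-ranking gene that never co-occurs with the disease in the training text would be a candidate hypothesis, not a confirmed association.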
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Data Mining Limit: Humans Language: En Journal: Bioinformatics Publication year: 2024 Document type: Article
