Search | PAHO/WHO Uruguay

Systematic tissue annotations of genomics samples by modeling unstructured metadata.

Hawkins, Nathaniel T; Maldaver, Marc; Yannakopoulos, Anna; Guare, Lindsay A; Krishnan, Arjun.

Nat Commun ; 13(1): 6736, 2022 Nov 08.

Article in English | MEDLINE | ID: mdl-36347858

ABSTRACT

There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto.
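The two-step idea in the abstract (turn each sample description into a numerical vector, then feed those vectors to a supervised classifier) can be illustrated with a minimal sketch. This is not the authors' method or the txt2onto API: it swaps in plain TF-IDF weighting and a nearest-centroid classifier, and the function names, training descriptions, and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute smoothed inverse document frequencies from tokenized docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def vectorize(tokens, idf):
    """Turn tokens into an L2-normalized sparse TF-IDF vector (a dict)."""
    tf = Counter(t for t in tokens if t in idf)  # unseen tokens are ignored
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(samples):
    """Fit IDF weights and one mean vector (centroid) per tissue label."""
    docs = [tokenize(text) for text, _ in samples]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for doc, (_, label) in zip(docs, samples):
        counts[label] += 1
        for tok, value in vectorize(doc, idf).items():
            sums[label][tok] += value
    centroids = {
        label: {tok: value / counts[label] for tok, value in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(text, idf, centroids):
    """Return the label whose centroid has the largest dot product with the text."""
    vec = vectorize(tokenize(text), idf)

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(tok, 0.0) for tok, v in vec.items())

    # sorted() makes tie-breaking deterministic (alphabetical by label)
    return max(sorted(centroids), key=score)


# Made-up descriptions for illustration only; not data from the paper.
TRAIN = [
    ("hepatocytes isolated from adult liver biopsy", "liver"),
    ("liver tissue, hepatocellular carcinoma resection", "liver"),
    ("cortical neurons from postmortem brain", "brain"),
    ("brain frontal cortex, bulk RNA-seq", "brain"),
]

idf, centroids = train(TRAIN)
print(predict("primary hepatocytes from liver donor", idf, centroids))
```

With four toy descriptions the classifier only matches shared vocabulary; a text with no known tokens scores zero for every label and falls back to the alphabetically first one, which a real pipeline would need to handle explicitly.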

Subject(s)

Metadata, Natural Language Processing, Humans, Machine Learning, Genomics, Language

Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease.

Zhou, Wei; Kanai, Masahiro; Wu, Kuan-Han H; Rasheed, Humaira; Tsuo, Kristin; Hirbo, Jibril B; Wang, Ying; Bhattacharya, Arjun; Zhao, Huiling; Namba, Shinichi; Surakka, Ida; Wolford, Brooke N; Lo Faro, Valeria; Lopera-Maya, Esteban A; Läll, Kristi; Favé, Marie-Julie; Partanen, Juulia J; Chapman, Sinéad B; Karjalainen, Juha; Kurki, Mitja; Maasha, Mutaamba; Brumpton, Ben M; Chavan, Sameer; Chen, Tzu-Ting; Daya, Michelle; Ding, Yi; Feng, Yen-Chen A; Guare, Lindsay A; Gignoux, Christopher R; Graham, Sarah E; Hornsby, Whitney E; Ingold, Nathan; Ismail, Said I; Johnson, Ruth; Laisk, Triin; Lin, Kuang; Lv, Jun; Millwood, Iona Y; Moreno-Grau, Sonia; Nam, Kisung; Palta, Priit; Pandit, Anita; Preuss, Michael H; Saad, Chadi; Setia-Verma, Shefali; Thorsteinsdottir, Unnur; Uzunovic, Jasmina; Verma, Anurag; Zawistowski, Matthew; Zhong, Xue.

Cell Genom ; 2(10): 100192, 2022 Oct 12.

Article in English | MEDLINE | ID: mdl-36777996

ABSTRACT

Biobanks facilitate genome-wide association studies (GWASs), which have mapped genomic loci across a range of human diseases and traits. However, most biobanks are primarily composed of individuals of European ancestry. We introduce the Global Biobank Meta-analysis Initiative (GBMI), a collaborative network of 23 biobanks from 4 continents representing more than 2.2 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWASs generated using harmonized genotypes and phenotypes from member biobanks for 14 exemplar diseases and endpoints. This strategy validates that GWASs conducted in diverse biobanks can be integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics. This collaborative effort improves GWAS power for diseases, benefits understudied diseases, and improves risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of human diseases and traits.
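The abstract does not say which meta-analysis model GBMI uses. As one common way to combine per-biobank GWAS summary statistics for a single variant, the sketch below assumes a fixed-effect, inverse-variance-weighted scheme; the function name and the effect sizes are hypothetical and purely illustrative.

```python
import math


def ivw_meta(estimates):
    """Combine (beta, standard_error) pairs by inverse-variance weighting.

    Assumes a fixed-effect model: every study estimates the same true effect.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-biobank effect estimates for one variant (not real data)
per_biobank = [(0.10, 0.05), (0.20, 0.10)]
beta, se, z, p = ivw_meta(per_biobank)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.4f}")
```

The more precise study (smaller standard error) dominates the pooled estimate, which is why adding large biobanks increases power even when smaller cohorts contribute noisier estimates.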
