KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data.
Bioinform Adv; 3(1): vbad100, 2023.
Article | En | MEDLINE | ID: mdl-37565237
Motivation: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90,000 gene-phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ~75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16). Results: To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using conditional distribution properties of the multivariate normal distribution, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes. Our evaluation of the method using simulated and real-world datasets demonstrates its superiority over the singular value decomposition matrix completion method in various scenarios. Availability and implementation: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples.
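The conditional-expectation idea in the Results can be illustrated with a minimal sketch. It is not the KOMPUTE R package or its API: the function names `solve` and `impute_z` are hypothetical, and the sketch assumes that a gene's Z-scores are jointly multivariate normal with zero mean and unit variance and that the phenotype correlation matrix is already known. Under those assumptions the missing Z-scores are estimated as Sigma_mo * Sigma_oo^{-1} * z_o, where "o" indexes measured and "m" unmeasured phenotypes. Only the Python standard library is used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z, corr):
    """Replace None entries of z with E[z_missing | z_observed].

    Assumes zero-mean, unit-variance multivariate normal Z-scores whose
    correlation matrix `corr` is known (an assumption of this sketch).
    """
    obs = [i for i, v in enumerate(z) if v is not None]
    mis = [i for i, v in enumerate(z) if v is None]
    if not obs:
        # Nothing to condition on: the conditional mean is the prior mean, 0.
        return [0.0] * len(z)
    sigma_oo = [[corr[i][j] for j in obs] for i in obs]
    weights = solve(sigma_oo, [z[i] for i in obs])  # Sigma_oo^{-1} z_o
    out = list(z)
    for i in mis:
        # Row i of Sigma_mo times the weights gives the conditional mean.
        out[i] = sum(corr[i][j] * w for j, w in zip(obs, weights))
    return out


# Illustrative values only (not IMPC data): three phenotypes, the second unmeasured.
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
imputed = impute_z([2.0, None, 1.0], corr)
```

Measured Z-scores are returned unchanged; only the `None` entries are filled. The imputed value is pulled toward zero when the unmeasured phenotype is weakly correlated with the measured ones, which is the expected behavior of a conditional mean.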
Full text: 1
Database: MEDLINE
Study type: Prognostic_studies
Language: En
Journal: Bioinform Adv
Year: 2023
Document type: Article
Country of affiliation: United States