ABSTRACT
MOTIVATION: Circulating cell-free DNA (cfDNA) is widely explored as a non-invasive biomarker for cancer screening and diagnosis. The ability to decode the cells of origin in cfDNA would provide biological insights into pathophysiological mechanisms, aiding cancer characterization and directing clinical management and follow-up. RESULTS: We developed a DNA methylation signature-based deconvolution algorithm, MetDecode, for cancer tissue-of-origin identification. We built a reference atlas from de novo and published whole-genome methylation sequencing data for colorectal, breast, ovarian and cervical cancer, and for blood-cell-derived entities. MetDecode models contributors that are absent from the atlas using methylation patterns learnt on the fly from the input cfDNA methylation profiles. Additionally, our model accounts for the coverage of each marker region to alleviate potential sources of noise. In silico experiments showed a limit of detection down to 2.88% of tumour tissue contribution in cfDNA. MetDecode produced Pearson correlation coefficients above 0.95 and outperformed other methods in simulations (p < 0.001; one-sided t-test). In plasma cfDNA profiles from cancer patients, MetDecode assigned the correct tissue of origin in 84.2% of cases. In conclusion, MetDecode can unravel alterations in the components of the cfDNA pool by accurately estimating the contribution of multiple tissues, even when supplied with an imperfect reference atlas. AVAILABILITY: MetDecode is available at https://github.com/JorisVermeeschLab/MetDecode. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
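The general idea of reference-based methylation deconvolution can be illustrated with a minimal sketch. It is not MetDecode's implementation: it assumes a plain coverage-weighted non-negative least-squares fit solved by projected gradient descent, and it omits the modelling of contributors missing from the atlas. The function name `deconvolve`, the tissue labels, the methylation values and the coverage counts are all hypothetical.

```python
def deconvolve(atlas, cfdna, coverage, n_iter=2000):
    """Estimate tissue fractions by coverage-weighted non-negative least squares.

    atlas:    dict mapping tissue name -> methylation ratio per marker region
    cfdna:    observed methylation ratio per marker region
    coverage: read depth per marker region, used here as a weight (assumption)
    """
    tissues = list(atlas)
    n_markers = len(cfdna)
    total_cov = float(sum(coverage))
    w = [c / total_cov for c in coverage]  # weights sum to 1

    # Step size 1 / (2 * trace(R W R^T)) never exceeds 1 / L, where L is the
    # Lipschitz constant of the gradient, so the iteration cannot diverge.
    trace = sum(w[m] * atlas[t][m] ** 2 for t in tissues for m in range(n_markers))
    step = 1.0 / (2.0 * trace)

    frac = {t: 1.0 / len(tissues) for t in tissues}  # uniform starting point
    for _ in range(n_iter):
        pred = [sum(frac[t] * atlas[t][m] for t in tissues) for m in range(n_markers)]
        resid = [pred[m] - cfdna[m] for m in range(n_markers)]
        for t in tissues:
            grad = 2.0 * sum(w[m] * atlas[t][m] * resid[m] for m in range(n_markers))
            # Gradient step followed by projection onto the non-negative orthant.
            frac[t] = max(0.0, frac[t] - step * grad)

    total = sum(frac.values())
    if total == 0.0:
        return frac
    return {t: v / total for t, v in frac.items()}  # report as proportions


# Hypothetical atlas: three tissues, six marker regions.
atlas = {
    "blood":      [0.9, 0.8, 0.1, 0.1, 0.1, 0.2],
    "colorectal": [0.1, 0.2, 0.9, 0.8, 0.1, 0.1],
    "breast":     [0.1, 0.1, 0.2, 0.1, 0.9, 0.8],
}
# Simulated cfDNA profile: 90% blood, 10% colorectal tumour, no breast.
mixture = [0.9 * b + 0.1 * c for b, c in zip(atlas["blood"], atlas["colorectal"])]
coverage = [30, 25, 40, 10, 35, 20]

estimate = deconvolve(atlas, mixture, coverage)
print(estimate)
```

On this noise-free simulated mixture the estimated fractions should approach the simulated ones. Real cfDNA profiles are noisy and may contain tissues missing from the atlas, which is the situation the abstract says MetDecode is designed to handle.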
ABSTRACT
Circulating cell-free DNA (cfDNA) fragments have characteristics that are specific to the cell types that release them. Current methods for cfDNA deconvolution typically use disease-tailored marker selection in a limited number of bulk tissues or cell lines. Here, we use single-cell transcriptome data as a comprehensive cellular reference set for disease-agnostic cfDNA cell-of-origin analysis. We correlate cfDNA-inferred nucleosome spacing with gene expression to rank the relative contribution of over 490 cell types to plasma cfDNA. In 744 healthy individuals and patients, we uncover cell type signatures in support of emerging disease paradigms in oncology and prenatal care. We train predictive models that can differentiate patients with colorectal cancer (84.7%), early-stage breast cancer (90.1%), multiple myeloma (AUC 95.0%), and preeclampsia (88.3%) from matched controls. Importantly, our approach performs well in ultra-low coverage cfDNA datasets and can be readily transferred to diverse clinical settings for the expansion of liquid biopsy.
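The ranking step described above can be sketched as a per-cell-type correlation between nucleosome spacing and expression across genes. This is an illustration only, not the authors' pipeline. It assumes that a stronger negative Pearson correlation (tighter spacing at highly expressed genes) indicates a larger contribution; the abstract does not state the direction, so that ordering is an assumption. The function names, cell type labels and all numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_cell_types(spacing, expression):
    """Rank cell types by correlation between spacing and expression.

    spacing:    cfDNA-inferred nucleosome spacing per gene (bp)
    expression: dict mapping cell type -> expression per gene, same gene order
    Returns (cell_type, r) pairs, most negative correlation first (assumption).
    """
    scores = {ct: pearson(spacing, expr) for ct, expr in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical values for five genes.
spacing = [186, 190, 195, 199, 188]
expression = {
    "neutrophil": [9.1, 7.5, 3.2, 1.0, 8.0],
    "hepatocyte": [2.0, 2.5, 2.2, 2.4, 2.1],
    "placenta":   [1.0, 3.0, 7.0, 9.0, 2.0],
}

ranking = rank_cell_types(spacing, expression)
for cell_type, r in ranking:
    print(f"{cell_type}: r = {r:.3f}")
```

With a single-cell reference covering hundreds of cell types, the same loop would run over every cell type's expression vector, and the resulting ranks or correlations could serve as input features for downstream classifiers.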