ABSTRACT
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear chromosomes and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is of a quality so far achieved for only a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.
Subjects
Fungal Proteins/genetics; Genome, Fungal; Malassezia/genetics; Molecular Sequence Annotation/methods; Proteogenomics/methods; Genes, Fungal; Genome, Mitochondrial; Peptides/genetics; Protein Domains; Sequence Analysis, RNA
ABSTRACT
PURPOSE: We developed a method to monitor copy number variations (CNV) in plasma cell-free DNA (cfDNA) from patients with metastatic squamous non-small cell lung cancer (NSCLC). We aimed to explore the association between tumor-derived cfDNA and clinical outcomes, and sought CNVs that may suggest potential resistance mechanisms. EXPERIMENTAL DESIGN: Sensitivity and specificity of low-pass whole-genome sequencing (LP-WGS) were first determined using cell line DNA and cfDNA. LP-WGS was performed on baseline and longitudinal cfDNA of 152 patients with squamous NSCLC treated with chemotherapy alone or in combination with pictilisib, a pan-PI3K inhibitor. cfDNA tumor fraction and detected CNVs were analyzed in association with clinical outcomes. RESULTS: LP-WGS successfully detected CNVs in cfDNA with tumor fraction ≥10%, which represented approximately 30% of the first-line NSCLC patients in this study. The most frequent CNVs were gains in chromosome 3q, which harbors the PIK3CA and SOX2 oncogenes. The CNV landscape in cfDNA with a high tumor fraction generally matched that of corresponding tumor tissue. Tumor fraction in cfDNA was dynamic during treatment, and increases in tumor fraction and corresponding CNVs could be detected before radiographic progression in 7 of 12 patients. Recurrent CNVs, such as MYC amplification, were enriched in cfDNA from posttreatment samples compared with the baseline, suggesting a potential resistance mechanism to pictilisib. CONCLUSIONS: LP-WGS offers an unbiased and high-throughput way to investigate CNVs and tumor fraction in cfDNA of patients with cancer. It may also be valuable for monitoring treatment response, detecting disease progression early, and identifying emergent clones associated with therapeutic resistance.