ABSTRACT
We present CELL-E 2, a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and vice versa). Protein localization is a challenging problem that requires integrating sequence and image information, which most existing methods ignore. CELL-E 2 extends the work of CELL-E: it not only captures the spatial complexity of protein localization and produces probability estimates of localization atop a nucleus image, but can also generate sequences from images, enabling de novo protein design. We train and finetune CELL-E 2 on two large-scale datasets of human proteins. We also demonstrate how to use CELL-E 2 to create hundreds of novel nuclear localization signals (NLS). Results and interactive demos are featured at https://bohuanglab.github.io/CELL-E_2/.
ABSTRACT
Circulating cell-free DNA (cfDNA) from blood plasma of cancer patients can be used to non-invasively interrogate somatic tumor alterations. Here we develop MSK-ACCESS (Memorial Sloan Kettering - Analysis of Circulating cfDNA to Examine Somatic Status), an NGS assay for detection of very low-frequency somatic alterations in 129 genes. Analytical validation demonstrated 92% sensitivity in de novo mutation calling down to 0.5% allele frequency and 99% for a priori mutation profiling. To evaluate the performance of MSK-ACCESS, we report results from 681 prospective blood samples that underwent clinical analysis to guide patient management. Somatic alterations are detected in 73% of the samples, 56% of which have clinically actionable alterations. The utilization of matched normal sequencing allows retention of somatic alterations while removing over 10,000 germline and clonal hematopoiesis variants. Our experience illustrates the importance of analyzing matched normal samples when interpreting cfDNA results and highlights the importance of cfDNA as a genomic profiling source for cancer patients.