DEEPOMICS FFPE, a deep neural network model, identifies DNA sequencing artifacts from formalin fixed paraffin embedded tissue with high accuracy.

Heo, Dong-Hyuk; Kim, Inyoung; Seo, Heejae; Kim, Seong-Gwang; Kim, Minji; Park, Jiin; Park, Hongsil; Kang, Seungmo; Kim, Juhee; Paik, Soonmyung; Hong, Seong-Eui

Affiliation

Heo DH; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Kim I; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Seo H; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Kim SG; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Kim M; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Park J; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Park H; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Kang S; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Kim J; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Paik S; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea.
Hong SE; Theragen Bio Co., Ltd., Seongnam, Gyeonggi-do, 13488, Republic of Korea. seongeui.hong@theragenbio.com.

Sci Rep; 14(1): 2559, 2024 Jan 31.

Article in English | MEDLINE | ID: mdl-38297116

ABSTRACT

Formalin-fixed, paraffin-embedded (FFPE) tissue specimens are routinely used in pathological diagnosis, but their large number of artifactual mutations complicates the evaluation of companion diagnostics and the analysis of next-generation sequencing data. Identifying variants with low allele frequencies is challenging because existing FFPE filtering tools label all low-frequency variants as artifacts. To address this problem, we aimed to develop DEEPOMICS FFPE, an AI model that can distinguish true variants from artifacts. Paired whole exome sequencing data from fresh frozen and FFPE samples from 24 tumors were obtained from public sources and used as training and validation sets at a ratio of 7:3. A deep neural network model with three hidden layers was trained on input features derived from the outputs of the MuTect2 caller. Contributing features were identified using the SHapley Additive exPlanations algorithm and optimized based on training results. The performance of the final model (DEEPOMICS FFPE) was compared with that of existing tools (MuTect filter, FFPolish, and SOBDetector) using well-defined test datasets. We found 41 discriminating properties for FFPE artifacts. Optimization of property quantification improved the model performance. DEEPOMICS FFPE removed 99.6% of artifacts while maintaining 87.1% of true variants, with an F1-score of 88.3 in the entire dataset not used for training, which is significantly higher than the scores of existing tools. Its performance was maintained even for low-allele-fraction variants, with a specificity of 0.995, suggesting that it can be used to identify subclonal variants. Unlike existing methods, DEEPOMICS FFPE identified most of the sequencing artifacts in the FFPE samples while retaining more of the true variants, including those with low allele frequencies. The newly developed tool DEEPOMICS FFPE may be useful in designing capture panels for personalized circulating tumor DNA assays and in identifying candidate neoepitopes for personalized vaccine design. DEEPOMICS FFPE is freely available on the web ( http://deepomics.co.kr/ffpe ) for research.
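The metrics quoted in the abstract (fraction of artifacts removed, fraction of true variants retained, specificity, F1-score) can be related to one another with a small sketch. This is not the authors' evaluation code. It assumes that true variants are the positive class, so that "artifacts removed" corresponds to specificity and "true variants maintained" to sensitivity. The class `ConfusionCounts`, the helper functions, and the counts are all hypothetical: the counts are chosen only so that the two rates match the reported 99.6% and 87.1%. Because the real ratio of true variants to artifacts is not given in the abstract, the F1-score computed here is not expected to reproduce the reported value.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a variant filter (true variants = positive class)."""
    true_kept: int         # true variants retained by the filter (TP)
    true_removed: int      # true variants wrongly filtered out (FN)
    artifact_kept: int     # artifacts wrongly retained (FP)
    artifact_removed: int  # artifacts correctly filtered out (TN)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true variants that the filter retains."""
    return c.true_kept / (c.true_kept + c.true_removed)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of artifacts that the filter removes."""
    return c.artifact_removed / (c.artifact_removed + c.artifact_kept)


def precision(c: ConfusionCounts) -> float:
    """Fraction of retained calls that are true variants."""
    return c.true_kept / (c.true_kept + c.artifact_kept)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r)


# Hypothetical counts (NOT from the paper): 1,000 true variants and
# 10,000 artifacts, chosen so the rates equal 87.1% and 99.6%.
counts = ConfusionCounts(
    true_kept=871, true_removed=129, artifact_kept=40, artifact_removed=9960
)

print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"precision   = {precision(counts):.3f}")
print(f"F1          = {f1_score(counts):.3f}")
```

Changing the assumed number of artifacts while holding both rates fixed changes precision and therefore F1, which is why an F1-score has to be read together with the class balance of the test set.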

Subjects

Artifacts; Formaldehyde; Paraffin Embedding; Tissue Fixation/methods; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing/methods; Neural Networks, Computer

Full text: 1; Collections: 01-international; Database: MEDLINE; Main subject: Artifacts / Formaldehyde; Type of study: Prognostic_studies; Language: En; Journal: Sci Rep; Publication year: 2024; Document type: Article; Country of publication: United Kingdom