Building Cancer Diagnosis Text to OncoTree Mapping Pipelines for Clinical Sequencing Data Integration and Curation.
AMIA Jt Summits Transl Sci Proc
; 2020: 440-448, 2020.
Article
em En
| MEDLINE
| ID: mdl-32477665
Precision oncology research seeks to derive knowledge from existing data. Current work seeks to integrate clinical and genomic data across cancer centers to enable impactful secondary use. However, integrated data reliability depends on the data curation method used and its systematicity. In practice, data integration and mapping are often done manually even though crucial data such as oncological diagnoses (DX) show varying accuracy and specificity levels. We hypothesized that mapping of text-form cancer DX to a standardized terminology (OncoTree) could be automated using existing methods (e.g. natural language processing (NLP) modules and application programming interfaces [APIs]). We found that our best-performing pipeline prototype was effective but limited by API development limitations (accurately mapped 96.2% of textual DX dataset to NCI Thesaurus (NCIt), 44.2% through NCIt to OncoTree). These results suggest the pipeline model could be viable to automate data curation. Such techniques may become increasingly more reliable with further development.
Texto completo:
1
Base de dados:
MEDLINE
Tipo de estudo:
Diagnostic_studies
Idioma:
En
Ano de publicação:
2020
Tipo de documento:
Article