ABSTRACT
Rare diseases pose significant challenges because they are heterogeneous and poorly understood. This study develops a comprehensive pipeline, interoperable with a document-oriented clinical data warehouse, that integrates cohort characterization, patient clustering and interpretation. Leveraging NLP, semantic similarity, machine learning and visualization, the pipeline enables the identification of prevalent phenotype patterns and patient stratification. To enhance interpretability, the discriminant phenotypes characterizing each cluster are provided. Users can visually test hypotheses by marking patients whose EHRs contain specific keywords such as genes, drugs and procedures. Implemented through a web interface, the pipeline enables clinicians to navigate through different modules, discover intricate patterns and generate interpretable insights that may advance the understanding of rare diseases, guide decision-making, and ultimately improve patient outcomes.
Subjects
Electronic Health Records, Phenotype, Rare Diseases, Humans, Machine Learning, Data Warehousing, Natural Language Processing, Cluster Analysis, User-Computer Interface
ABSTRACT
BACKGROUND: Rare diseases affect approximately 400 million people worldwide, and many of these patients suffer from delayed diagnosis. Among these diseases, NPHP1-related renal ciliopathies need to be diagnosed as early as possible, as potential treatments have recently been investigated with promising results. Our objective was to develop a supervised machine learning pipeline for detecting NPHP1 ciliopathy patients among a large number of nephrology patients using electronic health records (EHRs). METHODS AND RESULTS: We designed a pipeline combining a phenotyping module that re-uses unstructured EHR data, a semantic similarity module to address phenotype dependence, a feature selection step to deal with high dimensionality, an undersampling step to address class imbalance, and a classification step with multiple train-test splits to cope with the small number of rare cases. The pipeline was applied to 30 NPHP1 patients and 7231 controls and achieved good performance (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions. CONCLUSIONS: Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.
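The undersampling step and the reported metrics can be illustrated with a minimal sketch. The abstract does not say which undersampling method or case-to-control ratio was used, so plain random undersampling and the `ratio` parameter below are assumptions, and the function names `undersample` and `sensitivity_specificity` are hypothetical, chosen only for illustration.

```python
import random


def undersample(cases, controls, ratio=1, seed=0):
    """Random undersampling (an assumed method, not confirmed by the abstract).

    Keeps every case and draws at most `ratio` controls per case,
    without replacement.
    """
    rng = random.Random(seed)
    k = min(len(controls), ratio * len(cases))
    return list(cases), rng.sample(list(controls), k)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Cohort sizes taken from the abstract; the identifiers are placeholders.
cases = list(range(30))
controls = list(range(100, 100 + 7231))
kept_cases, kept_controls = undersample(cases, controls, ratio=2, seed=1)
```

Repeating the draw with different `seed` values and evaluating on held-out patients each time is one way to realize multiple train-test splits when rare cases are scarce.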
Subjects
Ciliopathies, Rare Diseases, Humans, Electronic Health Records, Semantics, Supervised Machine Learning, Ciliopathies/diagnosis, Ciliopathies/genetics, Algorithms
ABSTRACT
To identify and compare, based on the sociocultural characteristics of the lives of a group of conventional and organic farmers in Petrópolis, the factors that give rise to their representations of the risk posed by pesticides.