Results 1 - 13 of 13
1.
Sci Rep ; 10(1): 4542, 2020 03 11.
Article in English | MEDLINE | ID: mdl-32161279

ABSTRACT

A major challenge in radiomics is assembling data from multiple centers. Sharing data between hospitals is restricted by legal and ethical regulations. Distributed learning is a technique that enables training models on multicenter data without the data leaving the hospitals ("privacy-preserving" distributed learning). This study tested the feasibility of distributed learning of radiomics data for prediction of two-year overall survival and HPV status in head and neck cancer (HNC) patients. Pretreatment CT images were collected from 1174 HNC patients in 6 different cohorts. 981 radiomic features were extracted using the Z-Rad software implementation. Hierarchical clustering was performed to preselect features. Classification was done using logistic regression. In the validation dataset, the receiver operating characteristics (ROC) were compared between the models trained in the centralized and distributed manner. No difference in ROC was observed with respect to feature selection. The logistic regression coefficients were identical between the methods (absolute difference <10^-7). In comparison of the full workflow (feature selection and classification), no significant difference in ROC was found between centralized and distributed models for both studied endpoints (DeLong p > 0.05). In conclusion, both feature selection and classification are feasible in a distributed manner using radiomics data, which opens new possibilities for training more reliable radiomics models.


Subjects
Data Accuracy; Deep Learning; Head and Neck Neoplasms/mortality; Papillomaviridae/isolation & purification; Papillomavirus Infections/complications; Privacy; Tomography, X-Ray Computed/methods; Head and Neck Neoplasms/diagnostic imaging; Head and Neck Neoplasms/virology; Humans; Image Interpretation, Computer-Assisted; Papillomavirus Infections/virology; Prognosis; ROC Curve; Retrospective Studies; Survival Rate
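
The abstract above reports that logistic regression coefficients trained in a distributed manner matched the centralized ones to within 10^-7. A minimal sketch of why that can hold, assuming plain gradient descent in which each hospital returns only its summed gradient and patient count (the study's actual algorithm is not described in the abstract; the function names and toy data below are hypothetical):

```python
import math


def local_gradient(weights, features, labels):
    """Runs inside one hospital: summed log-loss gradient over its own patients."""
    grad = [0.0] * len(weights)
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad


def fit(sites, n_features, steps=2000, lr=0.5):
    """Central server loop: only gradients and patient counts cross site borders.

    In this simulation `sites` holds the raw rows, but the server logic below
    touches them only through local_gradient() and len().
    """
    n_total = sum(len(labels) for _, labels in sites)
    weights = [0.0] * n_features
    for _ in range(steps):
        total = [0.0] * n_features
        for features, labels in sites:
            site_grad = local_gradient(weights, features, labels)
            total = [t + g for t, g in zip(total, site_grad)]
        weights = [w - lr * t / n_total for w, t in zip(weights, total)]
    return weights


# Hypothetical toy data: each row is [intercept, one radiomic feature].
site_a = ([[1, 0.2], [1, 1.5], [1, -0.7]], [0, 1, 0])
site_b = ([[1, 0.9], [1, -1.2], [1, 0.1], [1, 2.0]], [1, 0, 1, 1])
pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])

distributed = fit([site_a, site_b], n_features=2)
centralized = fit([pooled], n_features=2)
max_abs_diff = max(abs(d - c) for d, c in zip(distributed, centralized))
```

Because the log-loss gradient is a sum over patients, adding per-site gradients gives the same update as computing it on pooled data, so any remaining difference comes only from floating-point summation order.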
2.
Radiother Oncol ; 144: 189-200, 2020 03.
Article in English | MEDLINE | ID: mdl-31911366

ABSTRACT

BACKGROUND AND PURPOSE: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute. MATERIALS AND METHODS: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots. RESULTS: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (located in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, and Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015. CONCLUSION: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimes. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.


Subjects
Lung Neoplasms; Machine Learning; Algorithms; China; Humans; Privacy
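
The abstract above states that only summary statistics and regression coefficients are exchanged with the central server, and that the model was evaluated by RMSE among other measures. As a rough illustration of how an overall RMSE can be obtained from per-institute aggregates alone, here is a sketch with hypothetical function names and toy values (it is not the published MATLAB code):

```python
import math


def local_summary(predicted, observed):
    """Runs inside one institute; returns only (patient count, squared-error sum)."""
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return len(observed), squared_error


def pooled_rmse(summaries):
    """Central server: combines per-site aggregates into one overall RMSE."""
    n_total = sum(n for n, _ in summaries)
    sse_total = sum(sse for _, sse in summaries)
    return math.sqrt(sse_total / n_total)


# Hypothetical predicted survival probabilities and observed outcomes (1 = alive).
site_1 = ([0.8, 0.3, 0.6], [1, 0, 1])
site_2 = ([0.4, 0.9], [0, 1])

rmse = pooled_rmse([local_summary(*site_1), local_summary(*site_2)])
```

The pooled value equals the RMSE that would be computed on the concatenated data, since both the count and the squared-error sum are additive across sites.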
3.
Sci Data ; 6(1): 218, 2019 10 22.
Article in English | MEDLINE | ID: mdl-31641134

ABSTRACT

Prediction modelling with radiomics is a rapidly developing research topic that requires access to vast amounts of imaging data. Methods that work on decentralized data are urgently needed, because of concerns about patient privacy. Previously published computed tomography medical image sets with gross tumour volume (GTV) outlines for non-small cell lung cancer have been updated with extended follow-up. In a previous study, these were referred to as Lung1 (n = 421) and Lung2 (n = 221). The Lung1 dataset is made publicly accessible via The Cancer Imaging Archive (TCIA; https://www.cancerimagingarchive.net ). We performed a decentralized multi-centre study to develop a radiomic signature (hereafter "ZS2019") in one institution and validate its performance in an independent institution without the need for data exchange, and compared this to an analysis in which all data were centralized. The performance of ZS2019 for 2-year overall survival validated in distributed radiomics was not statistically different from the centralized validation (AUC 0.61 vs 0.61; p = 0.52). Although slightly different in terms of data and methods, no statistically significant difference in performance was observed between the new signature and previous work (c-index 0.58 vs 0.65; p = 0.37). Our objective was not the development of a new signature with the best performance, but to suggest an approach for distributed radiomics. Therefore, we used a method similar to that of an earlier study. We foresee that the Lung1 dataset can be further re-used for testing radiomic models and investigating feature reproducibility.


Subjects
Carcinoma, Non-Small-Cell Lung/diagnostic imaging; Lung Neoplasms/diagnostic imaging; Datasets as Topic; Humans; Tomography, X-Ray Computed
4.
Bioinformatics ; 35(20): 4072-4080, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30903692

ABSTRACT

MOTIVATION: In a predictive modeling setting, if sufficient details of the system behavior are known, one can build and use a simulation for making predictions. When sufficient system details are not known, one typically turns to machine learning, which builds a black-box model of the system using a large dataset of input sample features and outputs. We consider a setting which is between these two extremes: some details of the system mechanics are known but not enough for creating simulations that can be used to make high quality predictions. In this context we propose using approximate simulations to build a kernel for use in kernelized machine learning methods, such as support vector machines. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to build the kernel. RESULTS: We demonstrate and explore the simulation-based kernel (SimKern) concept using four synthetic complex systems: three biologically inspired models and one network flow optimization model. We show that, when the number of training samples is small compared to the number of features, the SimKern approach dominates over no-prior-knowledge methods. This approach should be applicable in all disciplines where predictive models are sought and informative yet approximate simulations are available. AVAILABILITY AND IMPLEMENTATION: The Python SimKern software, the demonstration models (in MATLAB, R), and the datasets are available at https://github.com/davidcraft/SimKern. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning; Software; Support Vector Machine
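
The simulation-based kernel idea in the abstract above can be illustrated with a small sketch. It assumes a binary-outcome toy simulator and defines similarity as the fraction of parameter sets under which two samples produce the same outcome; both choices are assumptions made for illustration and are not taken from the SimKern software:

```python
import random


def toy_simulation(sample, params):
    """Hypothetical approximate simulator: binary outcome for one sample."""
    threshold, weight = params
    return 1 if sample[0] + weight * sample[1] > threshold else 0


def simulation_kernel(samples, param_sets, simulate):
    """Similarity = fraction of parameter sets under which two samples agree."""
    outcomes = [[simulate(s, p) for p in param_sets] for s in samples]
    n_runs = len(param_sets)
    return [
        [sum(a == b for a, b in zip(row_i, row_j)) / n_runs for row_j in outcomes]
        for row_i in outcomes
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Each parameter set represents one uncertainty scenario (threshold, weight).
param_sets = [(rng.uniform(-1, 1), rng.uniform(0, 2)) for _ in range(200)]
samples = [(0.1, 0.4), (0.2, 0.5), (-1.5, 0.1), (2.0, 1.0)]

kernel = simulation_kernel(samples, param_sets, toy_simulation)
```

The resulting matrix is symmetric with ones on the diagonal, and it could be handed to any kernelized learner that accepts a precomputed kernel in place of the raw feature vectors.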
6.
Med Phys ; 45(7): 3449-3459, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29763967

ABSTRACT

PURPOSE: Machine learning classification algorithms (classifiers) for prediction of treatment response are becoming more popular in the radiotherapy literature. The general machine learning literature provides evidence in favor of some classifier families (random forest, support vector machine, gradient boosting) in terms of classification performance. The purpose of this study is to compare such classifiers specifically for (chemo)radiotherapy datasets and to estimate their average discriminative performance for radiation treatment outcome prediction. METHODS: We collected 12 datasets (3496 patients) from prior studies on post-(chemo)radiotherapy toxicity, survival, or tumor control with clinical, dosimetric, or blood biomarker features from multiple institutions and for different tumor sites, that is, (non-)small-cell lung cancer, head and neck cancer, and meningioma. Six common classification algorithms with built-in feature selection (decision tree, random forest, neural network, support vector machine, elastic net logistic regression, LogitBoost) were applied on each dataset using the popular open-source R package caret. The R code and documentation for the analysis are available online (https://github.com/timodeist/classifier_selection_code). All classifiers were run on each dataset in a 100-repeated nested fivefold cross-validation with hyperparameter tuning. Performance metrics (AUC, calibration slope and intercept, accuracy, Cohen's kappa, and Brier score) were computed. We ranked classifiers by AUC to determine which classifier is likely to also perform well in future studies. We simulated the benefit for potential investigators to select a certain classifier for a new dataset based on our study (pre-selection based on other datasets) or estimating the best classifier for a dataset (set-specific selection based on information from the new dataset) compared with uninformed classifier selection (random selection). RESULTS: Random forest (best in 6/12 datasets) and elastic net logistic regression (best in 4/12 datasets) showed the overall best discrimination, but there was no single best classifier across datasets. Both classifiers had a median AUC rank of 2. Preselection and set-specific selection yielded a significant average AUC improvement of 0.02 and 0.02 over random selection with an average AUC rank improvement of 0.42 and 0.66, respectively. CONCLUSION: Random forest and elastic net logistic regression yield higher discriminative performance in (chemo)radiotherapy outcome and toxicity prediction than other studied classifiers. Thus, one of these two classifiers should be the first choice for investigators when building classification models or to benchmark one's own modeling results against. Our results also show that an informed preselection of classifiers based on existing datasets can improve discrimination over random selection.


Subjects
Chemoradiotherapy/methods; Machine Learning; Neoplasms/diagnosis; Neoplasms/radiotherapy; Area Under Curve; Chemoradiotherapy/adverse effects; Decision Trees; Humans; Logistic Models; Neoplasms/mortality; Neural Networks, Computer; Prognosis; Software
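
The abstract above ranks classifiers by AUC within each dataset and reports a median AUC rank per classifier. The bookkeeping behind such a ranking can be sketched as follows; the AUC values are invented purely for illustration and do not come from the study, and ties are not handled:

```python
from statistics import median


def ranks_by_auc(aucs):
    """Rank classifiers within one dataset: 1 = highest AUC (ties not handled)."""
    ordered = sorted(aucs, key=aucs.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical cross-validated AUCs per dataset and classifier.
auc_table = {
    "dataset_1": {"random_forest": 0.71, "elastic_net": 0.69, "svm": 0.64},
    "dataset_2": {"random_forest": 0.62, "elastic_net": 0.66, "svm": 0.60},
    "dataset_3": {"random_forest": 0.75, "elastic_net": 0.70, "svm": 0.72},
}

per_dataset_ranks = [ranks_by_auc(aucs) for aucs in auc_table.values()]
classifiers = list(auc_table["dataset_1"])
median_rank = {
    name: median(ranks[name] for ranks in per_dataset_ranks) for name in classifiers
}
```

Ranking within each dataset before aggregating keeps datasets with generally high or low AUCs from dominating the comparison.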
7.
Nat Rev Clin Oncol ; 14(12): 749-762, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28975929

ABSTRACT

Radiomics, the high-throughput mining of quantitative image features from standard-of-care medical imaging that enables data to be extracted and applied within clinical-decision support systems to improve diagnostic, prognostic, and predictive accuracy, is gaining importance in cancer research. Radiomic analysis exploits sophisticated image analysis tools and the rapid development and validation of medical imaging data that uses image-based signatures for precision diagnosis and treatment, providing a powerful tool in modern medicine. Herein, we describe the process of radiomics, its pitfalls, challenges, opportunities, and its capacity to improve clinical decision making, emphasizing the utility for patients with cancer. Currently, the field of radiomics lacks standardized evaluation of both the scientific integrity and the clinical relevance of the numerous published radiomics investigations resulting from the rapid growth of this area. Rigorous evaluation criteria and reporting guidelines need to be established in order for radiomics to mature as a discipline. Herein, we provide guidance for investigations to meet this urgent need in the field of radiomics.


Subjects
Data Mining/methods; Decision Support Techniques; Diagnostic Imaging/methods; Neoplasms/diagnostic imaging; Neoplasms/therapy; Precision Medicine/methods; Clinical Decision-Making; Diffusion of Innovation; Humans; Neoplasms/pathology; Patient-Specific Modeling; Predictive Value of Tests; Prognosis
8.
Int J Radiat Oncol Biol Phys ; 99(2): 344-352, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28871984

ABSTRACT

PURPOSE: Tools for survival prediction for non-small cell lung cancer (NSCLC) patients treated with chemoradiation or radiation therapy are of limited quality. In this work, we developed a predictive model of survival at 2 years. The model is based on a large volume of historical patient data and serves as a proof of concept to demonstrate the distributed learning approach. METHODS AND MATERIALS: Clinical data from 698 lung cancer patients, treated with curative intent with chemoradiation or radiation therapy alone, were collected and stored at 2 different cancer institutes (559 patients at Maastro clinic [Netherlands] and 139 at Michigan university [United States]). The model was further validated on 196 patients originating from The Christie (United Kingdom). A Bayesian network model was adapted for distributed learning (the animation can be viewed at https://www.youtube.com/watch?v=ZDJFOxpwqEA). Two-year posttreatment survival was chosen as the endpoint. The Maastro clinic cohort data are publicly available at https://www.cancerdata.org/publication/developing-and-validating-survival-prediction-model-nsclc-patients-through-distributed, and the developed models can be found at www.predictcancer.org. RESULTS: Variables included in the final model were T and N category, age, performance status, and total tumor dose. The model has an area under the curve (AUC) of 0.66 on the external validation set and an AUC of 0.62 on a 5-fold cross validation. A model based on the T and N category performed with an AUC of 0.47 on the validation set, significantly worse than our model (P<.001). Learning the model in a centralized or distributed fashion yields a minor difference in the probabilities of the conditional probability tables (0.6%); the discriminative performance of the models on the validation set is similar (P=.26). CONCLUSIONS: Distributed learning from federated databases allows learning of predictive models on data originating from multiple institutions while avoiding many of the data-sharing barriers. We believe that distributed learning is the future of sharing data in health care.


Subjects
Carcinoma, Non-Small-Cell Lung/mortality; Carcinoma, Non-Small-Cell Lung/therapy; Learning; Lung Neoplasms/mortality; Lung Neoplasms/therapy; Age Factors; Aged; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Area Under Curve; Bayes Theorem; Chemoradiotherapy/mortality; Cohort Studies; Databases, Factual/statistics & numerical data; Female; Forecasting/methods; Humans; Kaplan-Meier Estimate; Lymph Nodes/pathology; Male; Models, Statistical; Neoplasm Staging/standards; Radiotherapy, Conformal/mortality; Severity of Illness Index; Time Factors
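
The abstract above mentions conditional probability tables of a Bayesian network learned from data held at separate institutes. One simple way such a table could be estimated without moving patient records is for each site to send only counts, as sketched below. This is a simplification with hypothetical names and toy records, not the authors' procedure; the abstract reports a 0.6% difference from centralized learning, so their method evidently differs from this exact-count approach:

```python
from collections import Counter


def local_counts(records, child, parents):
    """Runs inside one institute: counts (parent values, child value) pairs."""
    return Counter((tuple(r[p] for p in parents), r[child]) for r in records)


def merge_cpt(count_tables):
    """Central server: sums site counts and normalizes per parent configuration."""
    total = Counter()
    for counts in count_tables:
        total.update(counts)
    parent_totals = Counter()
    for (parent_values, _), n in total.items():
        parent_totals[parent_values] += n
    return {
        (parent_values, value): n / parent_totals[parent_values]
        for (parent_values, value), n in total.items()
    }


# Hypothetical records: T category and two-year survival (1 = alive).
site_1 = [
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T1", "alive_2y": 0},
    {"t_stage": "T3", "alive_2y": 0},
]
site_2 = [
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T3", "alive_2y": 0},
    {"t_stage": "T3", "alive_2y": 1},
]

cpt = merge_cpt([local_counts(s, "alive_2y", ["t_stage"]) for s in (site_1, site_2)])
central = merge_cpt([local_counts(site_1 + site_2, "alive_2y", ["t_stage"])])
```

Here `cpt[(("T1",), 1)]` is the estimated probability of two-year survival given T1, and it matches the value computed on the pooled records because counts add across sites.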
9.
Adv Drug Deliv Rev ; 109: 131-153, 2017 01 15.
Article in English | MEDLINE | ID: mdl-26774327

ABSTRACT

A paradigm shift from current population-based medicine to personalized and participative medicine is underway. This transition is being supported by the development of clinical decision support systems based on prediction models of treatment outcome. In radiation oncology, these models 'learn' using advanced and innovative information technologies (ideally in a distributed fashion - please watch the animation: http://youtu.be/ZDJFOxpwqEA) from all available/appropriate medical data (clinical, treatment, imaging, biological/genetic, etc.) to achieve the highest possible accuracy with respect to prediction of tumor response and normal tissue toxicity. In this position paper, we deliver an overview of the factors that are associated with outcome in radiation oncology and discuss the methodology behind the development of accurate prediction models, which is a multi-faceted process. Subsequent to initial development/validation and clinical introduction, decision support systems should be constantly re-evaluated (through quality assurance procedures) in different patient datasets in order to refine and re-optimize the models, ensuring the continuous utility of the models. In the reasonably near future, decision support systems will be fully integrated within the clinic, with data and knowledge being shared in a standardized, dynamic, and potentially global manner enabling truly personalized and participative medicine.


Subjects
Decision Support Systems, Clinical; Neoplasms/radiotherapy; Precision Medicine/methods; Radiation Oncology/methods; Humans; Neoplasms/diagnosis; Treatment Outcome
10.
Clin Transl Radiat Oncol ; 4: 24-31, 2017 Jun.
Article in English | MEDLINE | ID: mdl-29594204

ABSTRACT

Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future 'big data' infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.

11.
Radiother Oncol ; 121(3): 459-467, 2016 12.
Article in English | MEDLINE | ID: mdl-28029405

ABSTRACT

PURPOSE: One of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital. PATIENTS AND METHODS: Clinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone, were collected from and stored in 5 different medical institutes (123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German) and 58 at Eindhoven (Netherlands, Dutch)). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer. RESULTS: We show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95%CI, 0.51-0.70) on a 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets. CONCLUSION: Distributed learning can allow the learning of predictive models on data originating from multiple hospitals while avoiding many of the data sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while being compliant with the various national and European privacy laws.


Subjects
Data Mining/methods; Information Dissemination/methods; Lung Neoplasms/radiotherapy; Bayes Theorem; Confidentiality; Data Mining/ethics; Dyspnea/etiology; Europe; Female; Humans; Information Dissemination/ethics; Male; Models, Theoretical; Precision Medicine/methods; Predictive Value of Tests; ROC Curve; Radiation Injuries/etiology; Radiotherapy/adverse effects
12.
Mol Ecol Resour ; 16(2): 540-8, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26417651

ABSTRACT

Geography and landscape are important determinants of genetic variation in natural populations, and several ancestry estimation methods have been proposed to investigate population structure using genetic and geographic data simultaneously. Those approaches are often based on computer-intensive stochastic simulations and do not scale with the dimensions of the data sets generated by high-throughput sequencing technologies. There is a growing demand for faster algorithms able to analyse genomewide patterns of population genetic variation in their geographic context. In this study, we present TESS3, a major update of the spatial ancestry estimation program TESS. By combining matrix factorization and spatial statistical methods, TESS3 provides estimates of ancestry coefficients with accuracy comparable to TESS and with run-times much faster than the Bayesian version. In addition, the TESS3 program can be used to perform genome scans for selection, and separate adaptive from nonadaptive genetic variation using ancestral allele frequency differentiation tests. The main features of TESS3 are illustrated using simulated data and analysing genomic data from European lines of the plant species Arabidopsis thaliana.


Subjects
Computational Biology/methods; Genetic Variation; Genetics, Population/methods; Phylogeography/methods; Arabidopsis/classification; Arabidopsis/genetics; Europe; Genome, Plant
13.
Acta Oncol ; 54(9): 1289-300, 2015.
Article in English | MEDLINE | ID: mdl-26395528

ABSTRACT

BACKGROUND: Trials are vital in informing routine clinical care; however, current designs have major deficiencies. This manuscript provides an overview of the various challenges that face modern clinical research and the methods that can be exploited to solve these challenges, in the context of personalised cancer treatment in the 21st century. AIM: The purpose of this manuscript, without intending to be comprehensive, is to spark thought whilst presenting and discussing two important and complementary alternatives to traditional evidence-based medicine, specifically rapid learning health care and cohort multiple randomised controlled trial design. Rapid learning health care is an approach that proposes to extract and apply knowledge from routine clinical care data rather than exclusively depending on clinical trial evidence (please watch the animation: http://youtu.be/ZDJFOxpwqEA). The cohort multiple randomised controlled trial design is a pragmatic method which has been proposed to help overcome the weaknesses of conventional randomised trials, taking advantage of the standardised follow-up approaches increasingly used in routine patient care. This approach is particularly useful when the new intervention is a priori attractive for the patient (e.g. proton therapy, patient decision aids or expensive medications), when the outcomes are easily collected, and when there is no need for a placebo arm. DISCUSSION: Truly personalised cancer treatment is the goal in modern radiotherapy. However, personalised cancer treatment is also an immense challenge. The vast variety of both cancer patients and treatment options makes it extremely difficult to determine which decisions are optimal for the individual patient. Nevertheless, rapid learning health care and cohort multiple randomised controlled trial design are two approaches (among others) that can help meet this challenge.


Subjects
Evidence-Based Medicine/methods; Neoplasms/radiotherapy; Precision Medicine/methods; Randomized Controlled Trials as Topic; Humans