1.
Lab Invest; 104(6): 102049, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38513977

ABSTRACT

Although pathological tissue analysis is typically performed on single 2-dimensional (2D) histologic reference slides, 3-dimensional (3D) reconstruction from a sequence of histologic sections could provide novel opportunities for spatial analysis of the extracted tissue. In this review, we analyze recent works published after 2018 and report information on the extracted tissue types, the section thickness, and the number of sections used for reconstruction. By analyzing the technological requirements for 3D reconstruction, we observe that both free and commercial software tools include the functionality to perform 3D reconstruction from a sequence of histologic images. Through the analysis of the most recent works, we provide an overview of the workflows and tools currently used for 3D reconstruction from histologic sections and identify points for future work, such as the lack of a common file format and computer-aided analysis of the reconstructed model.


Subject(s)
Imaging, Three-Dimensional , Imaging, Three-Dimensional/methods , Humans , Software , Animals
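
As an editorial aside, the registration-and-stacking workflow surveyed above can be sketched with the open-source SimpleITK library: consecutive sections are rigidly aligned pairwise and then joined into a volume. This is a minimal sketch under assumed inputs (grayscale, pre-cropped section images with placeholder file names and spacing), not a tool from the review.

    import SimpleITK as sitk

    def register_pair(fixed, moving):
        """Rigidly align one section to its predecessor (mutual information metric)."""
        reg = sitk.ImageRegistrationMethod()
        reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
        reg.SetOptimizerAsRegularStepGradientDescent(
            learningRate=1.0, minStep=1e-4, numberOfIterations=200)
        reg.SetInitialTransform(sitk.CenteredTransformInitializer(
            fixed, moving, sitk.Euler2DTransform(),
            sitk.CenteredTransformInitializerFilter.GEOMETRY))
        reg.SetInterpolator(sitk.sitkLinear)
        transform = reg.Execute(fixed, moving)
        return sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)

    # Hypothetical file names; sections are assumed grayscale and pre-cropped.
    sections = [sitk.ReadImage(f"section_{i:03d}.png", sitk.sitkFloat32)
                for i in range(40)]
    aligned = [sections[0]]
    for moving in sections[1:]:
        aligned.append(register_pair(aligned[-1], moving))

    volume = sitk.JoinSeries(aligned)    # stack the aligned 2D sections into a 3D image
    volume.SetSpacing((1.0, 1.0, 5.0))   # in-plane and section spacing (placeholder values)
    sitk.WriteImage(volume, "reconstruction.nrrd")
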
2.
J Med Internet Res; 25: e42621, 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37436815

ABSTRACT

BACKGROUND: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot easily be shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, implementing FL is time-consuming and requires advanced programming skills and complex technical infrastructure. OBJECTIVE: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus on a single application case or method, meaning that existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that is extendable and allows users (eg, researchers) without programming knowledge to apply FL, and no central FL platform serves both FL algorithm developers and users. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. METHODS: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets, measuring both accuracy and runtime. RESULTS: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To protect sensitive raw data, FeatureCloud supports privacy-enhancing technologies that secure the shared local models and assures high standards in data privacy to comply with the General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud produce results highly similar to centralized approaches and scale well with an increasing number of participating sites. CONCLUSIONS: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.


Subject(s)
Algorithms , Artificial Intelligence , Humans , Health Occupations , Software , Computer Communication Networks , Privacy
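
To make the federated idea behind platforms such as FeatureCloud concrete, below is a minimal federated-averaging (FedAvg) sketch on synthetic data. It is not the FeatureCloud API; all names and data are illustrative. Each site trains locally, and only model weights - never raw data - leave the site.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_sgd(w, X, y, lr=0.1, epochs=5):
        """A few epochs of logistic-regression gradient descent on one site's private data."""
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
            w -= lr * X.T @ (p - y) / len(y)   # gradient step
        return w

    # Three sites, each holding private data (synthetic stand-ins here).
    sites = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100)) for _ in range(3)]

    w_global = np.zeros(5)
    for _ in range(20):                        # communication rounds
        local_weights = [local_sgd(w_global.copy(), X, y) for X, y in sites]
        sizes = [len(y) for _, y in sites]
        # FedAvg: the server averages local models, weighted by sample counts.
        w_global = np.average(local_weights, axis=0, weights=sizes)
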
3.
Mod Pathol; 35(12): 1759-1769, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36088478

ABSTRACT

Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations on compiling test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.


Subject(s)
Artificial Intelligence , Pathology , Humans , Forecasting , Datasets as Topic
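
One of the recommendations discussed above - ensuring that low-prevalence subsets are represented in a test dataset - can be operationalized with stratified sampling. A minimal sketch with scikit-learn; the strata definition and counts are hypothetical, not from the paper.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Combine diagnosis and scanner into one stratum label per case so that
    # rare combinations remain represented in the held-out test set.
    cases = np.arange(1000)
    strata = np.array([f"dx{i % 4}|scanner{i % 3}" for i in cases])

    dev_cases, test_cases = train_test_split(
        cases, test_size=0.2, stratify=strata, random_state=42)

    # Report per-stratum prevalence in the test set to surface potential bias.
    values, counts = np.unique(strata[test_cases], return_counts=True)
    for v, c in zip(values, counts):
        print(f"{v}: {c} cases ({c / len(test_cases):.1%})")
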
5.
BMC Bioinformatics; 15 Suppl 6: S5, 2014.
Article in English | MEDLINE | ID: mdl-25079119

ABSTRACT

BACKGROUND: This paper presents multilevel data glyphs optimized for the interactive knowledge discovery and visualization of large biomedical data sets. Data glyphs are three-dimensional objects defined by multiple levels of geometric descriptions (levels of detail) combined with a mapping of data attributes to graphical elements and methods that specify their spatial position. METHODS: In the data mapping phase, which is performed by a biomedical expert, metainformation about the data attributes (scale, number of distinct values) is compared with the visual capabilities of the graphical elements to give the user feedback about the correctness of the variable mapping. The spatial arrangement of glyphs is done in a dimetric view, which leads to high data density, simplifies 3D navigation, and avoids perspective distortion. RESULTS: We show the usage of data glyphs in the disease analyser, a visual analytics application for personalized medicine, and provide an outlook on a biomedical web visualization scenario. CONCLUSIONS: Data glyphs can be successfully applied in the disease analyser for the analysis of big medical data sets. In particular, the automatic validation of the data mapping, the selection of subgroups within histograms, and the visual comparison of value distributions were seen by experts as important functionality.


Subject(s)
Medical Informatics/methods , Data Mining , Humans , Internet , Medical Informatics/instrumentation
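
The automatic validation of the variable mapping described above amounts to a compatibility check between attribute metadata and the capacity of a visual channel. A hypothetical fragment illustrating the idea - the channel limits are illustrative, not the authors' implementation:

    # Roughly how many distinct values each visual channel can convey, and
    # which measurement scales it can encode (illustrative limits).
    CHANNEL_CAPACITY = {"color_hue": 8, "shape": 6, "size": 20, "position": 1000}
    CHANNEL_SCALES = {"color_hue": {"nominal"}, "shape": {"nominal"},
                      "size": {"ordinal", "ratio"}, "position": {"ordinal", "ratio"}}

    def validate_mapping(attribute, channel):
        """Return feedback on whether a data attribute fits a glyph's visual channel."""
        problems = []
        if attribute["scale"] not in CHANNEL_SCALES[channel]:
            problems.append(f"{channel} cannot encode a {attribute['scale']} attribute")
        if attribute["distinct_values"] > CHANNEL_CAPACITY[channel]:
            problems.append(f"{attribute['distinct_values']} values exceed "
                            f"{channel} capacity ({CHANNEL_CAPACITY[channel]})")
        return problems or ["mapping ok"]

    print(validate_mapping({"scale": "nominal", "distinct_values": 12}, "color_hue"))
    # -> ['12 values exceed color_hue capacity (8)']
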
6.
Biopreserv Biobank; 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38497765

ABSTRACT

Introduction: The Minimum Information About BIobank Data Sharing (MIABIS) is a biobank-specific terminology enabling the sharing of biobank-related data for different purposes across a wide range of database implementations. After 4 years in use, and with the first version of the individual-level MIABIS components Sample, Sample Donor, and Event available, it was necessary to revise the terminology, especially to include biobanks that work more in the data domain than with samples. Materials & Methods: Nine use cases representing different types of biobanks, studies, and networks participated in the development work. They represent types of data, specific sample types, or levels of organization that were not previously included in MIABIS. To support our revision of the Biobank entity, we conducted a survey of European biobanks to chart the services they provide. Researchers, as the main users of biobanks, are an important stakeholder group; to make MIABIS more researcher-friendly, we collected different sample/data requests and analyzed the required terminology adjustments in detail. During the update process, the Core terminology was iteratively reviewed by a large group of experts until a consensus was reached. Results: With this update, MIABIS was adjusted to encompass data-driven biobanks and to include data collections, while also describing the services and capabilities biobanks offer to their users beyond retrospective samples. The terminology was also extended to accommodate sample and data collections of nonhuman origin. Additionally, a set of organizational attributes was compiled to describe networks. Discussion: The usability of MIABIS Core v3 was increased by extending it to cover more topics of the biobanking domain. Additionally, the focus was on a more general terminology and on harmonizing attributes with the individual-level entities Sample, Sample Donor, and Event to keep the overall terminology minimal. With this work, the internal semantics of the MIABIS terminology were improved.
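
For readers unfamiliar with the individual-level MIABIS entities mentioned above (Sample, Sample Donor, Event), a sketch of how they might be represented in code; the attributes shown are hypothetical simplifications, not the normative MIABIS attribute list.

    from dataclasses import dataclass, field

    @dataclass
    class SampleDonor:
        donor_id: str
        sex: str              # hypothetical simplification of the MIABIS attributes
        birth_year: int

    @dataclass
    class Event:
        event_id: str
        event_type: str       # e.g., "Sample collection" or "Diagnosis"
        date: str             # ISO 8601 date

    @dataclass
    class Sample:
        sample_id: str
        material_type: str    # e.g., "Plasma" or "DNA"
        donor: SampleDonor
        events: list[Event] = field(default_factory=list)

    donor = SampleDonor("D-001", "female", 1980)
    sample = Sample("S-0001", "Plasma", donor,
                    [Event("E-1", "Sample collection", "2021-06-01")])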

7.
Front Med (Lausanne); 11: 1365501, 2024.
Article in English | MEDLINE | ID: mdl-38813389

ABSTRACT

The emerging European Health Data Space (EHDS) Regulation opens new prospects for large-scale sharing and re-use of health data. Yet, the proposed regulation suffers from two important limitations: it is designed to benefit the whole population with limited consideration for individuals, and the generation of secondary datasets from heterogeneous, unlinked patient data will remain burdensome. AIDAVA, a Horizon Europe project that started in September 2022, proposes to address both shortcomings by providing patients with an AI-based virtual assistant that maximises automation in the integration and transformation of their health data into an interoperable, longitudinal health record. This personal record can then be used to inform patient-related decisions at the point of care, whether this is the usual point of care or a possible cross-border point of care. The personal record can also be used to generate population datasets for research and policymaking. The proposed solution will enable a much-needed paradigm shift in health data management, implementing a 'curate once at patient level, use many times' approach, primarily for the benefit of patients and their care providers, but also for more efficient generation of high-quality secondary datasets. After 15 months, the project shows promising preliminary results in achieving automation in the integration and transformation of heterogeneous data of each individual patient, once the content of the data sources managed by the data holders has been formally described. Additionally, the conceptualization phase of the project identified a set of recommendations for the development of a patient-centric EHDS, significantly facilitating the generation of data for secondary use.

8.
Sci Data; 11(1): 464, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38719839

ABSTRACT

Improving patient care and advancing scientific discovery requires responsible sharing of research data, healthcare records, biosamples, and biomedical resources that must also respect applicable use conditions. Defining a standard to structure and manage these use conditions is a complex and challenging task. This is exemplified by a nearly unlimited range of asset types, high variability of applicable conditions, and differing applications at the individual or collective level. Furthermore, the specifics and granularity required are likely to vary depending on the ultimate context of use. All these factors confound the alignment of institutional missions, funding objectives, and regulatory and technical requirements needed to facilitate effective sharing. The presented work highlights the complexity and diversity of the problem, reviews the current state of the art, and emphasises the need for a flexible and adaptable approach. We propose Digital Use Conditions (DUC) as a framework that addresses these needs by leveraging existing standards, striking a balance between expressiveness and ambiguity, and considering the breadth of applicable information together with its context of use.


Subject(s)
Information Dissemination , Humans
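
To make the idea of machine-readable use conditions concrete, a sketch of what a DUC-style record could look like; the field names and vocabulary are hypothetical, and the paper itself defines the authoritative profile.

    # A hypothetical use-condition record in the spirit of DUC: each condition
    # pairs a standardized term with a rule type and its scope of application.
    use_conditions = {
        "asset": "biosample-collection-42",        # placeholder identifier
        "conditions": [
            {"term": "disease-specific research",  # e.g., from a DUO-like vocabulary
             "rule": "permitted",
             "scope": "colorectal cancer research only"},
            {"term": "commercial use",
             "rule": "forbidden",
             "scope": "entire collection"},
            {"term": "ethics approval",
             "rule": "required",
             "scope": "per project"},
        ],
    }

    # Conditions stay attached to the asset and can be queried by rule type.
    forbidden = [c["term"] for c in use_conditions["conditions"]
                 if c["rule"] == "forbidden"]
    print(forbidden)  # -> ['commercial use']
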
9.
Learn Health Syst; 8(1): e10365, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38249839

ABSTRACT

Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived from them, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential for assessing the validity of research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, information on the provenance of samples and data is mostly sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually only provided within the organization and not in an interoperable form. At the same time, the collection and sharing of biological and environmental specimens increasingly require the definition and documentation of benefit sharing and compliance with regulatory requirements, rather than consideration of purely scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy, machine-actionable documentation of the lineage of data and specimens. We invite experts from the biotechnology and biomedical fields to further contribute to the standard.

10.
N Biotechnol; 74: 16-24, 2023 May 25.
Article in English | MEDLINE | ID: mdl-36754147

ABSTRACT

Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life Below Water, and Life on Land (protecting, restoring, and promoting the sustainable use of terrestrial ecosystems, sustainably managing forests, combating desertification, and halting and reversing land degradation and biodiversity loss). AI is ubiquitous in the life sciences today. Topics span machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, temporal and spatial representation and inference, and methodological aspects of explainable AI (XAI), with applications in biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can use it directly as a guideline for developing their papers.


Subject(s)
Artificial Intelligence , Ecosystem , Biotechnology , Data Mining , Knowledge Bases
11.
Open Res Eur; 3: 28, 2023.
Article in English | MEDLINE | ID: mdl-37645511

ABSTRACT

The exposome, defined as the composite of every exposure to which an individual is subjected from conception to death, is a complex scientific field that has enjoyed consistent growth over the last two decades. The study of the exposome requires consideration of both the nature of those exposures and their changes over time, and as such necessitates high-quality data and software solutions. As the exposome is both a broad and a recent concept, it is challenging to define or introduce in a structured way. Thus, an approach that assists with clear definitions and a structured framework is needed for wider scientific and public communication. Results: A set of 14 personas was developed through three focus groups and a series of 14 semi-structured interviews. The focus groups defined the broad themes specific to exposome research, while the sub-themes emerged to saturation via the interview process. Personas are imaginary individuals that represent segments/groups of real people within a population. Within the context of the HEAP project, the created personas represented both exposome data generators and users. Conclusion: Personas have been implemented successfully in computer science, improving the understanding of human-computer interaction. The creation of personas specific to exposome research adds a useful tool supporting education and outreach activities for a complex scientific field.

12.
N Biotechnol; 78: 22-28, 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-37758054

ABSTRACT

AI development in biotechnology relies on high-quality data to train and validate algorithms. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) and regulatory frameworks such as the In Vitro Diagnostic Regulation (IVDR) and the Medical Device Regulation (MDR) specify requirements on specimen and data provenance to ensure the quality and traceability of data used in AI development. In this paper, a framework is presented for recording and publishing provenance information to meet these requirements. The framework is based on standardized models and protocols, such as the W3C PROV model and the ISO 23494 series, to capture and record provenance information at the various stages of data generation and analysis. Finally, the principles of the framework are illustrated in a simple computational pathology use case, showing how specimen and data provenance can be used in the development and documentation of an AI algorithm and, more generally, how provenance information supports the development of high-quality AI algorithms in biotechnology. The use case demonstrates the importance of managing and integrating distributed provenance information and highlights the complex task of considering factors such as semantic interoperability, confidentiality, and the verification of authenticity and integrity.


Subject(s)
Algorithms , Biotechnology , Artificial Intelligence
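
As a concrete illustration of the W3C PROV model referenced above, a minimal sketch using the open-source Python 'prov' package; the identifiers are placeholders, and the example does not reproduce the paper's ISO 23494-based framework.

    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("ex", "https://example.org/provenance/")

    # Entities: a tissue specimen, its digitized slide, and the trained model.
    specimen = doc.entity("ex:specimen-123")
    wsi = doc.entity("ex:wsi-123")
    model = doc.entity("ex:ai-model-v1")

    # Activities transform one entity into the next.
    scanning = doc.activity("ex:slide-scanning")
    training = doc.activity("ex:model-training")

    doc.used(scanning, specimen)
    doc.wasGeneratedBy(wsi, scanning)
    doc.wasDerivedFrom(wsi, specimen)
    doc.used(training, wsi)
    doc.wasGeneratedBy(model, training)

    print(doc.get_provn())   # human-readable PROV-N serialization
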
13.
Eur J Radiol; 165: 110931, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37399666

ABSTRACT

PURPOSE: To investigate whether CT texture analysis allows differentiation between adenocarcinomas, squamous cell carcinomas, carcinoids, small cell lung cancers, and organizing pneumonia, as well as between carcinomas and neuroendocrine tumors. METHOD: This retrospective study included 133 patients (30 with organizing pneumonia, 30 with adenocarcinoma, 30 with squamous cell carcinoma, 23 with small cell lung cancer, and 20 with carcinoid) who underwent CT-guided biopsy of the lung and had a corresponding histopathologic diagnosis. Pulmonary lesions were segmented in three dimensions in consensus by two radiologists, with and without a threshold of -50 HU. Groupwise comparisons were performed to assess differences between all five above-listed entities and between carcinomas and neuroendocrine tumors. RESULTS: Pairwise comparisons of the five entities revealed 53 statistically significant texture features when using no HU threshold and 6 statistically significant features with a threshold of -50 HU. The largest AUC (0.818 [95% CI 0.706-0.930]) was found for the feature wavelet-HHH_glszm_SmallAreaEmphasis for discrimination of carcinoid from the other entities when using no HU threshold. In differentiating neuroendocrine tumors from carcinomas, 173 parameters proved statistically significant when using no HU threshold versus 52 parameters when using a -50 HU threshold. The largest AUC (0.810 [95% CI 0.728-0.893]) was found for the parameter original_glcm_Correlation for discrimination of neuroendocrine tumors from carcinomas when using no HU threshold. CONCLUSIONS: CT texture analysis revealed features that differed significantly between malignant pulmonary lesions and organizing pneumonia and between carcinomas and neuroendocrine tumors of the lung. Applying an HU threshold for segmentation substantially influenced the results of the texture analysis.


Subject(s)
Adenocarcinoma , Carcinoid Tumor , Carcinoma, Neuroendocrine , Carcinoma, Squamous Cell , Lung Neoplasms , Neuroendocrine Tumors , Organizing Pneumonia , Pneumonia , Humans , Neuroendocrine Tumors/diagnostic imaging , Neuroendocrine Tumors/pathology , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Retrospective Studies , Lung/pathology , Adenocarcinoma/pathology , Carcinoid Tumor/pathology , Carcinoma, Squamous Cell/pathology , Tomography, X-Ray Computed/methods , Carcinoma, Neuroendocrine/pathology , Cell Differentiation
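
The feature names reported above (wavelet-HHH_glszm_SmallAreaEmphasis, original_glcm_Correlation) follow the PyRadiomics naming scheme, so the extraction step can be sketched with that open-source package. File paths are placeholders, and the resegmentation window is an assumption standing in for the study's -50 HU threshold.

    from radiomics import featureextractor

    # resegmentRange drops voxels outside the HU window before feature
    # computation; the upper bound here is an assumed placeholder.
    extractor = featureextractor.RadiomicsFeatureExtractor(
        resegmentRange=[-50, 3000])
    extractor.enableImageTypeByName("Wavelet")   # adds wavelet-filtered features

    # Hypothetical paths to the CT volume and the lesion segmentation mask.
    features = extractor.execute("ct_volume.nrrd", "lesion_mask.nrrd")

    print(features["wavelet-HHH_glszm_SmallAreaEmphasis"])
    print(features["original_glcm_Correlation"])
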
14.
J Pathol Clin Res; 9(4): 251-260, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37045794

ABSTRACT

The current move towards digital pathology enables pathologists to use artificial intelligence (AI)-based computer programmes for the advanced analysis of whole slide images. However, the best-performing AI algorithms for image analysis are currently deemed black boxes, since it often remains unclear - even to their developers - why an algorithm delivered a particular result. Especially in medicine, a better understanding of algorithmic decisions is essential to avoid mistakes and adverse effects on patients. This review article aims to provide medical experts with insights into the issue of explainability in digital pathology. A short introduction to the relevant core concepts of machine learning nurtures the reader's understanding of why explainability is a particular issue in this field. To address it, the rapidly evolving research field of explainable AI (XAI) has developed many techniques and methods to make black-box machine-learning systems more transparent. These XAI methods are a first step towards making black-box AI systems understandable by humans. However, we argue that an explanation interface must complement these explainable models to make their results useful to human stakeholders and to achieve a high level of causability, i.e. a high level of causal understanding by the user. This is especially relevant in the medical field, since explainability and causability also play a crucial role in compliance with regulatory requirements. We conclude by promoting the need for novel user interfaces for AI applications in pathology, which enable contextual understanding and allow the medical expert to ask interactive 'what-if' questions. In pathology, such user interfaces will not only be important for achieving a high level of causability; they will also be crucial for keeping the human in the loop and bringing medical experts' experience and conceptual knowledge into AI processes.


Subject(s)
Artificial Intelligence , Pathologists , Humans , Algorithms , Image Processing, Computer-Assisted
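
One of the simplest XAI techniques alluded to above is occlusion sensitivity: mask image regions one at a time and observe how the model's output changes. A model-agnostic sketch; the predict function is a stand-in for any trained classifier.

    import numpy as np

    def occlusion_map(image, predict, patch=16, baseline=0.0):
        """Heatmap of the prediction drop when each image patch is occluded.

        image:   (H, W, C) array
        predict: callable returning the probability of the class of
                 interest for a batch of images
        """
        h, w, _ = image.shape
        reference = predict(image[None])[0]
        heat = np.zeros((h // patch, w // patch))
        for i in range(0, h - patch + 1, patch):
            for j in range(0, w - patch + 1, patch):
                occluded = image.copy()
                occluded[i:i + patch, j:j + patch, :] = baseline
                # A large drop means the region mattered for the prediction.
                heat[i // patch, j // patch] = reference - predict(occluded[None])[0]
        return heat

    # Usage with any model wrapper, e.g.:
    # heat = occlusion_map(tile, lambda x: model.predict(x)[:, tumor_class])
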
15.
Nat Commun; 14(1): 2577, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37142591

ABSTRACT

Access to large volumes of so-called whole-slide images - high-resolution scans of complete pathological slides - has become a cornerstone of the development of novel artificial intelligence methods in pathology for diagnostic use, education/training of pathologists, and research. Nevertheless, a methodology based on risk analysis for evaluating the privacy risks associated with sharing such imaging data, applying the principle 'as open as possible and as closed as necessary', is still lacking. In this article, we develop a model for privacy risk analysis for whole-slide images which focuses primarily on identity disclosure attacks, as these are the most important from a regulatory perspective. We introduce a taxonomy of whole-slide images with respect to privacy risks and a mathematical model for risk assessment. Based on this risk assessment model and the taxonomy, we conduct a series of experiments to demonstrate the risks using real-world imaging data. Finally, we develop guidelines for risk assessment and recommendations for low-risk sharing of whole-slide image data.


Subject(s)
Artificial Intelligence , Privacy , Image Processing, Computer-Assisted/methods , Diagnostic Imaging/methods
16.
N Biotechnol; 77: 12-19, 2023 Nov 25.
Article in English | MEDLINE | ID: mdl-37295722

ABSTRACT

Data quality has recently become a critical topic for the research community. European guidelines recommend that scientific data should be made FAIR: findable, accessible, interoperable, and reusable. However, as the FAIR guidelines do not specify how the stated principles should be implemented, it may not be straightforward for researchers to know how to actually make their data FAIR. This can prevent life-science researchers from sharing their datasets and pipelines, ultimately hindering the progress of research. To address this difficulty, we developed BIBBOX, a platform that supports researchers in publishing their datasets and the associated software in a FAIR manner.


Subject(s)
Mobile Applications
17.
Sci Rep; 13(1): 21601, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38062070

ABSTRACT

Consumer purchase data (CPD) is a promising instrument to assess the impact of purchases on health, but its use has been limited by the need for manual scanning, a lack of access to data from multiple retailers, and limited information on products and health outcomes. Here we describe the My Purchases cohort, a web-app-enabled, prospective collection of CPD covering several large retail chains in Denmark that enables linkage to health outcomes. The cohort included 459 participants as of July 03, 2023. Up to eight years of CPD have been collected, covering 2,225,010 purchased products and comprising 223,440 unique products. We matched 88.5% of all products by product name or item number to one generic food database and three product databases. Combined, the databases enable analysis of key exposures such as nutrients, ingredients, or additives. We found that increasing the number of retailers providing CPD for each consumer improved the stability of individual CPD profiles, and when we compared kilojoule information from generic and specific product matches, we found a median modified relative difference of 0.23. Combined with extensive product databases and health outcomes, CPD could provide the basis for extensive investigations of how what we buy affects our health.


Subject(s)
Family Characteristics , Food , Humans , Prospective Studies , Consumer Behavior , Life Style
18.
JAMA Netw Open; 6(3): e2254891, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36917112

ABSTRACT

Importance: Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. Objective: To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. Design, Setting, and Participants: This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. Main Outcomes and Measures: Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. Results: A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). Conclusions and Relevance: In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.


Subject(s)
Adenocarcinoma , Colonic Neoplasms , Male , Humans , Aged , Colonic Neoplasms/diagnosis , Pathologists , Artificial Intelligence , Machine Learning , Risk Assessment
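
The interpathologist agreement reported above (κ = 0.69 at the widespread-vs-lower threshold) is Cohen's kappa, which can be computed directly with scikit-learn. The score vectors below are synthetic placeholders, not the study data.

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(1)

    # Hypothetical semiquantitative TAF scores from two pathologists.
    levels = ["absent", "unifocal", "multifocal", "widespread"]
    pathologist_a = rng.choice(levels, size=258)
    pathologist_b = np.where(rng.random(258) < 0.9, pathologist_a,
                             rng.choice(levels, size=258))  # ~90% raw agreement

    # Binarize at the threshold used in the study: widespread vs lower categories.
    a_bin = pathologist_a == "widespread"
    b_bin = pathologist_b == "widespread"
    print(f"kappa = {cohen_kappa_score(a_bin, b_bin):.2f}")
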
19.
Commun Med (Lond); 3(1): 59, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37095223

ABSTRACT

BACKGROUND: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. METHODS: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze the performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. RESULTS: The machine-learned features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). CONCLUSION: This work demonstrates an effective approach to combining deep learning with established clinicopathologic factors to identify independently informative features associated with LNM. Further work building on these specific results may have an important impact on prognostication and therapeutic decision-making for LNM. Additionally, this general computational approach may prove useful in other contexts.


When colorectal cancers spread to the lymph nodes, it can indicate a poorer prognosis. However, detecting lymph node metastasis (spread) can be difficult and depends on a number of factors such as how samples are taken and processed. Here, we show that machine learning, which involves computer software learning from patterns in data, can predict lymph node metastasis in patients with colorectal cancer from the microscopic appearance of their primary tumor and the clinical characteristics of the patients. We also show that the same approach can predict patient survival. With further work, our approach may help clinicians to inform patients about their prognosis and decide on appropriate treatments.
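
A compressed sketch of the method described above: cluster patch embeddings with k-means, summarize each case by its cluster proportions, and test the added value of those features over clinicopathologic variables with a likelihood ratio test. All data are synthetic stand-ins and the pipeline is deliberately simplified.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_cases, n_patches, dim, k = 200, 50, 32, 8

    # Synthetic stand-ins: patch embeddings, 6 baseline variables, LNM labels.
    embeddings = rng.normal(size=(n_cases, n_patches, dim))
    baseline = rng.normal(size=(n_cases, 6))
    y = rng.integers(0, 2, n_cases)

    # Cluster all patches, then describe each case by its cluster proportions.
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(embeddings.reshape(-1, dim)).reshape(n_cases, n_patches)
    proportions = np.stack([(labels == c).mean(axis=1) for c in range(k)], axis=1)

    # Likelihood ratio test: baseline model vs baseline + machine-learned features.
    X0 = sm.add_constant(baseline)
    X1 = sm.add_constant(np.hstack([baseline, proportions]))
    ll0 = sm.Logit(y, X0).fit(disp=0).llf
    ll1 = sm.Logit(y, X1).fit(disp=0).llf
    p = chi2.sf(2 * (ll1 - ll0), df=k)
    print(f"likelihood ratio test p = {p:.3g}")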

20.
NPJ Precis Oncol; 7(1): 98, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37752266

ABSTRACT

Studies have shown that colorectal cancer (CRC) prognosis can be predicted by deep learning-based analysis of histological tissue sections of the primary tumor. So far, this has been achieved using a binary prediction. Survival curves might contain more detailed information and thus enable a more fine-grained risk prediction. We therefore established survival curve-based CRC survival predictors and benchmarked them against standard binary survival predictors, comparing their performance extensively on the clinical high- and low-risk subsets of one internal and three external cohorts. Survival curve-based risk prediction achieved a risk stratification very similar to that of binary risk prediction for this task. Exchanging other components of the pipeline, namely the input tissue and the feature extractor, had largely identical effects on model performance independently of the type of risk prediction. An ensemble of all survival curve-based models exhibited a more robust performance, as did a similar ensemble based on binary risk prediction. Patients could be further stratified within clinical risk groups. However, performance still varied across cohorts, indicating limited generalization of all investigated image analysis pipelines, whereas models using clinical data performed robustly on all cohorts.
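
The contrast between binary and survival curve-based prediction can be made concrete: a binary predictor outputs a single event probability at one cutoff, whereas a curve-based predictor outputs survival probabilities over time that can be collapsed into a risk score, for example via restricted mean survival time. A minimal sketch with synthetic values and a simplified step-function discretization.

    import numpy as np

    # A predicted discrete survival curve: P(survival > t) at yearly time points.
    time_points = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # years (illustrative grid)
    surv_curve = np.array([0.95, 0.88, 0.74, 0.60, 0.51])

    # Binary predictor: one probability of an event before a single cutoff.
    binary_risk = 1.0 - surv_curve[-1]                   # e.g., 5-year risk

    # Curve-based predictor: the area under the survival curve approximates the
    # restricted mean survival time (RMST); lower RMST means higher risk.
    t = np.concatenate([[0.0], time_points])
    s = np.concatenate([[1.0], surv_curve])
    rmst = np.sum(np.diff(t) * s[:-1])                   # step-function area

    print(f"binary 5-year risk: {binary_risk:.2f}, RMST: {rmst:.2f} years")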
