1.
NAR Genom Bioinform ; 6(3): lqae073, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38974799

ABSTRACT

Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embedding models pre-trained on reference data. We argue that this provides a more flexible analysis workflow that also has computational performance advantages through transfer learning. We implemented our approach in scEmbed, an unsupervised machine-learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well in terms of clustering ability and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, models pre-trained on reference data can be exploited to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python, is open source, and is available at https://github.com/databio/geniml. Pre-trained models from this work can be obtained on Hugging Face: https://huggingface.co/databio.
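
The idea of representing a cell through pre-trained region embeddings can be illustrated with a minimal sketch. It assumes, purely for illustration, that a cell's vector is the mean of the vectors of its accessible regions; the abstract does not state the pooling rule, and `cell_embedding` and the toy `region_vectors` dictionary are hypothetical names, not part of the scEmbed or geniml API.

```python
from statistics import fmean


def cell_embedding(accessible_regions, region_vectors):
    """Average the pre-trained vectors of a cell's accessible regions.

    Mean pooling is an assumption made for this sketch, not a documented
    scEmbed behavior. Regions missing from the pre-trained vocabulary are
    skipped, which mimics applying a reference model to an unseen dataset.
    """
    vectors = [region_vectors[r] for r in accessible_regions if r in region_vectors]
    if not vectors:
        raise ValueError("no accessible region is present in the pre-trained vocabulary")
    # zip(*vectors) iterates over embedding dimensions
    return [fmean(dim) for dim in zip(*vectors)]


# Toy pre-trained vocabulary (hypothetical values)
region_vectors = {
    "chr1:100-200": [1.0, 0.0],
    "chr1:500-900": [0.0, 1.0],
}
print(cell_embedding(["chr1:100-200", "chr1:500-900", "chr9:1-50"], region_vectors))
```

Because the region vectors are fixed after pre-training, embedding a new cell requires only a lookup and an average, which is one way the transfer-learning speed advantage described above could arise.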

2.
iScience ; 27(6): 110013, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38868190

ABSTRACT

Environmental enteric dysfunction (EED) is a subclinical enteropathy challenging to diagnose due to an overlap of tissue features with other inflammatory enteropathies. EED subjects (n = 52) from Pakistan, controls (n = 25), and a validation EED cohort (n = 30) from Zambia were used to develop a machine-learning-based image analysis classification model. We extracted histologic feature representations from the Pakistan EED model and correlated them to transcriptomics and clinical biomarkers. In-silico metabolic network modeling was used to characterize alterations in metabolic flux between EED and controls and validated using untargeted lipidomics. Genes encoding beta-ureidopropionase, CYP4F3, and epoxide hydrolase 1 correlated to numerous tissue feature representations. Fatty acid and glycerophospholipid metabolism-related reactions showed altered flux. Increased phosphatidylcholine, lysophosphatidylcholine (LPC), and ether-linked LPCs, and decreased ester-linked LPCs were observed in the duodenal lipidome of Pakistan EED subjects, while plasma levels of glycine-conjugated bile acids were significantly increased. Together, these findings elucidate a multi-omic signature of EED.

3.
BMC Bioinformatics ; 25(1): 178, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714921

ABSTRACT

BACKGROUND: In low-middle income countries, healthcare providers primarily use paper health records for capturing data. Paper health records are utilized predominately due to the prohibitive cost of acquisition and maintenance of automated data capture devices and electronic medical records. Data recorded on paper health records is not easily accessible in a digital format to healthcare providers. The lack of real time accessible digital data limits healthcare providers, researchers, and quality improvement champions to leverage data to improve patient outcomes. In this project, we demonstrate the novel use of computer vision software to digitize handwritten intraoperative data elements from smartphone photographs of paper anesthesia charts from the University Teaching Hospital of Kigali. We specifically report our approach to digitize checkbox data, symbol-denoted systolic and diastolic blood pressure, and physiological data. METHODS: We implemented approaches for removing perspective distortions from smartphone photographs, removing shadows, and improving image readability through morphological operations. YOLOv8 models were used to deconstruct the anesthesia paper chart into specific data sections. Handwritten blood pressure symbols and physiological data were identified, and values were assigned using deep neural networks. Our work builds upon the contributions of previous research by improving upon their methods, updating the deep learning models to newer architectures, as well as consolidating them into a single piece of software. RESULTS: The model for extracting the sections of the anesthesia paper chart achieved an average box precision of 0.99, an average box recall of 0.99, and an mAP0.5-95 of 0.97. Our software digitizes checkbox data with greater than 99% accuracy and digitizes blood pressure data with a mean average error of 1.0 and 1.36 mmHg for systolic and diastolic blood pressure respectively. 
Overall accuracy for physiological data which includes oxygen saturation, inspired oxygen concentration and end tidal carbon dioxide concentration was 85.2%. CONCLUSIONS: We demonstrate that under normal photography conditions we can digitize checkbox, blood pressure and physiological data to within human accuracy when provided legible handwriting. Our contributions provide improved access to digital data to healthcare practitioners in low-middle income countries.


Subjects
Smartphone , Humans , Anesthesia , Electronic Health Records , Developing Countries , Image Processing, Computer-Assisted/methods , Deep Learning
4.
Bioengineering (Basel) ; 11(3)2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38534537

ABSTRACT

As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to compare data directly through genomic region overlap analysis, but this approach leads to challenges like sparsity, high dimensionality, and computational expense. We require novel methods to quickly and flexibly query large, messy genomic interval databases. Here, we develop a genomic interval search system using representation learning. We train numerical embeddings for a collection of region sets simultaneously with their metadata labels, capturing similarity between region sets and their metadata in a low-dimensional space. Using these learned co-embeddings, we develop a system that solves three related information retrieval tasks using embedding distance computations: retrieving region sets related to a user query string, suggesting new labels for database region sets, and retrieving database region sets similar to a query region set. We evaluate these use cases and show that jointly learned representations of region sets and metadata are a promising approach for fast, flexible, and accurate genomic region information retrieval.
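
The retrieval tasks described above reduce to ranking database items by their distance to a query in the shared embedding space. The following is a minimal sketch assuming cosine similarity as the distance measure (the abstract only says "embedding distance computations"); `top_k` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, database, k=3):
    """Return the k database entries most similar to the query vector."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy co-embedding space: region sets and a metadata label share one space
region_set_vectors = {
    "set_a": [0.9, 0.1],
    "set_b": [0.1, 0.9],
    "set_c": [0.7, 0.7],
}
label_vector = [1.0, 0.0]  # hypothetical embedding of a query string
print(top_k(label_vector, region_set_vectors, k=2))
```

Since labels and region sets live in the same space, the same function covers all three tasks: pass a label vector to retrieve region sets, a region-set vector to retrieve similar region sets, or rank label vectors against a region-set vector to suggest annotations.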

5.
Med Sci Sports Exerc ; 56(2): 287-296, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37703319

ABSTRACT

PURPOSE: The kinetics of physiological responses to exercise have traditionally been characterized by estimating exponential equation parameters using iterative best-fit techniques of heart rate (HR) and gas exchange (respiratory rate, oxygen uptake (V̇O₂), carbon dioxide output, and ventilation). In this study, we present a novel approach to characterizing the maturation of physiological responses to exercise in children by accounting for response uncertainty and variability. METHODS: Thirty-seven early-pubertal (17 females, 20 males) and 44 late-pubertal (25 females, 19 males) participants performed three multiple brief exercise bouts (MBEB). MBEB consisted of ten 2-min bouts of cycle ergometry at constant work rate interspersed by 1-min rest. Exercise intensity was categorized as low, moderate, or high, corresponding to 40%, 60%, and 80% of peak work rate, and performed in random order on 3 separate days. We evaluated sample entropy (SampEn), approximate entropy, detrended fluctuation analysis, and average absolute local variability of HR and gas exchange. RESULTS: SampEn of HR and gas-exchange responses to MBEB was greater in early- compared with late-pubertal participants (e.g., V̇O₂ early-pubertal vs late-pubertal, 1.70 ± 0.023 vs 1.41 ± 0.027; P = 2.97 × 10⁻¹⁴), and decreased as MBEB intensity increased (e.g., 0.37 ± 0.01 HR for low-intensity compared with 0.21 ± 0.014 for high intensity, P = 3.56 × 10⁻¹⁷). Females tended to have higher SampEn than males (e.g., 1.61 ± 0.025 V̇O₂ for females vs 1.46 ± 0.031 for males, P = 1.28 × 10⁻⁴). Average absolute local variability was higher in younger participants for both gas exchange and HR (e.g., early-pubertal vs late-pubertal V̇O₂, 17.48% ± 0.56% vs 10.24% ± 0.34%; P = 1.18 × 10⁻²¹). CONCLUSIONS: The greater entropy in signal response to a known, quantifiable exercise perturbation in the younger children might represent maturation-dependent, enhanced competition among physiological controlling mechanisms that originate at the autonomic, subconscious, and cognitive levels.
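
For readers unfamiliar with sample entropy, the sketch below shows one common formulation: count pairs of length-m templates that match within a tolerance r, count how many still match at length m+1, and take the negative log of the ratio. The parameter choices (m = 2, an absolute tolerance r, and a "less than or equal" match rule) are assumptions for illustration; the abstract does not report the settings used in the study.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1-D sequence (illustrative implementation).

    r is an absolute tolerance here; in practice it is often set to a
    fraction of the series' standard deviation. Self-matches are excluded.
    """
    n = len(series)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this series and tolerance
    return -math.log(a / b)


regular = [0.0, 1.0] * 15          # perfectly periodic signal
print(sample_entropy(regular, r=0.1))
```

A perfectly repeating signal yields a value of zero because every match at length m persists at length m+1; less predictable signals lose matches when the template is extended and therefore score higher, which is the sense in which the early-pubertal responses above are "more entropic".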


Subjects
Exercise Test , Oxygen Consumption , Male , Female , Child , Humans , Oxygen Consumption/physiology , Exercise/physiology , Ergometry , Respiration , Heart Rate/physiology , Pulmonary Gas Exchange/physiology
6.
J Stroke Cerebrovasc Dis ; 32(3): 106987, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36641948

ABSTRACT

BACKGROUND: Studies from early in the COVID-19 pandemic showed that patients with ischemic stroke and concurrent SARS-CoV-2 infection had increased stroke severity. We aimed to test the hypothesis that this association persisted throughout the first year of the pandemic and that a similar increase in stroke severity was present in patients with hemorrhagic stroke. METHODS: Using the National Institutes of Health National COVID Cohort Collaborative (N3C) database, we identified a cohort of patients with stroke hospitalized in the United States between March 1, 2020 and February 28, 2021. We propensity score matched patients with concurrent stroke and SARS-CoV-2 infection and available NIH Stroke Scale (NIHSS) scores to all other patients with stroke in a 1:3 ratio. Nearest neighbor matching with a caliper of 0.25 was used for most factors and exact matching was used for race/ethnicity and site. We modeled stroke severity as measured by admission NIHSS and the outcomes of death and length of stay. We also explored the temporal relationship between time of SARS-CoV-2 diagnosis and incidence of stroke. RESULTS: Our query identified 43,295 patients hospitalized with ischemic stroke (5765 with SARS-CoV-2, 37,530 without) and 18,107 patients hospitalized with hemorrhagic stroke (2114 with SARS-CoV-2, 15,993 without). Analysis of our propensity matched cohort revealed that stroke patients with concurrent SARS-CoV-2 had increased NIHSS (Ischemic stroke: IRR=1.43, 95% CI:1.33-1.52, p<0.001; hemorrhagic stroke: IRR=1.20, 95% CI:1.08-1.33, p<0.001), length of stay (Ischemic stroke: estimate = 1.48, 95% CI: 1.37, 1.61, p<0.001; hemorrhagic stroke: estimate = 1.25, 95% CI: 1.06, 1.47, p=0.007) and higher odds of death (Ischemic stroke: OR 2.19, 95% CI: 1.79-2.68, p<0.001; hemorrhagic stroke: OR 2.19, 95% CI: 1.79-2.68, p<0.001). We observed the highest incidence of stroke diagnosis on the same day as SARS-CoV-2 diagnosis with a logarithmic decline in counts. CONCLUSION: This retrospective observational analysis suggests that stroke severity in patients with concurrent SARS-CoV-2 was increased throughout the first year of the pandemic.
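
The 1:3 nearest-neighbor matching with a caliper described in the methods can be sketched as follows. This is a simplified, greedy illustration on precomputed propensity scores; it assumes matching without replacement and a caliper expressed in the same units as the scores, neither of which is specified in the abstract, and it omits the exact matching on race/ethnicity and site. The function name and toy scores are hypothetical.

```python
def match_with_caliper(treated, controls, ratio=3, caliper=0.25):
    """Greedy nearest-neighbor matching on propensity scores.

    treated, controls: dicts mapping patient id -> propensity score.
    Each treated patient receives up to `ratio` controls whose score lies
    within `caliper`; matched controls are removed from the pool
    (matching without replacement is an assumption of this sketch).
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        candidates = sorted(
            (abs(t_score - c_score), c_id)
            for c_id, c_score in available.items()
            if abs(t_score - c_score) <= caliper
        )
        chosen = [c_id for _, c_id in candidates[:ratio]]
        for c_id in chosen:
            del available[c_id]
        matches[t_id] = chosen
    return matches


treated = {"t1": 0.50}
controls = {"c1": 0.52, "c2": 0.45, "c3": 0.90, "c4": 0.60, "c5": 0.49}
print(match_with_caliper(treated, controls))
```

The caliper discards controls whose scores are too far from the treated patient's (here `c3`), so a treated patient may receive fewer than three matches rather than a poor one.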


Subjects
COVID-19 , Hemorrhagic Stroke , Ischemic Stroke , Stroke , Humans , COVID-19/complications , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Hemorrhagic Stroke/diagnosis , Hemorrhagic Stroke/epidemiology , Hemorrhagic Stroke/therapy , Ischemic Stroke/diagnosis , Ischemic Stroke/therapy , Ischemic Stroke/epidemiology , Pandemics , Retrospective Studies , SARS-CoV-2 , Stroke/diagnosis , Stroke/therapy , Stroke/epidemiology , United States/epidemiology
7.
Anesth Analg ; 136(4): 753-760, 2023 04 01.
Article in English | MEDLINE | ID: mdl-36017931

ABSTRACT

BACKGROUND: In low-middle-income countries (LMICs), perioperative clinical information is almost universally collected on paper health records (PHRs). The lack of accessible digital databases limits LMICs in leveraging data to predict and improve patient outcomes after surgery. In this feasibility study, our aims were to: (1) determine the detection performance and prediction error of the U-Net deep image segmentation approach for digitization of hand-drawn blood pressure symbols from an image of the intraoperative PHRs and (2) evaluate the association between deep image segmentation-derived blood pressure parameters and postoperative mortality and length of stay. METHODS: A smartphone mHealth platform developed by our team was used to capture images of completed intraoperative PHRs. A 2-stage deep image segmentation modeling approach was used to create 2 separate segmentation masks for systolic blood pressure (SBP) and diastolic blood pressure (DBP). Iterative postprocessing was utilized to convert the segmentation mask results into numerical SBP and DBP values. Detection performance and prediction errors were evaluated for the U-Net models by comparison with ground-truth values. Using multivariate regression analysis, we investigated the association of deep image segmentation-derived blood pressure values, total time spent in predefined blood pressure ranges, and postoperative outcomes including in-hospital mortality and length of stay. RESULTS: A total of 350 intraoperative PHRs were imaged following surgery. Overall accuracy was 0.839 and 0.911 for SBP and DBP symbol detections, respectively. The mean error rate and standard deviation for the difference between the actual and predicted blood pressure values were 2.1 ± 4.9 and -0.8 ± 3.9 mm Hg for SBP and DBP, respectively. Using the U-Net model-derived blood pressures, minutes of time where DBP <50 mm Hg (odds ratio [OR], 1.03; CI, 1.01-1.05; P = .003) was associated with an increased in-hospital mortality. 
In addition, increased cumulative minutes of time with SBP between 80 and 90 mm Hg was significantly associated with a longer length of stay (incidence rate ratio, 1.02 [1.0-1.03]; P < .05), while increased cumulative minutes of time where SBP between 140 and 160 mm Hg was associated with a shorter length of stay (incidence rate ratio, 0.9 [0.96-0.99]; P < .05). CONCLUSIONS: In this study, we report our experience with a deep image segmentation model for digitization of symbol-denoted blood pressure from intraoperative anesthesia PHRs. Our data support further development of this novel approach to digitize PHRs from LMICs, to provide accessible, curated, and reproducible data for both quality improvement- and outcome-based research.


Subjects
Hypertension , Humans , Blood Pressure/physiology , Feasibility Studies , Regression Analysis , Hypertension/diagnosis
8.
IEEE J Biomed Health Inform ; 26(12): 5953-5963, 2022 12.
Article in English | MEDLINE | ID: mdl-36103443

ABSTRACT

Physiological response to physical exercise through analysis of cardiopulmonary measurements has been shown to be predictive of a variety of diseases. Nonetheless, the clinical use of exercise testing remains limited because interpretation of test results requires experience and specialized training. Additionally, until this work no methods have identified which dynamic gas exchange or heart rate responses influence an individual's decision to start or stop physical activity. This research examines the use of advanced machine learning methods to predict completion of a test consisting of multiple exercise bouts by a group of healthy children and adolescents. All participants could complete the ten bouts at low or moderate-intensity work rates, however, when the bout work rates were high-intensity, 50% refused to begin the subsequent exercise bout before all ten bouts had been completed (task failure). We explored machine learning strategies to model the relationship between the physiological time series, the participant's anthropometric variables, and the binary outcome variable indicating whether the participant completed the test. The best performing model, a generalized spectral additive model with functional and scalar covariates, achieved 93.6% classification accuracy and an F1 score of 93.5%. Additionally, functional analysis of variance testing showed that participants in the 'failed' and 'success' groups have significantly different functional means in three signals: heart rate, oxygen uptake rate, and carbon dioxide uptake rate. Overall, these results show the capability of functional data analysis with generalized spectral additive models to identify key differences in the exercise-induced responses of participants in multiple bout exercise testing.


Subjects
Exercise , Oxygen Consumption , Adolescent , Humans , Child , Oxygen Consumption/physiology , Exercise/physiology , Exercise Test , Heart Rate/physiology , Time Factors
9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 4740-4744, 2022 07.
Article in English | MEDLINE | ID: mdl-36086227

ABSTRACT

Advancements in deep learning techniques have proved useful in biomedical image segmentation. However, the large amount of unlabeled data inherent in biomedical imagery, particularly in digital pathology, creates a semi-supervised learning paradigm. Specifically, because of the time consuming nature of producing pixel-wise annotations and the high cost of having a pathologist dedicate time to labeling, there is a large amount of unlabeled data that we wish to utilize in training segmentation algorithms. Pseudo-labeling is one method to leverage the unlabeled data to increase overall model performance. We adapt a method used for image classification pseudo-labeling to select images for segmentation pseudo-labeling and apply it to 3 digital pathology datasets. To select images for pseudo-labeling, we create and explore different thresholds for confidence and uncertainty on an image level basis. Furthermore, we study the relationship between image-level uncertainty and confidence with model performance. We find that the certainty metrics do not consistently correlate with performance intuitively, and abnormal correlations serve as an indicator of a model's ability to produce pseudo-labels that are useful in training. Clinical relevance - The proposed approach adapts image-level confidence and uncertainty measures for segmentation pseudo-labeling on digital pathology datasets. Increased model performance enables better disease quantification for histopathology.


Subjects
Algorithms , Supervised Machine Learning , Uncertainty
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 1611-1614, 2022 07.
Article in English | MEDLINE | ID: mdl-36086506

ABSTRACT

Exercise testing has been available for more than a half-century and is a remarkably versatile tool for diagnostic and prognostic information of patients for a range of diseases, especially cardiovascular and pulmonary. With rapid advancements in technology, wearables, and learning algorithms in the last decade, its scope has evolved. Specifically, cardiopulmonary exercise testing (CPX) is one of the most commonly used laboratory tests for objective evaluation of exercise capacity and performance levels in patients. CPX provides a non-invasive, integrative assessment of the pulmonary, cardiovascular, and skeletal muscle systems involving the measurement of gas exchanges. However, its assessment is challenging, requiring the individual to process multiple time series data points, leading to simplification to peak values and slopes. This simplification can discard the valuable trend information present in these time series. In this work, we encode the time series as images using the Gramian Angular Field and Markov Transition Field and use them with a convolutional neural network and attention pooling approach for the classification of heart failure and metabolic syndrome patients. Using Grad-CAMs, we highlight the discriminative features identified by the model. Clinical relevance - The proposed framework can process multivariate exercise testing time-series data and accurately predict cardiovascular diseases. Interpretable Grad-CAMs can be obtained to explain the prediction.
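
The Gramian Angular Field encoding mentioned above turns a one-dimensional series into a square image. A minimal sketch of the summation variant follows: rescale the series to [-1, 1], map each value to an angle with arccos, and fill the matrix with the cosine of pairwise angle sums. Whether the authors used the summation or difference variant, and any resampling they applied, is not stated in the abstract; the function name is hypothetical.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a 1-D series as a Gramian Angular Summation Field matrix."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1], clamping to guard against rounding error
    scaled = [max(-1.0, min(1.0, (2 * x - hi - lo) / (hi - lo))) for x in series]
    # Polar encoding: each value becomes an angle in [0, pi]
    phi = [math.acos(s) for s in scaled]
    # G[i][j] = cos(phi_i + phi_j)
    return [[math.cos(a + b) for b in phi] for a in phi]


image = gramian_angular_summation_field([0.0, 1.0, 2.0])
for row in image:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric and preserves temporal order along its diagonal, which is what allows a convolutional network to treat the whole time course as an image instead of relying on peak values and slopes.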


Subjects
Exercise Test , Heart Failure , Exercise Test/methods , Exercise Tolerance , Heart Failure/diagnosis , Humans , Neural Networks, Computer , Time Factors
11.
Am J Prev Cardiol ; 12: 100379, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36090536

ABSTRACT

Machine learning (ML) refers to computational algorithms that iteratively improve their ability to recognize patterns in data. The digitization of our healthcare infrastructure is generating an abundance of data from electronic health records, imaging, wearables, and sensors that can be analyzed by ML algorithms to generate personalized risk assessments and promote guideline-directed medical management. ML's strength in generating insights from complex medical data to guide clinical decisions must be balanced with the potential to adversely affect patient privacy, safety, health equity, and clinical interpretability. This review provides a primer on key advances in ML for cardiovascular disease prevention and how they may impact clinical practice.

12.
BioData Min ; 15(1): 16, 2022 Aug 13.
Article in English | MEDLINE | ID: mdl-35964102

ABSTRACT

BACKGROUND: Cardiopulmonary exercise testing (CPET) provides a reliable and reproducible approach to measuring fitness in patients and diagnosing their health problems. However, the data from CPET consist of multiple time series that require training to interpret. Part of this training teaches the use of flow charts or nested decision trees to interpret the CPET results. This paper investigates the use of two machine learning techniques using neural networks to predict patient health conditions with CPET data in contrast to flow charts. The data for this investigation comes from a small sample of patients with known health problems and who had CPET results. The small size of the sample data also allows us to investigate the use and performance of deep learning neural networks on health care problems with limited amounts of labeled training and testing data. METHODS: This paper compares the current standard for interpreting and classifying CPET data, flowcharts, to neural network techniques, autoencoders and convolutional neural networks (CNN). The study also investigated the performance of principal component analysis (PCA) with logistic regression to provide an additional baseline of comparison to the neural network techniques. RESULTS: The patients in the sample had two primary diagnoses: heart failure and metabolic syndrome. All model-based testing was done with 5-fold cross-validation and metrics of precision, recall, F1 score, and accuracy. As a baseline for comparison to our models, the highest performing flow chart method achieved an accuracy of 77%. Both PCA regression and CNN achieved an average accuracy of 90% and outperformed the flow chart methods on all metrics. The autoencoder with logistic regression performed the best on each of the metrics and had an average accuracy of 94%. 
CONCLUSIONS: This study suggests that machine learning and neural network techniques, in particular, can provide higher levels of accuracy with CPET data than traditional flowchart methods. Further, the CNN performed well with a small data set showing that these techniques can be designed to perform well on small data problems that are often found in health care and the life sciences. Further testing with larger data sets is needed to continue evaluating the use of machine learning to interpret CPET data.

13.
IEEE J Biomed Health Inform ; 26(8): 4228-4237, 2022 08.
Article in English | MEDLINE | ID: mdl-35353709

ABSTRACT

Cardiopulmonary Exercise Testing (CPET) is a unique physiologic medical test used to evaluate human response to progressive maximal exercise stress. Depending on the degree and type of deviation from the normal physiologic response, CPET can help identify a patient's specific limitations to exercise to guide clinical care without the need for other expensive and invasive diagnostic tests. However, given the amount and complexity of data obtained from CPET, interpretation and visualization of test results is challenging. CPET data currently require dedicated training and significant experience for proper clinician interpretation. To make CPET more accessible to clinicians, we investigated a simplified data interpretation and visualization tool using machine learning algorithms. The visualization shows three types of limitations (cardiac, pulmonary and others); values are defined based on the results of three independent random forest classifiers. To display the models' scores and make them interpretable to the clinicians, an interactive dashboard with the scores and interpretability plots was developed. This machine learning platform has the potential to augment existing diagnostic procedures and provide a tool to make CPET more accessible to clinicians.


Subjects
Exercise Test , Exercise , Exercise Test/methods , Heart , Humans , Machine Learning , Oxygen Consumption
14.
Pattern Recognit (2021) ; 12661: 120-140, 2021 Jan.
Article in English | MEDLINE | ID: mdl-34693406

ABSTRACT

Hematoxylin and Eosin (H&E) stained Whole Slide Images (WSIs) are utilized for biopsy visualization-based diagnostic and prognostic assessment of diseases. Variation in the H&E staining process across different lab sites can lead to significant variations in biopsy image appearance. These variations introduce an undesirable bias when the slides are examined by pathologists or used for training deep learning models. Traditionally proposed stain normalization and color augmentation strategies can handle the human level bias. But deep learning models can easily disentangle the linear transformation used in these approaches, resulting in undesirable bias and lack of generalization. To handle these limitations, we propose a Self-Attentive Adversarial Stain Normalization (SAASN) approach for the normalization of multiple stain appearances to a common domain. This unsupervised generative adversarial approach includes self-attention mechanism for synthesizing images with finer detail while preserving the structural consistency of the biopsy features during translation. SAASN demonstrates consistent and superior performance compared to other popular stain normalization techniques on H&E stained duodenal biopsy image data.

15.
Bioinformatics ; 37(23): 4299-4306, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34156475

ABSTRACT

MOTIVATION: Genomic region sets summarize functional genomics data and define locations of interest in the genome such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in data analysis. RESULTS: We propose a new method to represent genomic region sets as vectors, or embeddings, using an adapted word2vec approach. We compared our approach to two simpler methods based on interval unions or term frequency-inverse document frequency and evaluated the methods in three ways: First, by classifying the cell line, antibody or tissue type of the region set; second, by assessing whether similarity among embeddings can reflect simulated random perturbations of genomic regions; and third, by testing robustness of the proposed representations to different signal thresholds for calling peaks. Our word2vec-based region set embeddings reduce dimensionality from more than a hundred thousand to 100 without significant loss in classification performance. The vector representation could identify cell line, antibody and tissue type with over 90% accuracy. We also found that the vectors could quantitatively summarize simulated random perturbations to region sets and are more robust to subsampling the data derived from different peak calling thresholds. Our evaluations demonstrate that the vectors retain useful biological information in relatively lower-dimensional spaces. We propose that vector representation of region sets is a promising approach for efficient analysis of genomic region data. AVAILABILITY AND IMPLEMENTATION: https://github.com/databio/regionset-embedding. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
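
To make the comparison concrete, the term frequency-inverse document frequency baseline mentioned above can be sketched by treating each region set as a document and each region as a term. The sketch assumes binary term frequency and the plain log(N/df) weighting; the exact weighting used in the paper is not given in the abstract, and the function name and toy region identifiers are hypothetical.

```python
import math


def tfidf_vectors(region_sets):
    """Build TF-IDF vectors for region sets (binary term frequency).

    region_sets: dict mapping set name -> iterable of region identifiers.
    Returns the sorted vocabulary and one vector per region set.
    """
    as_sets = {name: set(regions) for name, regions in region_sets.items()}
    vocab = sorted(set().union(*as_sets.values()))
    n_sets = len(as_sets)
    # Document frequency: number of region sets containing each region
    doc_freq = {r: sum(1 for regions in as_sets.values() if r in regions) for r in vocab}
    idf = {r: math.log(n_sets / doc_freq[r]) for r in vocab}
    vectors = {
        name: [idf[r] if r in regions else 0.0 for r in vocab]
        for name, regions in as_sets.items()
    }
    return vocab, vectors


vocab, vectors = tfidf_vectors({
    "sample_a": ["r1", "r2"],
    "sample_b": ["r1", "r3"],
})
print(vocab)
print(vectors)
```

Each vector has one entry per region in the vocabulary, so its length grows with the number of distinct regions; this is the high-dimensional representation that the 100-dimensional word2vec-based embeddings are compared against.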


Subjects
Genomics , Protein Binding
16.
Article in English | MEDLINE | ID: mdl-34046649

ABSTRACT

Eosinophilic Esophagitis (EoE) is an inflammatory esophageal disease which is increasing in prevalence. The diagnostic gold-standard involves manual review of a patient's biopsy tissue sample by a clinical pathologist for the presence of 15 or greater eosinophils within a single high-power field (400× magnification). Diagnosing EoE can be a cumbersome process with added difficulty for assessing the severity and progression of disease. We propose an automated approach for quantifying eosinophils using deep image segmentation. A U-Net model and post-processing system are applied to generate eosinophil-based statistics that can diagnose EoE as well as describe disease severity and progression. These statistics are captured in biopsies at the initial EoE diagnosis and are then compared with patient metadata: clinical and treatment phenotypes. The goal is to find linkages that could potentially guide treatment plans for new patients at their initial disease diagnosis. A deep image classification model is further applied to discover features other than eosinophils that can be used to diagnose EoE. This is the first study to utilize a deep learning computer vision approach for EoE diagnosis and to provide an automated process for tracking disease severity and progression.
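
The diagnostic rule stated above (15 or more eosinophils within a single high-power field) can be expressed as a short post-processing step over per-field counts. The sketch below assumes the segmentation stage has already produced one eosinophil count per high-power field; the function name and the particular summary statistics are hypothetical and are not taken from the paper.

```python
def summarize_eosinophils(counts_per_hpf, threshold=15):
    """Summarize per-field eosinophil counts against the diagnostic threshold.

    counts_per_hpf: eosinophil counts, one per high-power field, assumed to
    come from an upstream segmentation model.
    """
    if not counts_per_hpf:
        raise ValueError("at least one high-power field is required")
    peak = max(counts_per_hpf)
    return {
        "peak_count": peak,
        "mean_count": sum(counts_per_hpf) / len(counts_per_hpf),
        "fields_at_or_above_threshold": sum(1 for c in counts_per_hpf if c >= threshold),
        # The criterion is met if any single field reaches the threshold
        "meets_threshold": peak >= threshold,
    }


print(summarize_eosinophils([3, 18, 7]))
```

Statistics beyond the binary flag, such as the number of fields above the threshold, are the kind of quantity that could describe severity and be tracked across biopsies over time.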

17.
Sci Rep ; 11(1): 5086, 2021 03 03.
Article in English | MEDLINE | ID: mdl-33658592

ABSTRACT

Probe-based confocal laser endomicroscopy (pCLE) allows for real-time diagnosis of dysplasia and cancer in Barrett's esophagus (BE) but is limited by low sensitivity. Even the gold standard of histopathology is hindered by poor agreement between pathologists. We deployed deep-learning-based image and video analysis in order to improve diagnostic accuracy of pCLE videos and biopsy images. Blinded experts categorized biopsies and pCLE videos as squamous, non-dysplastic BE, or dysplasia/cancer, and deep learning models were trained to classify the data into these three categories. Biopsy classification was conducted using two distinct approaches-a patch-level model and a whole-slide-image-level model. Gradient-weighted class activation maps (Grad-CAMs) were extracted from pCLE and biopsy models in order to determine tissue structures deemed relevant by the models. 1970 pCLE videos, 897,931 biopsy patches, and 387 whole-slide images were used to train, test, and validate the models. In pCLE analysis, models achieved a high sensitivity for dysplasia (71%) and an overall accuracy of 90% for all classes. For biopsies at the patch level, the model achieved a sensitivity of 72% for dysplasia and an overall accuracy of 90%. The whole-slide-image-level model achieved a sensitivity of 90% for dysplasia and 94% overall accuracy. Grad-CAMs for all models showed activation in medically relevant tissue regions. Our deep learning models achieved high diagnostic accuracy for both pCLE-based and histopathologic diagnosis of esophageal dysplasia and its precursors, similar to human accuracy in prior studies. These machine learning approaches may improve accuracy and efficiency of current screening protocols.
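
The results above report class-specific sensitivity alongside overall accuracy, and the two can diverge when classes are imbalanced. A minimal sketch of both computations on toy labels follows; the label values and counts are invented for illustration and do not reproduce the study's data.

```python
def sensitivity_and_accuracy(y_true, y_pred, positive):
    """Sensitivity for one class and overall accuracy across all classes."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    actual_pos = true_pos + false_neg
    sensitivity = true_pos / actual_pos if actual_pos else float("nan")
    return sensitivity, correct / len(pairs)


# Toy example: 4 dysplasia cases, 6 squamous cases, one dysplasia case missed
y_true = ["dysplasia"] * 4 + ["squamous"] * 6
y_pred = ["dysplasia"] * 3 + ["squamous"] * 7
print(sensitivity_and_accuracy(y_true, y_pred, positive="dysplasia"))
```

In this toy case one missed dysplasia case lowers sensitivity to 0.75 while overall accuracy stays at 0.9, the same pattern as the gap between dysplasia sensitivity and overall accuracy reported for the pCLE and patch-level models.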


Subjects
Barrett Esophagus/diagnostic imaging , Barrett Esophagus/pathology , Data Accuracy , Deep Learning , Esophageal Neoplasms/diagnostic imaging , Esophageal Neoplasms/pathology , Aged , Biopsy , Esophagus/diagnostic imaging , Esophagus/pathology , Female , Humans , Image Processing, Computer-Assisted/methods , Male , Microscopy, Confocal/methods , Middle Aged , Prospective Studies , Sensitivity and Specificity
18.
J Pediatr Gastroenterol Nutr ; 72(6): 833-841, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33534362

ABSTRACT

OBJECTIVES: Striking histopathological overlap between distinct but related conditions poses a diagnostic challenge. There is a major clinical need to develop computational methods enabling clinicians to translate heterogeneous biomedical images into accurate and quantitative diagnostics. This need is particularly salient for small bowel enteropathies: environmental enteropathy (EE) and celiac disease (CD). We built upon our preliminary analysis by developing an artificial intelligence (AI)-based image analysis platform utilizing deep learning convolutional neural networks (CNNs) for these enteropathies. METHODS: Data for the secondary analysis were obtained from three primary studies at different sites. The image analysis platform for EE and CD was developed using CNNs, including one with a multizoom architecture. Gradient-weighted class activation mappings (Grad-CAMs) were used to visualize the models' decision-making process for classifying each disease. A team of medical experts simultaneously reviewed the stain-color-normalized images (normalization was applied to reduce bias) and the Grad-CAMs to confirm structural preservation and biomedical relevance, respectively. RESULTS: Four hundred and sixty-one high-resolution biopsy images from 150 children were acquired. Median age (interquartile range) was 37.5 (19.0-121.5) months with a roughly equal sex distribution; 77 males (51.3%). ResNet50 and shallow CNN demonstrated 98% and 96% case-detection accuracy, respectively, which increased to 98.3% with an ensemble. Grad-CAMs demonstrated the models' ability to learn different microscopic morphological features for EE, CD, and controls. CONCLUSIONS: Our AI-based image analysis platform demonstrated high classification accuracy for small bowel enteropathies and was capable of identifying biologically relevant microscopic features and emulating the decision-making process of human pathologists. Grad-CAMs illuminated the otherwise "black box" of deep learning in medicine, allowing for increased physician confidence in adopting these new technologies in clinical practice.


Subjects
Artificial Intelligence , Celiac Disease , Biopsy , Celiac Disease/diagnosis , Child , Child, Preschool , Humans , Image Processing, Computer-Assisted , Male , Neural Networks, Computer
19.
J Pers Med ; 10(4)2020 Sep 23.
Article in English | MEDLINE | ID: mdl-32977465

ABSTRACT

The gold standard of histopathology for the diagnosis of Barrett's esophagus (BE) is hindered by inter-observer variability among gastrointestinal pathologists. Deep learning-based approaches have shown promising results in the analysis of whole-slide tissue histopathology images (WSIs). We performed a comparative study to elucidate the characteristics and behaviors of different deep learning-based feature representation approaches for the WSI-based diagnosis of diseased esophageal architectures, namely, dysplastic and non-dysplastic BE. The results showed that if appropriate settings are chosen, the unsupervised feature representation approach is capable of extracting more relevant image features from WSIs to classify and locate the precursors of esophageal cancer compared to weakly supervised and fully supervised approaches.

20.
Article in English | MEDLINE | ID: mdl-34046246

ABSTRACT

One of the greatest obstacles to the adoption of deep neural networks for new medical applications is that training these models typically requires a large number of manually labeled training samples. In this body of work, we investigate the semi-supervised scenario where one has access to large amounts of unlabeled data and only a few labeled samples. We study the performance of MixMatch and FixMatch, two popular semi-supervised learning methods, on a histology dataset. More specifically, we study these models' impact under a highly noisy and imbalanced setting. The findings here motivate the development of semi-supervised methods to ameliorate problems commonly encountered in medical data applications.
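As a rough illustration of how FixMatch uses unlabeled data, the sketch below computes its unlabeled loss term from already-computed class probabilities: a prediction on a weakly augmented view becomes a pseudo-label only when its confidence reaches a threshold, and the prediction on a strongly augmented view of the same image is then penalized with cross-entropy against that pseudo-label. The function name and the plain-list inputs are hypothetical, and this is not the configuration used in the study; the 0.95 default is the threshold from the original FixMatch paper.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Confidence-masked cross-entropy over a batch of unlabeled samples.

    weak_probs[i]   : class probabilities for the weakly augmented view of sample i
    strong_probs[i] : class probabilities for the strongly augmented view of sample i
    Samples whose weak-view confidence is below `threshold` contribute zero,
    but still count in the batch size used for averaging.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence >= threshold:
            pseudo_label = weak.index(confidence)
            # Clamp to avoid log(0) for degenerate predictions.
            total += -math.log(max(strong[pseudo_label], 1e-12))
    return total / len(weak_probs)
```

Under heavy class imbalance, confident pseudo-labels tend to come disproportionately from the majority class, which is one reason such a setting is a hard test for these methods.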
