1.
PLoS One ; 19(3): e0300127, 2024.
Article in English | MEDLINE | ID: mdl-38483951

ABSTRACT

BACKGROUND: The burden of Parkinson's disease (PD) represents a key public health issue, and it is essential to develop innovative and cost-effective approaches to promote sustainable diagnostic and therapeutic interventions. From this perspective, the adoption of a P3 (predictive, preventive and personalized) medicine approach seems to be pivotal. NeuroArtP3 (NET-2018-12366666) is a four-year multi-site project co-funded by the Italian Ministry of Health, bringing together clinical and computational centers operating in the field of neurology, including PD. OBJECTIVE: The core objectives of the project are: i) to harmonize the collection of data across the participating centers, ii) to structure standardized disease-specific datasets and iii) to advance knowledge on disease trajectories through machine learning analysis. METHODS: The four-year study combines two consecutive research components: i) a multi-center retrospective observational phase; ii) a multi-center prospective observational phase. The retrospective phase aims at collecting data from the patients admitted to the participating clinical centers, whereas the prospective phase aims at collecting the same variables as the retrospective study in newly diagnosed patients who will be enrolled at the same centers. RESULTS: The participating clinical centers are the Provincial Health Services (APSS) of Trento (Italy) as the center responsible for the PD study and the IRCCS San Martino Hospital of Genoa (Italy) as the promoter center of the NeuroArtP3 project. The computational centers responsible for data analysis are the Bruno Kessler Foundation of Trento (Italy) with TrentinoSalute4.0 - Competence Center for Digital Health of the Province of Trento (Italy) and the LISCOMPlab University of Genoa (Italy). CONCLUSIONS: The work behind this observational study protocol shows that it is possible and viable to systematize data collection procedures in order to feed research and to advance the implementation of a P3 approach in clinical practice through the use of AI models.


Subject(s)
Artificial Intelligence , Parkinson Disease , Humans , Retrospective Studies , Prospective Studies , Parkinson Disease/diagnosis , Public Health , Observational Studies as Topic , Multicenter Studies as Topic
2.
BioData Min ; 16(1): 33, 2023 Nov 25.
Article in English | MEDLINE | ID: mdl-38001537

ABSTRACT

BACKGROUND: Discrimination between patients affected by inflammatory bowel diseases and healthy controls on the basis of endoscopic imaging is a challenging problem for machine learning models. This task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions enhancing characterising elements such as reproducibility, interpretability, reduced computational workload, bias-free modeling and careful image preprocessing. RESULTS: First, an automatic preprocessing procedure is devised to remove artifacts from clinical data; the resulting images are then fed to an aggregated per-patient model that mimics the clinicians' decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing the low-confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need for evaluating the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and the computational cost of the model, allowing its deployment on low-power devices with an almost negligible 3% performance degradation. Such a quantization procedure holds relevance not only in the context of per-patient models but also for assessing its feasibility in providing real-time support to clinicians even in low-resource environments. The pipeline is demonstrated on a private dataset of endoscopic images of 758 IBD patients and 601 healthy controls, achieving a Matthews Correlation Coefficient of 0.9 as top performance on the test set. CONCLUSION: We highlighted how a comprehensive pre-processing pipeline plays a crucial role in identifying and removing artifacts from data, solving one of the principal challenges encountered when working with clinical data. Furthermore, we constructively showed how it is possible to emulate the clinicians' decision process and how it offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we showed that quantization can be a useful tool to reduce time and resource consumption with an acceptable degradation of model performance. The quantization study proposed in this work points to the potential development of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.
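
The abstract above does not say which quantization scheme was used. As a rough illustration of the general idea only, the following sketch applies symmetric 8-bit quantization to a list of weights; the function names and the example weights are hypothetical and are not taken from the paper.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization (an assumed scheme, for illustration only).

    Maps each float to an integer in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, not from the study
weights = [0.5, -1.0, 0.25, 0.0, 0.8]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each restored value differs from the original by at most half a quantization step, which is the kind of small, bounded error that can translate into a modest performance loss in exchange for cheaper integer arithmetic.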

3.
Cancer Sci ; 114(1): 281-294, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36114746

ABSTRACT

Emerging evidence suggests that the prognosis of patients with lung adenocarcinoma can be determined from germline variants and transcript levels in nontumoral lung tissue. Gene expression data from noninvolved lung tissue of 483 lung adenocarcinoma patients were tested for correlation with overall survival using multivariable Cox proportional hazard and multivariate machine learning models. For genes whose transcript levels are associated with survival, we used genotype data from 414 patients to identify germline variants acting as cis-expression quantitative trait loci (eQTLs). Associations of eQTL variant genotypes with gene expression and survival were tested. Levels of four transcripts were inversely associated with survival by Cox analysis (CLCF1, hazard ratio [HR] = 1.53; CNTNAP1, HR = 2.17; DUSP14, HR = 1.78; and MT1F, HR = 1.40). Machine learning analysis identified a signature of transcripts associated with lung adenocarcinoma outcome that largely overlapped with the transcripts identified by Cox analysis, including the three most significant genes (CLCF1, CNTNAP1, and DUSP14). Pathway analysis indicated that the signature is enriched for ECM components. We identified 32 cis-eQTLs for CNTNAP1, including 6 with an inverse correlation and 26 with a direct correlation between the number of minor alleles and transcript levels. Of these, all but one were prognostic: the six with an inverse correlation were associated with better prognosis (HR < 1) while the others were associated with worse prognosis. Our findings provide supportive evidence that genetic predisposition to lung adenocarcinoma outcome is a feature already present in patients' noninvolved lung tissue.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Humans , Genetic Predisposition to Disease , Adenocarcinoma of Lung/genetics , Lung/pathology , Genotype , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Prognosis , Polymorphism, Single Nucleotide
4.
Comput Biol Med ; 152: 106373, 2023 01.
Article in English | MEDLINE | ID: mdl-36462367

ABSTRACT

Systemic lupus erythematosus and primary Sjögren's syndrome are complex systemic autoimmune diseases that are often misdiagnosed. In this article, we demonstrate the potential of machine learning to perform differential diagnosis of these similar pathologies using gene expression and methylation data from 651 individuals. Furthermore, we analyzed the impact of the heterogeneity of these diseases on the performance of the predictive models, discovering that patients assigned to a specific molecular cluster are misclassified more often and affect the overall performance of the predictive models. In addition, we found that the samples characterized by high interferon activity are the ones predicted most accurately, followed by the samples with high inflammatory activity. Finally, we identified a group of biomarkers that improve the predictions compared with using the whole dataset, and we validated them with external studies from other tissues and technological platforms.


Subject(s)
Lupus Erythematosus, Systemic , Sjogren's Syndrome , Humans , Sjogren's Syndrome/diagnosis , Sjogren's Syndrome/genetics , Diagnosis, Differential , Multiomics , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/genetics , Machine Learning
5.
BMC Med Inform Decis Mak ; 22(Suppl 6): 300, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36401328

ABSTRACT

BACKGROUND: The SI-CURA project (Soluzioni Innovative per la gestione del paziente e il follow up terapeutico della Colite UlceRosA) is an Italian initiative aimed at the development of artificial intelligence solutions to discriminate pathologies of different nature, including inflammatory bowel disease (IBD), namely Ulcerative Colitis (UC) and Crohn's disease (CD), based on endoscopic imaging of patients (P) and healthy controls (N). METHODS: In this study we develop a deep learning (DL) prototype to identify disease patterns through three binary classification tasks, namely (1) discriminating positive (pathological) samples from negative (healthy) samples (P vs N); (2) discriminating between Ulcerative Colitis and Crohn's Disease samples (UC vs CD); and (3) discriminating between Ulcerative Colitis and negative (healthy) samples (UC vs N). RESULTS: The model derived from our approach achieves a high performance of Matthews correlation coefficient (MCC) > 0.9 on the test set for P versus N and UC versus N, and MCC > 0.6 on the test set for UC versus CD. CONCLUSION: Our DL model effectively discriminates between pathological and negative samples, as well as between IBD subgroups, providing further evidence of its potential as a decision support tool for endoscopy-based diagnosis.


Subject(s)
Colitis, Ulcerative , Crohn Disease , Inflammatory Bowel Diseases , Humans , Colitis, Ulcerative/diagnostic imaging , Colitis, Ulcerative/pathology , Crohn Disease/diagnostic imaging , Crohn Disease/pathology , Artificial Intelligence , Endoscopy
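
Several abstracts in this listing report performance as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the sketch below computes it from the four counts of a binary confusion matrix. The counts in the example are hypothetical and are not taken from any of the listed studies; returning 0 when the denominator vanishes is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column gives MCC = 0
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a "pathological vs healthy" task
print(round(mcc(tp=95, tn=90, fp=10, fn=5), 3))
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), and unlike accuracy it stays informative when the two classes are unbalanced.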
6.
Sci Rep ; 12(1): 1997, 2022 02 07.
Article in English | MEDLINE | ID: mdl-35132093

ABSTRACT

Miscarriage is the spontaneous termination of a pregnancy before 24 weeks of gestation. We studied the genome of euploid miscarried embryos from mothers in the range of healthy adult individuals to understand genetic susceptibility to miscarriage not caused by chromosomal aneuploidies. We developed GP, a pipeline that we used to prioritize 439 unique variants in 399 genes, including genes known to be associated with miscarriages. Among the prioritized genes we found STAG2, coding for the cohesin complex subunit, whose inactivation in mouse is lethal, and TLE4, a target of Notch and Wnt, physically interacting with a region on chromosome 9 associated with miscarriages.


Subject(s)
Abortion, Spontaneous/genetics , Aneuploidy , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Animals , Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , Chromosomes, Human, Pair 9/genetics , Female , Humans , Mice , Nuclear Proteins , Pregnancy , Receptors, Notch/genetics , Repressor Proteins , Wnt Proteins/genetics , Cohesins
7.
J Proteomics ; 251: 104407, 2022 01 16.
Article in English | MEDLINE | ID: mdl-34763095

ABSTRACT

During the last decade, the evidence on the relationship between neurodevelopmental disorders and the microbial communities of the intestinal tract has grown considerably. In particular, the role of gut microbiota (GM) ecology and predicted functions in Autism Spectrum Disorders (ASD) has been investigated by 16S rRNA targeted and shotgun metagenomics, trying to assess disease signatures and their correlation with cognitive impairment or gastrointestinal (GI) manifestations of the disease. Herein we present a metaproteomic approach to point out the microbial gene expression profiles, their functional annotations, and the taxonomic distribution of gut microbial communities in ASD children. We pursued an LC-MS/MS based investigation to compare the GM profiles of patients with those of their respective relatives and age-matched controls, providing a quantitative evaluation of bacterial metaproteins by SWATH analysis. All data were managed by a multi-step bioinformatic pipeline, including network analysis. In particular, comparing ASD subjects with CTRLs, up-regulation was found for some metaproteins associated with Clostridia and with carbohydrate metabolism (glyceraldehyde-3-phosphate and glutamate dehydrogenases), while down-regulation was observed for others associated with Bacteroidia (SusC and SusD family together with the TonB dependent receptor). Moreover, network analysis highlighted specific microbial correlations among ASD subgroups characterized by different functioning levels and GI symptoms. SIGNIFICANCE: To the best of our knowledge, this study represents the first metaproteomic investigation on the gut microbiota of ASD children compared with relatives and age-matched CTRLs. Remarkably, the applied SWATH methodology allowed the attribution of differentially regulated functions to specific microbial taxa, offering a novel and complementary point of view with respect to previous studies.


Subject(s)
Autism Spectrum Disorder , Gastrointestinal Microbiome , Aged , Autism Spectrum Disorder/complications , Autism Spectrum Disorder/metabolism , Child , Chromatography, Liquid , Gastrointestinal Microbiome/physiology , Humans , RNA, Ribosomal, 16S/genetics , Tandem Mass Spectrometry
8.
Genome Biol ; 22(1): 109, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863344

ABSTRACT

BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.


Subject(s)
Biomarkers, Tumor , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity
9.
Genome Biol ; 22(1): 111, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency and 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.


Subject(s)
Alleles , Biomarkers, Tumor , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow
10.
Nat Commun ; 11(1): 5992, 2020 11 25.
Article in English | MEDLINE | ID: mdl-33239635

ABSTRACT

Tumor-infiltrating lymphocytes play an essential role in improving clinical outcome of neuroblastoma (NB) patients, but their relationship with other tumor-infiltrating immune cells in the T cell-inflamed tumors remains poorly investigated. Here we show that dendritic cells (DCs) and natural killer (NK) cells are positively correlated with T-cell infiltration in human NB, both at transcriptional and protein levels, and associate with a favorable prognosis. Multiplex imaging displays DC/NK/T cell conjugates in the tumor microenvironment of low-risk NB. Remarkably, this connection is further strengthened by the identification of gene signatures related to DCs and NK cells able to predict survival of NB patients and strongly correlate with the expression of PD-1 and PD-L1. In summary, our findings unveil a key prognostic role of DCs and NK cells and indicate their related gene signatures as promising tools for the identification of clinical biomarkers to better define risk stratification and survival of NB patients.


Subject(s)
Dendritic Cells/metabolism , Killer Cells, Natural/metabolism , Lymphocytes, Tumor-Infiltrating/metabolism , Neuroblastoma/mortality , Transcriptome/immunology , Adolescent , Adult , B7-H1 Antigen/metabolism , Child , Child, Preschool , Cohort Studies , Datasets as Topic , Dendritic Cells/immunology , Disease-Free Survival , Female , Humans , Infant , Killer Cells, Natural/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Male , Middle Aged , Neuroblastoma/genetics , Neuroblastoma/immunology , Neuroblastoma/pathology , Prognosis , Programmed Cell Death 1 Receptor/metabolism , RNA-Seq , Sensitivity and Specificity , Survival Rate , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , Young Adult
11.
Front Oncol ; 10: 1065, 2020.
Article in English | MEDLINE | ID: mdl-32714870

ABSTRACT

Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. A compact RF model, called rSNFi, trained on the intersection of top-ranked biomarkers from the two approaches juXT and rSNF is finally derived. All the classifiers are run in a 10x5-fold cross-validation schema to warrant reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values and 97% to 83% smaller feature sizes (FS), compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), improving KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than one-dimensional omics models, with smaller signatures too, where transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
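
The "97% to 83% smaller feature sizes" quoted in the abstract above can be checked directly from the feature counts it reports. The following sketch is only an arithmetic check; the helper name `reduction_pct` is hypothetical.

```python
def reduction_pct(inf_size: int, juxt_size: int) -> float:
    """Percentage by which the INF signature is smaller than the juXT one."""
    return 100.0 * (1 - inf_size / juxt_size)


# Feature sizes (INF, juXT) as reported in the abstract
feature_sizes = {
    "BRCA-ER": (56, 1801),
    "BRCA-subtypes": (302, 1801),
    "KIRC-OS": (111, 2319),
}

for task, (inf_fs, juxt_fs) in feature_sizes.items():
    print(f"{task}: {reduction_pct(inf_fs, juxt_fs):.1f}% smaller")
```

BRCA-ER and BRCA-subtypes round to the 97% and 83% stated in the abstract; the KIRC-OS reduction is computed the same way from the reported sizes.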

12.
Biol Direct ; 15(1): 3, 2020 02 13.
Article in English | MEDLINE | ID: mdl-32054490

ABSTRACT

BACKGROUND: Drug-induced liver injury (DILI) is a major concern in drug development, as hepatotoxicity may not be apparent at early stages but can lead to life-threatening consequences. The ability to predict DILI from in vitro data would be a crucial advantage. In 2018, the Critical Assessment of Massive Data Analysis group proposed the CMap Drug Safety challenge focusing on DILI prediction. METHODS AND RESULTS: The challenge data included Affymetrix GeneChip expression profiles for the two cancer cell lines MCF7 and PC3 treated with 276 drug compounds and empty vehicles. Binary DILI labeling and a recommended train/test split for the development of predictive classification approaches were also provided. We devised three deep learning architectures for DILI prediction on the challenge data and compared them to random forest and multi-layer perceptron classifiers. On a subset of the data and for some of the models we additionally tested several strategies for balancing the two DILI classes and for identifying alternative informative train/test splits. All the models were trained with the MAQC data analysis protocol (DAP), i.e., 10x5 cross-validation over the training set. In all the experiments, the classification performance in both cross-validation and external validation gave Matthews correlation coefficient (MCC) values below 0.2. We observed minimal differences between the two cell lines. Notably, deep learning approaches did not give an advantage on the classification performance. DISCUSSION: We extensively tested multiple machine learning approaches for the DILI classification task, obtaining poor to mediocre performance. The results suggest that the CMap expression data on the two cell lines MCF7 and PC3 are not sufficient for accurate DILI label prediction. REVIEWERS: This article was reviewed by Maciej Kandula and Pawel P. Labaj.


Subject(s)
Chemical and Drug Induced Liver Injury/etiology , Machine Learning , Humans , Models, Biological , Risk Assessment/methods
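
The "10x5 cross-validation" mentioned in this and other abstracts in the listing means 5-fold cross-validation repeated 10 times with a fresh shuffle each time, giving 50 train/test splits. The sketch below generates such splits in plain Python. It is a simplified illustration, not the MAQC protocol itself: it is unstratified, and the function name and sample count are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_repeats=10, n_folds=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repeat
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            train_idx = [
                i for other, members in enumerate(folds) if other != fold
                for i in members
            ]
            yield repeat, fold, train_idx, test_idx


# Hypothetical training set of 100 samples
splits = list(repeated_kfold(n_samples=100))
print(len(splits))
```

Averaging a metric such as MCC over all 50 splits gives a more stable estimate than a single train/test split, which is why the repeated schema is used to support reproducibility claims.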
13.
Cancers (Basel) ; 11(10), 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-31618839

ABSTRACT

Immunotherapy using immune checkpoint inhibitors (ICI) has dramatically improved the treatment options in various cancers, increasing survival rates for treated patients. Nevertheless, response rates to ICI are heterogeneous among different cancer types, and even among patients affected by the same cancer. Thus, it becomes crucial to identify factors that predict the response to immunotherapeutic approaches. A comprehensive investigation of the mutational and immunological aspects of the tumor can be useful to obtain a robust prediction. By performing a pan-cancer analysis on gene expression data from The Cancer Genome Atlas (TCGA, 8055 cases and 29 cancer types), we set up and validated a machine learning approach to predict the potential for positive response to ICI. Support vector machines (SVM) and extreme gradient boosting (XGBoost) models were developed with a 10×5-fold cross-validation schema on 80% of TCGA cases to predict ICI responsiveness defined by a score combining tumor mutational burden and TGF-β signaling. On the remaining 20% validation subset, our SVM model scored 0.88 accuracy and 0.27 Matthews Correlation Coefficient. The proposed machine learning approach could be useful to predict the putative response to ICI treatment from expression data of primary tumors.

14.
PLoS Comput Biol ; 15(3): e1006269, 2019 03.
Article in English | MEDLINE | ID: mdl-30917113

ABSTRACT

Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation based on a rigorous Data Analysis Plan derived from the FDA's MAQC project, designed to analyze causes of variability in predictive biomarkers. We apply the framework on models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer, Support Vector Machine and Random Forests) and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model from GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes) to train a classifier on 1,060 annotated tiles and validate it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging-Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.


Subject(s)
Algorithms , Artificial Intelligence , Histological Techniques/methods , Image Interpretation, Computer-Assisted/methods , Software , Humans , Reproducibility of Results
15.
Oncoimmunology ; 8(2): e1542245, 2019.
Article in English | MEDLINE | ID: mdl-30713803

ABSTRACT

Although pediatric malignant extracranial germ-cell tumors (meGCTs) are among the most chemosensitive solid tumors, a group of patients relapse and die of the disease. To identify new markers predicting clinical outcome, we examined the prognostic relevance of tumor-infiltrating T lymphocytes (TILs) and the expression of PD-1 and PD-L1 in a cohort of pediatric meGCTs by in situ immunohistochemistry. MeGCTs were variously infiltrated by T-cell subtypes according to the tumor subtype, tumor location and age at diagnosis. We distinguished three different phenotypes: i) tumors not infiltrated by T cells (immature teratomas and half of the yolk sac tumors), ii) tumors highly infiltrated by CD8+ T cells expressing PD-1, which identifies activated tumor-reactive T cells (seminomas and dysgerminomas), iii) tumors highly infiltrated by CD8+ T cells within an immunosuppressive tumor microenvironment characterized by CD4+FOXP3+ Treg cells and PD-L1-expressing tumor cells (embryonal carcinomas, choriocarcinomas and the remaining yolk sac tumors). Tumor subtypes belonging to mixed meGCTs were variously infiltrated, suggesting the coexistence of multiple immune microenvironments either facilitating or precluding the entry of T cells. These findings support the hypothesis that TILs influence the development of meGCTs and might be of clinical relevance to improve risk stratification and the treatment of pediatric patients.

16.
PLoS One ; 13(12): e0208924, 2018.
Article in English | MEDLINE | ID: mdl-30532223

ABSTRACT

We introduce the CDRP (Concatenated Diagnostic-Relapse Prognostic) architecture for multi-task deep learning that incorporates a clinical algorithm, e.g., a risk stratification schema, to improve prognostic profiling. We present the first application to survival prediction in High-Risk (HR) Neuroblastoma from transcriptomics data, a task that studies from the MAQC consortium have shown to remain the hardest among multiple diagnostic and prognostic endpoints predictable from the same dataset. To obtain a more accurate risk stratification needed for appropriate treatment strategies, CDRP combines a first component (CDRP-A) synthesizing a diagnostic task and a second component (CDRP-N) dedicated to one or more prognostic tasks. The approach leverages the advent of semi-supervised deep learning structures that can flexibly integrate multimodal data or internally create multiple processing paths. CDRP-A is an autoencoder trained on gene expression on the HR/non-HR risk stratification by the Children's Oncology Group, obtaining a 64-node representation in the bottleneck layer. CDRP-N is a multi-task classifier for two prognostic endpoints, i.e., Event-Free Survival (EFS) and Overall Survival (OS). CDRP-A provides the HR embedding input to the CDRP-N shared layer, from which two branches depart to model EFS and OS, respectively. To control for selection bias, CDRP is trained and evaluated using a Data Analysis Protocol (DAP) developed within the MAQC initiative. CDRP was applied on Illumina RNA-Seq of 498 Neuroblastoma patients (HR: 176) from the SEQC study (12,464 Entrez genes) and on Affymetrix Human Exon Array expression profiles (17,450 genes) of 247 primary diagnostic Neuroblastoma of the TARGET NBL cohort. On the SEQC HR patients, CDRP achieves Matthews Correlation Coefficient (MCC) = 0.38 for EFS and MCC = 0.19 for OS in external validation, improving over published SEQC models.
We show that a CDRP-N embedding is indeed parametrically associated with increasing severity and that the embedding can be used to better stratify patients' survival.
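
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Matthews Correlation Coefficient can be computed from binary confusion-matrix counts. It is not code from the study: the function name and the convention of returning 0.0 when a marginal count is empty are assumptions made for illustration.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when any row or column of the confusion matrix is empty
    (an illustrative convention, assumed here; the abstract does not state one).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which is why it is often preferred over accuracy on unbalanced endpoints.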


Subject(s)
Deep Learning , Neoplasm Recurrence, Local/diagnosis , Neuroblastoma/diagnosis , Prognosis , Algorithms , Child , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Humans , Infant , Male , Neoplasm Recurrence, Local/epidemiology , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Neuroblastoma/epidemiology , Neuroblastoma/genetics , Neuroblastoma/pathology , Progression-Free Survival , Risk Assessment
17.
Biol Direct ; 13(1): 5, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29615097

ABSTRACT

BACKGROUND: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide a broader insight into the mechanisms of cancer biology, helping researchers and clinicians to develop personalized therapies. RESULTS: In the context of the CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework combining a similarity network fusion with machine learning for the integration of multiple omics data. We apply the INF framework for the prediction of neuroblastoma patient outcome, integrating RNA-Seq, microarray and array comparative genomic hybridization data. We additionally explore the use of autoencoders as a method to integrate microarray expression and copy number data. CONCLUSIONS: The INF method is effective for the integration of multiple data sources, providing compact feature signatures for patient classification with performance comparable to other methods. Latent space representation of the integrated data provided by the autoencoder approach gives promising results, both by improving classification on survival endpoints and by providing means to discover two groups of patients characterized by distinct overall survival (OS) curves. REVIEWERS: This article was reviewed by Djork-Arné Clevert and Tieliu Shi.


Subject(s)
Genomics/methods , Neuroblastoma/genetics , Neuroblastoma/metabolism , Animals , Computational Biology , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neuroblastoma/pathology
18.
BMC Bioinformatics ; 19(Suppl 2): 49, 2018 03 08.
Article in English | MEDLINE | ID: mdl-29536822

ABSTRACT

BACKGROUND: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. RESULTS: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. CONCLUSION: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.
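
To illustrate the neighbourhood idea described above, here is a minimal sketch that turns a matrix of pairwise distances between variables into a ranked list of nearest neighbours for each variable. It is not the Ph-CNN Keras layer: the helper name `ranked_neighbours`, the tie-breaking rule, and the toy distance values are all assumptions made for illustration.

```python
def ranked_neighbours(dist, k):
    """For each variable i, return the indices of its k closest variables.

    `dist` is a symmetric matrix of pairwise distances (e.g. patristic
    distances between tree leaves). Variable i itself comes first because
    its self-distance is 0; ties are broken by index (an assumed convention).
    """
    n = len(dist)
    neighbours = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: (dist[i][j], j))
        neighbours.append(order[:k])
    return neighbours

# Toy distance matrix for four leaves (made-up values, not from the paper).
toy = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0],
    [4.0, 4.0, 0.0, 2.0],
    [5.0, 5.0, 2.0, 0.0],
]
neigh = ranked_neighbours(toy, 2)
```

A convolution-like filter could then be applied over each variable together with its listed neighbours, in the same way an image filter is applied over a pixel and its adjacent pixels.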


Subject(s)
Metagenomics , Neural Networks, Computer , Phylogeny , Algorithms , Data Analysis , Databases, Genetic , Humans , Inflammatory Bowel Diseases/genetics , Principal Component Analysis , Reproducibility of Results , Support Vector Machine
19.
Article in English | MEDLINE | ID: mdl-30628533

ABSTRACT

We introduce here ML4Tox, a framework offering Deep Learning and Support Vector Machine models to predict agonist, antagonist, and binding activities of chemical compounds, in this case for the estrogen receptor ligand-binding domain. The ML4Tox models have been developed with a 10 × 5-fold cross-validation schema on the training portion of the CERAPP ToxCast dataset, formed by 1677 chemicals, each described by 777 molecular features. On the CERAPP "All Literature" evaluation set (agonist: 6319 compounds; antagonist: 6539; binding: 7283), ML4Tox significantly improved sensitivity over published results on all three tasks, with agonist: 0.78 vs 0.56; antagonist: 0.69 vs 0.11; binding: 0.66 vs 0.26.
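
As an illustration of what a 10 × 5-fold cross-validation schema involves, the sketch below generates index splits for 10 repetitions of 5-fold cross-validation over 1677 samples. It is not the ML4Tox code: the function name, the seed, and the use of plain (non-stratified) shuffling are assumptions, since the abstract does not say how folds were drawn.

```python
import random

def repeated_kfold(n_samples, n_repeats=10, n_folds=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices, then deals them into
    n_folds disjoint test folds. Stratification by class label is NOT
    performed here (an illustrative simplification).
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for fold in range(n_folds):
            test_idx = idx[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield rep, fold, train_idx, test_idx

# 1677 is the training-set size quoted in the abstract.
splits = list(repeated_kfold(1677))
```

Averaging a metric such as sensitivity over all 50 train/test splits gives a more stable estimate than a single 5-fold run, at the cost of 10 times as many model fits.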


Subject(s)
Computer Simulation , Endocrine Disruptors/toxicity , Environmental Pollutants/toxicity , Machine Learning , Toxicity Tests/methods , Protein Binding , Quantitative Structure-Activity Relationship , Receptors, Estrogen , Support Vector Machine
20.
Clin Cancer Res ; 23(15): 4462-4472, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28270499

ABSTRACT

Purpose: This study sought to evaluate the expression of programmed cell death-ligand-1 (PD-L1) and HLA class I on neuroblastoma cells and programmed cell death-1 (PD-1) and lymphocyte activation gene 3 (LAG3) on tumor-infiltrating lymphocytes to better define patient risk stratification and understand whether this tumor may benefit from therapies targeting immune checkpoint molecules. Experimental Design: In situ IHC staining for PD-L1, HLA class I, PD-1, and LAG3 was assessed in 77 neuroblastoma specimens, previously characterized for tumor-infiltrating T-cell density and correlated with clinical outcome. Surface expression of PD-L1 was evaluated by flow cytometry and IHC in neuroblastoma cell lines and tumors genetically and/or pharmacologically inhibited for MYC and MYCN. A dataset of 477 human primary neuroblastomas from GEO and ArrayExpress databases was explored for PD-L1, MYC, and MYCN correlation. Results: Multivariate Cox regression analysis demonstrated that the combination of PD-L1 and HLA class I tumor cell density is a prognostic biomarker for predicting overall survival in neuroblastoma patients (P = 0.0448). MYC and MYCN control the expression of PD-L1 in neuroblastoma cells both in vitro and in vivo. Consistently, abundance of PD-L1 transcript correlates with MYC expression in primary neuroblastoma. Conclusions: The combination of PD-L1 and HLA class I represents a novel prognostic biomarker for neuroblastoma. Pharmacologic inhibition of MYCN and MYC may be exploited to target PD-L1 and restore an efficient antitumor immunity in high-risk neuroblastoma. Clin Cancer Res; 23(15); 4462-72. ©2017 AACR.


Subject(s)
B7-H1 Antigen/genetics , Genes, MHC Class I/genetics , N-Myc Proto-Oncogene Protein/genetics , Neuroblastoma/genetics , Proto-Oncogene Proteins c-myc/genetics , Adolescent , Adult , Antigens, CD/genetics , Antigens, CD/immunology , Azepines/administration & dosage , B7-H1 Antigen/immunology , Biomarkers, Tumor/genetics , Cell Line, Tumor , Child , Child, Preschool , Female , Gene Expression Regulation, Neoplastic/drug effects , Genes, MHC Class I/immunology , Humans , Infant , Lymphocytes, Tumor-Infiltrating/drug effects , Lymphocytes, Tumor-Infiltrating/pathology , Male , Middle Aged , Molecular Targeted Therapy , N-Myc Proto-Oncogene Protein/immunology , Neuroblastoma/drug therapy , Neuroblastoma/immunology , Neuroblastoma/pathology , Prognosis , Programmed Cell Death 1 Receptor/genetics , Proto-Oncogene Proteins c-myc/immunology , Triazoles/administration & dosage , Lymphocyte Activation Gene 3 Protein