1.
PLoS One ; 19(3): e0300127, 2024.
Article in English | MEDLINE | ID: mdl-38483951

ABSTRACT

BACKGROUND: The burden of Parkinson's disease (PD) is a key public health issue, and it is essential to develop innovative and cost-effective approaches that promote sustainable diagnostic and therapeutic interventions. From this perspective, adopting a P3 (predictive, preventive and personalized) medicine approach appears pivotal. NeuroArtP3 (NET-2018-12366666) is a four-year multi-site project co-funded by the Italian Ministry of Health that brings together clinical and computational centers operating in neurology, including PD. OBJECTIVE: The core objectives of the project are: i) to harmonize data collection across the participating centers, ii) to build standardized disease-specific datasets, and iii) to advance knowledge of disease trajectories through machine learning analysis. METHODS: The four-year study combines two consecutive research components: i) a multi-center retrospective observational phase and ii) a multi-center prospective observational phase. The retrospective phase aims to collect data on patients admitted to the participating clinical centers, whereas the prospective phase aims to collect the same variables in newly diagnosed patients who will be enrolled at the same centers. RESULTS: The participating clinical centers are the Provincial Health Services (APSS) of Trento (Italy), the center responsible for the PD study, and the IRCCS San Martino Hospital of Genoa (Italy), the promoter center of the NeuroArtP3 project. The computational centers responsible for data analysis are the Bruno Kessler Foundation of Trento (Italy), together with TrentinoSalute4.0, the Competence Center for Digital Health of the Province of Trento (Italy), and the LISCOMPlab of the University of Genoa (Italy).
CONCLUSIONS: The work behind this observational study protocol shows that it is possible and viable to systematize data collection procedures in order to feed research and to advance the implementation of a P3 approach in clinical practice through the use of AI models.


Subjects
Artificial Intelligence , Parkinson Disease , Humans , Retrospective Studies , Prospective Studies , Parkinson Disease/diagnosis , Public Health , Observational Studies as Topic , Multicenter Studies as Topic
2.
BioData Min ; 16(1): 33, 2023 Nov 25.
Article in English | MEDLINE | ID: mdl-38001537

ABSTRACT

BACKGROUND: Discriminating between patients affected by inflammatory bowel disease (IBD) and healthy controls on the basis of endoscopic imaging is a challenging problem for machine learning models. This task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions that enhance key properties such as reproducibility, interpretability, reduced computational workload, bias-free modeling and careful image preprocessing. RESULTS: First, an automatic preprocessing procedure is devised to remove artifacts from clinical data; the resulting images are then fed to an aggregated per-patient model that mimics the clinicians' decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing low-confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need to evaluate the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and computational cost of the model, allowing its deployment on low-power devices with an almost negligible 3% performance degradation. This quantization procedure is relevant not only to per-patient models but also to assessing the feasibility of providing real-time support to clinicians, even in low-resource environments. The pipeline is demonstrated on a private dataset of endoscopic images from 758 IBD patients and 601 healthy controls, achieving a top Matthews Correlation Coefficient (MCC) of 0.9 on the test set. CONCLUSION: We highlighted how a comprehensive preprocessing pipeline plays a crucial role in identifying and removing artifacts from data, solving one of the principal challenges encountered when working with clinical data.
Furthermore, we constructively showed that it is possible to emulate the clinicians' decision process and that doing so offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we showed that quantization can be a useful tool to reduce time and resource consumption with an acceptable degradation of model performance. The quantization study proposed in this work points to the potential development of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.
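The per-patient aggregation idea (drop low-confidence snapshot predictions, combine the rest, and return the images the decision rests on) can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the aggregation rule, so the mean-probability rule, the confidence margin of 0.3 around 0.5, and the names `aggregate_patient` and `PatientCall` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class PatientCall:
    label: int                    # 1 = IBD, 0 = healthy control
    score: float                  # mean IBD probability over retained images
    supporting_images: List[str]  # retained images whose own vote agrees with the label


def aggregate_patient(image_probs: Dict[str, float], min_margin: float = 0.3) -> Optional[PatientCall]:
    """Combine per-image IBD probabilities into one per-patient call.

    Images whose probability lies within `min_margin` of 0.5 are treated as
    low-confidence and discarded (hypothetical rule). Returns None, i.e. the
    model abstains, when no image survives the filter.
    """
    retained = {img: p for img, p in image_probs.items() if abs(p - 0.5) >= min_margin}
    if not retained:
        return None
    score = mean(retained.values())
    label = 1 if score >= 0.5 else 0
    # The explanation returned to the clinician: the confident images that back the call.
    support = sorted(img for img, p in retained.items() if (p >= 0.5) == bool(label))
    return PatientCall(label, score, support)
```

For example, with probabilities `{"img1": 0.95, "img2": 0.55, "img3": 0.85, "img4": 0.1}`, `img2` is dropped as low-confidence, the patient is labeled IBD, and `img1` and `img3` are returned as supporting evidence. Abstaining rather than guessing when nothing is confident mirrors the stated goal of removing low-confidence predictions.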

3.
Comput Biol Med ; 152: 106373, 2023 01.
Article in English | MEDLINE | ID: mdl-36462367

ABSTRACT

Systemic lupus erythematosus and primary Sjögren's syndrome are complex systemic autoimmune diseases that are often misdiagnosed. In this article, we demonstrate the potential of machine learning to perform differential diagnosis of these similar pathologies using gene expression and methylation data from 651 individuals. Furthermore, we analyzed the impact of the heterogeneity of these diseases on the performance of the predictive models, discovering that patients assigned to a specific molecular cluster are misclassified more often and affect the overall performance of the predictive models. In addition, we found that samples characterized by high interferon activity are predicted most accurately, followed by samples with high inflammatory activity. Finally, we identified a group of biomarkers that improve the predictions compared with using the whole data set, and we validated them in external studies from other tissues and technological platforms.


Subjects
Lupus Erythematosus, Systemic , Sjogren's Syndrome , Humans , Sjogren's Syndrome/diagnosis , Sjogren's Syndrome/genetics , Diagnosis, Differential , Multiomics , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/genetics , Machine Learning
4.
Cancer Sci ; 114(1): 281-294, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36114746

ABSTRACT

Emerging evidence suggests that the prognosis of patients with lung adenocarcinoma can be determined from germline variants and transcript levels in nontumoral lung tissue. Gene expression data from noninvolved lung tissue of 483 lung adenocarcinoma patients were tested for correlation with overall survival using multivariable Cox proportional hazards and multivariate machine learning models. For genes whose transcript levels were associated with survival, we used genotype data from 414 patients to identify germline variants acting as cis-expression quantitative trait loci (eQTLs). Associations of eQTL variant genotypes with gene expression and survival were then tested. In Cox analysis, levels of four transcripts were inversely associated with survival (CLCF1, hazard ratio [HR] = 1.53; CNTNAP1, HR = 2.17; DUSP14, HR = 1.78; and MT1F, HR = 1.40). Machine learning analysis identified a signature of transcripts associated with lung adenocarcinoma outcome that largely overlapped with the transcripts identified by Cox analysis, including the three most significant genes (CLCF1, CNTNAP1, and DUSP14). Pathway analysis indicated that the signature is enriched for extracellular matrix (ECM) components. We identified 32 cis-eQTLs for CNTNAP1, including 6 with an inverse correlation and 26 with a direct correlation between the number of minor alleles and transcript levels. All but one of these were prognostic: the six with an inverse correlation were associated with better prognosis (HR < 1), while the others were associated with worse prognosis. Our findings provide supportive evidence that genetic predisposition to lung adenocarcinoma outcome is a feature already present in patients' noninvolved lung tissue.
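The "inverse" versus "direct" cis-eQTL distinction above is the sign of the association between minor-allele count and transcript level. A minimal sketch, assuming genotypes coded as 0/1/2 minor alleles and using a plain Pearson correlation; the abstract does not say which statistic the authors used, and `pearson_r` and `eqtl_direction` are hypothetical names.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant input: correlation undefined")
    return cov / (sx * sy)


def eqtl_direction(minor_allele_counts, expression):
    """Label a (hypothetical) cis-eQTL by the sign of its genotype-expression correlation."""
    if any(g not in (0, 1, 2) for g in minor_allele_counts):
        raise ValueError("genotypes must be minor-allele counts 0, 1 or 2")
    r = pearson_r(minor_allele_counts, expression)
    if r > 0:
        return "direct", r   # more minor alleles, higher transcript level
    if r < 0:
        return "inverse", r  # more minor alleles, lower transcript level
    return "none", r
```

A real eQTL scan would also need a significance test and multiple-testing correction, which this sketch leaves out.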


Subjects
Adenocarcinoma of Lung , Lung Neoplasms , Humans , Genetic Predisposition to Disease , Adenocarcinoma of Lung/genetics , Lung/pathology , Genotype , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Prognosis , Polymorphism, Single Nucleotide
5.
BMC Med Inform Decis Mak ; 22(Suppl 6): 300, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36401328

ABSTRACT

BACKGROUND: The SI-CURA project (Soluzioni Innovative per la gestione del paziente e il follow up terapeutico della Colite UlceRosA; Innovative Solutions for Patient Management and Therapeutic Follow-up of Ulcerative Colitis) is an Italian initiative aimed at developing artificial intelligence solutions to discriminate pathologies of different natures, including inflammatory bowel disease (IBD), namely ulcerative colitis (UC) and Crohn's disease (CD), based on endoscopic imaging of patients (P) and healthy controls (N). METHODS: In this study we develop a deep learning (DL) prototype to identify disease patterns through three binary classification tasks: (1) discriminating positive (pathological) samples from negative (healthy) samples (P vs N); (2) discriminating ulcerative colitis from Crohn's disease samples (UC vs CD); and (3) discriminating ulcerative colitis from negative (healthy) samples (UC vs N). RESULTS: The model derived from our approach achieves high performance, with a Matthews correlation coefficient (MCC) > 0.9 on the test set for P versus N and UC versus N, and MCC > 0.6 on the test set for UC versus CD. CONCLUSION: Our DL model effectively discriminates between pathological and negative samples, as well as between IBD subgroups, providing further evidence of its potential as a decision support tool for endoscopy-based diagnosis.
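The Matthews correlation coefficient is the headline metric here and in several other studies in these results. For reference, a minimal pure-Python sketch of the standard binary MCC computed from the confusion matrix, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); returning 0 when a confusion-matrix margin is empty is a common convention assumed here, not something stated in the abstract.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which is why an MCC > 0.9 for P vs N is a much stronger result than MCC > 0.6 for UC vs CD.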


Subjects
Colitis, Ulcerative , Crohn Disease , Inflammatory Bowel Diseases , Humans , Colitis, Ulcerative/diagnostic imaging , Colitis, Ulcerative/pathology , Crohn Disease/diagnostic imaging , Crohn Disease/pathology , Artificial Intelligence , Endoscopy
6.
Sci Rep ; 12(1): 1997, 2022 02 07.
Article in English | MEDLINE | ID: mdl-35132093

ABSTRACT

Miscarriage is the spontaneous termination of a pregnancy before 24 weeks of gestation. We studied the genomes of euploid miscarried embryos from mothers within the range of healthy adult individuals to understand genetic susceptibility to miscarriage not caused by chromosomal aneuploidies. We developed GP, a pipeline that we used to prioritize 439 unique variants in 399 genes, including genes known to be associated with miscarriage. Among the prioritized genes, we found STAG2, which codes for a cohesin complex subunit and whose inactivation in mouse is lethal, and TLE4, a target of Notch and Wnt that physically interacts with a region on chromosome 9 associated with miscarriage.


Subjects
Abortion, Spontaneous/genetics , Aneuploidy , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Animals , Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , Chromosomes, Human, Pair 9/genetics , Female , Humans , Mice , Nuclear Proteins , Pregnancy , Receptors, Notch/genetics , Repressor Proteins , Wnt Proteins/genetics , Cohesins
7.
J Proteomics ; 251: 104407, 2022 01 16.
Article in English | MEDLINE | ID: mdl-34763095

ABSTRACT

During the last decade, evidence on the relationship between neurodevelopmental disorders and the microbial communities of the intestinal tract has grown considerably. In particular, the ecology and predicted functions of the gut microbiota (GM) in autism spectrum disorders (ASD) have been investigated by 16S rRNA-targeted and shotgun metagenomics, in an attempt to assess disease signatures and their correlation with cognitive impairment or gastrointestinal (GI) manifestations of the disease. Here we present a metaproteomic approach to characterize microbial gene expression profiles, their functional annotations, and the taxonomic distribution of gut microbial communities in children with ASD. We pursued an LC-MS/MS-based investigation to compare the GM profiles of patients with those of their relatives and of age-matched controls (CTRLs), providing a quantitative evaluation of bacterial metaproteins by SWATH analysis. All data were managed by a multi-step bioinformatic pipeline, including network analysis. In particular, comparing ASD subjects with CTRLs, up-regulation was found for some metaproteins associated with Clostridia and with carbohydrate metabolism (glyceraldehyde-3-phosphate dehydrogenase and glutamate dehydrogenase), while down-regulation was observed for others associated with Bacteroidia (the SusC and SusD families, together with the TonB-dependent receptor). Moreover, network analysis highlighted specific microbial correlations among ASD subgroups characterized by different functioning levels and GI symptoms. SIGNIFICANCE: To the best of our knowledge, this study represents the first metaproteomic investigation of the gut microbiota of children with ASD compared with relatives and age-matched CTRLs. Notably, the SWATH methodology allowed differentially regulated functions to be attributed to specific microbial taxa, offering a novel view complementary to previous studies.


Subjects
Autism Spectrum Disorder , Gastrointestinal Microbiome , Aged , Autism Spectrum Disorder/complications , Autism Spectrum Disorder/metabolism , Child , Chromatography, Liquid , Gastrointestinal Microbiome/physiology , Humans , RNA, Ribosomal, 16S/genetics , Tandem Mass Spectrometry
8.
Genome Biol ; 22(1): 109, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863344

ABSTRACT

BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessment of accuracy and detection sensitivity to ensure analytical validity. Using reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform, multi-lab evaluation of eight pan-cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for variants previously verified to have a variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced when VAF thresholds are applied, owing to the inherent variability of VAF measurements. However, enforcing a VAF threshold for reporting has a positive impact in reducing false positive calls. Importantly, the false positive rate is significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assessing the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
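The two filters discussed above, restricting calls to high-confidence regions and enforcing a reporting VAF threshold, can be sketched as follows. This is a minimal illustration, assuming half-open (chrom, start, end) intervals such as those read from a BED file; the 5% default threshold is a hypothetical choice taken from the lower end of the range mentioned in the abstract, and `VariantCall` and `filter_calls` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class VariantCall:
    chrom: str
    pos: int    # 0-based coordinate
    vaf: float  # variant allele frequency in [0, 1]


# (chrom, start, end), half-open interval, e.g. one line of a BED file
Region = Tuple[str, int, int]


def in_regions(call: VariantCall, regions: Iterable[Region]) -> bool:
    """True if the call falls inside any high-confidence region (linear scan)."""
    return any(call.chrom == c and s <= call.pos < e for c, s, e in regions)


def filter_calls(calls: Iterable[VariantCall], regions: List[Region],
                 min_vaf: float = 0.05) -> List[VariantCall]:
    """Keep calls that lie in high-confidence regions and meet the VAF threshold."""
    return [c for c in calls if c.vaf >= min_vaf and in_regions(c, regions)]
```

The linear scan keeps the example short; for panel-scale region sets an interval index would be the usual choice. Raising `min_vaf` trades sensitivity for fewer false positives, which is the trade-off the study quantifies.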


Subjects
Biomarkers, Tumor , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity
9.
Genome Biol ; 22(1): 111, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples with a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and as a pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, including more than 25,000 variants with allele frequency below 20% and 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit of detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequencies than those natively present in Sample A, including known variants with an allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for oncopanel quality control, analytical accuracy assessment, and validation, for small to large oncopanels and liquid biopsy assays.
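The admixture idea above reduces to simple dilution arithmetic. A minimal sketch under simplifying assumptions that the abstract does not state: mixing is linear in DNA mass, the locus has the same copy number in both samples, and Sample B carries only the reference allele at that position (variants shared with Sample B would not dilute this way). The function names are hypothetical.

```python
def mixed_vaf(vaf_a: float, fraction_a: float) -> float:
    """Expected VAF after mixing Sample A (fraction_a of total DNA) with a variant-free Sample B."""
    if not (0.0 <= fraction_a <= 1.0 and 0.0 <= vaf_a <= 1.0):
        raise ValueError("vaf_a and fraction_a must lie in [0, 1]")
    # Sample B contributes only reference alleles here, so the alt-allele share scales linearly.
    return vaf_a * fraction_a


def fraction_for_target(vaf_a: float, target_vaf: float) -> float:
    """Fraction of Sample A DNA needed to dilute vaf_a down to target_vaf."""
    if not (0.0 < target_vaf <= vaf_a <= 1.0):
        raise ValueError("need 0 < target_vaf <= vaf_a <= 1")
    return target_vaf / vaf_a
```

Under these assumptions, a variant at 1% VAF in Sample A reaches roughly 0.02% when Sample A makes up about 2% of the mixture, which is the order of magnitude the abstract describes for liquid-biopsy-range samples.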


Subjects
Alleles , Biomarkers, Tumor , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow
10.
Nat Commun ; 11(1): 5992, 2020 11 25.
Article in English | MEDLINE | ID: mdl-33239635

ABSTRACT

Tumor-infiltrating lymphocytes play an essential role in improving the clinical outcome of neuroblastoma (NB) patients, but their relationship with other tumor-infiltrating immune cells in T cell-inflamed tumors remains poorly investigated. Here we show that dendritic cells (DCs) and natural killer (NK) cells are positively correlated with T-cell infiltration in human NB, at both the transcriptional and protein levels, and are associated with a favorable prognosis. Multiplex imaging reveals DC/NK/T cell conjugates in the tumor microenvironment of low-risk NB. Remarkably, this connection is further strengthened by the identification of gene signatures related to DCs and NK cells that predict the survival of NB patients and strongly correlate with the expression of PD-1 and PD-L1. In summary, our findings unveil a key prognostic role of DCs and NK cells and indicate their related gene signatures as promising tools for identifying clinical biomarkers to better define risk stratification and survival of NB patients.


Subjects
Dendritic Cells/metabolism , Killer Cells, Natural/metabolism , Lymphocytes, Tumor-Infiltrating/metabolism , Neuroblastoma/mortality , Transcriptome/immunology , Adolescent , Adult , B7-H1 Antigen/metabolism , Child , Child, Preschool , Cohort Studies , Datasets as Topic , Dendritic Cells/immunology , Disease-Free Survival , Female , Humans , Infant , Killer Cells, Natural/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Male , Middle Aged , Neuroblastoma/genetics , Neuroblastoma/immunology , Neuroblastoma/pathology , Prognosis , Programmed Cell Death 1 Receptor/metabolism , RNA-Seq , Sensitivity and Specificity , Survival Rate , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , Young Adult
11.
Front Oncol ; 10: 1065, 2020.
Article in English | MEDLINE | ID: mdl-32714870

ABSTRACT

Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information for large collections of samples. The need has thus arisen for computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers by exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. Finally, a compact RF model, called rSNFi, is derived by training on the intersection of the top-ranked biomarkers from the two approaches, juXT and rSNF. All the classifiers are run in a 10x5-fold cross-validation scheme to ensure reproducibility, following the guidelines for an unbiased Data Analysis Plan from the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved Matthews Correlation Coefficient (MCC) values similar to juXT with 83% to 97% smaller feature sizes (FS) for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), and improved KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than single-omics models, with smaller signatures too, with transcriptomics consistently playing the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
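Two mechanical steps mentioned above, the 10x5-fold cross-validation scheme and the rSNFi intersection of top-ranked features, can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the INF code: stratified round-robin fold assignment, the fixed seed, and using the same cut-off `k` for both rankings are choices made here, and `repeated_stratified_kfold` and `top_k_intersection` are hypothetical names.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for an n_repeats x n_splits CV.

    Each repetition reshuffles the samples within each class and deals them
    round-robin into folds, so class proportions are roughly preserved.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # continue dealing where the previous class stopped
        for idx in by_class.values():
            idx = idx[:]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[(offset + j) % n_splits].append(i)
            offset += len(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for f, fold in enumerate(folds) if f != k for i in fold)
            yield r, k, train, test


def top_k_intersection(ranking_a, ranking_b, k):
    """Features ranked in the top k by both rankings, kept in ranking_a order."""
    top_b = set(ranking_b[:k])
    return [f for f in ranking_a[:k] if f in top_b]
```

Each of the 50 train/test splits would be used to fit and score the classifier, and metrics such as MCC are then averaged over the splits; repeating the 5-fold partition 10 times with different shuffles reduces the dependence of the estimate on any single partition.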

12.
Biol Direct ; 15(1): 3, 2020 02 13.
Article in English | MEDLINE | ID: mdl-32054490

ABSTRACT

BACKGROUND: Drug-induced liver injury (DILI) is a major concern in drug development, as hepatotoxicity may not be apparent at early stages but can lead to life-threatening consequences. The ability to predict DILI from in vitro data would be a crucial advantage. In 2018, the Critical Assessment of Massive Data Analysis (CAMDA) group proposed the CMap Drug Safety challenge, focusing on DILI prediction. METHODS AND RESULTS: The challenge data included Affymetrix GeneChip expression profiles for two cancer cell lines, MCF7 and PC3, treated with 276 drug compounds and empty vehicles. Binary DILI labels and a recommended train/test split for developing predictive classification approaches were also provided. We devised three deep learning architectures for DILI prediction on the challenge data and compared them with random forest and multi-layer perceptron classifiers. On a subset of the data and for some of the models, we additionally tested several strategies for balancing the two DILI classes and for identifying alternative informative train/test splits. All the models were trained with the MAQC data analysis protocol (DAP), i.e., 10x5 cross-validation over the training set. In all the experiments, classification performance in both cross-validation and external validation gave Matthews correlation coefficient (MCC) values below 0.2. We observed minimal differences between the two cell lines. Notably, deep learning approaches gave no advantage in classification performance. DISCUSSION: We extensively tested multiple machine learning approaches for the DILI classification task, obtaining poor to mediocre performance. The results suggest that the CMap expression data on the two cell lines MCF7 and PC3 are not sufficient for accurate DILI label prediction. REVIEWERS: This article was reviewed by Maciej Kandula and Pawel P. Labaj.


Subjects
Chemical and Drug Induced Liver Injury/etiology , Machine Learning , Humans , Models, Biological , Risk Assessment/methods
13.
Cancers (Basel) ; 11(10)2019 Oct 15.
Article in English | MEDLINE | ID: mdl-31618839

ABSTRACT

Immunotherapy with immune checkpoint inhibitors (ICI) has dramatically improved treatment options in various cancers, increasing survival rates for treated patients. Nevertheless, response rates to ICI are heterogeneous across cancer types and even among patients affected by the same cancer. It is therefore crucial to identify factors that predict the response to immunotherapeutic approaches. A comprehensive investigation of the mutational and immunological aspects of the tumor can help to obtain a robust prediction. By performing a pan-cancer analysis on gene expression data from The Cancer Genome Atlas (TCGA; 8055 cases and 29 cancer types), we set up and validated a machine learning approach to predict the potential for a positive response to ICI. Support vector machine (SVM) and extreme gradient boosting (XGBoost) models were developed with a 10×5-fold cross-validation scheme on 80% of TCGA cases to predict ICI responsiveness, defined by a score combining tumor mutational burden and TGF-β signaling. On the remaining 20% validation subset, our SVM model scored 0.88 accuracy and 0.27 Matthews Correlation Coefficient. The proposed machine learning approach could be useful to predict the putative response to ICI treatment from expression data of primary tumors.

14.
PLoS Comput Biol ; 15(3): e1006269, 2019 03.
Article in English | MEDLINE | ID: mdl-30917113

ABSTRACT

Artificial intelligence is rapidly increasing its impact on healthcare. As deep learning masters computer vision tasks, its application to digital pathology is natural, with the promise of aiding routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve the validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor-infiltrating lymphocytes. In this study, we examine the issue of evaluating the accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation, based on a rigorous Data Analysis Plan derived from the FDA's MAQC project and designed to analyze causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer network, Support Vector Machine and Random Forests), and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze the accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model from GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes), training a classifier on 1,060 annotated tiles and validating it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging-Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.
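The random-label diagnostic mentioned above reruns an unchanged evaluation pipeline on permuted labels and checks that the scores fall to chance level; scores well above chance point to leakage or selection bias. A minimal sketch, assuming a leave-one-out 1-nearest-neighbour classifier on one-dimensional features as a stand-in pipeline (DAPPER itself uses deep features with multilayer, SVM and RF classifiers; nothing here is taken from its code, and the function names are hypothetical):

```python
import random


def loo_1nn_accuracy(x, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on 1-D features."""
    correct = 0
    for i in range(len(x)):
        # Nearest other sample; ties resolve to the lowest index.
        j = min((k for k in range(len(x)) if k != i), key=lambda k: abs(x[k] - x[i]))
        correct += y[j] == y[i]
    return correct / len(x)


def random_label_check(evaluate, x, y, n_runs=100, seed=0):
    """Score the same pipeline on permuted labels; a sound pipeline stays near chance."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scores.append(evaluate(x, y_perm))
    return scores
```

On two well-separated clusters the true labels give perfect leave-one-out accuracy, while the permuted-label scores average close to 0.5; the key design point is that `evaluate` must be the full pipeline, including any feature selection, so that bias introduced anywhere in it is exposed.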


Subjects
Algorithms , Artificial Intelligence , Histological Techniques/methods , Image Interpretation, Computer-Assisted/methods , Software , Humans , Reproducibility of Results
15.
Oncoimmunology ; 8(2): e1542245, 2019.
Article in English | MEDLINE | ID: mdl-30713803

ABSTRACT

Although pediatric malignant extracranial germ-cell tumors (meGCTs) are among the most chemosensitive solid tumors, some patients relapse and die of the disease. To identify new markers predicting clinical outcome, we examined the prognostic relevance of tumor-infiltrating T lymphocytes (TILs) and the expression of PD-1 and PD-L1 in a cohort of pediatric meGCTs by in situ immunohistochemistry. MeGCTs were variously infiltrated by T-cell subtypes depending on tumor subtype, tumor location and age at diagnosis. We distinguished three different phenotypes: i) tumors not infiltrated by T cells (immature teratomas and half of the yolk sac tumors); ii) tumors highly infiltrated by CD8+ T cells expressing PD-1, which identifies activated tumor-reactive T cells (seminomas and dysgerminomas); iii) tumors highly infiltrated by CD8+ T cells within an immunosuppressive tumor microenvironment characterized by CD4+FOXP3+ Treg cells and PD-L1-expressing tumor cells (embryonal carcinomas, choriocarcinomas and the remaining yolk sac tumors). Tumor subtypes belonging to mixed meGCTs were variously infiltrated, suggesting the coexistence of multiple immune microenvironments that either facilitate or preclude the entry of T cells. These findings support the hypothesis that TILs influence the development of meGCTs and might be of clinical relevance to improve risk stratification and the treatment of pediatric patients.

16.
PLoS One ; 13(12): e0208924, 2018.
Article in English | MEDLINE | ID: mdl-30532223

ABSTRACT

We introduce the CDRP (Concatenated Diagnostic-Relapse Prognostic) architecture for multi-task deep learning, which incorporates a clinical algorithm, e.g., a risk stratification schema, to improve prognostic profiling. We present the first application to survival prediction in High-Risk (HR) Neuroblastoma from transcriptomics data, a task that MAQC consortium studies have shown to remain the hardest among multiple diagnostic and prognostic endpoints predictable from the same dataset. To obtain the more accurate risk stratification needed for appropriate treatment strategies, CDRP combines a first component (CDRP-A) synthesizing a diagnostic task and a second component (CDRP-N) dedicated to one or more prognostic tasks. The approach leverages the advent of semi-supervised deep learning structures that can flexibly integrate multimodal data or internally create multiple processing paths. CDRP-A is an autoencoder trained on gene expression for the HR/non-HR risk stratification defined by the Children's Oncology Group, obtaining a 64-node representation in the bottleneck layer. CDRP-N is a multi-task classifier for two prognostic endpoints, i.e., Event-Free Survival (EFS) and Overall Survival (OS). CDRP-A provides the HR embedding input to the CDRP-N shared layer, from which two branches depart to model EFS and OS, respectively. To control for selection bias, CDRP is trained and evaluated using a Data Analysis Protocol (DAP) developed within the MAQC initiative. CDRP was applied to Illumina RNA-Seq data of 498 Neuroblastoma patients (HR: 176) from the SEQC study (12,464 Entrez genes) and to Affymetrix Human Exon Array expression profiles (17,450 genes) of 247 primary diagnostic Neuroblastoma samples from the TARGET NBL cohort. On the SEQC HR patients, CDRP achieves Matthews Correlation Coefficient (MCC) = 0.38 for EFS and MCC = 0.19 for OS in external validation, improving over published SEQC models.
We show that the CDRP-N embedding is parametrically associated with increasing severity and can be used to better stratify patients' survival.


Subjects
Deep Learning , Neoplasm Recurrence, Local/diagnosis , Neuroblastoma/diagnosis , Prognosis , Algorithms , Child , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Humans , Infant , Male , Neoplasm Recurrence, Local/epidemiology , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Neuroblastoma/epidemiology , Neuroblastoma/genetics , Neuroblastoma/pathology , Progression-Free Survival , Risk Assessment
17.
Biol Direct ; 13(1): 5, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29615097

ABSTRACT

BACKGROUND: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide broader insight into the mechanisms of cancer biology, helping researchers and clinicians to develop personalized therapies. RESULTS: In the context of the CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework that combines similarity network fusion with machine learning to integrate multiple omics data. We apply the INF framework to the prediction of neuroblastoma patient outcome, integrating RNA-Seq, microarray and array comparative genomic hybridization data. We additionally explore the use of autoencoders as a method to integrate microarray expression and copy number data. CONCLUSIONS: The INF method is effective for integrating multiple data sources, providing compact feature signatures for patient classification with performance comparable to other methods. The latent space representation of the integrated data provided by the autoencoder approach gives promising results, both by improving classification on survival endpoints and by providing a means to discover two groups of patients characterized by distinct overall survival (OS) curves. REVIEWERS: This article was reviewed by Djork-Arné Clevert and Tieliu Shi.
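
INF builds on similarity network fusion. To illustrate only the fusion step, the Python sketch below fuses two omics views of the same patients. Each view gets a patient-by-patient affinity matrix W, a row-normalized global kernel P and a sparse k-nearest-neighbour kernel S, and the two global kernels are then cross-diffused with the two-view update P1 <- S1 P2 S1^T, P2 <- S2 P1 S2^T. Several simplifications are assumed: a plain Gaussian kernel replaces the scaled exponential kernel of the original SNF method, P is not renormalized between iterations, and all function names are hypothetical. This is not the INF implementation, which also couples the fused network with a machine learning classifier and feature ranking.

```python
import math


def gaussian_affinity(data, sigma=1.0):
    """Dense patient-by-patient affinity W from squared Euclidean distances."""
    n = len(data)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(data[i], data[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def full_kernel(w):
    """Global kernel P: diagonal 1/2, off-diagonal entries of each row sum to 1/2."""
    n = len(w)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        off = sum(w[i][k] for k in range(n) if k != i)
        for j in range(n):
            p[i][j] = 0.5 if i == j else w[i][j] / (2 * off)
    return p


def local_kernel(w, k):
    """Sparse kernel S: keep the k largest affinities per row (self included), row-normalized."""
    n = len(w)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: w[i][j], reverse=True)[:k]
        total = sum(w[i][j] for j in nbrs)
        for j in nbrs:
            s[i][j] = w[i][j] / total
    return s


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def snf_two_views(view1, view2, k=3, iterations=10, sigma=1.0):
    """Hypothetical simplified two-view fusion; returns a symmetric fused network."""
    w1, w2 = gaussian_affinity(view1, sigma), gaussian_affinity(view2, sigma)
    p1, p2 = full_kernel(w1), full_kernel(w2)
    s1, s2 = local_kernel(w1, k), local_kernel(w2, k)
    for _ in range(iterations):
        # Cross-diffusion: each view's global kernel is propagated through the
        # other view's global kernel along its own local neighbourhood graph.
        p1, p2 = (matmul(matmul(s1, p2), transpose(s1)),
                  matmul(matmul(s2, p1), transpose(s2)))
    n = len(p1)
    fused = [[(p1[i][j] + p2[i][j]) / 2 for j in range(n)] for i in range(n)]
    # Symmetrize so the fused network is an undirected patient graph.
    return [[(fused[i][j] + fused[j][i]) / 2 for j in range(n)] for i in range(n)]
```

With two views in which patients 0-2 and 3-5 form separate groups, the fused network keeps within-group similarities far larger than between-group ones; a downstream step (in INF, a classifier with feature ranking) would then work on this fused structure.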


Subjects
Genomics/methods , Neuroblastoma/genetics , Neuroblastoma/metabolism , Animals , Computational Biology , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neuroblastoma/pathology
18.
BMC Bioinformatics ; 19(Suppl 2): 49, 2018 03 08.
Article in English | MEDLINE | ID: mdl-29536822

ABSTRACT

BACKGROUND: Convolutional Neural Networks can be used effectively only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, using the patristic distance defined on the phylogenetic tree as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. RESULTS: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota from 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms such as Support Vector Machines and Random Forest and to a baseline fully connected neural network, i.e., the Multi-Layer Perceptron. CONCLUSION: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operationally, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked neighbourhood list of each sample, thus mimicking the case of image data transparently to the user.
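
The core idea behind Ph-CNN, giving each variable a ranked neighbourhood derived from patristic distances on the phylogenetic tree, can be sketched in Python as follows. The patristic distance between two leaves is the sum of branch lengths on the tree path joining them, computed here through their lowest common ancestor. The tree encoding (a child-to-parent map with branch lengths) and all helper names are hypothetical; the sketch skips the sparsified MDS embedding and the Keras layer, and simply gathers each feature's value with those of its k-1 nearest phylogenetic neighbours, analogous to the pixel patch a convolution filter sees in image data.

```python
def path_to_root(node, parent):
    """(ancestor, distance from node) pairs, walking from node up to the root."""
    path, dist = [(node, 0.0)], 0.0
    while node in parent:
        node, length = parent[node]
        dist += length
        path.append((node, dist))
    return path


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the tree path between a and b."""
    up_a = dict(path_to_root(a, parent))
    for ancestor, dist_b in path_to_root(b, parent):
        if ancestor in up_a:  # first shared ancestor seen from b is the LCA
            return up_a[ancestor] + dist_b
    raise ValueError("nodes are not in the same tree")


def ranked_neighbours(leaves, parent):
    """For each leaf, the other leaves sorted by patristic distance (ties by name)."""
    return {leaf: sorted((o for o in leaves if o != leaf),
                         key=lambda o: (patristic_distance(leaf, o, parent), o))
            for leaf in leaves}


def neighbourhood_windows(sample, order, ranking, k):
    """Hypothetical helper: per feature, its value plus its k-1 nearest neighbours' values."""
    return [[sample[f]] + [sample[n] for n in ranking[f][:k - 1]] for f in order]


# Toy tree: (A:0.5, B:0.5) joined at X, X at distance 1.0 from the root, C at 2.0.
parent = {"A": ("X", 0.5), "B": ("X", 0.5), "X": ("root", 1.0), "C": ("root", 2.0)}
ranking = ranked_neighbours(["A", "B", "C"], parent)
print(neighbourhood_windows({"A": 1.0, "B": 2.0, "C": 3.0}, ["A", "B", "C"], ranking, k=2))
```

Each window has the same length regardless of where the feature sits in the tree, which is what lets a standard 1D convolution slide over phylogenetically close taxa in the same way it slides over adjacent pixels.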


Subjects
Metagenomics , Neural Networks, Computer , Phylogeny , Algorithms , Data Analysis , Databases, Genetic , Humans , Inflammatory Bowel Diseases/genetics , Principal Component Analysis , Reproducibility of Results , Support Vector Machine
19.
Article in English | MEDLINE | ID: mdl-30628533

ABSTRACT

We introduce here ML4Tox, a framework offering Deep Learning and Support Vector Machine models to predict the agonist, antagonist, and binding activities of chemical compounds, in this case for the estrogen receptor ligand-binding domain. The ML4Tox models were developed with a 10 × 5-fold cross-validation schema on the training portion of the CERAPP ToxCast dataset, formed by 1677 chemicals, each described by 777 molecular features. On the CERAPP "All Literature" evaluation set (agonist: 6319 compounds; antagonist: 6539; binding: 7283), ML4Tox significantly improved sensitivity over published results on all three tasks (agonist: 0.78 vs 0.56; antagonist: 0.69 vs 0.11; binding: 0.66 vs 0.26).
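
A 10 × 5-fold cross-validation schema repeats 5-fold cross-validation 10 times with different shuffles, giving 50 train/test splits in which every chemical is held out exactly once per repetition. The Python sketch below generates such splits for the 1677 training chemicals and also defines sensitivity, the metric used in the comparison. It assumes unstratified folds, an arbitrary seed and hypothetical function names; the exact ML4Tox splitting details (for instance, stratification by activity class) are not stated in the abstract.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated, unstratified k-fold CV."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(n_repeats):
        rng.shuffle(indices)  # a fresh permutation for every repetition
        folds = [indices[f::n_splits] for f in range(n_splits)]
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield r, f, sorted(train), sorted(test)


def sensitivity(y_true, y_pred):
    """True positive rate TP / (TP + FN); 0.0 when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


splits = list(repeated_kfold(1677))
print(len(splits), [len(test) for _, _, _, test in splits[:5]])
```

Averaging a metric over the 50 held-out folds gives a less split-dependent estimate than a single 5-fold run, which is the usual motivation for repeating the procedure.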


Subjects
Computer Simulation , Endocrine Disruptors/toxicity , Environmental Pollutants/toxicity , Machine Learning , Toxicity Tests/methods , Protein Binding , Quantitative Structure-Activity Relationship , Receptors, Estrogen , Support Vector Machine
20.
Cell Death Differ ; 24(5): 889-902, 2017 05.
Article in English | MEDLINE | ID: mdl-28338656

ABSTRACT

Hepatocellular carcinoma (HCC) is the most common type of liver cancer in humans. The focal adhesion tyrosine kinase (FAK) is often over-expressed in human HCC, and FAK inhibition may reduce HCC cell invasiveness. However, the anti-oncogenic effect of FAK knockdown in HCC cells remains to be clarified. We found that FAK depletion in HCC cells reduced in vitro and in vivo tumorigenicity by inducing G2/M arrest and apoptosis, decreasing anchorage-independent growth, and modulating the expression of several cancer-related genes. Among these genes, we showed that FAK silencing decreased the transcription and nuclear localization of enhancer of zeste homolog 2 (EZH2) and its tri-methylation activity on lysine 27 of histone H3 (H3K27me3). Accordingly, FAK, EZH2 and H3K27me3 were concomitantly upregulated in human HCCs compared to non-tumor livers. In vitro experiments demonstrated that FAK affected EZH2 expression and function by modulating, at least in part, p53 and E2F2/3 transcriptional activity. Moreover, FAK silencing downregulated both EZH2 binding and histone H3K27me3 levels at the promoter of its target gene NOTCH2. Finally, we found that pharmacological inhibition of FAK activity reproduced these effects, although to a milder extent. In summary, we demonstrate that FAK depletion reduces HCC cell growth by affecting cancer-promoting genes, including the pro-oncogene EZH2. Furthermore, we unveil a previously unreported FAK/EZH2 crosstalk in HCC cells, thus identifying a targetable network that paves the way for new anticancer therapies.


Subjects
Carcinoma, Hepatocellular/genetics , Enhancer of Zeste Homolog 2 Protein/genetics , Focal Adhesion Kinase 1/genetics , Gene Expression Regulation, Neoplastic , Liver Neoplasms/genetics , Receptor, Notch2/genetics , Aminopyridines/pharmacology , Animals , Apoptosis/drug effects , Apoptosis/genetics , Carcinoma, Hepatocellular/metabolism , Carcinoma, Hepatocellular/pathology , Cell Line, Tumor , Cell Proliferation/drug effects , E2F2 Transcription Factor/genetics , E2F2 Transcription Factor/metabolism , E2F3 Transcription Factor/genetics , E2F3 Transcription Factor/metabolism , Enhancer of Zeste Homolog 2 Protein/metabolism , Focal Adhesion Kinase 1/antagonists & inhibitors , Focal Adhesion Kinase 1/metabolism , G2 Phase Cell Cycle Checkpoints , Hep G2 Cells , Histones/genetics , Histones/metabolism , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/pathology , Male , Mice , Mice, Nude , Neoplasm Transplantation , Promoter Regions, Genetic , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Receptor, Notch2/metabolism , Signal Transduction , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism