Results 1 - 20 of 103
1.
Brief Funct Genomics ; 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39373492

ABSTRACT

Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
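The permutation idea behind feature-level P-values of this kind can be sketched in a few lines. The example below is a generic label-permutation test on a single feature, written in plain Python. It is not pyRforest's rank-based procedure: the function names, the mean-difference statistic, and the toy expression values are all assumptions made for illustration.

```python
import random


def mean_difference(values, labels):
    """Absolute difference between the class-1 mean and the class-0 mean."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def permutation_p_value(values, labels, n_permutations=999, seed=0):
    """Empirical P-value for one feature under random label shuffling.

    Hypothetical helper: the statistic is a simple mean difference,
    standing in for whatever importance score a real model would give.
    """
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps class sizes, breaks the feature/label link
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so P is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Toy data (made up): one feature measured in 4 controls and 4 cases.
expression = [0.1, 0.2, 0.15, 0.3, 5.1, 5.3, 4.9, 5.2]
status = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(expression, status))
```

A feature whose values separate the classes receives a small P-value, while a feature with identical class means receives 1.0. With many features, a multiple-testing correction would normally follow.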

2.
J Nutr ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39332769

ABSTRACT

BACKGROUND: Intake of sweet and fatty snacks may partly contribute to the occurrence of obesity and other health conditions in childhood. Traditional dietary assessment methods may be limited in accurately assessing the intake of sweet and fatty snacks in children. Metabolite biomarkers may aid the objective assessment of children's food intake and support establishing diet-disease relationships. OBJECTIVE: The present study aimed to identify biomarkers of sweet and fatty snack intake in two independent cohorts of European children. METHODS: We used data from the IDEFICS/I.Family cohort from baseline (2007/2008) and two follow-up examination waves (2009/2010 and 2013/2014). In total, n=1788 urine samples from 599 children were analysed for untargeted metabolomics using high-resolution liquid chromatography-mass spectrometry. Short-term dietary intake was assessed by 24-hour dietary recalls, and habitual dietary intake was calculated with the National Cancer Institute method. Data from the DONALD cohort of 24-hour urine samples (n=567) and 3-day weighted dietary records were used for external replication of results. Multivariate modelling with Unbiased Variable selection in R (MUVR) algorithms and linear mixed models were used to identify novel biomarkers. Metabolite features significantly associated with dietary intake were then annotated. RESULTS: In total, 66 metabolites were discovered and found to be statistically significant for "chocolate candy", "cakes, puddings & cookies", "candy & sweets", "ice cream", and "crisps". Most of the features (n=62) could not be annotated. Short-term and habitual chocolate intake were positively associated with theobromine, xanthosine, and cyclo(L-prolyl-L-valyl). These results were replicated in the DONALD cohort. Short-term "candy & sweets" intake was negatively associated with octenoylcarnitine. 
CONCLUSION: We identified potential metabolite biomarkers of sweet and fatty snacks in children, of which three biomarkers of chocolate intake, namely theobromine, xanthosine, and cyclo(L-prolyl-L-valyl) were externally replicated. However, these potential biomarkers require further validation in children.

3.
J Pathol ; 264(3): 332-343, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39329437

ABSTRACT

Lung carcinoids (L-CDs) are rare, poorly characterised neuroendocrine tumours (NETs). L-CDs are more common in women and are not the consequence of cigarette smoking. They are classified histologically as typical carcinoids (TCs) or atypical carcinoids (ACs). ACs confer a worse survival. Histological classification is imperfect, and there is increasing interest in molecular markers. We therefore investigated global transcriptomic and epigenomic profiles of 15 L-CDs resected with curative intent at Royal Brompton Hospital. We identified underlying mutations and structural abnormalities through whole-exome sequencing (WES) and single nucleotide polymorphism (SNP) genotyping. Transcriptomic clustering algorithms identified two distinct L-CD subtypes. These showed similarities either to pancreatic or neuroendocrine tumours at other sites and so were named respectively L-CD-PanC and L-CD-NeU. L-CD-PanC tumours featured upregulation of pancreatic and metabolic pathway genes matched by promoter hypomethylation of genes for beta cells and insulin secretion (p < 1 × 10⁻⁶). These tumours were centrally located and showed mutational signatures of activation-induced deaminase/apolipoprotein B editing complex activity, together with genome-wide DNA methylation loss enriched in repetitive elements (p = 2.2 × 10⁻¹⁶). By contrast, the L-CD-NeU group exhibited upregulation of neuronal markers (adjusted p < 0.01) and was characterised by focal spindle cell morphology (p = 0.04), peripheral location (p = 0.01), high mutational load (p = 2.17 × 10⁻⁴), recurrent copy number alterations, and enrichment for ACs. Mutations affected chromatin remodelling and SWI/SNF complex pathways. L-CD-NeU tumours carried a mutational signature attributable to aflatoxin and aristolochic acid (p = 0.05), suggesting a possible environmental exposure in their pathogenesis. Immunologically, myeloid and T-cell markers were enriched in L-CD-PanC and B-cell markers in L-CD-NeU tumours.
The substantial epigenetic and non-coding differences between L-CD-PanC and L-CD-NeU open new possibilities for biomarker selection and targeted treatment of L-CD. © 2024 The Author(s). The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subjects
Tumor Biomarkers, Carcinoid Tumor, Lung Neoplasms, Mutation, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Carcinoid Tumor/genetics, Carcinoid Tumor/pathology, Female, Male, Middle Aged, Aged, Tumor Biomarkers/genetics, Pancreatic Neoplasms/genetics, Pancreatic Neoplasms/pathology, Adult, DNA Methylation, Exome Sequencing, Single Nucleotide Polymorphism, Transcriptome, Genomics, Gene Expression Profiling/methods, Neoplastic Gene Expression Regulation
4.
Neurosurg Rev ; 47(1): 666, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39311972

ABSTRACT

The article "Differential DNA Methylation Associated with Delayed Cerebral Ischemia after Aneurysmal Subarachnoid Hemorrhage: A Systematic Review" by Tomasz Klepinowski et al. offers an in-depth analysis of the relationship between DNA methylation and delayed cerebral ischemia (DCI) following aneurysmal subarachnoid hemorrhage (aSAH). By systematically reviewing databases like PubMed, MEDLINE, Scopus, and Web of Science, the authors identify key genes, including ITPR3, HAMP, INSR, and CDHR5, as potential biomarkers for early DCI diagnosis. Their meticulous adherence to PRISMA guidelines and the STROBE statement for quality assessment enhances the study's credibility. However, the review could be improved by discussing methodological variability, statistical power, and the functional relevance of identified CpG sites. Additional sections on mechanistic pathways, integration with other omics data, clinical translation, and ethical considerations would further strengthen the review, providing a more comprehensive understanding of epigenetic factors in DCI and paving the way for future therapeutic interventions.


Subjects
Brain Ischemia, DNA Methylation, Subarachnoid Hemorrhage, Humans, Subarachnoid Hemorrhage/complications, Subarachnoid Hemorrhage/genetics, Genetic Epigenesis
5.
Comput Methods Programs Biomed ; 254: 108260, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38878357

ABSTRACT

BACKGROUND AND OBJECTIVE: Proteome microarrays are a popular high-throughput screening method for large-scale investigation of protein interactions in cells. These interactions can be measured on protein chips when coupled with fluorescence-labeled probes, helping to indicate potential biomarkers or discover drugs. Several computational tools have been developed to help analyze protein chip results. However, existing tools fail to provide a user-friendly interface for biologists and offer only one or two data analysis methods suited to a limited range of experimental designs, which restricts their use cases. METHODS: To facilitate biomarker examination using protein chips, we implemented a user-friendly and comprehensive web tool called BAPCP (Biomarker Analysis tool for Protein Chip Platforms) to deal with diverse chip data distributions. RESULTS: BAPCP is well integrated with standard chip result files and includes 7 data normalization methods and 7 custom-designed quality control/differential analysis filters for biomarker extraction among experiment groups. Moreover, it can handle cost-efficient chip designs that repeat several blocks/samples within a single slide. Using experiments with the human coronavirus (HCoV) protein microarray and the E. coli proteome chip that helps study the immune response in Kawasaki disease as examples, we demonstrated that BAPCP can reduce the time-consuming, week-long manual biomarker identification process to merely 3 min. CONCLUSIONS: The developed BAPCP tool provides substantial analysis support for protein interaction studies and meets the need for expanding computer usage and information exchange in bioscience and medicine. The web service of BAPCP is available at https://cosbi.ee.ncku.edu.tw/BAPCP/.


Subjects
Biomarkers, Protein Array Analysis, Software, Biomarkers/metabolism, Humans, Internet, Proteome, User-Computer Interface, Escherichia coli, Proteomics/methods, Computational Biology
6.
Front Public Health ; 12: 1400332, 2024.
Article in English | MEDLINE | ID: mdl-38912274

ABSTRACT

Background: The human immunodeficiency virus (HIV) remains a critical global health issue, with a pressing need for effective diagnostic and monitoring tools. Methodology: This study explored distinctions in the salivary metabolome among healthy individuals, individuals with HIV, and those receiving highly active antiretroviral therapy (HAART). Utilizing LC-MS/MS for exhaustive metabolomics profiling, we analyzed 90 oral saliva samples from individuals with HIV, categorized by CD4 count levels in the peripheral blood. Results: Orthogonal partial least squares-discriminant analysis (OPLS-DA) and other analyses underscored significant metabolic alterations in individuals with HIV, especially in energy metabolism pathways. Notably, post-HAART metabolic profiles indicated a substantial presence of exogenous metabolites and changes in amino acid pathways such as arginine, proline, and lysine degradation. Key metabolites such as citric acid, L-glutamic acid, and L-histidine were identified as potential indicators of disease progression or recovery. Differential metabolite selection and functional enrichment analysis, combined with receiver operating characteristic (ROC) and random forest analyses, pinpointed potential biomarkers for different stages of HIV infection. Additionally, our research examined the interplay between oral metabolites and microorganisms such as herpes simplex virus type 1 (HSV1), bacteria, and fungi in individuals with HIV, revealing crucial interactions. Conclusion: This investigation seeks to contribute to the understanding of the metabolic shifts occurring in HIV infection and following the initiation of HAART, while tentatively proposing novel avenues for diagnostic and treatment monitoring through salivary metabolomics.


Subjects
Highly Active Antiretroviral Therapy, Biomarkers, HIV Infections, Metabolome, Saliva, Humans, Saliva/metabolism, Saliva/chemistry, HIV Infections/metabolism, Biomarkers/metabolism, Male, Metabolome/physiology, Adult, Female, Middle Aged, Liquid Chromatography, Metabolomics, Tandem Mass Spectrometry, Early Diagnosis, CD4 Lymphocyte Count
7.
Comput Struct Biotechnol J ; 23: 2304-2325, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38845821

ABSTRACT

Understanding the intricate relationships between gene expression levels and epigenetic modifications in a genome is crucial to comprehending the pathogenic mechanisms of many diseases. With the advancement of DNA Methylome Profiling techniques, the emphasis on identifying Differentially Methylated Regions (DMRs/DMGs) has become crucial for biomarker discovery, offering new insights into the etiology of illnesses. This review surveys the current state of computational tools/algorithms for the analysis of microarray-based DNA methylation profiling datasets, focusing on key concepts underlying the diagnostic/prognostic CpG site extraction. It addresses methodological frameworks, algorithms, and pipelines employed by various authors, serving as a roadmap to address challenges and understand changing trends in the methodologies for analyzing array-based DNA methylation profiling datasets derived from diseased genomes. Additionally, it highlights the importance of integrating gene expression and methylation datasets for accurate biomarker identification, explores prognostic prediction models, and discusses molecular subtyping for disease classification. The review also emphasizes the contributions of machine learning, neural networks, and data mining to enhance diagnostic workflow development, thereby improving accuracy, precision, and robustness.

8.
BMC Med Inform Decis Mak ; 24(Suppl 4): 175, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902676

ABSTRACT

BACKGROUND: Machine Learning (ML) plays a crucial role in biomedical research. Nevertheless, it still has limitations in data integration and irreproducibility. To address these challenges, robust methods are needed. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early detection rates and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, especially metastatic biomarkers, which remains an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to the identification of a PDAC metastatic composite biomarker candidate, and demonstrate the advantages of harnessing data resources. METHODS: We utilised primary tumour RNAseq data from five public repositories, pooling samples to maximise statistical power and integrating data by correcting for technical variance. Data were split into train and validation sets. The train dataset underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and five folds were considered robust to build a consensus multivariate model. A random forest model was constructed using selected genes from the train dataset and tested in the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of signals was explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA. RESULTS: We developed a pipeline that can detect robust signatures to build composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, proposing a composite biomarker candidate comprised of fifteen genes consistently selected that showed very promising predictive capability. 
Biological contextualisation revealed links with cancer progression and metastasis, underscoring their potential relevance. All code is available in GitHub. CONCLUSION: This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings shed light on a promising validation and application path.
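The consensus rule described in the methods (genes kept when they appear in at least 80% of models and in at least five folds) can be sketched as a small counting routine. This is one possible reading of that rule, not the authors' code: the nested-list input layout, the function name, and the placeholder gene labels are hypothetical.

```python
from collections import Counter


def robust_genes(selections, model_fraction=0.8, min_folds=5):
    """Return genes selected consistently across models and folds.

    selections: list of folds; each fold is a list of sets, one set of
    selected gene names per fitted model (assumed layout).
    """
    folds_passed = Counter()
    for fold in selections:
        # Count in how many models of this fold each gene was selected.
        per_gene = Counter(gene for model in fold for gene in set(model))
        for gene, n_models in per_gene.items():
            if n_models / len(fold) >= model_fraction:
                folds_passed[gene] += 1
    # Keep genes that passed the within-fold threshold in enough folds.
    return sorted(g for g, n in folds_passed.items() if n >= min_folds)


# Toy input (made up): 6 folds of 5 models each, placeholder gene names.
stable_fold = [{"GENE_A", "GENE_B"}] * 4 + [{"GENE_A"}]
unstable_fold = [{"GENE_A", "GENE_C"}] * 3 + [{"GENE_A"}] * 2
print(robust_genes([stable_fold] * 5 + [unstable_fold]))
```

In the study itself the thresholds were applied to 10 folds of 100 models each; the toy input above is much smaller so that the counts can be checked by hand.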


Subjects
Tumor Biomarkers, Pancreatic Ductal Carcinoma, Machine Learning, Pancreatic Neoplasms, Humans, Pancreatic Ductal Carcinoma/genetics, Pancreatic Neoplasms/genetics, Tumor Biomarkers/genetics
9.
Pharmaceuticals (Basel) ; 17(5)2024 May 17.
Article in English | MEDLINE | ID: mdl-38794221

ABSTRACT

Precise targeting has become the main direction of anti-cancer drug development. Trophoblast cell surface antigen 2 (Trop-2) is highly expressed in different solid tumors but rarely in normal tissues, rendering it an attractive target. Trop-2-targeted antibody-drug conjugates (ADCs) have displayed promising efficacy in treating diverse solid tumors, especially breast cancer and urothelial carcinoma. However, their clinical application is still limited by insufficient efficacy, excessive toxicity, and the lack of biological markers related to effectiveness. This review summarizes the clinical trials and combination therapy strategies for Trop-2-targeted ADCs, discusses the current challenges, and provides new insights for future advancements.

10.
Comput Med Imaging Graph ; 115: 102386, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38718562

ABSTRACT

A late post-traumatic seizure (LPTS), a consequence of traumatic brain injury (TBI), can potentially evolve into a lifelong condition known as post-traumatic epilepsy (PTE). Presently, the mechanism that triggers epileptogenesis in TBI patients remains elusive, inspiring the epilepsy community to devise ways to predict which TBI patients will develop PTE and to identify potential biomarkers. In response to this need, our study collected comprehensive, longitudinal multimodal data from 48 TBI patients across multiple participating institutions. A supervised binary classification task was created, contrasting data from LPTS patients with those without LPTS. To accommodate missing modalities in some subjects, we took a two-pronged approach. Firstly, we extended a graphical model-based Bayesian estimator to directly classify subjects with incomplete modality. Secondly, we explored conventional imputation techniques. The imputed multimodal information was then combined, following several fusion and dimensionality reduction techniques found in the literature, and subsequently fitted to a kernel- or a tree-based classifier. For this fusion, we proposed two new algorithms: recursive elimination of correlated components (RECC) that filters information based on the correlation between the already selected features, and information decomposition and selective fusion (IDSF), which effectively recombines information from decomposed multimodal features. Our cross-validation findings showed that the proposed IDSF algorithm delivers superior performance based on the area under the curve (AUC) score. 
Ultimately, after rigorous statistical comparisons and interpretable machine learning examination using Shapley values of the most frequently selected features, we recommend the two following magnetic resonance imaging (MRI) abnormalities as potential biomarkers: the left anterior limb of internal capsule in diffusion MRI (dMRI), and the right middle temporal gyrus in functional MRI (fMRI).


Subjects
Biomarkers, Traumatic Brain Injuries, Machine Learning, Neuroimaging, Humans, Traumatic Brain Injuries/diagnostic imaging, Traumatic Brain Injuries/complications, Neuroimaging/methods, Male, Female, Magnetic Resonance Imaging/methods, Adult, Algorithms, Post-Traumatic Epilepsy/diagnostic imaging, Post-Traumatic Epilepsy/etiology, Multimodal Imaging/methods, Seizures/diagnostic imaging, Bayes Theorem, Middle Aged
11.
BMC Bioinformatics ; 25(1): 149, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38609844

ABSTRACT

BACKGROUND: Biomarker discovery is a challenging task due to the massive search space. Quantum computing and quantum Artificial Intelligence (quantum AI) can be used to address the computational problem of biomarker discovery from genetic data. METHOD: We propose a Quantum Neural Networks architecture to discover genetic biomarkers for input activation pathways. The Maximum Relevance-Minimum Redundancy criteria score biomarker candidate sets. Our proposed model is economical since the neural solution can be delivered on constrained hardware. RESULTS: We demonstrate the proof of concept on four activation pathways associated with CTLA4, including (1) CTLA4-activation stand-alone, (2) CTLA4-CD8A-CD8B co-activation, (3) CTLA4-CD2 co-activation, and (4) CTLA4-CD2-CD48-CD53-CD58-CD84 co-activation. CONCLUSION: The model indicates new genetic biomarkers associated with the mutational activation of CTLA4-associated pathways, including 20 genes: CLIC4, CPE, ETS2, FAM107A, GPR116, HYOU1, LCN2, MACF1, MT1G, NAPA, NDUFS5, PAK1, PFN1, PGAP3, PPM1G, PSMD8, RNF213, SLC25A3, UBA1, and WLS. We open source the implementation at: https://github.com/namnguyen0510/Biomarker-Discovery-with-Quantum-Neural-Networks.


Subjects
Artificial Intelligence, Computing Methodologies, CTLA-4 Antigen/genetics, Quantum Theory, Neural Networks (Computer)
12.
Front Genet ; 15: 1242636, 2024.
Article in English | MEDLINE | ID: mdl-38633407

ABSTRACT

Allogeneic hematopoietic cell transplantation (HCT) is used to treat many blood-based disorders and malignancies; however, it can also result in serious adverse events, such as the development of acute graft-versus-host disease (aGVHD). This study aimed to develop a donor-specific epigenetic classifier to reduce the incidence of aGVHD by improving donor selection. Genome-wide DNA methylation was assessed in a discovery cohort of 288 HCT donors selected based on recipient aGVHD outcome; this cohort consisted of 144 cases with aGVHD grades III-IV and 144 controls with no aGVHD. We applied a machine learning algorithm to identify CpG sites predictive of aGVHD. Receiver operating characteristic (ROC) curve analysis of these sites resulted in a classifier with an encouraging area under the ROC curve (AUC) of 0.91. To test this classifier, we used an independent validation cohort (n = 288) selected using the same criteria as the discovery cohort. Attempts to validate the classifier failed, with the AUC falling to 0.51. These results indicate that donor DNA methylation may not be a suitable predictor of aGVHD in an HCT setting involving unrelated donors, despite the initial promising results in the discovery cohort. Our work highlights the importance of independent validation of machine learning classifiers, particularly when developing classifiers intended for clinical use.
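The AUC values quoted above have a direct interpretation: the AUC is the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, so a value near 0.5 means the scores carry no ranking information. A minimal sketch of that pairwise definition follows; the function name and the toy scores are assumptions, and none of the numbers come from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 for cases and 0 for
    controls. This O(n^2) form is meant for illustration only.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(
        (c > k) + 0.5 * (c == k)
        for c in cases
        for k in controls
    )
    return correct / (len(cases) * len(controls))


# Toy scores (made up): perfect separation, then no separation at all.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # every case outranks every control
print(roc_auc([0.9, 0.1, 0.8, 0.2], [1, 1, 0, 0]))  # half of the pairs are ranked correctly
```

Computing the same statistic on a held-out cohort, as the authors did, is what exposes a classifier that only memorised the discovery data.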

13.
Comput Biol Med ; 174: 108346, 2024 May.
Article in English | MEDLINE | ID: mdl-38581999

ABSTRACT

Non-Communicable Diseases (NCDs) significantly impact global health, contributing to over 70% of premature deaths, as reported by the World Health Organization (WHO). These diseases have complex and multifactorial origins, involving genetic, epigenetic, environmental and lifestyle factors. While the Genome-Wide Association Study (GWAS) is widely recognized as a valuable tool for identifying variants associated with complex phenotypes, the multifactorial nature of NCDs necessitates a more comprehensive exploration, encompassing not only the genetic but also the epigenetic aspect. For this purpose, we employed a bioinformatics-multiomics approach to examine the genetic and epigenetic characteristics of NCDs (i.e. colorectal cancer, coronary atherosclerosis, squamous cell lung cancer, psoriasis, type 2 diabetes, and multiple sclerosis), aiming to identify novel biomarkers for diagnosis and prognosis. Leveraging GWAS summary statistics, we pinpointed Single Nucleotide Polymorphisms (SNPs) independently associated with each NCD. Subsequently, we identified genes linked to cell cycle, inflammation and oxidative stress mechanisms, revealing shared genes across multiple diseases, suggesting common functional pathways. From an epigenetic perspective, we identified microRNAs (miRNAs) with regulatory functions targeting these genes of interest. Our findings underscore critical genetic pathways implicated in these diseases. In colorectal cancer, the dysregulation of the "Cytokine Signaling in Immune System" pathway, involving LAMA5 and SMAD7, regulated by Hsa-miR-21-5p, Hsa-miR-103a-3p, and Hsa-miR-195-5p, emerged as pivotal. In coronary atherosclerosis, the pathway associated with "binding of TCF/LEF:CTNNB1 to target gene promoters" displayed noteworthy implications, with the MYC factor controlled by Hsa-miR-16-5p as a potential regulatory factor.
Squamous cell lung carcinoma analysis revealed significant pathways such as "PTK6 promotes HIF1A stabilization," regulated by Hsa-let-7b-5p. In psoriasis, the "Endosomal/Vacuolar pathway," involving HLA-C and Hsa-miR-148a-3p and Hsa-miR-148b-3p, was identified as crucial. Type 2 Diabetes implicated the "Regulation of TP53 Expression" pathway, controlled by Hsa-miR-106a-5p and Hsa-miR-106b-5p. In conclusion, our study elucidates the genetic framework and molecular mechanisms underlying NCDs, offering crucial insights into potential genetic/epigenetic biomarkers for diagnosis and prognosis. The specificity of pathways and related miRNAs in different pathologies highlights promising candidates for further clinical validation, with the potential to advance personalized treatments and alleviate the global burden of NCDs.


Subjects
Inflammation, MicroRNAs, Noncommunicable Diseases, Oxidative Stress, Single Nucleotide Polymorphism, Humans, MicroRNAs/genetics, MicroRNAs/metabolism, Inflammation/genetics, Oxidative Stress/genetics, Genome-Wide Association Study, Signal Transduction/genetics, Genetic Epigenesis
14.
Comput Biol Med ; 174: 108407, 2024 May.
Article in English | MEDLINE | ID: mdl-38603902

ABSTRACT

Feature selection and machine learning algorithms can be used to analyze Single Nucleotide Polymorphisms (SNPs) data and identify potential disease biomarkers. Reproducibility of identified biomarkers is critical for them to be useful for clinical research; however, genotyping platforms and selection criteria for individuals to be genotyped affect the reproducibility of identified biomarkers. To assess biomarkers reproducibility, we collected five SNPs datasets from the database of Genotypes and Phenotypes (dbGaP) and explored several data integration strategies. While combining datasets can lead to a reduction in classification accuracy, it has the potential to improve the reproducibility of potential biomarkers. We evaluated the agreement among different strategies in terms of the SNPs that were identified as potential Parkinson's disease (PD) biomarkers. Our findings indicate that, on average, 93% of the SNPs identified in a single dataset fail to be identified in other datasets. However, through dataset integration, this lack of replication is reduced to 62%. We discovered fifty SNPs that were identified at least twice, which could potentially serve as novel PD biomarkers. These SNPs are indirectly linked to PD in the literature but have not been directly associated with PD before. These findings open up new potential avenues of investigation.
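A replication figure such as the 93% quoted above can be read as the average share of SNPs selected in one dataset that no other dataset also selects. The sketch below computes that quantity from plain sets. It is an assumed formulation rather than the authors' exact metric, and the dataset names and rsIDs are invented placeholders.

```python
def non_replication_rate(selected_by_dataset):
    """Average fraction of a dataset's selected SNPs found in no other dataset.

    selected_by_dataset: dict mapping a dataset name to the set of SNP
    identifiers selected in it (assumed layout).
    """
    rates = []
    for name, snps in selected_by_dataset.items():
        if not snps:
            continue  # a dataset with no selected SNPs contributes nothing
        elsewhere = set().union(
            *(other for key, other in selected_by_dataset.items() if key != name)
        )
        unreplicated = snps - elsewhere
        rates.append(len(unreplicated) / len(snps))
    return sum(rates) / len(rates)


# Toy selections (made up): only "rs2" is found in more than one dataset.
toy = {
    "cohort_a": {"rs1", "rs2"},
    "cohort_b": {"rs2", "rs3"},
    "cohort_c": {"rs4"},
}
print(non_replication_rate(toy))
```

On the toy input the per-dataset rates are 0.5, 0.5 and 1.0, so the average is two thirds.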


Subjects
Biomarkers, Machine Learning, Parkinson Disease, Single Nucleotide Polymorphism, Parkinson Disease/genetics, Parkinson Disease/metabolism, Humans, Genetic Databases, Reproducibility of Results, Genetic Markers/genetics
15.
Front Microbiol ; 15: 1348974, 2024.
Article in English | MEDLINE | ID: mdl-38426064

ABSTRACT

Background: Colorectal cancer (CRC) is a type of tumor caused by the uncontrolled growth of cells in the mucosa lining the last part of the intestine. Emerging evidence underscores an association between CRC and gut microbiome dysbiosis. The high mortality rate of this cancer has made it necessary to develop new early diagnostic methods. Machine learning (ML) techniques can represent a solution to evaluate the interaction between intestinal microbiota and host physiology. Through explainable artificial intelligence (XAI) it is possible to evaluate the individual contributions of microbial taxonomic markers for each subject. Our work also implements the SHapley Additive exPlanations (SHAP) algorithm to identify for each subject which parameters are important in the context of CRC. Results: The proposed study aimed to implement an explainable artificial intelligence framework using both gut microbiota data and demographic information from subjects to classify a cohort of control subjects from those with CRC. Our analysis revealed an association between gut microbiota and this disease. We compared three machine learning algorithms, and the Random Forest (RF) algorithm emerged as the best classifier, with a precision of 0.729 ± 0.038 and an area under the Precision-Recall curve of 0.668 ± 0.016. Additionally, SHAP analysis highlighted the most crucial variables in the model's decision-making, facilitating the identification of specific bacteria linked to CRC. Our results confirmed the role of certain bacteria, such as Fusobacterium, Peptostreptococcus, and Parvimonas, whose abundance appears notably associated with the disease, as well as bacteria whose presence is linked to a non-diseased state. Discussion: These findings emphasize the potential of leveraging gut microbiota data within an explainable AI framework for CRC classification. The significant association observed aligns with existing knowledge.
The precision exhibited by the RF algorithm reinforces its suitability for such classification tasks. The SHAP analysis not only enhanced interpretability but identified specific bacteria crucial in CRC determination. This approach opens avenues for targeted interventions based on microbial signatures. Further exploration is warranted to deepen our understanding of the intricate interplay between microbiota and health, providing insights for refined diagnostic and therapeutic strategies.
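The per-subject attributions that SHAP reports are Shapley values: each feature's contribution averaged over every order in which features could be added to the model. For a handful of features they can be computed exactly, as in the sketch below. This is a brute-force illustration of the definition, not the SHAP library's algorithm, and the function name and toy value functions are assumptions.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all feature orderings.

    value_fn takes a frozenset of feature indices and returns the model
    output when only those features are "present" (hypothetical interface).
    The cost grows factorially, so this is only usable for tiny examples.
    """
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = frozenset()
        previous = value_fn(coalition)
        for feature in order:
            coalition = coalition | {feature}
            current = value_fn(coalition)
            totals[feature] += current - previous  # marginal contribution
            previous = current
    return [t / len(orderings) for t in totals]


# Toy model (made up): features 0 and 1 only matter together, feature 2 never.
def interaction(present):
    return 10.0 if {0, 1} <= present else 0.0


print(shapley_values(interaction, 3))
```

The two interacting features split the credit equally and the irrelevant feature receives zero, while the attributions sum to the full model output. Practical SHAP implementations approximate this average because enumerating every ordering is infeasible for hundreds of taxa.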

16.
Biosystems ; 237: 105163, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38401640

ABSTRACT

In this paper, we explore the challenges associated with biomarker identification for diagnostic purposes in biomedical experiments and propose a novel approach to these challenges via a generalization of the Dantzig selector. To improve the efficiency of the regularization method, we transform the inherently nonlinear programming problem, which arises from the nonlinear link function, into a linear programming framework under a reasonable assumption on the logistic probability range. We illustrate the use of our method on an experiment with a binary response, showing superior performance in biomarker identification compared with conventional analysis. Our proposed method does not merely serve as a variable/biomarker selection tool; its ranking of variable importance also provides valuable reference information for practitioners to reach informed decisions regarding the prioritization of factors for further investigation.
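For orientation, the classical Dantzig selector for a linear model, which the abstract's method generalizes, is shown below together with its standard linear-programming form. This is the textbook formulation only. The logistic-link generalization and the probability-range assumption used in the paper are not reproduced here, and the symbols (design matrix X, response y, tuning parameter λ, auxiliary vector u) are the usual notation rather than the authors'.

```latex
% Classical Dantzig selector for the linear model y = X\beta + \varepsilon
\[
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \|\beta\|_{1}
\quad \text{subject to} \quad
\bigl\| X^{\top} (y - X\beta) \bigr\|_{\infty} \le \lambda
\]

% Equivalent linear program with auxiliary variables u \in \mathbb{R}^{p}
\[
\min_{\beta,\,u} \; \sum_{j=1}^{p} u_{j}
\quad \text{subject to} \quad
-u \le \beta \le u,
\qquad
-\lambda \mathbf{1} \;\le\; X^{\top} (y - X\beta) \;\le\; \lambda \mathbf{1}
\]
```

Because both the objective and the constraints are linear in (β, u), the problem can be handed to any linear-programming solver, which is the computational advantage the abstract refers to.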


Subjects
Biomarkers , Probability
17.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38348747

ABSTRACT

Integrating and analyzing multiple omics data sets, including genomics, proteomics and radiomics, can significantly advance researchers' comprehensive understanding of Alzheimer's disease (AD). However, current methodologies primarily focus on the main effects of genetic variations and proteins, overlooking non-additive effects such as genotype-protein interaction (GPI) and correlation patterns in brain imaging genetics studies. Importantly, these non-additive effects could contribute to intermediate imaging phenotypes, finally leading to disease occurrence. In general, the interaction between genetic variations and proteins and the correlation between them are two distinct biological effects, and thus disentangling the two effects for heritable imaging phenotypes is of great interest. Unfortunately, this issue has been largely unexplored. In this paper, to fill this gap, we propose the Multi-Task Genotype-Protein Interaction and Correlation disentangling method (MT-GPIC) to identify GPI and extract correlation patterns between them. To ensure stability and interpretability, we use novel and off-the-shelf penalties to identify meaningful genetic risk factors, as well as exploit the interconnectedness of different brain regions. Additionally, since computing GPI poses a high computational burden, we develop a fast optimization strategy for solving MT-GPIC, which is guaranteed to converge. Experimental results on the Alzheimer's Disease Neuroimaging Initiative data set show that MT-GPIC achieves higher correlation coefficients and classification accuracy than state-of-the-art methods. Moreover, our approach could effectively identify interpretable phenotype-related GPI and correlation patterns in high-dimensional omics data sets. These findings not only enhance diagnostic accuracy but also contribute valuable insights into the underlying pathogenic mechanisms of AD.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Multiomics , Genotype , Neuroimaging/methods , Phenotype , Brain/diagnostic imaging , Brain/pathology
18.
Int J Mol Sci ; 25(3)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38338932

ABSTRACT

Integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, the intricate interrelations within and between omics layers make it difficult to assess how informative each layer is, which complicates the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.


Subjects
MicroRNAs , Multiomics , Humans , Reproducibility of Results , Learning , MicroRNAs/genetics , Post-Translational Protein Processing
19.
Ther Adv Med Oncol ; 16: 17588359231220510, 2024.
Article in English | MEDLINE | ID: mdl-38188465

ABSTRACT

Background: CTLA-4 impedes the immune system's antitumor response. There are two Food and Drug Administration-approved anti-CTLA-4 agents - ipilimumab and tremelimumab - both used together with anti-PD-1/PD-L1 agents. Objective: To assess the prognostic implications and immunologic correlates of high CTLA-4 in tumors of patients on immunotherapy and those on non-immunotherapy treatments. Design/methods: We evaluated RNA expression levels in a clinical-grade laboratory and clinical correlates of CTLA-4 and other immune checkpoints in 514 tumors, including 489 patients with advanced/metastatic cancers and full outcome annotation. A reference population (735 tumors; 35 histologies) was used to normalize and rank transcript abundance (0-100 percentile) to internal housekeeping gene profiles. Results: The most common tumor types were colorectal (140/514, 27%), pancreatic (55/514, 11%), breast (49/514, 10%), and ovarian cancers (43/514, 8%). Overall, 87 of 514 tumors (16.9%) had high CTLA-4 transcript expression (⩾75th percentile rank). Cancers with the largest proportion of high CTLA-4 transcripts were cervical cancer (80% of patients), small intestine cancer (33.3%), and melanoma (33.3%). High CTLA-4 RNA independently/significantly correlated with high PD-1, PD-L2, and LAG3 RNA levels (and with high PD-L1 in univariate analysis). High CTLA-4 RNA expression was not correlated with survival from the time of metastatic disease [N = 272 patients who never received immune checkpoint inhibitors (ICIs)]. However, in 217 patients treated with ICIs (mostly anti-PD-1/anti-PD-L1), progression-free survival (PFS) and overall survival (OS) were significantly longer among patients with high versus non-high CTLA-4 expression [hazard ratio, 95% confidence interval: 0.6 (0.4-0.9) p = 0.008; and 0.5 (0.3-0.8) p = 0.002, respectively]; results were unchanged when 18 patients who received anti-CTLA-4 were omitted. Patients whose tumors had high CTLA-4 and high PD-L1 did best; those with high PD-L1 but non-high CTLA-4 and/or other expression patterns had poorer outcomes for PFS (p = 0.004) and OS (p = 0.009) after immunotherapy. Conclusion: High CTLA-4, especially when combined with high PD-L1 transcript expression, was a significant positive predictive biomarker for better outcomes (PFS and OS) in patients on immunotherapy.


High CTLA-4 expression and immunotherapy outcome

High CTLA-4 expression was not a prognostic factor for survival in patients not receiving ICIs but was a significant positive predictive biomarker for better outcome (PFS and OS) in patients on immunotherapy, perhaps because it correlated with expression of other checkpoints such as PD-1 and PD-L2.
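The "high expression" cutoff used in this study (⩾75th percentile rank against a reference population) can be sketched as follows. The percentile-rank convention (share of reference values less than or equal to the sample), the function names, and the toy reference values are all assumptions made for illustration; the laboratory's actual normalization to housekeeping genes is not reproduced.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percentile rank (0-100) of `value` within a reference population.

    Assumed convention: percentage of reference values <= value.
    """
    ref = sorted(reference)
    return 100.0 * bisect_right(ref, value) / len(ref)


def is_high(value, reference, cutoff=75.0):
    """Flag a transcript as 'high' when its percentile rank meets the cutoff."""
    return percentile_rank(value, reference) >= cutoff


# Hypothetical normalized transcript abundances in a small reference set.
reference_abundance = [1, 2, 3, 4, 5, 6, 7, 8]

rank_a = percentile_rank(6, reference_abundance)   # 6 of 8 values are <= 6
high_a = is_high(6, reference_abundance)
high_b = is_high(5, reference_abundance)

# The proportion reported in the abstract: 87 of 514 tumors.
share_high = round(100 * 87 / 514, 1)
```

Ranking against a fixed reference population, rather than within the study cohort, keeps the cutoff stable as new tumors are added.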

20.
Drug Alcohol Depend ; 255: 111066, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38217979

ABSTRACT

BACKGROUND: Identifying co-occurring mental disorders and elevated risk is vital for optimizing healthcare processes. In this study, we used DeepBiomarker2, an updated version of our deep learning model, to predict adverse events among patients with comorbid post-traumatic stress disorder (PTSD) and alcohol use disorder (AUD), a high-risk population. METHODS: We analyzed electronic medical records of 5565 patients from University of Pittsburgh Medical Center to predict adverse events (opioid use disorder, suicide-related events, depression, and death) within 3 months of any encounter after the diagnosis of PTSD+AUD using DeepBiomarker2. We integrated multimodal information, including lab tests, medications, co-morbidities, individual- and neighborhood-level social determinants of health (SDoH), psychotherapy, and veteran data. RESULTS: DeepBiomarker2 achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 for predicting adverse events among these PTSD+AUD patients. Medications such as vilazodone, dronabinol, tenofovir, suvorexant, modafinil, and lamivudine showed potential for risk reduction. SDoH parameters such as cognitive behavioral therapy and trauma-focused psychotherapy lowered risk, while active veteran status, income segregation, limited access to parks and greenery, low Gini index, limited English-speaking capacity, and younger age increased risk. CONCLUSIONS: Our improved model, DeepBiomarker2, demonstrated its capability to predict the risk of multiple adverse events with high accuracy and to identify potential risk and beneficial factors.
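The AUROC reported above has a simple probabilistic reading: it is the chance that a randomly chosen patient with an adverse event receives a higher predicted risk than a randomly chosen patient without one. A minimal sketch of that pairwise definition follows; the function name and toy data are hypothetical, and it is unrelated to the DeepBiomarker2 code itself.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = adverse event within the prediction window.
events = [1, 1, 0, 0]
predicted_risk = [0.9, 0.4, 0.5, 0.1]

auc = auroc(events, predicted_risk)
```

The quadratic pairwise loop is fine for illustration; for cohorts of thousands of encounters a rank-based computation would be the usual choice.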


Subjects
Alcoholism , Deep Learning , Post-Traumatic Stress Disorders , Humans , Post-Traumatic Stress Disorders/diagnosis , Post-Traumatic Stress Disorders/epidemiology , Post-Traumatic Stress Disorders/psychology , Alcoholism/complications , Alcoholism/diagnosis , Alcoholism/epidemiology , Electronic Health Records , Comorbidity