ABSTRACT
Recent advancements in Natural Language Processing (NLP) have been significantly driven by the development of Large Language Models (LLMs), representing a substantial leap in language-based technology capabilities. These models, built on sophisticated deep learning architectures, typically transformers, are characterized by billions of parameters and extensive training data, enabling them to achieve high accuracy across various tasks. The transformer architecture of LLMs allows them to effectively handle context and sequential information, which is crucial for understanding and generating human language. Beyond traditional NLP applications, LLMs have shown significant promise in bioinformatics, transforming the field by addressing challenges associated with large and complex biological datasets. In genomics, proteomics, and personalized medicine, LLMs facilitate identifying patterns, predicting protein structures, and understanding genetic variations. This capability is crucial, for example, for advancing drug discovery, where accurate prediction of molecular interactions is essential. This review discusses the current trends in LLM research and their potential to revolutionize the field of bioinformatics and accelerate novel discoveries in the life sciences.
ABSTRACT
The identification of cancer subtypes is crucial for advancing precision medicine, as it facilitates the development of more effective and personalized treatment and prevention strategies. With the development of high-throughput sequencing technologies, researchers now have access to a wealth of multi-omics data from cancer patients, making computational cancer subtyping increasingly feasible. One of the main challenges in integrating multi-omics data is handling missing data, since not all biomolecules are consistently measured across all samples. Current computational models based on multi-omics data for cancer subtyping often struggle with the challenge of weakly paired omics data. To address this challenge, we propose a novel unsupervised cancer subtyping model named Subtype-MVCC. This model leverages graph convolutional networks to extract and represent low-dimensional features from each omics data type, using intra-view and inter-view contrastive learning approaches. By incorporating a weighted average fusion strategy to unify the dimension of each sample, Subtype-MVCC effectively handles weakly paired multi-omics datasets. Comprehensive evaluations on established benchmark datasets demonstrate that Subtype-MVCC outperforms nine leading models in this domain. Additionally, simulations with varying levels of missing data highlight the model's robust performance in handling weakly paired omics data. The clinical relevance and survival outcomes associated with the identified subtypes further validate the interpretability and reliability of the clustering results produced by Subtype-MVCC.
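The weighted average fusion step described above can be illustrated with a minimal sketch. It assumes each omics view has already been embedded to a common dimension and that a boolean mask records which samples were measured in which view; the function name `fuse_views`, the mask representation, and the uniform default weights are illustrative assumptions, not details taken from Subtype-MVCC.

```python
import numpy as np


def fuse_views(embeddings, masks, weights=None):
    """Weighted average of per-view embeddings, skipping views a sample lacks.

    embeddings: list of V arrays, each of shape (n_samples, d). Rows for
        samples missing from a view may contain anything, including NaN.
    masks: list of V boolean arrays of shape (n_samples,), True where the
        view was measured for that sample.
    weights: optional per-view weights (hypothetical; uniform by default).
    """
    emb = np.stack(embeddings)                    # (V, n, d)
    observed = np.stack(masks).astype(float)      # (V, n)
    if weights is None:
        weights = np.ones(len(embeddings))
    # A missing view contributes weight zero for that sample.
    w = observed * np.asarray(weights, dtype=float)[:, None]   # (V, n)
    total = w.sum(axis=0)                         # (n,)
    if np.any(total == 0):
        raise ValueError("every sample needs at least one observed view")
    # Zero out unobserved rows so NaN placeholders cannot leak into the sum.
    emb = np.where(observed[:, :, None] > 0, emb, 0.0)
    return (w[:, :, None] * emb).sum(axis=0) / total[:, None]
```

Because the weights are renormalized per sample, every sample ends up with a fused vector of the same dimension regardless of how many views it has, which is what makes downstream clustering possible on weakly paired data.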
ABSTRACT
Background: Cancer survival prediction is vital in improving patients' prospects and recommending therapies. Understanding the molecular behavior of cancer can be enhanced through the integration of multi-omics data, including mRNA, miRNA, and DNA methylation data. In light of these multi-omics data, we proposed a graph attention network (GAT) model in this study to predict the survival of non-small cell lung cancer (NSCLC). Methods: The different omics data were obtained from The Cancer Genome Atlas (TCGA) and preprocessed and combined into a single dataset using the sample ID. We used the chi-square test to select the most significant features to be used in our model. We used the synthetic minority oversampling technique (SMOTE) to balance the dataset and the concordance index (C-index) to measure the performance of our model on different combinations of omics data. Results: Our model demonstrated superior performance, with the highest value of the C-index obtained when we used both mRNA and miRNA data. This demonstrates that the multi-omics approach could be effective in predicting survival. Further pathway analysis conducted with KEGG showed that our GAT model provided high weights to the features that are associated with the viral entry pathways, such as the Epstein-Barr virus and Influenza A pathways, which are involved in lung cancer development. From our findings, it can be observed that the proposed GAT model leads to a significantly improved prediction of survival by exploiting the strengths of multiple omics datasets and the findings from the enriched pathways. Our GAT model outperforms other state-of-the-art methods that are used for NSCLC prediction. Conclusions: In this study, we developed a new model for the survival prediction of NSCLC using the GAT based on multi-omics data. 
Our model showed outstanding predictive values, and the KEGG analysis of the selected significant features showed that they were implicated in pivotal biological processes underlying pathways such as Influenza A and the Epstein-Barr virus infection, which are linked to lung cancer progression.
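For readers unfamiliar with the concordance index used as the performance measure above, the following is a minimal sketch of Harrell's C-index. It is a plain quadratic-time illustration, not the authors' evaluation code; the function name and the simplified handling of tied survival times (such pairs are skipped) are assumptions made for brevity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when sample i has the shorter observed time
    and experienced the event (was not censored). The pair is concordant when
    the sample with the shorter time has the higher predicted risk. Ties in
    risk count as 0.5; pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored sample cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by predicted risk.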
ABSTRACT
Introduction: Respiratory viral infections (RVIs) are a major global contributor to morbidity and mortality. The susceptibility and outcome of RVIs are strongly age-dependent and show considerable inter-population differences, pointing to genetically and/or environmentally driven developmental variability. The factors determining the age-dependency and shaping the age-related changes of human anti-RVI immunity after birth are still elusive. Methods: We are conducting a prospective birth cohort study aiming at identifying endogenous and environmental factors associated with the susceptibility to RVIs and their impact on cellular and humoral immune responses against the influenza A virus (IAV), respiratory syncytial virus (RSV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The MIAI birth cohort enrolls healthy, full-term neonates born at the University Hospital Würzburg, Germany, with follow-up at four defined time-points during the first year of life. At each study visit, clinical metadata, including diet, lifestyle, sociodemographic information, and physical examinations, are collected along with extensive biomaterial sampling. Biomaterials are used to generate comprehensive, integrated multi-omics datasets including transcriptomic, epigenomic, proteomic, metabolomic and microbiomic data. Discussion: The results are expected to capture a holistic picture of the variability of immune trajectories with a focus on cellular and humoral key players involved in the defense against RVIs and the impact of host and environmental factors thereon. Thereby, MIAI aims at providing insights that allow unraveling molecular mechanisms that can be targeted to promote the development of competent anti-RVI immunity in early life and prevent severe RVIs. Clinical trial registration: https://drks.de/search/de/trial/, identifier DRKS00034278.
Subject(s)
COVID-19, Human Influenza, Respiratory Syncytial Virus Infections, Respiratory Tract Infections, Female, Humans, Infant, Newborn Infant, Male, Birth Cohort, COVID-19/immunology, Germany/epidemiology, Human Influenza/immunology, Prospective Studies, Respiratory Tract Infections/immunology, Respiratory Tract Infections/virology, Respiratory Syncytial Virus Infections/immunology, Research Design
ABSTRACT
Exclusive enteral nutrition (EEN) is a first-line therapy for pediatric Crohn's disease (CD), but protective mechanisms remain unknown. We established a prospective pediatric cohort to characterize the function of fecal microbiota and metabolite changes of treatment-naive CD patients in response to EEN (German Clinical Trials DRKS00013306). Integrated multi-omics analysis identified network clusters from individually variable microbiome profiles, with Lachnospiraceae and medium-chain fatty acids as protective features. Bioorthogonal non-canonical amino acid tagging selectively identified bacterial species in response to medium-chain fatty acids. Metagenomic analysis identified high strain-level dynamics in response to EEN. Functional changes in diet-exposed fecal microbiota were further validated using gut chemostat cultures and microbiota transfer into germ-free Il10-deficient mice. Dietary model conditions induced individual patient-specific strain signatures to prevent or cause inflammatory bowel disease (IBD)-like inflammation in gnotobiotic mice. Hence, we provide evidence that EEN therapy operates through explicit functional changes of temporally and individually variable microbiome profiles.
ABSTRACT
The high heterogeneity within and between breast cancer patients complicates treatment determination and prognosis assessment. Treatment decision-making is influenced by various factors, such as tumor subtype, histological grade, and genotype, necessitating personalized treatment strategies. Prognostic outcomes vary significantly depending on patient-specific conditions. As a critical branch of artificial intelligence, machine learning efficiently handles large datasets and automates decision-making processes. The introduction of machine learning offers new solutions for breast cancer treatment selection and prognosis assessment. In the field of cancer therapy, traditional methods for predicting treatment and survival outcomes often rely on single or few biomarkers, limiting their ability to capture the complexity of biological processes comprehensively. Machine learning analyzes patients' multi-omic data and the intricate patterns of variations during cancer initiation and progression to predict patients' survival and treatment outcomes. Consequently, it facilitates the selection of appropriate therapeutic interventions to implement early intervention and improve treatment efficacy for patients. Here, we first introduce common machine learning methods, and then elaborate on the application of machine learning in the field of survival prediction and prognosis from two aspects: evaluating survival and predicting treatment outcomes for breast cancer patients. The aim is to provide breast cancer patients with precise treatment strategies to improve therapeutic outcomes and quality of life.
Subject(s)
Breast Neoplasms, Machine Learning, Humans, Breast Neoplasms/mortality, Breast Neoplasms/therapy, Breast Neoplasms/genetics, Female, Prognosis, Treatment Outcome, Multiomics
ABSTRACT
Accurately predicting cancer driver genes remains a formidable challenge amidst the burgeoning volume and intricacy of cancer genomic data. In this investigation, we propose HGTDG, an innovative heterogeneous graph transformer framework tailored for precisely predicting cancer driver genes and exploring downstream tasks. A heterogeneous graph construction module is central to the framework, which assembles a gene-protein heterogeneous network leveraging the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. Moreover, our framework introduces a pioneering heterogeneous graph transformer module, harnessing multi-head attention mechanisms for nuanced node embedding. This transformative module proficiently captures distinct representations for both nodes and edges, thereby enriching the model's predictive capacity. Subsequently, the generated node embeddings are seamlessly integrated into a classification module, facilitating the discrimination between driver and non-driver genes. Our experimental findings evince the superiority of HGTDG over existing methodologies, as evidenced by the enhanced performance metrics, including the area under the receiver operating characteristic curves (AUROC) and the area under the precision-recall curves (AUPRC). Furthermore, the downstream analysis utilizing the newly identified cancer driver genes underscores the efficacy and versatility of our proposed framework.
ABSTRACT
BACKGROUND: High-dimensional omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential to improve predictive models. However, the data integration process faces several challenges, including data heterogeneity, determining the priority order in which data blocks should contribute when predictive information is spread across multiple blocks, assessing the flow of information from one omics level to another, and multicollinearity. METHODS: We propose the Priority-Elastic net algorithm, a hierarchical regression method extending Priority-Lasso for the binary logistic regression model by incorporating a priority order for blocks of variables while fitting Elastic-net models sequentially for each block. The fitted values from each step are then used as an offset in the subsequent step. Additionally, we considered the adaptive elastic-net penalty within our priority framework to compare the results. RESULTS: The Priority-Elastic net and Priority-Adaptive Elastic net algorithms were evaluated on a brain tumor dataset available from The Cancer Genome Atlas (TCGA), accounting for transcriptomics, proteomics, and clinical information measured over two glioma types: lower-grade glioma (LGG) and glioblastoma (GBM). CONCLUSION: Our findings suggest that the Priority-Elastic net is a highly advantageous choice for a wide range of applications. It offers moderate computational complexity, flexibility in integrating prior knowledge while introducing a hierarchical modeling perspective, and, importantly, improved stability and accuracy in predictions, making it superior to the other methods discussed. This evolution marks a significant step forward in predictive modeling, offering a sophisticated tool for navigating the complexities of multi-omics datasets in pursuit of precision medicine's ultimate goal: personalized treatment optimization based on a comprehensive array of patient-specific data. 
This framework can be generalized to time-to-event, Cox proportional hazards regression and multicategorical outcomes. A practical implementation of this method is available upon request in R script, complete with an example to facilitate its application.
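The sequential scheme described in the METHODS part (fit one block, carry its fitted linear predictor forward as an offset, fit the next block) can be sketched as follows. This is a simplified illustration and not the authors' R implementation: the elastic-net logistic fit uses a basic proximal gradient loop written for this example, and the function names, the fixed penalty `lam`, the mixing parameter `l1_ratio`, and the iteration count are all assumptions. In practice the penalty would be tuned, for instance by cross-validation.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_enet_logistic(X, y, offset, lam=0.05, l1_ratio=0.5, n_iter=500):
    """Elastic-net logistic regression with a fixed offset (proximal gradient).

    Minimizes mean log-loss of sigmoid(offset + b0 + X @ b) plus
    lam * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2).
    The intercept b0 is not penalized.
    """
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    # Step size from an upper bound on the curvature of the smooth part.
    lipschitz = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - l1_ratio)
    step = 1.0 / lipschitz
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = _sigmoid(offset + b0 + X @ b) - y
        grad_b = X.T @ resid / n + lam * (1 - l1_ratio) * b
        b0 -= step * resid.mean()
        z = b - step * grad_b
        # Soft-thresholding handles the L1 part of the penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam * l1_ratio, 0.0)
    return b0, b


def priority_elastic_net(blocks, y, **kwargs):
    """Fit blocks in priority order; each fit becomes the next block's offset."""
    offset = np.zeros(len(y))
    fits = []
    for X in blocks:  # blocks are assumed to be listed from highest to lowest priority
        b0, b = fit_enet_logistic(X, y, offset, **kwargs)
        offset = offset + b0 + X @ b  # cumulative linear predictor
        fits.append((b0, b))
    return fits, _sigmoid(offset)
```

Because the offset is held fixed while a lower-priority block is fitted, that block can only explain variation the higher-priority blocks left unexplained, which is the hierarchical behavior the abstract describes.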
ABSTRACT
Despite recent advances in chronic obstructive pulmonary disease (COPD) research, few studies have identified the potential therapeutic targets systematically by integrating multiple-omics datasets. This project aimed to develop a systems biology pipeline to identify biologically relevant genes and potential therapeutic targets that could be exploited to discover novel COPD treatments via drug repurposing or de novo drug discovery. A computational method was implemented by integrating multi-omics COPD data from unpaired human samples of more than half a million subjects. The outcomes from genome, transcriptome, proteome, and metabolome COPD studies were included, followed by an in silico interactome and drug-target information analysis. The potential candidate genes were ranked by a distance-based network computational model. Ninety-two genes were identified as COPD signature genes based on their overall proximity to signature genes on all omics levels. They are genes encoding proteins involved in extracellular matrix structural constituent, collagen binding, protease binding, actin-binding proteins, and other functions. Among them, 70 signature genes were determined to be druggable targets. The in silico validation identified that the knockout or over-expression of SPP1, APOA1, CTSD, TIMP1, RXFP1, and SMAD3 genes may drive the cell transcriptomics to a status similar to or contrasting with COPD. While some genes identified in our pipeline have been previously associated with COPD pathology, others represent possible new targets for COPD therapy development. In conclusion, we have identified promising therapeutic targets for COPD. This hypothesis-generating pipeline was supported by unbiased information from available omics datasets and took into consideration disease relevance and development feasibility.
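The idea of ranking candidate genes by their network proximity to signature genes across omics levels can be illustrated with a toy sketch. The abstract does not specify the distance model, so the scoring rule below (mean shortest-path distance to each layer's seed genes, averaged over layers) is an assumption, and the node names are placeholders rather than real genes or interactions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_by_proximity(graph, seeds_by_layer):
    """Rank nodes by average distance to seed genes; lower score means closer.

    graph: dict mapping node -> iterable of neighbors (undirected interactome).
    seeds_by_layer: dict mapping an omics layer name -> list of seed nodes.
    """
    scores = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        layer_means = []
        for seeds in seeds_by_layer.values():
            reachable = [dist[s] for s in seeds if s in dist]
            if reachable:
                layer_means.append(sum(reachable) / len(reachable))
        if layer_means:
            scores[node] = sum(layer_means) / len(layer_means)
    ranking = sorted(scores, key=lambda g: (scores[g], g))
    return ranking, scores


# Toy path graph A - B - C - D with placeholder node names.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_seeds = {"transcriptome": ["A", "B"], "proteome": ["C"]}
ranking, scores = rank_by_proximity(toy_graph, toy_seeds)
```

Averaging within each layer before averaging across layers keeps a layer with many seed genes from dominating the score.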
Subject(s)
Drug Repositioning, Chronic Obstructive Pulmonary Disease, Chronic Obstructive Pulmonary Disease/drug therapy, Chronic Obstructive Pulmonary Disease/genetics, Chronic Obstructive Pulmonary Disease/metabolism, Humans, Drug Repositioning/methods, Transcriptome, Computational Biology/methods, Proteome/metabolism, Smad3 Protein/metabolism, Smad3 Protein/genetics, Genomics/methods, Tissue Inhibitor of Metalloproteinase-1/genetics, Tissue Inhibitor of Metalloproteinase-1/metabolism, G-Protein-Coupled Receptors/genetics, G-Protein-Coupled Receptors/metabolism, Multiomics
ABSTRACT
Intrahepatic cholangiocarcinoma (ICC) is a highly heterogeneous malignancy. The reasons behind the global rise in the incidence of ICC remain unclear, and knowledge of the immune cells within the tumor microenvironment (TME) is limited. In this study, a more comprehensive analysis of multi-omics data was performed using machine learning methods. The study found that, among the infiltrating immune cells of ICC, B cells, macrophages, and T cells exhibit significantly higher immunoactivity than other immune cells. During immune sensing and response, the effect of antigen-presenting cells (APCs) such as B cells and macrophages on activating NK cells was weakened, while their effect on activating T cells became stronger. In addition, four distinct subpopulations, namely BLp, MacrophagesLp, BHn, and THn, were identified among the infiltrating immune cells, together with their corresponding immune-related marker genes. The immune sensing and response model of ICC has been revised and reconstructed based on our current understanding. This study not only helps to deepen the understanding of the heterogeneity of infiltrating immune cells in ICC but may also provide valuable insights into the diagnosis, evaluation, treatment, and prognosis of ICC.
ABSTRACT
Multiple types of omics data contain a wealth of biomedical information that reflects different aspects of clinical samples. Integrated multi-omics analysis is more likely to lead to accurate clinical decisions. Existing cancer diagnostic methods based on multi-omics data integration mainly focus on the classification accuracy of the model, while neglecting the interpretability of the internal mechanism and the reliability of the results, which are crucial in specific domains such as precision medicine and the life sciences. To overcome this limitation, we propose a trustworthy multi-omics dynamic learning framework (TMODINET) for cancer diagnosis. The framework employs multi-omics adaptive dynamic learning to process each sample and provide patient-centered, personalized diagnosis by using self-attentional learning of features and modalities. To characterize the correlation between samples well, we introduce a graph dynamic learning method that can adaptively adjust the graph structure according to the specific classification results for graph convolutional network (GCN) learning. Moreover, we utilize an uncertainty mechanism that employs the Dirichlet distribution and Dempster-Shafer theory to estimate uncertainty and integrate multi-omics data at the decision level, ensuring trustworthy cancer diagnosis. Extensive experiments are conducted on four real-world multimodal medical datasets. Compared with state-of-the-art methods, the superior performance and trustworthiness of our proposed algorithm are clearly validated. Our model has great potential for clinical diagnosis.
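To make the decision-level uncertainty mechanism more concrete, the sketch below shows one common way of pairing a Dirichlet distribution with Dempster-Shafer combination: non-negative per-class evidence is mapped to belief masses plus an explicit uncertainty mass, and two such opinions are merged with a reduced form of Dempster's rule. This is a generic formulation used in evidential multi-view classifiers; whether TMODINET uses exactly these formulas is an assumption, and the function names are illustrative.

```python
import numpy as np


def opinion_from_evidence(evidence):
    """Map non-negative class evidence to belief masses and an uncertainty mass.

    With Dirichlet parameters alpha = evidence + 1 and strength S = sum(alpha),
    belief_k = evidence_k / S and uncertainty = K / S, so that
    sum(belief) + uncertainty == 1.
    """
    e = np.asarray(evidence, dtype=float)
    num_classes = e.size
    strength = (e + 1.0).sum()
    return e / strength, num_classes / strength


def combine(b1, u1, b2, u2):
    """Reduced Dempster's rule for two opinions over the same classes."""
    # Conflict: mass the two views assign to different classes.
    conflict = np.outer(b1, b2).sum() - (b1 * b2).sum()
    scale = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty
```

Under this rule, a view with no evidence (uncertainty 1) leaves the other view's opinion unchanged, while two confident views that agree yield a combined opinion with lower uncertainty than either alone.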
ABSTRACT
BACKGROUND: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions. METHODS: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives. RESULTS: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures. 
CONCLUSIONS: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
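The figure of 31 combinations follows from counting the non-empty subsets of five data types (2^5 - 1 = 31). A short sketch enumerating them is given below; the short labels in `DATA_TYPES` (for example `CNV` for copy number variation) are shorthand chosen for this example.

```python
from itertools import combinations

# Shorthand labels for the five data types named in the abstract.
DATA_TYPES = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def non_empty_subsets(items):
    """All combinations containing at least one item, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


subsets = non_empty_subsets(DATA_TYPES)
```

Each tuple in `subsets` is one candidate input configuration for a benchmark of this kind, from the five single-type models up to the model using all five types.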
Subject(s)
Benchmarking, Genomics, Neoplasms, Humans, Neoplasms/genetics, Neoplasms/mortality, Survival Analysis, Prognosis, Multiomics
ABSTRACT
Transcriptional regulatory networks (TRNs) associated with recombinant protein (rProt) synthesis in Yarrowia lipolytica are still under-described. Yet, it is foreseen that skillful manipulation of TRNs would enable global fine-tuning of the host strain's metabolism towards a high-level-producing phenotype. Our previous studies investigated the transcriptomes of Y. lipolytica strains overproducing biochemically different rProts and the functional impact of transcription factor (TF) overexpression (OE) on rProt synthesis capacity in this species. Hence, much knowledge has been accumulated and deposited in public repositories. In this study, we combined both biological datasets and enriched them with further experimental data to investigate the interplay between TFs and rProt synthesis in Y. lipolytica at transcriptional and functional levels. Technically, the RNAseq datasets were extracted and re-analyzed for the TFs' expression profiles. Of the 140 TFs in Y. lipolytica, 87 TF-encoding genes were significantly deregulated in at least one of the strains. The expression profiles were juxtaposed against the rProt amounts from 125 strains co-overexpressing a TF and an rProt. In addition, several strains bearing knock-outs (KOs) in the TF loci were analyzed to gain more insight into their actual involvement in rProt synthesis. Different profiles of the TFs' transcriptional deregulation and of the impact of their OE or KO on rProt synthesis were observed, and new engineering targets were identified.
Subject(s)
Fungal Gene Expression Regulation, Recombinant Proteins, Transcription Factors, Yarrowia, Yarrowia/genetics, Yarrowia/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Recombinant Proteins/biosynthesis, Gene Regulatory Networks, Fungal Proteins/genetics, Fungal Proteins/metabolism, Transcriptome, Gene Expression Profiling, Genetic Transcription
ABSTRACT
BACKGROUND: Accurate identification of cancer subtypes is crucial for disease prognosis evaluation and personalized patient management. Recent advances in computational methods have demonstrated that multi-omics data provide valuable insights into tumor molecular subtyping. However, the high dimensionality and small sample size of the data may result in ambiguous and overlapping cancer subtypes during clustering. In this study, we propose a novel contrastive-learning-based approach to address this issue. The proposed end-to-end deep learning method extracts crucial information from the multi-omics features by self-supervised learning for patient clustering. RESULTS: By applying our method to nine public cancer datasets, we demonstrated superior performance compared to existing methods in separating patients with different survival outcomes (p < 0.05). To further evaluate the impact of various omics data on cancer survival, we developed an XGBoost classification model and found that mRNA had the highest importance score, followed by DNA methylation and miRNA. In the presented case study, our method successfully clustered subtypes and identified 14 cancer-related genes, of which 12 (85.7%) were validated through literature review. CONCLUSIONS: Our findings demonstrate that our method is capable of identifying cancer subtypes that are both statistically and biologically significant. The code for COLCS is available at: https://github.com/Mercuriiio/COLCS .
Subject(s)
Deep Learning, Neoplasms, Humans, Neoplasms/genetics, Neoplasms/classification, DNA Methylation, Neural Networks (Computer), Computational Biology/methods, MicroRNAs/genetics, Cluster Analysis, Multiomics
ABSTRACT
Microbial electrosynthesis for CO2 utilization (MESCU), which produces valuable chemicals with high energy density, has garnered attention due to its long-term stability and high coulombic efficiency. Data-driven approaches offer a promising avenue by leveraging existing data to uncover underlying patterns. This comprehensive review examines the potential of data-driven approaches to enhance high-value conversion of CO2 via MESCU. First, critical challenges in advancing MESCU are identified, including reactor configuration, cathode design, and microbial analysis. Subsequently, the potential of data-driven approaches to tackle these challenges is discussed, encompassing the identification of pivotal parameters governing reactor setup and cathode design, alongside the deciphering of omics data derived from microbial communities. Finally, future directions for data-driven approaches in supporting the application of MESCU are addressed. This review offers guidance and theoretical support for future data-driven applications to accelerate MESCU research and potential industrialization.
Subject(s)
Bioreactors, Carbon Dioxide, Carbon Dioxide/metabolism, Electrodes, Bioelectric Energy Sources
ABSTRACT
The integration of multi-omics data offers a robust approach to understanding the complexity of diseases by combining information from various biological levels, such as genomics, transcriptomics, proteomics, and metabolomics. This integrated approach is essential for a comprehensive understanding of disease mechanisms and for developing more effective diagnostic and therapeutic strategies. Nevertheless, most current methodologies fail to effectively extract both shared and specific representations from omics data. This study introduces MOSDNET, a multi-omics classification framework that effectively extracts shared and specific representations. This framework leverages Simplified Multi-view Deep Discriminant Representation Learning (S-MDDR) and Dynamic Edge GCN (DEGCN) to enhance the accuracy and efficiency of disease classification. Initially, MOSDNET utilizes S-MDDR to establish similarity and orthogonal constraints for extracting these representations, which are then concatenated to integrate the multi-omics data. Subsequently, MOSDNET constructs a comprehensive view of the sample data by employing patient similarity networks. By incorporating similarity networks into DEGCN, MOSDNET learns intricate network structures and node representations, which enables superior classification outcomes. MOSDNET is trained through a multitask learning approach, effectively leveraging the complementary knowledge from both the data integration and classification components. After conducting extensive comparative experiments, we have conclusively demonstrated that MOSDNET outperforms leading state-of-the-art multi-omics classification models in terms of classification accuracy. Simultaneously, we employ MOSDNET to identify pivotal biomarkers within the multi-omics data, providing insights into disease etiology and progression.
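A patient similarity network of the kind fed into a graph convolutional model can be built in several ways. The sketch below uses cosine similarity with k-nearest-neighbor sparsification; the abstract does not state which similarity measure or neighborhood size MOSDNET uses, so both choices, as well as the function name, are assumptions made for illustration.

```python
import numpy as np


def patient_similarity_network(X, k=2):
    """Symmetric k-nearest-neighbor graph from cosine similarity.

    X: array of shape (n_patients, n_features), e.g. integrated omics features.
    Returns a boolean adjacency matrix with no self-loops.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)     # guard against zero vectors
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)             # a patient is not its own neighbor
    n = X.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar patients
    adjacency = np.zeros((n, n), dtype=bool)
    adjacency[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return adjacency | adjacency.T             # keep an edge if either side selected it
```

Symmetrizing with a logical OR keeps the graph undirected, which is the usual input format for graph convolution layers.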
Subject(s)
Metabolomics, Humans, Metabolomics/methods, Genomics/methods, Proteomics/methods, Deep Learning, Neural Networks (Computer), Multiomics
ABSTRACT
Colorectal cancer is one of the most common cancers worldwide and poses a severe threat to human health. SMAD4 belongs to the dwarfin/SMAD family and plays a crucial role in the TGF-β and BMP signaling pathways. As the molecular characteristics of colon cancer patients with SMAD4 mutations remain unclear, we integrated multi-omics data from patients with SMAD4 mutations to profile the molecular characteristics associated with SMAD4 mutation. Missense mutation is the most common type of SMAD4 mutation. Patients with SMAD4 mutation had worse survival. Tumor tissues from patients carrying the SMAD4 mutation showed a reduction in various immune cells, such as CD4+ memory T cells and memory B cells. Many differential genes were identified relative to the SMAD4 mutation-free group, and these were significantly enriched in tumor- and immune-related signaling pathways. In addition, the mutant group had different drug sensitivities than the non-mutant group.
ABSTRACT
Synthetic data generation has emerged as a promising solution to the challenges posed by data scarcity and privacy concerns, as well as to the need for training artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases with a focus on tabular, imaging, radiomics, time-series, and omics data. Studies involving multi-modal synthetic data generation were also explored. The type of method used for synthetic data generation was identified in each study and categorized as statistical, probabilistic, machine learning, or deep learning. Emphasis was given to the programming languages used to implement each method. Our evaluation revealed that the majority of the studies utilize synthetic data generators to: (i) reduce the cost and time required for clinical trials for rare diseases and conditions, (ii) enhance the predictive power of AI models in personalized medicine, (iii) ensure the delivery of fair treatment recommendations across diverse patient populations, and (iv) enable researchers to access high-quality, representative multimodal datasets without exposing sensitive patient information, among others. We underline the wide use of deep learning-based synthetic data generators in 72.6% of the included studies, with 75.3% of the generators being implemented in Python. Finally, thorough documentation of open-source repositories is provided to accelerate research in the field.
ABSTRACT
Introduction: Developing effective breast cancer survival prediction models is critical to breast cancer prognosis. With the widespread use of next-generation sequencing technologies, numerous studies have focused on survival prediction. However, previous methods predominantly relied on single-omics data, and survival prediction using multi-omics data remains a significant challenge. Methods: In this study, considering the similarity of patients and the relevance of multi-omics data, we propose a novel multi-omics stacked fusion network (MSFN) based on a stacking strategy to predict the survival of breast cancer patients. MSFN first constructs a patient similarity network (PSN) and employs a residual graph neural network (ResGCN) to obtain correlative prognostic information from the PSN. Simultaneously, it employs convolutional neural networks (CNNs) to obtain specific prognostic information from the multi-omics data. Finally, MSFN stacks the prognostic information from these networks and feeds it into AdaboostRF for survival prediction. Results: Experimental results demonstrated that our method outperformed several state-of-the-art methods, and its predictions were biologically validated by Kaplan-Meier analysis and t-SNE.
ABSTRACT
Blastocystis is the most prevalent intestinal eukaryotic microorganism with significant impacts on both human and animal health. Despite extensive research, its pathogenicity remains controversial. The COST Action CA21105, "Blastocystis under One Health" (OneHealthBlastocystis), aims to bridge gaps in our understanding by fostering a multidisciplinary network. This initiative focuses on developing standardised diagnostic methodologies, establishing a comprehensive subtype and microbiome databank, and promoting capacity building through education and collaboration. The Action is structured into five working groups, each targeting specific aspects of Blastocystis research, including epidemiology, diagnostics, 'omics technologies, in vivo and in vitro investigations, and data dissemination. By integrating advances across medical, veterinary, public, and environmental health, this initiative seeks to harmonise diagnostics, improve public health policies, and foster innovative research, ultimately enhancing our understanding of Blastocystis and its role in health and disease. This collaborative effort is expected to lead to significant advancements and practical applications, benefiting the scientific community and public health.
Blastocystis is a common microorganism found in the intestines of humans and animals. Its role in causing disease is still debated among scientists. The "Blastocystis under One Health" initiative aims to unite experts from human medicine, veterinary science, and environmental science to better understand this microorganism and its health effects. The project focuses on improving diagnostic methods, creating a comprehensive database of Blastocystis samples, and analysing its genetic and molecular makeup. Researchers will also study how Blastocystis interacts with other gut microbes and impacts gut health. Additionally, the initiative aims to educate healthcare professionals and the public about Blastocystis. By working together, scientists hope to develop better ways to diagnose, treat (if necessary), and/or prevent Blastocystis infections, ultimately protecting both human and animal health and enhancing our understanding of this widespread microorganism.