Results 1-20 of 270
1.
Med Image Anal ; 97: 103252, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38963973

ABSTRACT

Histopathology image-based survival prediction aims to provide a precise assessment of cancer prognosis and can inform personalized treatment decision-making to improve patient outcomes. However, existing methods cannot automatically model the complex correlations among the numerous, morphologically diverse patches in each whole slide image (WSI), which limits their ability to understand and infer patient status. To address this, we propose a novel deep learning framework, termed dual-stream multi-dependency graph neural network (DM-GNN), for precise survival analysis of cancer patients. Specifically, DM-GNN comprises a feature-updating branch and a global-analysis branch, which model each WSI as two graphs based on morphological affinity and global co-activating dependencies. Because these two dependencies depict each WSI from distinct but complementary perspectives, the two branches jointly achieve multi-view modeling of the complex correlations between patches. Moreover, DM-GNN makes fuller use of the dependency information from graph construction by introducing an affinity-guided attention recalibration module as the readout function. This module offers increased robustness against feature perturbation, yielding more reliable and stable predictions. Extensive benchmarking experiments on five TCGA datasets demonstrate that DM-GNN outperforms other state-of-the-art methods and offers interpretable prediction insights based on the morphological depiction of high-attention patches. Overall, DM-GNN represents a powerful auxiliary tool for personalized cancer prognosis from histopathology images and has great potential to assist clinicians in making personalized treatment decisions and improving patient outcomes.
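To make the idea of a morphological-affinity graph more concrete, the sketch below links each patch to its k most similar patches by cosine similarity and then pools patch features into a slide-level vector with softmax attention. It is a generic illustration under stated assumptions (tiny hand-written feature vectors and a hypothetical fixed scoring vector `w`), not the DM-GNN architecture, whose branches and recalibration module are learned.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_affinity_graph(features, k):
    """Adjacency list linking each patch to its k most similar other patches."""
    graph = {}
    for i, fi in enumerate(features):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        sims.sort(reverse=True)  # highest similarity first
        graph[i] = [j for _, j in sims[:k]]
    return graph

def attention_readout(features, w):
    """Softmax-attention pooling; `w` is a hypothetical scoring vector (learned in practice)."""
    scores = [sum(x * y for x, y in zip(f, w)) for f in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(weights[i] * features[i][d] for i in range(len(features)))
              for d in range(dim)]
    return pooled, weights

# Four hypothetical 2-D patch embeddings forming two morphological groups
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_affinity_graph(patches, k=1)
pooled, weights = attention_readout(patches, w=[1.0, 0.0])
```

With k=1 the two groups link only to each other, and the attention weights favour the patches that score highest under `w`.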

2.
iScience ; 27(7): 110183, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38989460

ABSTRACT

Current studies in early cancer detection based on liquid biopsy data often rely on off-the-shelf models and face challenges with heterogeneous data, as well as with manually designed data preprocessing pipelines with different parameter settings. To address these challenges, we present AutoCancer, an automated, multimodal, and interpretable transformer-based framework. This framework integrates feature selection, neural architecture search, and hyperparameter optimization into a unified optimization problem solved with Bayesian optimization. Comprehensive experiments demonstrate that AutoCancer achieves accurate performance for specific cancer types and in pan-cancer analysis, outperforming existing methods across three cohorts. We further demonstrate the interpretability of AutoCancer by identifying key gene mutations associated with non-small cell lung cancer, pinpointing crucial factors at different stages and subtypes. The robustness of AutoCancer, coupled with its strong interpretability, underscores its potential for clinical applications in early cancer detection.

3.
Int J Mol Sci ; 25(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-39000124

ABSTRACT

Over the years, comprehensive explorations of the model organisms Caenorhabditis elegans (elegant worm) and Drosophila melanogaster (vinegar fly) have contributed substantially to our understanding of complex biological processes and pathways in multicellular organisms generally. Extensive functional genomic-phenomic, genomic, transcriptomic, and proteomic data sets have enabled the discovery and characterisation of genes that are crucial for life, called 'essential genes'. Recently, we investigated the feasibility of inferring essential genes from such data sets using advanced bioinformatics and showed that a machine learning (ML)-based workflow could be used to extract or engineer features from DNA, RNA, protein, and/or cellular data/information to underpin the reliable prediction of essential genes both within and between C. elegans and D. melanogaster. As these are two distantly related species within the Ecdysozoa, we proposed that this ML approach would be particularly well suited to species within the same phylum or evolutionary clade. In the present study, we cross-predicted essential genes within the phylum Nematoda (evolutionary clade V), between C. elegans and the pathogenic parasitic nematode Haemonchus contortus, and then ranked and prioritised the H. contortus proteins encoded by these genes as intervention (e.g., drug) target candidates. Using strong, validated predictors, we inferred essential genes of H. contortus that are involved predominantly in crucial biological processes/pathways, including ribosome biogenesis, translation, RNA binding/processing, and signalling, and that are highly transcribed in the germline, somatic gonad precursors, sex myoblasts, vulva cell precursors, various nerve cells, glia, or hypodermis. The findings indicate that this in silico workflow provides a promising avenue to identify and prioritise panels/groups of drug target candidates in parasitic nematodes for experimental validation in vitro and/or in vivo.


Subject(s)
Caenorhabditis elegans , Genes, Essential , Haemonchus , Machine Learning , Animals , Haemonchus/genetics , Caenorhabditis elegans/genetics , Helminth Proteins/genetics , Helminth Proteins/metabolism , Computational Biology/methods , Drosophila melanogaster/genetics
4.
Diabetes Care ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39012781

ABSTRACT

OBJECTIVE: To evaluate associations of wildfire fine particulate matter (PM2.5) with diabetes across multiple countries and territories. RESEARCH DESIGN AND METHODS: We collected data on 3,612,135 diabetes hospitalizations from 1,008 locations in Australia, Brazil, Canada, Chile, New Zealand, Thailand, and Taiwan during 2000-2019. Daily wildfire-specific PM2.5 levels were estimated through chemical transport models and machine-learning calibration. Quasi-Poisson regression with distributed lag nonlinear models and random-effects meta-analysis were applied to estimate associations between wildfire-specific PM2.5 and diabetes hospitalization. Subgroup analyses were conducted by age, sex, location income level, and country or territory. Diabetes hospitalizations attributable to wildfire-specific PM2.5 and nonwildfire PM2.5 were compared. RESULTS: Each 10 µg/m3 increase in wildfire-specific PM2.5 levels over the current day and previous 3 days was associated with relative risks (95% CI) of 1.017 (1.011-1.022), 1.023 (1.011-1.035), 1.023 (1.015-1.032), 0.962 (0.823-1.032), 1.033 (1.001-1.066), and 1.013 (1.004-1.022) for all-cause, type 1, type 2, malnutrition-related, other specified, and unspecified diabetes hospitalization, respectively. Stronger associations were observed for all-cause, type 1, and type 2 diabetes in Thailand, Australia, and Brazil; unspecified diabetes in New Zealand; and type 2 diabetes in high-income locations. Wildfire-specific PM2.5 accounted for 0.67% (95% CI 0.16-1.18%) of all-cause and 1.02% (0.20-1.81%) of type 2 diabetes hospitalizations. Compared with nonwildfire PM2.5, wildfire-specific PM2.5 posed greater risks of all-cause, type 1, and type 2 diabetes and was responsible for 38.7% of PM2.5-related diabetes hospitalizations. CONCLUSIONS: We show the relatively underappreciated links between diabetes and wildfire air pollution, which can account for a nonnegligible proportion of PM2.5-related diabetes hospitalizations.
Precision prevention and mitigation should be developed for those in advantaged communities and in Thailand, Australia, and Brazil.
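Relative risks like those above are reported per 10 µg/m3 and, under a log-linear exposure-response, can be rescaled to other increments; a simple attributable fraction among the exposed follows from the RR. The sketch below shows this standard arithmetic. It assumes log-linearity (the study itself used distributed lag nonlinear models, and its reported attributable fractions aggregate over daily exposures), and the 25 µg/m3 increment is an arbitrary illustration.

```python
import math

def rescale_rr(rr_per_10, delta):
    """Rescale an RR reported per 10 ug/m3 to an increment of `delta` ug/m3,
    assuming log(RR) is linear in concentration: RR_delta = RR_10 ** (delta / 10)."""
    beta = math.log(rr_per_10) / 10.0  # log-RR per 1 ug/m3
    return math.exp(beta * delta)

def attributable_fraction(rr):
    """Attributable fraction among the exposed: (RR - 1) / RR."""
    return (rr - 1.0) / rr

rr_all_cause = 1.017  # all-cause RR per 10 ug/m3 reported above
rr_25 = rescale_rr(rr_all_cause, 25.0)
af_25 = attributable_fraction(rr_25)
```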

5.
Cell ; 187(13): 3357-3372.e19, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38866018

ABSTRACT

Microbial hydrogen (H2) cycling underpins the diversity and functionality of diverse anoxic ecosystems. Among the three evolutionarily distinct hydrogenase superfamilies responsible, [FeFe] hydrogenases were thought to be restricted to bacteria and eukaryotes. Here, we show that anaerobic archaea encode diverse, active, and ancient lineages of [FeFe] hydrogenases through combining analysis of existing and new genomes with extensive biochemical experiments. [FeFe] hydrogenases are encoded by genomes of nine archaeal phyla and expressed by H2-producing Asgard archaeon cultures. We report an ultraminimal hydrogenase in DPANN archaea that binds the catalytic H-cluster and produces H2. Moreover, we identify and characterize remarkable hybrid complexes formed through the fusion of [FeFe] and [NiFe] hydrogenases in ten other archaeal orders. Phylogenetic analysis and structural modeling suggest a deep evolutionary history of hybrid hydrogenases. These findings reveal new metabolic adaptations of archaea, streamlined H2 catalysts for biotechnological development, and a surprisingly intertwined evolutionary history between the two major H2-metabolizing enzymes.


Subject(s)
Archaea , Hydrogen , Hydrogenase , Phylogeny , Archaea/genetics , Archaea/enzymology , Archaeal Proteins/metabolism , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Genome, Archaeal , Hydrogen/metabolism , Hydrogenase/metabolism , Hydrogenase/genetics , Hydrogenase/chemistry , Iron-Sulfur Proteins/metabolism , Iron-Sulfur Proteins/genetics , Iron-Sulfur Proteins/chemistry , Models, Molecular , Protein Structure, Tertiary
6.
Article in English | MEDLINE | ID: mdl-38913512

ABSTRACT

RNA N6-methyladenosine (m6A) is a prevalent and abundant type of RNA modification that exerts significant influence on diverse biological processes. To date, numerous computational approaches have been developed for predicting methylation, most of which ignore the correlations between different encoding strategies and fail to explore the adaptability of various attention mechanisms for methylation identification. To address these issues, we propose an innovative framework for predicting RNA m6A modification sites, termed BLAM6A-Merge. Specifically, it uses a multimodal feature fusion strategy to combine the classification results obtained from four feature encodings and the Blastn tool. In addition, different attention mechanisms are applied to selected features after a screening step to extract higher-level features. Extensive experiments on 12 benchmark datasets demonstrated that BLAM6A-Merge achieved superior performance (average AUC: 0.849 for the full transcript mode and 0.784 for the mature mRNA mode). Notably, this is the first use of the Blastn tool for identifying methylation sites. The data and code can be accessed at https://github.com/DoraemonXia/BLAM6A-Merge.
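The fusion step combines the outputs of several base predictors. One common way to do this is weighted soft voting over per-model probabilities, sketched below; the model names, equal weights and 0.5 threshold are hypothetical, and BLAM6A-Merge's actual fusion rule may differ.

```python
def soft_vote(prob_by_model, weights=None, threshold=0.5):
    """Fuse per-model positive-class probabilities by weighted averaging."""
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total_w = sum(weights[n] for n in names)
    n_sites = len(prob_by_model[names[0]])
    fused = [sum(weights[n] * prob_by_model[n][i] for n in names) / total_w
             for i in range(n_sites)]
    labels = [int(p >= threshold) for p in fused]  # 1 = predicted m6A site
    return fused, labels

# Hypothetical probabilities for three candidate sites from four encodings plus a Blastn-based score
probs = {
    "onehot": [0.9, 0.2, 0.6],
    "ncp":    [0.8, 0.3, 0.4],
    "eiip":   [0.7, 0.1, 0.5],
    "kmer":   [0.6, 0.4, 0.1],
    "blastn": [1.0, 0.0, 0.7],
}
fused, labels = soft_vote(probs)
```

Soft voting keeps each model's confidence rather than just its hard label, which is why a single strongly confident model can shift a borderline site.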

7.
Int J Epidemiol ; 53(3)2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38725299

ABSTRACT

BACKGROUND: Model-estimated air pollution exposure products have been widely used in epidemiological studies to assess the health risks of particulate matter with diameters of ≤2.5 µm (PM2.5). However, few studies have assessed the disparities in health effects between model-estimated and station-observed PM2.5 exposures. METHODS: We collected daily all-cause, respiratory and cardiovascular mortality data in 347 cities across 15 countries and regions worldwide based on the Multi-City Multi-Country collaborative research network. The station-observed PM2.5 data were obtained from official monitoring stations. The model-estimated global PM2.5 product was developed using a machine-learning approach. The associations between daily exposure to PM2.5 and mortality were evaluated using a two-stage analytical approach. RESULTS: We included 15.8 million all-cause, 1.5 million respiratory and 4.5 million cardiovascular deaths from 2000 to 2018. Short-term exposure to PM2.5 was associated with a relative risk increase (RRI) of mortality from both station-observed and model-estimated exposures. Every 10-µg/m3 increase in the 2-day moving average PM2.5 was associated with overall RRIs of 0.67% (95% CI: 0.49 to 0.85), 0.68% (95% CI: -0.03 to 1.39) and 0.45% (95% CI: 0.08 to 0.82) for all-cause, respiratory, and cardiovascular mortality based on station-observed PM2.5 and RRIs of 0.87% (95% CI: 0.68 to 1.06), 0.81% (95% CI: 0.08 to 1.55) and 0.71% (95% CI: 0.32 to 1.09) based on model-estimated exposure, respectively. CONCLUSIONS: Mortality risks associated with daily PM2.5 exposure were consistent for both station-observed and model-estimated exposures, supporting the reliability and potential applicability of the global PM2.5 product in epidemiological studies.
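In two-stage designs of this kind, city-specific log relative risks from the first stage are typically pooled in the second stage with a random-effects meta-analysis. The sketch below implements the DerSimonian-Laird estimator as an illustration; the abstract does not say which second-stage model this study used, and the input values are hypothetical.

```python
import math

def dersimonian_laird(log_rr, se):
    """Pool city-specific log-RRs with the DerSimonian-Laird random-effects model.
    Returns (pooled RR, 95% CI lower, 95% CI upper, tau^2)."""
    w = [1.0 / s ** 2 for s in se]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-city variance, truncated at zero
    w_re = [1.0 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            tau2)

# Hypothetical city-specific log-RRs per 10 ug/m3 and their standard errors
log_rr = [math.log(1.004), math.log(1.009), math.log(1.006), math.log(1.012)]
se = [0.002, 0.003, 0.002, 0.004]
rr, lo, hi, tau2 = dersimonian_laird(log_rr, se)
rri_percent = (rr - 1.0) * 100.0  # relative risk increase, the scale reported above
```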


Subject(s)
Air Pollutants , Air Pollution , Cardiovascular Diseases , Cities , Environmental Exposure , Particulate Matter , Humans , Particulate Matter/adverse effects , Particulate Matter/analysis , Cardiovascular Diseases/mortality , Cities/epidemiology , Environmental Exposure/adverse effects , Air Pollution/adverse effects , Air Pollution/analysis , Air Pollutants/adverse effects , Air Pollutants/analysis , Respiratory Tract Diseases/mortality , Male , Mortality/trends , Female , Middle Aged , Aged , Environmental Monitoring/methods , Adult , Machine Learning
8.
BMC Med ; 22(1): 188, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38715068

ABSTRACT

BACKGROUND: Floods are the most frequent weather-related disaster, causing significant health impacts worldwide. Few studies have examined the long-term consequences of flooding exposure. METHODS: Flood data were retrieved from the Dartmouth Flood Observatory and linked with health data from 499,487 UK Biobank participants. To calculate the annual cumulative flooding exposure, we multiplied the duration and severity of each flood event and then summed these values for each year. We conducted a nested case-control analysis to evaluate the long-term effect of flooding exposure on all-cause and cause-specific mortality. Each case was matched with eight controls. Flooding exposure was modelled using a distributed lag non-linear model to capture its nonlinear and lagged effects. RESULTS: The risk of all-cause mortality increased by 6.7% (odds ratio (OR): 1.067, 95% confidence interval (CI): 1.063-1.071) for every unit increase in flood index after adjustment for confounders. The mortality risk from neurological and mental diseases was negligible in the current year but strongest at lag years 3 and 4. By contrast, the risk of mortality from suicide was strongest in the current year (OR: 1.018, 95% CI: 1.008-1.028) and attenuated towards lag year 5. Participants with higher levels of education and household income had a higher estimated risk of death from most causes, whereas the risk of suicide-related mortality was higher among participants who were obese, had lower household income, engaged in less physical activity, were non-moderate alcohol consumers, or lived in more deprived areas. CONCLUSIONS: Long-term exposure to floods is associated with an increased risk of mortality. The health consequences of flooding exposure may vary across different periods after the event, with different profiles of vulnerable populations identified for different causes of death.
These findings contribute to a better understanding of the long-term impacts of flooding exposure.
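The annual cumulative flooding exposure described above is a per-year sum of duration-by-severity products. A minimal sketch, assuming each event has already been matched to a participant's area and is recorded as a hypothetical (year, duration in days, severity) tuple:

```python
from collections import defaultdict

def annual_flood_index(events):
    """Sum duration x severity over all flood events in each year.
    `events` is an iterable of (year, duration_days, severity) tuples."""
    index = defaultdict(float)
    for year, duration_days, severity in events:
        index[year] += duration_days * severity
    return dict(index)

# Hypothetical events affecting one participant's area
events = [
    (2008, 10, 1.0),
    (2008, 4, 2.0),
    (2009, 7, 1.5),
]
print(annual_flood_index(events))  # {2008: 18.0, 2009: 10.5}
```

The resulting yearly series is what a distributed lag model would then take as its exposure history.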


Subject(s)
Floods , Humans , Floods/mortality , Case-Control Studies , United Kingdom/epidemiology , Male , Female , Aged , Middle Aged , Adult , Cause of Death , Risk Factors
9.
Cell Genom ; 4(6): 100565, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38781966

ABSTRACT

Spatially resolved transcriptomics (SRT) technologies have revolutionized the study of tissue organization. We introduce a graph convolutional network with an attention and positive emphasis mechanism, termed BINARY, relying exclusively on binarized SRT data to accurately delineate spatial domains. BINARY outperforms existing methods across various SRT data types while using significantly less input information. Our study suggests that precise gene expression quantification may not always be essential, inspiring further exploration of the broader applications of spatially resolved binarized gene expression data.
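BINARY takes binarized expression as input. As a hedged sketch of what such input might look like, the code below turns a spot-by-gene count matrix into presence/absence values (count > 0 is an assumed threshold) and builds a spatial neighbour graph from spot coordinates within a radius; BINARY's graph convolution, attention and positive-emphasis layers are not reproduced.

```python
import math

def binarize(counts, threshold=0):
    """Map each count to 1 if it exceeds `threshold`, else 0."""
    return [[1 if c > threshold else 0 for c in row] for row in counts]

def radius_graph(coords, radius):
    """Connect spots whose Euclidean distance is at most `radius`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                edges.append((i, j))
    return edges

counts = [[0, 3, 1], [5, 0, 0], [0, 0, 2]]  # hypothetical 3 spots x 3 genes
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(binarize(counts))           # [[0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(radius_graph(coords, 1.5))  # [(0, 1)]
```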


Subject(s)
Gene Expression Profiling , Humans , Gene Expression Profiling/methods , Transcriptome/genetics , Algorithms
10.
Article in English | MEDLINE | ID: mdl-38607721

ABSTRACT

N4-acetylcytidine (ac4C) is a post-transcriptional mRNA modification that is critical for the stability and regulation of mRNA translation. In the past few years, numerous approaches employing convolutional neural networks (CNN) and Transformer have been proposed for the identification of ac4C sites, each capturing distinct characteristics. CNN-based methods excel at extracting local features and positional information, whereas Transformer-based ones stand out in establishing long-range dependencies and generating global representations. Given the importance of both local and global features for identifying ac4C sites in mRNA, we propose a novel method, termed TransC-ac4C, which combines CNN and Transformer to enhance feature extraction and improve identification accuracy. Five different feature encoding strategies (One-hot, NCP, ND, EIIP, and K-mer) are employed to generate mRNA sequence representations, thereby embedding both sequence attributes and the physical and chemical properties of the sequences. To strengthen the relevance of features, we construct a novel feature fusion method. First, a CNN processes each of the five single features, and the outputs are concatenated and fed to the Transformer layer. Thus, the CNN extracts local features and the Transformer subsequently establishes global long-range dependencies among the extracted features. We use 5-fold cross-validation to evaluate the model, and the evaluation metrics improve significantly. The prediction accuracy on the two datasets reaches up to 81.42%.
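Each of the five encodings named above maps an RNA sequence to numbers in a simple way. The sketch below uses the NCP and EIIP values commonly given in the sequence-encoding literature (with U in place of T for mRNA); the exact variants used by TransC-ac4C, such as the value of k for K-mer, are assumptions.

```python
from itertools import product

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}
# Nucleotide chemical property: (ring structure, hydrogen bond, chemical functionality)
NCP = {"A": [1, 1, 1], "C": [0, 1, 0], "G": [1, 0, 0], "U": [0, 0, 1]}
# Electron-ion interaction pseudopotential values
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}

def one_hot(seq):
    return [ONE_HOT[b] for b in seq]

def ncp(seq):
    return [NCP[b] for b in seq]

def nucleotide_density(seq):
    """ND: frequency of the current nucleotide within the prefix seq[:i]."""
    counts, out = {}, []
    for i, b in enumerate(seq, start=1):
        counts[b] = counts.get(b, 0) + 1
        out.append(counts[b] / i)
    return out

def eiip(seq):
    return [EIIP[b] for b in seq]

def kmer_frequencies(seq, k):
    """Normalised counts of all 4**k possible k-mers over overlapping windows."""
    n = len(seq) - k + 1
    if n <= 0:
        raise ValueError("sequence shorter than k")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(n):
        counts[seq[i:i + k]] += 1
    return [counts[m] / n for m in kmers]

seq = "ACGUA"  # hypothetical short mRNA fragment
encoded = {
    "one_hot": one_hot(seq),
    "ncp": ncp(seq),
    "nd": nucleotide_density(seq),
    "eiip": eiip(seq),
    "kmer_2": kmer_frequencies(seq, 2),
}
```

The position-wise encodings (One-hot, NCP, ND, EIIP) suit a CNN directly, while the K-mer vector is a fixed-length global summary.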

11.
Mol Oncol ; 18(6): 1437-1459, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38627210

ABSTRACT

Different molecular classifications for gastric cancer (GC) have been proposed based on multi-omics platforms with the long-term goal of improved precision treatment. However, the GC (phospho)proteome remains incompletely characterized, particularly at the level of tyrosine phosphorylation. In addition, previous multiomics-based stratification of patient cohorts has lacked identification of corresponding cell line models and comprehensive validation of broad or subgroup-selective therapeutic targets. To address these knowledge gaps, we applied a reverse approach, undertaking the most comprehensive (phospho)proteomic analysis of GC cell lines to date and cross-validating this using publicly available data. Mass spectrometry (MS)-based (phospho)proteomic and tyrosine phosphorylation datasets were subjected to individual or integrated clustering to identify subgroups that were subsequently characterized in terms of enriched molecular processes and pathways. Significant congruence was detected between cell line proteomic and specific patient-derived transcriptomic subclassifications. Many protein kinases showing 'outlier' expression or phosphorylation in the cell line dataset exhibited genomic aberrations in patient samples and association with poor prognosis, with casein kinase I isoform delta/epsilon (CSNK1D/E) being experimentally validated as potential therapeutic targets. Src family kinases were predicted to be commonly hyperactivated in GC cell lines, consistent with broad sensitivity to the next-generation Src inhibitor eCF506. In addition, phosphoproteomic and integrative clustering segregated the cell lines into two subtypes, with epithelial-mesenchymal transition (EMT) and proliferation-associated processes enriched in one, designated the EMT subtype, and metabolic pathways, cell-cell junctions, and the immune response dominating the features of the other, designated the metabolism subtype.
Application of kinase activity prediction algorithms and interrogation of gene dependency and drug sensitivity databases predicted that the mechanistic target of rapamycin kinase (mTOR) and dual specificity mitogen-activated protein kinase kinase 2 (MAP2K2) represented potential therapeutic targets for the EMT and metabolism subtypes, respectively, and this was confirmed using selective inhibitors. Overall, our study provides novel, in-depth insights into GC proteomics, kinomics, and molecular taxonomy and reveals potential therapeutic targets that could provide the basis for precision treatments.


Subject(s)
Proteome , Stomach Neoplasms , Stomach Neoplasms/metabolism , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Stomach Neoplasms/classification , Humans , Proteome/metabolism , Cell Line, Tumor , Proteomics/methods , Phosphorylation , Molecular Targeted Therapy
12.
Bioinform Adv ; 4(1): vbae035, 2024.
Article in English | MEDLINE | ID: mdl-38549946

ABSTRACT

Motivation: PE/PPE proteins, highly abundant in the Mycobacterium genome, play a vital role in virulence and immune modulation. Understanding their functions is key to comprehending the internal mechanisms of Mycobacterium. However, a lack of dedicated resources has limited research into PE/PPE proteins. Results: Addressing this gap, we introduce MycobactERIal PE/PPE proTeinS (MERITS), a comprehensive 3D structure database specifically designed for PE/PPE proteins. MERITS hosts 22 353 non-redundant PE/PPE proteins, encompassing details like physicochemical properties, subcellular localization, post-translational modification sites, protein functions, and measures of antigenicity, toxicity, and allergenicity. MERITS also includes data on their secondary and tertiary structure, along with other relevant biological information. MERITS is designed to be user-friendly, offering interactive search and data browsing features to aid researchers in exploring the potential functions of PE/PPE proteins. MERITS is expected to become a crucial resource in the field, aiding in developing new diagnostics and vaccines by elucidating the sequence-structure-functional relationships of PE/PPE proteins. Availability and implementation: MERITS is freely accessible at http://merits.unimelb-biotools.cloud.edu.au/.

13.
Comput Biol Med ; 173: 108339, 2024 May.
Article in English | MEDLINE | ID: mdl-38547658

ABSTRACT

The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, which limits the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with a strong ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification fail to fully exploit the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and, as such, can predict DTIs across molecules with exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them.
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction, namely GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
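In Graphormer, the centrality encoding is based on node degree and the spatial encoding on the shortest-path distance between node pairs. The sketch below computes both integer quantities for a small molecular graph using breadth-first search. The learned embeddings that Graphormer builds on top of these integers are omitted, and the example graph (the heavy atoms of ethanol) is only an illustration.

```python
from collections import deque

def degrees(n_atoms, bonds):
    """Node degree of each atom, the input to degree-based centrality encoding."""
    deg = [0] * n_atoms
    for a, b in bonds:
        deg[a] += 1
        deg[b] += 1
    return deg

def shortest_path_matrix(n_atoms, bonds):
    """All-pairs shortest-path lengths via BFS; -1 marks disconnected pairs."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == -1:  # first visit gives the shortest distance
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

# Heavy atoms of ethanol: C0-C1-O2
bonds = [(0, 1), (1, 2)]
print(degrees(3, bonds))               # [1, 2, 1]
print(shortest_path_matrix(3, bonds))  # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

Because these quantities depend only on graph topology, they can be computed for molecules never seen in training, which relates to the out-of-molecule setting evaluated above.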


Subject(s)
Artificial Intelligence , Drug Discovery , Drug Evaluation, Preclinical , Neural Networks, Computer , Benchmarking
14.
Environ Pollut ; 348: 123852, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38531468

ABSTRACT

Model-estimated air pollution exposure assessments have been extensively employed in the evaluation of health risks associated with air pollution. However, few studies have systematically evaluated the reliability of model-estimated PM2.5 products in health risk assessment by comparing them with ground-based monitoring station air quality data. In response to this gap, we undertook a meticulously structured systematic review and meta-analysis. Our objective was to aggregate existing comparative studies to ascertain the disparity in mortality effect estimates derived from model-estimated ambient PM2.5 exposure versus those based on monitoring station-observed PM2.5 exposure. We conducted searches across multiple databases, namely PubMed, Scopus, and Web of Science, using predefined keywords. Ultimately, ten studies were included in the review. Of these, seven investigated long-term annual exposure, while the remaining three studies focused on short-term daily PM2.5 exposure. Despite variances in the estimated Exposure-Response (E-R) associations, most studies revealed positive associations between ambient PM2.5 exposure and all-cause and cardiovascular mortality, irrespective of the exposure being estimated through models or observed at monitoring stations. Our meta-analysis revealed that all-cause mortality risk associated with model-estimated PM2.5 exposure was in line with that derived from station-observed sources. The pooled Relative Risk (RR) was 1.083 (95% CI: 1.047, 1.119) for model-estimated exposure, and 1.089 (95% CI: 1.054, 1.125) for station-observed sources (p = 0.795). In conclusion, most model-estimated air pollution products have demonstrated consistency in estimating mortality risk compared to data from monitoring stations.
However, only a limited number of studies have undertaken such comparative analyses, underscoring the need for more comprehensive investigations to validate the reliability of these model-estimated exposures in mortality risk assessment.
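A common way to compare two pooled estimates such as these is a z-test on the difference of log relative risks, with standard errors back-calculated from the 95% confidence intervals. The sketch below applies that test to the pooled values reported above. It treats the two estimates as independent, which is a simplification, and the review's exact test is not stated, so the resulting p-value (well above 0.05) need not equal the reported 0.795.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) recovered from a 95% CI on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def compare_pooled_rr(rr1, ci1, rr2, ci2):
    """Two-sided z-test for the difference between two independent log-RRs."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Model-estimated vs station-observed pooled RRs reported above
z, p = compare_pooled_rr(1.083, (1.047, 1.119), 1.089, (1.054, 1.125))
```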


Subject(s)
Air Pollutants , Air Pollution , Air Pollutants/toxicity , Air Pollutants/analysis , Particulate Matter/analysis , Environmental Exposure/adverse effects , Environmental Exposure/analysis , Reproducibility of Results , Air Pollution/analysis , Risk Assessment
15.
Lancet Planet Health ; 8(3): e146-e155, 2024 03.
Article in English | MEDLINE | ID: mdl-38453380

ABSTRACT

BACKGROUND: The acute health effects of short-term (hours to days) exposure to fine particulate matter (PM2·5) have been well documented; however, the global mortality burden attributable to this exposure has not been estimated. We aimed to estimate the global, regional, and urban mortality burden associated with short-term exposure to PM2·5 and the spatiotemporal variations in this burden from 2000 to 2019. METHODS: We combined estimated global daily PM2·5 concentrations, annual population counts, country-level mortality rates, and epidemiologically derived exposure-response functions to estimate the mortality attributable to short-term PM2·5 exposure from 2000 to 2019, in the continental regions and in 13 189 urban centres worldwide at a spatial resolution of 0·1° × 0·1°. We tested the robustness of our mortality estimates with different theoretical minimum risk exposure levels, lag effects, and exposure-response functions. FINDINGS: Approximately 1 million (95% CI 690 000-1·3 million) premature deaths per year from 2000 to 2019 were attributable to short-term PM2·5 exposure, representing 2·08% (1·41-2·75) of total global deaths or 17 (11-22) premature deaths per 100 000 population. Annually, 0·23 million (0·15 million-0·30 million) deaths attributable to short-term PM2·5 exposure were in urban areas, constituting 22·74% of the total global deaths attributable to this cause and accounting for 2·30% (1·56-3·05) of total global deaths in urban areas. The sensitivity analyses showed that our worldwide estimates of mortality attributed to short-term PM2·5 exposure were robust. INTERPRETATION: Short-term exposure to PM2·5 contributes a substantial global mortality burden, particularly in Asia and Africa, as well as in global urban areas. Our results highlight the importance of mitigation strategies to reduce short-term exposure to air pollution and its adverse effects on human health.
FUNDING: Australian Research Council and the Australian National Health and Medical Research Council.
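Burden estimates of this kind combine exposure, population, baseline mortality and an exposure-response function. The sketch below shows a standard log-linear health impact function for one grid cell on one day, with exposure counted only above a theoretical minimum risk exposure level (TMREL). The coefficient, TMREL, concentration, population and mortality rate are hypothetical placeholders, not values from the study.

```python
import math

def attributable_deaths(conc, tmrel, beta, population, daily_mortality_rate):
    """Deaths attributable to short-term exposure in one grid cell on one day.
    RR = exp(beta * max(0, conc - tmrel)); AF = 1 - 1/RR; deaths = AF * baseline deaths."""
    excess = max(0.0, conc - tmrel)  # only exposure above the counterfactual level counts
    rr = math.exp(beta * excess)
    af = 1.0 - 1.0 / rr
    baseline_deaths = population * daily_mortality_rate
    return af * baseline_deaths

# Hypothetical: beta corresponds to a 0.65% RR increase per 10 ug/m3
beta = math.log(1.0065) / 10.0
deaths = attributable_deaths(conc=45.0, tmrel=15.0, beta=beta,
                             population=1_000_000, daily_mortality_rate=2e-5)
```

Summing such cell-day values over a year and over cells gives annual regional or urban totals; changing `tmrel` or `beta` corresponds to the sensitivity analyses described above.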


Subject(s)
Air Pollution , Particulate Matter , Humans , Particulate Matter/analysis , Australia , Air Pollution/adverse effects , Air Pollution/analysis , Mortality, Premature , Asia
16.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38552307

ABSTRACT

MOTIVATION: Cell-type clustering is a crucial first step for single-cell RNA-seq data analysis. However, existing clustering methods often provide different results on cluster assignments with respect to their own data pre-processing, choice of distance metrics, and strategies of feature extraction, thereby limiting their practical applications. RESULTS: We propose Cross-Tabulation Ensemble Clustering (CTEC) method that formulates two re-clustering strategies (distribution- and outlier-based) via cross-tabulation. Benchmarking experiments on five scRNA-Seq datasets illustrate that the proposed CTEC method offers significant improvements over the individual clustering methods. Moreover, CTEC-DB outperforms the state-of-the-art ensemble methods for single-cell data clustering, with 45.4% and 17.1% improvement over the single-cell aggregated from ensemble clustering method (SAFE) and the single-cell aggregated clustering via Mixture model ensemble method (SAME), respectively, on the two-method ensemble test. AVAILABILITY AND IMPLEMENTATION: The source code of the benchmark in this work is available at the GitHub repository https://github.com/LWCHN/CTEC.git.
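Cross-tabulation here means counting how the cells that one method assigns to each cluster are distributed over the clusters of another method. The sketch below builds that contingency table and then applies a simple majority-vote relabelling. The relabelling rule is a hypothetical simplification for illustration and is not CTEC's distribution- or outlier-based strategy.

```python
from collections import Counter, defaultdict

def cross_tabulate(labels_a, labels_b):
    """Contingency table: table[a][b] = number of cells in cluster a (method A) and b (method B)."""
    table = defaultdict(Counter)
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table

def majority_relabel(labels_a, labels_b):
    """Relabel each method-A cluster with the method-B cluster most of its cells fall into."""
    table = cross_tabulate(labels_a, labels_b)
    mapping = {a: counts.most_common(1)[0][0] for a, counts in table.items()}
    return [mapping[a] for a in labels_a]

# Hypothetical labels for eight cells from two clustering methods
labels_a = [0, 0, 0, 1, 1, 2, 2, 2]  # e.g. an over-clustered result
labels_b = ["x", "x", "y", "y", "y", "y", "y", "x"]
merged = majority_relabel(labels_a, labels_b)
```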


Subject(s)
Algorithms , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Data Analysis , Gene Expression Profiling/methods
17.
J Chem Inf Model ; 64(4): 1407-1418, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38334115

ABSTRACT

Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage fuses different feature embeddings through a transformer encoder, and in the second stage, a support vector machine model quantifies the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can serve as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
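The Matthews correlation coefficient and F1-score quoted above are computed from a binary confusion matrix. A minimal sketch of both metrics, with hypothetical counts in the usage lines rather than TransEFVP's results:

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical confusion-matrix counts for a blind test set
example_mcc = mcc(tp=40, fp=10, tn=45, fn=5)
example_f1 = f1(tp=40, fp=10, fn=5)
```

MCC uses all four cells of the matrix, so it stays informative on imbalanced variant sets where F1 (which ignores true negatives) can look optimistic.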


Subject(s)
Algorithms , Proteins , Humans , Proteins/chemistry , Amino Acid Sequence , Neural Networks, Computer , Amino Acids
18.
Article in English | MEDLINE | ID: mdl-38190667

ABSTRACT

Origins of replication (ORIs) are crucial genomic regions where DNA replication is initiated, and they play pivotal roles in fundamental biological processes such as cell division, gene expression regulation, and the maintenance of DNA integrity. Accurate identification of ORIs is essential for understanding cell replication, gene expression, and mutation-related diseases. However, experimental approaches for ORI identification are often expensive and time-consuming, which has made computational methods increasingly popular. In this study, we present PLANNER (DeeP LeArNiNg prEdictor for ORI), a novel approach for species-specific and cell-specific prediction of eukaryotic ORIs. PLANNER takes multi-scale k-tuple sequences as input and uses the pre-trained DNABERT model, together with transfer learning and ensemble learning strategies, to train accurate predictive models. Extensive empirical tests demonstrate that PLANNER outperforms state-of-the-art approaches, including iOri-Euk, Stack-ORI, and ORI-Deep, both within specific cell types and across different cell types. Furthermore, by incorporating an interpretable analysis mechanism, we provide insights into the learned patterns, from identifying important sequence determinants to comprehensively analysing their biological functions. To facilitate the wide use of PLANNER, we developed an online webserver and local stand-alone software, available at http://planner.unimelb-biotools.cloud.edu.au/ and https://github.com/CongWang3/PLANNER, respectively.
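One common reading of "multi-scale k-tuple sequences" is overlapping k-mers extracted at several values of k; DNABERT was released in variants that tokenize DNA into overlapping k-mers with k from 3 to 6. The sketch below assumes stride-1 overlapping k-mers at the scales (3, 4, 5, 6); the function names are hypothetical and PLANNER's exact preprocessing may differ.

```python
def kmer_tokens(seq, k):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields no k-mers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multiscale_kmers(seq, ks=(3, 4, 5, 6)):
    """Tokenize one sequence at several k-tuple sizes (the k values are assumptions)."""
    return {k: kmer_tokens(seq, k) for k in ks}


if __name__ == "__main__":
    tokens = multiscale_kmers("acgtac")
    print(tokens[3])  # → ['ACG', 'CGT', 'GTA', 'TAC']
    print({k: len(v) for k, v in tokens.items()})  # → {3: 4, 4: 3, 5: 2, 6: 1}
```

Each scale yields its own token stream, which could feed a separate k-specific model before ensembling.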

19.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38261340

ABSTRACT

Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) from scRNA-seq data. Most GRN inference methods either cannot eliminate transitive interactions or require expensive computational resources. To address these limitations, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs a GNN for matrix factorization and learns representative embeddings for genes. For each transcription factor-gene pair, it uses the learned embeddings to determine whether the two interact. In an extensive suite of benchmarking experiments on eight static scRNA-seq datasets against several state-of-the-art methods, GMFGRN achieved mean improvements over the runner-up of 1.9% in the area under the receiver operating characteristic curve (AUROC) and 2.5% in the area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum improvements over the runner-up of 2.4% in AUROC and 1.3% in AUPRC were observed. Moreover, GMFGRN requires significantly less training time and memory, consuming less than 10% of the time and memory used by the second-best method. These findings underscore the substantial potential of GMFGRN for GRN inference. GMFGRN is publicly available at https://github.com/Lishuoyy/GMFGRN.
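To illustrate how learned gene embeddings can score transcription factor-gene pairs and how AUROC is then evaluated, the sketch below uses a hypothetical scorer (the sigmoid of a dot product) and computes AUROC as the probability that a randomly chosen true interaction outscores a randomly chosen non-interaction, with ties counted as one half. The scorer, the toy embeddings, and the labels are assumptions, not GMFGRN's actual decoder or data.

```python
import math


def interaction_score(tf_embedding, gene_embedding):
    """Hypothetical scorer: sigmoid of the dot product of two embeddings."""
    dot = sum(x * y for x, y in zip(tf_embedding, gene_embedding))
    return 1.0 / (1.0 + math.exp(-dot))


def auroc(scores, labels):
    """AUROC as the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from a trained model.
    tf = [1.0, 0.5]
    genes = {"g1": [1.0, 1.0], "g2": [-1.0, 0.0], "g3": [0.5, -1.0]}
    truth = {"g1": 1, "g2": 1, "g3": 0}  # invented ground-truth labels
    names = sorted(genes)
    scores = [interaction_score(tf, genes[g]) for g in names]
    print(auroc(scores, [truth[g] for g in names]))  # → 0.5
```

The pairwise form is quadratic in the number of pairs; rank-based implementations compute the same value in O(n log n).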


Subject(s)
Benchmarking , Gene Regulatory Networks , Area Under Curve , Learning , Neural Networks, Computer
20.
BMC Bioinformatics ; 25(1): 13, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38195423

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically about 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Several machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have been reported, and ReCGBM, a gradient boosting-based model, outperforms these existing methods. Nonetheless, ReCGBM operates solely as a binary classifier, even though a typical pre-miRNA contains two cleavage sites. Previous approaches have also used only a fraction of the structural information in pre-miRNAs, often overlooking the full secondary structure. A new model is needed to address these limitations. RESULTS: In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned secondary structure embeddings of pre-miRNAs. Benchmarking experiments demonstrated that our model performed comparably to ReCGBM in binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS: Our proposed model outperformed the current state-of-the-art model, underscoring the effectiveness of a deep learning approach for predicting Dicer cleavage sites. Furthermore, our model can be trained using only sequence and secondary structure information. Its capacity to handle multi-class classification tasks enhances its practical utility.
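Sequence and secondary structure inputs are often combined by one-hot encoding each position. The sketch below assumes a dot-bracket structure string aligned to the pre-miRNA segment and produces 7 channels per nucleotide (4 for A/C/G/U and 3 for `(`, `.`, `)`). This encoding and the names `encode_segment` and `one_hot` are illustrative assumptions; the input format used by the authors' autoencoder is not described here.

```python
NUCLEOTIDES = "ACGU"
STRUCTURE_SYMBOLS = "(.)"


def one_hot(symbol, alphabet):
    """Return a one-hot list for `symbol` over `alphabet`."""
    if symbol not in alphabet:
        raise ValueError(f"unexpected symbol {symbol!r}")
    return [1 if symbol == a else 0 for a in alphabet]


def encode_segment(sequence, structure):
    """Per-position feature vector: 4 nucleotide channels + 3 structure channels.

    `structure` is a dot-bracket string of the same length (for example, as
    produced by ViennaRNA's RNAfold). DNA-style 'T' is converted to 'U'.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    sequence = sequence.upper().replace("T", "U")
    return [one_hot(n, NUCLEOTIDES) + one_hot(s, STRUCTURE_SYMBOLS)
            for n, s in zip(sequence, structure)]


if __name__ == "__main__":
    features = encode_segment("GCAU", "((..")
    print(features[0])  # → [0, 0, 1, 0, 1, 0, 0]
    print(len(features), len(features[0]))  # → 4 7
```

A window cut from a longer pre-miRNA may contain unmatched brackets, so the sketch does not check that the structure is balanced.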


Subject(s)
Deep Learning , MicroRNAs , Humans , Benchmarking , Machine Learning , Nucleotides