Results 1 - 20 of 60
1.
J Transl Med ; 22(1): 894, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39363164

ABSTRACT

BACKGROUND: Ductal carcinoma in situ (DCIS) of the breast is an early stage of breast cancer, and preventing its progression to invasive ductal carcinoma (IDC) is crucial for the early detection and treatment of breast cancer. Although single-cell transcriptome analysis technology has been widely used in breast cancer research, the biological mechanisms underlying the transition from DCIS to IDC remain poorly understood. RESULTS: We identified eight cell types through cell annotation, finding significant differences in T cell proportions between DCIS and IDC. Using this as a basis, we performed pseudotime analysis on T cell subpopulations, revealing that differentially expressed genes primarily regulate immune cell migration and modulation. By intersecting WGCNA results of T cells highly correlated with the subtypes and the differentially expressed genes, we identified six key genes: FGFBP2, GNLY, KLRD1, TYROBP, PRF1, and NKG7. Excluding PRF1, the other five genes were significantly associated with overall survival in breast cancer, highlighting their potential as prognostic biomarkers. CONCLUSIONS: We identified immune cells that may play a role in the progression from DCIS to IDC and uncovered five key genes that can serve as prognostic markers for breast cancer. These findings provide insights into the mechanisms underlying the transition from DCIS to IDC, offering valuable perspectives for future research. Additionally, our results contribute to a better understanding of the biological processes involved in breast cancer progression.


Subject(s)
Breast Neoplasms , Carcinoma, Ductal, Breast , Carcinoma, Intraductal, Noninfiltrating , Disease Progression , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Single-Cell Analysis , Tumor Microenvironment , Humans , Female , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , Prognosis , Carcinoma, Intraductal, Noninfiltrating/genetics , Carcinoma, Intraductal, Noninfiltrating/immunology , Carcinoma, Intraductal, Noninfiltrating/pathology , Breast Neoplasms/genetics , Breast Neoplasms/immunology , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/genetics , Carcinoma, Ductal, Breast/pathology , Carcinoma, Ductal, Breast/immunology , Transcriptome/genetics , Single-Cell Gene Expression Analysis
2.
Research (Wash D C) ; 7: 0487, 2024.
Article in English | MEDLINE | ID: mdl-39324017

ABSTRACT

Understanding protein corona composition is essential for evaluating their potential applications in biomedicine. Relative protein abundance (RPA), accounting for the total proteins in the corona, is an important parameter for describing the protein corona. For the first time, we comprehensively predicted the RPA of multiple proteins on the protein corona. First, we used multiple machine learning algorithms to predict whether a protein adsorbs to a nanoparticle, which is dichotomous prediction. Then, we selected the top 3 performing machine learning algorithms in dichotomous prediction to predict the specific value of RPA, which is regression prediction. Meanwhile, we analyzed the advantages and disadvantages of different machine learning algorithms for RPA prediction through interpretable analysis. Finally, we mined important features about the RPA prediction, which provided effective suggestions for the preliminary design of protein corona. The service for the prediction of RPA is available at http://www.bioai-lab.com/PC_ML.

3.
PLoS Comput Biol ; 20(9): e1012409, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39235988

ABSTRACT

Spatial transcriptome technology can parse transcriptomic data at the spatial level to detect high-throughput gene expression and preserve information regarding the spatial structure of tissues. Identifying spatial domains, that is, identifying regions with similarities in gene expression and histology, is the most basic and critical aspect of spatial transcriptome data analysis. Most current methods identify spatial domains only through a single view, which may obscure certain important information and thus fail to make full use of the information embedded in spatial transcriptome data. Therefore, we propose an unsupervised clustering framework based on multiview graph convolutional networks (MVST) to achieve accurate spatial domain recognition by learning the graph embedding features of neighborhood graphs constructed from gene expression information, spatial location information, and histopathological image information. By exploring spatial transcriptomes from multiple views, MVST enables data from all parts of the spatial transcriptome to be fully utilized to obtain more accurate spatial expression patterns. We verified the effectiveness of MVST on real spatial transcriptome datasets, the robustness of MVST on simulated datasets, and the reasonableness of the framework structure of MVST in ablation experiments. The experimental results show that MVST achieves more accurate spatial domain identification than current advanced methods. In conclusion, MVST is a powerful tool for spatial transcriptome research with improved spatial domain recognition.


Subject(s)
Computational Biology , Gene Expression Profiling , Transcriptome , Transcriptome/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Humans , Cluster Analysis , Algorithms , Neural Networks, Computer , Animals , Databases, Genetic
4.
Int J Biol Macromol ; 277(Pt 3): 134317, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39094861

ABSTRACT

Plant vacuoles play a crucial role in maintaining cellular stability, adapting to environmental changes, and responding to external pressures. The accurate identification of plant vacuolar proteins (PVPs) is crucial for understanding the biosynthetic mechanisms of intracellular vacuoles and the adaptive mechanisms of plants. To identify vacuolar proteins more accurately, this study developed a new predictive model, PEL-PVP, based on ESM-2. The study demonstrates the feasibility and effectiveness of using advanced pre-trained models and fine-tuning techniques for bioinformatics tasks, providing new methods and ideas for plant vacuolar protein research. In addition, previous datasets for vacuolar proteins were balanced, whereas imbalanced data are closer to the actual situation. Therefore, this study constructed an imbalanced dataset, UB-PVP, from the UniProt database, helping the model better adapt to the complexity and uncertainty of real environments and thereby improving its generalization ability and practicality. The experimental results show that, compared with existing recognition techniques, PEL-PVP achieves significant improvements in multiple indicators, with improvements of 6.08%, 13.51%, 11.9%, and 5% in ACC, SP, MCC, and AUC, respectively. The accuracy reaches 94.59%, significantly higher than that of the previous best model, GraphIdn. This provides an efficient and precise tool for the study of plant vacuolar proteins.
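
The indicators named in this abstract (ACC, SP, MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy counts are hypothetical and chosen only to show why MCC is informative on an imbalanced dataset.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return ACC, SP (specificity), SN (sensitivity) and MCC from confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SP": sp, "SN": sn, "MCC": mcc}


# Toy imbalanced counts (illustrative only, not taken from the paper):
# 10 positives and 90 negatives, with a classifier that predicts "negative" for everything.
trivial = binary_metrics(tp=0, tn=90, fp=0, fn=10)

# A classifier that actually finds most positives on the same class balance.
useful = binary_metrics(tp=8, tn=85, fp=5, fn=2)
```

In this toy case the trivial classifier still reaches a high ACC while its MCC is zero, which is one reason MCC is usually reported alongside accuracy when classes are imbalanced.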


Subject(s)
Plant Proteins , Vacuoles , Vacuoles/metabolism , Computational Biology/methods , Databases, Protein
5.
Article in English | MEDLINE | ID: mdl-39083393

ABSTRACT

Tuberculosis has plagued mankind since ancient times, and the struggle between humans and tuberculosis continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has created a new direction in the treatment of tuberculosis, which makes the prediction of anti-tuberculosis peptides crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and an extremely randomized tree (ERT) are selected as the first-level learners of the stacked ensemble. Then, the five best-performing feature encoding methods are selected to obtain the hybrid feature vector, which is refined using decision tree-based recursive feature elimination (DT-RFE). After selection, the optimal feature subset is used as the input of the stacked ensemble model. Logistic regression (LR) is used as the second-level learner to build the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc achieved prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively. In addition, we provide a user-friendly Web server (http://www.bioailab.com/Hyb_SEnc). The source code is freely available at https://github.com/fxh1001/Hyb_SEnc.

6.
Brief Funct Genomics ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860675

ABSTRACT

In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.

7.
BMC Biol ; 22(1): 126, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816885

ABSTRACT

BACKGROUND: A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means for identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. RESULTS: In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used in the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. CONCLUSIONS: msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.
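
The two ideas this abstract combines, tokenizing a sequence at several scales and fusing per-scale predictions by soft voting, can be sketched in a few lines. This is an illustrative sketch only: it assumes overlapping k-mer tokens as the multi-scale tokenization, and the probability values stand in for the outputs of hypothetical fine-tuned models. The helper names `kmer_tokens` and `soft_vote` are not from the paper.

```python
def kmer_tokens(seq: str, k: int) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def soft_vote(prob_rows: list[list[float]]) -> list[float]:
    """Average the class-probability vectors produced by several models."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


# Multi-scale view of one toy sequence: the same bases tokenized with k = 3 and k = 4.
tokens_3 = kmer_tokens("ATGCGT", 3)  # ['ATG', 'TGC', 'GCG', 'CGT']
tokens_4 = kmer_tokens("ATGCGT", 4)  # ['ATGC', 'TGCG', 'GCGT']

# Hypothetical [P(non-promoter), P(promoter)] outputs from three per-scale models.
per_scale_probs = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
fused = soft_vote(per_scale_probs)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Soft voting averages probabilities rather than counting hard labels, so a confident model can outweigh a marginal disagreement from another scale, as the second model's vote is outweighed here.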


Subject(s)
Promoter Regions, Genetic , Computational Biology/methods , DNA/genetics , Humans , Models, Genetic , Sequence Analysis, DNA/methods
8.
Comput Biol Med ; 171: 108129, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38342046

ABSTRACT

DNA N6-methyladenine (6mA) modifications play a pivotal role in the regulation of growth, development, and diseases in organisms. As a significant epigenetic marker, 6mA modifications extensively participate in the intricate regulatory networks of the genome. Hence, gaining a profound understanding of how 6mA is intricately involved in these biological processes is imperative for deciphering the gene regulatory networks within organisms. In this study, we propose PSAC-6mA (Position-self-attention Capsule-6mA), a sequence-location-based self-attention capsule network. The positional layer in the model enables positional relationship extraction and independent parameter setting for each base position, avoiding parameter sharing inherent in convolutional approaches. Simultaneously, the self-attention capsule network enhances dimensionality, capturing correlation information between capsules and achieving exceptional results in feature extraction across multiple spatial dimensions within the model. Experimental results demonstrate the superior performance of PSAC-6mA in recognizing 6mA motifs across various species.


Subject(s)
Adenine , DNA Methylation , DNA/genetics , Genome , Gene Regulatory Networks
9.
Comput Biol Med ; 169: 107943, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38211382

ABSTRACT

BACKGROUND: Breast cancer is the most prevalent malignancy in women. Advanced breast cancer can develop distant metastases, posing a severe threat to the life of patients. Because the clinical warning signs of distant metastasis are manifested in the late stage of the disease, there is a need for better methods of predicting metastasis. METHODS: First, we screened breast cancer distant metastasis target genes by performing difference analysis and weighted gene co-expression network analysis (WGCNA) on the selected datasets, and performed analyses such as GO enrichment analysis on these target genes. Secondly, we screened breast cancer distant metastasis target genes by LASSO regression analysis and performed correlation analysis and other analyses on these biomarkers. Finally, we constructed several breast cancer distant metastasis prediction models based on Logistic Regression (LR) model, Random Forest (RF) model, Support Vector Machine (SVM) model, Gradient Boosting Decision Tree (GBDT) model and eXtreme Gradient Boosting (XGBoost) model, and selected the optimal model from them. RESULTS: Several 21-gene breast cancer distant metastasis prediction models were constructed, with the best performance of the model constructed based on the random forest model. This model accurately predicted the emergence of distant metastases from breast cancer, with an accuracy of 93.6 %, an F1-score of 88.9 % and an AUC value of 91.3 % on the validation set. CONCLUSION: Our findings have the potential to be translated into a point-of-care prognostic analysis to reduce breast cancer mortality.


Subject(s)
Breast Neoplasms , Humans , Female , Breast , Gene Expression Profiling , Logistic Models , Machine Learning
10.
Methods ; 222: 142-151, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38242383

ABSTRACT

Protein-protein interactions play an important role in various biological processes. Interaction among proteins has a wide range of applications. Therefore, the correct identification of protein-protein interaction sites is crucial. In this paper, we propose a novel predictor for protein-protein interaction sites, AGF-PPIS, which uses a multi-head self-attention mechanism (introducing a graph structure), a graph convolutional network, and a feed-forward neural network. We use the Euclidean distance between protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS achieves superior performance over comparative methods in terms of seven different evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which demonstrates the validity and superiority of the proposed AGF-PPIS model. The source code and the steps for using AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.
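
Building a protein graph from Euclidean distances between residues is commonly done by thresholding pairwise distances into an adjacency matrix. The sketch below illustrates that general technique under stated assumptions: one 3D coordinate per residue and a distance cutoff, both hypothetical here, since the abstract gives neither the coordinate choice nor a threshold. The function name `contact_graph` is also illustrative.

```python
import math


def contact_graph(coords: list[tuple[float, float, float]], cutoff: float) -> list[list[int]]:
    """Return a symmetric 0/1 adjacency matrix linking residues within `cutoff` of each other."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


# Three toy residue positions (made-up coordinates, in angstroms).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 0.0, 0.0)]

# The 14.0 cutoff is an assumed value for illustration, not one reported in the abstract.
adjacency = contact_graph(toy_coords, cutoff=14.0)
```

The resulting matrix is the kind of input a graph convolutional layer consumes; only the first two residues are connected in this toy example.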


Subject(s)
Benchmarking , Proton Pump Inhibitors , Neural Networks, Computer , Software
11.
Int J Nurs Pract ; 30(3): e13237, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38263693

ABSTRACT

BACKGROUND: The condition and correlation of fatigue, sleep and physical activity in postoperative patients with pituitary adenomas remain unclear. This survey aimed to evaluate the current status and influencing factors of fatigue, sleep and physical activity in postoperative patients with pituitary adenomas. METHODS: Patients undergoing pituitary adenoma resection in two tertiary hospitals from November 2019 to November 2021 were included. The general data questionnaire, Multidimensional Fatigue Inventory (MFI-20), Pittsburgh Sleep Quality Index (PSQI) and International Physical Activity Questionnaire were used for data collection. RESULTS: In total, 184 patients with pituitary adenomas were included. The postoperative patients with pituitary adenomas had a high level of fatigue. In total, 34 (18.5%) patients had a low level of physical activity, 76 (41.3%) patients had a medium level of physical activity and 74 (40.2%) had a high level of physical activity. Postoperative time, PSQI, physical activity level and gender were the influencing factors of fatigue in patients with pituitary adenomas (all P < 0.05). CONCLUSIONS: Postoperative patients with pituitary adenomas have a high level of fatigue, and it is related to reduced sleep quality and activity. Relevant nursing measures should be taken according to the influencing factors of fatigue to reduce the fatigue of postoperative patients with pituitary adenomas.


Subject(s)
Adenoma , Exercise , Fatigue , Pituitary Neoplasms , Humans , Male , Female , Pituitary Neoplasms/surgery , Middle Aged , Adult , Surveys and Questionnaires , Adenoma/surgery , Sleep Quality , Postoperative Period , Aged , Sleep
12.
Brief Funct Genomics ; 23(4): 295-302, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38267084

ABSTRACT

Numerous methods have been developed to integrate spatial transcriptomics sequencing data with single-cell RNA sequencing (scRNA-seq) data. Continuous development and improvement of these methods offer multiple options for integrating and analyzing scRNA-seq and spatial transcriptomics data based on diverse research inquiries. However, each method has its own advantages, limitations and scope of application. Researchers need to select the most suitable method for their research purposes based on the actual situation. This review article presents a compilation of 19 integration methods sourced from a wide range of available approaches, serving as a comprehensive reference for researchers to select the suitable integration method for their specific research inquiries. By understanding the principles of these methods, we can identify their similarities and differences, comprehend their applicability and potential complementarity, and lay the foundation for future method development and understanding. This review article presents 19 methods that aim to integrate scRNA-seq data and spatial transcriptomics data. The methods are classified into two main groups and described accordingly. The article also emphasizes the incorporation of High Variance Genes in annotating various technologies, aiming to obtain biologically relevant information aligned with the intended purpose.


Subject(s)
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Transcriptome/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods , Software , Animals , Single-Cell Gene Expression Analysis
13.
Nucleic Acids Res ; 52(D1): D990-D997, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37831073

ABSTRACT

Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods have dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address the issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information are provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.


Subject(s)
Databases, Genetic , Genetic Variation , Genome-Wide Association Study , Genome-Wide Association Study/methods , Multifactorial Inheritance , Phenotype
14.
Front Psychol ; 14: 1187433, 2023.
Article in English | MEDLINE | ID: mdl-37457089

ABSTRACT

Background: Healthcare systems had an exceptionally difficult time during the early COVID-19 pandemic. Nurse managers in particular made enormous contributions to ensuring the safety of patients and front-line nurses while being under excessive psychological stress. However, little is known about their experiences during this time. Objective: The aim of this study was thus to assess the level of stress overload and psychological feelings of nurse managers during the early COVID-19 pandemic. Methods: A mixed methods sequential explanatory design study with non-random convenience sampling was performed, following the STROBE and COREQ checklists. The study was conducted at the Affiliated Dongyang Hospital, Wenzhou Medical University, with data collected from six provinces in southern China (Zhejiang, Hubei, Shanghai, Jiangsu, Hunan and Jiangxi) between March 2020 and June 2020. A total of 966 nurse managers completed the Stress Overload Scale and Work-Family Support Scale. In addition, a nested sample of nurse managers participated in semi-structured face-to-face interviews. The data were then analyzed using qualitative content analysis, Pearson correlation, and multiple linear regression. Results: The quantitative results showed that nurse managers experienced a moderate level of stress load. There was a significant negative correlation between work-family support and stress load (r = -0.551, p < 0.01). Concerns about protecting front-line nurses and work-family support were the main factors affecting the stress load, which accounted for 34.0% of the total variation. Qualitative analysis identified four main themes that explained stress load: (1) great responsibility and great stress, (2) unprecedented stress-induced stress response, (3) invisible stress: the unknown was even more frightening, and (4) stress relief from love and support. Taken together, these findings indicate that concern about protecting front-line nurses and negative work-family support of nurse managers were the main factors causing stress overload. Conclusion: Implementing measures focused on individual psychological adjustment combined with community and family support and belongingness is one potential strategy to reduce psychological stress among nurse managers.
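
The negative correlation reported in this abstract is a Pearson coefficient. As a brief illustration of how such a value is computed, the sketch below implements the standard formula; the two score lists are made-up toy numbers, not study data, and the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy scores for five hypothetical respondents: higher support, lower stress load.
support = [1.0, 2.0, 3.0, 4.0, 5.0]
stress = [9.0, 7.0, 8.0, 4.0, 2.0]
r = pearson_r(support, stress)
```

A value near -1 indicates that the two scores move in opposite directions; the sign, not the magnitude of the toy result, is what mirrors the abstract's finding.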

15.
Front Environ Sci Eng ; 17(6): 77, 2023.
Article in English | MEDLINE | ID: mdl-36628171

ABSTRACT

An intelligent and efficient methodology is needed owing to the continuous increase of global municipal solid waste (MSW). The common methods of manual and semi-mechanical screening not only consume large amounts of manpower and material resources but also accelerate virus community transmission. Because the categories of MSW are diverse in their compositions, chemical reactions, and processing procedures, traditional methods sort MSW with low efficiency. Deep learning can make MSW sorting smarter and more efficient. This study for the first time applied MSWNet, a ResNet-50 with transfer learning, to MSW sorting. A cyclical learning rate was used to avoid searching blindly and repeating tests until a good value was encountered by accident. Visualization measures were also considered to make the MSWNet model more transparent and accountable. Results showed that transfer learning reduced training time (from 741 s to 598.5 s) and improved recognition accuracy (from 88.50% to 93.50%); MSWNet showed better performance in MSW classification in terms of sensitivity (93.50%), precision (93.40%), F1-score (93.40%), accuracy (93.50%) and AUC (92.00%). The findings of this study can be taken as a reference for building MSW classification models by deep learning, quantifying a suitable learning rate, and changing the data from high dimensions to two dimensions. Electronic Supplementary material: Supplementary material is available in the online version of this article at 10.1007/s11783-023-1677-1 and is accessible for authorized users.
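
A cyclical learning rate sweeps the rate between a lower and an upper bound instead of fixing a single hand-picked value. The sketch below shows the widely used triangular policy as one possible instance; the abstract does not say which policy or bounds MSWNet used, so the function name `triangular_clr` and every numeric value here are assumptions for illustration.

```python
import math


def triangular_clr(iteration: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclical learning rate: rise linearly for step_size iterations, then fall back."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the normalized distance from the peak of the current cycle (0 at peak, 1 at the ends).
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Assumed bounds and half-cycle length, chosen only to make the shape easy to read.
schedule = [triangular_clr(i, base_lr=0.001, max_lr=0.01, step_size=4) for i in range(9)]
```

Over the nine iterations the rate climbs from the lower bound to the upper bound at iteration 4 and returns to the lower bound at iteration 8, so a range of rates is explored within a single run.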

16.
Front Public Health ; 10: 914599, 2022.
Article in English | MEDLINE | ID: mdl-35844847

ABSTRACT

Objective: Behavioral intentions to care for patients with infectious diseases are crucial for improving quality of care. However, there have been few studies of the behavioral intentions and factors influencing patient care by clinical nurses during the COVID-19 pandemic. This study aims to explore cognition, attitudes, subjective norms, self-efficacy, and behavioral intentions of clinical nurses while caring for COVID-19 patients and to explore any influencing factors. Method: A cross-sectional survey was conducted of nurses through convenience sampling in southeast China from February 2020 to March 2020. The questionnaire was developed based on the theory of planned behavior and self-efficacy. Results: A total of 774 nurses completed the survey. Of these, 69.12% (535/774) reported positive behavioral intentions, 75.58% (585/774) reported a positive attitude, and 63.82% (494/774) reported having the confidence to care for patients. However, the lack of support from family and friends and special allowance affected their self-confidence. Attitude, self-efficacy, subjective norms, and ethical cognition were significantly positively correlated with behavioral intentions (r = 0.719, 0.690, 0.603, and 0.546, respectively, all P < 0.001). Structural equation model showed that self-efficacy, attitude, ethical cognition, and subjective norms had positive effects on behavioral intentions (β = 0.402, 0.382, 0.091, and 0.066, respectively, P < 0.01). The total effect of behavioral intentions was influenced by attitude, ethical cognition, self-efficacy, and subjective norms (β = 0.656, 0.630, 0.402, and 0.157, respectively, P < 0.01). In addition, ethical cognition had a positive mediating effect on behavioral intentions (β = 0.539, P < 0.001). Conclusion: The study results indicated that attitude, ethical cognition, and self-efficacy were the main factors influencing nurses' behavioral intention. Efforts should be made to improve nurses' attitude and self-efficacy through ethical education and training to increase behavioral intentions to care for patients with infectious diseases, which will improve the quality of nursing care.


Subject(s)
COVID-19 , Nurses , Attitude of Health Personnel , Cross-Sectional Studies , Humans , Intention , Pandemics
17.
Bioinformatics ; 38(13): 3488-3489, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35604082

ABSTRACT

SUMMARY: Integrative analysis of single-cell RNA-sequencing (scRNA-seq) data with spatial data for the same species and organ would provide each cell sample with a predictive spatial location, which would facilitate biological study. However, publicly available spatial sequencing datasets for specific species and organs are rare and are often displayed in different formats. In this study, we introduce a new web-based scRNA-seq analysis tool, webSCST, that integrates well-organized spatial transcriptome sequencing datasets categorized by species and organs, provides a user-friendly interface for raw single-cell processing with popular integration methods and allows users to submit their raw scRNA-seq data once to obtain predicted spatial locations for each cell type. AVAILABILITY AND IMPLEMENTATION: webSCST implemented in shiny with all major browsers supported is available at http://www.webscst.com. webSCST is also freely available as an R package at https://github.com/swsoyee/webSCST.


Subject(s)
Single-Cell Analysis , Transcriptome , Sequence Analysis, RNA , Software , RNA , Gene Expression Profiling/methods
18.
Comput Struct Biotechnol J ; 20: 2020-2028, 2022.
Article in English | MEDLINE | ID: mdl-35521556

ABSTRACT

Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play vital roles in gene expression. Accurate identification of these proteins is crucial. However, there are two existing challenges: one is the problem of ignoring DNA- and RNA-binding proteins (DRBPs), and the other is a cross-predicting problem referring to DBP predictors predicting DBPs as RBPs, and vice versa. In this study, we proposed a computational predictor, called DeepMC-iNABP, with the goal of solving these difficulties by utilizing a multiclass classification strategy and deep learning approaches. DBPs, RBPs, DRBPs and non-NABPs as separate classes of data were used for training the DeepMC-iNABP model. The results on test data collected in this study and two independent test datasets showed that DeepMC-iNABP has a strong advantage in identifying the DRBPs and has the ability to alleviate the cross-prediction problem to a certain extent. The web-server of DeepMC-iNABP is freely available at http://www.deepmc-inabp.net/. The datasets used in this research can also be downloaded from the website.

19.
Proteomics ; 22(8): e2100197, 2022 04.
Article in English | MEDLINE | ID: mdl-35112474

ABSTRACT

With the development of artificial intelligence (AI) technologies and the availability of large amounts of biological data, computational methods for proteomics have undergone a developmental process from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development progress of computational methods and summarize the advantages and shortcomings of these methods. We further compiled applications in tasks related to the protein-DNA/RNA interactions, and pointed out possible future application trends. Moreover, biological sequence-digitizing representation strategies used in different types of computational methods are also summarized and discussed.


Subject(s)
Artificial Intelligence , Big Data , Machine Learning , Proteomics , RNA
20.
Nucleic Acids Res ; 50(D1): D1123-D1130, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34669946

ABSTRACT

The development of transcriptome-wide association studies (TWAS) has enabled researchers to better identify and interpret causal genes in many diseases. However, there are currently no resources providing a comprehensive listing of gene-disease associations discovered by TWAS from published GWAS summary statistics. TWAS analyses are also difficult to conduct due to the complexity of TWAS software pipelines. To address these issues, we introduce a new resource called webTWAS, which integrates a database of the most comprehensive disease GWAS datasets currently available with credible sets of potential causal genes identified by multiple TWAS software packages. Specifically, a total of 235 064 gene-disease associations for a wide range of human diseases are prioritized from 1298 high-quality downloadable European GWAS summary statistics. Associations are calculated with seven different statistical models based on three popular and representative TWAS software packages. Users can explore associations at the gene or disease level, and easily search for related studies or diseases using the MeSH disease tree. Since the effects of diseases are highly tissue-specific, webTWAS applies tissue-specific enrichment analysis to identify significant tissues. A user-friendly web server is also available to run custom TWAS analyses on user-provided GWAS summary statistics data. webTWAS is freely available at http://www.webtwas.net.


Subject(s)
Databases, Genetic , Genetic Diseases, Inborn/classification , Genetic Predisposition to Disease , Transcriptome/genetics , Gene Expression Profiling , Genetic Association Studies , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Software