Results 1 - 20 of 21
1.
Proteomics ; 24(16): e2300302, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38258387

ABSTRACT

Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes. Exploring the biological function of SPs is necessary. In this study, the InterPro tool and the maximum correlation method were utilized to analyze the functional domains of SPs. The purpose was to identify important functional domains that can indicate the essential differences between small and large protein sequences. First, the small and large proteins were represented by their functional domains via a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationship between each domain and the target variable, which indicates whether a protein is small or large. The top 36 domain features were selected for further investigation. Among them, 14 were deemed to be highly related to SPs because they were annotated to SPs more frequently than to large proteins. We found that functional domains such as ubiquitin-conjugating enzyme/RWD-like, nuclear transport factor 2 domain, and alpha subunit of guanine nucleotide-binding protein (G-protein) are involved in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate small protein-related functions and predict their biological activity.


Subject(s)
Machine Learning , Protein Domains , Proteins/chemistry , Proteins/metabolism , Humans , Databases, Protein , Computational Biology/methods
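
The one-hot encoding and relevance ranking described in the abstract above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes relevance is measured as mutual information between a binary domain feature and the small/large label (the abstract does not state the measure), and the domain names and toy proteins are hypothetical.

```python
from collections import Counter
from math import log2


def one_hot(domain_sets, vocabulary):
    """Encode each protein's set of domains as a 0/1 vector over `vocabulary`."""
    return [[1 if d in domains else 0 for d in vocabulary] for domains in domain_sets]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_by_relevance(matrix, labels, vocabulary):
    """Score every domain column against the labels and sort by decreasing score."""
    scores = []
    for j, name in enumerate(vocabulary):
        column = [row[j] for row in matrix]
        scores.append((name, mutual_information(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two small proteins (label 1) and two large ones (label 0).
vocabulary = ["domA", "domB", "domC"]
proteins = [{"domA"}, {"domA", "domC"}, {"domB"}, {"domB", "domC"}]
labels = [1, 1, 0, 0]

matrix = one_hot(proteins, vocabulary)
ranking = rank_by_relevance(matrix, labels, vocabulary)
```

In this toy set, domA and domB separate the two classes perfectly, while domC occurs equally often in both and therefore ends up last.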
2.
Comput Biol Med ; 175: 108495, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38697003

ABSTRACT

Allergic rhinitis is a common allergic disease with a complex pathogenesis and many unresolved issues. Studies have shown that the incidence of allergic rhinitis is closely related to genetic factors, and research on the related genes could help further understand its pathogenesis and develop new treatment methods. In this study, 446 allergic rhinitis-related genes were obtained from the DisGeNET database. With these 446 genes as seed nodes, the random-walk-with-restart algorithm was run on the protein-protein interaction network to assess the linkages between other genes and allergic rhinitis. The result was then examined by three screening tests (permutation, interaction, and enrichment tests), which aimed to pick out genes that have strong and special associations with allergic rhinitis. Finally, 52 novel genes were obtained. The functional enrichment test confirmed their relationships to the biological processes and pathways related to allergic rhinitis. Furthermore, some genes were extensively analyzed to uncover their special or latent associations with allergic rhinitis, including IRAK2 and MAPK, which are involved in the pathogenesis of allergic rhinitis and the inhibition of allergic inflammation via the p38-MAPK pathway, respectively. The newly found genes may aid future investigations into the underlying molecular mechanisms of allergic rhinitis and the development of effective treatments.


Subject(s)
Protein Interaction Maps , Rhinitis, Allergic , Humans , Rhinitis, Allergic/genetics , Protein Interaction Maps/genetics , Databases, Genetic , Algorithms , Computational Biology/methods , Gene Regulatory Networks
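
The random-walk-with-restart search mentioned in the abstract above can be illustrated with a small sketch. It assumes an unweighted, undirected network and a restart probability of 0.8; both are assumptions made for illustration, and the gene names G1 to G4 are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-9, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the change is below `tol`.

    `adjacency` maps each node to a list of its neighbours. Every node is assumed
    to have at least one neighbour; otherwise its probability mass would be lost.
    """
    nodes = sorted(adjacency)
    # The walker starts (and restarts) uniformly on the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            # Each node spreads its current mass evenly over its neighbours.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical path-shaped network G1 - G2 - G3 - G4 with a single seed gene.
network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
scores = random_walk_with_restart(network, seeds={"G1"})
```

Nodes closer to the seed receive higher scores, which is the property used to nominate candidate genes before any screening tests are applied.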
3.
Protein J ; 43(5): 983-996, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39243320

ABSTRACT

Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with broad and far-reaching implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study gathered information on soluble and insoluble proteins. The proteins were mapped to STRING and characterized by functional and structural features. All functional/structural features were integrated into a 5768-dimensional binary vector to encode each protein. Seven feature-ranking algorithms were employed to analyze the functional/structural features, yielding seven feature lists. Each list was subjected to incremental feature selection with each of four classification algorithms in turn, to build effective classification models and identify functional/structural features with classification-related importance. Some essential functional/structural features used to differentiate between soluble and insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine as the classification algorithm and 295 optimized functional/structural features, achieved an F1 score of 0.825 and can be a powerful tool for differentiating soluble proteins from insoluble proteins.


Subject(s)
Escherichia coli Proteins , Escherichia coli , Machine Learning , Solubility , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli/chemistry , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Escherichia coli Proteins/genetics , Support Vector Machine , Algorithms
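
Incremental feature selection, as named in the abstract above, is commonly understood as evaluating a classifier on growing prefixes of a ranked feature list and keeping the best-scoring prefix. The sketch below follows that reading. The `evaluate` callback is a toy stand-in for a cross-validated classifier score such as F1, and the feature names and gains are hypothetical.

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Score the top-k prefixes of a ranked list and return the best one.

    `evaluate` takes a list of feature names and returns a number where
    larger is better (for example a cross-validated F1 score).
    """
    best_subset, best_score = [], float("-inf")
    curve = []  # (k, score) pairs, useful for plotting the IFS curve
    for k in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:k]
        score = evaluate(subset)
        curve.append((k, score))
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


# Toy scorer (an assumption): informative features add to the score, noisy ones reduce it.
toy_gain = {"f1": 0.5, "f2": 0.3, "f3": -0.1, "f4": -0.2}


def toy_evaluate(subset):
    return sum(toy_gain[name] for name in subset)


best_subset, best_score, curve = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], toy_evaluate
)
```

In practice the callback would train and validate a real model on the selected columns, and `step` can be raised to keep the number of model fits manageable for lists with thousands of features.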
4.
Front Biosci (Landmark Ed) ; 29(1): 21, 2024 01 17.
Article in English | MEDLINE | ID: mdl-38287832

ABSTRACT

BACKGROUND: Autophagy is instrumental in various health conditions, including cancer, aging, and infections. Therefore, examining proteins and compounds associated with autophagy is paramount to understanding cellular biology and the origins of diseases, paving the way for potential therapeutic and disease prediction strategies. However, the complexity of autophagy, its intersection with other cellular pathways, and the challenges in monitoring autophagic activity make the experimental identification of these elements arduous. METHODS: In this study, autophagy-related proteins and chemicals were catalogued on the basis of Human Autophagy-dedicated Database. These entities were mapped to their respective PubChem identifications (IDs) for chemicals and Ensembl IDs for proteins, yielding 563 chemicals and 779 proteins. A network comprising protein-protein, protein-chemical, and chemical-chemical interactions was probed employing the Random-Walk-with-Restart algorithm using the aforementioned proteins and chemicals as seed nodes to unearth additional autophagy-associated proteins and chemicals. Screening tests were performed to exclude proteins and chemicals with minimal autophagy associations. RESULTS: A total of 88 inferred proteins and 50 inferred chemicals of high autophagy relevance were identified. Certain entities, such as the chemical prostaglandin E2 (PGE2), which is recognized for modulating cell death-induced inflammatory responses during pathogen invasion, and the protein G Protein Subunit Alpha I1 (GNAI1), implicated in ether lipid metabolism influencing a range of cellular processes including autophagy, were associated with autophagy. CONCLUSIONS: The discovery of novel autophagy-associated proteins and chemicals is of vital importance because it enhances the understanding of autophagy, provides potential therapeutic targets, and fosters the development of innovative therapeutic strategies and interventions.


Subject(s)
Neoplasms , Proteins , Humans , Autophagy , Algorithms , Computational Biology/methods
5.
Protein J ; 43(3): 477-486, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38436837

ABSTRACT

Protein-protein interactions (PPIs) involve the physical or functional contact between two or more proteins. Proteins that interact with each other generally have special relationships. Some previous studies have reported that gene ontology (GO) terms are related to the determination of PPIs, suggesting special patterns in the GO terms of proteins in PPIs. In this study, we explored the special GO term patterns in human PPIs to uncover the underlying functional mechanism of PPIs. Experimentally validated human PPIs were retrieved from the STRING database and termed positive samples. Additionally, we randomly paired proteins occurring in positive samples, yielding a large number of negative samples. For each GO term pair, we counted the number of positive samples in which each of the two proteins was annotated by one of the GO terms in the pair. The corresponding number for negative samples was also counted and then adjusted to account for the large gap between the numbers of positive and negative samples. The difference between these two numbers, and its ratio relative to the number for positive samples, were calculated. This ratio provided a precise evaluation of the occurrence of GO term pairs in positive and negative samples, indicating the latent GO term patterns for PPIs. Our analysis unveiled several nuclear biological processes, including gene transcription, cell proliferation, and nutrient metabolism, as key biological functions. Interactions between major proliferative or metabolic GO terms consistently correspond with significantly reported PPIs in recent literature.


Subject(s)
Databases, Protein , Gene Ontology , Humans , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Protein Interaction Maps , Computational Biology/methods
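
The counting scheme in the abstract above can be made concrete with a small sketch. The abstract does not give the exact adjustment, so this version assumes the negative count is rescaled by the ratio of positive to negative sample sizes, and that the final ratio is (positive count minus adjusted negative count) divided by the positive count. Protein and GO identifiers are hypothetical placeholders.

```python
from collections import Counter
from itertools import product


def pair_counts(samples, annotations):
    """Count, for each unordered GO term pair, the samples that exhibit it.

    A sample (a, b) exhibits the pair (g1, g2) when one protein carries g1
    and the other carries g2. Each sample is counted once per pair.
    """
    counts = Counter()
    for a, b in samples:
        pairs = {
            tuple(sorted((ga, gb)))
            for ga, gb in product(annotations[a], annotations[b])
        }
        counts.update(pairs)
    return counts


def relative_ratios(positives, negatives, annotations):
    """Return (pos - adjusted_neg) / pos for every GO pair seen in positive samples."""
    pos = pair_counts(positives, annotations)
    neg = pair_counts(negatives, annotations)
    scale = len(positives) / len(negatives)  # assumed adjustment for class imbalance
    return {
        pair: (n_pos - neg[pair] * scale) / n_pos
        for pair, n_pos in pos.items()
    }


# Hypothetical toy annotations and samples.
annotations = {"P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:A"}, "P4": {"GO:C"}}
positives = [("P1", "P2"), ("P3", "P2")]
negatives = [("P1", "P4"), ("P3", "P4"), ("P1", "P2"), ("P2", "P4")]

ratios = relative_ratios(positives, negatives, annotations)
```

Under these assumptions a ratio close to 1 means the GO term pair occurs almost only among interacting proteins, while a ratio near or below 0 means it is no more common there than among random pairs.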
6.
Life (Basel) ; 13(6)2023 May 31.
Article in English | MEDLINE | ID: mdl-37374089

ABSTRACT

Phase-separation proteins (PSPs) are a class of proteins that play a role in the process of liquid-liquid phase separation, which is a mechanism that mediates the formation of membraneless compartments in cells. Identifying phase separation proteins and their associated function could provide insights into cellular biology and the development of diseases, such as neurodegenerative diseases and cancer. Here, PSPs and non-PSPs that have been experimentally validated in earlier studies were gathered as positive and negative samples. Each protein's corresponding Gene Ontology (GO) terms were extracted and used to create a 24,907-dimensional binary vector. The purpose was to extract essential GO terms that can describe essential functions of PSPs and, at the same time, build efficient classifiers to identify PSPs with these GO terms. To this end, the incremental feature selection computational framework and an integrated feature analysis scheme, containing categorical boosting, least absolute shrinkage and selection operator, light gradient-boosting machine, extreme gradient boosting, and permutation feature importance, were used to build efficient classifiers and identify GO terms with classification-related importance. A set of random forest (RF) classifiers with F1 scores over 0.960 were established to distinguish PSPs from non-PSPs. A number of GO terms that are crucial for distinguishing between PSPs and non-PSPs were found, including GO:0003723, which is related to a biological process involving RNA binding; GO:0016020, which is related to membrane formation; and GO:0045202, which is related to the function of synapses. This study offered recommendations for future research aimed at determining the functional roles of PSPs in cellular processes by developing efficient RF classifiers and identifying the representative GO terms related to PSPs.

7.
Biomed Res Int ; 2023: 5333361, 2023.
Article in English | MEDLINE | ID: mdl-36644165

ABSTRACT

Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Alterations in gene expression and variable splicing processes induced by smoking are associated with the development of diseases. This study applied advanced machine learning methods to identify the isoforms with important roles in distinguishing current smokers from former smokers, based on the isoform expression profiles of current and former smokers collected in one previous study. These isoforms were treated as features, which were first analyzed by Boruta to select features highly correlated with the target variables. Then, the selected features were evaluated by four feature ranking algorithms, resulting in four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. Furthermore, a series of classification rules were extracted from the decision tree with the highest performance. Finally, the rationality of the mined isoforms (features) and classification rules was verified by reviewing previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), and pathways (cytotoxicity mediated by natural killer cell and cytokine-cytokine receptor interaction) revealed by the enrichment analysis, were highly relevant to smoking response, suggesting the robustness of our analysis pipeline.


Subject(s)
Smoking , Transcriptome , Humans , Algorithms , Machine Learning , Receptors, G-Protein-Coupled/genetics , Receptors, Peptide/genetics , Smoking/adverse effects , Smoking/genetics , Transcriptome/genetics
8.
Biochim Biophys Acta Proteins Proteom ; 1871(3): 140889, 2023 05 01.
Article in English | MEDLINE | ID: mdl-36610583

ABSTRACT

Metabolic stability of proteins plays a vital role in various dedicated cellular processes. Traditional methods of measuring the metabolic stability are time-consuming and expensive. Therefore, we developed a more efficient computational approach to understand the protein dynamic action mechanisms in biological process networks. In this study, we collected 341 short-lived proteins and 824 non-short-lived proteins from U2OS; 342 short-lived proteins and 821 non-short-lived proteins from HEK293T; 424 short-lived proteins and 1153 non-short-lived proteins from HCT116; and 384 short-lived proteins and 992 non-short-lived proteins from RPE1. The proteins were encoded by GO and KEGG enrichment scores based on the genes and their neighbors in STRING, resulting in 20,681 GO term features and 297 KEGG pathway features. We also incorporated the protein interaction information from STRING into the features and obtained 19,247 node features. Boruta and mRMR methods were used for feature filtering, and IFS method was used to obtain the best feature subsets and create the models with the highest performance. The present study identified 42 features that did not appear in previous studies and classified them into eight groups according to their functional annotation. By reviewing the literature, we found that the following three functional groups were critical in determining the stability of proteins: synaptic transmission, post-translational modifications, and cell fate determination. These findings may serve as a valuable reference for developing drugs that target protein stability.


Subject(s)
Proteins , Humans , Gene Ontology , HEK293 Cells , Proteins/genetics , Proteins/metabolism , Protein Stability
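
The mRMR step named in the abstract above is usually described as a greedy ranking that trades relevance to the label against redundancy with already selected features. The sketch below uses the common difference form (relevance minus mean redundancy) with mutual information on discrete features; this is a generic illustration under those assumptions, not the exact variant used in the study, and the toy features are hypothetical.

```python
from collections import Counter
from math import log2


def mi(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mrmr_rank(features, labels):
    """Greedily order features by relevance minus mean redundancy."""
    remaining = dict(features)
    relevance = {name: mi(column, labels) for name, column in features.items()}
    ranked = []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(mi(features[name], features[s]) for s in ranked) / len(ranked)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


# Hypothetical toy data: f2 is a near copy of f1, f3 is weaker but independent of f1.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 0, 1, 1, 1],
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],
}

order = mrmr_rank(features, labels)
```

Ranking by relevance alone would place f2 second; the redundancy penalty moves the independent feature f3 ahead of it.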
9.
Front Immunol ; 14: 1131051, 2023.
Article in English | MEDLINE | ID: mdl-36936955

ABSTRACT

The widely used ChAdOx1 nCoV-19 (ChAd) vector and BNT162b2 (BNT) mRNA vaccines have been shown to induce robust immune responses. Recent studies demonstrated that the immune responses of people who received one dose of ChAdOx1 and one dose of BNT were better than those of people who received two homologous doses of ChAdOx1 or BNT. However, how heterologous vaccines function has not been extensively investigated. In this study, single-cell RNA sequencing data from three classes of samples (volunteers vaccinated with heterologous ChAdOx1-BNT and volunteers vaccinated with homologous ChAd-ChAd or BNT-BNT, after 7 days) were divided into three types of immune cells (3654 B, 8212 CD4+ T, and 5608 CD8+ T cells). To identify differences in gene expression in various cell types induced by vaccines administered through different vaccination strategies, multiple advanced feature selection methods (max-relevance and min-redundancy, Monte Carlo feature selection, least absolute shrinkage and selection operator, light gradient boosting machine, and permutation feature importance) and classification algorithms (decision tree and random forest) were integrated into a computational framework. The feature selection methods analyzed the importance of gene features, yielding multiple gene lists. These lists were fed into incremental feature selection, incorporating decision tree and random forest, to extract essential genes and classification rules and to build efficient classifiers. Highly ranked genes include PLCG2, whose differential expression is important to the B cell immune pathway and is positively correlated with immune cells, such as CD8+ T cells, and B2M, which is associated with thymic T cell differentiation. This study contributes to the mechanistic explanation of results showing a stronger immune response to a heterologous ChAdOx1-BNT vaccination schedule than to two doses of either BNT or ChAdOx1, offering a theoretical foundation for vaccine modification.


Subject(s)
BNT162 Vaccine , ChAdOx1 nCoV-19 , Humans , BNT162 Vaccine/immunology , CD8-Positive T-Lymphocytes , ChAdOx1 nCoV-19/immunology , Machine Learning , COVID-19/prevention & control , CD4-Positive T-Lymphocytes
10.
Life (Basel) ; 13(6)2023 May 31.
Article in English | MEDLINE | ID: mdl-37374086

ABSTRACT

Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels in a cohort of healthcare workers vaccinated for COVID-19 and obtained 73 antigens in samples from four groups according to the duration after vaccination, including 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers over 180 days after vaccination. Our work was a reanalysis of the data originally collected at Irvine University. These data were obtained in Orange County, California, USA, with the collection process commencing in December 2020. The British variant (B.1.1.7), South African variant (B.1.351), and Brazilian/Japanese variant (P.1) were the most prevalent strains during the sampling period. An efficient machine-learning-based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several efficient classifiers with a weighted F1 value around 0.75 were constructed. The antigen microarray used for identifying antibody levels in the coronavirus features ten distinct SARS-CoV-2 antigens, comprising various segments of both nucleocapsid protein (NP) and spike protein (S). This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were most highly ranked among all features, where S1 and S2 are the subunits of Spike, and the suffixes represent the tagging information of different recombinant proteins. Meanwhile, classification rules were obtained from the optimal decision tree to quantitatively explain the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2.
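
The weighted F1 value reported in the abstract above is conventionally the per-class F1 averaged with weights proportional to class size. A minimal sketch of that standard definition follows; the group labels and predictions are hypothetical and unrelated to the study's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical labels for two of the post-vaccination groups.
y_true = ["<60d", "<60d", "<60d", ">180d"]
y_pred = ["<60d", "<60d", ">180d", ">180d"]
value = weighted_f1(y_true, y_pred)
```

Weighting by support keeps the large groups from being outvoted by small ones, which matters here because the four groups range from 104 to 594 samples.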

11.
Front Genet ; 14: 1145647, 2023.
Article in English | MEDLINE | ID: mdl-36936430

ABSTRACT

Chromatin accessibility is a generic property of the eukaryotic genome, which refers to the degree of physical compaction of chromatin. Recent studies have shown that chromatin accessibility is cell type dependent, indicating chromatin heterogeneity across cell lines and tissues. The identification of markers used to distinguish cell types at the chromosome level is important to understand cell function and classify cell types. In the present study, we investigated transcriptionally active chromosome segments identified by sci-ATAC-seq at single-cell resolution, including 69,015 cells belonging to 77 different cell types. Each cell was represented by existence status on 20,783 genes that were obtained from 436,206 active chromosome segments. The gene features were analyzed by Boruta, resulting in 3897 genes, which were ranked in a list by Monte Carlo feature selection. This list was further analyzed by the incremental feature selection (IFS) method, yielding essential genes, classification rules, and an efficient random forest (RF) classifier. To improve the performance of the optimal RF classifier, its features were further processed by an autoencoder, light gradient boosting machine, and the IFS method. The final RF classifier, with an MCC of 0.838, was constructed. Some marker genes, such as H2-Dmb2, which is specifically expressed in antigen-presenting cells (e.g., dendritic cells or macrophages), and Tenm2, which is specifically expressed in T cells, were identified in this study. Our analysis revealed numerous potential epigenetic modification patterns that are unique to particular cell types, thereby advancing knowledge of the critical functions of chromatin accessibility in cell processes.
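
The MCC quoted in the abstract above is a Matthews correlation coefficient; with 77 cell types it would need the multiclass generalization. The sketch below implements the usual multiclass formula computed from label counts. It is an illustration of the metric under that assumption, and the cell-type labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from true and predicted labels."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # how often each class truly occurs
    p_k = Counter(y_pred)                             # how often each class is predicted
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical labels; with two classes the result matches the binary MCC.
truth = ["T cell", "T cell", "B cell", "B cell"]
guess = ["T cell", "B cell", "B cell", "B cell"]
mcc = multiclass_mcc(truth, guess)
```

Unlike plain accuracy, this score stays near 0 for a classifier that always predicts the largest class, which is why it is often preferred for imbalanced cell-type data.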

12.
Environ Sci Pollut Res Int ; 29(10): 14665-14676, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34617224

ABSTRACT

This work investigates the relationship of financial development with energy efficiency and economic growth. Due to the coexistence of economic expansion, trade openness, financial development, and urbanization in Indonesia and Turkey, these two countries are considered. Johansen cointegration, error correction, and Granger causality tests are applied to validate the predicted effects of economic activity on the environment. Results show a long-term relationship of Indonesia's CO2 emissions with five out of six macroeconomic factors, except for urbanization, which has a detrimental effect on carbon emissions. On the other hand, no cointegration across variables is found in the case of Turkey. However, unidirectional causality is observed from energy consumption and economic growth to economic growth. Furthermore, economic growth, energy consumption, and trade openness have a two-way causal effect on financial development. This work encourages Turkish and Indonesian policymakers and regulators to strengthen environmental laws. It also encourages other economies and governments to conduct similar analyses and determine the best course of action.


Subject(s)
Carbon Dioxide , Conservation of Energy Resources , Carbon , Economic Development , Urbanization
13.
Biomed Res Int ; 2022: 2516653, 2022.
Article in English | MEDLINE | ID: mdl-36004205

ABSTRACT

The cell cycle is composed of a series of ordered, highly regulated processes through which a cell grows and duplicates its genome and eventually divides into two daughter cells. According to the complex changes in cell structure and biosynthesis, the cell cycle is divided into four phases: gap 1 (G1), DNA synthesis (S), gap 2 (G2), and mitosis (M). Determining which cell cycle phase a cell is in is critical to research on cancer development and on drugs targeting the cell cycle. However, current detection methods have the following problems: (1) they are complicated and time consuming to perform, and (2) they cannot detect the cell cycle on a large scale. Rapid developments in single-cell technology have made dissecting cells on a large scale possible with unprecedented resolution. In the present research, we construct efficient classifiers and identify essential gene biomarkers based on single-cell RNA sequencing data through Boruta and three feature ranking algorithms (mRMR, MCFS, and SHAP by LightGBM) by utilizing four advanced classification algorithms. Meanwhile, we mine a series of classification rules that can distinguish different cell cycle phases. Collectively, we have provided a novel method for determining the cell cycle and identified new potential cell cycle-related genes, thereby contributing to the understanding of the processes that regulate the cell cycle.


Subject(s)
Machine Learning , Mitosis , Biomarkers , Cell Cycle/genetics , Humans , Mitosis/genetics , RNA-Seq
14.
Front Bioeng Biotechnol ; 10: 916309, 2022.
Article in English | MEDLINE | ID: mdl-35706505

ABSTRACT

Cell transplantation is an effective method for compensating for the loss of liver function and improving patient survival. However, given that hepatocytes cultivated in vitro have diverse developmental processes and physiological features, obtaining hepatocytes that can properly function in vivo is difficult. In the present study, we present an advanced computational analysis on single-cell transcriptional profiling to resolve the heterogeneity of the hepatocyte differentiation process in vitro and to mine biomarkers at different periods of differentiation. We obtained a batch of compressed and effective classification features with the Boruta method and ranked them using the Max-Relevance and Min-Redundancy method. Some key genes were identified during the in vitro culture of hepatocytes, including CD147, which not only regulates terminally differentiated cells in the liver but also affects cell differentiation. PPIA, which encodes a CD147 ligand, also appeared in the identified gene list, and the combination of the two proteins mediated multiple biological pathways. Other genes, such as TMSB10, TMEM176B, and CD63, which are involved in the maturation and differentiation of hepatocytes and assist different hepatic cell types in performing their roles, were also identified. Then, several classifiers were trained and evaluated to obtain optimal classifiers and optimal feature subsets, using three classification algorithms (random forest, k-nearest neighbor, and decision tree) and the incremental feature selection method. The best random forest classifier, with a Matthews correlation coefficient of 0.940, was constructed to distinguish different hepatic cell types. Finally, classification rules were created for quantitatively describing hepatic cell types. In summary, this study provided potential targets for cell transplantation-associated liver disease treatment strategies by elucidating the process and mechanism of hepatocyte development at both qualitative and quantitative levels.

15.
Environ Sci Pollut Res Int ; 29(16): 23105-23116, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34800272

ABSTRACT

The aim of this study is to estimate the role of energy financing in energy retrofit during COVID-19, with green bond financing as an intervening factor. To this end, the Kalman technique is applied to obtain the empirical findings. Energy financing is found to depend significantly on green bonds, and green bonds play a significant role in energy retrofit, specifically in E-7 economies. E-7 economies also saw a significant rise in energy efficiency financing through green bond financing, which supported energy retrofit both before and during the COVID-19 crisis. The results further indicate that the E-7 nations have to invest heavily in hydro and nuclear energy to achieve energy retrofit with low carbon emissions. In light of the COVID-19 crisis, this study offers policy recommendations for effective energy management. These recommendations are expected to help the financial intermediaries and national governments of E-7 economies better optimize energy financing through green bond financing. The novelty of the study lies in its topical framework and research directions, which discuss the way forward for energy efficiency financing, one of the most recent issues in the field. Hence, this research provides empirical evidence on energy financing for energy retrofit during the COVID-19 crisis and shares some suggestions for stakeholders.


Subject(s)
COVID-19 , Nuclear Energy , Carbon , Carbon Dioxide , Economic Development , Efficiency , Humans , Renewable Energy
16.
Front Oncol ; 12: 976262, 2022.
Article in English | MEDLINE | ID: mdl-36033519

ABSTRACT

CD19-targeted CAR T cell immunotherapy has exceptional efficacy for the treatment of B-cell malignancies. B-cell acute lymphocytic leukemia and non-Hodgkin's lymphoma are two common B-cell malignancies with high recurrence rate and are refractory to cure. Although CAR T-cell immunotherapy overcomes the limitations of conventional treatments for such malignancies, failure of treatment and tumor recurrence remain common. In this study, we searched for important methylation signatures to differentiate CAR-transduced and untransduced T cells from patients with acute lymphoblastic leukemia and non-Hodgkin's lymphoma. First, we used three feature ranking methods, namely, Monte Carlo feature selection, light gradient boosting machine, and least absolute shrinkage and selection operator, to rank all methylation features in order of their importance. Then, the incremental feature selection method was adopted to construct efficient classifiers and filter the optimal feature subsets. Some important methylated genes, namely, SERPINB6, ANK1, PDCD5, DAPK2, and DNAJB6, were identified. Furthermore, the classification rules for distinguishing different classes were established, which can precisely describe the role of methylation features in the classification. Overall, we applied advanced machine learning approaches to the high-throughput data, investigating the mechanism of CAR T cells to establish the theoretical foundation for modifying CAR T cells.

17.
Front Mol Biosci ; 9: 952626, 2022.
Article in English | MEDLINE | ID: mdl-35928229

ABSTRACT

Notably, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a tight relationship with the immune system. Human resistance to COVID-19 infection comprises two stages. The first stage is immune defense, while the second stage is extensive inflammation. This process is further divided into innate and adaptive immunity during the immune defense phase. These two stages involve various immune cells, including CD4+ T cells, CD8+ T cells, monocytes, dendritic cells, B cells, and natural killer cells. Various immune cells are involved and make up the complex and unique immune system response to COVID-19, providing characteristics that set it apart from other respiratory infectious diseases. In the present study, we identified cell markers for differentiating COVID-19 from common inflammatory responses, non-COVID-19 severe respiratory diseases, and healthy populations based on single-cell profiling of the gene expression of six immune cell types by using Boruta and mRMR feature selection methods. Some features such as IFI44L in B cells, S100A8 in monocytes, and NCR2 in natural killer cells are involved in the innate immune response of COVID-19. Other features such as ZFP36L2 in CD4+ T cells can regulate the inflammatory process of COVID-19. Subsequently, the IFS method was used to determine the best feature subsets and classifiers in the six immune cell types for two classification algorithms. Furthermore, we established the quantitative rules used to distinguish the disease status. The results of this study can provide theoretical support for a more in-depth investigation of COVID-19 pathogenesis and intervention strategies.

18.
Front Genet ; 13: 1011659, 2022.
Article in English | MEDLINE | ID: mdl-36171880

ABSTRACT

Protein-protein interactions (PPIs) are extremely important for gaining mechanistic insights into the functional organization of the proteome. Resolving PPI functions can help identify novel diagnostic and therapeutic targets with medical utility, thus facilitating the development of new medications. However, the traditional methods for resolving PPI functions are mainly experimental, such as co-immunoprecipitation, pull-down assays, cross-linking, label transfer, and far-Western blot analysis, and are both expensive and time-consuming. In this study, we constructed an integrated feature selection scheme for the large-scale selection of the relevant functions of PPIs by using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations of PPI participants. First, we encoded the proteins in each PPI with their GO terms and KEGG pathways. Then, the encoded protein features were refined as features of both positive and negative PPIs. Subsequently, Boruta was used for the initial filtering of features, yielding 5684 features. Three feature ranking algorithms, namely, least absolute shrinkage and selection operator, light gradient boosting machine, and max-relevance and min-redundancy, were applied to evaluate feature importance. Finally, the top-ranked features derived from multiple datasets were comprehensively evaluated, and the intersection of the results mined by the three feature ranking algorithms was taken to identify the features most highly correlated with PPIs. Some functional terms were identified in our study, including cytokine-cytokine receptor interaction (hsa04060), intrinsic component of membrane (GO:0031224), and protein binding (GO:0005515). Our newly proposed integrated computational approach offers a novel perspective on the large-scale mining of biological functions linked to PPIs.
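
The final step described here, taking the intersection of the top-ranked features from three ranking algorithms, might look roughly like the sketch below. The placeholder term names, the invented orderings, the cutoff `k`, and the function name are assumptions for illustration only and do not reproduce the study's lists.

```python
def top_k_intersection(rankings, k):
    """Return features found in the top k of every ranking, in first-ranking order."""
    top_sets = [set(ranked[:k]) for ranked in rankings.values()]
    shared = set.intersection(*top_sets)
    first_ranking = next(iter(rankings.values()))
    return [feature for feature in first_ranking[:k] if feature in shared]


# Invented rankings of placeholder GO/KEGG terms, one list per ranking algorithm.
rankings = {
    "lasso": ["term_A", "term_B", "term_C", "term_D", "term_E"],
    "lightgbm": ["term_B", "term_A", "term_E", "term_C", "term_D"],
    "mrmr": ["term_C", "term_B", "term_A", "term_D", "term_E"],
}

consensus = top_k_intersection(rankings, k=3)
print(consensus)
```

Requiring agreement across all three rankings trades recall for robustness: a term favored by only one algorithm is dropped, however high it ranks there.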

19.
Front Oncol ; 12: 998032, 2022.
Article in English | MEDLINE | ID: mdl-36249027

ABSTRACT

Cervical and anal carcinomas are neoplastic diseases with various intraepithelial neoplasia stages. The underlying mechanisms of cancer initiation and progression have not been fully revealed. DNA methylation has been shown to be aberrantly regulated during tumorigenesis in anal and cervical carcinoma, suggesting that DNA methylation signals can serve as biomarkers for distinguishing cancer stages in clinical practice. In this research, several machine learning methods were used to analyze the methylation profiles of anal and cervical carcinoma samples, which were divided into three classes representing various stages of tumor progression. Advanced feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features that are highly correlated with cancer progression. Some methylation probes, including cg01550828 and its corresponding gene RNF168, have been reported to be associated with human papillomavirus-related anal cancer. As for biomarkers of cervical carcinoma, cg27012396 and its functional gene HDAC4 were confirmed to regulate the glycolysis and survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying various tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. The current study identified methylation signals associated with the development of cervical and anal carcinoma at qualitative and quantitative levels using advanced machine learning methods.

20.
Life (Basel) ; 12(6), 2022 May 28.
Article in English | MEDLINE | ID: mdl-35743837

ABSTRACT

SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 often demonstrates resistance to previous vaccines and can cause poor clinical status in patients. Mutations in the SARS-CoV-2 genome affect both structural and nonstructural proteins, and mutations in some of these proteins, such as the spike protein, have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information of virulent strains together with the severity of COVID-19 pneumonia in patients of varying clinical status. Important protein mutations and untranslated region mutations were extracted using machine learning methods. First, through Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), mutations that were highly correlated with the clinical status of the patients were screened out and sorted into four feature lists. Some mutations, such as D614G and V1176F, were shown to be associated with viral infectivity. Moreover, previously unreported mutations, such as A320V of nsp14 and I164ILV of nsp14, were also identified, which suggests their potential roles. We then applied the incremental feature selection method to each feature list to construct efficient classifiers, which can be directly used to distinguish the clinical status of COVID-19 patients. Meanwhile, four sets of quantitative rules were set up, which make the role of each mutation in differentiating the clinical status of COVID-19 patients easier to interpret. The identified key mutations linked to virologic properties will help clarify the mechanisms of infection and will aid in the development of antiviral treatments.
