Results 1 - 20 of 103
1.
Proteomics ; : e2300302, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38258387

ABSTRACT

Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes. Exploring the biological function of SPs is necessary. In this study, the InterPro tool and the maximum correlation method were utilized to analyze functional domains of SPs. The purpose was to identify important functional domains that can indicate the essential differences between small and large protein sequences. First, the small and large proteins were represented by their functional domains via a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationships between each domain and the target variable, indicating small or large protein. The top 36 domain features were selected for further investigation. Among them, 14 were deemed to be highly related to SPs because they were annotated to SPs more frequently than large proteins. We found the involvement of functional domains, such as ubiquitin-conjugating enzyme/RWD-like, nuclear transport factor 2 domain, and alpha subunit of guanine nucleotide-binding protein (G-protein) in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate small protein-related functions and predict their biological activity.
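Illustrative note: the one-hot encoding and relevance ranking described in this abstract can be sketched roughly as follows. This is a minimal sketch, assuming that relevance is measured as mutual information between each binary domain feature and the class label; the domain names, the toy data, and the function names are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def one_hot(proteins, vocabulary):
    """Encode each protein's domain set as a 0/1 vector over `vocabulary`."""
    return [[1 if d in domains else 0 for d in vocabulary] for domains in proteins]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_by_relevance(matrix, labels, vocabulary):
    """Sort domains by decreasing mutual information with the class label."""
    scores = {
        d: mutual_information([row[j] for row in matrix], labels)
        for j, d in enumerate(vocabulary)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: domain names and labels are made up for illustration only.
vocab = ["dom_A", "dom_B", "dom_C"]
proteins = [{"dom_A"}, {"dom_A", "dom_C"}, {"dom_B"}, {"dom_B", "dom_C"}]
labels = ["small", "small", "large", "large"]
ranking = rank_by_relevance(one_hot(proteins, vocab), labels, vocab)
```

In this toy case dom_A and dom_B separate the two classes perfectly, while dom_C is independent of the label and ranks last.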

2.
Biochem Genet ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383836

ABSTRACT

Breast cancer remains the most prevalent cancer in women. To date, its underlying molecular mechanisms have not been fully uncovered. The determination of gene factors is important to improve our understanding on breast cancer, which can correlate the specific gene expression and tumor staging. However, the knowledge in this regard is still far from complete. Thus, this study aimed to explore these knowledge gaps by analyzing existing gene expression profile data from 3149 breast cancer samples, where each sample was represented by the expression of 19,644 genes and classified into Nottingham histological grade (NHG) classes (Grade 1, 2, and 3). To this end, a machine learning-based framework was designed. First, the profile data were analyzed by using seven feature ranking algorithms to evaluate the importance of features (genes). Seven feature lists were generated, each of which sorted features in accordance with feature importance evaluated from a special aspect. Then, the incremental feature selection method was applied to each list to determine essential features for classification and building efficient classifiers. Consequently, overlapping genes, such as AURKA, CBX2, and MYBL2, were deemed as potentially related to breast cancer malignancy and prognosis, indicating that such genes were identified to be important by multiple feature ranking algorithms. In addition, the study formulated classification rules to reflect special gene expression patterns for three NHG classes. Some genes and rules were analyzed and supported by recent literature, providing new references for studying breast cancer.
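Illustrative note: incremental feature selection, as described in this abstract, grows a feature subset along a ranked list and keeps the prefix that gives the best classifier. The sketch below is a simplified stand-in, assuming a nearest-centroid classifier scored by leave-one-out accuracy; the study's actual classifiers and evaluation scheme are not specified here, and all data and names are hypothetical.

```python
def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`."""
    hits = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Build class centroids from every sample except the held-out one.
        grouped = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j != i:
                grouped.setdefault(label, []).append([row[c] for c in cols])
        centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        probe = [xi[c] for c in cols]
        predicted = min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(probe, centroids[label])),
        )
        hits += predicted == yi
    return hits / len(X)


def incremental_feature_selection(X, y, ranked):
    """Try the top-1, top-2, ... features and keep the best-scoring prefix."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked) + 1):
        score = loo_accuracy(X, y, ranked[:k])
        if score > best_score:  # ties keep the smaller subset
            best_k, best_score = k, score
    return ranked[:best_k], best_score


# Toy data: column 0 is informative, column 1 is large-scale noise.
X = [[1.0, 5.0], [1.2, -4.0], [0.9, 3.0], [3.0, -5.0], [3.1, 4.0], [2.9, -3.0]]
y = ["Grade 1", "Grade 1", "Grade 1", "Grade 3", "Grade 3", "Grade 3"]
selected, score = incremental_feature_selection(X, y, ranked=[0, 1])
```

Keeping the smaller subset on ties is a design choice of this sketch; it favours compact gene lists over marginally larger ones.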

3.
BMC Geriatr ; 23(1): 382, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37344765

ABSTRACT

BACKGROUND AND OBJECTIVE: The pathogenesis and pathophysiology of idiopathic normal pressure hydrocephalus (iNPH) remain unclear. Homocysteine may reduce the compliance of intracranial arteries and damage the endothelial function of the blood-brain barrier (BBB), which may be the underlying mechanism of iNPH. The overlap cases between deep perforating arteriopathy (DPA) and iNPH were not rare for the shared risk factors. We aimed to investigate the relationship between serum homocysteine and iNPH in DPA. METHODS: A total of 41 DPA patients with iNPH and 49 DPA patients without iNPH were included. Demographic characteristics, vascular risk factors, laboratory results, and neuroimaging data were collected. Multivariable logistic regression analysis was performed to investigate the relationship between serum homocysteine and iNPH in DPA patients. RESULTS: Patients with iNPH had significantly higher homocysteine levels than those without iNPH (median, 16.34 mmol/L versus 14.28 mmol/L; P = 0.002). There was no significant difference in CSVD burden scores between patients with iNPH and patients without iNPH. Univariate logistic regression analysis demonstrated that patients with homocysteine levels in the Tertile3 were more likely to have iNPH than those in the Tertile1 (OR, 4.929; 95% CI, 1.612-15.071; P = 0.005). The association remained significant after multivariable adjustment for potential confounders, including age, male, hypertension, diabetes mellitus, atherosclerotic cardiovascular disease (ASCVD) or hypercholesterolemia, and eGFR level. CONCLUSION: Our study indicated that high serum homocysteine levels were independently associated with iNPH in DPA. However, further research is needed to determine the predictive value of homocysteine and to confirm the underlying mechanism between homocysteine and iNPH.


Subject(s)
Normal Pressure Hydrocephalus , Vascular Diseases , Humans , Male , Normal Pressure Hydrocephalus/diagnostic imaging , Normal Pressure Hydrocephalus/complications , Cross-Sectional Studies , Vascular Diseases/complications , Risk Factors , Neuroimaging
4.
Int J Mol Sci ; 20(9)2019 May 02.
Article in English | MEDLINE | ID: mdl-31052553

ABSTRACT

Small nucleolar RNAs (snoRNAs) are a new type of functional small RNAs involved in the chemical modifications of rRNAs, tRNAs, and small nuclear RNAs. It is reported that they play important roles in tumorigenesis via various regulatory modes. snoRNAs can both participate in the regulation of methylation and pseudouridylation and regulate the expression pattern of their host genes. This research investigated the expression pattern of snoRNAs in eight major cancer types in TCGA via several machine learning algorithms. The expression levels of snoRNAs were first analyzed by a powerful feature selection method, Monte Carlo feature selection (MCFS). A feature list and some informative features were accessed. Then, the incremental feature selection (IFS) was applied to the feature list to extract optimal features/snoRNAs, which can make the support vector machine (SVM) yield best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, HBII-85-20, etc., on which the SVM can provide a Matthew's correlation coefficient (MCC) of 0.881 for predicting these eight cancer types. On the other hand, the informative features were fed into the Johnson reducer and repeated incremental pruning to produce error reduction (RIPPER) algorithms to generate classification rules, which can clearly show different snoRNAs expression patterns in different cancer types. The analysis results indicated that extracted discriminative snoRNAs can be important for identifying cancer samples in different types and the expression pattern of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.


Subject(s)
Neoplastic Gene Expression Regulation , Machine Learning , Neoplasms/genetics , Small Nucleolar RNA/genetics , Algorithms , Humans , Monte Carlo Method , Support Vector Machine
5.
J Cell Biochem ; 119(4): 3394-3403, 2018 04.
Article in English | MEDLINE | ID: mdl-29130544

ABSTRACT

Adult neural stem cells (NSCs) are a group of multi-potent, self-renewing progenitor cells that contribute to the generation of new neurons and oligodendrocytes. Three subtypes of NSCs can be isolated based on the stages of the NSC lineage, including quiescent neural stem cells (qNSCs), activated neural stem cells (aNSCs) and neural progenitor cells (NPCs). Although it is widely accepted that these three groups of NSCs play different roles in the development of the nervous system, their molecular signatures are poorly understood. In this study, we applied the Monte-Carlo Feature Selection (MCFS) method to identify the gene expression signatures, which can yield a Matthews correlation coefficient (MCC) value of 0.918 with a support vector machine evaluated by ten-fold cross-validation. In addition, some classification rules yielded by the MCFS program for distinguishing above three subtypes were reported. Our results not only demonstrate a high classification capacity and subtype-specific gene expression patterns but also quantitatively reflect the pattern of the gene expression levels across the NSC lineage, providing insight into deciphering the molecular basis of NSC differentiation.


Subject(s)
Astrocytes/cytology , Gene Expression Profiling/methods , Gene Regulatory Networks , Neural Stem Cells/classification , Algorithms , Cell Lineage , Cultured Cells , Humans , Monte Carlo Method , Support Vector Machine
7.
Biochim Biophys Acta ; 1844(1 Pt B): 207-13, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23732562

ABSTRACT

Drug-target interaction is a key research topic in drug discovery since correct identification of target proteins of drug candidates can help screen out those with unacceptable toxicities, thereby saving expense. In this study, we developed a novel computational approach to predict drug target groups that may reduce the number of candidate target proteins associated with a query drug. A benchmark dataset, consisting of 3028 drugs assigned within nine categories, was constructed by collecting data from KEGG. The nine categories are (1) G protein-coupled receptors, (2) cytokine receptors, (3) nuclear receptors, (4) ion channels, (5) transporters, (6) enzymes, (7) protein kinases, (8) cellular antigens and (9) pathogens. The proposed method combines the data gleaned from chemical-chemical similarities, chemical-chemical connections and chemical-protein connections to allocate drugs to each of the nine target groups. A jackknife test applied to the training dataset that was constructed from the benchmark dataset, provided an overall correct prediction rate of 87.45%, as compared to 87.79% for the test dataset that was constructed by randomly selecting 10% of samples from the benchmark dataset. These prediction rates are much higher than the 11.11% achieved by random guesswork. These promising results suggest that the proposed method can become a useful tool in identifying drug target groups. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
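Illustrative note: the jackknife test and the random-guess baseline mentioned in this abstract can be sketched as follows. The leave-one-out loop is generic; the 1-nearest-neighbour predictor and the toy data are assumptions made for illustration and are not the similarity-based method of the study.

```python
def jackknife_accuracy(samples, labels, predict):
    """Hold out each sample in turn, predict it from the rest, return the hit rate."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, samples[i]) == labels[i]
    return hits / len(samples)


def chance_rate(n_classes):
    """Expected accuracy of uniform random guessing among `n_classes` groups."""
    return 1 / n_classes


def nearest_neighbour(train_x, train_y, query):
    """Hypothetical stand-in predictor: label of the closest training value."""
    idx = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[idx]


# Toy one-dimensional data, made up for illustration.
samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = ["enzyme", "enzyme", "enzyme", "transporter", "transporter", "transporter"]
accuracy = jackknife_accuracy(samples, labels, nearest_neighbour)
baseline_percent = round(100 * chance_rate(9), 2)
```

With nine target groups, uniform guessing gives 1/9 of predictions correct, which matches the 11.11% baseline quoted in the abstract.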


Subject(s)
Protein Databases , Drug Design , Proteins/chemistry , G-Protein-Coupled Receptors/chemistry , Algorithms , Drug Interactions , Humans , Ion Channels/chemistry , Molecular Targeted Therapy , Cytoplasmic and Nuclear Receptors/chemistry
8.
Mol Genet Genomics ; 289(3): 489-99, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24448651

ABSTRACT

Protein-DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein-DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein-DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein-DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding site and it was demonstrated that the prediction performance was better with 3D structural features than without them. It was also shown via analysis of features from each site that the features of DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying the DNA-binding sites and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein-DNA interaction.


Subject(s)
Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Support Vector Machine , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Molecular Conformation , Protein Binding , Reproducibility of Results
9.
J Biomed Inform ; 48: 130-6, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24486562

ABSTRACT

Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although natural language processing (NLP) methods have been studied extensively in electronic medical records (EMR), few studies have explored NLP for extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of a method for extracting tumor-related information from operation notes of hepatic carcinomas written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluated on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall, and a 63.5% F-score.
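Illustrative note: the reported F-score is consistent with the harmonic mean of the reported precision and recall. The short check below assumes the balanced F1 definition, which the abstract does not state explicitly.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
f1_percent = round(f_score(69.6, 58.3), 1)
```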


Subject(s)
Artificial Intelligence , Carcinoma/diagnosis , Liver Neoplasms/diagnosis , Natural Language Processing , Algorithms , Carcinoma/pathology , China , Computer Simulation , Computer Systems , Data Mining/methods , Electronic Health Records , Humans , Language , Liver Neoplasms/pathology , Medical Informatics/methods , Software
10.
Life (Basel) ; 14(4)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38672772

ABSTRACT

Smoking significantly elevates the risk of lung diseases such as chronic obstructive pulmonary disease (COPD) and lung cancer. This risk is attributed to the harmful chemicals in tobacco smoke that damage lung tissue and impair lung function. Current research on the impact of smoking on gene expression in specific lung cells is limited. This study addresses this gap by analyzing gene expression profiles at the single-cell level from 43,539 lung endothelial cells, 234,349 lung epithelial cells, 189,843 lung immune cells, and 16,031 lung stromal cells using advanced machine learning techniques. The data, categorized by different lung cell types, were classified into three smoking states: active smoker, former smoker, and never smoker. Each cell sample encompassed 28,024 feature genes. Employing an incremental feature selection method within a computational framework, several specific genes have been identified as potential markers of smoking status in different lung cell types. These include B2M, EEF1A1, and TPT1 in lung endothelial cells; FTL and MT-ATP8 in lung epithelial cells; HLA-B and HLA-C in lung immune cells; and HSP90B1 and LCN2 in lung stroma cells. Additionally, this study developed quantitative rules for representing the gene expression patterns related to smoking. This research highlights the potential of machine learning in oncology, enhancing our molecular understanding of smoking's harm and laying the groundwork for future mechanism-based studies.

11.
Front Biosci (Landmark Ed) ; 29(1): 21, 2024 01 17.
Article in English | MEDLINE | ID: mdl-38287832

ABSTRACT

BACKGROUND: Autophagy is instrumental in various health conditions, including cancer, aging, and infections. Therefore, examining proteins and compounds associated with autophagy is paramount to understanding cellular biology and the origins of diseases, paving the way for potential therapeutic and disease prediction strategies. However, the complexity of autophagy, its intersection with other cellular pathways, and the challenges in monitoring autophagic activity make the experimental identification of these elements arduous. METHODS: In this study, autophagy-related proteins and chemicals were catalogued on the basis of Human Autophagy-dedicated Database. These entities were mapped to their respective PubChem identifications (IDs) for chemicals and Ensembl IDs for proteins, yielding 563 chemicals and 779 proteins. A network comprising protein-protein, protein-chemical, and chemical-chemical interactions was probed employing the Random-Walk-with-Restart algorithm using the aforementioned proteins and chemicals as seed nodes to unearth additional autophagy-associated proteins and chemicals. Screening tests were performed to exclude proteins and chemicals with minimal autophagy associations. RESULTS: A total of 88 inferred proteins and 50 inferred chemicals of high autophagy relevance were identified. Certain entities, such as the chemical prostaglandin E2 (PGE2), which is recognized for modulating cell death-induced inflammatory responses during pathogen invasion, and the protein G Protein Subunit Alpha I1 (GNAI1), implicated in ether lipid metabolism influencing a range of cellular processes including autophagy, were associated with autophagy. CONCLUSIONS: The discovery of novel autophagy-associated proteins and chemicals is of vital importance because it enhances the understanding of autophagy, provides potential therapeutic targets, and fosters the development of innovative therapeutic strategies and interventions.
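Illustrative note: a random walk with restart spreads probability from seed nodes across a network and scores every other node by how often the walker visits it. The sketch below is a minimal version on an unweighted, undirected toy graph; the restart probability of 0.8, the convergence tolerance, and the node names are assumptions, not parameters reported in the study.

```python
def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Return steady-state visit probabilities for a walker that restarts at `seeds`.

    `adj` maps every node to the list of its neighbours; each node is assumed
    to have at least one neighbour so that probability mass is conserved.
    """
    nodes = sorted(adj)
    p0 = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: split the remaining mass evenly among neighbours.
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed node A (names are hypothetical).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A"})
```

Nodes closer to the seed receive higher scores, which is the property used to propose new candidates near known seed proteins and chemicals.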


Subject(s)
Neoplasms , Proteins , Humans , Autophagy , Algorithms , Computational Biology/methods
12.
Comput Biol Med ; 169: 107883, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38157776

ABSTRACT

COVID-19 is hypothesized to exert enduring effects on the immune systems of patients, leading to alterations in immune-related gene expression. This study aimed to scrutinize the persistent implications of SARS-CoV-2 infection on gene expression and its influence on subsequent immune activation responses. We designed a machine learning-based approach to analyze transcriptomic data from both healthy individuals and patients who had recovered from COVID-19. Patients were categorized based on their influenza vaccination status and then compared with healthy controls. The initial sample set encompassed 86 blood samples from healthy controls and 72 blood samples from recuperated COVID-19 patients prior to influenza vaccination. The second sample set included 123 blood samples from healthy controls and 106 blood samples from recovered COVID-19 patients who had been vaccinated against influenza. For each sample, the dataset captured expression levels of 17,060 genes. Above two sample sets were first analyzed by seven feature ranking algorithms, yielding seven feature lists for each dataset. Then, each list was fed into the incremental feature selection method, incorporating three classic classification algorithms, to extract essential genes, classification rules and build efficient classifiers. The genes and rules were analyzed in this study. The main findings included that NEXN and ZNF354A were highly expressed in recovered COVID-19 patients, whereas MKI67 and GZMB were highly expressed in patients with secondary immune activation post-COVID-19 recovery. These pivotal genes could provide valuable insights for future health monitoring of COVID-19 patients and guide the creation of continued treatment regimens.


Subject(s)
COVID-19 , Human Influenza , Humans , SARS-CoV-2 , Vaccination , Machine Learning
13.
Med Biol Eng Comput ; 62(4): 1031-1048, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38123886

ABSTRACT

Post-acute sequelae of COVID-19 (PASC) is a persistent complication of severe acute respiratory syndrome coronavirus 2 infection that includes symptoms, such as fatigue, cognitive impairment, and respiratory distress. These symptoms severely affect the quality of life of patients after their recovery from COVID-19. In this study, a group of machine learning algorithms analyzed the whole blood RNA-seq data from patients with different PASC levels. The purpose of this analysis was to identify the gene markers associated with PASC and the special expression patterns for different PASC levels. By comparing the quality of life of patients after the acute phase of COVID-19 and before the disease, samples in the dataset were divided into three groups, namely, "Better," "The Same," and "Worse." Each patient was represented by the expression levels of 58,929 genes. The machine learning-based workflow included six feature-ranking algorithms, incremental feature selection (IFS), and four classification algorithms. The feature ranking algorithms were in charge of assessing feature importance, whereas IFS with classification algorithms were used to extract essential genes and to construct efficient classifiers and classification rules. The expression of top genes in the results was associated with the immune response to viral infection, which is supported by the published literature. For example, patients with low CCDC18 expression and high CPED1 expression had good quality of life, whereas those with low CDC16 expression had poor quality of life.


Subject(s)
COVID-19 , Cognitive Dysfunction , Humans , Quality of Life , Algorithms , Gene Expression , Disease Progression
14.
Protein J ; 43(3): 477-486, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38436837

ABSTRACT

Protein-protein interactions (PPIs) involve the physical or functional contact between two or more proteins. Generally, proteins that can interact with each other always have special relationships. Some previous studies have reported that gene ontology (GO) terms are related to the determination of PPIs, suggesting the special patterns on the GO terms of proteins in PPIs. In this study, we explored the special GO term patterns on human PPIs, trying to uncover the underlying functional mechanism of PPIs. The experimental validated human PPIs were retrieved from STRING database, which were termed as positive samples. Additionally, we randomly paired proteins occurring in positive samples, yielding lots of negative samples. A simple calculation was conducted to count the number of positive samples for each GO term pair, where proteins in samples were annotated by GO terms in the pair individually. The similar number for negative samples was also counted and further adjusted due to the great gap between the numbers of positive and negative samples. The difference of the above two numbers and the relative ratio compared with the number on positive samples were calculated. This ratio provided a precise evaluation of the occurrence of GO term pairs for positive samples and negative samples, indicating the latent GO term patterns for PPIs. Our analysis unveiled several nuclear biological processes, including gene transcription, cell proliferation, and nutrient metabolism, as key biological functions. Interactions between major proliferative or metabolic GO terms consistently correspond with significantly reported PPIs in recent literature.
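Illustrative note: the counting scheme in this abstract can be sketched roughly as below. The abstract does not give the exact adjustment formula, so this sketch assumes the negative count is rescaled by the ratio of positive to negative sample sizes before the difference and relative ratio are taken. Protein names, GO identifiers, and function names are placeholders.

```python
from collections import Counter
from itertools import product


def count_go_pairs(protein_pairs, annotations):
    """Count, for each unordered GO term pair, the samples annotated by it."""
    counts = Counter()
    for a, b in protein_pairs:
        # One term from each protein; each GO pair is counted once per sample.
        seen = {
            tuple(sorted((g1, g2)))
            for g1, g2 in product(annotations[a], annotations[b])
        }
        counts.update(seen)
    return counts


def relative_ratio(pos_pairs, neg_pairs, annotations):
    """(positive count - rescaled negative count) / positive count, per GO pair."""
    pos = count_go_pairs(pos_pairs, annotations)
    neg = count_go_pairs(neg_pairs, annotations)
    scale = len(pos_pairs) / len(neg_pairs)  # assumed size adjustment
    return {gp: (n - neg[gp] * scale) / n for gp, n in pos.items()}


# Placeholder annotations and samples, made up for illustration.
annotations = {"P1": {"GO:a"}, "P2": {"GO:b"}, "P3": {"GO:a"}, "P4": {"GO:c"}}
positives = [("P1", "P2"), ("P3", "P2")]
negatives = [("P1", "P4"), ("P3", "P4"), ("P1", "P2"), ("P2", "P4")]
ratios = relative_ratio(positives, negatives, annotations)
```

A ratio close to 1 means the GO term pair occurs almost only among interacting proteins; a ratio near or below 0 means it is no more common there than among random pairs.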


Subject(s)
Protein Databases , Gene Ontology , Humans , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Protein Interaction Maps , Computational Biology/methods
15.
Mol Genet Genomics ; 288(9): 391-400, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23793388

ABSTRACT

Carboxy-terminal α-amidation is a widespread post-translational modification of proteins found widely in vertebrates and invertebrates. The α-amide group is required for full biological activity, since it may render a peptide more hydrophobic and thus better be able to bind to other proteins, preventing ionization of the C-terminus. However, in particular, the C-terminal amidation is very difficult to detect because experimental methods are often labor-intensive, time-consuming and expensive. Therefore, in silico methods may complement due to their high efficiency. In this study, a computational method was developed to predict protein amidation sites, by incorporating the maximum relevance minimum redundancy method and the incremental feature selection method based on the nearest neighbor algorithm. From a total of 735 features, 41 optimal features were selected and were utilized to construct the final predictor. As a result, the predictor achieved an overall Matthews correlation coefficient of 0.8308. Feature analysis showed that PSSM conservation scores and amino acid factors played the most important roles in the α-amidation site prediction. Site-specific feature analyses showed that features derived from the amidation site itself and adjacent sites were most significant. This method presented could be used as an efficient tool to theoretically predict amidated peptides. And the selected features from our study could shed some light on the in-depth understanding of the mechanisms of the amidation modification, providing guidelines for experimental validation.
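Illustrative note: for a two-class problem such as amidation site versus non-site, the Matthews correlation coefficient is computed from the four confusion-matrix counts. The sketch below uses the standard binary formula; the example counts are invented and are not the study's results.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Invented counts for illustration only.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
imperfect = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
```

The coefficient ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it less sensitive to class imbalance than plain accuracy.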


Subject(s)
Algorithms , Post-Translational Protein Processing/physiology , Proteins/metabolism , Protein Sequence Analysis/methods , Tertiary Protein Structure , Proteins/genetics
16.
Clin Chem ; 59(5): 846-9, 2013 May.
Article in English | MEDLINE | ID: mdl-23364181

ABSTRACT

BACKGROUND: Noninvasive prenatal detection of common fetal aneuploidies with cell-free DNA from maternal plasma has been achieved with high-throughput next-generation sequencing platforms. Turnaround times for previously tested platforms are still unsatisfactory for clinical applications, however, because of the time spent on sequencing. The development of semiconductor sequencing technology has provided a way to shorten overall run times. We studied the feasibility of using semiconductor sequencing technology for the noninvasive detection of fetal aneuploidy. METHODS: Maternal plasma DNA from 13 pregnant women, corresponding to 4 euploid, 6 trisomy 21 (T21), 2 trisomy 18 (T18), and 1 trisomy 13 (T13) pregnancies, were sequenced on the Ion Torrent Personal Genome Machine sequencer platform with 318 chips. The data were analyzed with the T statistic method after correcting for GC bias, and the T value was calculated as an indicator of fetal aneuploidy. RESULTS: We obtained a mean of 3 524 401 high-quality reads per sample, with an efficiency rate of 77.9%. All of the T21, T13, and T18 fetuses could be clearly distinguished from euploid fetuses, and the time spent on library preparation and sequencing was 24 h. CONCLUSIONS: Semiconductor sequencing represents a suitable technology for the noninvasive prenatal detection of fetal aneuploidy. With this platform, sequencing times can be substantially reduced; however, a further larger-scale study is needed to determine the imprecision of noninvasive fetal aneuploidy detection with this system.


Subject(s)
Chromosome Disorders/blood , DNA/chemistry , Fetus/pathology , Maternal Serum Screening Tests/methods , Semiconductors , DNA Sequence Analysis/methods , Trisomy/genetics , Chromosome Disorders/embryology , Human Chromosomes Pair 13/genetics , Human Chromosomes Pair 18/genetics , Human Chromosomes Pair 21/genetics , DNA/blood , DNA/genetics , Feasibility Studies , Female , Humans , Maternal Serum Screening Tests/instrumentation , Pregnancy , DNA Sequence Analysis/instrumentation , Trisomy/pathology
17.
Life (Basel) ; 13(6)2023 May 31.
Article in English | MEDLINE | ID: mdl-37374089

ABSTRACT

Phase-separation proteins (PSPs) are a class of proteins that play a role in the process of liquid-liquid phase separation, which is a mechanism that mediates the formation of membraneless compartments in cells. Identifying phase-separation proteins and their associated functions could provide insights into cellular biology and the development of diseases, such as neurodegenerative diseases and cancer. Here, PSPs and non-PSPs that have been experimentally validated in earlier studies were gathered as positive and negative samples. Each protein's corresponding Gene Ontology (GO) terms were extracted and used to create a 24,907-dimensional binary vector. The purpose was to extract essential GO terms that can describe essential functions of PSPs and build efficient classifiers to identify PSPs with these GO terms at the same time. To this end, the incremental feature selection computational framework and an integrated feature analysis scheme, containing categorical boosting, least absolute shrinkage and selection operator, light gradient-boosting machine, extreme gradient boosting, and permutation feature importance, were used to build efficient classifiers and identify GO terms with classification-related importance. A set of random forest (RF) classifiers with F1 scores over 0.960 were established to distinguish PSPs from non-PSPs. A number of GO terms that are crucial for distinguishing between PSPs and non-PSPs were found, including GO:0003723, which is related to a biological process involving RNA binding; GO:0016020, which is related to membrane formation; and GO:0045202, which is related to the function of synapses. This study offered recommendations for future research aimed at determining the functional roles of PSPs in cellular processes by developing efficient RF classifiers and identifying the representative GO terms related to PSPs.

18.
Biochim Biophys Acta Proteins Proteom ; 1871(3): 140889, 2023 05 01.
Article in English | MEDLINE | ID: mdl-36610583

ABSTRACT

Metabolic stability of proteins plays a vital role in various dedicated cellular processes. Traditional methods of measuring the metabolic stability are time-consuming and expensive. Therefore, we developed a more efficient computational approach to understand the protein dynamic action mechanisms in biological process networks. In this study, we collected 341 short-lived proteins and 824 non-short-lived proteins from U2OS; 342 short-lived proteins and 821 non-short-lived proteins from HEK293T; 424 short-lived proteins and 1153 non-short-lived proteins from HCT116; and 384 short-lived proteins and 992 non-short-lived proteins from RPE1. The proteins were encoded by GO and KEGG enrichment scores based on the genes and their neighbors in STRING, resulting in 20,681 GO term features and 297 KEGG pathway features. We also incorporated the protein interaction information from STRING into the features and obtained 19,247 node features. Boruta and mRMR methods were used for feature filtering, and IFS method was used to obtain the best feature subsets and create the models with the highest performance. The present study identified 42 features that did not appear in previous studies and classified them into eight groups according to their functional annotation. By reviewing the literature, we found that the following three functional groups were critical in determining the stability of proteins: synaptic transmission, post-translational modifications, and cell fate determination. These findings may serve as a valuable reference for developing drugs that target protein stability.


Subject(s)
Proteins , Humans , Gene Ontology , HEK293 Cells , Proteins/genetics , Proteins/metabolism , Protein Stability
19.
Life (Basel) ; 13(6)2023 May 31.
Article in English | MEDLINE | ID: mdl-37374086

ABSTRACT

Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels, covering 73 antigens, in a cohort of healthcare workers vaccinated for COVID-19. The samples fell into four groups according to the duration after vaccination: 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers over 180 days after vaccination. Our work was a reanalysis of data originally collected at Irvine University. These data were obtained in Orange County, California, USA, with collection commencing in December 2020. The British variant (B.1.1.7), South African variant (B.1.351), and Brazilian/Japanese variant (P.1) were the most prevalent strains during the sampling period. A machine learning-based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several classifiers with a weighted F1 value of around 0.75 were constructed. The antigen microarray used to measure antibody levels includes ten distinct SARS-CoV-2 antigens, comprising various segments of both the nucleocapsid protein (NP) and the spike protein (S).
This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were the most highly ranked among all features, where S1 and S2 are the subunits of Spike and the suffixes indicate the tags of the different recombinant proteins. In addition, classification rules were obtained from the optimal decision tree to explain quantitatively the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2.
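
The weighted F1 value reported above is the per-class F1 score averaged with weights proportional to each class's share of the samples, which matters here because the four groups differ considerably in size. The following is a minimal sketch of that metric in plain Python. The function name `weighted_f1` and the toy labels are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Toy labels (hypothetical): three samples of class "a", one of class "b".
toy_true = ["a", "a", "a", "b"]
toy_pred = ["a", "a", "b", "b"]
print(round(weighted_f1(toy_true, toy_pred), 4))  # 0.7667
```

In the toy example, class "a" has F1 = 0.8 with weight 0.75 and class "b" has F1 = 2/3 with weight 0.25, which gives 0.6 + 0.1667 = 0.7667.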

20.
Front Immunol ; 14: 1131051, 2023.
Article in English | MEDLINE | ID: mdl-36936955

ABSTRACT

The widely used ChAdOx1 nCoV-19 (ChAd) vector and BNT162b2 (BNT) mRNA vaccines have been shown to induce robust immune responses. Recent studies demonstrated that the immune responses of people who received one dose of ChAdOx1 and one dose of BNT were better than those of people who received two homologous doses of ChAdOx1 or BNT. However, how heterologous vaccines function has not been extensively investigated. In this study, single-cell RNA sequencing data from three classes of samples (volunteers vaccinated with heterologous ChAdOx1-BNT, and volunteers vaccinated with homologous ChAd-ChAd or BNT-BNT, after 7 days) were divided into three types of immune cells (3654 B, 8212 CD4+ T, and 5608 CD8+ T cells). To identify differences in gene expression in various cell types induced by different vaccination strategies, multiple feature selection methods (max-relevance and min-redundancy, Monte Carlo feature selection, least absolute shrinkage and selection operator, light gradient boosting machine, and permutation feature importance) and classification algorithms (decision tree and random forest) were integrated into a computational framework. The feature selection methods analyzed the importance of gene features, yielding multiple gene lists. These lists were fed into incremental feature selection, incorporating decision tree and random forest, to extract essential genes and classification rules and to build efficient classifiers. Highly ranked genes include PLCG2, whose differential expression is important to the B cell immune pathway and is positively correlated with immune cells, such as CD8+ T cells, and B2M, which is associated with thymic T cell differentiation.
This study contributes to a mechanistic explanation of why a heterologous ChAdOx1-BNT vaccination schedule elicits a stronger immune response than two doses of either BNT or ChAdOx1, offering a theoretical foundation for vaccine modification.
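
The incremental feature selection step described above can be sketched as follows: given a ranked feature list, it evaluates growing top-k prefixes of the list and keeps the prefix with the best score. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name, the `step` parameter, and the toy scoring function are hypothetical, and in the study the score would come from evaluating a decision tree or random forest on each prefix.

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Return the best-scoring top-k prefix of a ranked feature list.

    ranked_features: features sorted from most to least important.
    evaluate: callable mapping a feature subset (list) to a score, where
              higher is better (for example, a cross-validated weighted F1).
    step: how many features to add between evaluations (assumed parameter).
    """
    best_subset, best_score = [], float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict ">" keeps the smallest prefix when several scores tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for a classifier score (hypothetical): informative features
# raise the score, and every added feature carries a small penalty.
INFORMATIVE = {"gene_a", "gene_b"}


def toy_score(subset):
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)


ranked = ["gene_a", "gene_b", "gene_c", "gene_d"]
best, score = incremental_feature_selection(ranked, toy_score)
print(best, round(score, 2))  # ['gene_a', 'gene_b'] 1.8
```

Passing the scoring function in as a parameter keeps the search loop independent of the classifier, so the same loop could be run once per ranked list and per classification algorithm.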


Subject(s)
BNT162 Vaccine , ChAdOx1 nCoV-19 , Humans , BNT162 Vaccine/immunology , CD8-Positive T-Lymphocytes , ChAdOx1 nCoV-19/immunology , Machine Learning , COVID-19/prevention & control , CD4-Positive T-Lymphocytes