Results 1-20 of 240
1.
Front Genet ; 12: 783128, 2021.
Article in English | MEDLINE | ID: mdl-34804131

ABSTRACT

Because of technological limitations, the subcellular localizations of proteins are difficult to identify. It is therefore necessary to predict the subcellular localization and intracellular distribution patterns of proteins from their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics. Computational prediction of protein subcellular localization can be based on sequence and functional characteristics. In this study, the protein-protein interaction network, functional annotations of proteins, and a group of proteins with known subcellular localizations were used to construct models. To build efficient models, several machine learning algorithms, including two feature selection methods and four classification algorithms, were employed. Some key proteins and functional terms were discovered that may contribute importantly to determining protein subcellular locations. Furthermore, quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) together with a STRING-based protein-protein interaction network (i.e., network features), our computational model can help advance predictive technologies for subcellular localization and provides a new approach for exploring protein subcellular localization patterns and their potential biological importance.

2.
Front Cell Dev Biol ; 9: 712931, 2021.
Article in English | MEDLINE | ID: mdl-34513841

ABSTRACT

Cancer is generally defined as a group of systemic malignant diseases involving abnormal cell growth. Genetic mutations arising from environmental factors and inherited genetics trigger the initiation and progression of cancers. Although several well-known factors affect cancer, the mutation features and rules underlying different cancers remain relatively unknown because few studies have addressed them. In this study, we computationally investigated the mutation profiles of cancer samples from 27 cancer types. These profiles were first analyzed with the Monte Carlo feature selection (MCFS) method, yielding a feature list. The incremental feature selection (IFS) method then used this list to extract essential mutation features related to the 27 cancer types, identify 207 mutation rules, and construct efficient classifiers. The top 37 mutation features corresponding to different cancer types were discussed. All of the qualitatively analyzed gene mutation features contribute to distinguishing different cancer types, and most of the mutation rules are supported by recent literature. Therefore, our computational investigation could identify potential biomarkers and prediction rules for cancers at the mutation-signature level.
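The MCFS-then-IFS workflow described above recurs in several entries in this list (some use mRMR or Boruta to produce the ranking instead). IFS walks down a ranked feature list, evaluates a classifier on the top-1, top-2, ... features, and keeps the best-scoring prefix. The following is a minimal pure-Python sketch of that step, assuming the ranked list is already available; `evaluate` is a hypothetical callback standing in for "train and cross-validate a classifier on this subset" and is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Score every top-k prefix of a ranked feature list and keep the best one.

    `evaluate` is a placeholder for training a classifier on the given features
    and returning a cross-validated score (e.g. MCC); higher must be better.
    Returns the best subset, its score, and the full (k, score) curve.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    curve: List[Tuple[int, float]] = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        curve.append((k, score))
        # Strict '>' keeps the smaller subset when two prefixes tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


if __name__ == "__main__":
    # Hypothetical toy scorer: +1 per informative feature, -0.1 per feature used.
    informative = {"f1", "f3", "f4"}

    def toy_score(subset: List[str]) -> float:
        return len(informative.intersection(subset)) - 0.1 * len(subset)

    ranking = ["f1", "f3", "f2", "f4", "f5"]
    subset, score, curve = incremental_feature_selection(ranking, toy_score)
    print(subset, round(score, 2))  # ['f1', 'f3', 'f2', 'f4'] 2.6
```

Each prefix costs one full classifier evaluation, which is why IFS is usually the most expensive stage of such pipelines.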

3.
Life (Basel) ; 11(9)2021 Sep 09.
Article in English | MEDLINE | ID: mdl-34575089

ABSTRACT

Non-small cell lung cancer is a major lethal subtype of epithelial lung cancer with high morbidity and mortality. Single-cell sequencing plays a key role in exploring its pathogenesis. We proposed a computational method, based on transcriptomic profiles, for distinguishing cell subtypes from different pathological regions of non-small cell lung cancer; the method yields a group of qualitative classification criteria (biomarkers) and various rules. The random forest classifier reached a Matthews correlation coefficient (MCC) of 0.922 using 720 features, and the decision tree reached an MCC of 0.786 using 1880 features. The obtained biomarkers and rules are analyzed at the end of the study.
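The MCC values quoted here (and in entries 7 and 19) summarize a whole confusion matrix in a single number: 1 means perfect prediction, 0 means chance-level agreement, and in the binary case the minimum is -1. For multiclass problems a common generalization, used for example by scikit-learn's `matthews_corrcoef`, is the formula in the docstring below. The abstracts do not say which variant the authors computed, so this pure-Python sketch is illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Matthews correlation coefficient for binary or multiclass labels.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s = number of samples, c = number of correct predictions,
    t_k = how often class k truly occurs, p_k = how often class k is predicted.
    With two classes this equals the usual TP/TN/FP/FN formula.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)  # Counter returns 0 for absent keys
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    # Constant predictions (or constant labels) make MCC undefined; report 0.
    return 0.0 if denom == 0 else cov_tp / denom


if __name__ == "__main__":
    # TP=2, TN=2, FP=1, FN=1 -> (4 - 1) / sqrt(3*3*3*3) = 1/3
    print(round(matthews_corrcoef([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]), 3))  # 0.333
```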

4.
Front Microbiol ; 12: 711244, 2021.
Article in English | MEDLINE | ID: mdl-34305880

ABSTRACT

Type 2 diabetes (T2D) is a systemic chronic metabolic condition characterized by dysfunctional sugar metabolism; its complications are the most harmful aspect of the disease and may become life-threatening over the long term. Given its high incidence and late-stage severity, researchers have focused on identifying specific biomarkers and potential drug targets for T2D at the genomic, epigenomic, and transcriptomic levels. Microbes participate in the pathogenesis of multiple metabolic diseases, including diabetes. However, related studies remain unsystematic and lack functional exploration of the identified microbes. To fill this gap between gut microbiome and diabetes research, we first used the eggNOG database and the KEGG ORTHOLOGY (KO) database for orthologous (protein/gene) annotation of the microbiota. Two datasets carrying these annotations were analyzed with multiple machine-learning models to identify significant microbiota biomarkers of T2D. The feature selection method Max-Relevance and Min-Redundancy (mRMR) was first applied to each dataset, producing one feature list per dataset. Each list was then fed into incremental feature selection (IFS), with a support vector machine (SVM) as the classification algorithm, to extract essential annotations and build efficient classifiers. This study not only revealed potential pathological factors for diabetes at the microbiome level but also provided new candidates for drug development against diabetes.
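mRMR ranks features greedily: at each step it picks the feature with the highest mutual information with the class label minus its mean mutual information with the features already chosen (the difference, or "MID", form of the criterion). The pure-Python sketch below assumes discrete (for example, pre-discretized) features; the study's actual implementation and discretization are not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information, in nats, between two equally long discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))), using counts.
    return sum(
        (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
        for (a, b), n_ab in joint.items()
    )


def mrmr_rank(
    features: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    k: int,
) -> List[str]:
    """Greedy mRMR ranking with the relevance-minus-redundancy (MID) criterion."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted so that ties are broken alphabetically
    while remaining and len(selected) < k:
        best_name, best_score = remaining[0], float("-inf")
        for name in remaining:
            redundancy = (
                sum(mutual_information(features[name], features[s]) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


if __name__ == "__main__":
    y = [0, 0, 1, 1, 2, 2, 3, 3]
    feats = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],  # informative
        "b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of "a" (redundant)
        "c": [0, 0, 1, 1, 0, 0, 1, 1],  # informative and independent of "a"
    }
    print(mrmr_rank(feats, y, 3))  # ['a', 'c', 'b']
```

On this toy data, ranking by relevance alone would tie all three features; the redundancy term pushes the duplicate feature `b` to the end of the list that IFS would then consume.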

5.
Biomed Res Int ; 2021: 9939134, 2021.
Article in English | MEDLINE | ID: mdl-34307679

ABSTRACT

COVID-19, a severe respiratory disease caused by the novel coronavirus SARS-CoV-2, has spread worldwide. Patients infected with SARS-CoV-2 may show no symptoms, either because they are presymptomatic or because they are asymptomatic. Both groups can spread the virus to other susceptible people, which makes COVID-19 difficult to control. At present, COVID-19 diagnosis faces two major challenges: (1) patients may have symptoms similar to those of other respiratory infections, and (2) patients may have no symptoms yet still spread the virus. Therefore, new biomarkers at different omics levels are required for large-scale screening and diagnosis of COVID-19. Although initial analyses identified a group of candidate gene biomarkers for COVID-19, that previous work could not identify biomarkers suitable for clinical use, which requires a diagnosis specific to COVID-19 as opposed to multiple other infectious diseases. As an extension of the previous study, the present study applied optimized machine learning models to identify specific qualitative host biomarkers associated with COVID-19 infection, on the basis of a publicly released transcriptomic dataset that included healthy controls and patients with bacterial infection, influenza, COVID-19, and other kinds of coronavirus infection. The dataset was analyzed first with Boruta and then with the Max-Relevance and Min-Redundancy feature selection method, resulting in a feature list. This list was fed into the incremental feature selection method, together with a classification algorithm, to extract essential biomarkers and build efficient classifiers and classification rules. The ability of these findings to distinguish COVID-19 from other similar respiratory infectious diseases at the transcriptomic level was also validated, which may improve the efficacy and accuracy of COVID-19 diagnosis.


Subjects
COVID-19 Testing/methods; COVID-19/diagnosis; COVID-19/genetics; Biomarkers/analysis; COVID-19/blood; Databases, Genetic; Gene Expression Profiling/methods; Humans; Influenza, Human; Machine Learning; Mass Screening/methods; Models, Theoretical; Respiratory Tract Infections/blood; Respiratory Tract Infections/diagnosis; SARS-CoV-2/genetics; SARS-CoV-2/pathogenicity; Transcriptome/genetics
6.
Life (Basel) ; 11(6)2021 Jun 03.
Article in English | MEDLINE | ID: mdl-34204983

ABSTRACT

Antifreeze proteins (AFPs) are proteinaceous compounds with antifreeze activity that bind to ice and prevent its growth. As surface-active materials, even small amounts of AFPs have a tremendous influence on ice growth. Therefore, identifying novel AFPs is important for understanding protein-ice interactions and for creating novel ice-binding domains. To date, predicting AFPs has been difficult because of the low sequence similarity of their ice-binding domains and the lack of common features among different AFPs. Here, we developed a computational engine to predict AFP features, revealing the 39 most important features for AFP identification, such as the antifreeze-like/N-acetylneuraminic acid synthase C-terminal domain, the insect AFP motif, the C-type lectin-like domain, and the EGF-like domain. With this new computational method, a group of previously confirmed functional AFP motifs was identified. This study has identified some potential new AFP motifs and contributes to understanding biological antifreeze mechanisms.

7.
Article in English | MEDLINE | ID: mdl-33989156

ABSTRACT

Identifying protein subcellular locations is an important topic in protein function prediction. Interacting proteins may share similar locations, so it is important to take protein-protein interactions (PPIs) into account when inferring protein subcellular locations. In this study, we present a network embedding-based method, node2loc, to identify protein subcellular locations. node2loc first uses node2vec to learn distributed embeddings of proteins in a PPI network. The learned embeddings are then fed into a recurrent neural network (RNN). To address the severe class imbalance among subcellular locations, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to generate synthetic protein samples for minority classes. node2loc is evaluated on a human benchmark dataset that we constructed with 16 subcellular locations and yields a Matthews correlation coefficient (MCC) of 0.800, outperforming the baseline methods. node2loc also performs better on a yeast benchmark dataset with 17 locations. The results demonstrate that representations learned from a PPI network have some discriminative ability for classifying protein subcellular locations. However, node2loc is a transductive method: it works only for proteins connected in a PPI network, and it must be retrained for new proteins. In addition, the PPI network needs to be annotated with location information to some extent.
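SMOTE creates each synthetic minority-class sample by interpolating between a real minority sample and one of its k nearest minority-class neighbours. A minimal pure-Python sketch of that basic rule follows, assuming Euclidean distance on fixed-length numeric vectors (for node2loc these would be node2vec embeddings); the neighbour count, sample count, and seed are illustrative defaults, not the paper's settings.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[Vector]:
    """Generate synthetic minority-class samples with the basic SMOTE rule.

    For each synthetic point: pick a random minority sample x, pick one of its
    k nearest minority neighbours x_nn, and return x + u * (x_nn - x) with
    u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest neighbours (Euclidean) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        u = rng.random()
        x, x_nn = minority[i], minority[j]
        # The new point lies on the segment between x and its chosen neighbour.
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


if __name__ == "__main__":
    rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_points = smote(rare_class, n_synthetic=4, k=2)
    print(len(new_points))  # 4
```

Because every synthetic point lies between two existing minority samples, SMOTE fills in the minority region rather than simply duplicating points, which is the usual argument for it over plain oversampling.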

8.
PLoS One ; 16(4): e0250032, 2021.
Article in English | MEDLINE | ID: mdl-33886611

ABSTRACT

Pregnancy is a long and complex process during which one or more offspring develop inside a woman. A short period of oxygen shortage after birth is quite normal for most babies and does not threaten their health. However, a prolonged period of oxygen shortage indicates pathological fetal intolerance, which may cause the baby's death. Distinguishing pathological fetal intolerance from physiological oxygen shortage has long been an important clinical problem in obstetrics. Clinically, five typical symptoms indicate that a baby may suffer from fetal intolerance. With the rapid development of molecular sequencing technologies, liquid biopsy combined with high-throughput sequencing or mass spectrometry now provides a quick approach for detecting real-time alterations in peripheral blood at multiple levels. Gene methylation is functionally correlated with gene expression; thus, combining gene methylation and expression information could help identify key regulators in the pathogenesis of fetal intolerance. For the first time, we combined gene methylation and expression features and selected the optimal features, including gene expression or methylation signatures, for fetal intolerance prediction. In addition, we combined various computational methods into a comprehensive pipeline for identifying potential biomarkers of fetal intolerance from liquid biopsy samples. We built qualitative and quantitative computational models for predicting fetal intolerance during pregnancy. Moreover, we provide a new perspective on the detailed pathological mechanism of fetal intolerance. This work can provide a solid foundation for further experimental research and contribute to the application of liquid biopsy in antenatal care.


Subjects
DNA Methylation; Hypoxia/genetics; Databases, Factual; Female; Humans; Pregnancy; Prenatal Care; Risk Factors
9.
Mol Genet Genomics ; 296(4): 905-918, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33914130

ABSTRACT

Phenotype is one of the most important concepts in genetics; it describes all observable characteristics of a research object. Because phenotype reflects the combined effects of genotype and environmental factors, phenotypic characteristics are hard to define, and unknown phenotypes are even harder to predict. Given the limitations of current biological techniques, obtaining sufficient structural information on large numbers of phenotype-associated genes/proteins remains expensive and time-consuming. Various bioinformatics methods have been proposed to address this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complex internal structures, which complicates phenotype prediction. To address this issue further and to improve prediction for more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG and then use node2vec to learn functional embedding features of genes from a gene-gene network. All of these features are analyzed with feature selection methods (Boruta and minimum redundancy maximum relevance) to generate a feature list. This list is fed into incremental feature selection, incorporating multi-label classifiers built with RAkEL on top of several classic base classifiers, to build an optimal multi-label, multi-class classification model for phenotype prediction. Consistent with recent research, our method identified many literature-supported genes/proteins and their associated phenotypes, as well as some candidate genes with newly assigned phenotypes, providing a new computational tool for accurate and effective phenotype prediction.
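node2vec learns embeddings by running biased second-order random walks over a network and then training a skip-gram (word2vec) model on the resulting node sequences. The sketch below covers only the walk step on an unweighted, undirected graph, using the return parameter `p` and in-out parameter `q` from the node2vec method; the adjacency-set representation, string node labels, and default values are assumptions, and the skip-gram training is omitted.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # undirected: each edge appears in both adjacency sets


def node2vec_walk(
    graph: Graph,
    start: str,
    length: int,
    p: float = 1.0,
    q: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Generate one biased random walk of at most `length` nodes.

    When the walk has just moved prev -> cur, each neighbour x of cur gets the
    unnormalized weight 1/p if x == prev (return), 1 if x is also a neighbour
    of prev (stay local, BFS-like), and 1/q otherwise (move outward, DFS-like).
    """
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted for reproducibility
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev else (1.0 if x in graph[prev] else 1.0 / q)
            for x in neighbours
        ]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    g: Graph = {
        "A": {"B", "C"},
        "B": {"A", "C", "D"},
        "C": {"A", "B"},
        "D": {"B"},
    }
    for i, node in enumerate(sorted(g)):
        print(" -> ".join(node2vec_walk(g, node, 6, p=0.5, q=2.0, rng=random.Random(i))))
```

A small `p` encourages walks to revisit recent nodes, and a large `q` keeps them near the start node; these two knobs control whether the embeddings emphasize local community structure or broader structural roles.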


Subjects
Algorithms; Computational Biology/methods; Genetic Association Studies/methods; Datasets as Topic; Gene Regulatory Networks/physiology; Metabolic Networks and Pathways/genetics; Phenotype; Proteins/chemistry; Proteins/genetics; Proteins/physiology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/physiology; Structure-Activity Relationship
10.
Front Genet ; 12: 651610, 2021.
Article in English | MEDLINE | ID: mdl-33767734

ABSTRACT

Cancer is one of the most threatening diseases to humans. It can invade multiple vital organs, including the lung, liver, stomach, pancreas, and even the brain. Identifying cancer biomarkers is one of the most important components of cancer research, as it underpins clinical cancer diagnosis and related drug development. In large-scale screening for cancer prevention and early diagnosis, however, cancer-related tissue cannot be obtained. Thus, identifying cancer-associated circulating biomarkers by liquid biopsy has been proposed and has become the most important research direction for clinical cancer diagnosis. Here, we analyzed pan-cancer extracellular microRNA profiles using multiple machine-learning models. The extracellular microRNA profiles of 11 cancer types and non-cancer samples were first analyzed with Boruta to extract important microRNAs. The selected microRNAs were then evaluated with the Max-Relevance and Min-Redundancy feature selection method, resulting in a feature list, which was fed into the incremental feature selection method to identify candidate circulating extracellular microRNAs for cancer recognition and classification. A series of quantitative classification rules was also established for this cancer classification, providing a solid research foundation for further biomarker exploration and functional analysis of tumorigenesis at the level of circulating extracellular microRNAs.

11.
Biochim Biophys Acta Proteins Proteom ; 1869(6): 140621, 2021 06.
Article in English | MEDLINE | ID: mdl-33561576

ABSTRACT

Protein-protein interactions (PPIs) describe the direct physical contact between two proteins that usually results in specific biological functions or regulatory processes. Characterizing PPIs by investigating their patterns and principles has remained an open question in biology. Various experimental and computational methods have been used to study PPIs, but most of them rely on sequence similarity to currently validated PPI participants or on cellular localization patterns. Most methods ignore the fact that PPIs are defined by their specific biological functions. In this study, we constructed a novel rule-based computational method using Gene Ontology and KEGG pathway annotations of PPI participants, which reflect the complex biological effects of PPIs. Our method identified a group of biological functions tightly associated with PPIs and provides a new rule-based, function-oriented tool for PPI studies.


Subjects
Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/metabolism; Databases, Protein; Decision Trees; Gene Ontology; Humans
12.
Front Mol Biosci ; 7: 604794, 2020.
Article in English | MEDLINE | ID: mdl-33330634

ABSTRACT

Cancer can generally be defined as a group of systemic diseases triggered by abnormal cell proliferation and growth. With the development of the biological sciences and biotechnologies, the etiology of cancer has been partially revealed, including some of the most substantial pathogenic factors [either endogenous (genetic) or exogenous (environmental)]. However, some factors that contribute to tumorigenesis have still not been analyzed and discussed in detail. For instance, some typical correlations between microorganisms and tumorigenesis have already been reported, but previous studies were sporadic, focused on single microorganism-cancer subtype pairs, and did not explain or validate the specific contribution of the microbiome to tumorigenesis. On the basis of publicly available systematic microbiome profiles of blood and cancer-associated tissues from cancer patients and controls, we performed interpretable analyses. Using multiple machine learning methods, we identified several core regulatory microorganisms that contribute to the classification of multiple tumor subtypes and established quantitative models for interpretable prediction. We also compared the optimal features (microorganisms) and rules identified from microbiome profiles processed with Kraken and with SHOGUN. Collectively, our study identified new microbiome signatures and their interpretable classification rules for cancer discrimination and carried out a reliable methodological comparison for robust cancer microbiome analysis, thereby advancing the study of tumor etiology at the microbiome level.

13.
Front Genet ; 11: 604336, 2020.
Article in English | MEDLINE | ID: mdl-33329750

ABSTRACT

Glioblastoma, also called glioblastoma multiforme (GBM), is the most aggressive cancer that originates in the brain. GBM arises in the central nervous system, and its cancer cells resemble stem cells. Several schemes for GBM stratification exist, based on intertumoral molecular heterogeneity, preoperative images, and integrated tumor characteristics. Although glioblastoma formation is strongly related to gene methylation, GBM has been poorly classified on the basis of epigenetics. To classify glioblastoma subtypes by their degree of gene methylation, we applied several machine learning algorithms to identify methylation features (sites) associated with GBM classification. The features were first analyzed with Monte Carlo feature selection (MCFS), resulting in a feature list. This list was then fed into incremental feature selection (IFS), incorporating a classification algorithm, to extract essential sites. These sites can be annotated to coding genes, such as CXCR4, TBX18, SP5, and TMEM22, and are enriched in biological functions relevant to GBM classification (e.g., subtype-specific functions). Representative functions, such as nervous system development, intrinsic plasma membrane component, calcium ion binding, systemic lupus erythematosus, and alcoholism, are potential pathogenic functions that participate in the initiation and progression of glioblastoma and its subtypes. With these sites, an efficient model can be built to classify glioblastoma subtypes.

14.
Genomics ; 112(6): 4945-4958, 2020 11.
Article in English | MEDLINE | ID: mdl-32919019

ABSTRACT

Coronary artery disease (CAD) is the most common cardiovascular disease. CAD research has progressed greatly during the past decade. mRNA profiling is a traditional and popular approach for investigating various diseases, including CAD. Compared with mRNA, lncRNA is more stable and thus may serve as a better disease indicator in blood. Investigating potential CAD-related lncRNAs and mRNAs will greatly contribute to the diagnosis and treatment of CAD. In this study, a computational analysis was conducted on patients with CAD using a comprehensive transcriptomic dataset combining mRNA and lncRNA expression data. Several machine learning algorithms, including feature selection methods and classification algorithms, were applied to screen for the RNA molecules most related to CAD. Decision rules were also reported to provide a quantitative description of the effect of these RNA molecules on CAD progression. These new findings (CAD-related RNA molecules and rules) can help in understanding mRNA and lncRNA expression levels in CAD.


Subjects
Coronary Artery Disease/genetics; RNA, Long Noncoding/metabolism; RNA, Messenger/metabolism; Coronary Artery Disease/metabolism; Gene Expression Profiling; Humans; Machine Learning
15.
Article in English | MEDLINE | ID: mdl-32766217

ABSTRACT

Proteins are among the most important components of all living organisms, and all essential biological structures and functions rely on proteins and their respective biological activities. However, proteins cannot fulfil their biological roles independently; they must interact with one another to carry out the complex biological processes of all living organisms, including humans. In other words, proteins depend on protein-protein interactions (PPIs) to exert their effects. Thus, it is urgent to compare the importance of candidate PPI features and to quantify their contributions. Previous studies have reported 258 physical and chemical characteristics of proteins that have been confirmed to affect the interaction efficiency of the related proteins. Among these, essential physicochemical features such as stoichiometric balance, protein abundance, molecular weight, and charge distribution have been validated as highly important and irreplaceable for PPIs. Therefore, in this study, we presented a novel computational framework that identifies the key factors affecting PPIs using Boruta feature selection (BFS), Monte Carlo feature selection (MCFS), and incremental feature selection (IFS), and we built a quantitative decision-rule system, using random forest (RF) and RIPPER algorithms, to evaluate potential PPIs under real conditions, thereby providing new insights into the detailed biological mechanisms of complex PPIs. The main datasets and code can be downloaded at https://github.com/xypan1232/Mass-PPI.

16.
Biomed Res Int ; 2020: 4256301, 2020.
Article in English | MEDLINE | ID: mdl-32685484

ABSTRACT

Coronaviruses are crown-shaped viruses that were first identified in the 1960s; three typical examples of recent coronavirus disease outbreaks are severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and COVID-19. In particular, COVID-19 is currently causing a worldwide pandemic that threatens human health globally. Identifying viral pathogenic mechanisms is important for developing effective drugs and targeted clinical treatments. The slow elucidation of viral infectious mechanisms is currently one of the technical obstacles to the prevention and treatment of infectious diseases. In this study, we proposed a random walk model on a virus-human protein interaction network to identify the potential pathological mechanisms of COVID-19. The model identified a group of proteins that have already been determined to be potentially important for COVID-19 infection and for similar SARS infections, which may help further drug development and targeted therapies against COVID-19. Moreover, we constructed a standard computational workflow for predicting the pathological biomarkers and related pharmacological targets of infectious diseases.
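The abstract does not specify the exact random walk model. A common choice for this kind of network propagation is random walk with restart (RWR): probability mass repeatedly spreads from seed nodes (for example, viral proteins) along network edges and, with restart probability r, jumps back to the seeds, so that nodes with high steady-state scores become candidate disease-related proteins. The pure-Python power-iteration sketch below assumes a symmetric weighted adjacency matrix with no isolated nodes; it illustrates the technique and is not the authors' model.

```python
from typing import List, Sequence


def random_walk_with_restart(
    adj: Sequence[Sequence[float]],
    seeds: Sequence[int],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol.

    W is the column-normalized adjacency matrix and p0 spreads unit mass
    uniformly over the seed nodes. Isolated nodes (zero columns) would leak
    probability mass, so the graph is assumed to have none.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    unique_seeds = sorted(set(seeds))
    p0 = [0.0] * n
    for s in unique_seeds:
        p0[s] = 1.0 / len(unique_seeds)
    p = p0[:]
    for _ in range(max_iter):
        new_p = [
            (1.0 - restart)
            * sum(adj[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    scores = random_walk_with_restart(path, seeds=[0])
    print([round(s, 3) for s in scores])  # scores decrease with distance from node 0
```

The restart probability controls how local the ranking is: larger values keep the mass close to the seeds, while smaller values let it reach more distant proteins.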


Subjects
Coronavirus Infections/genetics; Pneumonia, Viral/genetics; Betacoronavirus/isolation & purification; Biomarkers/analysis; COVID-19; Coronavirus Infections/diagnosis; Coronavirus Infections/virology; Humans; Models, Genetic; Pandemics; Pneumonia, Viral/diagnosis; Pneumonia, Viral/virology; Protein Interaction Maps; SARS-CoV-2; Severe Acute Respiratory Syndrome/diagnosis; Severe Acute Respiratory Syndrome/genetics; Severe Acute Respiratory Syndrome/virology
17.
Biomed Res Int ; 2020: 6384120, 2020.
Article in English | MEDLINE | ID: mdl-32626751

ABSTRACT

Among the various risk factors for the initiation and progression of cancer, alternative polyadenylation (APA) is a notable endogenous contributor that directly triggers the malignant phenotype of cancer cells. APA affects biological processes at the transcriptional level in various ways. Accordingly, APA can be involved in tumorigenesis through its effects on gene expression, protein subcellular localization, or transcript splicing patterns. APA sites and status in different cancer types may involve diverse modification patterns and regulatory mechanisms on transcripts. We screened potential APA sites by applying several machine learning algorithms to a TCGA-APA dataset. First, the feature selection method minimum redundancy maximum relevance was applied to the dataset, resulting in a feature list. The feature list was then fed into incremental feature selection, with a support vector machine as the classification algorithm, to extract key APA features and build a classifier. The classifier can classify cancer patients into cancer types with perfect performance. The key APA-modified genes showed potential prognostic value, given their significance in survival analysis of TCGA pan-cancer data.


Subjects
Carcinogenesis/genetics; Gene Expression Regulation, Neoplastic/genetics; Neoplasms; Polyadenylation/genetics; RNA Processing, Post-Transcriptional/genetics; Algorithms; Computational Biology; Databases, Genetic; Humans; Machine Learning; Neoplasms/classification; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/pathology; Support Vector Machine
18.
Article in English | MEDLINE | ID: mdl-32528944

ABSTRACT

DNA methylation is an essential epigenetic modification for multiple biological processes. In mammals, DNA methylation acts as an epigenetic mark of transcriptional repression. Aberrant levels of DNA methylation can be observed in various types of tumor cells. Thus, DNA methylation has attracted considerable attention from researchers seeking new and feasible tumor therapies. Conventional studies considered single-gene methylation or specific loci as biomarkers for tumorigenesis. However, genome-scale methylation has not been fully investigated. We therefore proposed and compared two novel computational approaches based on multiple machine learning algorithms for the qualitative and quantitative analysis of methylation-associated genes and their aberrant methylation patterns. This study contributes to identifying novel effective genes and establishing optimal quantitative rules, based on aberrant methylation, that distinguish tumor cells from different tissues of origin.

19.
Biochim Biophys Acta Proteins Proteom ; 1868(10): 140477, 2020 10.
Article in English | MEDLINE | ID: mdl-32593761

ABSTRACT

The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step in investigating related problems. Traditional experimental methods can determine locations reliably, but their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means of addressing these problems. Most previous methods extract features from protein sequences or structures to build prediction models. In this study, we combine two types of features to construct the model. The first feature type is extracted from a protein-protein interaction network to capture the relationships between the encoded protein and other proteins. The second type is obtained from Gene Ontology and biological pathways to represent the known functions of the encoded protein. These features are analyzed with several feature selection methods. The final optimal features are used to build the model, with a recurrent neural network as the classification algorithm. This model yields good performance, with a Matthews correlation coefficient of 0.844. A decision tree is used as a rule-learning classifier to extract decision rules. Although the predictive performance of the decision rules is poor, they are valuable for revealing the molecular mechanisms of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the proposed method is freely available at https://github.com/xypan1232/rnnloc.


Subjects
Computational Biology/methods; Neural Networks, Computer; Protein Transport; Proteins/metabolism; Algorithms; Decision Trees; Gene Ontology; Intracellular Space/metabolism; Protein Binding; Protein Interaction Mapping; Protein Interaction Maps; Proteins/genetics
20.
Article in English | MEDLINE | ID: mdl-32411685

ABSTRACT

Single-cell sequencing technologies have emerged to address new and longstanding biological and biomedical questions. Previous studies focused on the analysis of bulk tissue samples composed of millions of cells. However, the genomes within the cells of an individual multicellular organism are not always the same. In this study, by analyzing a single-cell transcriptomic atlas of mice, we aimed to identify crucial, characteristically expressed genes that may play functional roles in tissue development and organogenesis. We identified the most relevant gene features and decision rules for classifying 18 cell categories, yielding a list of genes that, given their tissue-specific expression patterns, may perform important functions in tissue development. These genes may serve as biomarkers for identifying the origin of unknown cell subgroups and for recognizing specific cell stages/states during this dynamic process, and they may also serve as potential therapeutic targets for developmental disorders.
