Results 1 - 20 of 60
1.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36242564

ABSTRACT

Breast cancer patients often experience recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for each breast cancer patient is essential for developing precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model that integrates hematoxylin and eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented the tumor regions in the H&E images into image blocks (256 × 256 pixels) and encoded each block into a 1D feature vector using a deep neural network. An attention module then scored each region of the H&E-stained images, and the resulting image features were combined with clinical and gene expression data to predict each patient's risk of recurrence and metastasis. To test the model, we downloaded all 196 breast cancer samples from The Cancer Genome Atlas for which clinical, gene expression and H&E data were simultaneously available. The samples were divided into training and testing sets at a ratio of 7:3, using stratified (hierarchical) sampling to keep the sample distributions consistent between the two sets. The multi-modal model achieved an area under the curve (AUC) of 0.75 on the testing set, outperforming models based solely on H&E images, sequencing data or clinical data. This study may have clinical significance in identifying high-risk breast cancer patients who could benefit from postoperative adjuvant treatment.
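The abstract does not spell out the attention module. As a hypothetical illustration only, the sketch below shows attention pooling in the style of attention-based multiple-instance learning. Each image-block feature vector is scored by an assumed linear scorer `w`, the scores are softmax-normalized, and the weighted sum of block features is concatenated with the clinical and gene expression vectors. The names `attention_pool` and `fuse`, the single linear scorer and the concatenation-based fusion are all assumptions, not the authors' architecture.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(blocks: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Pool per-block feature vectors into one slide-level vector.

    Each block is scored with a (hypothetical) linear scorer w. The softmax of
    the scores gives the attention weights, and the output is the weighted sum.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in blocks]
    weights = softmax(scores)
    dim = len(blocks[0])
    return [sum(a * x[d] for a, x in zip(weights, blocks)) for d in range(dim)]


def fuse(image_vec: Sequence[float], clinical: Sequence[float],
         expression: Sequence[float]) -> List[float]:
    """Assumed late fusion: plain concatenation of the three modalities."""
    return list(image_vec) + list(clinical) + list(expression)


# Two 2-D block features and a scorer that favours the first feature
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[2.0, 0.0])
patient_vec = fuse(pooled, clinical=[0.3], expression=[1.2, -0.4])
```

In a trained model, `w` (or a small scoring network) would be learned jointly with the classifier that maps the fused vector to a recurrence/metastasis risk.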


Subjects
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Neural Networks, Computer , Eosine Yellowish-(YS) , Gene Expression
2.
Brief Bioinform ; 19(3): 361-373, 2018 05 01.
Article in English | MEDLINE | ID: mdl-28025178

ABSTRACT

Genomic islands (GIs), which are associated with microbial adaptation and carry sequence patterns different from those of the host, are sporadically distributed among closely related species. This compositional bias can dominate the signal of interest in GI detection. However, composition also varies among segments of the host genome, and no uniform standard exists for the best way to discriminate GIs from the rest of the genome on the basis of compositional bias. In the present work, we propose a robust software tool, MTGIpick, which uses regions whose pattern bias shows multiscale levels of difference to distinguish GIs from the host. MTGIpick can identify GIs from a single genome without genome annotation or prior knowledge from other data sets. On real biological data, MTGIpick performed better than existing methods and revealed potential GIs, with accurate sizes, that existing methods missed because they rely on a uniform standard. Software and supplementary materials are freely available at http://bioinfo.zstu.edu.cn/MTGI or https://github.com/bioinfo0706/MTGIpick.


Subjects
Genome, Bacterial , Genomic Islands , Genomics/methods , Software , Algorithms , Molecular Sequence Annotation
3.
BMC Bioinformatics ; 20(Suppl 22): 719, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888447

ABSTRACT

BACKGROUND: Protein subcellular localization prediction is an important topic in bioinformatics and is of great importance for drug design and other applications. A multitude of computational tools for predicting protein subcellular location have been developed in recent decades. However, existing methods differ in the protein sequence representation techniques and classification algorithms they adopt. RESULTS: In this paper, we first introduce two protein sequence encoding schemes: dipeptide information with spacing and gapped k-mer information. We then introduce a quad-tree-based method for calculating gapped k-mers. CONCLUSIONS: The prediction results show that this method not only reduces the feature dimension but also improves the precision of protein subcellular localization prediction.
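As a rough sketch of the "dipeptide information with spacing" encoding, assume it means the composition of residue pairs separated by a fixed number of intervening residues. Under that assumption, the function below builds a 400-dimensional frequency vector. It counts pairs directly and does not reproduce the quad-tree calculation described in the abstract. `gapped_dipeptide_composition` is a hypothetical name.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def gapped_dipeptide_composition(seq: str, gap: int) -> List[float]:
    """Frequencies of residue pairs (seq[i], seq[i + gap + 1]).

    gap = 0 gives ordinary dipeptide composition. Pairs containing
    non-standard residues are skipped. The result has 400 entries, ordered
    AA, AC, ..., YY.
    """
    pairs = (seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1))
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero for short inputs
    return [counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]
```

Concatenating the vectors for several gap values gives a fixed-length input for a classifier such as an SVM. The dimension grows by 400 per gap, which is the kind of growth a more compact gapped k-mer calculation would aim to reduce.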


Subjects
Algorithms , Computational Biology/methods , Information Storage and Retrieval/methods , Proteins/chemistry , Subcellular Fractions/metabolism , Amino Acid Sequence , Databases, Protein , Dipeptides/chemistry , Support Vector Machine
4.
Int J Mol Sci ; 20(2)2019 Jan 11.
Article in English | MEDLINE | ID: mdl-30641858

ABSTRACT

As a common malignant tumor, thyroid cancer lacks effective preventive and therapeutic drugs. It is therefore crucial to provide an effective drug selection method for thyroid cancer patients. The Connectivity Map (CMAP) project provides an experimentally validated strategy for repurposing and optimizing cancer drugs. Its rationale is to select drugs that reverse the gene expression changes induced by cancer. However, it has a few limitations. First, CMAP was performed on cell lines, which usually differ from human tissues. Second, only gene expression information was considered, while information about gene regulation and modules/pathways was largely ignored. In this study, we first comprehensively measured the perturbations caused by thyroid cancer in patients, including variations at the gene expression, gene co-expression and gene module levels. We then provided a drug selection pipeline that reverses these perturbations based on drug signatures derived from tissue studies. We applied the pipeline to The Cancer Genome Atlas (TCGA) thyroid cancer data, consisting of 56 normal and 500 cancer samples. As a result, we obtained 812 up-regulated and 213 down-regulated genes, whose functions are significantly enriched in extracellular matrix and receptor localization to synapses. In addition, a total of 33,778 significantly differentially co-expressed gene pairs were found, which form a large module associated with impaired immune function and low immunity. Finally, using Fisher's exact test, we predicted drugs and gene perturbations that could reverse the gene expression and co-expression changes incurred by the development of thyroid cancer. Top predicted drugs included validated drugs such as baclofen, nevirapine, glucocorticoids and formaldehyde.
Combining our analyses with literature mining, we inferred that the regulation of thyroid hormone secretion might be closely related to the inhibition of thyroid cancer cell proliferation.
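The abstract names Fisher's exact test as the criterion for predicting reversing drugs but does not give the contingency table. A minimal sketch of a one-sided Fisher's exact test on a generic 2 × 2 table follows. How the cells would be filled (for example, genes up-regulated in cancer versus genes down-regulated by a drug) is a hypothetical setup, not the authors' definition.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (greater) Fisher's exact test for the 2x2 table

        [[a, b],
         [c, d]]

    Returns P(X >= a), where X follows the hypergeometric distribution
    implied by the fixed row and column totals.
    """
    row1 = a + b          # e.g. genes in the cancer signature (hypothetical)
    col1 = a + c          # e.g. genes reversed by the drug (hypothetical)
    n = a + b + c + d     # all genes considered
    denom = comb(n, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / denom
```

For the same table this should agree with `scipy.stats.fisher_exact(table, alternative="greater")`. A small p-value indicates that the overlap in the top-left cell is larger than expected by chance.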


Subjects
Antineoplastic Agents/pharmacology , Gene Expression Profiling/methods , Gene Regulatory Networks/drug effects , Thyroid Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Computational Biology , Data Mining , Drug Repositioning , Extracellular Matrix/genetics , Gene Expression Regulation, Neoplastic/drug effects , Humans , Models, Theoretical , Synapses/genetics , Thyroid Neoplasms/genetics
5.
Bioinformatics ; 33(20): 3195-3201, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-28637337

ABSTRACT

MOTIVATION: Low-rank matrix completion has been shown to be powerful in predicting antigenic distances among influenza viruses and vaccines from partially observed hemagglutination inhibition tables. Influenza hemagglutinin (HA) protein sequences are also effective in inferring antigenic distances. It is therefore natural to integrate HA protein sequence information into a low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development. RESULTS: We propose a novel algorithm called biological matrix completion with side information (BMCSI). It first measures HA protein sequence similarities among influenza viruses (especially on epitopes) and then integrates this similarity information into a low-rank matrix completion model to predict influenza antigenicity. The algorithm exploits both the correlations among viruses and vaccines in serological tests and the power of HA sequences to predict influenza antigenicity. We applied the model to H3N2 seasonal influenza virus data. Compared with previous methods, it significantly reduced the prediction root-mean-square error in 10-fold cross-validation. Based on cartographies constructed from the imputed data, we showed that the antigenic evolution of H3N2 seasonal influenza is generally S-shaped, whereas its genetic evolution is half-circle shaped. We also showed that the Spearman correlation between genetic and antigenic distances (among antigenic clusters) is 0.83. This indicates a globally high correspondence between influenza genetic and antigenic evolution, with some local discrepancies. Finally, we showed that a genetic variance of 4.4% ± 1.2% (corresponding to 3.11 ± 1.08 antigenic distances) has historically caused an antigenic drift event in H3N2 influenza viruses. AVAILABILITY AND IMPLEMENTATION: The software and data for this study are available at http://bi.sky.zstu.edu.cn/BMCSI/. CONTACT: jialiang.yang@mssm.edu or pinganhe@zstu.edu.cn.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Antigens, Viral , Computational Biology/methods , Genetic Variation , Influenza A Virus, H3N2 Subtype/immunology , Influenza Vaccines , Software , Algorithms , Epitopes , Evolution, Molecular , Hemagglutination Inhibition Tests , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Influenza A Virus, H3N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/metabolism , Models, Immunological , Sequence Analysis, Protein/methods
6.
J Theor Biol ; 347: 109-17, 2014 Apr 21.
Article in English | MEDLINE | ID: mdl-24412564

ABSTRACT

In this paper, a dynamic 3D graphical representation of protein sequences is introduced based on three physicochemical properties of amino acids. The coordinates of the graph have direct biological significance and can reflect the innate structure of the proteins. The principal moments of inertia and the ranges of the axis coordinates are extracted as a novel mixed descriptor for comparing protein primary sequences. The Euclidean distance between normalized descriptor vectors is employed as a quantitative measure of protein similarity, because normalization avoids the influence of differences in sequence length. Finally, we take nine ND5 (NADH dehydrogenase subunit 5) proteins as an example to illustrate the effectiveness of our approach.


Subjects
Proteins/chemistry , Sequence Analysis, Protein
7.
J Theor Biol ; 353: 19-23, 2014 Jul 21.
Article in English | MEDLINE | ID: mdl-24607742

ABSTRACT

Knowledge of protein structural classes plays an important role in understanding protein folding patterns. Predicting protein structural class from sequence data alone remains a challenging problem. In this study, we extract long-range correlation information and linear correlation information from the position-specific scoring matrix (PSSM). A total of 3600 features are extracted, from which 278 are selected by a filter feature selection method on the 1189 dataset. To verify the performance of our method (named LCC-PSSM), jackknife tests are performed on three widely used low-similarity benchmark datasets. Comparison with existing methods shows that our method provides favorable performance for protein structural class prediction. A stand-alone version of LCC-PSSM is written in MATLAB and can be downloaded from http://bioinfo.zstu.edu.cn/LCC-PSSM/.


Subjects
Computational Biology/methods , Databases, Protein , Proteins/chemistry , Proteins/classification , Position-Specific Scoring Matrices , Protein Structure, Secondary
8.
BMC Bioinformatics ; 14: 152, 2013 May 04.
Article in English | MEDLINE | ID: mdl-23641706

ABSTRACT

BACKGROUND: Many content-based statistical features of secondary structural elements (CBF-PSSEs) have been proposed and have achieved promising results in protein structural class prediction. However, the position distribution of successive occurrences of an element in predicted secondary structure sequences has not yet been used. Appropriate position-based features of the secondary structural elements therefore need to be extracted for the prediction task. RESULTS: We propose a set of position-based features of predicted secondary structural elements (PBF-PSSEs) and assess their intrinsic ability relative to the available CBF-PSSEs. This not only offers a systematic and quantitative experimental assessment of these statistical features but also naturally complements existing comparisons of the CBF-PSSEs. We also analyzed the performance of the CBF-PSSEs combined with the PBF-PSSE and further constructed a new combined feature set, PBF11CBF-PSSE. Based on these experiments, we obtained valuable new guidelines for the use of PBF-PSSEs and CBF-PSSEs. CONCLUSIONS: PBF-PSSEs and CBF-PSSEs have a compelling impact on protein structural class prediction. When combined with the PBF-PSSE, most CBF-PSSEs achieve substantially higher prediction accuracies. PBF-PSSEs and CBF-PSSEs should therefore be used together, because they make significant and complementary contributions to protein structural class prediction. In addition, the performance of the proposed PBF-PSSE is extremely sensitive to the choice of the parameter k. In summary, our quantitative analysis verifies that exploring the position information of predicted secondary structural elements is a promising way to improve protein structural class prediction.


Subjects
Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acid Sequence , Molecular Sequence Data , Protein Folding , Proteins/classification , Sequence Homology, Amino Acid , Support Vector Machine
9.
BMC Genomics ; 14: 661, 2013 Sep 28.
Article in English | MEDLINE | ID: mdl-24074203

ABSTRACT

BACKGROUND: Small non-coding RNAs (ncRNAs) are important regulators of gene expression in eukaryotes. Previously, only microRNAs (miRNAs) and piRNAs had been identified in the silkworm, Bombyx mori. Furthermore, only intermediate-size ncRNAs (50-500 nt) had been systematically identified in the silkworm. RESULTS: Here, we performed a systematic identification and analysis of small RNAs (18-50 nt) associated with the Bombyx mori argonaute2 (BmAgo2) protein. Using RIP-seq, we identified various types of small ncRNAs associated with BmAgo2. These ncRNAs showed a multimodal length distribution, with three peaks at ~20 nt, ~27 nt and ~33 nt. They included tRNA-, transposable element (TE)-, rRNA-, snoRNA- and snRNA-derived small RNAs as well as miRNAs and piRNAs. The tRNA-derived fragments (tRFs) were extremely abundant, accounting for 69.90% of the BmAgo2-associated small RNAs. Northern blotting confirmed that many tRFs were expressed or up-regulated only in BmNPV-infected cells, suggesting that tRFs play a prominent role by binding to BmAgo2 during BmNPV infection. Additional evidence suggested potential cleavage sites in the D, anticodon and TψC loops of the tRNAs. TE-derived small RNAs and piRNAs also accounted for a significant proportion of the BmAgo2-associated small RNAs. This suggests that BmAgo2 could help maintain genome stability by suppressing transposon activity guided by these small RNAs. Finally, Northern blotting also confirmed the Bombyx 5.8S rRNA-derived small RNAs, demonstrating that various novel small RNAs exist in the silkworm. CONCLUSIONS: Using RIP-seq in combination with Northern blotting, we identified various types of small RNAs associated with the BmAgo2 protein, including tRNA-, TE-, rRNA-, snoRNA- and snRNA-derived small RNAs as well as miRNAs and piRNAs.
Our findings provide new clues for future functional studies of the role of small RNAs in insect development and evolution.


Subjects
Argonaute Proteins/metabolism , Bombyx/genetics , Immunoprecipitation/methods , RNA, Small Untranslated/metabolism , RNA/metabolism , Animals , Cell Line , DNA Transposable Elements/genetics , MicroRNAs/genetics , MicroRNAs/metabolism , Nucleopolyhedroviruses/genetics , RNA/genetics , RNA/isolation & purification , RNA, Ribosomal, 5.8S/genetics , RNA, Ribosomal, 5.8S/metabolism , RNA, Small Interfering/metabolism , RNA, Small Nucleolar/genetics , RNA, Small Nucleolar/metabolism , RNA, Small Untranslated/genetics , RNA, Transfer/metabolism , Recombination, Genetic/genetics
10.
J Theor Biol ; 336: 52-60, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-23876763

ABSTRACT

Lempel-Ziv complexity has been widely used for sequence comparison with promising results, but the distribution of the components in the exhaustive history has not yet been studied. This paper investigates the whole distribution of LZ-words and presents a novel statistical method for sequence comparison. Taking the lengths of the components into account, we revised Lempel-Ziv complexity and obtained various sets of LZ-words. Instead of calculating the contents of the LZ-words, we defined a series of set operations on LZ-word sets to compare biological sequences. To assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.
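The exhaustive-history parsing behind Lempel-Ziv complexity can be sketched as follows. Each new component (LZ-word) is the shortest extension that cannot be copied from the text before it. The set comparison uses the Jaccard index purely as a stand-in, because the abstract does not define the paper's set operations. `lz_words` and `lz_jaccard` are hypothetical names, and the length-based revision of LZ complexity is not reproduced.

```python
from typing import List


def lz_words(s: str) -> List[str]:
    """Split s into its Lempel-Ziv (1976) exhaustive history.

    A component starting at i keeps growing while s[i:i+length] can still be
    copied from a start position before i (the copy may overlap the component
    itself). The first character that breaks the copy closes the component.
    """
    words = []
    i, n = 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        words.append(s[i:i + length])  # the last word may be cut off by the end of s
        i += length
    return words


def lz_jaccard(s: str, t: str) -> float:
    """Illustrative similarity only: Jaccard index of the two LZ-word sets."""
    a, b = set(lz_words(s)), set(lz_words(t))
    return len(a & b) / len(a | b) if a | b else 1.0
```

For the classic example `lz_words("0001101001000101")`, the parse is `0 · 001 · 10 · 100 · 1000 · 101`. Concatenating the words always recovers the input, and the number of words (6 here) is the classic LZ complexity.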


Subjects
Algorithms , Sequence Homology , Base Sequence , Cluster Analysis , Coronavirus/classification , Coronavirus/genetics , Genome, Viral , Hepatitis E virus/genetics , Phylogeny
11.
Math Biosci Eng ; 20(1): 1037-1057, 2023 01.
Article in English | MEDLINE | ID: mdl-36650801

ABSTRACT

DNase I hypersensitive sites (DHSs) are specific genomic regions that are critical for detecting and understanding cis-regulatory elements. Although many methods have been developed to detect DHSs, a large gap remains in practice. We present LangMoDHS, a deep learning-based language model for predicting DHSs. LangMoDHS mainly comprises a convolutional neural network (CNN), a bi-directional long short-term memory (Bi-LSTM) network and feed-forward attention. The CNN and the Bi-LSTM are stacked in parallel, which helps accumulate multiple-view representations from primary DNA sequences. We conducted 5-fold cross-validation and independent tests over 14 tissues and 4 developmental stages. The empirical experiments showed that LangMoDHS is competitive with or slightly better than iDHS-Deep, the latest method for predicting DHSs. They also implied substantial contributions of the CNN, the Bi-LSTM and the attention to DHS prediction. We implemented LangMoDHS as a user-friendly web server accessible at http://www.biolscience.cn/LangMoDHS/. We used indices related to information entropy to explore the sequence motifs of DHSs, and the analysis provided some insight into DHSs.
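The abstract mentions "indices related to information entropy" without defining them. One common index of this kind is the per-position Shannon entropy over a set of aligned, equal-length sequences, which is low at conserved (motif-like) positions. The sketch below computes that index. Treating it as the paper's index is an assumption, and `positional_entropy` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def positional_entropy(seqs: List[str]) -> List[float]:
    """Shannon entropy (bits) of the base distribution at each position.

    Assumes all sequences are aligned and of equal length. 0 means a fully
    conserved position, and 2 is the maximum for four equiprobable bases.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    n = len(seqs)
    entropies = []
    for j in range(length):
        counts = Counter(s[j] for s in seqs)
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies
```

Positions whose entropy stays well below 2 bits across many DHS sequences are candidate motif positions. Comparing these profiles with those of non-DHS sequences is one way to read such an analysis.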


Subjects
Deep Learning , Animals , Mice , Deoxyribonuclease I/genetics , Deoxyribonuclease I/metabolism , Genomics , Regulatory Sequences, Nucleic Acid
12.
Comput Biol Med ; 167: 107586, 2023 12.
Article in English | MEDLINE | ID: mdl-37907029

ABSTRACT

The associations between cancer and bacteria/fungi have been extensively studied, but the implications of cancer-associated viruses have not been thoroughly examined. In this study, we comprehensively characterized the cancer virome of tissue samples across 31 cancer types, as well as blood samples from 23 cancer types. Our findings demonstrated the presence of viral DNA at low abundances in both tissue and blood across major human cancers, with significant differences in viral community composition among cancer types. Furthermore, Cox regression analyses conducted on four cancers revealed strong correlations between viral composition/abundance in tissues and patient survival. The four cancers were head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), stomach adenocarcinoma (STAD) and uterine corpus endometrial carcinoma (UCEC). Additionally, we identified virus-associated prognostic signatures (VAPS) for these four cancers. We also discerned differences in the interplay between VAPS and dominant bacteria in tissues among patients with different survival risks. Notably, clinically relevant analyses revealed the prognostic capacity of the VAPS in these four cancers. Taken together, our study provides novel insights into the role of tissue viruses in the prognosis of multiple cancers and offers guidance on using tissue viruses to stratify prognosis in patients with cancer.


Subjects
Adenocarcinoma , Carcinoma, Renal Cell , Kidney Neoplasms , Stomach Neoplasms , Humans
13.
Amino Acids ; 42(5): 1867-77, 2012 May.
Article in English | MEDLINE | ID: mdl-21505825

ABSTRACT

Statistical measures for sequence comparison face two crucial problems: overlapping structures and the background information of words in biological sequences. Word normalization in the improved composition vector method takes these problems into account and achieves better performance in evolutionary analysis. This word normalization is desirable but not sufficient, because it assumes that the four bases A, C, T and G occur randomly with equal probability. This paper proposes an improved word normalization that uses a Markov model to estimate the exact k-word distribution from the observed biological sequence. It can therefore adjust for the background information of k-word frequencies in biological sequences. The improved word normalization was tested in three experiments and compared with the existing word normalization. The experimental results confirm that the improved word normalization, which uses a Markov model to estimate the exact k-word distribution in biological sequences, is more efficient.
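One standard way to estimate the expected k-word frequency from the observed sequence, rather than assuming equiprobable bases, is the (k-2)-order Markov estimate used in composition vector methods: f0(a1...ak) = f(a1...a(k-1)) * f(a2...ak) / f(a2...a(k-1)). Each observed frequency f is then normalized as (f - f0) / f0. A minimal sketch follows. Whether the paper uses exactly this Markov order is an assumption, and `markov_normalized` is a hypothetical name.

```python
from collections import Counter
from typing import Dict


def word_freqs(seq: str, k: int) -> Dict[str, float]:
    """Relative frequencies of overlapping k-words (sliding window of 1)."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def markov_normalized(seq: str, k: int) -> Dict[str, float]:
    """(f - f0) / f0 for each observed k-word, with f0 from a (k-2)-order Markov model."""
    if k < 3 or len(seq) < k:
        raise ValueError("need k >= 3 and a sequence at least k long")
    fk, fk1, fk2 = word_freqs(seq, k), word_freqs(seq, k - 1), word_freqs(seq, k - 2)
    result = {}
    for w, f in fk.items():
        # The prefix, suffix and middle of an observed word are always observed too.
        f0 = fk1[w[:-1]] * fk1[w[1:]] / fk2[w[1:-1]]
        result[w] = (f - f0) / f0
    return result
```

For "ACGTACGT" with k = 3, the word ACG has f = 1/3 and f0 = (2/7)(2/7)/(1/4) = 16/49, so its normalized value is 49/48 - 1 = 1/48. Values near zero mean the word is about as frequent as its shorter sub-words predict.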


Subjects
Algorithms , Markov Chains , Sequence Analysis, DNA/methods , Computational Biology , Models, Theoretical , Sequence Alignment
14.
J Theor Biol ; 304: 81-7, 2012 Jul 07.
Article in English | MEDLINE | ID: mdl-22554947

ABSTRACT

Based on the order of the 6-bit binary Gray code, a cyclic order of the 20 amino acids is introduced. A novel 3D graphical representation of protein sequences is proposed based on the chaos game representation (CGR) of DNA sequences. Furthermore, a mathematical descriptor is proposed to characterize the graphical representation curve. The effectiveness of our approach is illustrated by comparing similarities/dissimilarities among the ND5 protein sequences of nine different species. Correlation and significance analyses compare both our results and the results of other graphical representations with ClustalW results, and these comparisons show the utility of our approach.
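The 6-bit binary reflected Gray code underlying the amino acid order can be generated with the identity g(i) = i XOR (i >> 1). Consecutive codewords, including the last and the first, differ in exactly one bit, and this is what makes the order cyclic. The abstract does not state how the paper maps the 64 codewords onto the 20 amino acids, so that mapping is not reproduced here. The function names are illustrative.

```python
from typing import List


def gray_code(bits: int) -> List[int]:
    """Binary reflected Gray code of the given width, in cyclic order."""
    return [i ^ (i >> 1) for i in range(1 << bits)]


def as_bit_strings(codes: List[int], bits: int) -> List[str]:
    """Format each codeword as a zero-padded bit string."""
    return [format(c, f"0{bits}b") for c in codes]


codes = gray_code(6)
print(as_bit_strings(codes, 6)[:4])  # ['000000', '000001', '000011', '000010']
```

Because adjacent codewords differ by a single bit flip, neighbouring items in the derived order are "close" in the binary encoding. This property makes a Gray code order a natural basis for a cyclic arrangement of symbols.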


Subjects
Amino Acid Sequence , Models, Molecular , Proteins/genetics , Animals , Computational Biology/methods , Genetic Code , Humans , Sequence Alignment , Sequence Analysis, Protein/methods , Species Specificity
15.
Comb Chem High Throughput Screen ; 25(3): 381-391, 2022.
Article in English | MEDLINE | ID: mdl-33045963

ABSTRACT

AIM AND OBJECTIVE: Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for this task fall into two classes: sequence alignment methods and alignment-free methods. Graphical representation of biological sequences is an alignment-free method that provides a tool for analyzing and visualizing biological sequences. In this article, a generalized iterative map of protein sequences is proposed to analyze the similarities of biological sequences. MATERIALS AND METHODS: Based on normalized physicochemical indexes of the 20 amino acids, each amino acid is mapped to a point in 5D space. A generalized iterative function system is introduced to construct a generalized iterative map of protein sequences. This map not only reflects various physicochemical properties of amino acids but also incorporates different compression ratios for its components. Several properties are proved to illustrate the advantages of the generalized iterative map. A mathematical description of the generalized iterative map is proposed to compare the similarities and dissimilarities of protein sequences. Using this method, similarities/dissimilarities were compared among the ND5 protein sequences, as well as among the ND6 protein sequences, of ten different species. RESULTS: By correlation analysis, the ClustalW results were compared with our similarity/dissimilarity results and with the results of other graphical representations to show the utility of our approach. The comparison shows that our approach correlates better with ClustalW than other approaches for all species, illustrating its effectiveness. CONCLUSION: Two examples show that our method not only performs well in the similarity/dissimilarity analysis of protein sequences but also does not require complex computation.
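A chaos-game style iterated map of the kind described can be sketched as follows. Each residue pulls the current point toward that residue's anchor point in 5D space, with one compression ratio per coordinate. The anchor coordinates below are made-up placeholders, not the paper's normalized physicochemical indexes. The exact form of the authors' generalized iterative function system is an assumption, and `iterated_map` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence


def iterated_map(seq: str,
                 coords: Dict[str, Sequence[float]],
                 ratios: Sequence[float],
                 start: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Generalized chaos-game walk with per-coordinate compression ratios.

        x_i[d] = x_{i-1}[d] + ratios[d] * (anchor(a_i)[d] - x_{i-1}[d])

    A ratio of 0.5 in every coordinate gives the classic CGR update.
    """
    dim = len(ratios)
    x = list(start) if start is not None else [0.0] * dim
    path = []
    for residue in seq:
        anchor = coords[residue]
        x = [xd + r * (pd - xd) for xd, pd, r in zip(x, anchor, ratios)]
        path.append(x)
    return path


# Placeholder 5-D anchors for two residues (NOT real physicochemical indexes)
toy_coords = {"A": [1.0, 0.0, 0.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0, 0.0, 0.0]}
walk = iterated_map("AAC", toy_coords, ratios=[0.5] * 5)
```

Here `walk` visits [0.5, 0, ...], then [0.75, 0, ...], then [0.375, 0.5, 0, 0, 0]. A numeric descriptor of such a path can then be compared between sequences without alignment.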


Subjects
Proteins , Sequence Analysis, Protein , Algorithms , Amino Acid Sequence , Computational Biology/methods , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein/methods
16.
Front Genet ; 13: 1003711, 2022.
Article in English | MEDLINE | ID: mdl-36568390

ABSTRACT

With the development of high-throughput sequencing technology, the scale of single-cell RNA sequencing (scRNA-seq) data has surged. These data are typically high-dimensional, with high dropout noise and high sparsity. Gene imputation and cell clustering analysis of scRNA-seq data are therefore increasingly important. Statistical and traditional machine learning methods are inefficient, and their accuracy needs to be improved. Deep learning-based methods cannot directly process non-Euclidean spatial data, such as cell graphs. In this study, we developed scGAEGAT, a multi-modal model based on graph neural networks that combines graph autoencoders and graph attention networks for scRNA-seq analysis. Cosine similarity, median L1 distance and root-mean-squared error were used to compare the gene imputation performance of scGAEGAT with that of other methods. In addition, adjusted mutual information, normalized mutual information, completeness score and silhouette coefficient were used to compare the cell clustering performance of the methods. Experimental results on four scRNA-seq data sets with gold-standard cell labels demonstrated the promising performance of scGAEGAT in gene imputation and cell clustering.
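The three imputation metrics named in the abstract can be sketched as follows, assuming the true and imputed expression matrices are given as lists of per-cell rows. Computing the L1 distance per cell and then taking the median is an assumption about how "median L1 distance" is aggregated. The cosine similarity here is a plain vector function, which can be applied per cell or to flattened matrices.

```python
import math
from statistics import median
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 = same direction)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def median_l1(true_rows: Sequence[Sequence[float]],
              imputed_rows: Sequence[Sequence[float]]) -> float:
    """Median over cells of the per-cell L1 distance (assumed aggregation)."""
    return median([sum(abs(a - b) for a, b in zip(t, p))
                   for t, p in zip(true_rows, imputed_rows)])


def rmse(true_rows: Sequence[Sequence[float]],
         imputed_rows: Sequence[Sequence[float]]) -> float:
    """Root-mean-squared error over all matrix entries."""
    sq = [(a - b) ** 2 for t, p in zip(true_rows, imputed_rows) for a, b in zip(t, p)]
    return math.sqrt(sum(sq) / len(sq))
```

Higher cosine similarity and lower L1 distance and RMSE indicate imputed values closer to the held-out truth. The median makes the L1 summary robust to a few badly imputed cells.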

17.
Sci Rep ; 12(1): 13996, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35978023

ABSTRACT

Deep learning technology is changing the landscape of cybersecurity research, especially the analysis of large amounts of data. With the rapid growth in the amount of malware, developing an efficient and reliable method for classifying malware has become a research priority. In this paper, a new method, BIR-CNN, is proposed to classify Android malware. It combines a convolutional neural network (CNN) with batch normalization and inception-residual (BIR) network modules, using 347-dimensional network traffic features. The CNN combines inception-residual modules with a convolution layer, which enhances the learning ability of the model. Batch normalization speeds up training and helps the model avoid overfitting. Experiments were conducted on the publicly available network traffic dataset CICAndMal2017, comparing BIR-CNN with three traditional machine learning algorithms and a standard CNN. BIR-CNN achieves an accuracy of 99.73% in binary classification (2 classes). It can also classify malware by category (4 classes) and by malicious family (35 classes), with accuracies of 99.53% and 94.38%, respectively. The experimental results show that the proposed model is an effective method for Android malware classification, especially for malware category and family classification.
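Batch normalization, as commonly defined, standardizes each feature over the mini-batch and then applies a learned scale gamma and shift beta: y = gamma * (x - mean) / sqrt(var + eps) + beta. A pure-Python sketch of the training-time computation follows. The running statistics used at inference are omitted, and where BIR-CNN places these layers is not reproduced.

```python
import math
from typing import List, Sequence


def batch_norm(batch: Sequence[Sequence[float]],
               gamma: Sequence[float],
               beta: Sequence[float],
               eps: float = 1e-5) -> List[List[float]]:
    """Training-mode batch normalization over rows (samples) for each feature."""
    n, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # biased (population) variance
        for i, x in enumerate(col):
            out[i][j] = gamma[j] * (x - mean) / math.sqrt(var + eps) + beta[j]
    return out
```

After normalization, each feature has roughly zero mean and unit variance within the batch (before gamma and beta are applied). This keeps layer inputs on a stable scale, which is why it tends to speed up training.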


Subjects
Machine Learning , Neural Networks, Computer , Algorithms , Computer Security , Data Collection
18.
Pharmaceuticals (Basel) ; 15(6)2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35745625

ABSTRACT

Bioactive peptides are typically small functional peptides of 2-20 amino acid residues that play versatile roles in metabolic and biological processes. Because bioactive peptides are multi-functional, it is extremely challenging to detect all of their functions accurately and simultaneously. We proposed MPMABP, a deep learning method based on a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM), for recognizing multiple activities of bioactive peptides. MPMABP stacks five CNNs at different scales and uses a residual network to prevent information loss. The empirical results showed that MPMABP is superior to state-of-the-art methods. Analysis of the amino acid distribution indicated that lysine tends to appear in anti-cancer peptides, leucine in anti-diabetic peptides and proline in anti-hypertensive peptides. The method and analysis are beneficial for recognizing multiple activities of bioactive peptides.

19.
Front Microbiol ; 13: 1048478, 2022.
Article in English | MEDLINE | ID: mdl-36560938

ABSTRACT

Transcription factors (TFs) are typical regulators of gene expression and play versatile roles in cellular processes. Detecting TFs with physical (experimental) methods is time-consuming, costly and labor-intensive, so a computational method for detecting them is desirable. Here, we present a capsule network-based method for identifying TFs. This end-to-end deep learning method consists mainly of an embedding layer, a bidirectional long short-term memory (LSTM) layer, a capsule network layer and three fully connected layers. The method achieved an accuracy of 0.8820, superior to state-of-the-art methods. The empirical experiments showed that including the capsule network substantially improved performance. They also showed that the capsule network-based representation was superior to a property-based representation for distinguishing TFs from non-TFs. We also implemented the method as a user-friendly web server, freely available to all researchers at http://www.biolscience.cn/Capsule_TF/.
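The abstract does not describe the capsule layer's internals. For orientation only, the sketch shows the standard squash nonlinearity from dynamic routing between capsules (Sabour et al., 2017). It keeps a capsule vector's direction while mapping its length into [0, 1), so the length can be read as a probability. Whether this TF model uses exactly this function and routing scheme is an assumption.

```python
import math
from typing import List, Sequence


def squash(s: Sequence[float]) -> List[float]:
    """Capsule squash: keep the direction of s, map its length |s| to |s|^2 / (1 + |s|^2)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)  # a zero vector stays zero
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]
```

For example, `squash([3.0, 4.0])` points in the same direction as its input and has length 25/26. Short vectors shrink toward zero and long vectors saturate just below one.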

20.
Comput Biol Chem ; 98: 107689, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35537363

ABSTRACT

Embryonic stem cells (ESCs) can self-renew and maintain pluripotency while continuously providing a source of various differentiated cell types. The fate decision between remaining in the ground state and transitioning to a differentiated state can be read out from the regulatory network of key transcription factors (TFs). However, its underlying mechanism remains to be fully elucidated. In this paper, we tackle this problem by proposing MESC-DRM, a novel cellular differentiation model for mouse embryonic stem cell (MESC) dynamics regulation. We employ a nonlinear least-squares algorithm to infer model parameters from benchmark datasets and construct a potential function from multivariate Gaussian distributions. We then project the potential landscape into 3D space to validate and replicate the stable cell states observed in experiments. Traditional cell landscape modeling techniques rely on visualizing the potential function to determine the stable states of cells, but such visualization becomes almost impossible when the potential function has more than three dimensions. We address this challenge with a Lyapunov method, a more straightforward analytical approach that also provides a more rigorous and robust way to determine cell fate accurately. The study not only validates previous experimental results but also provides an insightful guide for cell fate decisions and may inspire future work on this topic.


Subjects
Algorithms , Embryonic Stem Cells , Animals , Cell Differentiation , Mice