ABSTRACT
As critical components of DNA, enhancers efficiently and specifically control the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a wide range of human diseases. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, iEnhancer-GAN, a new predictor of enhancers and their strength, is proposed based on a deep learning framework that combines word embedding with a sequence generative adversarial net (Seq-GAN). Because the training dataset is relatively small, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in a DNA sequence is analogous to a "word" in linguistics, word segmentation methods are proposed to divide DNA sequences into "words", and the skip-gram model is employed to transform these "words" into numerical vectors. Owing to its strong ability to extract high-level abstract features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks; the word vectors of each DNA sequence are vertically concatenated to form an embedding matrix that serves as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN in expanding the training dataset, the viability of word segmentation methods for extracting "words" from DNA sequences, the feasibility of using the skip-gram model to encode DNA sequences, and the strong predictive ability of the CNN. Compared with other state-of-the-art methods on the training dataset and the independent test dataset, the proposed method achieves significantly improved overall performance. The proposed method is expected to benefit enhancer-related research.
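The abstract does not say which segmentation rule iEnhancer-GAN uses. As a minimal sketch, assuming overlapping k-mers with stride 1 (a common choice, not confirmed by the source), the snippet below splits a sequence into "words", generates the (center, context) pairs a skip-gram model is trained on, and stacks per-word vectors row by row into an embedding matrix. The function names and the toy vectors are hypothetical; in practice the vectors would come from a trained skip-gram model.

```python
from typing import Dict, List, Tuple


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mer "words" (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


def embedding_matrix(words: List[str], vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Stack word vectors vertically: row i is the vector of the i-th word."""
    return [vectors[w] for w in words]


if __name__ == "__main__":
    words = kmer_words("ACGTAC", k=3)
    print(words)  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(skipgram_pairs(words, window=1)[:3])  # [('ACG', 'CGT'), ('CGT', 'ACG'), ('CGT', 'GTA')]
    # Hypothetical 2-d vectors standing in for trained skip-gram embeddings
    toy = {"ACG": [0.1, 0.2], "CGT": [0.3, 0.4], "GTA": [0.5, 0.6], "TAC": [0.7, 0.8]}
    print(embedding_matrix(words, toy))  # 4 x 2 matrix, one row per word
```

Stacking one row per word gives a matrix whose height is the number of words and whose width is the embedding dimension, which is the shape a 1-D or 2-D CNN can consume directly.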
Subjects
DNA/genetics; Enhancer Elements, Genetic/genetics; Image Processing, Computer-Assisted/methods; Algorithms; Deep Learning; Models, Theoretical; Neural Networks, Computer; Regulatory Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA/methods
ABSTRACT
Neural circuits that determine the perception and modulation of pain remain poorly understood. The prefrontal cortex (PFC) provides top-down control of sensory and affective processes. While animal and human imaging studies have shown that the PFC is involved in pain regulation, its exact role in pain states remains incompletely understood. A key output target for the PFC is the nucleus accumbens (NAc), an important component of the reward circuitry. Interestingly, recent human imaging studies suggest that the projection from the PFC to the NAc is altered in chronic pain. The function of this corticostriatal projection in pain states, however, is not known. Here we show that optogenetic activation of the PFC produces strong antinociceptive effects in a rat model (spared nerve injury model) of persistent neuropathic pain. PFC activation also reduces the affective symptoms of pain. Furthermore, we show that this pain-relieving function of the PFC is likely mediated by projections to the NAc. Thus, our results support a novel role for corticostriatal circuitry in pain regulation.
Subjects
Neural Pathways/physiology; Neuralgia/physiopathology; Neuralgia/therapy; Nucleus Accumbens/physiology; Prefrontal Cortex/physiology; Animals; Behavior, Animal/physiology; Male; Nucleus Accumbens/cytology; Optogenetics; Pain Measurement; Prefrontal Cortex/cytology; Rats
ABSTRACT
BACKGROUND: Aptamer-protein interacting pairs have a variety of physiological functions and therapeutic potential in organisms. Rapidly and effectively predicting aptamer-protein interacting pairs is important for designing aptamers that bind to proteins of interest, which will give insight into the mechanisms of aptamer-protein interactions and support the development of aptamer-based therapies. RESULTS: In this study, an ensemble method with hybrid features is presented to predict aptamer-protein interacting pairs. The aptamer features are extracted with Pseudo K-tuple Nucleotide Composition (PseKNC), while the protein features incorporate the Discrete Cosine Transform (DCT), disorder information, and the bi-gram Position Specific Scoring Matrix (PSSM). We investigate the predictive capability of various feature spaces. Under 10-fold cross-validation, the proposed ensemble method obtains its best performance, a Youden's index of 0.380, with the hybrid feature space of PseKNC, DCT, bi-gram PSSM, and disorder information. The Relief-Incremental Feature Selection (IFS) method is then adopted to obtain the optimal feature set. With the optimal feature set, the proposed method achieves a balanced performance on the training dataset, with a sensitivity of 0.753 and a specificity of 0.725, indicating that it handles the imbalanced data problem effectively. To evaluate prediction performance objectively, the method is also assessed on an independent test dataset. Encouragingly, it outperforms the previous study, with a sensitivity of 0.738 and a Youden's index of 0.451. CONCLUSIONS: These results suggest that the proposed method is a promising candidate for aptamer-protein interacting pair prediction, which may help find novel aptamer-protein interacting pairs and clarify the relationship between aptamers and proteins.
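Youden's index (J) combines sensitivity (Sn) and specificity (Sp) as J = Sn + Sp - 1, so it rewards balanced performance on imbalanced data. The short sketch below applies that definition to the figures reported above. The training-set J and the independent-test specificity are not stated in the abstract; they are only back-calculated here from the reported values, and the function names are illustrative.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (ranges from -1 to 1)."""
    return sensitivity + specificity - 1.0


def specificity_from_youden(j: float, sensitivity: float) -> float:
    """Rearranged definition: specificity = J - sensitivity + 1."""
    return j - sensitivity + 1.0


if __name__ == "__main__":
    # Training set with the optimal feature set: Sn = 0.753, Sp = 0.725
    print(round(youden_index(0.753, 0.725), 3))  # 0.478
    # Independent test set: Sn = 0.738 and J = 0.451 imply Sp of about 0.713
    print(round(specificity_from_youden(0.451, 0.738), 3))  # 0.713
```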
Subjects
Aptamers, Peptide/chemistry; Aptamers, Peptide/genetics; Proteins/chemistry; Proteins/genetics; Amino Acid Sequence; Humans; Models, Molecular; SELEX Aptamer Technique/methods
ABSTRACT
BACKGROUND: AMPAkines augment the function of α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors in the brain to increase excitatory outputs. These drugs are known to relieve persistent pain. However, their role in acute pain is unknown. Furthermore, a specific molecular and anatomic target for these novel analgesics remains elusive. METHODS: The authors studied the analgesic role of an AMPAkine, CX546, in a rat paw incision (PI) model of acute postoperative pain. The authors measured the effect of AMPAkines on sensory and depressive symptoms of pain using mechanical hypersensitivity and forced swim tests. The authors asked whether AMPA receptors in the nucleus accumbens (NAc), a key node in the brain's reward and pain circuitry, can be a target for AMPAkine analgesia. RESULTS: Systemic administration of CX546 (n = 13), compared with control (n = 13), reduced mechanical hypersensitivity (50% withdrawal threshold of 6.05 ± 1.30 g [mean ± SEM] vs. 0.62 ± 0.13 g), and it reduced depressive features of pain by decreasing immobility on the forced swim test in PI-treated rats (89.0 ± 15.5 vs. 156.7 ± 18.5 s). Meanwhile, CX546 delivered locally into the NAc provided pain-relieving effects in both PI (50% withdrawal threshold of 6.81 ± 1.91 vs. 0.50 ± 0.03 g; control, n = 6; CX546, n = 8) and persistent postoperative pain (spared nerve injury) models (50% withdrawal threshold of 3.85 ± 1.23 vs. 0.45 ± 0.00 g; control, n = 7; CX546, n = 11). Blocking AMPA receptors in the NAc with 2,3-dihydroxy-6-nitro-7-sulfamoyl-benzo[f]quinoxaline-2,3-dione inhibited these pain-relieving effects (50% withdrawal threshold of 7.18 ± 1.52 vs. 1.59 ± 0.66 g; n = 8 for PI groups; 10.70 ± 3.45 vs. 1.39 ± 0.88 g; n = 4 for spared nerve injury groups). CONCLUSIONS: AMPAkines relieve postoperative pain by acting through AMPA receptors in the NAc.
Subjects
Analgesics/pharmacology; Dioxoles/pharmacology; Nucleus Accumbens/drug effects; Pain, Postoperative/drug therapy; Piperidines/pharmacology; Receptors, AMPA/drug effects; Animals; Behavior, Animal/drug effects; Depression/prevention & control; Disease Models, Animal; Male; Neuralgia/drug therapy; Rats; Rats, Sprague-Dawley
ABSTRACT
Conotoxins targeting different ion channels have distinct physiological functions and therapeutic potential in organisms. Accurately identifying the types of ion channel-targeted conotoxins will provide significant clues to their physiological mechanisms and pharmacological therapeutic potential. In this study, a random forest-based predictor called ICTCPred is proposed to predict the types of ion channel-targeted conotoxins, using hybrid features incorporating CTD (Composition, Transition, and Distribution), g-Gap DC (g-Gap Dipeptide Composition), PP (Physicochemical Properties), and SSI (Secondary Structure Information). To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is applied. With the individual feature spaces listed above, the average accuracy of ICTCPred lies in the range of 0.729-0.886, indicating the discriminative power of these features. ICTCPred yields its highest average accuracy, 0.895, with the hybrid feature space of CTD, g-Gap DC, PP and SSI. The Relief-IFS (Incremental Feature Selection) method is adopted to further improve prediction performance, after which ICTCPred achieves an average accuracy of 0.910 on the training dataset. To evaluate prediction performance objectively, ICTCPred is compared with previous studies on the same independent test dataset. Encouragingly, it outperforms previous studies in identifying the types of ion channel-targeted conotoxins, achieving the highest sensitivity for each type: 0.919 for Na⁺-targeted, 1 for K⁺-targeted, and 1 for Ca²⁺-targeted conotoxins. ICTCPred is expected to be a useful tool for ion channel-targeted conotoxin prediction.
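SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The stdlib-only sketch below illustrates that idea on plain feature vectors; it is not ICTCPred's code, and the defaults (k = 5, a fixed seed) are assumptions. For real pipelines, the `SMOTE` class in the imbalanced-learn package provides a maintained implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest neighbours within the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=3, k=2, seed=1):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region the original minority samples already occupy.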
Subjects
Algorithms; Computational Biology/methods; Conotoxins/pharmacology; Ion Channels/metabolism; Amino Acid Sequence; Amino Acids/chemistry; Cluster Analysis; Conotoxins/chemistry; Databases, Protein; Peptides/chemistry
ABSTRACT
The Golgi apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. Dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localization may assist drug development and help explain the roles of the GA in various cellular processes. In this paper, a new computational method is proposed to distinguish cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and g-gap dipeptide composition. Using the optimal features, a Random Forest (RF) classifier distinguishes cis-Golgi proteins from trans-Golgi proteins. Under jackknife cross-validation, the proposed method achieves a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews correlation coefficient (MCC) of 0.765, markedly outperforming previous methods. When tested on a common independent dataset, the method also achieves significantly improved performance. These results highlight the promise of the proposed method for identifying Golgi-resident protein types, and the CSP-based feature extraction method may provide guidance for protein function prediction.
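Sensitivity, specificity, accuracy and MCC are all computed from the confusion matrix (true/false positives and negatives). A minimal sketch follows; the counts in the example are hypothetical and do not correspond to the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC)."""
    sn = tp / (tp + fn)  # true positive rate
    sp = tn / (tn + fp)  # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical confusion matrix, not taken from the paper
    print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
    # Sn = 0.8, Sp = 0.9, Acc = 0.85, MCC of about 0.704
```

Unlike accuracy, MCC uses all four cells of the matrix, which is why it is usually reported alongside accuracy for imbalanced benchmarks.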
Subjects
Computational Biology; Golgi Apparatus/metabolism; Proteins/metabolism; Algorithms; Amino Acids/chemistry; Animals; Biological Evolution; Computational Biology/methods; Databases, Protein; Disease Susceptibility; Humans; Peptides/chemistry; Proteins/chemistry; ROC Curve; Reproducibility of Results; Systems Biology/methods
ABSTRACT
Antifreeze proteins (AFPs) play a pivotal role in the antifreeze effect of overwintering organisms. They have a wide range of applications in numerous fields, such as improving crop production and the quality of frozen foods. Accurate identification of AFPs may provide important clues to the mechanisms by which AFPs bind ice and facilitate the selection of the most appropriate AFPs for specific applications. Based on an ensemble learning technique, this study proposes an AFP identification system called AFP-Ensemble, in which random forest classifiers are trained on different training subsets and then aggregated into a consensus classifier by majority voting. On an independent dataset, the resulting predictor yields a sensitivity of 0.892, a specificity of 0.940, an accuracy of 0.938 and a balanced accuracy of 0.916, substantially better than the results of previous methods. These results show that AFP-Ensemble is an effective and promising predictor for large-scale identification of AFPs. The detailed feature analysis in this study may give useful insights into the molecular mechanisms of AFP-ice interactions and guide related experimental validation. A web server implementing the proposed method has been built.
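The abstract does not say how AFP-Ensemble draws its training subsets, so only the aggregation step is sketched here: per-sample majority voting over the members' predicted labels, plus balanced accuracy, the mean of sensitivity and specificity. The member predictions in the example are hypothetical, and the tie-breaking rule is an arbitrary choice of this sketch.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine label lists (one list per classifier) by per-sample majority vote.

    Ties are broken in favour of the smaller label, an arbitrary choice here.
    """
    combined = []
    for votes in zip(*predictions):  # votes: every classifier's label for one sample
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity, robust to class imbalance."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Three hypothetical random-forest members voting on four samples
    members = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]
    print(majority_vote(members))  # [1, 0, 1, 1]
    print(round(balanced_accuracy(0.892, 0.940), 3))  # 0.916, matching the reported value
```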
Subjects
Antifreeze Proteins/chemistry; Antifreeze Proteins/classification; Computational Biology/methods; Support Vector Machine; Algorithms; Analysis of Variance; Internet; Reproducibility of Results; Web Browser
ABSTRACT
Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as determining host-bacterium specificity and bacteriophage replication and transcription. Accurate identification of virion proteins from bacteriophage protein sequences is important for understanding the complex virulence mechanisms in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for predicting bacteriophage virion proteins from protein sequences is put forward, using hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). Under 10-fold cross-validation on the training dataset, the method achieves a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthews correlation coefficient (MCC) of 0.701. To evaluate prediction performance objectively, the method is also assessed on an independent test dataset, where it outperforms previous studies with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and an MCC of 0.662. These results suggest that the proposed method is a promising candidate for bacteriophage virion protein prediction and may provide a useful tool for finding novel antibacterial drugs and understanding the relationship between bacteriophages and host bacteria. For the convenience of experimental scientists, a user-friendly, publicly accessible web server for the proposed ensemble method has been established. (Int. J. Mol. Sci. 2015, 16, 21735)
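The "C" and "T" parts of CTD encode a protein by first mapping each residue to one of three property classes, then measuring class fractions (composition) and how often adjacent residues switch class (transition). The sketch below assumes the three-class hydrophobicity grouping widely used for CTD descriptors; whether the paper used exactly this grouping and property set is an assumption, and the distribution ("D") part is omitted for brevity.

```python
from typing import Dict

# Three-class hydrophobicity grouping widely used for CTD descriptors.
# Whether the paper used exactly this grouping is an assumption.
HYDROPHOBICITY: Dict[str, str] = {
    **{aa: "1" for aa in "RKEDQN"},   # polar
    **{aa: "2" for aa in "GASTPHY"},  # neutral
    **{aa: "3" for aa in "CLVIMFW"},  # hydrophobic
}


def ctd_composition_transition(seq: str, groups: Dict[str, str] = HYDROPHOBICITY) -> Dict[str, float]:
    """Composition (C) and transition (T) descriptors of CTD for one property."""
    # Residues outside the 20 standard amino acids are skipped
    encoded = "".join(groups[aa] for aa in seq.upper() if aa in groups)
    n = len(encoded)
    if n < 2:
        raise ValueError("sequence too short after encoding")
    # C: fraction of residues falling into each class
    features = {f"C{g}": encoded.count(g) / n for g in "123"}
    pairs = [encoded[i:i + 2] for i in range(n - 1)]
    # T: fraction of adjacent pairs switching between two classes, in either direction
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        features[f"T{a}{b}"] = sum(p in (a + b, b + a) for p in pairs) / (n - 1)
    return features


if __name__ == "__main__":
    print(ctd_composition_transition("RGCR"))
    # C1 = 0.5, C2 = C3 = 0.25; T12 = T13 = T23 = 1/3
```

In a full CTD encoding this is repeated for several physicochemical properties, and the resulting vectors are concatenated with the other feature spaces.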
Subjects
Bacteriophages/metabolism; Computational Biology/methods; Viral Proteins/chemistry; Viral Proteins/metabolism; Virion/metabolism; Amino Acid Sequence; Internet; ROC Curve; Reproducibility of Results; Web Browser
ABSTRACT
Predicting drug-disease associations can help uncover new therapeutic uses of drugs and provide important association information for new drug research and development. Many existing drug-disease association prediction methods do not distinguish the background information relevant to the same drug when it targets different diseases. Therefore, this paper proposes a drug-disease association prediction model based on a graph convolutional network and a graph attention network (GCNGAT) to reposition marketed drugs while distinguishing background information. First, to obtain initial drug-disease information, a heterogeneous drug-disease graph is constructed from all known drug-disease associations. Second, from this heterogeneous graph, a subgraph is extracted for each drug-disease pair to separate the background information of the same drug across different diseases. Finally, a model combining a graph neural network with global average pooling (GnnAp) is designed to predict potential drug-disease associations by learning feature representations of drug-disease interactions. The experimental results show that subgraph extraction effectively improves the model's prediction performance and that the graph representation learning module fully extracts deep features of drug-disease pairs. Under 5-fold cross-validation, GCNGAT achieves AUC (area under the receiver operating characteristic curve) values of 0.9182 and 0.9417 on the PREDICT dataset and CDataset, respectively. On the PREDICT dataset, GCNGAT outperforms the previous best-performing model (PSGCN), with a 1.58% increase in AUC. This model is expected to provide an experimental reference for drug repositioning and further advance drug research and development.
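The subgraph-extraction step can be illustrated with the enclosing-subgraph idea common in subgraph-based link prediction: keep every node within h hops of either the drug or the disease, plus the edges among them. The abstract does not state GCNGAT's hop count or whether the target edge is masked, so h = 1 and the masking below are assumptions, and all names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from known drug-disease association pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def enclosing_subgraph(graph: Graph, drug: str, disease: str, hops: int = 1):
    """Nodes within `hops` of either endpoint, plus the edges among those nodes.

    The target drug-disease edge itself is dropped so the label cannot be read
    directly off the subgraph (an assumption of this sketch).
    """
    nodes: Set[str] = set()
    for start in (drug, disease):
        seen, queue = {start}, deque([(start, 0)])
        while queue:  # breadth-first search up to `hops` steps
            node, depth = queue.popleft()
            if depth == hops:
                continue
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, depth + 1))
        nodes |= seen
    edges = {tuple(sorted((u, v))) for u in nodes for v in graph.get(u, ()) if v in nodes}
    edges.discard(tuple(sorted((drug, disease))))
    return nodes, edges


if __name__ == "__main__":
    g = build_graph([("drugA", "dis1"), ("drugA", "dis2"), ("drugB", "dis1"), ("drugB", "dis3")])
    nodes, edges = enclosing_subgraph(g, "drugA", "dis1", hops=1)
    print(sorted(nodes))  # ['dis1', 'dis2', 'drugA', 'drugB']
    print(sorted(edges))  # [('dis1', 'drugB'), ('dis2', 'drugA')]
```

Each pair gets its own local neighbourhood, so the same drug is described by different context when paired with different diseases; a GNN with global average pooling would then turn each subgraph into a single vector.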
Subjects
Drug Repositioning; Learning; Neural Networks, Computer; ROC Curve
ABSTRACT
Amyloid fibrils formed by the misaggregation of amyloid proteins can lead to neuronal degeneration in Alzheimer's disease. Predicting amyloid proteins not only contributes to understanding their physicochemical properties and formation mechanisms but also has significant implications for treating amyloid diseases and developing new uses for amyloid materials. In this study, an ensemble learning model with sequence-derived features, ECAmyloid, is proposed to identify amyloids. Sequence-derived features, including the Pseudo Position-Specific Scoring Matrix (Pse-PSSM), Split Amino Acid Composition (SAAC), Solvent Accessibility (SA), and Secondary Structure Information (SSI), are employed to incorporate sequence composition, evolutionary and structural information. The individual learners of the ensemble model are selected by an incremental classifier selection strategy, and the final prediction is determined by voting over the predictions of these individual learners. Because the benchmark dataset is imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate positive samples. To eliminate irrelevant and redundant features, correlation-based feature subset (CFS) selection combined with a heuristic search strategy is performed to obtain the optimal feature subset. Under 10-fold cross-validation on the training dataset, the ensemble classifier achieves an accuracy of 98.29%, a sensitivity of 0.992 and a specificity of 0.974, far higher than the results of its individual learners. Compared with the original feature set, the accuracy, sensitivity, specificity, MCC, F1-score and G-mean of the ensemble method trained on the optimal feature subset improve by 1.05%, 0.012, 0.01, 0.021, 0.011 and 0.011, respectively.
Moreover, comparisons with existing methods on the same two independent test datasets demonstrate that the proposed method is an effective and promising predictor for large-scale identification of amyloid proteins. The data and code used to develop ECAmyloid have been shared on GitHub and can be freely downloaded at https://github.com/KOALA-L/ECAmyloid.git.
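CFS scores a feature subset S of size k with the merit k·mean(r_cf) / sqrt(k + k(k-1)·mean(r_ff)), where r_cf is feature-class correlation and r_ff is feature-feature correlation, so subsets of features that predict the class but not each other score highest. The abstract does not name the heuristic search ECAmyloid uses; a simple greedy forward search stands in for it here, and the correlation values are hypothetical (assumed non-negative, for example absolute correlations).

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def cfs_merit(subset: Sequence[str], r_cf: Dict[str, float],
              r_ff: Dict[FrozenSet[str], float]) -> float:
    """CFS merit: k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff)).

    r_cf: feature-class correlations; r_ff: feature-feature correlations keyed
    by unordered pair. Correlations are assumed non-negative.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def greedy_forward_cfs(features: Sequence[str], r_cf: Dict[str, float],
                       r_ff: Dict[FrozenSet[str], float]) -> Tuple[List[str], float]:
    """Add the feature that most increases merit; stop when no addition helps."""
    selected: List[str] = []
    best = 0.0
    remaining = list(features)
    while remaining:
        cand = max(remaining, key=lambda f: cfs_merit(selected + [f], r_cf, r_ff))
        merit = cfs_merit(selected + [cand], r_cf, r_ff)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Hypothetical correlations: a and b are highly redundant with each other
    r_cf = {"a": 0.8, "b": 0.7, "c": 0.6}
    r_ff = {frozenset("ab"): 0.9, frozenset("ac"): 0.1, frozenset("bc"): 0.2}
    print(greedy_forward_cfs(["a", "b", "c"], r_cf, r_ff))  # (['a', 'c'], ~0.944)
```

In the example, feature b is individually stronger than c but is dropped because it is largely redundant with a, which is exactly the behaviour CFS is designed for.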
Subjects
Amino Acids; Amyloidogenic Proteins; Amino Acids/chemistry; Machine Learning; Algorithms
ABSTRACT
Circular RNAs (circRNAs) are specifically and abnormally expressed in diseased tissues and can therefore serve as biomarkers for diagnosing related diseases. Predicting circRNA-disease associations will provide essential clues to the molecular mechanisms of disease development and help discover novel therapeutic targets. Existing algorithms ignore the heterogeneous biological association information involving microRNAs (miRNAs). This paper develops HGECDA, a novel circRNA-disease association prediction method based on a heterogeneous graph embedding model. A heterogeneous graph network containing circRNA-miRNA-disease association information is first constructed. To sample the heterogeneous information, a meta-path-based random walk, which can capture the relevance between different types of nodes, is employed. A path embedding model based on skip-gram and random negative sampling is then built to obtain the initial feature vectors of circRNAs and diseases. Finally, the CosMulformer model, with linearized self-attention and the Hadamard product, is designed to obtain circRNA-disease interaction vectors and perform the prediction task. Experimental results demonstrate the critical role of miRNAs in enriching the feature space, the effectiveness of the CosMulformer model in capturing deep local interaction features, and the suitability of the Hadamard product as the integration pattern in the CosMulformer model. On the same dataset, HGECDA outperforms seven other state-of-the-art algorithms. Moreover, case studies of breast cancer and colorectal cancer demonstrate the practical value of HGECDA in predicting potential circRNA-disease associations.
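A meta-path-based random walk only steps to neighbours whose node type matches the next type in a predefined path such as circRNA-miRNA-disease-miRNA-circRNA. The sketch below implements that constraint on a toy heterogeneous graph; the meta-path, the graph and all names are hypothetical, since the abstract does not list HGECDA's meta-paths. The resulting walks would then be fed to a skip-gram model with negative sampling, which is not shown.

```python
import random
from typing import Dict, List, Sequence


def metapath_walk(neighbors: Dict[str, List[str]], node_type: Dict[str, str], start: str,
                  metapath: Sequence[str], walk_length: int, seed: int = 0) -> List[str]:
    """Random walk whose node types follow a meta-path such as C-M-D-M-C.

    The meta-path must start and end with the same type, so it is repeated
    (without its last element) until the walk reaches walk_length nodes.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same node type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node must have the meta-path's first type")
    rng = random.Random(seed)
    cycle = list(metapath[:-1])
    walk = [start]
    for step in range(1, walk_length):
        wanted = cycle[step % len(cycle)]
        candidates = [n for n in neighbors.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:  # dead end: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    types = {"c1": "circRNA", "c2": "circRNA", "m1": "miRNA", "m2": "miRNA", "d1": "disease"}
    adj = {"c1": ["m1"], "c2": ["m2"], "m1": ["c1", "d1"], "m2": ["c2", "d1"], "d1": ["m1", "m2"]}
    path = ["circRNA", "miRNA", "disease", "miRNA", "circRNA"]
    walk = metapath_walk(adj, types, "c1", path, walk_length=5, seed=0)
    print([types[n] for n in walk])  # ['circRNA', 'miRNA', 'disease', 'miRNA', 'circRNA']
```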
ABSTRACT
As a complication of malignant tumors, brain metastasis (BM) seriously threatens patients' survival and quality of life. Accurate detection of BMs before radiation therapy planning is paramount. Because BMs are small and vary in number, their manual diagnosis faces enormous challenges, which makes MRI-based, artificial intelligence-assisted BM diagnosis important. Most existing deep learning (DL) methods for automatic BM detection try to strike a good trade-off between precision and recall. However, owing to factors inherent to these models, higher recall is often accompanied by more false positive results, and in real clinical auxiliary diagnosis radiation oncologists must spend considerable effort reviewing them. To reduce false positive results while retaining high accuracy, this paper proposes a modified YOLOv5 algorithm. First, to focus on the important channels of the feature map, a convolutional block attention module is added to the neck structure. Second, an additional prediction head is introduced to detect small BMs. Finally, to distinguish between cerebral vessels and small BMs, a Swin Transformer block is embedded into the smallest prediction head. Using the F2-score to determine the most appropriate confidence threshold, the proposed method achieves a precision of 0.612 and a recall of 0.904. Compared with existing methods, the proposed method shows superior performance with fewer false positive results, and it is expected to reduce the workload of radiation oncologists in real clinical auxiliary diagnosis.
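The F2-score is the F-beta score with beta = 2, F_beta = (1 + beta^2)·P·R / (beta^2·P + R), which weights recall more heavily than precision. The sketch below computes it and scans candidate confidence thresholds for the one that maximizes F2. The detection list and the matching of detections to ground-truth metastases are hypothetical inputs; the paper's matching criterion is not given in the abstract.

```python
from typing import List, Tuple


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 gives recall more weight than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(detections: List[Tuple[float, bool]], n_ground_truth: int,
                   beta: float = 2.0) -> Tuple[float, float]:
    """Return the confidence threshold (and its score) that maximises F-beta.

    detections: (confidence, is_true_positive) pairs, assumed already matched
    to ground-truth lesions; n_ground_truth must be positive.
    """
    best_t, best_f = 0.0, -1.0
    for t in sorted({c for c, _ in detections}):
        kept = [hit for c, hit in detections if c >= t]  # detections surviving threshold t
        tp = sum(kept)
        f = f_beta(tp / len(kept), tp / n_ground_truth, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


if __name__ == "__main__":
    print(round(f_beta(0.612, 0.904), 3))  # 0.825 for the reported precision and recall
    dets = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
    print(best_threshold(dets, n_ground_truth=3))  # (0.5, 0.9375)
```

Because F2 rewards recall, the chosen threshold tends to sit low enough to keep most true lesions, while precision still penalizes thresholds that admit many false positives.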
ABSTRACT
As a non-coding RNA molecule with a closed-loop structure, circular RNA (circRNA) has a tissue-specific and cell-specific expression pattern. It regulates disease development by modulating the expression of disease-related genes, so exploring circRNA-disease relationships can reveal the molecular mechanisms of disease pathogenesis. Biological experiments for detecting circRNA-disease associations are time-consuming and laborious. Constrained by the sparsity of known circRNA-disease associations, existing algorithms cannot obtain sufficiently complete structural information to represent features accurately. To this end, this paper proposes a new predictor, VGAERF, combining a Variational Graph Auto-Encoder (VGAE) and a Random Forest (RF). First, a circRNA homogeneous graph and a disease homogeneous graph are constructed from Gaussian interaction profile (GIP) kernel similarity, semantic similarity, and known circRNA-disease associations. Two VGAEs with the same structure are then employed to extract higher-order features by encoding and decoding the input graphs. To further increase the completeness of the network structure information, the deep features from the two VGAEs are summed and used to train an RF, which handles sparse data well, to perform the prediction task. On the independent test set, the area under the ROC curve (AUC), accuracy, and area under the PR curve (AUPR) of the proposed method reach 0.9803, 0.9345, and 0.9894, respectively. On the same dataset, the AUC, accuracy, and AUPR of VGAERF are 2.09%, 5.93%, and 1.86% higher than those of the best-performing existing method (AEDNN). VGAERF is expected to provide significant information for deciphering the molecular mechanisms of circRNA-disease associations and to promote the diagnosis of circRNA-related diseases.
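GIP kernel similarity treats each row of the circRNA-disease association matrix as an interaction profile IP_i and sets K(i, j) = exp(-gamma·||IP_i - IP_j||^2), with gamma = gamma' / mean_i(||IP_i||^2). A minimal sketch follows; gamma' = 1 is a commonly used default assumed here, and the association matrix is hypothetical. Disease-disease similarity is obtained the same way from the transposed matrix.

```python
import math
from typing import List


def gip_kernel(assoc: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel similarity between rows of a binary association matrix.

    Row i is the interaction profile IP_i of entity i (e.g. one circRNA's 0/1
    associations with every disease). K[i][j] = exp(-gamma * ||IP_i - IP_j||^2),
    where gamma = gamma_prime / mean_i(||IP_i||^2).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    A = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]  # hypothetical circRNA x disease matrix
    circ_sim = gip_kernel(A)  # circRNA-circRNA similarity
    dis_sim = gip_kernel([list(col) for col in zip(*A)])  # disease-disease via transpose
    print([round(x, 3) for x in circ_sim[0]])  # [1.0, 0.472, 0.105]
```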
Subjects
Labor, Obstetric; RNA, Circular; Pregnancy; Female; Humans; RNA, Circular/genetics; Algorithms; Area Under Curve; Semantics
ABSTRACT
By denaturing proteins and promoting the formation of multiprotein complexes, protein phosphorylation has important effects on the activity of functional protein molecules and on cell signaling. Regulation of protein phosphorylation allows microbes to respond rapidly and reversibly to specific environmental stimuli or niches and is closely related to the molecular mechanisms of bacterial drug resistance. Accurate prediction of phosphorylation sites (p-sites) in prokaryotes can help address bacterial resistance and provide new perspectives for developing novel antibacterial drugs. Most existing studies focus on human phosphorylation sites, while tools for identifying phosphorylation sites in prokaryotic proteins remain relatively scarce. This study designs EcapsP, a capsule network-based technique for predicting p-sites in prokaryotes. To address the poor scalability and unreliability of the dynamic routing process in the output space of capsule networks, a more reliable way of learning the consistency between capsules is introduced: a self-attention mechanism is incorporated into the routing algorithm to capture the global information of the capsules, reducing computational effort while enriching their representational capability. To address the model's weak robustness, EcapsP improves prediction accuracy and stability by introducing shortcuts and unconditional reconfiguration. In addition, the study compares the prediction performance of word vectors, physicochemical properties, and mixed features for serine (Ser/S), threonine (Thr/T), and tyrosine (Tyr/Y) p-sites. Comprehensive experimental results show that the accuracy of the technique is close to 70% for identifying all three types of phosphorylation sites in prokaryotes.
Importantly, in side-by-side comparisons with other state-of-the-art predictors, the method improves the Matthews correlation coefficient (MCC) by approximately 7%. The results demonstrate the superiority of EcapsP in terms of performance and reliability.
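Capsule networks represent each entity by a vector whose length is pushed into [0, 1) by the "squash" non-linearity from the original dynamic-routing capsule network (Sabour et al., 2017): v = (|s|^2 / (1 + |s|^2))·s/|s|. The abstract describes changes to routing but does not say whether EcapsP keeps this squash function, so the sketch below illustrates the standard formulation only.

```python
import math
from typing import List


def squash(s: List[float], eps: float = 1e-9) -> List[float]:
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * s / |s|.

    The direction of s is kept and its length is mapped into [0, 1), so a
    capsule's output length can be read as the probability that the entity it
    represents is present. eps avoids division by zero for a zero vector.
    """
    sq = sum(x * x for x in s)
    scale = (sq / (1.0 + sq)) / math.sqrt(sq + eps)
    return [scale * x for x in s]


if __name__ == "__main__":
    v = squash([3.0, 4.0])  # input length 5
    print(math.sqrt(sum(x * x for x in v)))  # about 0.9615 (= 25/26)
```

Long input vectors are squashed to lengths just below 1 and short ones to nearly 0, which is what lets routing compare capsule agreement on a common scale.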
Subjects
Prokaryotic Cells; Proteins; Humans; Phosphorylation; Reproducibility of Results; Proteins/metabolism; Tyrosine/metabolism
ABSTRACT
Enhancers are non-coding DNA fragments that play key roles in gene regulation and can promote the transcription of structural genes, thereby affecting the expression of structural proteins, catalytic enzymes and regulatory proteins. Accurate identification of enhancers helps in understanding the transcription of structural genes as well as human tumorigenesis and its diagnosis and treatment. Enhancer sequences show high positional variability and dispersion, which makes identifying enhancers more challenging than identifying other genetic elements. A deep learning framework for enhancer identification based on word embedding and sequence generative adversarial networks is proposed. First, because the benchmark dataset contains few sequences, RankGAN is used to enlarge it while preserving its data characteristics. Then, given the similarity between DNA sequences and natural language, each DNA sequence is treated as a sentence composed of multiple "words", and the word embedding technique FastText is applied to transform it into a numerical matrix. To extract the dependencies between nucleotides and the highly abstract features of DNA sequences, a Long Short-Term Memory Convolutional Neural Network (LSTM-CNN) is constructed to perform the identification task. On the independent test set, the accuracy and Matthews correlation coefficient (MCC) for enhancer prediction are 0.7525 and 0.5051, respectively; for enhancer type prediction, they are 0.6972 and 0.3954. Compared with existing methods, this method achieves more satisfactory results for predicting enhancers and their types. This study further broadens the application of natural language processing in bioinformatics.
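What distinguishes FastText from plain skip-gram is that each "word" is also represented through its character n-grams, computed on the word wrapped in "<" and ">" boundary markers, so even an unseen DNA k-mer receives a vector. The sketch below only extracts those subword units; the n-gram range used in the paper is not given in the abstract, so fastText's default range of 3 to 6 is assumed.

```python
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers.

    fastText-style subword units: a word's vector is built from the vectors of
    these n-grams together with a vector for the whole word, so that unseen
    "words" (here, DNA k-mers) still receive a representation.
    """
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(wrapped) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("ACG"))  # ['<AC', 'ACG', 'CG>', '<ACG', 'ACG>', '<ACG>']
```

Because neighbouring k-mers share many n-grams, their vectors are pulled toward each other, which is one reason subword embeddings suit the highly variable enhancer sequences described above.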
Subjects
Deep Learning; Computational Biology/methods; Humans; Neural Networks, Computer
ABSTRACT
DNA replication influences the inheritance of genetic information in the DNA life cycle. Because the distribution of replication origins (ORIs) is the major determinant of how precisely the replication process is regulated, correctly identifying ORIs is important for understanding DNA replication mechanisms and the regulatory mechanisms of gene expression. Eukaryotes in particular have multiple ORIs in each of their gene sequences so that replication completes in a reasonable period of time. To simplify the identification of eukaryotic ORIs, most existing methods are built on traditional machine learning algorithms and target gene sequences of a fixed length. Consequently, their identification results are unsatisfactory, leaving great room for improvement. To overcome these limitations, this paper develops sequence segmentation methods and employs the word embedding technique Word2vec to convert gene sequences into word vectors, thereby capturing the internal correlations of gene sequences of different lengths. A deep learning framework for the ORI identification task is then constructed from a convolutional neural network with an embedding layer. Analysis of the dimensionality-reduced similarity diagram shows that Word2vec can effectively transform the internal relationships among words into numerical features. For the four species in this study, the best models achieve overall accuracies of 0.975, 0.765, 0.885 and 0.967, Matthews correlation coefficients of 0.940, 0.530, 0.771 and 0.934, and AUCs of 0.975, 0.800, 0.888 and 0.981, indicating that the proposed predictor is stable and classifies both ORIs and non-ORIs with high confidence. Compared with state-of-the-art methods, the proposed predictor achieves significantly improved ORI identification.
It is therefore reasonable to anticipate that the proposed method will serve as a useful high-throughput tool for genome analysis.
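Feeding sequences of different lengths into a CNN with an embedding layer requires turning each sequence into a fixed-length list of word indices. The abstract does not describe the paper's segmentation methods, so non-overlapping fixed-length segmentation, the reserved PAD/UNK indices and the truncation rule below are all assumptions of this sketch; in a full pipeline the embedding layer would be initialized with Word2vec vectors (for example, trained with gensim's `Word2Vec`).

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved indices for padding and unseen words


def segment(seq: str, k: int) -> List[str]:
    """Non-overlapping segmentation into length-k words; the last word may be shorter."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word an index, starting after PAD and UNK."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab) + 2)
    return vocab


def to_ids(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map words to ids, then truncate or pad to a fixed length for the embedding layer."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    seqs = ["ACGTACG", "ACGTTT"]  # hypothetical ORI candidates of different lengths
    tokens = [segment(s, 3) for s in seqs]  # [['ACG', 'TAC', 'G'], ['ACG', 'TTT']]
    vocab = build_vocab(tokens)
    print([to_ids(t, vocab, max_len=4) for t in tokens])  # [[2, 3, 4, 0], [2, 5, 0, 0]]
```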
Subjects
Replicação do DNA , Aprendizado Profundo , Kluyveromyces/genética , Origem de Replicação , Saccharomyces cerevisiae/genética , Saccharomycetales/genética , Schizosaccharomyces/genética , Algoritmos , Bases de Dados Genéticas , Redes Neurais de Computação , Transcrição GênicaRESUMO
BACKGROUND: The COVID-19 pandemic presents great challenges for transmission prevention, and rapid diagnosis is essential to limit the spread of the disease. Various diagnostic methods are available to identify an ongoing infection from nasopharyngeal (NPH) swab samples. However, the procedure must be performed by health care professionals, which limits its application in household and community settings. OBJECTIVES: In this study, we aimed to determine whether SARS-CoV-2 can alternatively be detected in saliva specimens by rapid antigen test. STUDY DESIGN: Saliva and NPH specimens were collected from 44 patients with confirmed COVID-19. To assess the diagnostic accuracy of point-of-care SARS-CoV-2 rapid antigen tests on saliva specimens, we compared the performance of four test products. RESULTS: RT-qPCR showed similar Ct values for NPH and saliva samples, and these values were associated with disease duration. All four antigen tests showed a similar trend in detecting SARS-CoV-2 in saliva, but varied in their ability to detect positive cases. The best-performing rapid antigen test detected up to 67% of the positive cases with Ct values below 25 and disease duration shorter than 10 days. CONCLUSION: Our study supports saliva testing as an alternative diagnostic procedure to NPH testing, and indicates that rapid antigen testing on saliva is a potential complement to PCR testing to meet increasing screening demand.
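The 67% figure is a sensitivity restricted to a subgroup (low Ct, short disease duration), with RT-qPCR as the reference standard. A minimal sketch of such a stratified calculation follows; the `Sample` records and the `sensitivity` helper are hypothetical, and the numbers are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ct: float          # RT-qPCR cycle threshold (lower Ct = more viral RNA)
    days: int          # days since disease onset
    antigen_pos: bool  # rapid antigen test result on saliva


def sensitivity(samples, max_ct=None, max_days=None):
    """Fraction of PCR-positive samples detected by the antigen test,
    optionally restricted to Ct < max_ct and duration < max_days."""
    subset = [s for s in samples
              if (max_ct is None or s.ct < max_ct)
              and (max_days is None or s.days < max_days)]
    if not subset:
        return float("nan")  # no samples in this stratum
    return sum(s.antigen_pos for s in subset) / len(subset)


# Hypothetical PCR-positive samples, not the study's measurements.
samples = [
    Sample(18.0, 3, True), Sample(22.5, 5, True), Sample(24.0, 8, False),
    Sample(27.0, 12, False), Sample(31.5, 15, False), Sample(20.0, 4, True),
]
print(sensitivity(samples))                          # 0.5
print(sensitivity(samples, max_ct=25, max_days=10))  # 0.75
```

Stratifying this way makes explicit that antigen-test sensitivity depends strongly on viral load, which is why a single overall detection rate can understate performance in early, high-load infections.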
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has created a global health and economic crisis. Serological detection of antibodies to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, is important for diagnosing a current or resolved infection. In this study, we applied a rapid COVID-19 IgM/IgG antibody test and assessed the serological antibody response to SARS-CoV-2. In PCR-confirmed COVID-19 patients (n = 45), the total antibody detection rate was 92% in hospitalized patients and 79% in non-hospitalized patients. Total IgM and IgG detection was 63% in patients less than 2 weeks from disease onset, 85% in non-hospitalized patients with more than 2 weeks of disease duration, and 91% in hospitalized patients with more than 2 weeks of disease duration. We also compared different blood sample types, and our results suggest higher sensitivity with serum/plasma than with whole blood. Test specificity was 97% on 69 serum/plasma samples collected between 2016 and 2018. Our study provides a comprehensive validation of the rapid COVID-19 IgM/IgG serology test and maps antibody detection patterns in association with disease progression and hospitalization. Our results support applying the rapid COVID-19 IgM/IgG test to assess COVID-19 status at both the individual and the population level.
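Specificity is TN / (TN + FP), and with only 69 reference samples each false positive moves it by about 1.4 percentage points. Assuming all 69 pre-pandemic samples are true negatives and the 97% figure is rounded from a single count (an assumption, since the abstract gives only the percentage), a short Python check shows which false-positive counts are consistent with it. The helper names are hypothetical.

```python
def specificity(true_neg: int, false_pos: int) -> float:
    """Specificity = TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


def fp_counts_matching(n_neg: int, reported_pct: int) -> list[int]:
    """False-positive counts whose specificity rounds to the reported percentage."""
    return [fp for fp in range(n_neg + 1)
            if round(100 * specificity(n_neg - fp, fp)) == reported_pct]


print(fp_counts_matching(69, 97))    # [2]
print(round(specificity(67, 2), 3))  # 0.971
```

Under that assumption, 97% corresponds to 67 of 69 samples testing negative; 68/69 would round to 99% and 66/69 to 96%.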
Abstract
The function of a flavoprotein is determined to a great extent by the binding sites on its surface that interact with flavin adenine dinucleotide (FAD). Malfunction or dysregulation of FAD binding leads to a range of diseases. Accurately identifying FAD-interacting residues (FIRs) therefore provides insights into the molecular mechanisms of flavoprotein-related biological processes and disease progression. In this paper, a new computational method is proposed for identifying FIRs from protein sequences. We explore various discriminative sequence-derived features, analyze how these features differ between FIRs and non-FIRs, and investigate the predictive capabilities of both individual features and feature combinations. A Relief algorithm followed by incremental feature selection (Relief-IFS) is then adopted to search for the optimal features. Finally, a random forest (RF) module predicts FIRs based on the optimal features. In a 5-fold cross-validation test, the proposed method performs well, with a sensitivity of 0.847, a specificity of 0.933, an accuracy of 0.890 and a Matthews correlation coefficient (MCC) of 0.782, outperforming previous methods. These results indicate that our method is relatively successful at predicting FIRs.
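The abstract names Relief followed by IFS but gives no further details. The numpy sketch below assumes the textbook binary Relief of Kira and Rendell (features scaled to [0, 1], one nearest hit and one nearest miss per sampled instance) and a plain IFS loop over the Relief ranking. The names `relief`, `ifs` and `centroid_score` are hypothetical, and `centroid_score` is only a placeholder for the random forest with 5-fold cross-validation used in the paper.

```python
import numpy as np


def relief(X, y, n_iter=None, seed=None):
    """Basic binary Relief: reward features that differ at the nearest miss
    (other class), penalize those that differ at the nearest hit (same class).
    Assumes every feature is scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_iter = n if n_iter is None else n_iter
    w = np.zeros(d)
    for i in rng.choice(n, size=n_iter, replace=n_iter > n):
        dist = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance to all samples
        dist[i] = np.inf                     # never pick the sample itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter


def ifs(X, y, ranking, score_fn):
    """Incremental feature selection: score the top-1, top-2, ... ranked
    features and keep the smallest subset with the best score."""
    best_k, best_score = 0, -np.inf
    for k in range(1, len(ranking) + 1):
        score = score_fn(X[:, ranking[:k]], y)
        if score > best_score:
            best_k, best_score = k, score
    return ranking[:best_k], best_score


def centroid_score(Xs, y):
    """Placeholder evaluator: resubstitution accuracy of a nearest-centroid rule
    (the paper instead evaluates a random forest with 5-fold cross-validation)."""
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return float((pred == y).mean())


# Synthetic data: feature 0 tracks the label, features 1-4 are pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.random((100, 5))
X[:, 0] = 0.1 * rng.random(100) + 0.9 * y
weights = relief(X, y, seed=1)
ranking = np.argsort(weights)[::-1]  # most relevant feature first
subset, score = ifs(X, y, ranking, centroid_score)
print(ranking[0], subset, score)
```

Swapping `centroid_score` for a cross-validated random forest (for example via scikit-learn) would bring the loop closer to the described pipeline; the ranking and incremental search stay the same.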
Subjects
Binding Sites , Computational Biology/methods , Flavin-Adenine Dinucleotide/chemistry , Algorithms , Amino Acids/chemistry , Bayes Theorem , Computer Simulation , Databases, Protein , Flavin Mononucleotide/chemistry , Humans , Ligands , Protein Binding , Proteins/chemistry , Reproducibility of Results , Sensitivity and Specificity , Solvents/chemistry
Abstract
Anti-angiogenic peptides perform distinct physiological functions and are potential therapies for angiogenesis-related diseases. Accurately identifying anti-angiogenic peptides may provide significant clues for understanding angiogenic homeostasis within tissues and for developing antineoplastic therapies. In this study, an ensemble predictor for anti-angiogenic peptides is proposed that fuses the individual classifier with the best sensitivity and the individual classifier with the best specificity. We investigate the predictive capabilities of various feature spaces with respect to the corresponding optimal individual classifiers and ensemble classifiers. The ensemble classifier trained on Bi-profile Bayes (BpB) features achieves an accuracy of 0.822 and a Matthews correlation coefficient (MCC) of 0.649, the best results among the investigated prediction models. Discriminative features are then selected from the BpB features using the Relief algorithm followed by the incremental feature selection (IFS) method. The ensemble classifier trained on these discriminative features reaches a sensitivity of 0.776, a specificity of 0.888, an accuracy of 0.832 and an MCC of 0.668. Experimental results indicate that the proposed method substantially outperforms the previous study on anti-angiogenic peptide prediction.
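The abstract does not state how the high-sensitivity and high-specificity classifiers are fused. The Python sketch below assumes simple soft voting (averaging the two models' positive-class probabilities) and computes the four reported metrics from a confusion matrix. All probabilities are invented, and the 1000/1000 counts are a hypothetical class-balanced test set chosen because the reported accuracy (0.832) equals the mean of the reported sensitivity and specificity, as a balanced evaluation would give.

```python
import math


def fuse(p_sens, p_spec, threshold=0.5):
    """Assumed fusion rule (soft voting): average the positive-class
    probabilities of the high-sensitivity and high-specificity models."""
    return [int((a + b) / 2 >= threshold) for a, b in zip(p_sens, p_spec)]


def confusion(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels, 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp, fn, tn, fp


def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


y_true = [1, 1, 0, 0]
p_sens = [0.9, 0.8, 0.7, 0.2]  # hypothetical high-sensitivity model (false positive on sample 3)
p_spec = [0.8, 0.4, 0.1, 0.1]  # hypothetical high-specificity model (misses sample 2)
y_fused = fuse(p_sens, p_spec)
print(y_fused, metrics(*confusion(y_true, y_fused)))

sn, sp, acc, mcc = metrics(tp=776, fn=224, tn=888, fp=112)
print(round(sn, 3), round(sp, 3), round(acc, 3), round(mcc, 3))  # 0.776 0.888 0.832 0.668
```

In the toy example each individual model makes one error at a 0.5 threshold, while the averaged probabilities classify all four samples correctly, which illustrates why pairing complementary classifiers can help.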