Results 1 - 20 of 34
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975896

ABSTRACT

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites still remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach involves the integration of Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), independently configured to profoundly mine the global and local node embedding representations. An advanced gated multi-head attention mechanism is subsequently employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. Besides, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.


Subject(s)
Computational Biology , DNA-Binding Proteins , DNA , Neural Networks, Computer , Binding Sites , DNA/metabolism , DNA/chemistry , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , Computational Biology/methods , Algorithms , Protein Binding
2.
BMC Bioinformatics ; 25(1): 224, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918692

ABSTRACT

Promoters are essential elements of DNA sequence, usually located in the immediate region of the gene transcription start sites, and play a critical role in the regulation of gene transcription. Their importance in molecular biology and genetics has attracted considerable research interest, and there is consensus on the need for a computational method that identifies promoters efficiently. Still, existing methods suffer from imbalanced recognition capabilities for positive and negative samples, and their recognition performance can be further improved. We studied E. coli promoters and proposed a more advanced prediction model, iProL, based on the Longformer pre-trained model from the field of natural language processing. iProL does not rely on prior biological knowledge but simply treats promoter DNA sequences as plain text to identify promoters. It also combines one-dimensional convolutional neural networks and bidirectional long short-term memory to extract both local and global features. Experimental results show that iProL has a more balanced and superior performance than currently published methods. Additionally, we constructed a novel independent test set following the previous specification and compared iProL with three existing methods on this independent test set.


Subject(s)
Escherichia coli , Promoter Regions, Genetic , Escherichia coli/genetics , Sequence Analysis, DNA/methods , Computational Biology/methods , Neural Networks, Computer , Algorithms , Natural Language Processing
3.
Article in English | MEDLINE | ID: mdl-37831572

ABSTRACT

As a highly contagious disease, COVID-19 has not only had a great impact on the life, study and work of hundreds of millions of people around the world, but also had a huge impact on the global health care system. Therefore, any technical tool that allows for rapid screening and high-precision diagnosis of COVID-19 infections can be of vital help. In order to reduce the burden on the health care system, the computer-aided diagnosis of COVID-19 has become a current research hotspot. X-ray imaging is a common and low-cost tool that can help with COVID-19 diagnosis. The data used for this study comprise 15,153 chest X-ray (CXR) images, containing 10,192 normal lungs, 3,631 COVID-19 positive cases and 1,345 images of viral pneumonia. For this computer-aided task, we propose the dual-ended multiple attention learning model (DMAL). The model incorporates multiple attention learning into both networks, and the two networks are linked using an integration module. Specifically, in both networks, the backbone network is used to extract global features and the branch network captures local area information; the integration module combines multi-stage features; and the attention module containing element, channel and spatial attention prompts the model to focus on multi-scale information relevant to the disease. We evaluate the proposed DMAL network against relevant competitive methods as well as ten advanced deep learning models in the image domain and obtain the best performance with 99.67%, 99.53%, 99.66%, 99.60% and 99.76% in terms of Accuracy, Precision, Sensitivity, F1 Score and Specificity. The proposed method will help in the rapid screening and high-precision diagnosis of COVID-19, given the general trend of such severe global infections. Our code and model are available at https://github.com/Graziagh/DMALNet.
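The five metrics named in this abstract can be illustrated with a minimal sketch. It assumes a binary (one-vs-rest) confusion matrix; the abstract does not say how the three-class results were averaged, and the function name `binary_metrics` and the counts below are hypothetical, not taken from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "specificity": specificity,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=5, tn=100, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class task such as normal / COVID-19 / viral pneumonia, one common option is to compute these per class and then average them, but which averaging the authors used is not stated here.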

4.
BMC Bioinformatics ; 24(1): 333, 2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37674125

ABSTRACT

BACKGROUND: Hepatitis C is a prevalent disease that poses a high risk to the human liver. Early diagnosis of hepatitis C is crucial for treatment and prognosis. Therefore, developing an effective medical decision system is essential. In recent years, many computational methods have been proposed to identify hepatitis C patients. Although existing hepatitis prediction models have achieved good results in terms of accuracy, most of them are black-box models and cannot gain the trust of doctors and patients in clinical practice. As a result, this study aims to use various Machine Learning (ML) models to predict whether a patient has hepatitis C, while also using explainable models to elucidate the prediction process of the ML models, thus making the prediction process more transparent. RESULTS: We conducted a study on the prediction of hepatitis C based on serological testing and provided comprehensive explanations for the prediction process. Throughout the experiment, we modeled the benchmark dataset and evaluated model performance using fivefold cross-validation and independent testing experiments. After evaluating three types of black-box machine learning models, Random Forest (RF), Support Vector Machine (SVM), and AdaBoost, we adopted Bayesian-optimized RF as the classification algorithm. In terms of model interpretation, in addition to using the common SHapley Additive exPlanations (SHAP) to provide global explanations for the model, we also utilized Local Interpretable Model-Agnostic Explanations with stability (LIME_stability) to provide local explanations for the model. CONCLUSION: Both the fivefold cross-validation and independent testing show that our proposed method significantly outperforms the state-of-the-art method. IHCP maintains excellent model interpretability while obtaining excellent predictive performance. This helps uncover potential predictive patterns of the model and enables clinicians to better understand the model's decision-making process.


Subject(s)
Hepatitis C , Humans , Bayes Theorem , Hepatitis C/diagnosis , Hepacivirus , Machine Learning
5.
PLoS One ; 18(9): e0291961, 2023.
Article in English | MEDLINE | ID: mdl-37733828

ABSTRACT

Coronaviruses have affected the lives of people around the world. Increasingly, studies have indicated that the virus is mutating and becoming more contagious. Hence, the pressing priority is to swiftly and accurately predict patient outcomes. In addition, physicians and patients increasingly need interpretability when machine learning models are built for healthcare. We propose an interpretable machine learning framework (KISM) that can diagnose and prognose patients based on blood test datasets. First, we use k-nearest neighbors, isolation forests, and SMOTE to pre-process the original blood test datasets. Seven machine learning tools (Support Vector Machine, Extra Tree, Random Forest, Gradient Boosting Decision Tree, eXtreme Gradient Boosting, Logistic Regression, and ensemble learning) were then used to diagnose and predict COVID-19. In addition, we used SHAP and scikit-learn post-hoc interpretability to report feature importance, allowing healthcare professionals and artificial intelligence models to interact to suggest biomarkers that some doctors may have missed. The 10-fold cross-validation on two public datasets shows that the performance of KISM is better than that of the current state-of-the-art methods. In the COVID-19 diagnostic task, an AUC value of 0.9869 and an accuracy of 0.9787 were obtained, and ultimately Leukocytes, platelets, and Proteina C reativa mg/dL were found to be the most indicative biomarkers for the diagnosis of COVID-19. An AUC value of 0.9949 and an accuracy of 0.9677 were obtained in the COVID-19 prognostic task, and Age, LYMPH, and WBC were found to be the most indicative biomarkers for identifying the severity of the patient's condition.


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , Artificial Intelligence , Prognosis , Machine Learning , Blood Platelets , COVID-19 Testing
6.
BMC Bioinformatics ; 24(1): 261, 2023 Jun 22.
Article in English | MEDLINE | ID: mdl-37349705

ABSTRACT

BACKGROUND: Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulty communicating with society and others, behavioral difficulties, and a brain that processes information differently than normal. Genetics has a strong impact on ASD associated with early onset and distinctive signs. Currently, all known ASD risk genes are able to encode proteins, and some de novo mutations disrupting protein-coding genes have been demonstrated to cause ASD. Next-generation sequencing technology enables high-throughput identification of ASD risk RNAs. However, these efforts are time-consuming and expensive, so an efficient computational model for ASD risk gene prediction is necessary. RESULTS: In this study, we propose DeepASDPred, a predictor for ASD risk RNA based on deep learning. First, we use K-mer encoding to extract features from the RNA transcript sequences, and then fuse them with the corresponding gene expression values to construct a feature matrix. After combining the chi-square test and logistic regression to select the best feature subset, we input the selected features into a binary classification model built from a convolutional neural network and long short-term memory for training and classification. The results of tenfold cross-validation show that our method outperformed the state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred. CONCLUSIONS: Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes.


Subject(s)
Autism Spectrum Disorder , Deep Learning , Humans , Autism Spectrum Disorder/genetics , RNA/genetics , Neural Networks, Computer , Software
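The K-mer encoding step mentioned in this abstract can be sketched as follows. This is a generic normalized k-mer frequency vector, assuming the RNA alphabet ACGU; the function name `kmer_features`, the choice of k, and the example sequence are illustrative assumptions and not taken from the DeepASDPred code.

```python
from collections import Counter
from itertools import product


def kmer_features(seq: str, k: int = 3, alphabet: str = "ACGU") -> list[float]:
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper()
    # Slide a window of width k across the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Hypothetical toy transcript; 2-mers give a 4**2 = 16-dimensional vector.
features = kmer_features("AUGGCUAUGG", k=2)
print(len(features))
```

K-mers containing characters outside the alphabet (for example N) are ignored by this sketch; a real pipeline would need to decide how to treat them.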
7.
Pest Manag Sci ; 79(5): 1922-1930, 2023 May.
Article in English | MEDLINE | ID: mdl-36658467

ABSTRACT

BACKGROUND: Succinate dehydrogenase inhibitor (SDHI) fungicides are an important class of agricultural fungicides with the advantages of high efficiency and a broad bactericidal spectrum. To pursue novel SDHIs, a series of N-substituted dithiin tetracarboximide derivatives were designed, synthesized, and characterized by 1H NMR, 13C NMR, and high-resolution mass spectrometry (HRMS). RESULTS: These engineered compounds displayed potent fungicidal activity against phytopathogens, including Sclerotinia sclerotiorum, Botrytis cinerea, and Rhizoctonia solani, comparable with that of the commercial SDHI fungicide boscalid. In particular, compound 18 stood out with prominent activity against S. sclerotiorum, with a half-maximal effective concentration (EC50) value of 1.37 µg ml-1. Compound 1 exhibited the most potent antifungal activity against B. cinerea, with an EC50 value of 5.02 µg ml-1. As for R. solani, 12 and 13 exhibited remarkable inhibitory activity, with EC50 values of 4.26 and 5.76 µg ml-1, respectively. In the succinate dehydrogenase (SDH) inhibition assay, 13 presented significant inhibitory activity with a half-maximal inhibitory concentration (IC50) value of 15.3 µM, which was approximately equivalent to that of boscalid (14.2 µM). Furthermore, molecular docking studies revealed that 13 could anchor in the binding site of SDH. CONCLUSION: Taken together, the results suggest that the dithiin tetracarboximide scaffold has great potential for development into novel fungicides and SDHIs. © 2023 Society of Chemical Industry.


Subject(s)
Antifungal Agents , Fungicides, Industrial , Antifungal Agents/chemistry , Fungicides, Industrial/chemistry , Structure-Activity Relationship , Molecular Docking Simulation , Succinate Dehydrogenase
8.
Article in English | MEDLINE | ID: mdl-35536814

ABSTRACT

N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs, and it is widely involved in various biological processes. Identifying m6A modification sites accurately is indispensable to further investigate m6A-mediated biological functions. How to better represent RNA sequences is crucial for building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most of the existing m6A sites prediction methods are limited to single species, and few methods are able to predict m6A sites across different species and tissues. Thus, it is necessary to design a more efficient computational method to predict m6A sites across multiple species and tissues. In this paper, we proposed ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using a language model ELMo, then uses a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.


Subject(s)
Adenosine , RNA , RNA/genetics , Adenosine/genetics , Neural Networks, Computer , Sequence Analysis, RNA/methods
9.
BMC Bioinformatics ; 23(1): 272, 2022 Jul 11.
Article in English | MEDLINE | ID: mdl-35820811

ABSTRACT

BACKGROUND: Understanding the regulatory role of enhancer-promoter interactions (EPIs) on specific gene expression in cells contributes to the understanding of gene regulation, cell differentiation, etc., and identifying EPIs has been a challenging task. On the one hand, using traditional wet-lab experimental methods to identify EPIs often incurs substantial labor and time costs. On the other hand, although the currently proposed computational methods recognize EPIs well, they generally require a long training time. RESULTS: In this study, we studied the EPIs of six human cell lines and designed StackEPI, a cell line-specific EPI prediction method based on a stacking ensemble learning strategy, which has better prediction performance and faster training speed. Specifically, by combining different encoding schemes and machine learning methods, our prediction method can comprehensively extract the cell line-specific effective information of enhancer and promoter gene sequences from multiple perspectives and accurately recognize cell line-specific EPIs. The source code implementing StackEPI and the experimental data are available at https://github.com/20032303092/StackEPI.git. CONCLUSIONS: The comparison results show that our model can deliver better performance on the problem of identifying cell line-specific EPIs and outperform other state-of-the-art models. In addition, our model also has a more efficient computation speed.


Subject(s)
Cell Communication , Regulatory Sequences, Nucleic Acid , Cell Line , Humans , Machine Learning , Promoter Regions, Genetic
10.
Brief Bioinform ; 23(1)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34486019

ABSTRACT

Long noncoding RNAs (lncRNAs) play important roles in various biological regulatory processes, and are closely related to the occurrence and development of diseases. Identifying lncRNA-disease associations is valuable for revealing the molecular mechanism of diseases and exploring treatment strategies. Thus, it is necessary to computationally predict lncRNA-disease associations as a complementary method for biological experiments. In this study, we proposed a novel prediction method, GCRFLDA, based on graph convolutional matrix completion. GCRFLDA first constructed a graph using the available lncRNA-disease association information. Then, it constructed an encoder consisting of a conditional random field and an attention mechanism to learn efficient embeddings of nodes, and a decoder layer to score lncRNA-disease associations. In GCRFLDA, the Gaussian interaction profile kernel similarity and cosine similarity were fused as side information of lncRNA and disease nodes. Experimental results on four benchmark datasets show that GCRFLDA is superior to other existing methods. Moreover, we conducted case studies on four diseases and observed that 70 of 80 predicted associated lncRNAs were confirmed by the literature.


Subject(s)
RNA, Long Noncoding , Algorithms , Computational Biology/methods , RNA, Long Noncoding/genetics , Research Design
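The Gaussian interaction profile (GIP) kernel similarity named in this abstract can be sketched as below. The sketch uses the commonly cited definition, K(i, j) = exp(-gamma * ||p_i - p_j||^2) with gamma = gamma' divided by the mean squared norm of the profiles. The function name `gip_kernel`, the default gamma' = 1, and the toy association matrix are assumptions for illustration; GCRFLDA's own parameter choices and its fusion with cosine similarity are not reproduced.

```python
import math


def gip_kernel(profiles: list[list[float]], gamma_prime: float = 1.0) -> list[list[float]]:
    """Gaussian interaction profile kernel between rows of an association matrix."""
    n = len(profiles)
    # Bandwidth is normalized by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical binary matrix: rows are lncRNAs, columns are diseases.
assoc = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
K = gip_kernel(assoc)
for row in K:
    print([round(v, 4) for v in row])
```

Applying the same function to the transposed matrix would give the disease-side similarity.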
11.
Int J Health Plann Manage ; 37(1): 242-257, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34536240

ABSTRACT

This study investigates the nexus between tourism, CO2 emissions and health spending in Mexico. We applied a nonlinear ARDL approach for the empirical analysis over the period 1996-2018. Mexico receives a large number of tourists each year; tourism improves foreign exchange earnings and contributes positively to economic growth. However, tourist activities impose a serious environmental cost in terms of CO2 emissions, which increase health spending. The empirical findings suggest that tourism leads to CO2 emissions, which in turn cause a high level of health spending in Mexico. Both short-run and long-run findings show a significant positive association between tourism, CO2 emissions, and health expenditures. Therefore, the government needs legislation to reduce CO2 emissions; in addition, the use of renewable energy could help to reduce CO2 emissions and health expenditures in society. This study does not support reducing health expenditure; rather, it suggests optimal utilization of the funds allocated to the health sector.


Subject(s)
Carbon Dioxide , Tourism , Carbon Dioxide/analysis , Economic Development , Mexico , Renewable Energy
12.
BMC Bioinformatics ; 22(1): 516, 2021 Oct 23.
Article in English | MEDLINE | ID: mdl-34688247

ABSTRACT

BACKGROUND: The origin is the starting site of DNA replication, an extremely vital part of the informational inheritance between parents and children. More importantly, accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to genetic information errors, while the traditional biological experimental methods are time-consuming and laborious. RESULTS: We carried out research on the origin of replication in a variety of eukaryotes and proposed a unique prediction method for each species. Throughout the experiment, we collected data from 7 species, including Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. Then the two-step method was utilized for feature selection. After comparing a variety of traditional machine learning classification models, the multi-layer perceptron was employed as the classification algorithm. The data and code involved in the experiment are available at https://github.com/Sarahyouzi/EukOriginPredict. CONCLUSIONS: The prediction accuracy on the training sets of the above-mentioned seven species after 100 rounds of fivefold cross-validation reached 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86%, 96.72%, respectively. This indicates that, compared with other methods, the methods we designed achieve superior performance. In addition, our experiments reveal that the models of multiple species can predict each other with high accuracy, and the results of STREME show that they share a certain common motif.


Subject(s)
Drosophila melanogaster , Eukaryota , Animals , Drosophila melanogaster/genetics , Kluyveromyces , Mice , Neural Networks, Computer , Saccharomycetales
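The idea of a TF-IDF-based sequence feature can be sketched by treating each DNA sequence as a document and its overlapping k-mers as words. The abstract does not give the authors' exact formulation, so this uses the textbook form (term frequency times log(N / document frequency)) as an assumption; the names `kmers` and `tfidf`, the value of k, and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Split a sequence into overlapping k-mers ("words")."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(seqs: list[str], k: int = 3) -> tuple[list[str], list[list[float]]]:
    """Return the k-mer vocabulary and one TF-IDF vector per sequence."""
    docs = [Counter(kmers(s.upper(), k)) for s in seqs]
    n_docs = len(docs)
    # Document frequency: in how many sequences each k-mer occurs.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for doc in docs:
        total = sum(doc.values())
        rows.append([(doc[t] / total) * idf[t] if total else 0.0 for t in vocab])
    return vocab, rows


# Hypothetical toy sequences, not data from the paper.
vocab, rows = tfidf(["ACGTAC", "ACGGGG", "TTTACG"], k=3)
print(vocab)
```

With this unsmoothed form, a k-mer that occurs in every sequence gets a weight of zero, which is why it highlights k-mers that distinguish one sequence from the rest.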
13.
Pest Manag Sci ; 77(11): 5109-5119, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34240541

ABSTRACT

BACKGROUND: The worldwide reduction in food production due to pests and diseases remains an important challenge today. Validoxylamine A (VAA) is a natural polyhydroxyl compound derived from validamycin, acting as an efficient trehalase inhibitor with insecticidal and antifungal activities. To extend its application and discover green pesticides, a series of ester derivatives were prepared with VAA as the lead compound. Their biological activities were investigated against three typical agricultural pathogens and pests: Rhizoctonia solani, Sclerotinia sclerotiorum and Aphis craccivora. RESULTS: This study involved 30 novel validoxylamine A fatty acid esters (VAFAEs) synthesized using Novozym 435 and characterized by high-resolution electrospray ionization mass spectrometry (HR-ESI-MS) and proton nuclear magnetic resonance (1H-NMR). Of these 30 derivatives, most compounds showed improved antifungal activity, and 12 novel compounds showed improved insecticidal activity. Compound 14, obtained by reaction with pentadecanoic acid, showed the highest inhibitory activity against R. solani [median effective concentration (EC50) 0.01 µmol L-1], while the EC50 value of VAA was 34.99 µmol L-1. Furthermore, 21 novel VAFAEs showed higher inhibitory activity against S. sclerotiorum. Validoxylamine A oleic acid ester, compound 21, exhibited the highest insecticidal activity against A. craccivora [median lethal concentration (LC50) 39.63 µmol L-1], while the LC50 value of pymetrozine, a commercialized pesticide against A. craccivora, was 50.45 µmol L-1. CONCLUSION: Taken together, our results indicate that esterification of VAA with different acyl donors is beneficial for the development of new eco-friendly pesticides.


Subject(s)
Esters , Ascomycota , Inositol/analogs & derivatives , Rhizoctonia , Structure-Activity Relationship
14.
BMC Bioinformatics ; 22(1): 14, 2021 Jan 07.
Article in English | MEDLINE | ID: mdl-33413088

ABSTRACT

BACKGROUND: With the development of deep learning (DL), more and more DL-based methods have been proposed and achieve state-of-the-art performance in biomedical image segmentation. However, these methods are usually complex and require the support of powerful computing resources. In practice, it is unrealistic to rely on huge computing resources in clinical settings. Thus, it is important to develop accurate DL-based biomedical image segmentation methods that can run on resource-constrained computing. RESULTS: A lightweight and multiscale network called PyConvU-Net is proposed to potentially work with low-resource computing. In strictly controlled experiments, PyConvU-Net performs well on three biomedical image segmentation tasks with the fewest parameters. CONCLUSIONS: Our experimental results preliminarily demonstrate the potential of the proposed PyConvU-Net in biomedical image segmentation with resource-constrained computing.


Subject(s)
Deep Learning , Image Interpretation, Computer-Assisted , Software
15.
Article in English | MEDLINE | ID: mdl-32850711

ABSTRACT

Many microbes in the human body play a vital role in cell physiology. In recent years, accumulating evidence has indicated that microbes are closely related to many complex human diseases. In-depth investigation of disease-associated microbes can contribute to understanding the pathogenesis of diseases and thus provide novel strategies for the treatment, diagnosis, and prevention of diseases. To date, many computational models have been proposed for predicting microbe-disease associations using available similarity networks. However, these similarity networks are not effectively fused. In this study, we proposed a novel computational model based on multi-data integration and network consistency projection for Human Microbe-Disease Associations Prediction (HMDA-Pred), which fuses multiple similarity networks by a linear network fusion method. HMDA-Pred yielded AUC values of 0.9589 and 0.9361 ± 0.0037 in leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, in case studies, 10, 8, and 10 out of the top 10 predicted microbes for asthma, colon cancer, and inflammatory bowel disease, respectively, were confirmed by the literature.

16.
PLoS One ; 15(5): e0228479, 2020.
Article in English | MEDLINE | ID: mdl-32413030

ABSTRACT

A terminator is a DNA sequence that gives the RNA polymerase the transcriptional termination signal. Identifying terminators correctly can improve genome annotation; more importantly, it has considerable application value in disease diagnosis and therapies. However, accurate prediction methods are lacking and urgently needed. Therefore, we proposed a prediction method, "iterb-PPse", for terminators by incorporating 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and utilizing Extreme Gradient Boosting to predict terminators based on Escherichia coli and Bacillus subtilis. Combining these with the preceding methods, we employed three new feature extraction methods, K-pwm, Base-content, and Nucleotidepro, to formulate raw samples. The two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.88%, higher than that of the existing state-of-the-art predictor iTerm-PseKNC in 100 rounds of five-fold cross-validation. Its prediction accuracy on two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed a software tool based on "iterb-PPse" with the same name. The open software and source code of "iterb-PPse" are available at https://github.com/Sarahyouzi/iterb-PPse.


Subject(s)
Sequence Analysis, DNA/methods , Software , Terminator Regions, Genetic , Bacillus subtilis , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Escherichia coli , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rho Factor/metabolism , Transcription Termination, Genetic
17.
RSC Adv ; 10(20): 11634-11642, 2020 Mar 19.
Article in English | MEDLINE | ID: mdl-35496629

ABSTRACT

LncRNA and miRNA are two non-coding RNA types that are popular in current research. LncRNA interacts with miRNA to regulate gene transcription, further affecting human health and disease. Accurate identification of lncRNA-miRNA interactions contributes to the in-depth study of the biological functions and mechanisms of non-coding RNA. However, relying on biological experiments to obtain interaction information is time-consuming and expensive. Considering the rapid accumulation of gene information and the scarcity of computational methods, effective computational models for predicting lncRNA-miRNA interactions are urgently needed. In this work, we propose a heterogeneous graph inference method based on similarity network fusion (SNFHGILMI) to predict potential lncRNA-miRNA interactions. First, we calculated multiple similarity data, including lncRNA sequence similarity, miRNA sequence similarity, lncRNA Gaussian kernel similarity, and miRNA Gaussian kernel similarity. Second, the similarity network fusion method was employed to integrate the data and obtain the similarity networks of lncRNA and miRNA. Then, we constructed a bipartite network by combining the known interaction network and the similarity networks of lncRNA and miRNA. Finally, the heterogeneous graph inference method was introduced to construct a prediction model. On the real dataset, the model SNFHGILMI achieved AUC values of 0.9501 and 0.9426 ± 0.0035 based on LOOCV and 5-fold cross validation, respectively. Furthermore, case studies also demonstrate that SNFHGILMI is a high-performance prediction method that can accurately predict new lncRNA-miRNA interactions. The Matlab code and readme file of SNFHGILMI can be downloaded from https://github.com/cj-DaSE/SNFHGILMI.

18.
AMB Express ; 9(1): 94, 2019 Jun 28.
Article in English | MEDLINE | ID: mdl-31254161

ABSTRACT

α-Arbutin is an effective skin-whitening cosmetic ingredient and hyperpigmentation therapy agent. It can be synthesized by one-step enzymatic glycosylation of hydroquinone (HQ), but the process is limited by low yield. Amylosucrase (Amy-1) from Xanthomonas campestris pv. campestris 8004 was recently identified as having high HQ glycosylation activity. In this study, whole-cell transformation by Amy-1 was optimized and process scale-up was evaluated in a 5000-L reactor. In comparison with purified Amy-1, the whole-cell catalyst of recombinant E. coli displays better tolerance to inhibitors (oxidized products of HQ) and requires a lower molar ratio of sucrose to HQ to reach a high conversion rate (> 99%). Excess accumulation of glucose (0.6-1.0 M) derived from sucrose hydrolysis reduces the HQ glycosylation rate by 46-60%, which suggests the importance of balancing the HQ glycosylation rate and the sucrose hydrolysis rate by adjusting the activity of the whole-cell catalyst and the HQ feed rate. Under optimal conditions, a final concentration of 540 mM and a molar conversion rate of 95% were obtained within 13-18 h at laboratory scale. For industrial scale-up production, final concentrations of 398 mM and 375 mM with high conversion rates (~ 95%) were obtained in reaction volumes of 3500 L and 4000 L, respectively. To the best of our knowledge, these yields and productivities (4.5-4.9 kg kL-1 h-1) are the highest reported. Hence, high-yield production of α-arbutin by batch-feeding whole-cell biotransformation was successfully achieved at the 5000-L reaction scale.

19.
J Ind Microbiol Biotechnol ; 46(6): 759-767, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30820723

ABSTRACT

α-Arbutin is an effective skin-whitening cosmetic ingredient and can be synthesized through hydroquinone glycosylation. In this study, amylosucrase (Amy-1) from Xanthomonas campestris pv. campestris 8004 was newly identified as a sucrose-utilizing hydroquinone-glycosylating enzyme. Its kinetic parameters showed a sevenfold higher affinity for hydroquinone than maltose-utilizing α-glycosidase. The glycosylation of HQ can be quickly achieved with over 99% conversion when a high molar ratio of glycoside donor to acceptor (80:1) is used. A batch-feeding catalysis method was designed to eliminate HQ inhibition with high productivity (> 36.4 mM h-1). In addition, to eliminate the serious inhibition caused by accumulated hydroquinone oxidation products, whole-cell catalysis was further proposed. A final α-arbutin concentration of 306 mM was achieved with a 95% molar conversion rate within 15 h. Hence, batch-feeding whole-cell biocatalysis by Amy-1 is a promising technology for α-arbutin production with enhanced yield and molar conversion rate.


Subject(s)
Arbutin/biosynthesis , Glucosyltransferases/metabolism , Hydroquinones/metabolism , Xanthomonas campestris/metabolism , Biocatalysis , Cosmetics , Glycosylation , Oxidation-Reduction
20.
Methods Mol Biol ; 1915: 111-120, 2019.
Article in English | MEDLINE | ID: mdl-30617800

ABSTRACT

Calpains are a family of Ca2+-dependent cysteine proteases involved in many important biological processes, where they selectively cleave relevant substrates at specific cleavage sites to regulate the function of the substrate proteins. Presently, our knowledge about the function of calpains and the mechanism of substrate cleavage is still limited because the experimental determination and validation of calpain binding is usually laborious and expensive. This chapter describes LabCaS, an algorithm designed for predicting calpain substrate cleavage sites from amino acid sequences. LabCaS is built on a conditional random field (CRF) statistical model, which trains the cleavage site prediction on multiple features: amino acid residue preference, solvent accessibility information, pair-wise alignment similarity score, secondary structure propensity, and physicochemical properties. Large-scale benchmark tests have shown that LabCaS can achieve reliable recognition of the cleavage sites for most calpain proteins, with an average AUC score of 0.862. Owing to its speed and ease of use, the protocol should prove useful in large-scale calpain-based function annotation of newly sequenced proteins. The online web server of LabCaS is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LabCaS.


Subject(s)
Amino Acid Sequence/genetics , Calpain/chemistry , Models, Statistical , Molecular Biology/methods , Algorithms , Binding Sites , Calpain/genetics , Proteolysis , Substrate Specificity
...