Results 1 - 20 of 31
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36528809

ABSTRACT

MOTIVATION: Exploring potential long noncoding RNA (lncRNA)-disease associations (LDAs) plays a critical role in understanding disease etiology and pathogenesis. Given the high cost of biological experiments, developing a computational method is a practical necessity to effectively accelerate the experimental screening of candidate LDAs. However, given the high sparsity of LDA datasets, many computational models can hardly exploit enough knowledge to learn comprehensive patterns of node representations. Moreover, although the metapath-based GNN has recently been introduced into LDA prediction, it discards intermediate nodes along the meta-path, which results in information loss. RESULTS: This paper presents a new multi-view contrastive heterogeneous graph attention network (GAT) for lncRNA-disease association prediction, MCHNLDA for brevity. Specifically, MCHNLDA first leverages rich biological data sources on lncRNAs, genes and diseases to construct graphs for two views: a feature structural graph for the feature schema view and an lncRNA-gene-disease heterogeneous graph for the network topology view. Then, we design a cross-contrastive learning task to collaboratively guide the graph embeddings of the two views without relying on any labels. In this way, we can pull nodes with similar features and network topology closer together and push other nodes away. Furthermore, we propose a heterogeneous contextual GAT, in which a long short-term memory network is incorporated into the attention mechanism to effectively capture sequential structure information along the meta-path. Extensive experimental comparisons against several state-of-the-art methods show the effectiveness of the proposed framework. The code and data of the proposed framework are freely available at https://github.com/zhaoxs686/MCHNLDA.


Subject(s)
Long Noncoding RNA , Long Noncoding RNA/genetics , Learning
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36907654

ABSTRACT

In recent years, many experiments have proved that microRNAs (miRNAs) play a variety of important regulatory roles in cells, and their abnormal expression can lead to the emergence of specific diseases. Therefore, it is greatly valuable to do research on the association between miRNAs and diseases, which can effectively help prevent and treat miRNA-related diseases. At present, effective computational methods still need to be developed to better identify potential miRNA-disease associations. Inspired by graph convolutional networks, in this study, we propose a new method based on Attention aware Multi-view similarity networks and Hypergraph learning for MiRNA-Disease Associations identification (AMHMDA). First, we construct multiple similarity networks for miRNAs and diseases, and exploit the graph convolutional networks fusion attention mechanism to obtain the important information from different views. Then, in order to obtain high-quality links and richer nodes information, we introduce a kind of virtual nodes called hypernodes to construct heterogeneous hypergraph of miRNAs and diseases. Finally, we employ the attention mechanism to fuse the outputs of graph convolutional networks, predicting miRNA-disease associations. To verify the effectiveness of this method, we carry out a series of experiments on the Human MicroRNA Disease Database (HMDD v3.2). The experimental results show that AMHMDA has good performance compared with other methods. In addition, the case study results also fully demonstrate the reliable predictive performance of AMHMDA.


Subject(s)
MicroRNAs , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Genetic Predisposition to Disease , Algorithms , Computational Biology/methods , Genetic Databases
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34585231

ABSTRACT

MOTIVATION: Discovering long noncoding RNA (lncRNA)-disease associations is a fundamental and critical part of understanding disease etiology and pathogenesis. However, only a few lncRNA-disease associations have been identified because biological experiments are time-consuming and expensive. As a result, an efficient computational method for identifying potential lncRNA-disease associations is of great importance and urgently needed. With their ability to exploit node features and relationships in a network, graph-based learning models have been commonly used for such biomolecular association predictions. However, the capability of these methods to comprehensively fuse node features, heterogeneous topological structures and semantic information is far from optimal or even satisfactory. Moreover, there are still limitations in modeling complex associations between lncRNAs and diseases. RESULTS: In this paper, we develop a novel heterogeneous graph attention network framework based on meta-paths for predicting lncRNA-disease associations, denoted as HGATLDA. First, we construct a heterogeneous network by incorporating lncRNA and disease feature structural graphs and an lncRNA-disease topological structural graph. Then, for the heterogeneous graph, we construct multiple metapath-based subgraphs and use a graph attention network to learn node embeddings from neighbors in these homogeneous and heterogeneous subgraphs. Next, we apply an attention mechanism to adaptively assign weights to the multiple metapath-based subgraphs and obtain more semantic information. In addition, we incorporate neural inductive matrix completion to reconstruct lncRNA-disease associations, which captures complicated associations between lncRNAs and diseases. Moreover, we incorporate a cost-sensitive neural network into the loss function to tackle the common imbalance problem in lncRNA-disease association prediction. Finally, extensive experimental results demonstrate the effectiveness of our proposed framework.


Subject(s)
Long Noncoding RNA , Computational Biology/methods , Computer Neural Networks , Long Noncoding RNA/genetics
4.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35224614

ABSTRACT

Accurate identification of drug-target interactions (DTIs) plays a crucial role in drug discovery. Compared with traditional experimental methods, which are labor-intensive and time-consuming, computational methods have become increasingly popular in recent years. Conventional computational methods mostly just use heterogeneous networks that integrate diverse drug-related and target-related datasets, rather than fully exploring drug and target similarities. In this paper, we propose a new method, named DTIHNC, for Drug-Target Interaction identification, which integrates Heterogeneous Networks and Cross-modal similarities calculated from relations between drugs, proteins, diseases and side effects. First, low-dimensional features of drugs, proteins, diseases and side effects are obtained from the original features by a denoising autoencoder. Then, we construct a heterogeneous network across drug, protein, disease and side-effect nodes. In the heterogeneous network, we use heterogeneous graph attention operations to update the embedding of a node based on information from its 1-hop neighbors, and for multi-hop neighbor information, we propose a random walk with restart aware graph attention to integrate more information from a larger neighborhood region. Next, we calculate cross-modal drug and protein similarities from cross-scale relations between drugs, proteins, diseases and side effects. Finally, a multi-layer convolutional neural network deeply integrates the similarity information of drugs and proteins with the embedding features obtained from the heterogeneous graph attention network. Experiments demonstrate its effectiveness and better performance than state-of-the-art methods. Datasets and a stand-alone package are provided on GitHub at https://github.com/ningq669/DTIHNC.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Computer Neural Networks , Drug Discovery , Drug Interactions , Humans , Proteins/metabolism
5.
J Theor Biol ; 467: 39-47, 2019 04 21.
Article in English | MEDLINE | ID: mdl-30711452

ABSTRACT

N6-methyladenosine (m6A) is one of the most important RNA modifications, playing roles ranging from splicing events, mRNA export and stability to cell differentiation. Because m6A is widely distributed in genes, identifying m6A sites in RNA sequences is of significant importance for basic biomedical research and drug development. High-throughput laboratory methods are time-consuming and costly. Effective computational methods are therefore highly desirable because of their convenience and speed. Thus, in this article, we propose a new method to improve the performance of m6A prediction by combining deep features and original features with extreme gradient boosting optimized by particle swarm optimization (PXGB). The proposed PXGB algorithm uses three kinds of features, i.e., position-specific nucleotide propensity (PSNP), position-specific dinucleotide propensity (PSDP), and the traditional nucleotide composition (NC). Under 10-fold cross-validation, PXGB achieved an AUC of 0.8390 and an MCC of 0.5234. Additionally, PXGB was compared with existing methods, and its higher MCC and AUC demonstrated that PXGB is effective at predicting m6A sites. The predictor proposed in this study might help to predict more m6A sites and guide related experimental validation.


Subject(s)
Adenosine/analogs & derivatives , Base Sequence/genetics , Computational Biology/methods , Adenosine/analysis , Algorithms , Animals , Area Under Curve , Humans
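
Of the three feature types named in the abstract above, nucleotide composition (NC) is the simplest to illustrate. The following is a minimal sketch, not the PXGB implementation: the function name, the `ACGU` alphabet order, and the example window are assumptions made for illustration only.

```python
from collections import Counter


def nucleotide_composition(seq, alphabet="ACGU"):
    """Return the fraction of each nucleotide in `seq`, in `alphabet` order.

    Hypothetical helper: the abstract names NC as a feature but does not
    specify its encoding, so this is only one plausible form.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    counts = Counter(seq)
    total = sum(counts[base] for base in alphabet)
    if total == 0:
        raise ValueError("sequence contains no recognised nucleotides")
    return [counts[base] / total for base in alphabet]


# Made-up 5-nt window, used only to show the call.
print(nucleotide_composition("GGACU"))
```

In a pipeline like the one described, such a vector would be concatenated with the position-specific features before being passed to the classifier.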
6.
Molecules ; 22(11)2017 Nov 03.
Article in English | MEDLINE | ID: mdl-29099805

ABSTRACT

Glycation is a non-enzymatic process occurring inside or outside the host body by attaching a sugar molecule to a protein or lipid molecule. It is an important form of post-translational modification (PTM), which impairs the function and changes the characteristics of the proteins so that the identification of the glycation sites may provide some useful guidelines to understand various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. Firstly, we used multiple informative features to encode the peptides. These features included the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Secondly, the distribution of distinctive features of the residues surrounding the glycation and non-glycation sites was statistically analysed. Thirdly, based on the distribution of these features, we developed a new predictor by using different optimal window sizes for different properties and a two-step feature selection method, which utilized the maximum relevance minimum redundancy method followed by a greedy feature selection procedure. The performance of Glypre was measured with a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews's correlation coefficient (MCC) of 0.52 by 10-fold cross-validation. The detailed analysis results showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of the Glypre are available in the Supplementary File.


Subject(s)
Amino Acids/chemistry , Computer Simulation , Proteins/chemistry , Support Vector Machine , Algorithms , Area Under Curve , Binding Sites , Glycosylation , Lysine/chemistry , ROC Curve , Sensitivity and Specificity
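
The abstract above reports sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). As a reminder of how these relate to a confusion matrix, here is a small sketch using the standard definitions. The counts in the example are invented and do not come from the Glypre study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is empty.
        return num / den if den else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, denom),
    }


# Invented counts, for illustration only.
for name, value in classification_metrics(tp=50, fp=10, tn=90, fn=50).items():
    print(f"{name}: {value:.4f}")
```

MCC stays informative when positives and negatives are imbalanced, which is one reason it is often reported alongside accuracy for site-prediction tasks.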
7.
Sci Total Environ ; 918: 170841, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38340841

ABSTRACT

The ecological effects of climate change and ocean acidification (OA) have been extensively studied. Various microalgae are ecologically important in the overall pelagic food web as key contributors to oceanic primary productivity. Additionally, no organism exists in isolation in a complex environment, and shifts in food quality may lead to indirect OA effects on consumers. This study aims to investigate the potential effects of OA on algal trophic composition and subsequent bivalve growth. Here, the growth and nutrient fractions of Chlorella sp., Phaeodactylum tricornutum and Chaetoceros muelleri were used to synthesize and assess the impact of OA on primary productivity. Total protein content, total phenolic compounds, and amino acid (AA) and fatty acid (FA) content were evaluated as nutritional indicators. The results demonstrated that the three microalgae responded positively to OA in the future environment, significantly enhancing growth performance and nutritional value as a food source. Additionally, certain macromolecular fractions found in consumers are closely linked to their dietary sources, such as phenylalanine, C14:0, C16:0, C16:1, C20:1n9, C18:0, and C18:3n. Our findings illustrate that OA affects a wide range of crucial primary producers in the oceans, which can disrupt nutrient delivery and have profound impacts on the entire marine ecosystem and human food health.


Subject(s)
Chlorella , Microalgae , Humans , Ecosystem , Hydrogen-Ion Concentration , Nutritive Value , Ocean Acidification , Oceans and Seas , Seawater/chemistry , Shellfish , Animals
8.
Mar Pollut Bull ; 205: 116658, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38964192

ABSTRACT

Offshore coastal marine ranching ecosystems provide habitat for diverse and active bacterial communities. In this study, 16S rRNA gene sequencing and multiple bioinformatics methods were applied to investigate assembly dynamics and relationships in different habitats. The co-occurrence network of water had a higher number of edges, a more balanced ratio of positive and negative links, and more keystone species. Stochastic processes dominated in shaping gut and sediment community assembly (R2 < 0.5), while water bacterial community assembly was dominated by deterministic processes (R2 > 0.5). The dissimilarity-overlap curve model indicated that the communities in different habitats have general dynamics and interspecific interaction (P < 0.001). Bacterial source-tracking analysis revealed that the gut bacterial communities were more similar to those of the sediment than to those of the water. In summary, this study provides basic data for the ecological study of marine ranching through the study of bacterial community dynamics.

9.
RSC Adv ; 13(37): 25888-25894, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37655352

ABSTRACT

Deep eutectic solvents (DESs) have been extensively studied as promising green solvents for improving the removal efficiency of sulfides. A new ternary DES system formed from choline chloride (ChCl), benzene sulfonic acid (BSA), and ethylene glycol (EG) was prepared and used in the oxidative desulfurization (ODS) of different sulfides. Ternary DESs have distinct advantages such as volatility and high activity compared with organic acid-based binary DESs. Under the optimum conditions with VDES/VOil = 1 : 5, O/S (molar ratio of oxygen to sulfur) = 5, and T = 25 °C, the desulfurization efficiencies of dibenzothiophene (DBT), 4,6-dimethyldibenzothiophene (4,6-DMDBT), and benzothiophene (BT) all reached 100% in 2 h. Experiments and density functional theory (DFT) calculations show that this new ternary DES system has good stability and excellent desulfurization performance at room temperature. This study offers a new approach to using ternary DESs for oxidative desulfurization.

10.
Mar Pollut Bull ; 197: 115739, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37925991

ABSTRACT

Offshore coastal marine ranching ecosystems are among the most productive ecosystems. The results showed that the composition and structure of the microbial communities varied considerably with the season. Co-occurrence network analysis demonstrated that the microbial network was more complex in summer and that positively correlated links (cooperative or symbiotic) dominated in autumn and winter. The null model indicated that the ecological processes of the bacterial communities were mainly governed by deterministic processes (mainly homogeneous selection) in summer. For microeukaryotic communities, assembly was regulated more by stochastic processes in all seasons. For rare taxa, assembly was regulated by stochastic processes and was not affected by seasonality. Changes in water temperature due to seasonal variation were the main, but not the only, environmental factor driving changes in microbial communities. This study will improve the understanding of offshore coastal ecosystems from the perspective of microbial ecology.


Subject(s)
Microbiota , Seasons , Temperature , Microbial Consortia , Bacteria
11.
Nat Commun ; 14(1): 7554, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37985761

ABSTRACT

Lunar surface chemistry is essential for revealing petrological characteristics to understand the evolution of the Moon. Existing chemistry mapping from Apollo and Luna returned samples could only calibrate chemical features before 3.0 Gyr, missing the critical late period of the Moon. Here we present major oxides chemistry maps by adding distinctive 2.0 Gyr Chang'e-5 lunar soil samples in combination with a deep learning-based inversion model. The inferred chemical contents are more precise than the Lunar Prospector Gamma-Ray Spectrometer (GRS) maps and are closest to returned samples abundances compared to existing literature. The verification of in situ measurement data acquired by Chang'e 3 and Chang'e 4 lunar rover demonstrated that Chang'e-5 samples are indispensable ground truth in mapping lunar surface chemistry. From these maps, young mare basalt units are determined which can be potential sites in future sample return mission to constrain the late lunar magmatic and thermal history.

12.
Int J Mol Sci ; 13(2): 2196-2207, 2012.
Article in English | MEDLINE | ID: mdl-22408447

ABSTRACT

Antifreeze proteins (AFPs) are ice-binding proteins. Accurate identification of new AFPs is important in understanding ice-protein interactions and creating novel ice-binding domains in other proteins. In this paper, an accurate method, called AFP_PSSM, has been developed for predicting antifreeze proteins using a support vector machine (SVM) and position specific scoring matrix (PSSM) profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been successfully used for predicting antifreeze proteins. Tested by 10-fold cross validation and independent test, the accuracy of the proposed method reaches 82.67% for the training dataset and 93.01% for the testing dataset, respectively. These results indicate that our predictor is a useful tool for predicting antifreeze proteins. A web server (AFP_PSSM) that implements the proposed predictor is freely available.


Subject(s)
Antifreeze Proteins , Computational Biology/methods , Molecular Evolution , Support Vector Machine , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Antifreeze Proteins/chemistry , Antifreeze Proteins/genetics , Protein Databases , User-Computer Interface
13.
Int J Mol Sci ; 13(3): 3650-3660, 2012.
Article in English | MEDLINE | ID: mdl-22489173

ABSTRACT

Bioluminescent proteins are important for various cellular processes, such as gene expression analysis, drug discovery, bioluminescent imaging, toxicity determination, and DNA sequencing studies. Hence, the correct identification of bioluminescent proteins is of great importance both for helping genome annotation and providing a supplementary role to experimental research to obtain insight into bioluminescent proteins' functions. However, few computational methods are available for identifying bioluminescent proteins. Therefore, in this paper we develop a new method to predict bioluminescent proteins using a model based on position specific scoring matrix and auto covariance. Tested by 10-fold cross-validation and independent test, the accuracy of the proposed model reaches 85.17% for the training dataset and 90.71% for the testing dataset respectively. These results indicate that our predictor is a useful tool to predict bioluminescent proteins. This is the first study in which evolutionary information and local sequence environment information have been successfully integrated for predicting bioluminescent proteins. A web server (BLPre) that implements the proposed predictor is freely available.


Subject(s)
Amino Acid Sequence , Computational Biology/methods , Luminescent Proteins/chemistry , Luminescent Proteins/isolation & purification , Animals , Biological Evolution , Statistical Data Interpretation , Protein Databases , Forecasting , Theoretical Models , Protein Sequence Analysis , Support Vector Machine
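
The abstract above combines a position specific scoring matrix (PSSM) with auto covariance (AC). The sketch below uses the commonly cited AC definition: for each PSSM column and each lag, average the product of mean-centred scores at positions `i` and `i + lag`. It is an assumed formulation for illustration; the abstract does not give the exact transform, lag range, or normalisation used by BLPre.

```python
def auto_covariance(pssm, max_lag):
    """Turn an L x C score matrix into a fixed-length AC feature vector.

    `pssm` is a list of L rows (one per residue) with C scores each.
    Returns max_lag * C values, ordered by lag and then by column.
    """
    length = len(pssm)
    if length == 0 or max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy single-column "PSSM"; a real profile would have 20 columns.
print(auto_covariance([[1.0], [3.0], [1.0], [3.0]], max_lag=2))
```

The useful property is that proteins of different lengths map to vectors of the same size, which is what a support vector machine needs as input.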
14.
Article in English | MEDLINE | ID: mdl-32750881

ABSTRACT

Protein succinylation is a type of post-translational modification (PTM) that occurs on lysine sites and plays a key role in protein conformation regulation and cellular function control. When training in computational method, it is difficult to designate negative samples because of the uncertainty of non-succinylation lysine sites, and if not handled properly, it may affect the performance of computational models dramatically. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide data into 5 clusters. Besides, information of proximal PTMs and three kinds of sequence features (grey pseudo amino acid composition, K-space and position-special amino acid propensity) are utilized to formulate protein. Then, we perform a two-step feature selection to remove redundant features and construct the optimization model for each cluster. Finally, support vector machine is applied to construct a prediction model for each cluster. Promising results are obtained by this method with an accuracy of 80.18 percent for succinylation sites on the independent testing dataset. Meanwhile, we compare the result with other existing tools, and it shows that our method is promising for predicting succinylation sites. Through analysis, we further verify that succinylated protein has potential effects on amino acid degradation and fatty acid metabolism, and speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available on the web https://github.com/yangyq505/SSKM_Succ.git.


Subject(s)
Algorithms , Proteins , Cluster Analysis , Lysine , Proteins/genetics , Supervised Machine Learning
15.
Math Biosci Eng ; 19(3): 3202-3222, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35240827

ABSTRACT

The combinatorial auction is an important type of market mechanism that helps bidders bid on combinations of items more efficiently. The winner determination problem (WDP) is one of the most challenging research topics in combinatorial auctions and has been proven to be NP-hard. It has attracted increasing attention from researchers in recent years and has a wide range of real-world applications. To solve the winner determination problem effectively, this paper proposes a hybrid ant colony algorithm called DHS-ACO, which combines an effective local search for exploitation and an ant colony algorithm for exploration, with two effective strategies. One is a hash tabu search strategy adopted to reduce the cycling problem in the local search procedure. The other is a deep scoring strategy, which is introduced to consider the profound effects of the local operators. The experimental results on a broad range of benchmarks show that DHS-ACO outperforms the existing algorithms.


Subject(s)
Algorithms
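
To make the winner determination problem in the abstract above concrete, here is a tiny exhaustive solver. It is deliberately not DHS-ACO: it enumerates every compatible subset of bids, which is only feasible for a handful of bids and is exactly the exponential cost that heuristics such as ant colony optimization try to avoid. The bid data and function name are invented for illustration.

```python
def solve_wdp(bids):
    """Exhaustively pick non-overlapping bids with maximum total price.

    `bids` is a list of (frozenset_of_items, price) pairs.
    Returns (best_total_price, indices_of_winning_bids).
    """
    best_value, best_set = 0, []

    def search(index, used_items, value, chosen):
        nonlocal best_value, best_set
        if index == len(bids):
            if value > best_value:
                best_value, best_set = value, list(chosen)
            return
        items, price = bids[index]
        # Branch 1: reject this bid.
        search(index + 1, used_items, value, chosen)
        # Branch 2: accept it if none of its items is already sold.
        if not (items & used_items):
            chosen.append(index)
            search(index + 1, used_items | items, value + price, chosen)
            chosen.pop()

    search(0, frozenset(), 0, [])
    return best_value, best_set


# Invented example: four bids over items a, b, c.
example_bids = [
    (frozenset({"a", "b"}), 10),
    (frozenset({"b", "c"}), 8),
    (frozenset({"c"}), 5),
    (frozenset({"a"}), 4),
]
print(solve_wdp(example_bids))
```

Each additional bid doubles the search tree in the worst case, which is consistent with the NP-hardness noted in the abstract.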
16.
Chemosphere ; 291(Pt 1): 132703, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34718024

ABSTRACT

Microbial electrolysis cells (MECs) are widely considered promising alternatives for degrading antibiotics. As one of the major operating parameters in MECs, voltage might affect the spread of antibiotic resistance genes (ARGs), given that it can affect the physiological characteristics of bacteria. However, little is known about the impacts of voltage on the acceleration of bacterial mutation and the promotion of ARG dissemination via horizontal transfer in MECs. In this study, two voltages (0.9 V and 1.5 V) were applied to determine whether electrical stimulation could increase bacterial mutation frequency. Three voltages (0.9 V, 1.5 V, and 2.5 V) were used to evaluate the conjugative transfer frequency of plasmid-encoded ARGs from the donor (E. coli K-12) to the recipient (E. coli HB101) in MECs. After repeated subculture in MECs for 10 days, the mutation frequency of E. coli K-12 increased; consequently, the generated mutants became more resistant to tetracycline. When the voltage was higher than 0.9 V, conjugative ARG transfer frequency was significantly increased in the anode chamber (p < 0.05). The over-production of reactive oxygen species (ROS) (voltage >0.9 V) and cell membrane permeability (voltage >1.5 V) were significantly enhanced under electrical stimulation (p < 0.05). Genome-wide RNA sequencing indicated that the expression of genes related to oxidative stress and the cell membrane was upregulated upon exposure to electrical stimulation. Electrical stimulation induced oxidative reactions, which triggered ROS over-production, the SOS response, and enhanced cell membrane permeability in both donor and recipient in the MECs. These findings provide insights into the potential role of voltage in the generation and spread of ARGs in MECs.


Subject(s)
Anti-Bacterial Agents , Horizontal Gene Transfer , Anti-Bacterial Agents/pharmacology , Microbial Drug Resistance , Electrolysis , Escherichia coli , Bacterial Genes , Mutation
17.
Int J Mol Sci ; 12(12): 8347-61, 2011.
Article in English | MEDLINE | ID: mdl-22272076

ABSTRACT

Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.


Subject(s)
Datasets as Topic , Lysine/metabolism , Proteins/chemistry , Support Vector Machine , Ubiquitination , Amino Acid Motifs , Animals , Humans , Proteins/metabolism
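
The ensemble step in the abstract above aggregates several random forest classifiers "into a consensus classifier by majority voting". The aggregation itself can be sketched in a few lines. The per-classifier predictions below are invented, and the training of the individual forests is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one consensus label list.

    `predictions[k][i]` is the label classifier k assigns to sample i.
    With an odd number of binary classifiers there are no ties; otherwise
    Counter.most_common breaks ties by first occurrence.
    """
    consensus = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


# Invented outputs of three classifiers for three candidate lysine sites
# (1 = ubiquitylated, 0 = not).
votes = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
print(majority_vote(votes))
```

Voting over classifiers trained on different feature subsets tends to reduce variance, which matches the motivation given in the abstract.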
18.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2577-2585, 2021.
Article in English | MEDLINE | ID: mdl-32086216

ABSTRACT

In the last few years, accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) participate in the regulation of target gene expression and play an important role in biological processes and human disease development. Thus, predicting associations between lncRNAs and diseases has become a hot research topic in the field of complex human diseases. Most existing methods consider the information of two networks (lncRNA and disease) while neglecting other networks. In this study, we designed a multi-layer network by integrating the similarity networks of lncRNAs, diseases and genes with the known lncRNA-disease, lncRNA-gene, and disease-gene association networks, and then developed a model called MHRWR for predicting potential lncRNA-disease associations based on random walk with restart. The performance of MHRWR was evaluated on experimentally verified lncRNA-disease associations using leave-one-out cross-validation. MHRWR obtained a reliable AUC value of 0.91344, which significantly outperformed some previous methods. To further validate the reproducibility of its performance, we used MHRWR to verify lncRNAs related to colon cancer, colorectal cancer and lung adenocarcinoma in case studies. The code of MHRWR is available at: https://github.com/yangyq505/MHRWR.


Subject(s)
Genetic Predisposition to Disease/genetics , Genetic Models , Neoplasms , Long Noncoding RNA , Algorithms , Computational Biology/methods , Humans , Neoplasms/genetics , Neoplasms/metabolism , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Transcriptome/genetics
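
Random walk with restart, the core of the MHRWR model described above, iterates p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalised adjacency matrix and p(0) puts all probability on the seed nodes. The sketch below runs it on a single small network. MHRWR itself operates on a multi-layer lncRNA-disease-gene network, so this is only the basic building block, and the restart probability of 0.7 is an assumed value.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker on `adj`.

    `adj` is a square list-of-lists adjacency matrix, `seeds` a list of
    node indices the walker restarts from.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalise so each column of W sums to 1 (or 0 if isolated).
    w = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph 0 - 1 - 2 with node 0 as the seed.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(path, seeds=[0])
print([round(s, 4) for s in scores])
```

Nodes closer to the seed receive higher scores, and ranking non-seed nodes by score is what turns the walk into an association predictor.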
19.
J Cheminform ; 12(1): 37, 2020 May 27.
Article in English | MEDLINE | ID: mdl-33430966

ABSTRACT

Protein-ligand docking is an important approach for virtual screening and protein function annotation. Although many docking methods have been developed, most require a high-resolution crystal structure of the receptor and a user-specified binding site to start. This information is, however, not available for the majority of unknown proteins, including many pharmaceutically important targets. Developing blind docking methods without predefined binding sites and working with low-resolution receptor models from protein structure prediction is thus essential. In this manuscript, we propose a novel Monte Carlo based method, EDock, for blind protein-ligand docking. For a given protein, binding sites are first predicted by sequence-profile and substructure-based comparison searches with initial ligand poses generated by graph matching. Next, replica-exchange Monte Carlo (REMC) simulations are performed for ligand conformation refinement under the guidance of a physical force field coupled with binding-site distance constraints. The method was tested on two large-scale datasets containing 535 protein-ligand pairs. Without specifying binding pockets on the experimental receptor structures, EDock achieves on average a ligand RMSD of 2.03 Å, which compares favorably with state-of-the-art docking methods including DOCK6 (2.68 Å) and AutoDock Vina (3.92 Å). When starting with predicted models from I-TASSER, EDock still generates reasonable docking models, with a success rate 159% and 67% higher than DOCK6 and AutoDock Vina, respectively. Detailed data analyses show that the major advantage of EDock lies in reliable ligand binding site predictions and extensive REMC sampling, which allows for the implementation of multiple van der Waals weightings to accommodate different levels of steric clashes and cavity distortions and therefore enhances the robustness of low-resolution docking with predicted protein structures.
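
The docking accuracies quoted in the abstract above are ligand RMSD values in Å. A minimal sketch of that metric follows, under two stated assumptions: predicted and reference atoms are listed in the same order, and both poses are already in the receptor's coordinate frame, so no superposition is applied. Symmetry-corrected RMSD, which some benchmarks use, is not handled. The coordinates are made up.

```python
import math


def ligand_rmsd(predicted, reference):
    """Root-mean-square deviation between two matched lists of 3D points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum(
        (a - b) ** 2
        for p_atom, r_atom in zip(predicted, reference)
        for a, b in zip(p_atom, r_atom)
    )
    return math.sqrt(squared / len(predicted))


# Made-up two-atom ligand; the predicted pose is shifted 2 Å along z.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ligand_rmsd(predicted_pose, reference_pose))
```

A pose is often counted as successful when this value is below a chosen cutoff, although the abstract does not state which cutoff underlies the reported success rates.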

20.
Gene ; 658: 54-62, 2018 Jun 05.
Article in English | MEDLINE | ID: mdl-29524581

ABSTRACT

Crassadoma gigantea is an important commercial marine bivalve species in Baja California and Mexico. In this study, we applied RNA-Seq technology to profile the transcriptome of C. gigantea for the first time. A total of 80,832,518 raw reads were produced on an Illumina HiSeq4000 platform, and 77,306,198 (95.64%) clean reads were generated after trimming the adaptor sequences. The transcriptome was assembled into 158,855 transcripts with an N50 size of 1995 bp and an average size of 1008 bp. A number of DNA repair-related genes, such as MSH3, and genes encoding different groups of growth factors, such as EGF, TGF, IGF and FGF, were found in the transcriptome data of C. gigantea. In addition, immune-related Toll-like receptor (TLR) genes, including TLR1, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, and TLR9, were also observed in C. gigantea. A set of 12 polymorphic microsatellite loci was developed and characterized in C. gigantea for the first time. The results show that the number of alleles and expected heterozygosity ranged from 3 to 9 and from 0.254 to 0.820, respectively. The average polymorphic information content was 0.790. These microsatellite loci will facilitate future studies of population structure and conservation genetics in this species.


Subject(s)
Genetic Markers , Molecular Sequence Annotation , Pectinidae/genetics , DNA Sequence Analysis/methods , Transcriptome , Animals , Bivalvia/genetics , Bivalvia/metabolism , Genetic Databases , Gene Expression Profiling , Gene Ontology , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Pectinidae/metabolism
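
Two summary statistics in the abstract above have short standard definitions: the assembly N50 and the expected heterozygosity of a locus. The sketch below shows both in their basic textbook forms. The transcript lengths and allele frequencies are invented, and the study may have used a sample-size-corrected heterozygosity estimator that is not reproduced here.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def expected_heterozygosity(allele_freqs):
    """Basic form: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in allele_freqs)


# Invented values, for illustration only.
print(n50([2, 3, 4, 5, 6]))
print(expected_heterozygosity([0.5, 0.5]))
```

N50 is weighted toward long transcripts, which is why it (1995 bp in the abstract) is typically larger than the mean transcript length (1008 bp).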