1.
Nucleic Acids Res ; 52(D1): D98-D106, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953349

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as crucial regulators across diverse biological processes and diseases. While high-throughput sequencing has enabled lncRNA discovery, functional characterization remains limited. The EVLncRNAs database is the first repository dedicated exclusively to experimentally validated functional lncRNAs from various species. Following previous releases in 2018 and 2021, this update marks a major expansion through exhaustive manual curation of nearly 25 000 publications from 15 May 2020 to 15 May 2023. It incorporates substantial growth across all categories: a 154% increase in functional lncRNAs, 160% in associated diseases, 186% in lncRNA-disease associations, 235% in interactions, 138% in structures, 234% in circular RNAs, 235% in resistant lncRNAs and 4724% in exosomal lncRNAs. More importantly, it incorporates additional information, including functional classifications, detailed interaction pathways, homologous lncRNAs, lncRNA locations, and COVID-19-, phase-separation- and organoid-related lncRNAs. The web interface was substantially improved for browsing, visualization and searching. ChatGPT was tested for information extraction and functional overview, and its limitations were noted. EVLncRNAs 3.0 represents the most extensive curated resource of experimentally validated functional lncRNAs and will serve as an indispensable platform for unravelling emerging lncRNA functions. The updated database is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs3/.


Subjects
Nucleic Acid Databases, Long Noncoding RNA, Data Management, Information Storage and Retrieval, Long Noncoding RNA/genetics
2.
Methods ; 210: 10-19, 2023 02.
Article in English | MEDLINE | ID: mdl-36621557

ABSTRACT

Proteins encoded by small open reading frames (sORFs) can serve as functional elements that play important roles in vivo. Such sORFs also constitute a potential pool for de novo gene birth, driving evolutionary innovation and species diversity. Therefore, their theoretical and experimental identification has become a critical issue. Herein, we propose a method for predicting protein-coding sORFs based solely on integrated sequence-derived features. Its prediction performance is better than or comparable to that of nine other prevalent methods, indicating that it can serve as a relatively reliable tool for predicting protein-coding sORFs. The method also allows users to estimate the potential expression of a queried sORF, as demonstrated by the correlation between our probability estimates and the codon adaptation index (CAI). Using these features, we showed that the sequence features of protein-coding sORFs differ significantly between eukaryotes and prokaryotes, implying that cross-domain prediction may be relatively difficult; hence, domain-specific models were developed that allow users to predict protein-coding sORFs in both eukaryotes and prokaryotes. Finally, a web server has been developed to facilitate research in this field; it is freely available at http://guolab.whu.edu.cn/codingCapacity/index.html.


Subjects
Random Forest Algorithm, Open Reading Frames/genetics
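The abstract above validates the method's expression estimates against the codon adaptation index (CAI). As a point of reference, here is a minimal sketch of the standard CAI of Sharp and Li: the geometric mean of each codon's relative adaptiveness, computed from a reference set of highly expressed genes. It is not the authors' sORF predictor; the reference set, the pseudocount of 0.5 for unseen codons and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order, '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(triplet): aa for triplet, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a DNA sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count / max count among its synonymous codons in the reference set.

    The pseudocount (an assumption) keeps unseen codons from getting w = 0.
    """
    counts = Counter(c for s in reference_seqs for c in split_codons(s) if c in CODE)
    w = {}
    for aa in set(AMINO):
        synonymous = [c for c, a in CODE.items() if a == aa]
        if aa == "*" or len(synonymous) == 1:
            continue  # stops and single-codon amino acids (Met, Trp) carry no usage bias
        best = max(counts[c] + pseudocount for c in synonymous)
        for c in synonymous:
            w[c] = (counts[c] + pseudocount) / best
    return w


def cai(seq, w):
    """Geometric mean of w over the scorable codons of seq (NaN if none are scorable)."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set (hypothetical): Leu is always encoded by CTG, Lys by AAA.
w = relative_adaptiveness(["CTGCTGAAA"])
print(cai("CTGAAA", w))  # 1.0: the query uses only the preferred codons
```

Taking the mean of logarithms rather than multiplying the w values directly avoids floating-point underflow on long coding sequences.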
3.
Front Biosci (Landmark Ed) ; 26(8): 272-278, 2021 08 30.
Article in English | MEDLINE | ID: mdl-34455759

ABSTRACT

Background: Small open reading frames (sORFs) with protein-coding ability present an unprecedented challenge for genome annotation because of their short sequences and low expression levels. In the past decade, only a few prediction methods have been proposed for discovering protein-coding sORFs, and the lack of objective and uniform negative datasets has become an important obstacle to sORF prediction. The prediction efficiency of current sORF prediction methods needs further evaluation to provide better research strategies for protein-coding sORF discovery. Methods: In this work, nine mainstream methods for predicting the protein-coding potential of ORFs are comprehensively evaluated using a random sequence strategy. Results: The current methods perform poorly on different sORF datasets. For comparison, a sequence-based prediction algorithm trained on prokaryotic sORFs is proposed; its better prediction performance indicates that the random sequence strategy offers a feasible approach to protein-coding sORF prediction. Conclusions: Protein-coding sORFs are important functional genomic elements, and their discovery has shed light on the dark proteome. This evaluation indicates an urgent need for specialized prediction tools for protein-coding sORFs in both eukaryotes and prokaryotes. The present work is expected to provide new ideas for future sORF research.


Subjects
Genomics, Open Reading Frames/genetics
4.
RNA Biol ; 16(11): 1555-1564, 2019 11.
Article in English | MEDLINE | ID: mdl-31345106

ABSTRACT

High-throughput techniques have uncovered hundreds of thousands of long non-coding RNAs (lncRNAs). Among them, only a tiny fraction have functions validated by low-throughput experiments (EVlncRNAs). What fraction of the lncRNAs identified by high-throughput experiments (HTlncRNAs) is truly functional is an active subject of debate. Here, we developed the first method to distinguish EVlncRNAs from HTlncRNAs and mRNAs using Support Vector Machines, and found that EVlncRNAs can be well separated from HTlncRNAs and mRNAs, with a Matthews correlation coefficient of 0.6, a sensitivity of 64% and a precision of 81% on the independent human test set. The most useful features for classification relate to sequence conservation at the RNA level (for separation from HTlncRNAs) and at the protein level (for separation from mRNAs). The method is robust: the model trained on human RNAs is applicable to independent mouse RNAs with similar accuracy and, to a lesser extent, to plant RNAs. The method can recover newly discovered EVlncRNAs with high sensitivity. Its application to 2000 randomly selected human HTlncRNAs indicates that the majority of HTlncRNAs are probably non-functional, but a large portion (nearly 30%) are likely functional. In other words, many lncRNAs have specific biological roles that are yet to be discovered. The method developed here is expected to speed up discovery and reduce its cost by prioritizing potentially functional lncRNAs before experimental validation. EVlncRNA-pred is available as a web server at http://biophy.dzu.edu.cn/lncrnapred/index.html . All datasets used in this study can be obtained from the same website.


Subjects
Computational Biology/methods, Long Noncoding RNA/genetics, RNA Sequence Analysis/methods, Algorithms, Animals, Humans, Mice, Molecular Sequence Annotation, Support Vector Machine
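The abstract above reports classifier performance as sensitivity, precision and the Matthews correlation coefficient (MCC). These are standard confusion-matrix metrics; the sketch below computes them from true/false positive and negative counts. The function name and the example counts are hypothetical and do not reproduce the paper's test set.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, precision and Matthews correlation coefficient from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"sensitivity": sensitivity, "precision": precision, "mcc": mcc}


# Hypothetical counts, not taken from the paper.
print(classification_metrics(tp=8, fp=2, tn=8, fn=2))
# {'sensitivity': 0.8, 'precision': 0.8, 'mcc': 0.6}
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when, as here, positives (validated lncRNAs) are far rarer than negatives.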
5.
Methods Mol Biol ; 1933: 431-437, 2019.
Article in English | MEDLINE | ID: mdl-30945202

ABSTRACT

Plant long noncoding RNAs (lncRNAs) play important functional roles in various biological processes. Most databases deposit all plant lncRNA candidates produced by high-throughput experimental and/or computational techniques. Several databases exist for experimentally validated lncRNAs; however, they are small in scale (with only a few hundred lncRNAs) and narrow in focus (plants, diseases, or interactions). Thus, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to May 1, 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer, and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, including 428 plant lncRNAs from 44 plant species. Unlike PLNlncRbase, our dataset does not contain any lncRNAs from microarray or deep-sequencing experiments. Moreover, 40% of entries contain new information (interactions and additional information from NCBI and Ensembl). The database allows users to browse, search, and download, as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs .


Subjects
Computational Biology/methods, Nucleic Acid Databases, Plant Genome, Plants/genetics, Long Noncoding RNA/genetics, Plant RNA/genetics, Reproducibility of Results, Search Engine
6.
Nucleic Acids Res ; 46(D1): D100-D105, 2018 01 04.
Article in English | MEDLINE | ID: mdl-28985416

ABSTRACT

Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were used to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs were validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (with only a few hundred lncRNAs) and narrow in focus (plants, diseases or interactions). Thus, a comprehensive dataset of experimentally validated lncRNAs is highly desirable as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the largest existing database of experimentally validated lncRNAs. Seventy-four percent of lncRNA entries are partially or completely new compared with all existing databases of experimentally validated lncRNAs. The database allows users to browse, search and download, as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs.


Subjects
Nucleic Acid Databases, Long Noncoding RNA/genetics, Animals, Disease/genetics, Human Genome, Humans, Internet, RNA Sequence Analysis
7.
BMC Bioinformatics ; 18(1): 206, 2017 Apr 05.
Article in English | MEDLINE | ID: mdl-28381244

ABSTRACT

BACKGROUND: Intrinsically unstructured or disordered proteins (IDPs) function by interacting with other molecules. Annotating these binding sites is the first step toward mapping the functional impact of genetic variants in coding regions of human and other genomes, given that a significant portion of eukaryotic genomes codes for intrinsically disordered regions (IDRs) in proteins. RESULTS: DisBind (available at http://biophy.dzu.edu.cn/DisBind ) is a collection of experimentally supported binding sites in intrinsically disordered proteins and in proteins with both structured and disordered regions. It includes a total of 226 IDPs with functional site annotations. According to DisProt annotation, these IDPs contain 465 structured (ordered) regions (ORs) and 428 IDRs. The database contains a total of 4232 binding residues (from UniProt and PDB structures), of which 2836 are in ORs and 1396 in IDRs. These binding sites are classified by interacting partner, including proteins, RNA, DNA, metal ions and others, with 2984, 258, 383, 350 and 262 annotated binding sites, respectively. Each entry contains site-specific annotations (structured regions, intrinsically disordered regions and functional binding regions) that are experimentally supported by PDB structures or UniProt annotations. CONCLUSION: The searchable DisBind database provides a reliable data resource for the functional classification of intrinsically disordered proteins at the residue level.


Subjects
Intrinsically Disordered Proteins/metabolism, User-Computer Interface, Binding Sites, Factual Databases, Humans, Internet, Intrinsically Disordered Proteins/chemistry, Tertiary Protein Structure
8.
Int J Mol Sci ; 17(6)2016 May 27.
Article in English | MEDLINE | ID: mdl-27240358

ABSTRACT

Drug resistance caused by mutations in HIV-1 protease (PR) is the most severe challenge to the long-term efficacy of HIV-1 PR inhibitors in highly active antiretroviral therapy. To elucidate the molecular mechanism of drug resistance associated with the mutations D30N, I50V, I54M and V82A in complexes with the inhibitor GRL-0519, we performed five molecular dynamics (MD) simulations and calculated binding free energies using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method. The ranking of the calculated binding free energies agrees with the experimental data. The per-residue free energy spectra of the residue-inhibitor interactions show a similar binding mode for all complexes. Analysis of the MD trajectories and of per-residue contributions shows that groups R2 and R3 contribute mainly van der Waals energies, while groups R1 and R4 contribute electrostatic interactions through hydrogen bonds. The drug resistance of D30N can be attributed to the decreased binding affinity of residues 28 and 29. Because Val50 is smaller than Ile50, the residue moves, especially in chain A. The stable hydrophobic core, which includes the side chain of Ile54 in the wild-type (WT) complex, becomes unstable in I54M because the side chain of Met54 is flexible and adopts two alternative conformations. The binding affinity of Ala82 in V82A decreases relative to that of Val82 in WT. The present study could provide important guidance for designing potent new inhibitors against these resistance mutations.


Subjects
Viral Drug Resistance, Furans/pharmacology, HIV Protease/chemistry, HIV Protease/genetics, HIV-1/enzymology, Sulfonamides/pharmacology, Binding Sites, Computational Biology/methods, Entropy, Furans/chemistry, HIV Protease/metabolism, HIV Protease Inhibitors/chemistry, HIV Protease Inhibitors/pharmacology, HIV-1/chemistry, HIV-1/genetics, Hydrogen Bonding, Molecular Models, Molecular Dynamics Simulation, Mutation, Protein Binding, Sulfonamides/chemistry
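The MM-PBSA binding free energy named in the abstract above is conventionally decomposed as shown below. This is the textbook form of the method, given here as a reference point; the abstract does not state exactly which terms the study included (for example, how the entropy term was treated).

$$
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - G_{\mathrm{PR}} - G_{\mathrm{inhibitor}}
\approx \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}} + \Delta G_{\mathrm{PB}} + \Delta G_{\mathrm{SA}} - T\Delta S
$$

Here ΔE_vdW and ΔE_ele are the gas-phase molecular mechanics van der Waals and electrostatic terms, ΔG_PB is the polar solvation energy from the Poisson-Boltzmann equation, ΔG_SA is the nonpolar solvation energy estimated from the solvent-accessible surface area, and -TΔS is the solute entropy contribution. In the common single-trajectory protocol the bonded (internal) energy term cancels between complex, protease and inhibitor, which is why it does not appear. Decomposing the first four terms residue by residue yields per-residue free energy spectra of the kind the abstract describes.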
9.
J Chem Inf Model ; 55(6): 1261-70, 2015 Jun 22.
Article in English | MEDLINE | ID: mdl-25945398

ABSTRACT

The composition and sequence order of amino acid residues are the two most important characteristics describing a protein sequence. Graphical representations facilitate the visualization of biological sequences and yield biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation that places the 20 amino acid residue types on a circle and the sequence positions along the z axis. This representation allows the composition and sequence order of amino acids to be visualized simultaneously. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences based on the cylindrical model. Their application to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation provides a useful new tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .


Subjects
Computational Biology/methods, Computer Graphics, Proteins/chemistry, Amino Acid Sequence, Mutation, NADH Dehydrogenase/chemistry, NADH Dehydrogenase/genetics, Proteins/genetics
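The cylindrical representation described in the abstract above can be sketched as a mapping from each residue to a point on a cylinder: the angle encodes the residue type and the height encodes the sequence position. This is a minimal illustration under stated assumptions, not the paper's implementation: the alphabetical ordering of residue types around the circle, the unit radius and unit z spacing, and the function name are all hypothetical. The paper's ten numerical descriptors are not specified in the abstract, so none is reproduced here.

```python
import math

# Assumption: residue types placed around the circle in alphabetical one-letter order;
# the ordering used in the paper may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cylindrical_points(sequence, radius=1.0, dz=1.0):
    """Map residue k of a protein sequence to (x, y, z) on a cylinder.

    The angle encodes the residue type (composition); z encodes the position (order).
    """
    points = []
    for k, residue in enumerate(sequence.upper(), start=1):
        idx = AMINO_ACIDS.index(residue)  # ValueError for non-standard residues such as 'X'
        theta = 2 * math.pi * idx / len(AMINO_ACIDS)
        points.append((radius * math.cos(theta), radius * math.sin(theta), k * dz))
    return points


for x, y, z in cylindrical_points("MKV"):
    print(f"{x:+.3f} {y:+.3f} {z:.0f}")
```

Because every occurrence of a residue type shares the same angle, projecting the points onto the xy-plane shows composition, while reading along z recovers the order, which is the dual view the abstract describes.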
10.
BMC Bioinformatics ; 10 Suppl 1: S44, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208146

ABSTRACT

BACKGROUND: Studies of protein folding/unfolding indicate that native-state topology is an important determinant of the protein folding mechanism. The folding/unfolding behaviors of proteins with similar topologies have been studied in Cartesian space, and the results indicate that some proteins share similar folding/unfolding characteristics. RESULTS: We constructed a physical property space from twelve different physical properties. By studying the unfolding processes of protein G and protein L in this property space, we found that the two proteins have similar unfolding pathways, which can be divided into three types; the umbrella-shaped type represents the preferred pathway. Moreover, the unfolding simulation times of the two proteins differ: protein L unfolds faster than protein G. Additionally, the unfolded-state ensemble of protein L is distributed over a larger area than that of protein G. CONCLUSION: In the physical property space, protein G and protein L show similar folding/unfolding behaviors, in agreement with previous results obtained in Cartesian coordinate space. At the same time, some differences in unfolding properties that cannot be analyzed in Cartesian coordinate space are easily detected.


Subjects
Bacterial Proteins/chemistry, Binding Sites, Computer Simulation, Kinetics, Molecular Models, Protein Conformation, Protein Folding
11.
J Biomol Struct Dyn ; 25(6): 609-19, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18399694

ABSTRACT

Forty-nine molecular dynamics simulations of unfolding trajectories of the B1 domain of streptococcal protein G (GB1) directly demonstrate the diversity of unfolding pathways and reveal a statistically most probable unfolding pathway in the physical property space. Twelve physical properties of the protein were chosen to construct a 12-dimensional property space, which was then reduced to a 3-dimensional principal component property space. In this property space, the multiple unfolding trajectories look like "trees" with some common characteristics. The "root" of the tree corresponds to the native state, the "bole" (trunk) to the partially unfolded conformations, and the "crown" to the unfolded state. The unfolding trajectories can be divided into three types. The first type has a straight "bole" and "crown", corresponding to a fast two-state unfolding pathway of GB1. The second has a "standstill in the middle of the bole", which may correspond to a three-state unfolding pathway. The third has a "circuitous bole", corresponding to a slow two-state unfolding pathway. The fast two-state unfolding pathway is the statistically most probable, or preferred, unfolding pathway of GB1, accounting for 53% of the 49 unfolding trajectories. In the property space, all the unfolding trajectories together form a thermal unfolding pathway ensemble of GB1. This ensemble resembles a funnel that gradually widens from the native-state ensemble to the unfolded-state ensemble. In the property space, the distribution of thermally unfolded states resembles an electron cloud in quantum mechanics. The unfolded states of the independent unfolding trajectories overlap substantially, indicating that the thermally unfolded states are confined by the physical property values and that the number of unfolded protein states is much smaller than previously believed.


Subjects
Protein Conformation, Bacterial Proteins/chemistry, Computational Biology, Protein Folding, Thermodynamics
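The abstract above reduces a 12-dimensional physical property space to a 3-dimensional principal component space. A minimal sketch of that kind of reduction with NumPy follows. It assumes each MD snapshot is summarized as one row of 12 property values and that properties are standardized before PCA (the abstract does not say whether they were); the function name and the random data are hypothetical, and the twelve properties themselves are not listed in the abstract.

```python
import numpy as np


def principal_component_scores(X, n_components=3):
    """Project rows of X (snapshots x properties) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant properties unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std  # assumption: standardize, since properties differ in units
    # SVD of the centered data; rows of vt are principal axes, sorted by explained variance.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical data: 1000 trajectory snapshots x 12 physical properties.
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(1000, 12))
scores, explained = principal_component_scores(snapshots)
print(scores.shape)  # (1000, 3)
```

Plotting the three score columns of each trajectory as a 3-D curve gives the kind of property-space picture in which the abstract's "tree" shapes would be inspected; the explained-variance fractions indicate how much of the 12-dimensional spread the three components retain.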