Results 1 - 20 of 135
1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-37991246

ABSTRACT

Today, pharmaceutical industry faces great pressure to employ more efficient and systematic ways in drug discovery and development process. However, conventional formulation studies still strongly rely on personal experiences by trial-and-error experiments, resulting in a labor-consuming, tedious and costly pipeline. Thus, it is highly required to develop intelligent and efficient methods for formulation development to keep pace with the progress of the pharmaceutical industry. Here, we developed a comprehensive web-based platform (FormulationAI) for in silico formulation design. First, the most comprehensive datasets of six widely used drug formulation systems in the pharmaceutical industry were collected over 10 years, including cyclodextrin formulation, solid dispersion, phospholipid complex, nanocrystals, self-emulsifying and liposome systems. Then, intelligent prediction and evaluation of 16 important properties from the six systems were investigated and implemented by systematic study and comparison of different AI algorithms and molecular representations. Finally, an efficient prediction platform was established and validated, which enables the formulation design just by inputting basic information of drugs and excipients. FormulationAI is the first freely available comprehensive web-based platform, which provides a powerful solution to assist the formulation design in pharmaceutical industry. It is available at https://formulationai.computpharm.org/.


Subjects
Algorithms , Artificial Intelligence , Drug Compounding/methods , Drug Design , Internet
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37609950

ABSTRACT

Ion mobility coupled to mass spectrometry informs on the shape and size of protein structures in the form of a collision cross section (CCS_IM). Although there are several computational methods for predicting CCS_IM based on protein structures, including our previously developed projection approximation using rough circular shapes (PARCS), the process usually requires prior experience with the command-line interface. To overcome this challenge, here we present a web application on the Rosetta Online Server that Includes Everyone (ROSIE) webserver to predict CCS_IM from protein structure using projection approximation with PARCS. In this web interface, the user is only required to provide one or more PDB files as input. Results from our case studies suggest that CCS_IM predictions (with ROSIE-PARCS) are highly accurate with an average error of 6.12%. Furthermore, the absolute difference between CCS_IM and CCS_PARCS can help in distinguishing accurate from inaccurate AlphaFold2 protein structure predictions. ROSIE-PARCS is designed with a user-friendly interface, is available publicly and is free to use. The ROSIE-PARCS web interface is supported by all major web browsers and can be accessed via this link (https://rosie.graylab.jhu.edu).


Subjects
Proteins , Software , Proteins/chemistry , Web Browser
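The accuracy figure quoted in the ROSIE-PARCS abstract above is an average error between predicted and measured collision cross sections. A minimal sketch of how such an average percent error could be computed follows. The function names and the CCS values (in Å²) are hypothetical and are not taken from the paper, and the choice of the measured value as the denominator is an assumption.

```python
def percent_error(predicted: float, reference: float) -> float:
    """Absolute percent error of a predicted value relative to a reference."""
    return abs(predicted - reference) / reference * 100.0


def mean_percent_error(predicted, reference):
    """Average percent error over paired predictions and references."""
    errors = [percent_error(p, r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical values for illustration only (not data from the paper).
ccs_parcs = [1050.0, 1980.0, 3100.0]  # predicted from structure
ccs_im = [1000.0, 2000.0, 3000.0]     # measured by ion mobility
avg_error = mean_percent_error(ccs_parcs, ccs_im)
```

The same per-structure absolute difference is the quantity the abstract suggests for flagging unreliable predicted structures.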
3.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35722704

ABSTRACT

Rapid progresses in RNA-Seq and computational methods have assisted in quantifying A-to-I RNA editing and altered RNA editing sites have been widely observed in various diseases. Nevertheless, functional characterization of the altered RNA editing sites still remains a challenge. Here, we developed perturbations of RNA editing sites (PRES; http://bio-bigdata.hrbmu.edu.cn/PRES/) as the webserver for decoding functional perturbations of RNA editing sites based on editome profiling. After uploading an editome profile among samples of different groups, PRES will first annotate the editing sites to various genomic elements and detect differential editing sites under the user-selected method and thresholds. Next, the downstream functional perturbations of differential editing sites will be characterized from gain or loss miRNA/RNA binding protein regulation, RNA and protein structure changes, and the perturbed biological pathways. A prioritization module was developed to rank genes based on their functional consequences of RNA editing events. PRES provides user-friendly functionalities, ultra-efficient calculation, intuitive table and figure visualization interface to display the annotated RNA editing events, filtering options and elaborate application notebooks. We anticipate PRES will provide an opportunity for better understanding the regulatory mechanisms of RNA editing in human complex diseases.


Subjects
MicroRNAs , RNA Editing , Humans , MicroRNAs/genetics
4.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864888

ABSTRACT

Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins after protein biosynthesis, which orchestrates a variety of biological processes. Detecting PTM sites at the proteome scale is one of the key steps toward an in-depth understanding of their regulatory mechanisms. In this study, we present an integrated method based on eXtreme Gradient Boosting (XGBoost), called iRice-MS, to identify 2-hydroxyisobutyrylation, crotonylation, malonylation, ubiquitination, succinylation and acetylation in rice. For each PTM-specific model, we adopted eight feature encoding schemes, including sequence-based features, physicochemical property-based features and spatial mapping information-based features. The optimal feature set was identified from each encoding, and their respective models were established. Extensive experimental results show that iRice-MS consistently displays excellent performance on 5-fold cross-validation and independent dataset tests. In addition, our approach outperforms other existing tools in terms of AUC value. Based on the proposed model, a web server named iRice-MS was established and is freely accessible at http://lin-group.cn/server/iRice-MS.


Subjects
Oryza , Post-Translational Protein Processing , Acetylation , Computational Biology , Biological Models , Oryza/metabolism , Post-Translational Protein Processing/physiology , Proteome/metabolism , Ubiquitination
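The iRice-MS abstract above reports performance under 5-fold cross-validation. As a rough sketch of what that evaluation protocol involves, the following splits sample indices into five folds, each used once as the test set. The function name, the sample count and the round-robin assignment are assumptions for illustration; the paper's own partitioning procedure is not described in the abstract.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 23 samples, for illustration only.
splits = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one test fold, so averaging the per-fold metric gives an estimate that uses every sample for testing once.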
5.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35580866

ABSTRACT

Predicting the native or near-native binding pose of a small molecule within a protein binding pocket is an extremely important task in structure-based drug design, especially in the hit-to-lead and lead optimization phases. In this study, fastDRH, a free and open-access web server, was developed to predict and analyze protein-ligand complex structures. In the fastDRH server, the AutoDock Vina and AutoDock-GPU docking engines, structure-truncated MM/PB(GB)SA free energy calculation procedures and multiple-pose-based per-residue energy decomposition analysis are integrated into a user-friendly and multifunctional online platform. Benefiting from the modular architecture, users can flexibly use one or more of three features (molecular docking, docking pose rescoring and hotspot residue prediction) and inspect the key information in a result analysis panel supported by 3Dmol.js and Apache ECharts. In terms of protein-ligand binding mode prediction, the integrated structure-truncated MM/PB(GB)SA rescoring procedures exhibit a success rate of >80% in the benchmark, which is much better than that of AutoDock Vina (~70%). For hotspot residue identification, our multiple-pose-based per-residue energy decomposition analysis strategy is a more reliable solution than one using only a single pose, and the performance of our solution has been experimentally validated in several drug discovery projects. To summarize, the fastDRH server is a useful tool for predicting the ligand binding mode and the hotspot residues of a protein for ligand binding. The fastDRH server is accessible free of charge at http://cadd.zju.edu.cn/fastdrh/.


Subjects
Proteins , Binding Sites , Entropy , Ligands , Molecular Docking Simulation , Protein Binding , Proteins/chemistry
6.
Proteomics ; 23(13-14): e2200409, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37021401

ABSTRACT

Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.


Subjects
Genetic Enhancer Elements , Human Genome , Humans , Computational Biology/methods , Algorithms , Machine Learning
7.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33751027

ABSTRACT

A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to the DNase I enzyme. It is an important part of the noncoding region and contains a variety of regulatory elements, such as promoters, enhancers and transcription factor-binding sites. Moreover, disease- or trait-related loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method performs well in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, by which users can easily distinguish DHSs from non-DHSs and obtain the corresponding developmental stage of a DHS.


Subjects
Arabidopsis/genetics , DNA/genetics , Deep Learning , Deoxyribonuclease I/genetics , Oryza/genetics , Software , Arabidopsis/metabolism , Chromatin/metabolism , Chromatin/ultrastructure , DNA/chemistry , DNA/metabolism , Datasets as Topic , Deoxyribonuclease I/metabolism , Genetic Enhancer Elements , Genetic Loci , Humans , Internet , Oryza/metabolism , Genetic Promoter Regions , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Transcription
8.
Brief Bioinform ; 22(2): 1940-1950, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32065211

ABSTRACT

The locations of the initiation of genomic DNA replication are defined as origins of replication sites (ORIs), which regulate the onset of DNA replication and play significant roles in the DNA replication process. The study of ORIs is essential for understanding the cell-division cycle and gene expression regulation. Accurate identification of ORIs will provide important clues for DNA replication research and drug development by developing computational methods. In this paper, the first integrated predictor named iORI-Euk was built to identify ORIs in multiple eukaryotes and multiple cell types. In the predictor, seven eukaryotic (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) ORI data was collected from public database to construct benchmark datasets. Subsequently, three feature extraction strategies which are k-mer, binary encoding and combination of k-mer and binary were used to formulate DNA sequence samples. We also compared the different classification algorithms' performance. As a result, the best results were obtained by using support vector machine in 5-fold cross-validation test and independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for the novel ORI identification.


Subjects
Replication Origin , Algorithms , Animals , Cell Line , Tumor Cell Line , Datasets as Topic , Eukaryota/genetics , Humans , Support Vector Machine
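The iORI-Euk abstract above names three feature extraction strategies for DNA sequences: k-mer, binary encoding, and their combination. A minimal sketch of what such encodings commonly look like is given below. The function names, the choice of k = 2, and the handling of ambiguous bases are assumptions for illustration and may differ from the paper's implementation.

```python
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int = 2):
    """Return normalized k-mer frequencies in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_encoding(seq: str):
    """One-hot encode each base as four bits (A, C, G, T); unknowns are all zeros."""
    vector = []
    for base in seq:
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector


def combined_features(seq: str, k: int = 2):
    """Concatenate k-mer frequencies with the binary encoding."""
    return kmer_frequencies(seq, k) + binary_encoding(seq)


# Hypothetical six-base sequence, for illustration only.
features = combined_features("ACGTAC", k=2)
```

The k-mer part has a fixed length of 4^k regardless of sequence length, whereas the binary part grows with the sequence, so the combined vector only has a constant size when all input sequences share one length.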
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32393981

ABSTRACT

With the advances of next-generation sequencing technology, the field of disease research has been revolutionized. However, pinpointing the disease-causing variants from millions of revealed variants is still a tough task. Here, we have reviewed the existing linkage analysis tools and presented PedMiner, a web-based application designed to narrow down candidate variants from family based whole-exome sequencing (WES) data through linkage analysis. PedMiner integrates linkage analysis, variant annotation and prioritization in one automated pipeline. It provides graphical visualization of the linked regions along with comprehensive annotation of variants and genes within these linked regions. This efficient and comprehensive application will be helpful for the scientific community working on Mendelian inherited disorders using family based WES data.


Subjects
Exome Sequencing/methods , Family , Inborn Genetic Diseases/genetics , Genetic Linkage , Algorithms , Female , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Pedigree
10.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33201188

ABSTRACT

BACKGROUND: Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds has greatly interfered with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. Therefore, a credible screening tool to detect fluorescent compounds in chemical libraries is urgently needed in the early stages of drug discovery. RESULTS: In this study, we developed a webserver, ChemFLuo, for fluorescent compound detection, based on two large and high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models based on the combination of three machine learning algorithms and seven types of molecular representations. The best blue fluorescence prediction model achieved balanced accuracy (BA) = 0.858 and area under the receiver operating characteristic curve (AUC) = 0.931 for the validation set, and BA = 0.823 and AUC = 0.903 for the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 for the validation set, and BA = 0.771 and AUC = 0.852 for the test set. In addition to the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for the screening of potential fluorescent compounds. Comparison with other fluorescence detection tools and application to external validation sets and large molecule libraries demonstrated the reliability of the prediction models for fluorescent compound detection. CONCLUSION: ChemFLuo is a public webserver for filtering out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.


Subjects
Drug Discovery , Fluorescent Dyes/chemistry , Machine Learning , Chemical Models , Small Molecule Libraries , Fluorescence
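The ChemFLuo abstract above reports model quality as balanced accuracy (BA) and area under the ROC curve (AUC). For readers unfamiliar with the two metrics, here is a small sketch of their standard definitions for binary labels. The labels, scores and the 0.5 decision threshold are hypothetical and are not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fluorescent) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
ba = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Balanced accuracy weights both classes equally, which matters when fluorescent compounds are a minority of the library. AUC is independent of the decision threshold, which is why the two numbers are usually reported together.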
11.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32382739

ABSTRACT

Reversible post-translational modification (PTM) orchestrates various biological processes by changing the properties of proteins. Since many proteins are multiply modified by PTMs, identification of PTM crosstalk site has emerged to be an intriguing topic and attracted much attention. In this study, we systematically deciphered the in situ crosstalk of ubiquitylation and SUMOylation that co-occurs on the same lysine residue. We first collected 3363 ubiquitylation-SUMOylation (UBS) crosstalk site on 1302 proteins and then investigated the prime sequence motifs, the local evolutionary degree and the distribution of structural annotations at the residue and sequence levels between the UBS crosstalk and the single modification sites. Given the properties of UBS crosstalk sites, we thus developed the mUSP classifier to predict UBS crosstalk site by integrating different types of features with two-step feature optimization by recursive feature elimination approach. By using various cross-validations, the mUSP model achieved an average area under the curve (AUC) value of 0.8416, indicating its promising accuracy and robustness. By comparison, the mUSP has significantly better performance with the improvement of 38.41 and 51.48% AUC values compared to the cross-results by the previous single predictor. The mUSP was implemented as a web server available at http://bioinfo.ncu.edu.cn/mUSP/index.html to facilitate the query of our high-accuracy UBS crosstalk results for experimental design and validation.


Subjects
Post-Translational Protein Processing , Proteome/metabolism , Amino Acids/metabolism , Biological Evolution , Humans , Sumoylation , Ubiquitination
12.
Methods ; 204: 132-141, 2022 08.
Article in English | MEDLINE | ID: mdl-35367597

ABSTRACT

With over 40 years of research, researchers in the intrinsic disorder prediction field developed over 100 computational predictors. This review offers a holistic perspective of this field by highlighting accurate and popular disorder predictors and introducing a wide range of practical resources that support collection, interpretation and application of disorder predictions. These resources include meta webservers that expedite collection of multiple disorder predictions, large databases of pre-computed disorder predictions that ease collection of predictions particularly for large datasets of proteins, and modern quality assessment tools. The latter methods facilitate identification of accurate predictions in a specific protein sequence, reducing uncertainty associated to the use of the putative disorder. Altogether, we review eleven predictors, four meta webservers, three databases and two quality assessment tools, all of which are conveniently available online. We also offer a perspective on future developments of the disorder prediction and the quality assessment tools. The availability of this comprehensive toolbox of useful resources should stimulate further growth in the application of the disorder predictions across many areas including rational drug design, systems medicine, structural bioinformatics and structural genomics.


Subjects
Intrinsically Disordered Proteins , Amino Acid Sequence , Computational Biology , Protein Databases , Drug Design , Intrinsically Disordered Proteins/chemistry
13.
Methods ; 203: 322-327, 2022 07.
Article in English | MEDLINE | ID: mdl-35091075

ABSTRACT

Epitranscriptomic m6A methylation has been shown to mediate extensive regulation in the context of various RNA binding protein (RBP) readers. Now that m6A methylation data have reached a sizable scale, functional context-aware analysis of m6A profiles is becoming both more feasible and more in demand. In this study, we employed graph regularized non-negative matrix factorization (GNMF) for m6A profile analysis and comparison, where the RBP binding preferences of m6A sites were incorporated as the functional context-based graph constraint term. Compared to the baseline non-negative matrix factorization (NMF) method, this GNMF-based method could better capture the distinctions in multiple functional characteristics between different groups of m6A sites, including but not limited to the associated biological pathways and disease genes. We further established m6Adecom, an online tool that can be used for correlation and enrichment analysis of m6A profiles using the matrix decomposition result from GNMF, and gene set enrichment analysis based on the high-score m6A sites. m6Adecom is freely accessible at http://www.rnanut.net/m6adecom.


Subjects
Algorithms , RNA-Binding Proteins , RNA-Binding Proteins/genetics
14.
J Biomed Inform ; 143: 104423, 2023 07.
Article in English | MEDLINE | ID: mdl-37308034

ABSTRACT

OBJECTIVE: Genotype imputation is a commonly used technique that infers untyped variants into a study's genotype data, allowing better identification of causal variants in disease studies. However, due to the overrepresentation of Caucasian studies, there is a lack of understanding of the genetic basis of health outcomes in other ethnic populations. Therefore, facilitating imputation of missing key predictor variants that can potentially improve a health-outcome risk prediction model, specifically for Asian ancestry, is of utmost relevance. METHODS: We aimed to construct an imputation and analysis web platform that primarily facilitates, but is not limited to, genotype imputation for East Asians. The goal is to provide a collaborative imputation platform that lets researchers in the public domain conduct accurate genotype imputation rapidly and efficiently. RESULTS: We present an online genotype imputation platform, Multi-ethnic Imputation System (MI-System) (https://misystem.cgm.ntu.edu.tw/), that offers users 3 established pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5 and Beagle5.1, for conducting imputation analyses. In addition to 1000 Genomes and HapMap3, a new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry, is provided. MI-System further offers functions to create customized reference panels to be used for imputation, conduct quality control, split whole-genome data into chromosomes, and convert genome builds. CONCLUSION: Users can upload their genotype data and perform imputation with minimum effort and resources. The utility functions can further be used to preprocess user-uploaded data with a few clicks. MI-System potentially contributes to Asian population genetics research, while eliminating the requirement for high-performance computational resources and bioinformatics expertise. It will enable an increased pace of research and provide a knowledge base for genetic carriers of complex diseases, thereby greatly enhancing patient-driven research. STATEMENT OF SIGNIFICANCE: Multi-ethnic Imputation System (MI-System) primarily facilitates, but is not limited to, imputation for East Asians through 3 established prephasing-imputation pipelines, SHAPEIT2-IMPUTE2, SHAPEIT4-IMPUTE5 and Beagle5.1, where users can upload their genotype data and perform imputation and other utility functions with minimum effort and resources. A new customized Taiwan Biobank (TWB) reference panel, specifically created for Taiwanese-Chinese ancestry, is provided. Utility functions include (a) creating customized reference panels, (b) conducting quality control, (c) splitting whole-genome data into chromosomes, and (d) converting genome builds. Users can also combine 2 reference panels using the system and use the combined panels as a reference for imputation with MI-System.


Subjects
Population Genetics , Genome , Humans , Gene Frequency , Genotype , Computers , Genome-Wide Association Study , Single Nucleotide Polymorphism
15.
Molecules ; 28(21)2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37959801

ABSTRACT

The lymphocyte-specific protein tyrosine kinase (LCK) is a critical target in leukemia treatment. However, potential off-target interactions involving LCK can lead to unintended consequences. This underscores the importance of accurately predicting the inhibitory interactions of drug molecules with LCK during the research and development stage. To address this, we introduce an ensemble machine learning technique designed to estimate the binding affinity between molecules and LCK. The method includes the generation and selection of molecular fingerprints, the design of the machine learning model, hyperparameter tuning, and model ensembling. Through optimization, the predictive capability of our model was significantly improved, raising test R² values from 0.644 to 0.730 and reducing test RMSE values from 0.841 to 0.732. Our refined ensemble model was then employed to screen an MCE-like drug library. From this screen, we selected the ten top-scoring compounds and tested them using the ADP-Glo bioactivity assay. Subsequently, we employed molecular docking to further analyze the binding modes of these compounds with LCK. The predictive accuracy of our model in identifying LCK inhibitors demonstrates its usefulness both for LCK-related safety panel predictions and for discovering new LCK inhibitors. For added user convenience, we have also established a webserver and a GitHub repository to share the project.


Subjects
Lymphocyte Specific Protein Tyrosine Kinase p56(lck) , Machine Learning , Molecular Docking Simulation , Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/chemistry
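The LCK abstract above reports test-set R² and RMSE before and after building a model ensemble. The sketch below shows the standard definitions of both metrics and one simple way an ensemble can be formed, namely averaging per-sample predictions. The abstract does not state how the authors combined their models, so the averaging step, the function names and all numbers here are assumptions for illustration.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions_per_model):
    """Average the predictions of several models sample by sample."""
    return [sum(col) / len(col) for col in zip(*predictions_per_model)]


# Hypothetical binding affinities and model outputs, for illustration only.
y_true = [5.0, 6.0, 7.0, 8.0]
model_a = [5.4, 6.6, 6.4, 7.8]
model_b = [4.8, 5.8, 7.2, 8.6]
ensemble = average_ensemble([model_a, model_b])
```

In this toy case the two models err in partly opposite directions, so the averaged prediction has a lower RMSE than either model alone. That cancellation is the usual motivation for ensembling, though it is not guaranteed for every dataset.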
16.
Brief Bioinform ; 21(3): 982-995, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31157855

ABSTRACT

5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Despite the great increase in identified m5C sites in a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, increasing attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized the current progress in computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.


Subjects
5-Methylcytosine/metabolism , Computational Biology/methods , Algorithms , Animals , Arabidopsis/metabolism , Datasets as Topic , Humans , Mice , Saccharomyces cerevisiae/metabolism
17.
J Comput Aided Mol Des ; 36(5): 341-354, 2022 05.
Article in English | MEDLINE | ID: mdl-34143323

ABSTRACT

The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.


Subjects
Cheminformatics , Drug Discovery , Drug Design , Drug Discovery/methods
18.
BMC Bioinformatics ; 22(1): 1, 2021 Jan 02.
Article in English | MEDLINE | ID: mdl-33388027

ABSTRACT

BACKGROUND: Protein-peptide interactions play a fundamental role in a wide variety of biological processes, such as cell signaling, regulatory networks, immune responses, and enzyme inhibition. Peptides are characterized by low toxicity and small interface areas; therefore, they are good targets for therapeutic strategies, rational drug planning and protein inhibition. Approximately 10% of the ethical pharmaceutical market is protein/peptide-based. Furthermore, it is estimated that 40% of protein interactions are mediated by peptides. Despite the fast increase in the volume of biological data, particularly on sequences and structures, there remains a lack of broad and comprehensive protein-peptide databases and tools that allow the retrieval, characterization and understanding of protein-peptide recognition and consequently support peptide design. RESULTS: We introduce Propedia, a comprehensive and up-to-date database with a web interface that permits clustering, searching and visualizing of protein-peptide complexes according to varied criteria. Propedia comprises over 19,000 high-resolution structures from the Protein Data Bank, including structural and sequence information from protein-peptide complexes. The main advantage of Propedia over other peptide databases is that it allows a more comprehensive analysis of similarity and redundancy. It was constructed based on a hybrid clustering algorithm that compares and groups peptides by sequences, interface structures and binding sites. Propedia is available through a graphical, user-friendly and functional interface where users can retrieve and analyze complexes and download each search data set. We performed case studies and verified the utility of Propedia scores for ranking promising interacting peptides. In a study involving predicting peptides to inhibit the SARS-CoV-2 main protease, we showed that Propedia scores related to similarity between different peptide complexes with the SARS-CoV-2 main protease are in agreement with molecular dynamics free energy calculations. CONCLUSIONS: Propedia is a database and tool to support structure-based rational design of peptides for special purposes. Protein-peptide interactions can be useful for predicting, classifying and scoring complexes, as well as for designing new molecules. Propedia is up-to-date, is provided as a ready-to-use webserver with a friendly and resourceful interface, and is available at: https://bioinfo.dcc.ufmg.br/propedia.


Subjects
Database Management Systems , Protein Databases , Peptides/chemistry , Proteins/chemistry , Algorithms , Humans
19.
Plant J ; 103(5): 1894-1909, 2020 08.
Article in English | MEDLINE | ID: mdl-32445587

ABSTRACT

Soybean (Glycine max [L.] Merr.) is a major crop in animal feed and human nutrition, mainly for its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past 5 years generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions and genotypes. In this study, we have collected data from 1298 publicly available soybean transcriptome samples, processed the raw sequencing reads and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52 737/56 044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1349 genes showed heavily biased expression patterns towards particular tissues. A transcript-level analysis revealed that 95% (70 963 of 74 490) of the assembled transcripts have intron chains exactly matching those from known transcripts, whereas 3256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitutes a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.


Subjects
Atlases as Topic , Glycine max/genetics , Plant RNA/genetics , Transcriptome/genetics , Gene Expression Profiling , Gene Library , Essential Genes/genetics , Plant Genes/genetics
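The two percentages in the soybean abstract above are rounded from the raw counts given in parentheses. A short check of that arithmetic, using only the counts stated in the abstract (the helper name is hypothetical):

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the abstract.
expressed_share = percent(52737, 56044)  # annotated genes with detectable expression
matching_share = percent(70963, 74490)   # transcripts matching known intron chains
non_matching = 74490 - 70963             # assembled transcripts without an exact match
```

To one decimal place the shares are 94.1% and 95.3%, consistent with the rounded 94% and 95% in the text. The 3527 non-matching transcripts include the 3256 reported as potentially novel isoforms.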
20.
Amino Acids ; 53(2): 239-251, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33486591

ABSTRACT

Enzymes have been proven to play considerable roles in disease diagnosis and biological functions. The feature extraction that truly reflects the intrinsic properties of protein is the most critical step for the automatic identification of enzymes. Although lots of feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which has the capability to identify whether a protein is a human enzyme and distinguish the function of the human enzyme. To improve the feature representation ability, protein sequences were encoded by a new feature-vector called 'reduced amino acid cluster'. We calculated 673 amino acid reduction alphabets to determine the optimal feature representative scheme. The tenfold cross-validation test showed that the accuracy of IHEC_RAAC to identify human enzymes was 74.66% and further discriminate the human enzyme classes with an accuracy of 54.78%, which was 2.06% and 8.68% higher than the state-of-the-art predictors, respectively. Additionally, the results from the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes to further provide guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac .


Subjects
Amino Acids/chemistry , Computational Biology/methods , Protein Databases , Enzymes/chemistry , Algorithms , Humans , Online Systems , Proteins/chemistry , Software , Support Vector Machine
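The IHEC_RAAC abstract above encodes protein sequences with a "reduced amino acid cluster" alphabet, in which the 20 amino acids are merged into fewer groups before features are computed. The sketch below illustrates the general idea with a single six-group alphabet. The grouping, the function names and the example peptide are hypothetical, chosen only for illustration, and are not among the 673 alphabets evaluated in the paper.

```python
from collections import Counter

# Hypothetical six-cluster grouping by broad physicochemical class;
# an assumption for illustration, not an alphabet taken from the paper.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to the first letter of its cluster, used as representative.
RESIDUE_TO_CLUSTER = {aa: group[0] for group in CLUSTERS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Replace each residue with the representative letter of its cluster."""
    return "".join(RESIDUE_TO_CLUSTER.get(aa, "X") for aa in seq.upper())


def composition(seq: str):
    """Frequency of each cluster representative in the reduced sequence."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced)
    return [counts[group[0]] / len(reduced) for group in CLUSTERS]


# Hypothetical six-residue peptide, for illustration only.
reduced = reduce_sequence("MKVLDE")
feature_vector = composition("MKVLDE")
```

Reducing the alphabet shrinks the feature space (six composition values here instead of twenty), which is the usual reason for comparing many candidate groupings and keeping the one that classifies best.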