1.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39110476

ABSTRACT

Bacteriophages are viruses that infect bacterial cells. They are the most diverse biological entities on Earth and play important roles in the microbiome. By lifestyle, phages are divided into virulent phages and temperate phages. Distinguishing the two is crucial for further understanding of phage-host interactions. Although several methods have been designed for phage lifestyle classification, they consider either sequence features or gene features alone, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built from several multilayer self-attention neural networks and a global self-attention neural network, and combined with protein features from the position-specific scoring matrix (PSSM), DeePhafier improves classification accuracy and outperforms two benchmark methods. The accuracy of DeePhafier on five-fold cross-validation is as high as 87.54% for sequences longer than 2000 bp.
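For readers unfamiliar with the PSSM features mentioned in this abstract, the following is a minimal sketch of how a position-specific scoring matrix can be derived from a set of aligned protein sequences. It is not DeePhafier's implementation: the function name `build_pssm`, the toy alignment, the uniform background frequency, and the pseudocount value are all assumptions made for illustration. In practice PSSMs are usually produced by profile-search tools rather than computed by hand.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Assumptions (illustrative only): gap-free aligned sequences of equal
    length, a uniform background frequency, and simple additive pseudocounts.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        # Count residues observed in this column, ignoring unknown symbols.
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            row[aa] = math.log2(freq / background)
        pssm.append(row)
    return pssm


# Hypothetical toy alignment, not taken from the paper.
toy_alignment = ["MKV", "MKI", "MRV", "MKV"]
pssm = build_pssm(toy_alignment)
print(round(pssm[0]["M"], 2), round(pssm[0]["A"], 2))
```

Residues that are conserved in a column receive positive scores and unobserved residues receive negative ones, which is the signal a downstream classifier can use.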


Subjects
Bacteriophages , Computer Neural Networks , Bacteriophages/genetics , Computational Biology/methods , Viral Proteins/genetics , Viral Proteins/metabolism , Algorithms
2.
Int J Mol Sci ; 25(8), 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38674091

ABSTRACT

Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods, which support drug discovery by building advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we carefully compared the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on a modified generative pre-trained transformer 2 (GPT-2). To our knowledge, this is the first time the large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.


Subjects
Proteins , Proteins/metabolism , Computational Biology/methods , Drug Discovery/methods , Position-Specific Scoring Matrices , Protein Databases , Humans , Algorithms
3.
BMC Bioinformatics ; 25(1): 145, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580921

ABSTRACT

BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of protein sequences to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS: The empirical outcomes of 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process. AVAILABILITY: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
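The abstract reports both accuracy and the Matthews correlation coefficient (MCC). As a reminder of how the two relate, here is a minimal sketch that computes both from the four cells of a binary confusion matrix. The helper names and the counts are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return numerator / denominator


# Hypothetical counts (druggable = positive class), not from the paper.
tp, tn, fp, fn = 90, 80, 20, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.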


Subjects
Proteins , Software , Amino Acid Sequence , Position-Specific Scoring Matrices , Biological Evolution , Computational Biology/methods
4.
J Equine Vet Sci ; 136: 105052, 2024 May.
Article in English | MEDLINE | ID: mdl-38531516

ABSTRACT

Quarter horses (QH), a prominent athletic breed in Brazil, are affected by muscular genetic disorders such as myosin heavy chain myopathy (MYHM), polysaccharide storage myopathy (PSSM1), hyperkalemic periodic paralysis (HyPP), and malignant hyperthermia (MH). Bull-catching (vaquejada), primarily involving QH, is a significant equestrian sport in Brazil. Since the allele frequencies (AF) of the MYHM, PSSM1, HyPP, and MH variants in vaquejada QH remain unknown, this study evaluated the AF in 129 QH vaquejada athletes, specifically from the Brazilian Northeast. The variants were observed exclusively in heterozygosity. The MYHM variant exhibited the highest AF (0.04 ± 0.01), followed by the PSSM1 variant (0.01 ± 0.01) and the HyPP variant (0.004 ± 0.01), while the MH variant was not identified in this study. This study represents the first identification of these variants in vaquejada QH, emphasizing the need to implement measures to prevent the transmission of pathogenic alleles and reduce the occurrence of clinical cases of these genetic diseases.
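To make the allele frequency figures concrete, the sketch below shows the standard calculation for a diploid sample: each animal carries two alleles, so a heterozygote contributes one copy of the variant out of 2N alleles. Only the sample size of 129 comes from the abstract. The heterozygote count is a hypothetical value, and the binomial standard error is an assumption, since the abstract does not state how the ± values were obtained.

```python
import math


def allele_frequency(n_het, n_hom_alt, n_animals):
    """Variant allele frequency in a diploid sample of n_animals."""
    if n_animals <= 0:
        raise ValueError("n_animals must be positive")
    if n_het + n_hom_alt > n_animals:
        raise ValueError("more carriers than animals")
    return (n_het + 2 * n_hom_alt) / (2 * n_animals)


def binomial_standard_error(freq, n_animals):
    """Binomial standard error over 2N alleles (assumed error model)."""
    return math.sqrt(freq * (1 - freq) / (2 * n_animals))


n_animals = 129  # sample size stated in the abstract
n_het = 10       # hypothetical heterozygote count, not reported in the abstract
af = allele_frequency(n_het, 0, n_animals)  # variants seen only in heterozygosity
se = binomial_standard_error(af, n_animals)
print(f"AF={af:.3f}  SE={se:.3f}")
```

Because no homozygous carriers were observed, the allele frequency here is simply half the carrier frequency.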


Subjects
Gene Frequency , Horse Diseases , Horses , Muscular Diseases , Muscular Diseases/congenital , Muscular Diseases/genetics , Muscular Diseases/veterinary , Animals , Horses/genetics , Horse Diseases/genetics , Male , Female , Brazil , Hyperkalemic Periodic Paralysis/genetics , Hyperkalemic Periodic Paralysis/veterinary , Malignant Hyperthermia/genetics , Malignant Hyperthermia/veterinary , Polysaccharides/metabolism , Genetic Testing