Results 1 - 11 of 11
1.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-39003530

ABSTRACT

Protein function prediction is critical for understanding cellular physiological and biochemical processes, and it opens up new possibilities for advances in fields such as disease research and drug discovery. Over the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed, so a systematic review and comparison of these methods is necessary. In this study, we divide these methods into four categories: sequence-based methods, 3D structure-based methods, PPI network-based methods and hybrid information-based methods. We discuss their advantages and disadvantages, and then comprehensively evaluate and compare their performance. Finally, we discuss the challenges and opportunities in this field.


Subjects
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Humans , Sequence Analysis, Protein/methods , Algorithms
2.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35134113

ABSTRACT

Protein remote homology detection is one of the most fundamental research tools for protein structure and function prediction. Most search methods for protein remote homology detection are evaluated on the Structural Classification of Proteins-extended (SCOPe) benchmark, but these methods ignore the diverse hierarchical structural relationships between the query protein and candidate proteins. To further improve the predictive performance of protein remote homology detection, a search framework based on predicted protein hierarchical relationships (PHR-search) is proposed. In the PHR-search framework, superfamily-level prediction information is obtained by extracting the local and global features of the Hidden Markov Model (HMM) profile through a convolutional neural network, and it is converted to fold-level and class-level prediction information according to the hierarchical relationships of SCOPe. Based on these predicted protein hierarchical relationships, a filtering strategy and a re-ranking strategy are used to construct the two-level search of PHR-search. Experimental results show that the PHR-search framework achieves state-of-the-art performance when applied to five basic search methods: HHblits, JackHMMER, PSI-BLAST, DELTA-BLAST and PSI-BLASTexB. Furthermore, a web server for PHR-search has been established, which can be accessed at http://bliulab.net/PHR-search.


Subjects
Algorithms , Proteins , Proteins/chemistry , Sequence Analysis, Protein/methods
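
The conversion from superfamily-level to fold-level and class-level predictions mentioned in this abstract can be illustrated with SCOPe's dotted sccs identifiers, where `a.1.1` denotes class `a`, fold `a.1` and superfamily `a.1.1`. The following is a minimal sketch and not the PHR-search implementation: the function name `roll_up` and the choice to sum superfamily scores within each fold and class are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple


def roll_up(superfamily_scores: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Aggregate superfamily scores (keys like 'a.1.1') to fold and class level."""
    fold_scores: Dict[str, float] = defaultdict(float)
    class_scores: Dict[str, float] = defaultdict(float)
    for sccs, score in superfamily_scores.items():
        cls, fold, _superfamily = sccs.split(".")
        fold_scores[f"{cls}.{fold}"] += score  # assumption: sum within a fold
        class_scores[cls] += score             # assumption: sum within a class
    return dict(fold_scores), dict(class_scores)
```

Calling `roll_up` on a dictionary of superfamily scores returns two dictionaries, one keyed by fold (`a.1`) and one keyed by class (`a`), which could then drive a filtering or re-ranking step.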
3.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36883697

ABSTRACT

MOTIVATION: Protein function annotation is fundamental to understanding biological mechanisms. The abundant genome-scale protein-protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. As PPI networks and biological attributes describe protein functions from different perspectives, it is highly challenging to cross-fuse them for protein function prediction. Recently, several methods have combined PPI networks and protein attributes via graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. In addition, GNNs that stack many layers may suffer from over-smoothing of node representations. RESULTS: We develop a novel protein function prediction method, CFAGO, to integrate single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder-decoder architecture to capture a universal protein representation of the two sources. It is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show that CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90% and 11.68% in terms of m-AUPR, M-AUPR and Fmax, respectively, demonstrating that cross-fusion by a multi-head attention mechanism can greatly improve protein function prediction. We further evaluate the quality of the captured protein representations in terms of the Davies-Bouldin score; the results show that protein representations cross-fused by the multi-head attention mechanism are at least 2.7% better than the original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction. AVAILABILITY AND IMPLEMENTATION: The source code of CFAGO and the experimental data are available at: http://bliulab.net/CFAGO/.


Subjects
Algorithms , Protein Interaction Mapping , Animals , Humans , Mice , Protein Interaction Mapping/methods , Neural Networks, Computer , Software , Protein Interaction Maps , Proteins/metabolism
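
Fmax, one of the metrics reported in this abstract, is commonly computed as the best protein-centric F-measure over a sweep of score thresholds. The sketch below follows that common definition and is not taken from the CFAGO code: the function name `fmax`, the 100-step threshold grid, and averaging recall over all listed proteins are assumptions.

```python
from typing import Dict, List, Set


def fmax(y_true: List[Set[str]], y_score: List[Dict[str, float]], steps: int = 100) -> float:
    """Protein-centric Fmax over evenly spaced thresholds in (0, 1]."""
    best = 0.0
    for k in range(1, steps + 1):
        threshold = k / steps
        precisions = []   # only proteins with at least one prediction count here
        recall_sum = 0.0  # assumption: recall is averaged over every protein
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= threshold}
            hits = len(predicted & truth)
            if predicted:
                precisions.append(hits / len(predicted))
            if truth:
                recall_sum += hits / len(truth)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(y_true)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Each element of `y_true` is the set of true function terms for one protein, and the matching element of `y_score` maps candidate terms to predicted scores in [0, 1].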
4.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-32427287

ABSTRACT

Protein remote homology detection is a fundamental and important task for protein structure and function analysis. Several search methods have been proposed to improve the detection of remote homologues and the accuracy of ranking lists. The position-specific scoring matrix (PSSM) profile and hidden Markov model (HMM) profile can contribute to improving the performance of state-of-the-art search methods. In this paper, we improved the profile-link (PL) information for constructing PSSM or HMM profiles, and proposed a PL-based search method (PL-search). In PL-search, more robust PLs are constructed through double-link and iterative extending strategies, and an accurate similarity score for sequence pairs is calculated from the two-level Jaccard distance for remote homologues. We tested our method on two widely used benchmark datasets. Our results show that, whether HHblits, JackHMMER or position-specific iterated BLAST is used, PL-search clearly improves search performance in terms of both ranking quality and the number of detected remote homologues. For ease of use, both a stand-alone tool and a web server for PL-search are provided, which can be accessed at http://bliulab.net/PL-search/.


Subjects
Proteins/metabolism , Algorithms , Databases, Protein , Datasets as Topic , Position-Specific Scoring Matrices , Protein Conformation , Proteins/chemistry , Sequence Analysis, Protein/methods
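
The abstract above does not define its two-level Jaccard distance, so the sketch below shows only the ordinary Jaccard distance between two sets, which is the building block that name refers to. Treating each sequence's profile links as a set of hit identifiers, and the identifiers themselves, are assumptions made for illustration.

```python
from typing import Set


def jaccard_distance(links_a: Set[str], links_b: Set[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = links_a | links_b
    if not union:
        return 0.0
    return 1.0 - len(links_a & links_b) / len(union)


# Hypothetical hit lists for two query sequences.
hits_query_1 = {"p1", "p2", "p3"}
hits_query_2 = {"p2", "p3", "p4"}
distance = jaccard_distance(hits_query_1, hits_query_2)
```

A smaller distance means the two sequences share more linked hits, which is the intuition behind using link overlap as a similarity score for remote homologues.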
5.
Bioinformatics ; 37(23): 4321-4327, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34170287

ABSTRACT

MOTIVATION: Protein remote homology detection is a challenging task in the study of protein evolutionary relationships. PSI-BLAST is an important and fundamental search method for detecting homologous proteins. Although many improved versions of PSI-BLAST have been proposed, their performance is limited by the search process of PSI-BLAST. RESULTS: To further improve the performance of PSI-BLAST for protein remote homology detection, a supervised two-layer search framework based on PSI-BLAST (S2L-PSIBLAST) is proposed. S2L-PSIBLAST consists of a two-level search: the first-level search provides high-quality search results by using the SMI-BLAST framework and a double-link strategy to filter out non-homologous protein sequences; the second-level search detects more homologous proteins by profile-link similarity, and a learning-to-rank strategy produces more accurate ranking lists for the detected protein sequences. Experimental results on the updated version of the Structural Classification of Proteins-extended benchmark dataset show that S2L-PSIBLAST not only clearly improves the performance of PSI-BLAST, but also achieves better performance than two improved versions of PSI-BLAST: DELTA-BLAST and PSI-BLASTexB. AVAILABILITY AND IMPLEMENTATION: http://bliulab.net/S2L-PSIBLAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Sequence Alignment , Proteins/chemistry , Amino Acid Sequence
6.
Bioinformatics ; 37(7): 913-920, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32898222

ABSTRACT

MOTIVATION: PSI-BLAST is one of the most important and widely used iterative tools for protein sequence search, and an accurate Position-Specific Scoring Matrix (PSSM) is key to its performance. However, PSSMs containing non-homologous information clearly reduce the performance of PSI-BLAST for protein remote homology detection. RESULTS: To further study this problem, we summarize three types of Incorrectly Selected Homology (ISH) errors in PSSMs. A new search tool, Supervised-Manner-based Iterative BLAST (SMI-BLAST), is proposed based on PSI-BLAST to correct these errors. SMI-BLAST clearly outperforms PSI-BLAST on the Structural Classification of Proteins-extended (SCOPe) dataset. Compared with PSI-BLAST on the ISH error subsets of the SCOPe dataset, SMI-BLAST detects 1.6- to 2.87-fold more remote homologous sequences, and outperforms PSI-BLAST by 35.66% in terms of ROC1 scores. Furthermore, this framework is applied to JackHMMER, DELTA-BLAST and PSI-BLASTexB, and their performance is further improved. AVAILABILITY AND IMPLEMENTATION: User-friendly web servers for SMI-BLAST, JackHMMER, DELTA-BLAST and PSI-BLASTexB are established at http://bliulab.net/SMI-BLAST/, through which users can easily obtain results without working through the mathematical details. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Amino Acid Sequence , Position-Specific Scoring Matrices , Sequence Alignment , Sequence Analysis, Protein
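
The ROC1 score cited in this abstract is usually taken as the area under the ROC curve up to the first false positive, normalized by the number of true positives. The following sketch assumes that standard ROC-n definition. The abstract does not give the authors' exact evaluation protocol, and the function name `roc_n` and the handling of lists with fewer than `n` false positives are assumptions.

```python
from typing import Optional, Sequence


def roc_n(ranked_is_homolog: Sequence[bool], n: int = 1,
          total_positives: Optional[int] = None) -> float:
    """ROC-n for a hit list sorted from best to worst score."""
    total = sum(ranked_is_homolog) if total_positives is None else total_positives
    if total == 0:
        return 0.0
    true_pos = 0
    false_pos = 0
    area = 0
    for is_homolog in ranked_is_homolog:
        if is_homolog:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # true positives ranked above this false positive
            if false_pos == n:
                break
    # Assumption: if the list ends early, missing false positives sit below all hits.
    area += (n - false_pos) * true_pos
    return area / (n * total)
```

With `n=1`, the score is simply the fraction of true homologues ranked above the first non-homologous hit.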
7.
Entropy (Basel) ; 25(1), 2022 Dec 31.
Article in English | MEDLINE | ID: mdl-36673229

ABSTRACT

In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, existing PPDS methods use federated generative models with differential privacy. Unfortunately, the existing models apply only to image or text data and not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes), and real-world datasets often have highly imbalanced data distributions. Existing methods can hardly model such scenarios because of the multimodal distributions in the decentralized continuous columns and the highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). HT-Fed-GAN has three important parts: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to solve the problem of multimodal distributions; federated conditional one-hot encoding with conditional sampling for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between data utility and privacy level. In terms of data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables, and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model on machine learning tasks.

9.
Article in English | MEDLINE | ID: mdl-38568769

ABSTRACT

As the most common complication of diabetes, diabetic retinopathy (DR) is one of the main causes of irreversible blindness. Automatic DR grading plays a crucial role in early diagnosis and intervention, reducing the risk of vision loss in people with diabetes. In recent years, various deep-learning approaches for DR grading have been proposed. Most previous DR grading models are trained on datasets of single-field fundus images, but the entire retina cannot be fully visualized in a single field of view. In addition, lesions in fundus images are scattered in location and vary greatly in appearance. To address the limitations caused by incomplete fundus features and the difficulty of obtaining lesion information, this work introduces a novel multi-view DR grading framework, which solves the problem of incomplete fundus features by jointly learning fundus images from multiple fields of view. Furthermore, the proposed model combines multi-view inputs such as fundus images and lesion snapshots. It utilizes heterogeneous convolution blocks (HCB) and scalable self-attention classes (SSAC), which enhance the ability of the model to obtain lesion information. The experimental results show that our proposed method performs better than the benchmark methods on the large-scale dataset.

10.
Med Phys ; 50(9): 5897-5912, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37470489

ABSTRACT

BACKGROUND: The electrocardiogram (ECG) is a powerful tool for studying cardiac activity and diagnosing various cardiovascular diseases, including arrhythmia. While machine learning and deep learning algorithms have been applied to ECG interpretation, there is still room for improvement. For instance, the commonly used recurrent neural networks (RNNs) rely on their previous state to update and are therefore ill-suited to parallel computing. RNNs also struggle to handle long-distance dependencies efficiently. PURPOSE: To reduce computational complexity through dimensionality reduction of ECG signals, we constructed a Transformer-based stacked auto-encoder model for ECG-based arrhythmia detection, which also overcomes the long-term dependency and limited parallelizability problems of traditional RNNs in ECG signal processing. METHODS: A Transformer-Based ECG Dimensionality Reduction Stacked Auto-encoders model is proposed for ECG-based arrhythmia detection. The Transformer encodes ECG signals into a feature matrix, which is then dimensionally reduced through four linear layers using unsupervised greedy training. This results in a low-dimensional representation of ECG features, which is subsequently classified using support vector machines (SVMs) to minimize overfitting. RESULTS: The proposed method is benchmarked on the MIT-BIH Arrhythmia database. In 10-fold cross-validation of beat-based arrhythmia detection, the average accuracy, sensitivity, specificity and F1 score of the proposed method are 99.83%, 98.84%, 99.84% and 99.13%, respectively. For record-based arrhythmia detection, in which the training and testing sets use ECG data from independent patient records, they are 88.10%, 49.79%, 91.56% and 39.95%, respectively. CONCLUSIONS: Compared to other existing ECG-based arrhythmia detection methods, our proposed approach exhibits improved detection accuracy and stronger generalization for arrhythmia beats. Additionally, the use of the record-based data division method makes our approach more suitable for clinical practice.


Subjects
Algorithms , Electrocardiography , Humans , Neural Networks, Computer , Signal Processing, Computer-Assisted , Arrhythmias, Cardiac/diagnosis
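
The gap between the beat-based and record-based results in this abstract comes from how the data are divided: a record-based split keeps every beat from one recording on the same side of the train/test boundary. The sketch below illustrates that idea only. The function name `record_based_folds`, the round-robin fold assignment, and the record identifiers are hypothetical and do not come from the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def record_based_folds(record_ids: Sequence[str],
                       n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so no record spans both sides."""
    unique_records = sorted(set(record_ids))
    # Assumption: records are assigned to folds round-robin in sorted order.
    fold_of = {rec: i % n_folds for i, rec in enumerate(unique_records)}
    for fold in range(n_folds):
        test = [i for i, rec in enumerate(record_ids) if fold_of[rec] == fold]
        train = [i for i, rec in enumerate(record_ids) if fold_of[rec] != fold]
        yield train, test


# One entry per beat, naming the (hypothetical) recording it came from.
beat_records = ["r1", "r1", "r2", "r2", "r3", "r4"]
folds = list(record_based_folds(beat_records, n_folds=2))
```

A beat-based split would instead shuffle individual beats, so beats from the same patient can land in both training and test sets, which tends to inflate the reported scores.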
11.
Sci Rep ; 6: 33483, 2016 09 19.
Article in English | MEDLINE | ID: mdl-27641752

ABSTRACT

Meiotic recombination is unevenly distributed across the genome. Genomic regions that exhibit relatively high frequencies of recombination are called hotspots, whereas those with relatively low frequencies of recombination are called coldspots. Hotspots and coldspots therefore provide useful information for studying the mechanism of recombination. In this study, we proposed a computational predictor called iRSpot-DACC to predict hot/cold spots across the yeast genome. It combines Support Vector Machines (SVMs) with a feature called dinucleotide-based auto-cross covariance (DACC), which incorporates global sequence-order information and fifteen local DNA properties into the predictor. Combined with Principal Component Analysis (PCA), its performance was further improved. Experimental results on a benchmark dataset showed that iRSpot-DACC can achieve an accuracy of 82.7%, outperforming some closely related methods.
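
A dinucleotide-based auto-cross covariance feature is typically built by mapping each overlapping dinucleotide to a physicochemical value, centering each property along the sequence, and averaging lagged products within one property (auto covariance) and between two properties (cross covariance). The sketch below follows that general recipe and is not the iRSpot-DACC code: the function name `dacc` is an assumption, and the two property tables are made-up numbers rather than real DNA property values.

```python
from itertools import product
from typing import Dict, List


def dacc(seq: str, props: List[Dict[str, float]], max_lag: int) -> List[float]:
    """Return len(props)**2 * max_lag lagged covariance features for a DNA sequence."""
    n_di = len(seq) - 1  # number of overlapping dinucleotides
    if max_lag >= n_di:
        raise ValueError("max_lag must be smaller than the number of dinucleotides")
    centered = []
    for prop in props:
        values = [prop[seq[i:i + 2]] for i in range(n_di)]
        mean = sum(values) / n_di
        centered.append([v - mean for v in values])
    features = []
    for lag in range(1, max_lag + 1):
        n_terms = n_di - lag
        for u1 in range(len(props)):      # u1 == u2: auto covariance
            for u2 in range(len(props)):  # u1 != u2: cross covariance
                total = sum(centered[u1][i] * centered[u2][i + lag]
                            for i in range(n_terms))
                features.append(total / n_terms)
    return features


DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
# Made-up property tables for illustration only.
TOY_PROPS = [
    {d: i / 15 for i, d in enumerate(DINUCLEOTIDES)},
    {d: float(d.count("G") + d.count("C")) for d in DINUCLEOTIDES},
]
```

With fifteen properties, as in the abstract, the same function would return 225 values per lag, which is one reason a dimensionality reduction step such as PCA is a natural companion.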
