Results 1-7 of 7
1.
Article in English | MEDLINE | ID: mdl-39008396

ABSTRACT

Protein classification is a crucial field in bioinformatics. A comprehensive tool that performs feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, the literature lacks tools that integrate all of these functionalities. This paper presents iProps, a Python-based software package designed to meet these requirements. iProps supports feature extraction, feature evaluation, automated machine learning, and interpretation of classification models. First, iProps leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, and CKSAAGP-E; it also implements the calculation of 17 other numerical features. iProps further provides feature combination operations that generate hybrid features from the features above, along with data-balancing sampling and built-in classifier settings, among other functionalities. It can therefore identify the most effective protein class recognition feature from a large set of candidates, using three automated machine learning algorithms to select the optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments was conducted on two well-established datasets. The results showed that the software achieved superior recognition performance in every case.
Beyond bioinformatics, iProps offers data analysis tools, built on its automated machine learning and model interpretation capabilities, that are useful across various disciplines. As an open-source platform, iProps is readily accessible and has an intuitive user interface, so it can be used even by people without a programming background. The source code is available at https://github.com/LigosQ/iProps and https://gitee.com/LigosQ/iProps.
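iProps' features are described as independent of sequence length. As a minimal sketch of what that means, the Python function below computes a plain CKSAAGP-style vector (composition of k-spaced amino acid group pairs). It is not iProps' CKSAAGP-E extension: the five-group partition, the `max_gap` default, and the function name are assumptions for illustration.

```python
from itertools import product

# Assumed five-group physicochemical partition commonly used for
# CKSAAGP-style descriptors; iProps' own CKSAAGP-E grouping may differ.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
GROUP_PAIRS = list(product(GROUPS, repeat=2))  # 25 ordered group pairs


def cksaagp(sequence: str, max_gap: int = 2) -> list[float]:
    """Return a fixed-length vector of k-spaced group-pair frequencies.

    For each gap k in 0..max_gap, pairs (seq[i], seq[i + k + 1]) are counted
    by group and divided by the number of counted pairs, so the vector length
    (25 * (max_gap + 1)) does not depend on the sequence length.
    """
    features = []
    for k in range(max_gap + 1):
        counts = dict.fromkeys(GROUP_PAIRS, 0)
        total = 0
        for i in range(len(sequence) - k - 1):
            a = AA_TO_GROUP.get(sequence[i])
            b = AA_TO_GROUP.get(sequence[i + k + 1])
            if a is None or b is None:  # skip non-standard residues such as X
                continue
            counts[(a, b)] += 1
            total += 1
        features.extend(counts[p] / total if total else 0.0 for p in GROUP_PAIRS)
    return features
```

For example, `cksaagp("ACDEFGHIK")` and a 26-residue sequence both yield 75 values, which is what lets proteins of different lengths share one feature matrix.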

2.
Comput Biol Med ; 176: 108534, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38754217

ABSTRACT

Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that effectively identifies antifreeze proteins. First, features were extracted from three aspects: reduction properties, scalable pseudo amino acid composition, and physicochemical properties, and the three categories were combined into a hybrid feature set. Next, LightGBM, XGBoost, and RandomForest base classifiers were trained on the training set, and their outputs were passed to a logistic regression meta-classifier, yielding a stacking algorithm. The proposed algorithm was tested on the test set and on an independent validation set. It achieved a recognition accuracy of 98.3%, and an accuracy of 98.5% on the validation set. Finally, we analyzed from multiple perspectives why the numerical features achieved high recognition capability. Dimensionality reduction with two- and three-dimensional views revealed separability between positive and negative samples, and the three-dimensional protein structures further showed significant differences in the relevant features between the two classes. Analysis of the classifier revealed that Hr*Hr, HrHr, and Sc-PseAAC_1, 188D(152,116,57,183) were among the seven most important numerical features affecting recognition. For Hr*Hr and HrHr, supporting sequence-level evidence for the reduction dictionary was found through conserved region analysis, multiple sequence alignment, and conservative amino acid substitution. Moreover, comparing feature importance before and after the reduction confirmed that the reduction dictionary is effective at increasing feature importance.
A decision tree model was used to distinguish dipeptides associated with the physicochemical properties of His(H), Ile(I), Leu(L), and Lys(K) from other dipeptides. We also analyzed the other seven important features, and the data analysis confirmed that hydrophobicity, secondary structure, charge properties, van der Waals forces, and solvent accessibility also affect the antifreeze capability of proteins.


Subjects
Algorithms , Antifreeze Proteins , Antifreeze Proteins/chemistry , Amino Acids/chemistry , Databases, Protein , Computational Biology/methods
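The antifreeze-protein study stacks LightGBM, XGBoost, and RandomForest base classifiers under a logistic regression meta-classifier. To stay dependency-free, the sketch below swaps in two toy base learners (a nearest-centroid scorer and a one-feature threshold stump, both hypothetical stand-ins) and fits the logistic meta-learner by plain gradient descent. It shows a common way to build the meta-learner's training set, using out-of-fold base scores so the meta-learner never sees scores for rows a base model was trained on; whether the paper used out-of-fold scores is not stated, and the fold count, learning rate, and epoch count are assumptions.

```python
import math


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def nearest_centroid(X, y):
    """Toy base learner (stand-in): score near 1 when x is nearer the class-1 centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5

    return score


def threshold_stump(feature):
    """Toy base-learner factory (stand-in): split one feature midway between class means."""
    def learner(X, y):
        def mean(label):
            vals = [x[feature] for x, t in zip(X, y) if t == label]
            return sum(vals) / len(vals)

        m0, m1 = mean(0), mean(1)
        cut, sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
        return lambda x: 1.0 if sign * (x[feature] - cut) > 0 else 0.0

    return learner


def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1]) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])


def stack(X, y, base_learners, k=5):
    """Fit the meta-learner on out-of-fold base scores, then refit bases on all rows.

    Folds are interleaved (row i goes to fold i % k); every training split
    must contain both classes for the toy learners above.
    """
    n = len(X)
    oof = [[0.0] * len(base_learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for m, learner in enumerate(base_learners):
            model = learner(Xtr, ytr)
            for i in range(fold, n, k):  # rows held out in this fold
                oof[i][m] = model(X[i])
    meta = fit_logistic(oof, y)
    full = [learner(X, y) for learner in base_learners]
    return lambda x: meta([model(x) for model in full])
```

Replacing the two toy learners with real gradient-boosting and random-forest models changes only the `base_learners` list; the out-of-fold bookkeeping stays the same.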
3.
PLoS One ; 18(2): e0282107, 2023.
Article in English | MEDLINE | ID: mdl-36854040

ABSTRACT

In Hounsfield unit threshold-based segmentation, juxtapleural nodules are excluded from the segmented lung region. To re-include those regions in the lung region, this study presents a new approach based on the scale-invariant feature transform and gradient vector flow models. First, the scale-invariant feature transform was used to detect all scale-invariant points in the binary lung region, and the boundary points in the neighborhood of each scale-invariant point were collected to form supportive boundary lines. Second, a Fourier descriptor was used to obtain a characteristic representation of each supportive boundary line, and its spectral energy was used to identify the supportive boundaries that need correction. Third, the gradient vector flow snake method was applied to correct the identified supportive boundaries with a smooth profile curve, giving an ideal correction edge in those regions. Finally, the proposed method was evaluated through experiments on multiple real computed tomography images. The results and their robustness showed that the proposed method can correct the juxtapleural region precisely.


Subjects
Cerebellar Vermis , Thorax , Tomography, X-Ray Computed , Lung/diagnostic imaging
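The abstract says a Fourier descriptor represents each supportive boundary line and that spectral energy flags the lines needing correction, without giving the exact criterion. The sketch below is one hypothetical reading: encode the line as complex numbers x + iy, subtract the straight chord between its endpoints, take a naive O(n^2) DFT, and report the share of energy above a cutoff frequency. The chord subtraction, the `keep` cutoff, and any threshold applied to the ratio are assumptions, not the paper's parameters.

```python
import cmath
import math


def detrended_descriptor(points):
    """Fourier descriptor of an open boundary line (naive DFT).

    Points are encoded as complex numbers x + iy. The straight chord from the
    first to the last point is subtracted first, so a perfectly straight line
    maps to all zeros and only deviations from the chord carry energy.
    """
    z = [complex(x, y) for x, y in points]
    n = len(z)
    start, end = z[0], z[-1]
    d = [z[t] - (start + (end - start) * t / (n - 1)) for t in range(n)]
    return [sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def high_frequency_ratio(points, keep=2, eps=1e-12):
    """Share of spectral energy above frequency `keep`.

    Frequencies k and n - k are pooled via min(k, n - k); k = 0 counts as low.
    """
    coeffs = detrended_descriptor(points)
    n = len(coeffs)
    energy = [abs(c) ** 2 for c in coeffs]
    total = sum(energy)
    if total < eps:  # (near-)straight line: nothing to correct
        return 0.0
    high = sum(e for k, e in enumerate(energy) if min(k, n - k) > keep)
    return high / total
```

Under this sketch a line with a sharp two-point notch scores about 0.42, while a smooth arc scores close to 0; a threshold between the two would select lines for the snake-based correction step.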
4.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2149-2157, 2022.
Article in English | MEDLINE | ID: mdl-34061749

ABSTRACT

Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. A fast computational method for identifying secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines, so we developed such a method. A reduced amino acid alphabet was selected to recode the original protein sequence, and a feature synthesis method was used to combine three different types of feature information. A random forest classifier was then used to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments on the benchmark dataset showed that the overall accuracy of the proposed method exceeded 97.8% under 10-fold cross-validation. The reduction schemes and the performance of the features are also analyzed and discussed.


Subjects
Malaria , Parasites , Algorithms , Amino Acid Sequence , Animals , Malaria/parasitology , Proteins
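The malaria study recodes each sequence with a selected reduced alphabet before extracting features, but the abstract does not name the scheme. A minimal sketch of the recoding step follows; the five-cluster scheme is purely hypothetical. Each residue is replaced by its cluster's representative letter, and the reduced-alphabet composition gives a fixed-length feature block. How the paper synthesizes its three feature types is not described, so that step is not shown.

```python
from collections import Counter

# Hypothetical five-cluster reduction scheme (illustration only; not the
# alphabet selected in the paper). Each cluster is represented by its first letter.
CLUSTERS = ["AGST", "CFILMVWY", "DE", "HKR", "NPQ"]


def recode(sequence: str, clusters: list[str]) -> str:
    """Replace every residue by its cluster's representative letter."""
    mapping = {aa: cluster[0] for cluster in clusters for aa in cluster}
    # Residues outside the mapping (e.g. 'X', 'U') are dropped.
    return "".join(mapping[aa] for aa in sequence if aa in mapping)


def reduced_composition(reduced: str, clusters: list[str]) -> list[float]:
    """Fixed-length vector: frequency of each representative letter."""
    counts = Counter(reduced)
    n = len(reduced) or 1  # avoid division by zero for empty input
    return [counts[c[0]] / n for c in clusters]
```

For example, `recode("ACDK", CLUSTERS)` gives `"ACDH"`, whose composition is `[0.25, 0.25, 0.25, 0.25, 0.0]`.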
5.
Proteomics ; 21(15): e2100017, 2021 08.
Article in English | MEDLINE | ID: mdl-34009737

ABSTRACT

Antioxidant proteins can terminate free-radical chain reactions and protect cells from damage. To identify antioxidant proteins rapidly, a computational model was proposed based on an optimized recoding scheme, sequence information, and machine learning methods. First, over 600 recoding schemes were collected to build a scheme set. Then, each original sequence was recoded into a reduced representation whose g-gap dipeptides (g = 0, 1, 2) were used as protein features. A random forest (RF) method was used to evaluate the classification ability of the resulting dipeptide features. After all schemes were evaluated, the scheme with the best predictive performance was chosen as the optimized reduction scheme. Finally, a grid search was used to select a better parameter combination for the RF classifier. In the experiments, the method correctly recognized 90.13-99.87% of the antioxidant samples, and other experimental results also showed that it identifies antioxidant proteins efficiently. We also developed a web server that is freely accessible to researchers.


Subjects
Antioxidants , Proteins , Electrolytes , Machine Learning
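The antioxidant study uses g-gap dipeptides (g = 0, 1, 2) of the recoded sequence as features. A minimal sketch of that feature, assuming the sequence has already been recoded into a reduced alphabet passed as a string; the function name and the default gaps are illustrative.

```python
from collections import Counter
from itertools import product


def g_gap_dipeptides(reduced: str, alphabet: str, gaps=(0, 1, 2)) -> list[float]:
    """Frequencies of letter pairs (s[i], s[i + g + 1]) for each gap g.

    g = 0 gives ordinary adjacent dipeptides. With an alphabet of size r the
    vector has len(gaps) * r * r entries, independent of sequence length.
    """
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for g in gaps:
        counts = Counter(reduced[i] + reduced[i + g + 1]
                         for i in range(len(reduced) - g - 1))
        total = sum(counts.values())
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features
```

With the full 20-letter alphabet this gives 3 * 400 = 1200 features, whereas a 5-letter reduced alphabet gives 3 * 25 = 75. In a scheme search like the one described, each candidate scheme would be used to recode the sequences, these features would be computed and scored (the paper uses a random forest), and the best-scoring scheme would be kept.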
6.
Article in English | MEDLINE | ID: mdl-32432088

ABSTRACT

The thermostability of proteins is a key factor in enzyme engineering, and a method that can distinguish thermophilic from non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method that combines mixed features and machine learning for this recognition task. An amino acid reduction scheme was adopted to recode the amino acid sequence. Then, physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated into a mixed feature set, which was processed with correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods were trained and evaluated on a dataset of 500 proteins randomly sampled from 915 thermophilic proteins and 500 randomly sampled from 793 non-thermophilic proteins. Under 10-fold cross-validation, 98.2% of thermophilic and non-thermophilic proteins were correctly identified. Moreover, our analysis of the retained and removed features revealed crucial, unimportant, and insensitive elements, and it also provided essential information for enzyme design.
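The thermophilic-protein study removes redundant information with correlation analysis before feature selection and PCA, but the criterion is not given. A minimal sketch of one common variant, a greedy pairwise Pearson filter; the 0.9 threshold, the keep-first ordering, and the function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / math.sqrt(va * vb)


def drop_correlated(columns, threshold=0.9):
    """Keep a column unless |r| with an already-kept column exceeds threshold.

    `columns` maps feature name -> list of values (one per protein).
    Returns the kept feature names in input order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For instance, with `f2` an exact multiple of `f1` and `f3` only weakly correlated (r = -0.4), the filter keeps `["f1", "f3"]`. Because it is greedy, which of two correlated features survives depends on column order.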

7.
Comput Math Methods Med ; 2015: 789485, 2015.
Article in English | MEDLINE | ID: mdl-26089976

ABSTRACT

To extract the lung region from CT images more accurately, a model comprising lung region extraction and edge boundary correction is proposed. First, a new edge detection function is presented based on classic structure tensor theory. Second, the initial lung mask is automatically extracted by an improved active contour model that combines global intensity information, local intensity information, the new edge information, and an adaptive weight. Notably, the objective function of the improved model is converted into a convex form, which allows the proposed model to reach the global minimum. The central airway is then excluded according to spatial context information and the position of each segmented region relative to the ribs. Third, a mesh and fractal theory are used to detect the boundary surrounding the juxtapleural nodule. Finally, a geometric active contour model is employed to correct the detected boundary and re-include juxtapleural nodules. We also evaluated the performance of the proposed segmentation and correction model by comparing it with popular counterparts. Its computational efficiency and robustness show that the model can correct the lung boundary reliably and reproducibly.


Subjects
Lung/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/statistics & numerical data , Algorithms , Computational Biology , Humans , Imaging, Three-Dimensional/methods , Imaging, Three-Dimensional/statistics & numerical data , Models, Anatomic , Models, Statistical
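The last abstract detects the boundary around a juxtapleural nodule with a mesh and fractal theory, without specifying the measure. One standard fractal measure computed on a mesh is the box-counting dimension; the sketch below estimates it for a set of integer boundary pixels. Using it as the paper's detection criterion is an assumption, as are the box sizes.

```python
import math


def box_count(points, size):
    """Number of size x size mesh boxes containing at least one point."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes=(1, 2, 4, 8, 16)):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A straight 64-pixel segment gives a dimension of 1 and a filled 64 x 64 square gives 2. A boundary with irregular indentations tends to fall between these values, which is the kind of signal a fractal-based detector could threshold.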