1.
Appl Radiat Isot ; 197: 110803, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37054662

ABSTRACT

Ferrites are ceramic oxide materials consisting mainly of iron oxide, and they have become commercially and technologically important materials with a multitude of uses and applications. Protection against mixed neutron-gamma radiation is crucial in several nuclear applications. From this standpoint, the mass attenuation coefficient, radiation protection efficiency and transmission factor of several ferrites, namely barium, strontium, manganese, copper and cadmium ferrite, have been computed using Geant4 and FLUKA simulations. Based on the simulated mass attenuation coefficient, other significant parameters such as linear attenuation coefficient, effective atomic and electron number, conductivity, half value layer, and mean free path were calculated for the selected ferrite materials. The Monte Carlo geometry was validated by comparing the mass attenuation coefficient results with standard WinXCom data. Gamma ray exposure buildup factors were computed using the geometric progression fitting formula for the chosen ferrites in the energy range 0.015-15 MeV at penetration depths up to 40 mfp. The findings of the present work reveal that, among the studied ferrites, barium ferrite and copper ferrite possess superior gamma ray and fast neutron attenuation capability, respectively. The present work provides a comprehensive investigation of the selected iron oxides in the field of neutron and gamma ray attenuation.
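
The abstract above derives several shielding quantities from the mass attenuation coefficient. As a minimal sketch, assuming the standard narrow-beam (Beer-Lambert) definitions rather than the authors' own workflow, the relations can be written as follows. The function name and the numeric inputs are hypothetical and are not values from the paper.

```python
import math

def shielding_parameters(mass_atten_cm2_per_g, density_g_per_cm3, thickness_cm):
    """Derive common shielding quantities from a mass attenuation coefficient.

    Assumes narrow-beam exponential attenuation (no buildup factor).
    """
    # Linear attenuation coefficient: mu = (mu/rho) * rho, in 1/cm
    mu = mass_atten_cm2_per_g * density_g_per_cm3
    # Transmission factor I/I0 through a slab of the given thickness
    transmission = math.exp(-mu * thickness_cm)
    return {
        "linear_attenuation_per_cm": mu,
        "half_value_layer_cm": math.log(2) / mu,   # thickness that halves intensity
        "mean_free_path_cm": 1.0 / mu,             # average distance between interactions
        "transmission_factor": transmission,
        "protection_efficiency_percent": (1.0 - transmission) * 100.0,
    }

# Hypothetical inputs for illustration only (not taken from the paper)
params = shielding_parameters(0.08, 5.0, 2.0)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Buildup factors, which the abstract computes separately with a geometric progression fit, are deliberately left out of this sketch.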

2.
Sensors (Basel) ; 23(2)2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36679455

ABSTRACT

Many individuals worldwide die as a result of inadequate procedures for prompt illness identification and subsequent treatment. A valuable life can be saved, or at least extended, with the early identification of serious illnesses such as various cancers and other life-threatening conditions. The development of the Internet of Medical Things (IoMT) has made it possible for healthcare technology to offer the general public efficient medical services and make a significant contribution to patients' recoveries. By using IoMT to diagnose and examine BreakHis v1 400× breast cancer histology (BCH) scans, disorders may be quickly identified and appropriate treatment can be given to a patient. Imaging equipment capable of auto-analyzing acquired pictures can be used to achieve this. However, the majority of deep learning (DL)-based image classification approaches have a large number of parameters and are unsuitable for application in IoMT-centered imaging sensors. The goal of this study is to create a lightweight deep transfer learning (DTL) model that is suited for BCH scan examination and has a good level of accuracy. In this study, a lightweight DTL-based model, "MobileNet-SVM", which is a hybrid of MobileNet and a Support Vector Machine (SVM), is presented for auto-classifying BreakHis v1 400× BCH images. When evaluated on a real dataset of BreakHis v1 400× BCH images, the suggested technique achieved an accuracy of 100% on the training dataset. It also obtained an accuracy of 91% and an F1-score of 91.35 on the test dataset. Considering how complicated BCH scans are, the findings are encouraging. In addition to having a high degree of precision, the MobileNet-SVM model is well suited to IoMT imaging equipment. According to the simulation findings, the suggested model requires little computation time.


Subjects
Internet of Things; Support Vector Machine; Humans; Diagnostic Imaging; Radionuclide Imaging; Internet
3.
Med Biol Eng Comput ; 60(10): 2877-2897, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35948841

ABSTRACT

Numerous studies have been conducted to elucidate the relation of tumor proximity to cancer prognosis and treatment efficacy in colorectal cancer. However, the molecular pathways and prognoses of left- and right-sided colorectal cancers are different, and this difference has not been fully investigated at the genomic level. In this study, a set of data science approaches, including six feature selection methods and three classification models, were used to predict tumor location from gene expression profiles. Specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC) were used to evaluate the classification ability. Gene ontology enrichment analysis was performed with the Gene Ontology PANTHER Classification System. For the most significant 50 genes, protein-protein interactions and drug-gene interactions were analyzed using the GeneMANIA, CytoScape, CytoHubba, MCODE, and DGIdb databases. The highest classification accuracy (90%) is achieved with the most significant 200 genes when the ensemble-decision tree classification model is used with the ReliefF feature selection method. Molecular pathways and drug interactions are investigated for the most significant 50 genes. It is concluded that a machine-learning-based approach could be useful to discover the significant genes that may have an important role in the development of new therapies and drugs for colorectal cancer.


Subjects
Colorectal Neoplasms; Machine Learning; Colorectal Neoplasms/drug therapy; Colorectal Neoplasms/genetics; Gene Ontology; Humans
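
The four evaluation metrics named in the abstract of entry 3 (specificity, sensitivity, accuracy, and MCC) all follow from a binary confusion matrix. The sketch below uses the textbook definitions. The function name and the counts are hypothetical and do not reproduce the study's results.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy, mcc) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

# Hypothetical counts for a left- vs right-sided classifier (illustration only)
sens, spec, acc, mcc = binary_metrics(tp=45, tn=45, fp=5, fn=5)
print(sens, spec, acc, mcc)
```

MCC is often reported alongside accuracy because it stays informative when the two classes are imbalanced.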
4.
Breast Cancer Res Treat ; 193(2): 331-348, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35338412

ABSTRACT

PURPOSE: Triple-negative breast cancer (TNBC) is the most aggressive subtype of breast cancer that is frequently treated with chemotherapy. However, many patients exhibit either de novo chemoresistance or ultimately develop resistance to chemotherapy, leading to significantly high mortality rates. Therefore, increasing the efficacy of chemotherapy has potential to improve patient outcomes. METHODS: Here, we performed whole transcriptome sequencing (both RNA and small RNA-sequencing), coupled with network simulations and patient survival data analyses to build a novel miRNA-mRNA interaction network governing chemoresistance in TNBC. We performed cell proliferation assay, Western blotting, RNAi/miRNA mimic experiments, FN coating, 3D cultures, and ChIP assays to validate the interactions in the network, and their functional roles in chemoresistance. We developed xenograft models to test the therapeutic potential of the identified key miRNA/proteins in potentiating chemoresponse in vivo. We also analyzed several patient datasets to evaluate the clinical relevance of our findings. RESULTS: We identified fibronectin (FN1) as a central chemoresistance driver gene. Overexpressing miR-326 reversed FN1-driven chemoresistance by targeting FN1 receptor, ITGA5. miR-326 was downregulated by increased hypoxia/HIF1A and ECM stiffness in chemoresistant tumors, leading to upregulation of ITGA5 and activation of the downstream FAK/Src signaling pathways. Overexpression of miR-326 or inhibition of ITGA5 overcame FN1-driven chemotherapy resistance in vitro by inhibiting FAK/Src pathway and potentiated the efficacy of chemotherapy in vivo. Importantly, lower expression of miR-326 or higher levels of predicted miR-326 target genes was significantly associated with worse overall survival in chemotherapy-treated TNBC patients. CONCLUSION: FN1 is central in chemoresistance. 
In chemoresistant tumors, hypoxia and resulting ECM stiffness repress the expression of the tumor suppressor miRNA, miR-326. Hence, re-expression of miR-326 or inhibition of its target ITGA5 reverses FN1-driven chemoresistance making them attractive therapeutic approaches to enhance chemotherapy response in TNBCs.


Subjects
Hypoxia-Inducible Factor 1, alpha Subunit; Integrins; MicroRNAs; Triple Negative Breast Neoplasms; Cell Line, Tumor; Cell Proliferation; Gene Expression Regulation, Neoplastic; Humans; Hypoxia/genetics; Hypoxia-Inducible Factor 1, alpha Subunit/genetics; Integrins/genetics; MicroRNAs/genetics; Signal Transduction; Triple Negative Breast Neoplasms/drug therapy; Triple Negative Breast Neoplasms/genetics
5.
Curr Drug Deliv ; 18(10): 1595-1610, 2021.
Article in English | MEDLINE | ID: mdl-33645482

ABSTRACT

OBJECTIVE: The outbreak of COVID-19 caused by SARS-CoV-2 has promptly spread worldwide. This study aimed to predict mature miRNA sequences in the SARS-CoV-2 genome, their effects on protein-protein interactions in the affected cells, and gene-drug relationships to detect possible drug candidates. METHODS: Viral hairpin structure prediction, classification of hairpins, mutational examination of precursor miRNA candidate sequences, Minimum Free Energy (MFE) and regional entropy analysis, mature miRNA sequences, target gene prediction, gene ontology enrichment, and Protein-Protein Interaction (PPI) analysis, and gene-drug interactions were performed. RESULTS: A total of 62 candidate hairpins were detected by VMir analysis. Three hairpin structures were classified as true precursor miRNAs by miRBoost. Five different mutations were detected in precursor miRNA sequences in 100 SARS-CoV-2 viral genomes. Mutations slightly elevated MFE values and entropy in precursor miRNAs. Gene ontology terms associated with fibrotic pathways and immune system were found to be enriched in PANTHER, KEGG and Wiki pathway analysis. PPI analysis showed a network between 60 genes. CytoHubba analysis showed SMAD1 as a hub gene in the network. The targets of the predicted miRNAs, FAM214A, PPM1E, NUFIP2 and FAT4, were downregulated in SARS-CoV-2 infected A549 cells. CONCLUSION: miRNAs in the SARS-CoV-2 virus genome may contribute to the emergence of the Covid-19 infection by activating pathways associated with fibrosis in the cells infected by the virus and modulating the innate immune system. The hub protein between these pathways may be the SMAD1, which has an effective role in TGF signal transduction.


Subjects
Antiviral Agents/pharmacology; Epigenesis, Genetic; MicroRNAs; SARS-CoV-2/drug effects; A549 Cells; Cadherins; Humans; MicroRNAs/genetics; Nuclear Proteins; Protein Phosphatase 2C; RNA-Binding Proteins; Tumor Suppressor Proteins; COVID-19 Drug Treatment
6.
Comput Methods Programs Biomed ; 198: 105816, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33157471

ABSTRACT

BACKGROUND AND OBJECTIVE: Sepsis occurs in response to an infection in the body and can progress to a fatal stage. Detection and monitoring of sepsis require multi-step analysis, which is time-consuming, costly and requires medically trained personnel. A metric called Sequential Organ Failure Assessment (SOFA) score is used to determine the severity of sepsis. This score depends heavily on laboratory measurements. In this study, we offer a computational solution for quantitatively monitoring sepsis symptoms and organ systems state without laboratory test. To this end, we propose to employ a regression-based analysis by using only seven vital signs that can be acquired from bedside in Intensive Care Unit (ICU) to predict the exact value of SOFA score of patients before sepsis occurrence. METHODS: A model called Deep SOFA-Sepsis Prediction Algorithm (DSPA) is introduced. In this model, we combined Convolutional Neural Networks (CNN) features with Random Forest (RF) algorithm to predict SOFA scores of sepsis patients. A subset of Medical Information Mart in Intensive Care (MIMIC) III dataset is used in experiments. 5154 samples are extracted as input. Ten-fold cross validation test are carried out for experiments. RESULTS: We demonstrated that our model has achieved a Correlation Coefficient (CC) of 0.863, a Mean Absolute Error (MAE) of 0.659, a Root Mean Square Error (RMSE) of 1.23 for predictions at sepsis onset. The accuracies of SOFA score predictions for 6 hours before sepsis onset were 0.842, 0.697, and 1.308, in terms of CC, MAE and RMSE, respectively. Our model outperformed traditional machine learning and deep learning models in regression analysis. We also evaluated our model's prediction performance for identifying sepsis patients in a binary classification setup. Our model achieved up to 0.982 AUC (Area Under Curve) for sepsis onset and 0.972 AUC for 6 hours before sepsis, which are higher than those reported by previous studies. 
CONCLUSIONS: By utilizing SOFA scores, our framework facilitates the prognosis of sepsis and of the state of infected organ systems. While previous studies focused only on predicting the presence of sepsis, our model aims at providing a prognosis solution for sepsis. The SOFA score estimation process in the ICU depends on the laboratory environment. This dependence causes delays in treating patients, which in turn may increase the risk of complications. By using easily accessible non-invasive vital signs that are routinely collected in the ICU, our framework can eliminate this delay. We believe that the estimation of the SOFA score will also help health professionals to monitor organ states.


Subjects
Deep Learning; Sepsis; Humans; Intensive Care Units; Organ Dysfunction Scores; Prognosis; ROC Curve; Retrospective Studies; Sepsis/diagnosis
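
Entry 6 reports regression quality as CC, MAE, and RMSE. A minimal sketch of those three measures follows, assuming that CC means the Pearson correlation coefficient (the abstract does not say which correlation is used). The function name and the score lists are hypothetical examples, not MIMIC-III data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (cc, mae, rmse) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean square error
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    cc = cov / math.sqrt(var_t * var_p)                   # Pearson correlation (assumed)
    return cc, mae, rmse

# Hypothetical SOFA scores and predictions (illustration only)
true_scores = [2, 4, 6, 8]
predicted = [3, 4, 5, 9]
cc, mae, rmse = regression_metrics(true_scores, predicted)
print(f"CC={cc:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

Higher is better for CC, while lower is better for MAE and RMSE, which is why the abstract lists three numbers of different character side by side.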
7.
Mol Inform ; 38(7): e1800169, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30977960

ABSTRACT

Metals have crucial roles in many physiological, pathological and diagnostic processes. Metal-binding proteins, or metalloproteins, are important for metabolic functions. The three-dimensional structure that a protein reaches by folding indicates which vital function it fulfills. The prediction of metal binding in proteins can be considered a step in function assignment for new proteins, which helps to obtain functional proteins in genomic studies and is critical to protein function annotation and drug discovery. Computational predictions made with machine learning methods from data obtained from amino acid sequences are widely used in protein metal-binding prediction and various other bioinformatics fields. In this work, we present three different deep learning architectures for prediction of metal binding of histidine (HIS) and cysteine (CYS) amino acids. These architectures are as follows: 2D Convolutional Neural Network, Long Short-Term Memory and Recurrent Neural Network. Their comparison is carried out on three different sets of attributes derived from a public dataset of protein sequences. These three sets of features were extracted from the protein sequence using the PAM scoring matrix, a protein composition server, and binary representation methods. The results show that better performance for prediction of protein metal-binding sites is obtained with the Convolutional Neural Network architecture.


Subjects
Deep Learning; Metals/chemistry; Proteins/chemistry; Binding Sites
8.
J Integr Bioinform ; 15(4)2018 Oct 26.
Article in English | MEDLINE | ID: mdl-30367805

ABSTRACT

Finding similarities and differences between metagenomic samples within large repositories has been a significant issue for researchers. In recent years, content-based retrieval has been suggested by various studies from different perspectives. In this study, a content-based retrieval framework for identifying relevant metagenomic samples is developed. The framework consists of feature extraction, selection methods and similarity measures for whole metagenome sequencing samples. Performance of the developed framework was evaluated on given samples. A ground truth was used to evaluate the system performance such that retrieved samples from patients with the same disease (called positive samples) are labeled as relevant, and all others as irrelevant. The experimental results show that relevant experiments can be detected by using different fingerprinting approaches. We observed that the Latent Semantic Analysis (LSA) method is a promising fingerprinting approach for representing metagenomic samples and finding relevance among them. Source code and executable files are available at www.baskent.edu.tr/~hogul/WMS_retrieval.rar.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenome; Microbiota; Sequence Analysis, DNA/methods; Software; Algorithms; Humans
9.
IET Syst Biol ; 10(3): 87-93, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27187987

ABSTRACT

Understanding time-course regulation of genes in response to a stimulus is a major concern in current systems biology. The problem is usually approached by computational methods to model the gene behaviour or its networked interactions with the others by a set of latent parameters. The model parameters can be estimated through a meta-analysis of available data obtained from other relevant experiments. The key question here is how to find the relevant experiments which are potentially useful in analysing current data. In this study, the authors address this problem in the context of time-course gene expression experiments from an information retrieval perspective. To this end, they introduce a computational framework that takes a time-course experiment as a query and reports a list of relevant experiments retrieved from a given repository. These retrieved experiments can then be used to associate the environmental factors of query experiment with the findings previously reported. The model is tested using a set of time-course Arabidopsis microarrays. The experimental results show that relevant experiments can be successfully retrieved based on content similarity.


Subjects
Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Information Storage and Retrieval/methods; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Signal Transduction/physiology; Computer Simulation; Gene Expression Profiling/methods; Time Factors
10.
Comput Methods Programs Biomed ; 127: 174-84, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26775736

ABSTRACT

A major difficulty with chest radiographic analysis is the invisibility of abnormalities caused by the superimposition of normal anatomical structures, such as ribs, over the main tissue to be examined. Suppressing the ribs with no information loss about the original tissue would therefore be helpful during manual identification or computer-aided detection of nodules on a chest radiographic image. In this study, we introduce a two-step algorithm for eliminating rib shadows in chest radiographic images. The algorithm first delineates the ribs using a novel hybrid self-template approach and then suppresses these delineated ribs using an unsupervised regression model that takes into account the change in proximal thickness (depth) of bone in the vertical axis. The performance of the system is evaluated using a benchmark set of real chest radiographic images. The experimental results determine that proposed method for rib delineation can provide higher accuracy than existing methods. The knowledge of rib delineation can remarkably improve the nodule detection performance of a current computer-aided diagnosis (CAD) system. It is also shown that the rib suppression algorithm can increase the nodule visibility by eliminating rib shadows while mostly preserving the nodule intensity.


Subjects
Radiography, Thoracic/methods; Ribs; Diagnosis, Computer-Assisted; Humans
11.
Biosystems ; 134: 71-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26116091

ABSTRACT

Content-based retrieval of biological experiments in large public repositories is a recent challenge in computational biology and bioinformatics. The task is, in general, to search in a database using a query-by-example without any experimental meta-data annotation. Here, we consider a more specific problem that seeks a solution for retrieving relevant microRNA experiments from microarray repositories. A computational framework is proposed with this objective. The framework adapts a normal-uniform mixture model for identifying differentially expressed microRNAs in microarray profiling experiments. A rank-based thresholding scheme is offered to binarize real-valued experiment fingerprints based on differential expression. An effective similarity metric is introduced to compare categorical fingerprints, which in turn infers the relevance between two experiments. Two different views of experimental relevance are evaluated, one for disease association and another for embryonic germ layer, to discern the retrieval ability of the proposed model. To the best of our knowledge, the experiment retrieval task is investigated for the first time in the context of microRNA microarrays.


Subjects
MicroRNAs/genetics; Oligonucleotide Array Sequence Analysis; Empirical Research
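
Entry 11 describes binarizing real-valued experiment fingerprints by rank and then comparing the resulting fingerprints with a similarity metric. The abstract specifies neither the exact thresholding rule nor the metric, so the sketch below makes two loudly stated assumptions: a top-k rank cutoff, and the Tanimoto (Jaccard) coefficient as a stand-in similarity. All names and scores are hypothetical.

```python
def binarize_by_rank(scores, top_k):
    """Mark the top_k highest-scoring positions with 1 and the rest with 0.

    Assumption: a simple top-k rank cutoff; the paper's actual scheme may differ.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_k])
    return [1 if i in selected else 0 for i in range(len(scores))]

def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints (stand-in metric, assumed)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Hypothetical differential-expression scores for five microRNAs in two experiments
experiment_a = [0.9, 0.1, 0.8, 0.3, 0.05]
experiment_b = [0.7, 0.6, 0.2, 0.1, 0.9]
fp_a = binarize_by_rank(experiment_a, top_k=2)
fp_b = binarize_by_rank(experiment_b, top_k=2)
print(fp_a, fp_b, tanimoto(fp_a, fp_b))
```

In a retrieval setting, a query experiment would be compared against every fingerprint in the repository and the results sorted by similarity.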
12.
Biosystems ; 134: 37-42, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26093049

ABSTRACT

We introduce a novel web-based tool, miSEA, for evaluating the enrichment of relevant microRNA sets from microarray and miRNA-Seq experiments on paired samples, e.g. control vs. treatment. In addition to a group of previously annotated microRNA sets embedded in the system, this tool enables users to import new microRNA sets obtained from their own research. miSEA allows users to select from a large variety of microRNA grouping categories, such as family classification, disease association, common regulation, and genome coordinates, based on their requirements. miSEA therefore provides a knowledge-driven representation scheme for microRNA experiments. The usability of this platform was assessed with a cancer type-classification task performed on a set of real microRNA expression profiling experiments. The miSEA web server is available at http://www.baskent.edu.tr/~hogul/misea.


Subjects
MicroRNAs/genetics
13.
Methods Mol Biol ; 1107: 243-56, 2014.
Article in English | MEDLINE | ID: mdl-24272442

ABSTRACT

Inferring microRNA (miRNA) functions and activities has been extremely important to understand their system-level roles and the mechanisms behind the cellular behaviors of their target genes. This chapter first details methodologies necessary for prediction of function and activity. It then introduces the computational methods available for investigation of sequence and experimental data and for analysis of the information flow mediated through miRNAs.


Subjects
Computational Biology; MicroRNAs/physiology
14.
Mol Inform ; 33(5): 382-7, 2014 May.
Article in English | MEDLINE | ID: mdl-27485893

ABSTRACT

We present a software tool, called TriClust, for multi-way analysis of gene expression data from paired conditions of multiple organisms. The analysis is based on a new concept called triclustering, which is an extension of biclustering over a third dimension that represents the organism where the microarray experiment is performed. TriClust provides a comprehensive analysis of co-regulated genes under a subset of experimental conditions over multiple organisms. The results are visualized using heat-maps and the Gene Ontology (GO) term enrichment statistics. The experimental results indicate that TriClust can successfully identify biologically significant triclusters and promote a useful tool for cross species analysis of gene regulation from microarray expression data. The statistical results suggest that, when available, triclustering on multi-organism data can result in better gene clusters in comparison to biclustering on single-organism data. The TriClust software is publicly available as a standalone program.

15.
Protein Pept Lett ; 20(10): 1108-14, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23544665

ABSTRACT

Classifying sequences is one of the central problems in computational biosciences. Several tools have been released to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the existing tools are problem-specific and restricted to an alphabet constrained by the relevant biological structure. Here, we introduce TRAINER, a new online tool designed to serve as a generic sequence classification platform that enables users to provide their own training data with any alphabet defined therein. TRAINER allows users to select among several feature representation schemes and supervised machine learning methods with relevant parameters. Trained models can be saved for future use by other users without retraining. Two case studies are reported for effective use of the system on DNA and protein sequences: candidate effector prediction and nucleolar localization signal prediction. Biological relevance of the results is discussed.


Subjects
Artificial Intelligence; Sequence Analysis, Protein/methods; Software; Databases, Protein; Proteins/chemistry
16.
Biochem Biophys Res Commun ; 413(1): 111-5, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21875575

ABSTRACT

Elucidation of microRNA activity is a crucial step in understanding gene regulation. One key problem in this effort is how to model the pairwise interactions of microRNAs with their targets. As this interaction is strongly mediated by their sequences, it is desired to set-up a probabilistic model to explain the binding preferences between a microRNA sequence and the sequence of a putative target. To this end, we introduce a new model of microRNA-target binding, which transforms an aligned duplex to a new sequence and defines the likelihood of this sequence using a Variable Length Markov Chain. It offers a complementary representation of microRNA-mRNA pairs for microRNA target prediction tools or other probabilistic frameworks of integrative gene regulation analysis. The performance of present model is evaluated by its ability to predict microRNA-target mRNA interaction given a mature microRNA sequence and a putative mRNA binding site. In regard to classification accuracy, it outperforms two recent methods based on thermodynamic stability and sequence complementarity. The experiments can also unveil the effects of base pairing types and non-seed region in duplex formation.


Subjects
Computer Simulation; MicroRNAs/chemistry; Models, Chemical; RNA, Messenger/chemistry; Probability
17.
Biosystems ; 96(3): 246-50, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19758550

ABSTRACT

Deciphering the knowledge of HIV protease specificity and developing computational tools for detecting its cleavage sites in protein polypeptide chain are very desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for prediction of their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify the context equivalence based on the evolutionary similarities between individual amino acids. It was applied for HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for prediction of cleavage sites of all proteases and encouraged to be used for any kind of peptide classification problem as well.


Subjects
HIV Protease/chemistry; Models, Chemical; Sequence Analysis, Protein/methods; Amino Acid Sequence; Binding Sites; Computer Simulation; Enzyme Activation; Markov Chains; Molecular Sequence Data; Protein Binding
18.
Article in English | MEDLINE | ID: mdl-17473316

ABSTRACT

Subcellular localization is one of the key properties in functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localizations. Existing methods differ in the protein encoding schemes used. In this study, we present two methods for protein encoding to be used for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n and pairwise sequence similarity scores based on whole sequence and N-terminal sequence. We tested the methods on a common benchmarking data set that consists of 2,427 eukaryotic proteins with four localization sites. As a result of 5-fold cross-validation tests, the encoding with n-peptide compositions provided the accuracies of 84.5, 88.9, 66.3, and 94.3 percent for cytoplasmic, extracellular, mitochondrial, and nuclear proteins, where the overall accuracy was 87.1 percent. The second method provided 83.6, 87.7, 87.9, and 90.5 percent accuracies for individual locations and 87.8 percent overall accuracy. A hybrid system, which we called PredLOC, makes a final decision based on the results of the two presented methods which achieved an overall accuracy of 91.3 percent, which is better than the achievements of many of the existing methods. The new system also outperformed the recent methods in the experiments conducted on a new-unique SWISSPROT test set.


Subjects
Proteins/chemistry; Proteins/metabolism; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Subcellular Fractions/chemistry; Subcellular Fractions/metabolism; Algorithms; Amino Acid Sequence; Artificial Intelligence; Information Storage and Retrieval/methods; Molecular Sequence Data; Pattern Recognition, Automated/methods; Proteins/classification
19.
Biosystems ; 87(1): 75-81, 2007 Jan.
Article in English | MEDLINE | ID: mdl-16753255

ABSTRACT

In this study, n-peptide compositions are utilized for protein vectorization over a discriminative remote homology detection framework based on support vector machines (SVMs). The size of amino acid alphabet is gradually reduced for increasing values of n to make the method to conform with the memory resources in conventional workstations. A hash structure is implemented for accelerated search of n-peptides. The method is tested to see its ability to classify proteins into families on a subset of SCOP family database and compared against many of the existing homology detection methods including the most popular generative methods; SAM-98 and PSI-BLAST and the recent SVM methods; SVM-Fisher, SVM-BLAST and SVM-Pairwise. The results have demonstrated that the new method significantly outperforms SVM-Fisher, SVM-BLAST, SAM-98 and PSI-BLAST, while achieving a comparable accuracy with SVM-Pairwise. In terms of efficiency, it performs much better than SVM-Pairwise. It is shown that the information of n-peptide compositions with reduced amino acid alphabets provides an accurate and efficient means of protein vectorization for SVM-based sequence classification.


Subjects
Amino Acids/chemistry; Peptides/chemistry; Amino Acid Sequence; Models, Theoretical; Molecular Sequence Data
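
Entry 19 vectorizes proteins by n-peptide composition over a reduced amino acid alphabet, using a hash structure for n-peptide lookup. A minimal sketch of that idea follows. The four-group alphabet below is a hypothetical grouping chosen for illustration, since the abstract does not give the groupings used in the paper, and Python's `Counter` (a hash table) stands in for the paper's hash structure.

```python
from collections import Counter

# Hypothetical 4-letter reduction of the 20 standard amino acids (assumption,
# not the paper's grouping): hydrophobic, polar/small, basic, acidic.
REDUCED_ALPHABET = {
    **dict.fromkeys("AVLIMFWC", "h"),
    **dict.fromkeys("STNQYGP", "p"),
    **dict.fromkeys("KRH", "b"),
    **dict.fromkeys("DE", "a"),
}

def npeptide_composition(sequence, n, mapping=None):
    """Return the relative frequency of each overlapping n-peptide in a sequence.

    If a mapping is given, residues are first translated to the reduced alphabet.
    Residues missing from the mapping (e.g. 'X') raise KeyError in this sketch.
    """
    if mapping:
        sequence = "".join(mapping[aa] for aa in sequence)
    total = len(sequence) - n + 1
    counts = Counter(sequence[i:i + n] for i in range(total))  # hash-based counting
    return {peptide: count / total for peptide, count in counts.items()}

# Hypothetical short sequence; with 4 letters and n=2 there are at most 16 features
composition = npeptide_composition("MKVLDE", 2, REDUCED_ALPHABET)
print(composition)
```

Shrinking the alphabet as n grows keeps the feature space manageable: 20 letters give 20^n possible n-peptides, while a 4-letter alphabet gives only 4^n.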
20.
Comput Biol Chem ; 30(4): 292-9, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16880118

ABSTRACT

A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification using pairwise sequence similarity scores in feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call as SVM-PST, has been tested for SCOP family classification task, and compared with existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. Results have shown that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. In terms of computational efficiency, PST-based comparison is much better than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach from which we were inspired. The present method provides a significantly better solution for protein classification in comparison with the family-based PST model.


Subjects
Models, Statistical; Proteins/chemistry; Sequence Analysis, Protein/methods; Sequence Analysis, Protein/statistics & numerical data; Sequence Homology, Amino Acid; Algorithms; Computational Biology; Databases, Factual