Search | VHL Regional Portal

1.

An effective colorectal polyp classification for histopathological images based on supervised contrastive learning.

Yengec-Tasdemir, Sena Busra; Aydin, Zafer; Akay, Ebru; Dogan, Serkan; Yilmaz, Bulent.

Comput Biol Med ; 172: 108267, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38479197

ABSTRACT

Early detection of colon adenomatous polyps is pivotal in reducing colon cancer risk. In this context, accurately distinguishing between adenomatous polyp subtypes, especially tubular and tubulovillous, from hyperplastic variants is crucial. This study introduces a cutting-edge computer-aided diagnosis system optimized for this task. Our system employs advanced Supervised Contrastive learning to ensure precise classification of colon histopathology images. Significantly, we have integrated the Big Transfer model, which has gained prominence for its exemplary adaptability to visual tasks in medical imaging. Our novel approach discerns between in-class and out-of-class images, thereby elevating its discriminatory power for polyp subtypes. We validated our system using two datasets: a specially curated one and the publicly accessible UniToPatho dataset. The results reveal that our model markedly surpasses traditional deep convolutional neural networks, registering classification accuracies of 87.1% and 70.3% for the custom and UniToPatho datasets, respectively. Such results emphasize the transformative potential of our model in polyp classification endeavors.
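The abstract does not state the exact loss that was used. As a minimal sketch, assuming the commonly used form of the supervised contrastive loss (embeddings are L2-normalised, same-label samples are pulled together, and all other samples serve as contrast), the idea of separating in-class from out-of-class images can be illustrated as follows. The function name, the temperature value, and the toy embeddings are illustrative assumptions, not details from the paper.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    Assumed form: for each anchor i, average over its positives p of
    -log( exp(z_i.z_p / T) / sum_{a != i} exp(z_i.z_a / T) ).
    """
    # L2-normalise every embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives: other samples that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-D embeddings: two tight clusters (hypothetical values).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy, [0, 0, 1, 1])   # labels follow the clusters
shuffled = supcon_loss(toy, [0, 1, 0, 1])  # labels cut across the clusters
```

In this toy case the loss is lower when labels agree with the clusters, which is the behaviour the training objective rewards.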

Subject(s)

Adenomatous Polyps , Colonic Polyps , Humans , Colonic Polyps/diagnostic imaging , Neural Networks, Computer , Diagnosis, Computer-Assisted/methods , Diagnostic Imaging

2.

Comparative analysis of machine learning approaches for predicting respiratory virus infection and symptom severity.

Isik, Yunus Emre; Aydin, Zafer.

PeerJ ; 11: e15552, 2023.

Article in English | MEDLINE | ID: mdl-37404475

ABSTRACT

Respiratory diseases are among the major health problems causing a burden on hospitals. Diagnosis of infection and rapid prediction of severity without time-consuming clinical tests could be beneficial in preventing the spread and progression of the disease, especially in countries where health systems have limited capacity. Personalized medicine studies involving statistics and computer technologies could help to address this need. In addition to individual studies, competitions are also held, such as the Dialogue for Reverse Engineering Assessment and Methods (DREAM) challenge, which is run by a community-driven organization with a mission to research biology, bioinformatics, and biomedicine. One of these competitions was the Respiratory Viral DREAM Challenge, which aimed to develop early predictive biomarkers for respiratory virus infections. These efforts are promising; however, the prediction performance of the computational methods developed for detecting respiratory diseases still has room for improvement. In this study, we focused on improving the performance of predicting the infection and symptom severity of individuals infected with various respiratory viruses using gene expression data collected before and after exposure. The publicly available gene expression dataset in the Gene Expression Omnibus, named GSE73072, containing samples exposed to four respiratory viruses (H1N1, H3N2, human rhinovirus (HRV), and respiratory syncytial virus (RSV)), was used as input data. Various preprocessing methods and machine learning algorithms were implemented and compared to achieve the best prediction performance.
The experimental results showed that the proposed approaches obtained a prediction performance of 0.9746 area under the precision-recall curve (AUPRC) for infection (i.e., shedding) prediction (SC-1), 0.9182 AUPRC for symptom class prediction (SC-2), and 0.6733 Pearson correlation for symptom score prediction (SC-3) by outperforming the best leaderboard scores of Respiratory Viral DREAM Challenge (a 4.48% improvement for SC-1, a 13.68% improvement for SC-2, and a 13.98% improvement for SC-3). Additionally, over-representation analysis (ORA), which is a statistical method for objectively determining whether certain genes are more prevalent in pre-defined sets such as pathways, was applied using the most significant genes selected by feature selection methods. The results show that pathways associated with the 'adaptive immune system' and 'immune disease' are strongly linked to pre-infection and symptom development. These findings contribute to our knowledge about predicting respiratory infections and are expected to facilitate the development of future studies that concentrate on predicting not only infections but also the associated symptoms.
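Over-representation analysis, as described in the abstract, asks whether a pathway contains more of the selected genes than chance would predict. One standard way to compute this is a one-sided hypergeometric test; the sketch below assumes that formulation, since the abstract does not say which test statistic was used. The function name and the toy numbers are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, pathway_size, selected_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size: number of genes that could have been selected
    pathway_size:  number of universe genes annotated to the pathway
    selected_size: number of genes picked by feature selection
    overlap:       selected genes that fall inside the pathway
    """
    upper = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, 5 in the pathway, 5 selected, all 5 overlap.
p_all_overlap = ora_p_value(10, 5, 5, 5)
```

In practice the p-values for many pathways would also be corrected for multiple testing, which this sketch omits.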

Subject(s)

Influenza A Virus, H1N1 Subtype , Respiratory Syncytial Virus, Human , Virus Diseases , Humans , Influenza A Virus, H3N2 Subtype , Virus Diseases/diagnosis , Machine Learning

3.

Improved classification of colorectal polyps on histopathological images with ensemble learning and stain normalization.

Yengec-Tasdemir, Sena Busra; Aydin, Zafer; Akay, Ebru; Dogan, Serkan; Yilmaz, Bulent.

Comput Methods Programs Biomed ; 232: 107441, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36905748

ABSTRACT

BACKGROUND AND OBJECTIVE: Early detection of colon adenomatous polyps is critically important because correctly detecting them significantly reduces the risk of developing colon cancer in the future. The key challenge in detecting adenomatous polyps is differentiating them from their visually similar counterparts, non-adenomatous tissues. Currently, this differentiation depends solely on the experience of the pathologist. To assist pathologists, the objective of this work is to provide a novel non-knowledge-based Clinical Decision Support System (CDSS) for improved detection of adenomatous polyps on colon histopathology images. METHODS: The domain shift problem arises when the training and test data come from different distributions, with diverse settings and unequal color levels. This problem, which can be tackled by stain normalization techniques, prevents machine learning models from attaining higher classification accuracies. In this work, the proposed method integrates stain normalization techniques with an ensemble of ConvNexts, which are competitively accurate, scalable, and robust variants of CNNs. The improvement is empirically analyzed for five widely employed stain normalization techniques. The classification performance of the proposed method is evaluated on three datasets comprising more than 10k colon histopathology images. RESULTS: The comprehensive experiments demonstrate that the proposed method outperforms state-of-the-art deep convolutional neural network based models by attaining 95% classification accuracy on the curated dataset, and 91.1% and 90% on the EBHI and UniToPatho public datasets, respectively. CONCLUSIONS: These results show that the proposed method can accurately classify colon adenomatous polyps on histopathology images. It retains remarkable performance scores even for different datasets coming from different distributions. This indicates that the model has a notable generalization ability.
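The abstract does not specify how the ensemble members are combined. A minimal sketch of one common choice, soft voting, is shown below as an assumption: each model outputs class probabilities for an image, the probabilities are averaged, and the class with the highest average is taken. The function name and the example probabilities are hypothetical.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities from several models.

    prob_lists: one probability list per model, all of the same length.
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(probs[c] for probs in prob_lists) / n_models for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Three hypothetical models scoring one image as [non-adenomatous, adenomatous].
predicted_class, averaged_probs = soft_vote([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

Here one model disagrees, but the averaged probabilities still favour the second class.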

Subject(s)

Adenomatous Polyps , Colonic Polyps , Humans , Colonic Polyps/diagnostic imaging , Coloring Agents , Neural Networks, Computer , Machine Learning

4.

IGPRED-MultiTask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility.

Gormez, Yasin; Aydin, Zafer.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1104-1113, 2023.

Article in English | MEDLINE | ID: mdl-35849663

ABSTRACT

Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps in predicting the 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-MultiTask, a deep learning model with a multi-task learning architecture based on a deep inception network, a graph convolutional network and a bidirectional long short-term memory, is proposed. Moreover, hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test data sets as in the OPUS-TASS paper, including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93, CAMEO93_HARD, as well as the train and validation sets, are used for fair comparison with the literature. Statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets compared to the state-of-the-art methods. For solvent accessibility prediction, only the TEST2016 and TEST2018 datasets are used to assess the performance of the proposed model.

Subject(s)

Deep Learning , Neural Networks, Computer , Solvents/chemistry , Bayes Theorem , Proteins/chemistry

5.

The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease.

Isik, Yunus Emre; Gormez, Yasin; Aydin, Zafer; Bakir-Gungor, Burcu.

IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1909-1918, 2022.

Article in English | MEDLINE | ID: mdl-33476272

ABSTRACT

Behçet's Disease (BD) is a multi-system inflammatory disorder whose etiology remains unclear. The most probable hypothesis is that genetic predisposition and environmental factors play roles in the development of BD. In order to find the underlying causes, genetic changes in thousands of genes should be analyzed. In addition, further analysis is needed to determine which genetic factors affect the disease. Machine learning approaches have high potential for extracting knowledge from genomics and selecting representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we attempted to identify representative SNPs using feature selection methods that incorporate biological information, and aimed to develop a machine-learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.

Subject(s)

Behcet Syndrome , Polymorphism, Single Nucleotide , Behcet Syndrome/diagnosis , Behcet Syndrome/genetics , Genetic Predisposition to Disease/genetics , Humans , Polymorphism, Single Nucleotide/genetics

6.

A Continuously Benchmarked and Crowdsourced Challenge for Rapid Development and Evaluation of Models to Predict COVID-19 Diagnosis and Hospitalization.

Yan, Yao; Schaffter, Thomas; Bergquist, Timothy; Yu, Thomas; Prosser, Justin; Aydin, Zafer; Jabeer, Amhar; Brugere, Ivan; Gao, Jifan; Chen, Guanhua; Causey, Jason; Yao, Yuxin; Bryson, Kevin; Long, Dustin R; Jarvik, Jeffrey G; Lee, Christoph I; Wilcox, Adam; Guinney, Justin; Mooney, Sean.

JAMA Netw Open ; 4(10): e2124946, 2021 10 01.

Article in English | MEDLINE | ID: mdl-34633425

ABSTRACT

Importance: Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. Objectives: To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants: This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, this challenge engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures: Machine learning algorithms used patient data and output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results: In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. 
Analysis on top models submitting to the challenge showed consistently better model performance on the female group than the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance: In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into the potential bias and limitations in the models.
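The challenge scored submissions by AUROC and AUPRC. As a small illustration of the first metric, AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical, and this is not the challenge's evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: model outputs, higher meaning more likely positive
    labels: 1 for positive cases, 0 for negative cases
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # A positive/negative pair counts 1 if ranked correctly, 0.5 if tied, else 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Four hypothetical patients: scores and whether each tested positive.
toy_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of cases, so it suits illustration rather than large cohorts.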

Subject(s)

Algorithms , Benchmarking , COVID-19/diagnosis , Clinical Decision Rules , Crowdsourcing , Hospitalization/statistics & numerical data , Machine Learning , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , COVID-19/epidemiology , COVID-19/therapy , COVID-19 Testing , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Models, Statistical , Prognosis , ROC Curve , Severity of Illness Index , Washington/epidemiology , Young Adult

7.

IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction.

Görmez, Yasin; Sabzekar, Mostafa; Aydin, Zafer.

Proteins ; 89(10): 1277-1288, 2021 10.

Article in English | MEDLINE | ID: mdl-33993559

ABSTRACT

There is a close relationship between the tertiary structure and the function of a protein. One of the important steps to determine the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model IGPRED obtained 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% Q3 accuracies for CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
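Q3 accuracy, the metric reported above, is the percentage of residues whose three-state secondary structure label (helix H, strand E, or coil C) is predicted correctly. A minimal sketch of the calculation follows; the function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted H/E/C label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 8-residue protein: one strand residue is mispredicted as coil.
toy_q3 = q3_accuracy("HHHEECCC", "HHHEEECC")
```

For a benchmark set, the score is typically reported either per protein and averaged, or pooled over all residues; the abstract does not say which convention applies.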

Subject(s)

Computational Biology/methods , Protein Conformation , Proteins/chemistry , Deep Learning , Neural Networks, Computer

8.

Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge.

Sieberts, Solveig K; Schaff, Jennifer; Duda, Marlena; Pataki, Bálint Ármin; Sun, Ming; Snyder, Phil; Daneault, Jean-Francois; Parisi, Federico; Costante, Gianluca; Rubin, Udi; Banda, Peter; Chae, Yooree; Chaibub Neto, Elias; Dorsey, E Ray; Aydin, Zafer; Chen, Aipeng; Elo, Laura L; Espino, Carlos; Glaab, Enrico; Goan, Ethan; Golabchi, Fatemeh Noushin; Görmez, Yasin; Jaakkola, Maria K; Jonnagaddala, Jitendra; Klén, Riku; Li, Dongmei; McDaniel, Christian; Perrin, Dimitri; Perumal, Thanneer M; Rad, Nastaran Mohammadian; Rainaldi, Erin; Sapienza, Stefano; Schwab, Patrick; Shokhirev, Nikolai; Venäläinen, Mikko S; Vergara-Diaz, Gloria; Zhang, Yuqian; Wang, Yuanjia; Guan, Yuanfang; Brunner, Daniela; Bonato, Paolo; Mangravite, Lara M; Omberg, Larsson.

NPJ Digit Med ; 4(1): 53, 2021 Mar 19.

Article in English | MEDLINE | ID: mdl-33742069

ABSTRACT

Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).

9.

Prevalence, etiology, and biopsychosocial risk factors of cervicogenic dizziness in patients with neck pain: A multi-center, cross-sectional study.

Vural, Meltem; Karan, Ayse; Albayrak Gezer, Ilknur; Çaliskan, Ahmet; Atar, Sevgi; Yildiz Aydin, Filiz; Coskun Benlidayi, Ilke; Göksen, Aylin; Koldas Dogan, Sebnem; Karacan, Gülçin; Erdem, Rana; Eda Kurt, Emine; Kesiktas, Fatma Nur; Aydin, Tugba; Sahin, Nilay; Aydin, Zafer; Ordahan, Banu; Türkoglu, Gözde; Resorlu, Hatice; Döner, Davut; Yilmaz, Figen; Bertan, Hüseyin; Dülgeroglu, Deniz; Karaahmet, Özgür Zeliha; Sonel Tur, Birkan; Moustafa, Esra; Borman, Pinar; Iskender, Öner; Ay, Saime; Kurtaran, Aydan; Sirzai, Hülya; Evcik, Deniz; Çapan, Nalan; Erhan, Belgin; Alptekin, Hasan Kerem; Ural, Halil Ibrahim.

Turk J Phys Med Rehabil ; 67(4): 399-408, 2021 Dec.

Article in English | MEDLINE | ID: mdl-35141479

ABSTRACT

OBJECTIVES: This study aims to investigate the prevalence, etiology, and risk factors of cervicogenic dizziness in patients with neck pain. PATIENTS AND METHODS: Between June 2016 and April 2018, a total of 2,361 patients (526 males, 1,835 females; mean age: 45.0±13.3 years; range, 18 to 75 years) who presented with the complaint of neck pain lasting for at least one month were included in this prospective, cross-sectional study. Data including concomitant dizziness, severity, and quality of life (QoL) impact of vertigo (via Numeric Dizziness Scale [NDS]), QoL (via Dizziness Handicap Inventory [DHI]), mobility (via Timed Up-and-Go [TUG] test), balance performance (via Berg Balance Scale [BBS]), and emotional status (via Hospital Anxiety-Depression Scale [HADS]) were recorded. RESULTS: Dizziness was evident in 40.1% of the patients. Myofascial pain syndrome (MPS) was the most common etiology for neck pain (58.5%) and was accompanied by cervicogenic dizziness in 59.7% of the patients. Female versus male sex (odds ratio [OR]: 1.641, 95% CI: 1.241 to 2.171, p=0.001), housewifery versus other occupations (OR: 1.285, 95% CI: 1.006 to 1.642, p=0.045), and lower versus higher education (OR: 1.649-2.564, p<0.001) significantly predicted an increased risk of dizziness in neck pain patients. Patients with dizziness due to MPS had lower dizziness severity scores (p=0.034), a milder impact of dizziness on QoL (p=0.005), lower DHI scores (p=0.004), shorter time to complete the TUG test (p=0.001), and higher BBS scores (p=0.001). CONCLUSION: Our findings suggest a significant impact of biopsychosocial factors on the likelihood and severity of dizziness and an association of dizziness due to MPS with better clinical status.

10.

Structural profile matrices for predicting structural properties of proteins.

Azginoglu, Nuh; Aydin, Zafer; Celik, Mete.

J Bioinform Comput Biol ; 18(4): 2050022, 2020 08.

Article in English | MEDLINE | ID: mdl-32649260

ABSTRACT

Predicting structural properties of proteins plays a key role in predicting the 3D structure of proteins. In this study, new structural profile matrices (SPM) are developed for protein secondary structure, solvent accessibility and torsion angle class predictions, which could be used as input to 3D prediction algorithms. The structural templates employed in computing SPMs are detected by eight alignment methods in LOMETS server, gap affine alignment method, ScanProsite, PfamScan, and HHblits. The contribution of each template is weighted by its similarity to target, which is assessed by several sequence alignment scores. For comparison, the SPMs are also computed using Homolpro, which uses BLAST for target template alignments and does not assign weights to templates. Incorporating the SPMs into DSPRED classifier, the prediction accuracy improves significantly as demonstrated by cross-validation experiments on two difficult benchmarks. The most accurate predictions are obtained using the SPMs derived by threading methods in LOMETS server. On the other hand, the computational cost of computing these SPMs was the highest.
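The core idea of a weighted structural profile can be sketched in a few lines: every template that aligns to a target position votes for its own class label, each vote is scaled by the template's similarity weight, and the votes are normalised into a distribution per position. The sketch below is an illustration under assumptions, not the DSPRED implementation: the data layout, the function name, and the uniform fallback for positions that no template covers are all hypothetical.

```python
def structural_profile(target_length, templates, classes="HEC"):
    """Build a per-position class distribution from weighted templates.

    templates: list of (weight, {target_position: class_label}) pairs, where
               the dict holds the labels of template residues aligned to the target.
    Returns a list with one {class_label: probability} dict per target position.
    """
    counts = [{c: 0.0 for c in classes} for _ in range(target_length)]
    for weight, aligned_labels in templates:
        for position, label in aligned_labels.items():
            counts[position][label] += weight  # similarity-weighted vote

    profile = []
    for row in counts:
        total = sum(row.values())
        if total == 0:
            # Assumption: uncovered positions get a uniform distribution.
            profile.append({c: 1.0 / len(classes) for c in classes})
        else:
            profile.append({c: v / total for c, v in row.items()})
    return profile


# Two hypothetical templates aligned to a 3-residue target.
toy_profile = structural_profile(
    3, [(0.75, {0: "H", 1: "H"}), (0.25, {0: "E"})]
)
```

Setting every weight to the same value gives an unweighted profile, which corresponds to the no-weights behaviour the abstract attributes to the comparison method.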

Subject(s)

Computational Biology/methods , Proteins/chemistry , Algorithms , Databases, Protein , Protein Structure, Secondary , Sequence Alignment , Software , Solvents/chemistry

11.

Developing structural profile matrices for protein secondary structure and solvent accessibility prediction.

Aydin, Zafer; Azginoglu, Nuh; Bilgin, Halil Ibrahim; Celik, Mete.

Bioinformatics ; 35(20): 4004-4010, 2019 10 15.

Article in English | MEDLINE | ID: mdl-30937435

ABSTRACT

MOTIVATION: Predicting secondary structure and solvent accessibility of proteins are among the essential steps that precede more elaborate 3D structure prediction tasks. Incorporating class label information contained in templates with known structures has the potential to improve the accuracy of prediction methods. Building a structural profile matrix is one such technique that provides a distribution for class labels at each amino acid position of the target. RESULTS: In this paper, a new structural profiling technique is proposed that is based on deriving PFAM families and is combined with an existing approach. Cross-validation experiments on two benchmark datasets and at various similarity intervals demonstrate that the proposed profiling strategy performs significantly better than Homolpro, a state-of-the-art method for incorporating template information, as assessed by statistical hypothesis tests. AVAILABILITY AND IMPLEMENTATION: The DSPRED method can be accessed by visiting the PSP server at http://psp.agu.edu.tr. Source code and binaries are freely available at https://github.com/yusufzaferaydin/dspred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software , Computers , Protein Structure, Secondary , Proteins , Solvents

12.

A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection.

Fourati, Slim; Talla, Aarthi; Mahmoudian, Mehrad; Burkhart, Joshua G; Klén, Riku; Henao, Ricardo; Yu, Thomas; Aydin, Zafer; Yeung, Ka Yee; Ahsen, Mehmet Eren; Almugbel, Reem; Jahandideh, Samad; Liang, Xiao; Nordling, Torbjörn E M; Shiga, Motoki; Stanescu, Ana; Vogel, Robert; Pandey, Gaurav; Chiu, Christopher; McClain, Micah T; Woods, Christopher W; Ginsburg, Geoffrey S; Elo, Laura L; Tsalik, Ephraim L; Mangravite, Lara M; Sieberts, Solveig K.

Nat Commun ; 9(1): 4418, 2018 10 24.

Article in English | MEDLINE | ID: mdl-30356117

ABSTRACT

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveal little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.

Subject(s)

Gene Expression/genetics , Healthy Volunteers , Heme/metabolism , Humans , Influenza A Virus, H1N2 Subtype/immunology , Influenza A Virus, H1N2 Subtype/pathogenicity , Influenza A Virus, H3N2 Subtype/immunology , Influenza A Virus, H3N2 Subtype/pathogenicity , Respiratory Syncytial Viruses/immunology , Respiratory Syncytial Viruses/pathogenicity , Rhinovirus/immunology , Rhinovirus/pathogenicity

13.

Dimensionality reduction for protein secondary structure and solvent accesibility prediction.

Aydin, Zafer; Kaynar, Oguz; Görmez, Yasin.

J Bioinform Comput Biol ; 16(5): 1850020, 2018 10.

Article in English | MEDLINE | ID: mdl-30353781

ABSTRACT

Secondary structure and solvent accessibility prediction provide valuable information for estimating the three dimensional structure of a protein. As new feature extraction methods are developed the dimensionality of the input feature space increases steadily. Reducing the number of dimensions provides several advantages such as faster model training, faster prediction and noise elimination. In this work, several dimensionality reduction techniques have been employed including various feature selection methods, autoencoders and PCA for protein secondary structure and solvent accessibility prediction. The reduced feature set is used to train a support vector machine at the second stage of a hybrid classifier. Cross-validation experiments on two difficult benchmarks demonstrate that the dimension of the input space can be reduced substantially while maintaining the prediction accuracy. This will enable the incorporation of additional informative features derived for predicting the structural properties of proteins without reducing the accuracy due to overfitting.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Solvents/chemistry , Algorithms , Neural Networks, Computer , Principal Component Analysis , Protein Structure, Secondary , Reproducibility of Results , Support Vector Machine

14.

Protein β-sheet prediction using an efficient dynamic programming algorithm.

Sabzekar, Mostafa; Naghibzadeh, Mahmoud; Eghdami, Mahdie; Aydin, Zafer.

Comput Biol Chem ; 70: 142-155, 2017 Oct.

Article in English | MEDLINE | ID: mdl-28881217

ABSTRACT

Predicting the β-sheet structure of a protein is one of the most important intermediate steps towards the identification of its tertiary structure. However, it is regarded as the primary bottleneck due to the presence of non-local interactions between several discontinuous regions in β-sheets. To achieve reliable long-range interactions, a promising approach is to enumerate and rank all β-sheet conformations for a given protein and find the one with the highest score. The problem with this solution is that the search space grows exponentially with respect to the number of β-strands. Additionally, brute-force calculation in this conformational space leads to a combinatorial explosion with intractable computational complexity. The main contribution of this paper is to generate and search the space of the problem efficiently to reduce the time complexity of the problem. To achieve this, two tree structures, called sheet-tree and grouping-tree, are proposed. They model the search space by breaking it into sub-problems. Then, an advanced dynamic programming algorithm is proposed that stores the intermediate results, avoids repetitive calculation by reusing them efficiently in successive steps, and reduces the space of the problem by removing those intermediate results that will no longer be required in later steps. As a consequence, the following contributions have been made. Firstly, more accurate β-sheet structures are found by searching all possible conformations, and secondly, the time complexity of the problem is reduced by searching the space of the problem efficiently, which makes the proposed method applicable to predicting β-sheet structures with a high number of β-strands.
Experimental results on the BetaSheet916 dataset showed significant improvements of the proposed method in both execution time and prediction accuracy in comparison with the state-of-the-art β-sheet structure prediction methods. Moreover, we investigate the effect of different contact map predictors on the performance of the proposed method using the BetaSheet1452 dataset. The source code is available at http://www.conceptsgate.com/BetaTop.rar.

Subject(s)

Algorithms , Computational Biology , Protein Structure, Secondary

15.

Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure.

Aydin, Zafer; Singh, Ajit; Bilmes, Jeff; Noble, William S.

BMC Bioinformatics ; 12: 154, 2011 May 13.

Article in English | MEDLINE | ID: mdl-21569525

ABSTRACT

BACKGROUND: Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight. RESULTS: In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements. CONCLUSIONS: We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. 
The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.

Subject(s)

Algorithms , Models, Statistical , Protein Structure, Secondary , Proteins/chemistry , Amino Acids/chemistry , Bayes Theorem , Databases, Protein

16.

Bayesian models and algorithms for protein β-sheet prediction.

Aydin, Zafer; Altunbasak, Yucel; Erdogan, Hakan.

IEEE/ACM Trans Comput Biol Bioinform ; 8(2): 395-409, 2011.

Article in English | MEDLINE | ID: mdl-21233522

ABSTRACT

Prediction of the 3D structure greatly benefits from the information related to secondary structure, solvent accessibility, and nonlocal contacts that stabilize a protein's structure. We address the problem of β-sheet prediction defined as the prediction of β-strand pairings, interaction types (parallel or antiparallel), and β-residue interactions (or contact maps). We introduce a Bayesian approach for proteins with six or fewer β-strands in which we model the conformational features in a probabilistic framework by combining the amino acid pairing potentials with a priori knowledge of β-strand organizations. To select the optimum β-sheet architecture, we significantly reduce the search space by heuristics that enforce the amino acid pairs with strong interaction potentials. In addition, we find the optimum pairwise alignment between β-strands using dynamic programming in which we allow any number of gaps in an alignment to model β-bulges more effectively. For proteins with more than six β-strands, we first compute β-strand pairings using the BetaPro method. Then, we compute gapped alignments of the paired β-strands and choose the interaction types and β-residue pairings with maximum alignment scores. We performed a 10-fold cross-validation experiment on the BetaSheet916 set and obtained significant improvements in the prediction accuracy.

Subject(s)

Algorithms , Bayes Theorem , Protein Structure, Secondary , Amino Acid Sequence , Computational Biology/methods , Models, Molecular , Molecular Sequence Data , Proteins/chemistry , Sequence Alignment
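
Illustrative note on record 16: the abstract describes gapped pairwise alignment of β-strands by dynamic programming, followed by choosing the interaction type with the maximum alignment score. The sketch below shows one possible reading of that idea. It is not the authors' implementation: the function names, the toy pairing score, and the linear gap penalty are all assumptions made for illustration, and the actual pairing potentials and gap model are not given in the abstract.

```python
from typing import Callable, Tuple


def align_score(a: str, b: str,
                pair_score: Callable[[str, str], float],
                gap: float = -1.0) -> float:
    """Best global alignment score of strands a and b (linear gap penalty, assumed)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair two residues
                dp[i - 1][j] + gap,                                 # gap in b
                dp[i][j - 1] + gap,                                 # gap in a
            )
    return dp[n][m]


def best_interaction(a: str, b: str,
                     pair_score: Callable[[str, str], float],
                     gap: float = -1.0) -> Tuple[str, float]:
    """Score both orientations and keep the one with the higher alignment score."""
    parallel = align_score(a, b, pair_score, gap)
    # Antiparallel pairing is modelled here by reversing the second strand.
    antiparallel = align_score(a, b[::-1], pair_score, gap)
    if parallel >= antiparallel:
        return ("parallel", parallel)
    return ("antiparallel", antiparallel)


def toy_score(x: str, y: str) -> float:
    """Hypothetical stand-in for an amino acid pairing potential."""
    return 1.0 if x == y else -1.0
```

For example, `best_interaction("VIA", "AIV", toy_score)` compares the direct alignment with the alignment against the reversed strand and reports whichever scores higher.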

17.

Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo.

Aydin, Zafer; Murray, John I; Waterston, Robert H; Noble, William S.

BMC Bioinformatics ; 11: 84, 2010 Feb 11.

Article in English | MEDLINE | ID: mdl-20146825

ABSTRACT

BACKGROUND: Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, due to the amount of noise present in the data and due to the challenges introduced by increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which is specifically developed for this task. For a single experiment, this manual annotation task takes several hours. RESULTS: In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error significantly. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net. CONCLUSIONS: We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.

Subject(s)

Artificial Intelligence , Caenorhabditis elegans/embryology , Caenorhabditis elegans/genetics , Embryo, Nonmammalian/metabolism , Imaging, Three-Dimensional/methods , Animals , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Gene Expression Profiling/methods , Image Interpretation, Computer-Assisted/methods

18.

Training set reduction methods for protein secondary structure prediction in single-sequence condition.

Aydin, Zafer; Altunbasak, Yucel; Pakatci, Isa Kemal; Erdogan, Hakan.

Annu Int Conf IEEE Eng Med Biol Soc ; 2007: 5025-8, 2007.

Article in English | MEDLINE | ID: mdl-18003135

ABSTRACT

Orphan proteins are characterized by the lack of significant sequence similarity to database proteins. To infer the functional properties of the orphans, more elaborate techniques that utilize structural information are required. In this regard, protein structure prediction gains considerable importance. Secondary structure prediction algorithms designed for orphan proteins (also known as single-sequence algorithms) cannot utilize multiple alignments or alignment profiles, which are derived from similar proteins. This is a limiting factor for the prediction accuracy. One way to improve the performance of a single-sequence algorithm is to perform re-training. In this approach, first, the models used by the algorithm are trained on a representative set of proteins and a secondary structure prediction is computed. Then, using a distance measure, the original training set is refined by removing proteins that are dissimilar to the given protein. This step is followed by the re-estimation of the model parameters and the prediction of the secondary structure. In this paper, we compare training set reduction methods that are used to re-train the hidden semi-Markov models employed by the IPSSP algorithm [1]. We found that the composition-based reduction method has the highest performance compared to the alignment-based and the Chou-Fasman-based reduction methods. In addition, threshold-based reduction performed better than the reduction technique that selects the first 80% of the dataset proteins.

Subject(s)

Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acid Sequence , Predictive Value of Tests , Sequence Alignment
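
Illustrative note on record 18: the abstract contrasts threshold-based training set reduction with keeping the first 80% of the proteins ranked by a distance measure. The sketch below shows how the two selection rules could differ. It is a hedged illustration only: "composition" is taken here, as an assumption, to mean amino acid composition, the Euclidean distance is a stand-in for whatever measure the paper uses, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(s1: str, s2: str) -> float:
    """Euclidean distance between composition vectors (assumed measure)."""
    c1, c2 = composition(s1), composition(s2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


def reduce_by_threshold(query: str, training: List[str], threshold: float) -> List[str]:
    """Keep training proteins whose distance to the query is at most threshold."""
    return [p for p in training if distance(query, p) <= threshold]


def reduce_by_fraction(query: str, training: List[str], fraction: float = 0.8) -> List[str]:
    """Keep the closest `fraction` of the training proteins, ranked by distance."""
    ranked = sorted(training, key=lambda p: distance(query, p))
    keep = max(1, int(round(fraction * len(ranked))))
    return ranked[:keep]
```

The threshold rule lets the size of the reduced set vary with the query, while the fraction rule always keeps the same number of proteins regardless of how dissimilar they are.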

19.

Protein secondary structure prediction for a single-sequence using hidden semi-Markov models.

Aydin, Zafer; Altunbasak, Yucel; Borodovsky, Mark.

BMC Bioinformatics ; 7: 178, 2006 Mar 30.

Article in English | MEDLINE | ID: mdl-16571137

ABSTRACT

BACKGROUND: The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: Single-sequence prediction algorithms imply that information about other (homologous) proteins is not available, while algorithms of the second type imply that information about homologous proteins is available, and use it intensively. The single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs; however, the accuracy of protein secondary structure prediction from a single sequence is not as high as when the additional evolutionary information is present. RESULTS: In this paper, we further refine and extend the hidden semi-Markov model (HSMM) initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize on different sections of the dependency structure and incorporate them into HSMM. In addition, we implement an iterative training method to refine estimates of HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable to or better than those for BSPSS as well as for PSIPRED, tested under the single-sequence condition. CONCLUSIONS: We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset with no pair of sequences having significant sequence similarity. As new sequences are added to the database, it is possible to augment the dependency structure and obtain even higher accuracy. Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable to current similarity search methods.

Subject(s)

Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Markov Chains , Models, Chemical , Models, Molecular , Models, Statistical , Molecular Sequence Data , Protein Structure, Secondary
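
Illustrative note on record 19: the three-state-per-residue accuracy mentioned in the abstract is commonly computed as the fraction of residues whose predicted state (helix, strand, or coil) matches the observed state. The sketch below assumes one-letter state strings over H, E, and C; the function name and the input encoding are assumptions for illustration, not details taken from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)
```

For instance, a prediction of "HHEEC" against an observed "HHECC" agrees at four of five positions.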

20.

Electron microscopic study of the progeny of ependymal stem cells in the normal and injured spinal cord.

Attar, Ayhan; Kaptanoglu, Erkan; Aydin, Zafer; Ayten, Murat; Sargon, Mustafa F.

Surg Neurol ; 64 Suppl 2: S28-32, 2005.

Article in English | MEDLINE | ID: mdl-16256837

ABSTRACT

BACKGROUND: Spinal cord injury (SCI) is a common and often irreversible lesion that can incapacitate patients. Precursor cells in the spinal cord proliferate in response to trauma, and this proliferation can be enhanced by exogenous stimuli such as specific growth factors. In the present study, we examined electron microscopic detection of the proliferation, distribution, and phenotypic fate of these precursor cells in the injured adult rat spinal cord. METHODS: Adult female Sprague-Dawley rats weighing 250 to 300 g were divided into 3 groups. The first group consisted of spinal cord-injured animals with application of a 2.4-g clip extradurally around the spinal cord at the T1 level. A 26-g clip was applied in the second group. The third group included normal uninjured animals. Rats were sacrificed at 3 days, 3 weeks, and 6 weeks after injury. A segment of the spinal cord, 0.4 cm in length, encompassing the injury site was removed and was prepared for electron microscopy. RESULTS: Three days after mild injury (2.4-g clip), ependymal cells had begun to proliferate and had migrated from the central canal. They had a tendency to surround perivascular spaces close to the axons. The central canal rostral to the lesion site was widely dilated at 6 weeks postoperative in the moderately injured groups (26-g clip). The layer of ependymal cells lining the dilated canal showed reduction in cell height. Traumatic syringomyelic cavities were observed in all of the animals. There was an active proliferative response of the ependymal cells to injury. Large clusters of displaced ependymal cells associated with the dilated central canal were observed. Rests of ependymal cells were observed remote from the central canal with a tendency to form rosettes and accessory lumina 6 weeks after trauma. Fascicles of 3 to 8 fibers enclosed within an ependymal cell were a common finding among the ependymal clusters. There were also debris and some ependymal cells in the lumen. 
CONCLUSION: Trauma induces active proliferation of precursor cells in the ependymal region. These cells may replace neural tissue lost to SCI and may assist in axonal regeneration.

Subject(s)

Ependyma/ultrastructure , Spinal Cord Injuries/pathology , Stem Cells/physiology , Stem Cells/ultrastructure , Animals , Cell Proliferation , Ependyma/physiopathology , Female , Microscopy, Electron , Rats , Rats, Sprague-Dawley , Spinal Cord Injuries/physiopathology , Thoracic Vertebrae , Time Factors
