1.
BMC Med Inform Decis Mak ; 21(1): 156, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33985483

ABSTRACT

BACKGROUND: Severity scores assess the acuity of critical illness by penalizing the deviation of physiologic measurements from normal and aggregating these penalties (also called "weights" or "subscores") into a final score (or probability) that quantifies the severity of critical illness (or the likelihood of in-hospital mortality). Although these simple additive models are human readable and interpretable, their predictive performance needs to be further improved. METHODS: We present OASIS +, a variant of the Oxford Acute Severity of Illness Score (OASIS) in which an ensemble of 200 decision trees is used to predict in-hospital mortality based on the same 10 clinical variables used in OASIS. RESULTS: Using a test set of 9566 admissions extracted from the MIMIC-III database, we show that OASIS + outperforms nine previously developed severity scoring methods (including OASIS) in predicting in-hospital mortality. Furthermore, our results show that the supervised learning algorithms considered in our experiments demonstrated higher predictive performance when trained using the observed clinical variables as opposed to OASIS subscores. CONCLUSIONS: Our results suggest that there is room for improving the prognostic accuracy of the OASIS severity scores by replacing the simple linear additive scoring function with more sophisticated non-linear machine learning models such as random forest (RF) and extreme gradient boosting (XGB).
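
The additive scoring scheme described in the BACKGROUND can be sketched as follows. This is only an illustration of "penalize deviation from normal, then sum the subscores": the variable names, ranges, and penalty values are hypothetical placeholders and are not the published OASIS weights.

```python
# Hypothetical penalty tables: (lower bound, upper bound, penalty).
# These numbers are illustrative only and are NOT the published OASIS weights.
PENALTY_TABLE = {
    "heart_rate": [(0, 40, 4), (40, 100, 0), (100, 130, 2), (130, float("inf"), 5)],
    "temperature_c": [(0, 35.0, 3), (35.0, 38.5, 0), (38.5, float("inf"), 2)],
}


def subscore(variable, value):
    """Return the penalty for one measurement (0 when it is in the normal range)."""
    for low, high, penalty in PENALTY_TABLE[variable]:
        if low <= value < high:
            return penalty
    raise ValueError(f"{variable}={value} is outside every defined range")


def severity_score(measurements):
    """Additive score: the sum of the per-variable subscores."""
    return sum(subscore(name, value) for name, value in measurements.items())


patient = {"heart_rate": 135, "temperature_c": 39.1}
print(severity_score(patient))  # 5 + 2 = 7
```

A tree ensemble of the kind the abstract describes would, in effect, replace this fixed lookup table with a learned non-linear function of the raw measurements.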


Subjects
Intensive Care Units; Machine Learning; Hospital Mortality; Humans; Prognosis; Retrospective Studies
2.
Proteins ; 87(3): 198-211, 2019 03.
Article in English | MEDLINE | ID: mdl-30536635

ABSTRACT

RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.


Subjects
Computational Biology; Proteins/chemistry; RNA-Binding Proteins/genetics; RNA/genetics; Amino Acid Sequence/genetics; Base Sequence/genetics; Binding Sites/genetics; Models, Molecular; Protein Binding/genetics; Protein Conformation; Proteins/genetics; RNA/chemistry; RNA-Binding Motifs/genetics; RNA-Binding Proteins/chemistry; Sequence Analysis, Protein; Software
3.
Proteomics ; 16(23): 2967-2976, 2016 12.
Article in English | MEDLINE | ID: mdl-27714937

ABSTRACT

Accurate and comprehensive identification of surface-exposed proteins (SEPs) in parasites is a key step in developing novel subunit vaccines. However, the reliability of MS-based high-throughput methods for proteome-wide mapping of SEPs continues to be limited due to high rates of false positives (i.e., proteins mistakenly identified as surface exposed) as well as false negatives (i.e., SEPs not detected due to low expression or other technical limitations). We propose a framework called PlasmoSEP for the reliable identification of SEPs using a novel semisupervised learning algorithm that combines SEPs identified by high-throughput experiments and expert annotation of high-throughput data to augment labeled data for training a predictive model. Our experiments using high-throughput data from the Plasmodium falciparum surface-exposed proteome provide several novel high-confidence predictions of SEPs in P. falciparum and also confirm expert annotations for several others. Furthermore, PlasmoSEP predicts that 25 of 37 experimentally identified SEPs in Plasmodium yoelii salivary gland sporozoites are likely to be SEPs. Finally, PlasmoSEP predicts several novel SEPs in P. yoelii and Plasmodium vivax malaria parasites that can be validated for further vaccine studies. Our computational framework can be easily adapted to improve the interpretation of data from high-throughput studies.


Subjects
Algorithms; Membrane Proteins/analysis; Plasmodium falciparum/chemistry; Proteomics/methods; Protozoan Proteins/analysis; High-Throughput Screening Assays/methods; Humans; Membrane Proteins/metabolism; Models, Theoretical; Plasmodium vivax/metabolism; Plasmodium vivax/pathogenicity; Plasmodium yoelii/chemistry; Protozoan Proteins/metabolism; Salivary Glands/metabolism
4.
Proteins ; 82(2): 250-67, 2014 Feb.
Article in English | MEDLINE | ID: mdl-23873600

ABSTRACT

Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) Variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving success rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/.
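
The two evaluation measures defined in the abstract can be sketched in a few lines. This is a minimal illustration on toy data; it assumes each docking case is given as a rank-ordered list of booleans and computes Hit Rate per case, which is one reading of the definition above and not necessarily how the authors aggregated it.

```python
def success_rate(cases, m):
    """Percentage of cases with at least one near-native conformation in the top m.

    Each case is a list of booleans ordered by rank (best first);
    True marks a near-native conformation.
    """
    hits = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * hits / len(cases)


def hit_rate(ranked, m):
    """Percentage of one case's near-native conformations that appear in the top m."""
    total = sum(ranked)
    if total == 0:
        return 0.0
    return 100.0 * sum(ranked[:m]) / total


# Toy rankings for three docking cases (hypothetical data).
cases = [
    [False, True, False, True],   # near-natives at ranks 2 and 4
    [False, False, False, True],  # near-native at rank 4 only
    [True, False, False, False],  # near-native at rank 1
]
print(success_rate(cases, m=2))  # 2 of 3 cases succeed
print(hit_rate(cases[0], m=2))   # 1 of 2 near-natives found -> 50.0
```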


Subjects
Molecular Docking Simulation; Software; Ligands; Protein Interaction Domains and Motifs; Protein Structure, Quaternary; Receptors, Cell Surface/chemistry; Sequence Homology, Amino Acid; Structural Homology, Protein; Thermodynamics
5.
BMC Bioinformatics ; 13: 41, 2012 Mar 18.
Article in English | MEDLINE | ID: mdl-22424103

ABSTRACT

BACKGROUND: Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tends to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS: We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each member of the PrISE family identifies, for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity with the structural element of the query protein. PrISEL relies on the similarity between structural elements (i.e., local structural similarity). PrISEG relies on the similarity between protein surfaces (i.e., general structural similarity). PrISEC combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments using three representative benchmark datasets show that PrISEC outperforms PrISEL and PrISEG, and that PrISEC is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues. Our comparison of PrISEC with PredUs, a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISEC is available as a Web server at http://prise.cs.iastate.edu/. CONCLUSIONS: Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.
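
The weighted-majority labeling rule described in the RESULTS can be sketched as below. The abstract does not give the similarity function or the exact decision threshold, so the weights here are hypothetical inputs and the "more than half of the total weight" cutoff is an assumption.

```python
def predict_interface(neighbors):
    """Label a central residue by a weighted majority vote.

    `neighbors` is a list of (similarity_weight, is_interface) pairs, one per
    similar structural element retrieved from the repository.
    """
    interface_weight = sum(w for w, is_iface in neighbors if is_iface)
    total_weight = sum(w for w, _ in neighbors)
    # Interface only if interface votes carry more than half of the total weight.
    return interface_weight > total_weight / 2


# Hypothetical similarity weights and labels for five retrieved elements.
similar_elements = [(0.9, True), (0.8, True), (0.4, False), (0.3, False), (0.2, False)]
print(predict_interface(similar_elements))  # True: interface votes outweigh the rest
```

Note that the two interface elements win even though they are outnumbered three to two, because the vote is weighted by similarity and not by count.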


Subjects
Protein Interaction Domains and Motifs; Proteins/chemistry; Software; Algorithms; Models, Molecular; Protein Conformation; Proteins/metabolism
6.
BMC Bioinformatics ; 13: 89, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22574904

ABSTRACT

BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
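
The performance measures named in the abstract can be computed from a residue-level confusion matrix, as in the sketch below. It uses the standard definitions of Sensitivity, Specificity, and MCC. Some interface-prediction papers instead define Specificity as TP/(TP+FP), and the abstract does not say which convention is used, so treat the `specificity` function as an assumption. The labels are toy data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = interface residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, tn, fp, fn):
    """Fraction of true interface residues that were predicted as interface."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, tn, fp, fn):
    """Standard definition (true negative rate); an assumption, see the note above."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 4, 1, 1)
print(sensitivity(*counts), specificity(*counts), mcc(*counts))
```

Pooling all residues before counting gives a residue-level estimate. Computing the same measures per protein and then averaging gives a protein-level estimate, and the abstract notes that the two can differ markedly.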


Subjects
Artificial Intelligence; RNA-Binding Proteins/chemistry; RNA/chemistry; Algorithms; Amino Acids/chemistry; Bayes Theorem; Humans; Position-Specific Scoring Matrices; Protein Conformation; RNA/metabolism; RNA-Binding Proteins/metabolism; Sequence Analysis, Protein; Support Vector Machine
7.
Front Hum Neurosci ; 16: 960991, 2022.
Article in English | MEDLINE | ID: mdl-36310845

ABSTRACT

Autism Spectrum Disorder (ASD) is extremely heterogeneous clinically and genetically. There is a pressing need for a better understanding of the heterogeneity of ASD based on scientifically rigorous approaches centered on systematic evaluation of the clinical and research utility of both phenotype and genotype markers. This paper presents a holistic PheWAS-inspired method to identify meaningful associations between ASD phenotypes and genotypes. We generate two types of phenotype-phenotype (p-p) graphs: a direct graph that utilizes only phenotype data, and an indirect graph that incorporates genotype as well as phenotype data. We introduce a novel methodology for fusing the direct and indirect p-p networks in which the genotype data is incorporated into the phenotype data in varying degrees. The hypothesis is that the heterogeneity of ASD can be distinguished by clustering the p-p graph. The obtained graphs are clustered using network-oriented clustering techniques, and the results are evaluated. The most promising clusterings are subsequently analyzed for biological and domain-based relevance. The clusters obtained delineated different aspects of ASD, including differentiating ASD-specific symptoms, cognitive, adaptive, language and communication functions, and behavioral problems. Some of the important genes associated with the clusters have previously known associations with ASD. We found that clusters based on integrated genetic and phenotype data were more effective at identifying relevant genes than clusters constructed from phenotype information alone. These genes included five with suggestive evidence of ASD association and one known to be a strong candidate.

8.
Nutrients ; 14(2)2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35057526

ABSTRACT

Children are prescribed second-generation antipsychotic (SGA) medications, such as olanzapine (OLZ) for FDA-approved and "off-label" indications. The long-term impact of early-life SGA medication exposure is unclear. Olanzapine and other SGA medications are known to cause excessive weight gain in young and adult patients, suggesting the possibility of long-term complications associated with the use of these drugs, such as obesity, diabetes, and heart disease. Further, the weight gain effects of OLZ have previously been shown to depend on the presence of gut bacteria and treatment with OLZ, which shifts gut bacteria toward an "obesogenic" profile. The purpose of the current study was to evaluate changes in gut bacteria in adult mice following early life treatment with OLZ and being fed either a high-fat diet or a high-fat diet supplemented with fish oil, which has previously been shown to counteract gut dysbiosis, weight gain, and inflammation produced by a high-fat diet. Female and male C57Bl/6J mice were fed a high fat diet without (HF) or with the supplementation of fish oil (HF-FO) and treated with OLZ from postnatal day (PND) 37-65 resulting in four groups of mice: mice fed a HF diet and treated with OLZ (HF-OLZ), mice fed a HF diet and treated with vehicle (HF), mice fed a HF-FO diet and treated with OLZ (HF-FO-OLZ), and mice fed a HF-FO diet and treated with vehicle (HF-FO). Following euthanasia at approximately 164 days of age, we determined changes in gut bacteria populations and serum LPS binding protein, an established marker of gut inflammation and dysbiosis. Our results showed that male HF-FO and HF-FO-OLZ mice had lower body weights, at sacrifice, compared to the HF group, with a comparable body weight across groups in female mice. HF-FO and HF-FO-OLZ male groups also exhibited lower serum LPS binding protein levels compared to the HF group, with no differences across groups in female mice. 
Gut microbiota profiles were also different among the four groups; the Bacteroidetes-to-Firmicutes (B/F) ratio had the lowest value of 0.51 in the HF group compared to 0.6 in HF-OLZ, 0.9 in HF-FO, and 1.1 in HF-FO-OLZ, with no differences in female mice. In conclusion, FO reduced dietary obesity and its associated inflammation and increased the B/F ratio in male mice but did not benefit the female mice. Although the weight lowering effects of OLZ were unexpected, FO effects persisted in the presence of olanzapine, demonstrating its potential protective effects in male subjects using antipsychotic drugs.


Subjects
Fish Oils/administration & dosage; Gastrointestinal Microbiome/drug effects; Obesity/therapy; Olanzapine/adverse effects; Sex Characteristics; Animals; Body Weight; Diet, High-Fat/adverse effects; Dietary Supplements; Female; Male; Mice; Mice, Inbred C57BL; Mice, Obese; Obesity/etiology; Weight Gain/drug effects
9.
Article in English | MEDLINE | ID: mdl-33923094

ABSTRACT

We utilize functional data analysis techniques to investigate patterns of COVID-19 positivity and mortality in the US and their associations with Google search trends for COVID-19-related symptoms. Specifically, we represent state-level time series data for COVID-19 and Google search trends for symptoms as smoothed functional curves. Given these functional data, we explore the modes of variation in the data using functional principal component analysis (FPCA). We also apply functional clustering analysis to identify patterns of COVID-19 confirmed case and death trajectories across the US. Moreover, we quantify the associations between Google COVID-19 search trends for symptoms and COVID-19 confirmed case and death trajectories using dynamic correlation. Finally, we examine the dynamics of correlations for the top nine Google search trends of symptoms commonly associated with COVID-19 confirmed case and death trajectories. Our results reveal and characterize distinct patterns for COVID-19 spread and mortality across the US. The dynamics of these correlations suggest the feasibility of using Google queries to forecast COVID-19 cases and mortality for up to three weeks in advance. Our results and analysis framework set the stage for the development of predictive models for forecasting COVID-19 confirmed cases and deaths using historical data and Google search trends for nine symptoms associated with both outcomes.
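
The idea that search trends may lead case counts by some weeks can be illustrated with a simple lagged Pearson correlation. This is a deliberately simpler stand-in, not the dynamic correlation method for functional data named in the abstract, and the two series are made-up toy data in which cases follow searches by exactly two time steps.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def lagged_correlation(search, cases, lag):
    """Correlate search[t] with cases[t + lag] for a non-negative lag in time steps."""
    if lag:
        return pearson(search[:-lag], cases[lag:])
    return pearson(search, cases)


# Toy series: `cases` is `search` delayed by two steps (hypothetical data).
search = [1, 3, 5, 4, 2, 1, 1, 1]
cases = [0, 0, 1, 3, 5, 4, 2, 1]
best_lag = max(range(4), key=lambda lag: lagged_correlation(search, cases, lag))
print(best_lag)  # 2
```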


Subjects
COVID-19; Forecasting; Humans; SARS-CoV-2; Search Engine; United States/epidemiology
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2110-2114, 2021 11.
Article in English | MEDLINE | ID: mdl-34891705

ABSTRACT

Children with Autism Spectrum Disorder (ASD) exhibit a wide diversity in type, number, and severity of social deficits as well as communicative and cognitive difficulties. It is a challenge to categorize the phenotypes of a particular ASD patient with their unique genetic variants. There is a need for a better understanding of the connections between genotype information and the phenotypes to sort out the heterogeneity of ASD. In this study, single nucleotide polymorphism (SNP) and phenotype data obtained from a simplex ASD sample are combined using a PheWAS-inspired approach to construct a phenotype-phenotype network. The network is clustered, yielding groups of etiologically related phenotypes. These clusters are analyzed to identify relevant genes associated with each set of phenotypes. The results identified multiple discriminant SNPs associated with varied phenotype clusters such as ASD aberrant behavior (self-injury, compulsiveness and hyperactivity), as well as IQ and language skills. Overall, these SNPs were linked to 22 significant genes. An extensive literature search revealed that eight of these are known to have strong evidence of association with ASD. The others have been linked to related disorders such as mental conditions, cognition, and social functioning. Clinical relevance: This study further informs on connections between certain groups of ASD phenotypes and their unique genetic variants. Such insight regarding the heterogeneity of ASD would help clinicians develop more tailored interventions and improve outcomes for ASD patients.


Subjects
Autism Spectrum Disorder; Autism Spectrum Disorder/genetics; Cognition; Humans; Phenotype; Polymorphism, Single Nucleotide
11.
BMC Med Genomics ; 13(1): 122, 2020 08 28.
Article in English | MEDLINE | ID: mdl-32859206

ABSTRACT

BACKGROUND: Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often provides a wide list of genes that are differentially expressed between two or more groups. In general, identified differentially expressed genes (DEGs) can be subject to further downstream analysis for obtaining more biological insights such as determining enriched functional pathways or gene ontologies. Furthermore, DEGs are treated as candidate biomarkers and a small set of DEGs might be identified as biomarkers using either biological knowledge or data-driven approaches. METHODS: In this work, we present a novel approach for identifying biomarkers from a list of DEGs by re-ranking them according to the Minimum Redundancy Maximum Relevance (MRMR) criteria using repeated cross-validation feature selection procedure. RESULTS: Using gene expression profiles for 199 children with sepsis and septic shock, we identify 108 DEGs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality with an estimated Area Under ROC Curve (AUC) score of 0.89. CONCLUSIONS: Machine learning based refinement of DE analysis is a promising tool for prioritizing DEGs and discovering biomarkers from gene expression profiles. Moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnosis and prognosis biomarkers for sepsis.
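
The MRMR re-ranking step mentioned in METHODS can be sketched as a greedy selection that trades relevance to the outcome against redundancy with already selected genes. This sketch assumes discretized expression values, mutual information as the relevance and redundancy measure, and the "relevance minus mean redundancy" form of the criterion. The abstract does not state which MRMR variant was used, and the repeated cross-validation wrapper is omitted. Gene names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy MRMR: pick k features maximizing relevance minus mean redundancy.

    `features` maps a feature name to a list of discretized values.
    """
    relevance = {name: mutual_information(vals, target) for name, vals in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: geneB duplicates geneA, geneC carries weaker but less redundant signal.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "geneA": [0, 0, 0, 1, 1, 1, 1, 1],
    "geneB": [0, 0, 0, 1, 1, 1, 1, 1],
    "geneC": [0, 0, 1, 0, 1, 1, 0, 1],
}
print(mrmr(features, target, k=2))  # ['geneA', 'geneC']
```

Ranking by relevance alone would pick geneA and geneB. The redundancy penalty is what moves the duplicate gene down the list.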


Subjects
Biomarkers/analysis; Computational Biology/methods; Gene Regulatory Networks; Machine Learning; Sepsis/genetics; Sepsis/pathology; Transcriptome; Child; Gene Expression Profiling; Humans; ROC Curve
12.
Nat Sci Sleep ; 11: 387-399, 2019.
Article in English | MEDLINE | ID: mdl-31849551

ABSTRACT

BACKGROUND: The current gold standard for measuring sleep is polysomnography (PSG), but it can be obtrusive and costly. Actigraphy is a relatively low-cost and unobtrusive alternative to PSG. Of particular interest in measuring sleep from actigraphy is prediction of sleep-wake states. Current literature on prediction of sleep-wake states from actigraphy consists of methods that use population data, which we call generalized models. However, accounting for variability of sleep patterns across individuals calls for personalized models of sleep-wake states prediction that could be potentially better suited to individual-level data and yield more accurate estimation of sleep. PURPOSE: To investigate the validity of developing personalized machine learning models, trained and tested on individual-level actigraphy data, for improved prediction of sleep-wake states and reliable estimation of nightly sleep parameters. PARTICIPANTS AND METHODS: We used a dataset including 54 participants and systematically trained and tested 5 different personalized machine learning models as well as their generalized counterparts. We evaluated model performance compared to concurrent PSG through extensive machine learning experiments and statistical analyses. RESULTS: Our experiments show the superiority of personalized models over their generalized counterparts in estimating PSG-derived sleep parameters. Personalized models of regularized logistic regression, random forest, adaptive boosting, and extreme gradient boosting achieve estimates of total sleep time, wake after sleep onset, sleep efficiency, and number of awakenings that are closer to those obtained by PSG, in absolute difference, than the same estimates from their generalized counterparts. We further show that the difference between estimates of sleep parameters obtained by personalized models and those of PSG is statistically non-significant. 
CONCLUSION: Personalized machine learning models of sleep-wake states outperform their generalized counterparts in terms of estimating sleep parameters and are indistinguishable from PSG labeled sleep-wake states. Personalized machine learning models can be used in actigraphy studies of sleep health and potentially screening for some sleep disorders.
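
The nightly sleep parameters listed in RESULTS can be derived from a sequence of predicted sleep-wake states, as in this sketch. The 30-second epoch length and the exact definitions (for example, counting wake after sleep onset through the end of the recording, and counting every sleep-to-wake transition as an awakening) are assumptions made for illustration. The abstract does not specify them.

```python
def sleep_parameters(epochs, epoch_seconds=30):
    """Summarize one night from per-epoch labels (1 = sleep, 0 = wake)."""
    minutes = epoch_seconds / 60.0
    total_sleep_time = sum(epochs) * minutes
    time_in_bed = len(epochs) * minutes
    # Sleep onset is taken to be the first epoch labeled as sleep.
    onset = epochs.index(1) if 1 in epochs else len(epochs)
    waso = sum(1 for e in epochs[onset:] if e == 0) * minutes
    # An awakening is counted at every sleep -> wake transition.
    awakenings = sum(
        1 for prev, cur in zip(epochs, epochs[1:]) if prev == 1 and cur == 0
    )
    efficiency = 100.0 * total_sleep_time / time_in_bed if time_in_bed else 0.0
    return {
        "total_sleep_time_min": total_sleep_time,
        "wake_after_sleep_onset_min": waso,
        "sleep_efficiency_pct": efficiency,
        "number_of_awakenings": awakenings,
    }


# Twelve hypothetical 30-second epochs (6 minutes in bed).
night = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
params = sleep_parameters(night)
print(params["total_sleep_time_min"], params["number_of_awakenings"])  # 3.5 2
```

Running the same function on model-predicted and PSG-labeled epochs and comparing the outputs mirrors the kind of absolute-difference comparison the abstract reports.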

13.
PLoS One ; 14(11): e0225382, 2019.
Article in English | MEDLINE | ID: mdl-31756219

ABSTRACT

Reliable identification of inflammatory bowel disease (IBD) biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present an integrative approach to Network-Based Biomarker Discovery (NBBD) which integrates network analysis methods for prioritizing potential biomarkers and machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected using a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.


Subjects
Biomarkers/analysis; Inflammatory Bowel Diseases/microbiology; Metagenomics/methods; Algorithms; Humans; Inflammatory Bowel Diseases/metabolism; Machine Learning; Models, Theoretical
14.
BMC Med Genomics ; 11(Suppl 3): 71, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30255801

ABSTRACT

BACKGROUND: Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise and potential of precision medicine in cancer prevention, diagnosis, and treatment by substantially improving our understanding of underlying mechanisms as well as the discovery of novel biomarkers for different types of cancers. However, such analyses present a number of challenges, including heterogeneity, and high-dimensionality of omics data. METHODS: We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting. RESULTS: We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods. CONCLUSIONS: Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.


Subjects
Algorithms; Biomarkers, Tumor/genetics; Computational Biology/methods; DNA Copy Number Variations; DNA Methylation; Ovarian Neoplasms/mortality; Transcriptome; Female; Gene Expression Profiling; High-Throughput Nucleotide Sequencing/methods; Humans; Ovarian Neoplasms/genetics; Prognosis; Survival Rate
15.
Methods Mol Biol ; 1484: 255-264, 2017.
Article in English | MEDLINE | ID: mdl-27787831

ABSTRACT

Antibody-protein interactions play a critical role in the humoral immune response. B-cells secrete antibodies, which bind antigens (e.g., cell surface proteins of pathogens). The specific parts of antigens that are recognized by antibodies are called B-cell epitopes. These epitopes can be linear, corresponding to a contiguous amino acid sequence fragment of an antigen, or conformational, in which residues critical for recognition may not be contiguous in the primary sequence, but are in close proximity within the folded protein 3D structure. Identification of B-cell epitopes in target antigens is one of the key steps in epitope-driven subunit vaccine design, immunodiagnostic tests, and antibody production. In silico bioinformatics techniques offer a promising and cost-effective approach for identifying potential B-cell epitopes in a target vaccine candidate. In this chapter, we show how to utilize online B-cell epitope prediction tools to identify linear B-cell epitopes from the primary amino acid sequence of proteins.


Subjects
Computational Biology/methods , Epitope Mapping/methods , Proteins/genetics , Amino Acid Sequence/genetics , Antibodies/genetics , Antibodies/immunology , Antigens/genetics , Antigens/immunology , B-Lymphocytes/immunology , Computer Simulation , Epitopes/genetics , Epitopes/immunology , Proteins/chemistry , Proteins/immunology
16.
Methods Mol Biol ; 1484: 205-235, 2017.
Article in English | MEDLINE | ID: mdl-27787829

ABSTRACT

Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.


Subjects
Proteins/genetics , RNA-Binding Proteins/genetics , Software , Algorithms , Amino Acid Sequence/genetics , Binding Sites , Computational Biology , Protein Binding , Proteins/chemistry , RNA-Binding Proteins/chemistry
17.
PLoS One ; 11(7): e0158445, 2016.
Article in English | MEDLINE | ID: mdl-27383535

ABSTRACT

A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile, the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
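
As a rough illustration of the random-sampling step described above, the sketch below keeps each FASTA record independently with a fixed probability (0.01 would correspond to the 1% figure in the abstract). The helper names `read_fasta` and `sample_records`, the seed, and the per-record Bernoulli scheme are assumptions made for illustration; the abstract does not state how the authors drew their sample.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_records(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`.

    Streaming over the records avoids loading the whole database
    into memory; the seed is only there for reproducibility.
    """
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]


# Tiny made-up input, used only to exercise the helpers.
demo_lines = [">seq1", "MKV", "LLA", ">seq2", "GGS", "", ">seq3", "PTK"]
demo_records = list(read_fasta(demo_lines))
```

With this scheme the sample size is only approximately the requested fraction of the database; reservoir sampling would be the usual alternative when an exact number of sequences is required.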


Subjects
Computational Biology , Proteins/chemistry , RNA/chemistry , Software , Algorithms , Artificial Intelligence , Computers , Databases, Protein , Models, Molecular , Position-Specific Scoring Matrices , Predictive Value of Tests , Protein Conformation , Protein Interaction Mapping , Proteins/metabolism , RNA/metabolism , Sequence Analysis, Protein
18.
PLoS One ; 10(3): e0119721, 2015.
Article in English | MEDLINE | ID: mdl-25803493

ABSTRACT

As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of the promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and utilize them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three factors, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is drastically poorer. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data were obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers on both cross-validation and independent test performance evaluation experiments. Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.


Subjects
Computational Biology/methods , DNA-Directed RNA Polymerases/genetics , Escherichia coli/enzymology , Escherichia coli/genetics , Machine Learning , Promoter Regions, Genetic/genetics , Sigma Factor/genetics , Algorithms , Base Sequence , Molecular Sequence Annotation
19.
Methods Mol Biol ; 1184: 285-94, 2014.
Article in English | MEDLINE | ID: mdl-25048130

ABSTRACT

Identification of B-cell epitopes in target antigens is a critical step in epitope-driven vaccine design, immunodiagnostic tests, and antibody production. B-cell epitopes could be linear, i.e., a contiguous amino acid sequence fragment of an antigen, or conformational, i.e., amino acids that are often not contiguous in the primary sequence but appear in close proximity within the folded 3D antigen structure. Numerous computational methods have been proposed for predicting both types of B-cell epitopes. However, the development of tools for reliably predicting B-cell epitopes remains a major challenge in immunoinformatics. Classifier ensembles are a promising approach for combining a set of classifiers such that the overall performance of the resulting ensemble is better than the predictive performance of the best individual classifier. In this chapter, we show how to build a classifier ensemble for improved prediction of linear B-cell epitopes. The method can be easily adapted to build classifier ensembles for predicting conformational epitopes.
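
One simple way to combine classifiers, shown here only as a sketch, is majority voting over per-residue binary labels (1 = epitope residue, 0 = non-epitope). The abstract does not say which combination rule the chapter uses, so the voting rule and the function name `majority_vote` are assumptions for illustration.

```python
def majority_vote(predictions):
    """Combine binary predictions from several classifiers.

    `predictions` is a list with one entry per classifier; each entry
    is a list of 0/1 labels of the same length (one label per residue).
    A residue is labeled 1 when more than half of the classifiers say 1.
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same residues")
    n_classifiers = len(predictions)
    return [
        1 if 2 * sum(votes) > n_classifiers else 0
        for votes in zip(*predictions)
    ]


# Three hypothetical classifiers labeling the same four residues.
demo_votes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
demo_consensus = majority_vote(demo_votes)
```

With an even number of classifiers a tie counts as 0 under this rule; weighting votes by each classifier's validation performance, or training a second-stage model on the base predictions, are common alternatives.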


Subjects
Artificial Intelligence , Epitopes, B-Lymphocyte/immunology , Animals , Computational Biology/methods , Epitopes, B-Lymphocyte/chemistry , Humans , Models, Immunological , Software
20.
PLoS One ; 9(5): e97725, 2014.
Article in English | MEDLINE | ID: mdl-24846307

ABSTRACT

Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
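
The MCC values quoted above refer to the Matthews correlation coefficient. For readers unfamiliar with it, the sketch below computes the standard definition from the four confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator is zero are choices made here, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives, and false negatives (e.g. per-residue predictions
    of RNA-binding vs. non-binding).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        # Undefined when a row or column of the confusion matrix is
        # empty; 0.0 is a commonly used convention in that case.
        return 0.0
    return numerator / denominator


# Made-up counts, used only to exercise the function.
demo_mcc = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (perfect prediction). It is often preferred over accuracy for interface prediction because binding residues are typically a small minority of all residues.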


Subjects
Artificial Intelligence , Models, Theoretical , RNA-Binding Proteins/genetics , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods , Animals , Humans